Closed Bug 627635 (localStorageIO) Opened 13 years ago Closed 5 years ago

Make remote localStorage not block the UI main thread on disk IO

Categories

(Core :: Storage: localStorage & sessionStorage, defect, P2)

Tracking

RESOLVED WORKSFORME
Tracking Status
fennec - ---

People

(Reporter: cjones, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: main-thread-io, perf, Whiteboard: [at-risk][tech-bug])

Attachments

(2 files, 1 obsolete file)

It came out in bug 622112 that our localStorage implementation blocks the content-process main thread on the chrome-process main thread, which is not great but OK.  But, the chrome-process main thread then appears to be blocked on the localStorage request, which involves grabbing the SQLite mutex, which means blocking on disk.  That's not palatable for fennec.
Allowing web content to block fennec's UI thread on disk IO is ... not palatable.  I think we either need to not do this or disable localStorage for 4.0.  We've never shipped a fennec with it enabled, and it's being phased out by IndexedDB, so I wouldn't lose too much sleep over punting it to a 4.x or later.

I don't know what's involved in avoiding UI-thread disk IO.
tracking-fennec: --- → ?
Summary: Make remote localStorage not block the UI main thread on disk → Make remote localStorage not block the UI main thread on disk IO
Honza, the following stack from bug 622112 is just the flushing timer running.  It looks like a prime candidate to move to a background thread, but I don't know anything about whether storage connections can be used off the main thread.

#0  mozilla::storage::SQLiteMutex::lock (this=0x7fffe08c6928) at
/home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:128
#1  0x00007ffff4a40188 in SQLiteMutexAutoLock (this=0x7fffffffd290, aMutex=...)
at /home/cjones/mozilla/mozilla-central/storage/src/SQLiteMutex.h:183
#2  0x00007ffff4a3eaec in mozilla::storage::Connection::BeginTransactionAs
(this=0x7fffe08c68f0, aTransactionType=0) at
/home/cjones/mozilla/mozilla-central/storage/src/mozStorageConnection.cpp:930
#3  0x00007ffff37db6ee in mozStorageTransaction (this=0x7fffffffd300,
aConnection=0x7fffe08c68f0, aCommitOnComplete=0, aType=0) at
../../dist/include/mozStorageHelper.h:77
#4  0x00007ffff417fa46 in nsDOMStoragePersistentDB::FlushTemporaryTables
(this=0x7fffe08c4810, force=false) at
/home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStoragePersistentDB.cpp:535
#5  0x00007ffff4179a6b in nsDOMStorageDBWrapper::FlushAndDeleteTemporaryTables
(this=0x7fffe08c4810, force=false) at
/home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorageDBWrapper.cpp:104
#6  0x00007ffff416dcc9 in nsDOMStorageManager::Observe (this=0x7fffe02cac00,
aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", aData=0x0) at
/home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:486
#7  0x00007ffff4f0f771 in nsObserverList::NotifyObservers (this=0x9db7c8,
aSubject=0x0, aTopic=0x7ffff58f2205 "domstorage-flush-timer", someData=0x0) at
/home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverList.cpp:130
#8  0x00007ffff4f1140e in nsObserverService::NotifyObservers
(this=0x7fffe00f5160, aSubject=0x0, aTopic=0x7ffff58f2205
"domstorage-flush-timer", someData=0x0) at
/home/cjones/mozilla/mozilla-central/xpcom/ds/nsObserverService.cpp:182
#9  0x00007ffff416d837 in nsDOMStorageManager::Observe (this=0x7fffe02cac00,
aSubject=0x7fffe0c1de70, aTopic=0x7ffff5b5393d "timer-callback", aData=0x0) at
/home/cjones/mozilla/mozilla-central/dom/src/storage/nsDOMStorage.cpp:449
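A minimal sketch of what moving that flush to a background thread could look like (the runnable and the wiring here are hypothetical, and it assumes the storage connection may be touched off the main thread, which is exactly the open question):

#include "nsIRunnable.h"
#include "nsIThread.h"
#include "nsThreadUtils.h"

// Hypothetical runnable: runs the periodic flush on a background thread so
// the SQLite mutex is only ever taken off the UI thread.
class FlushRunnable : public nsIRunnable
{
public:
  NS_DECL_ISUPPORTS

  FlushRunnable(nsDOMStorageDBWrapper* aDB) : mDB(aDB) {}

  NS_IMETHOD Run()
  {
    // Blocks on disk here, on the background thread, not on the UI thread.
    mDB->FlushAndDeleteTemporaryTables(PR_FALSE);
    return NS_OK;
  }

private:
  nsDOMStorageDBWrapper* mDB;  // assumed to outlive the runnable
};
NS_IMPL_THREADSAFE_ISUPPORTS1(FlushRunnable, nsIRunnable)

// In the "domstorage-flush-timer" observer, instead of flushing in place:
nsCOMPtr<nsIThread> flushThread;
nsresult rv = NS_NewThread(getter_AddRefs(flushThread));
NS_ENSURE_SUCCESS(rv, rv);
rv = flushThread->Dispatch(new FlushRunnable(mStorageDB), NS_DISPATCH_NORMAL);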
Josh, I commented on bug 622112 comment 30, but those suggestions are just a partial fix.

AFAIK, sqlite is not very thread safe, or at least wasn't the last time I witnessed attempts to use it on multiple threads.

I would tend more toward using the async API for this; that would accomplish the premise of not having IO on the main thread.

But first I'd like to know why it takes so long to acquire the mutex, when there is just a single connection bound to a single database file on disk, all of it accessed only on the main thread.
SQLite is threadsafe (in that it serializes access to the database).  The mozIStorageConnection object is also threadsafe, but I'd prefer you to use an asynchronous statement to do background thread work (this lets Storage manage the thread for you).

I don't have any comments as to why it takes so long to get the mutex.
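A minimal sketch of that asynchronous-statement approach (the query, the callback, and |mConnection| are illustrative; ExecuteAsync runs the statement on Storage's background thread and calls the callback back on the calling thread):

#include "mozIStorageAsyncStatement.h"
#include "mozIStoragePendingStatement.h"
#include "mozIStorageStatementCallback.h"

class FlushCallback : public mozIStorageStatementCallback
{
public:
  NS_DECL_ISUPPORTS

  NS_IMETHOD HandleResult(mozIStorageResultSet* aResults) { return NS_OK; }
  NS_IMETHOD HandleError(mozIStorageError* aError) { return NS_OK; }
  NS_IMETHOD HandleCompletion(PRUint16 aReason)
  {
    // aReason says whether the statement finished, errored, or was canceled.
    return NS_OK;
  }
};
NS_IMPL_ISUPPORTS1(FlushCallback, mozIStorageStatementCallback)

// The SQL is a stand-in for whatever the flush actually does; the point is
// ExecuteAsync() instead of a synchronous Execute() on the main thread.
nsCOMPtr<mozIStorageAsyncStatement> stmt;
nsresult rv = mConnection->CreateAsyncStatement(
    NS_LITERAL_CSTRING("DELETE FROM webappsstore2_temp"),
    getter_AddRefs(stmt));
NS_ENSURE_SUCCESS(rv, rv);

nsCOMPtr<mozIStoragePendingStatement> pending;
rv = stmt->ExecuteAsync(new FlushCallback(), getter_AddRefs(pending));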
aza, can you take a look?
Assignee: nobody → azakai
tracking-fennec: ? → 2.0+
^azakai, of course.
Alon, also check bug 622112 comment 30, though those suggestions are only a partial solution.
Some questions:

1. How common is localStorage on the web?
2. How big of a delay is caused? (hard for me to tell from bug 622112 whether localStorage is the main issue or not)
3. Can we estimate how much of a speedup the fixes in bug 622112 comment 30 would get us?
(In reply to comment #8)
> 3. Can we estimate how much of a speedup the fixes in bug 622112 comment 30
> would get us?

Avoiding the call to GetTransactionInProgress in MaybeCommitInsertTransaction will save acquiring the lock for every GetItem not preceded by a SetItem.

The second optimization should actually be: don't open a transaction until a scope whose data has been modified is found.  That will save acquiring the lock when a page only reads localStorage.


In general, I don't think we can do a lot here... the localStorage API is intended to be fully synchronous, so to satisfy all requests for Get and Set we have to block.

I just don't know sqlite well enough to say why there is one global lock for all IO ops that, when accessing one file, prevents accessing other completely separate files.  It just doesn't make sense to me.
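A minimal sketch of that second optimization (the member flag and helper name are hypothetical): only the write path ever starts a transaction, so a page that merely reads localStorage never touches the SQLite mutex for transaction bookkeeping.

nsresult
nsDOMStoragePersistentDB::EnsureInsertTransaction()
{
  // Track the transaction ourselves instead of asking SQLite via
  // GetTransactionInProgress (which takes the SQLite mutex).
  if (mHasPendingTransaction) {
    return NS_OK;
  }
  nsresult rv = mConnection->BeginTransactionAs(
      mozIStorageConnection::TRANSACTION_DEFERRED);
  NS_ENSURE_SUCCESS(rv, rv);
  mHasPendingTransaction = PR_TRUE;
  return NS_OK;
}

// Called only from the SetItem/RemoveItem paths; GetItem never calls it,
// so pure reads stay lock-free on our side.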
can we do the sets+gets to an in-memory hash, then flush to sql when the tab closes?
(In reply to comment #10)
> can we do the sets+gets to an in-memory hash, then flush to sql when the tab
> closes?
I think that's an awfully big change this late in the 2.0 release cycle.  I sure wouldn't like to see that go in now.
cjones, how bad is this?  Is it worth just pulling the plug on localStorage like we did with IndexedDB?
What I don't know is how many sites rely on localStorage.  We know some do, like tbpl.  google.com uses it, but I doubt it's broken without it.  Comment 8 asks this too.

What we're trading for breaking those sites is potentially huge UI latency, O(second) judging by sync-disk-IO problems we've had elsewhere.  This will happen if the disk is busy and/or we're on a file system like the Galaxy S***'s.  The possibility of this happening when loading a site as common and innocuous as google.com concerns me.  This UI latency is also under the control of web content.  The more web content uses localStorage, the more lag we get.  That makes fennec look bad.

I don't know what we gain in terms of site compat by leaving localstorage enabled.  That's the key bit of data we'd need to make a good decision.  Short of that, ... I guess we'd have to browse a bunch of sites known to use localstorage and see how bad things are in the common case, and if the experience is acceptable, hope we don't hit bad cases too much in the wild.
(In reply to comment #10)
> can we do the sets+gets to an in-memory hash, then flush to sql when the tab
> closes?

Please don't.  There is already a hash table used to store values, mirroring the data, but the final goal is to get rid of it and put all processing of the data fully under sqlite's control.  We need more than just a simple hash table to have all capabilities of localStorage (DOM storage in general) working.

The fix should figure out why we block and try to avoid the block as much as possible.

I can do some research with optimizations I suggested earlier.
I tried to find websites that use localStorage. The following do:

google.com does (but not mobile)
espn (but not mobile)
cnn (but not mobile)
orbitz
lonelyplanet
sfchronicle

but it looks like the majority of popular websites don't use it. And even when they do, their mobile versions don't.

As for how important localStorage actually is to those websites, it's hard to tell. I don't notice any immediate difference when using the websites mentioned above without localStorage - they don't obviously break - but possibly some minor functionality (that is noticed later on) is lost.

Given that, my recommendation is to disable localStorage for Fennec for 4.0. Definitely not worth a perf hit, and probably not worth the risk of any serious work (unless there is something simple that can completely fix this).
Any blog that uses disqus for its commenting engine uses localstorage, but I also don't know how it's used therein.
Attached patch experimental1 (obsolete) — Splinter Review
This is an experimental patch that should lower the chance of hitting the lock.  If anyone can test with it let me know.  I can also provide a try build with this patch if anyone wants.
Attached patch experimental1.1Splinter Review
Slightly simplified and with a few fixes relative to the previous one.
Attachment #507715 - Attachment is obsolete: true
I'm not sure how to test this. I can create a page with localStorage and a page without, but I don't notice any difference in performance.

I tried the websites mentioned in the parent bug as well, but 2 out of the 3 websites mentioned (bugzilla and facebook) don't even use localStorage.

Are we sure that localStorage is actually a significant part of the ~1 second hang mentioned in the parent bug?
(In reply to comment #19)
> I'm not sure how to test this. I can create a page with localStorage and a page
> without, but I don't notice any difference in performance.

On device?  What access pattern does your page with localStorage use?

> Are we sure that localStorage is actually a significant part of the ~1 second
> hang mentioned in the parent bug?

Not the least bit, sorry if I implied otherwise.  It's synchronous, UI-thread disk IO though, and that's guaranteed to bite us.  The question is will it bite us infrequently enough to leave it enabled for 4.0.
(In reply to comment #20)
> (In reply to comment #19)
> > I'm not sure how to test this. I can create a page with localStorage and a page
> > without, but I don't notice any difference in performance.
> 
> On device?  What access pattern does your page with localStorage use?

Galaxy S and Linux desktop.

My test page does a lot of writes and reads to localStorage.

> 
> > Are we sure that localStorage is actually a significant part of the ~1 second
> > hang mentioned in the parent bug?
> 
> Not the least bit, sorry if I implied otherwise.  It's synchronous, UI-thread
> disk IO though, and that's guaranteed to bite us.  The question is will it bite
> us infrequently enough to leave it enabled for 4.0.

Yes, good question.
IPC profiling data from an android device can be seen in bug 629513 comment 6 / bug 629513 comment 8. Web storage does indeed lead to a significant slowdown on google.com (nonmobile).
Alon, I'm curious why SetValue and RemoveValue have such short latency compared to GetValue.  Could it be that google's first call to GetValue leads to an initial DB load of the whole domain into a temp table, and only this first call is therefore that slow?  Or is it just called much more often than SetValue and RemoveValue?

If the former, we could pre-cache the DB in the chrome process somehow, or search for an optimization in a more narrow space.
(In reply to comment #23)
> Alon, I'm curious why SetValue and RemoveValue have such short latency
> compared to GetValue.  Could it be that google's first call to GetValue
> leads to an initial DB load of the whole domain into a temp table, and only
> this first call is therefore that slow?  Or is it just called much more
> often than SetValue and RemoveValue?
> 

The last column shows the number of times each is called. So GetValue was called 11 times, SetValue 2 times, and RemoveValue 3 times. However, even accounting for that, GetValue takes much more time, so your first explanation seems most likely.

However, it is also possible there was some noise in that run, so we should get more data. I am working on a better profiling tool now.
Attached file profiling data
Better profiling data, of google.com load (nonmobile).

Again GetValue is quite high, ~0.5 seconds.
And what is the distribution of duration among each call?
Unfortunately I don't have a test env for this.  The Android port for my Freerunner needs some fixes to even be able to connect to wifi.  I don't know when I'll get it working, so please give me as much information as you can to move this forward.
Honza, we're in the end game for Fennec. What's the plan for this? Also, what do you need to move forward?
Whiteboard: [at-risk]
I don't think it's wise to take this in 2.0, FWIW.  This is going to be a big enough change that I'd think it would need beta coverage, but we just tagged our last beta.
based on Shawn's comment, moving this to 2.0next
tracking-fennec: 2.0+ → 2.0next+
(In reply to comment #30)
> based on Shawn's comment, moving this to 2.0next
Although, mobile may want to disable localStorage like in the past.
(In reply to comment #26)
> And what is the distribution of duration among each call?

I am seeing less of a problem now, not sure why. But here is some data from loading google.com (nonmobile):

I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00000 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.18600 seconds, mainThread == 1
I/Gecko   ( 9716): |IPC 'RecvGetValue' took 0.00100 seconds, mainThread == 1

So just one of those ended up significantly slow.
FWIW, loading my gmail account on my galaxy s*** is very painful, many several-second UI pauses.  Didn't profile, but localStorage would be my first guess for the UI blockage.  The galaxy s*** is kind of a worst-case though.

It would be interesting for other folks to try gmail on their less s***ty phones.
tracking-fennec: 2.0next+ → 2.0+
after discussing with blassey and stuart, this is something that is serious for many of the popular devices we are targeting.
oh hell.  2.0next.  wrong bug.
tracking-fennec: 2.0+ → 2.0next+
(In reply to comment #33)
> FWIW, loading my gmail account on my galaxy s*** is very painful, many
> several-second UI pauses.  Didn't profile, but localStorage would be my first
> guess for the UI blockage. 

cjones, I am trying gmail on galaxy s to try to figure out this issue. I don't think I see the same degree of pain as you do so I may be doing something different. I tried both mobile and standard HTML views, and both accounts with no emails in the inbox and with 20-30 emails.

The only interesting thing I see in the profiles I've generated is that we spend a lot of time in nsRefreshDriver, especially when there are lots of emails in the account (1.3130 seconds over 4 page loads). But even that doesn't seem so bad, no several-second UI pauses in particular. Are you still seeing those?
(I am hoping fixing bug 634666 got rid of the UI pauses.)
Should note that I saw the pain on 2.1.  Haven't tried since I upgraded to 2.2; other stuff seems less s***ty with 2.2 so I wouldn't be surprised if gmail is too.
tracking-fennec: 2.0next+ → 6+
Are we certain this is still an issue? Or should we collect new data?
Assignee: azakai → nobody
Ran the profiling code on this and saw:

IPC|RecvGetValue|main 0.0280 total seconds, 0.0031 mean seconds, 9 counted

when loading google.com (desktop site) in Fennec on a Droid X.
(In reply to comment #40)
> Ran the profiling code on this and saw:
> 
> IPC|RecvGetValue|main 0.0280 total seconds, 0.0031 mean seconds, 9 counted
> 
> when loading google.com (desktop site) in Fennec on a Droid X.

Seems to not be a problem, if I'm reading that right.
tracking-fennec: 6+ → -
Blocks: 698500
I'm seeing animation jank on desktop Firefox that correlates with writes to webappsstore.sqlite.

If confirmed, this is a P1 for Firefox. Honza, do you have time to pick this back up again?
OS: Linux → All
Hardware: x86_64 → All
(In reply to Dietrich Ayala (:dietrich) from comment #42)
> Honza, do you have time to pick this
> back up again?

First I'd like to investigate if we could move all this stuff to LevelDB and see how that would work (bug 687566).

If that takes too long to finish, I will try to get to this soon, but not before the Aurora merge.
Blocks: 682602
Alias: localStorageIO
Depends on: 687566
Whiteboard: [at-risk] → [at-risk][see comment #43 for plan]
Thanks Honza.

How would moving to LevelDB specifically reduce the main thread IO caused by localStorage? Are all writes done off-thread? Or does it mitigate the problem by allowing us to write less and less often?
Blocks: StorageJank
(In reply to Dietrich Ayala (:dietrich) from comment #44)
> Thanks Honza.
> 
> How would moving to LevelDB specifically reduce the main thread IO caused by
> localStorage? Are all writes done off-thread? Or does it mitigate the
> problem by allowing us to write less and less often?

I'm adding Matthew, who's working on LevelDB integration, to answer at least the off-main-thread question.  I personally don't remember whether LevelDB uses background threads, but can't say for sure myself.

According to the tests, it seems that LevelDB should simply be much faster.  I want to investigate first, but unfortunately we still don't have even a prototype LevelDB implementation in the mozilla tree to start playing with it.

If that takes even longer, I will have to look for conventional solutions that would go against my future plans, which were to rework the code to be both much simpler and faster.
(In reply to Dietrich Ayala (:dietrich) from comment #44)
> How would moving to LevelDB specifically reduce the main thread IO caused by
> localStorage? Are all writes done off-thread? Or does it mitigate the
> problem by allowing us to write less and less often?
Writing less doesn't really mitigate it; it just makes it harder to notice until you have a really bad case.

(In reply to Honza Bambas (:mayhemer) from comment #45)
> According to the tests, it seems that LevelDB should simply be much faster.
> I want to investigate first, but unfortunately we still don't have even a
> prototype LevelDB implementation in the mozilla tree to start playing with
> it.
LevelDB should be faster because it's designed to be a simple key-value store, which is exactly what `localStorage` is.  SQLite is a bit more complex than we need.
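To illustrate how direct the mapping is, a self-contained LevelDB sketch (the path and keys are made up; this is essentially the entire API surface a key-value store like localStorage needs):

#include <cassert>
#include <string>
#include "leveldb/db.h"

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  leveldb::Status status =
      leveldb::DB::Open(options, "/tmp/webappsstore-example", &db);
  assert(status.ok());

  // WriteOptions().sync defaults to false: no fsync per write, the same
  // durability trade-off discussed below for PRAGMA synchronous.
  status = db->Put(leveldb::WriteOptions(), "example.com|color", "blue");
  assert(status.ok());

  std::string value;
  status = db->Get(leveldb::ReadOptions(), "example.com|color", &value);
  assert(status.ok() && value == "blue");

  delete db;
  return 0;
}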
(In reply to Honza Bambas (:mayhemer) from comment #45)
> (In reply to Dietrich Ayala (:dietrich) from comment #44)
> > How would moving to LevelDB specifically reduce the main thread IO caused by
> > localStorage? Are all writes done off-thread? Or does it mitigate the
> > problem by allowing us to write less and less often?
> 
> I'm adding Matthew, who's working on LevelDB integration, to answer at least
> the off-main-thread question.  I personally don't remember whether LevelDB
> uses background threads, but can't say for sure myself.

LevelDB does use background threads to improve performance, but writes can still block when buffers fill up and the background thread needs to catch up.  So bad cases can still occur.
Blocks: 701880
As stopgaps, is it worthwhile to do any or all of the following:

1) Use a more relaxed PRAGMA synchronous on the localStorage DB;
2) Use SQLite's asynchronous (writes on background threads) VFS or one that we write ourselves;
3) Some other way of moving IO off the main thread that I'm not aware of because I'm new to all of this...

?  At least option (2) sounds quite similar to what LevelDB already does.
(In reply to Nathan Froyd (:froydnj) from comment #49)
> 1) Use a more relaxed PRAGMA synchronous on the localStorage DB;

This is likely doable if we consider data loss an unimportant issue for localStorage. Since the data is set by the page, I expect any page already has some sort of fallback for when the data is not yet in localStorage. On corruption it could just throw away the db. Cookies similarly use synchronous=OFF.
That way writes would be much faster, since it'd never fsync.

> 2) Use SQLite's asynchronous (writes on background threads) VFS or one that
> we write ourselves;

This is probably too expensive if levelDB is around the corner.

> 3) Some other way of moving IO off the main thread that I'm not aware of
> because I'm new to all of this...

One may use partitioned tables with a part in memory, and handle all of that with magical queries. Well, we did this in Places for Firefox 3.5 and it is a nightmare.

(In reply to Dietrich Ayala (:dietrich) from comment #44)
> How would moving to LevelDB specifically reduce the main thread IO caused by
> localStorage?

Take these results with care, since obviously the benchmarks are made to be largely in favor of levelDB, but it is really good at what it does (simple hashing with sequential reads/writes)
http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
It may also use openUnsharedDatabase and exclusive locking, to avoid the shared cache and the concurrent-writer checks.
These changes are simple and known to improve performance.
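Putting comment 49's option (1) together with the tweaks above, a sketch of the cheap wins (not an actual patch; |dbFile| is assumed to point at webappsstore.sqlite, and error handling is trimmed):

#include "mozIStorageConnection.h"
#include "mozIStorageService.h"
#include "nsServiceManagerUtils.h"

nsCOMPtr<mozIStorageService> service =
    do_GetService("@mozilla.org/storage/service;1");
nsCOMPtr<mozIStorageConnection> connection;
nsresult rv = service->OpenUnsharedDatabase(dbFile,
                                            getter_AddRefs(connection));
NS_ENSURE_SUCCESS(rv, rv);

// Nobody else opens this file, so skip the concurrent-writer checks.
rv = connection->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA locking_mode = EXCLUSIVE"));
NS_ENSURE_SUCCESS(rv, rv);

// Trade durability for speed: never fsync (cookies do the same).
rv = connection->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA synchronous = OFF"));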
(In reply to Marco Bonardo [:mak] from comment #50)
> (In reply to Nathan Froyd (:froydnj) from comment #49)
> > 1) Use a more relaxed PRAGMA synchronous on the localStorage DB;
> 
> This is likely doable if we consider dataloss as a nonimportant issue for
> localStorage.

As a side note, levelDB in synchronous mode may perform worse than SQLite, so it's probably something that we'll have to do regardless, when moving to it.
IMHO, dataloss is totally ok for localStorage.
Depends on: 704933
I've filed Bug 704933 to investigate these easy improvements, I hope someone from the perf team (or anyone else) can take it.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #53)
> IMHO, dataloss is totally ok for localStorage.

Why? Unlike sessionStorage, which is understood to be transient and volatile, localStorage is designed to be persistent.
(In reply to Dietrich Ayala (:dietrich) from comment #55)
> Why? Unlike sessionStorage, which is understood to be transient and
> volatile, localStorage is designed to be persistent.

Dataloss here means database corruption. All the data in localStorage comes from the web and can be added back again by the web app.
(In reply to Marco Bonardo [:mak] from comment #56)
> Dataloss here means database corruption. All the data in
> localStorage comes from the web and can be added back again by the web app.
That's not true at all.  Think offline apps.  Think of pages that store user preferences in localStorage because cookies can easily be deleted.
(In reply to Shawn Wilsher :sdwilsh from comment #57)
> That's not true at all.  Think offline apps.  Think of pages that store user
> preferences in localStorage because cookies can easily be deleted.

My point is that those pages have a fallback for when we don't have any of their data; otherwise they'd just not work. In case of dataloss nothing is really lost, apart from the fact that the page will have to populate the storage again, which, sure, may be an annoyance in the case of offline apps. But even now we are surely not doing everything we can to protect against that; we use sync=normal.
If we really don't want to move to the complete removal of fsyncs, we may still move to WAL+NORMAL, with a small journal size.
At this point, if we disallow levelDB from working on the main thread, I don't see it as a viable solution for localStorage, so we need an alternative path to reduce fsyncs.
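A sketch of that WAL+NORMAL middle ground (the journal size is an illustrative value; |connection| is the localStorage connection as in the earlier sketch):

rv = connection->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA journal_mode = WAL"));
NS_ENSURE_SUCCESS(rv, rv);

// NORMAL in WAL mode only fsyncs on checkpoint, not on every transaction.
rv = connection->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA synchronous = NORMAL"));
NS_ENSURE_SUCCESS(rv, rv);

// A small WAL keeps checkpoints, and hence any stalls, short.
rv = connection->ExecuteSimpleSQL(
    NS_LITERAL_CSTRING("PRAGMA journal_size_limit = 524288"));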
Blocks: 701872
Depends on: 736144
Whiteboard: [at-risk][see comment #43 for plan] → [at-risk][see comment #43 for plan][Snappy]
e10s is not in Firefox desktop, so not covered by snappy.
Whiteboard: [at-risk][see comment #43 for plan][Snappy] → [at-risk][see comment #43 for plan]
dup of either 807021 or 600307
Honza: Bug 807021 doesn't fully cover this since bug 807021 is only about writes. There is no reason we should ever need to do reads on the main thread if someone in a child process accesses localStorage since we can easily just block the child process until we've asynchronously read the data.

Likewise, I would imagine bug 600307 doesn't fully solve the read problem here, does it?

I do think that waiting for bug 600307 before fixing this is the right thing to do though.
(In reply to Jonas Sicking (:sicking) from comment #64)
> Honza: Bug 807021 doesn't fully cover this since bug 807021 is only about
> writes. There is no reason we should ever need to do reads on the main
> thread if someone in a child process accesses localStorage since we can
> easily just block the child process until we've asynchronously read the data.
> 
> Likewise, I would imagine bug 600307 doesn't fully solve the read problem
> here, does it?

Sure it does.  In the parent process we do the load in the background.  Actually, it behaves the same way as if there were just a single process.  Preloads happen on the background thread; if a sync load happens, though, it blocks (there is no way around it, even for IPC!).  

According to existing telemetry, domain data reads are fast.


The first telemetry from desktop Firefox using the patch from bug 600307 shows 75-100% of preloads make it before localStorage access.  SSD makes it better, HDD makes it a bit worse.  Load times are about 20 ms when blocking.

> 
> I do think that waiting for bug 600307 before fixing this is the right thing
> to do though.

This is a dupe of bug 600307.
> Preloads happen on the background thread; if a sync load happens, though, it
> blocks (there is no way around it, even for IPC!).  

That's the thing. For IPC there is a way around it!

On the child side you make a blocking call to the parent. But when there's a call coming from a child process, there is no need to do anything blocking in the parent! You can asynchronously load the data and while the load is happening the child process remains blocked. Once the data has been loaded, you can send back a response to the child which unblocks it.

So from the child process point of view, it is a blocking call. But in the parent process nothing blocking happens.

Of course, if localStorage is *used* in the parent process we'd have to make a blocking call. That we can't do anything about. But we shouldn't need to block when localStorage is used in a child process.

That's what this bug is about. And that part is not fixed in any of the dependent bugs.
(In reply to Jonas Sicking (:sicking) from comment #66)
> > Preloads happen on the background thread; if a sync load happens, though,
> > it blocks (there is no way around it, even for IPC!).  
> 
> That's the thing. For IPC there is a way around it!

Do you have a code example?  Because I don't think what you suggest is going to work or I simply am not getting it.  How can I not block the parent?  That is not clear to me.

Thanks.
Content processes should communicate directly with a non-main thread in the parent process.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #68)
> Content processes should communicate directly with a non-main thread in the
> parent process.

How?
Child process main thread creates an IPC channel for PStorage or something on its main thread, and then the parent process opens and binds that channel to a non-main thread that services the requests.
StorageParent::RecvSomeSyncRequest(const &in_arg, &out_arg) will be called on the main thread and will block the main thread until the value of |out_arg| is known.

I don't know of a way to make sync requests be handled on a non-main thread.
We would need to move the StorageParent code to run off the main thread, with a dedicated IPC channel.  Nothing would be different from the client (child) side.
Chris: Is there no way in ipdl to send a message from the child to the parent and then block until it gets a response from the parent, while letting the parent produce a response asynchronously? I.e. let the parent send a separate message in order to deliver the response or some such?

If that's not possible, that seems like something we should add to ipdl. Allowing a child to block while the parent runs some slow operation seems like one of the big benefits we could get from process separation.
The parent can spin the event loop while creating the response.
That is kind of an ok case for event loop spinning.
Right, spinning the event loop is the only way to do that.  The semantics of half-sync are messy and not at all clear, so I don't want to add it.  Spinning the event loop achieves the same effect.
Ah, is that what the message manager does? IIRC the message manager has the ability to let the parent process create a result asynchronously.

So yeah, one "simpler" option here would be to let the parent process, when it gets a localStorage request from the child, call the preload API added in bug 600307, and then spin the event loop until whatever data was requested is available. Then it could fetch that data without ever causing blocking IO in the parent.
Message manager doesn't have such functionality, but scripts in the parent can ofc spin the event loop.
Ok, spinning the event loop works for me, and should make this bug relatively easy to fix as long as we have a way to get a callback when a particular origin has been preloaded into the localStorage.
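A minimal sketch of what was just agreed (every name here is hypothetical): the child's sync GetValue blocks only the child, while the parent-side handler spins its event loop until the bug 600307 background preload has delivered the data, so the parent main thread never blocks on disk.

#include "nsThreadUtils.h"

bool
StorageParent::RecvGetValue(const nsString& aKey, nsString* aValue)
{
  mCache->EnsurePreload();  // kick off the async preload if not started yet
  while (!mCache->IsLoaded()) {
    // Keep servicing main-thread events; the storage I/O thread does the
    // actual read and eventually flips IsLoaded().
    if (!NS_ProcessNextEvent()) {
      return false;
    }
  }
  // Purely in-memory from here on.
  return NS_SUCCEEDED(mCache->GetValue(aKey, *aValue));
}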
Thanks for all the suggestions, guys.  I see two ways here:

- spin the event loop on the parent (super-easy to do), however it may cause unpredictable behavior when something underlying changes
- have a dedicated channel handled on a non-main thread (no idea how to do it), I am more a fan of this one if it's easy to do
(In reply to Honza Bambas (:mayhemer) from comment #79)
> - spin the event loop on the parent (super-easy to do), however it may cause
> unpredictable behavior when something underlying changes
What underlying? You'd be spinning the event loop very close to the place where data is received from the
other process, so usually very close to the normal main event loop spin.
And in the case of showModalDialog, nested spinning is legitimate anyway.
As far as I can see, spinning the event loop in this case isn't really a problem. It doesn't cause similar
issues to alert() or sync XHR.
(In reply to Honza Bambas (:mayhemer) from comment #79)
> Thanks for all the suggestions, guys.  I see two ways here:
> 
> - have a dedicated channel handled on a non-main thread (no idea how to do
> it), I am more a fan of this one if it's easy to do

This is actually pretty straightforward and is definitely the better approach, but I don't know what assumptions in the localstorage code might be violated by running on a non-main thread.  I or bent can help with the IPC boilerplate.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #81)
> This is actually pretty straightforward and is definitely the better
> approach, but I don't know what assumptions in the localstorage code might
> be violated by running on a non-main thread.  I or bent can help with the
> IPC boilerplate.

Can I let the *child* receive messages on a thread other than the main thread somehow?  It would make the new code much, much simpler and also immediately fix the issue of blocking the parent's main thread.

Thanks for help.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #83)
> Yes, absolutely.

Can you please reveal how?  (Code example, API I can use to set the thread...)
Here's the unit test for the feature

http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/test/cxx/TestOpens.cpp

The test starts in the parent process main thread, then sends a message to the child that initiates
 - in the child, creating a new channel, PTestOpensOpened
 - in the child, creating a new thread to process messages for that channel
 - binding the channel to the off-main thread
 - in the parent, accepting the channel allocation message
 - in the parent, also creating a non-main thread to process messages for the new channel and binding it to that thread

Then there's some IPC chatter on the newly-created channel.
Honza: I don't understand why we would want to do anything off the main thread in the child process though? Can you explain?
(In reply to Jonas Sicking (:sicking) from comment #86)
> Honza: I don't understand why we would want to do anything off the main
> thread in the child process though? Can you explain?

Yes.

I will describe the new implementation from bug 600307 now:

In single-process mode, when a window is opening we tell the storage background I/O thread to do an async preload of the storage cache.  It SELECTs and then, for each key+value, calls the LoadItem(key, value) method of the cache.  When done it calls LoadDone(), which marks the cache as loaded (also understood as "valid").  These calls are made on the background I/O thread.

When a JS script accesses localStorage before the preload has finished, we block the main thread until the LoadDone() call (we wait via monitor.wait(); LoadDone() calls monitor.notify()).  There is no main-thread event looping.

In the IPC case, when we get into the situation where we have to wait for the async preload to finish, there is no way (I didn't find one...) to just loop over the IPC message queue to receive all the async messages we expect to come from the parent.  I cannot do monitor.wait() on the main thread since the main thread has to spin the event loop to process the messages from the parent.  And I don't want to spin the loop (at whatever cost; we know why).

I solved this by doing a sync call to the parent to do a preload of all remaining unloaded data synchronously.


Thus, if the child process could process messages on a non-main thread, I could just call monitor.wait() on the main thread to wait for all data to be received from the parent.  I could remove the relatively complicated sync-load IPC code as well.
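For reference, the wait/notify handshake described above as a sketch (assuming a mozilla::Monitor member |mMonitor| and a bool |mLoaded|; the class and method names are illustrative):

#include "mozilla/Monitor.h"

void
StorageCache::WaitForPreload()
{
  mozilla::MonitorAutoLock lock(mMonitor);
  while (!mLoaded) {
    lock.Wait();  // main thread sleeps; no event loop spinning
  }
}

// Called by the background I/O thread -- or, with an off-main-thread IPC
// channel in the child, by the thread receiving the parent's messages.
void
StorageCache::LoadDone()
{
  mozilla::MonitorAutoLock lock(mMonitor);
  mLoaded = true;
  lock.Notify();  // wake the waiting main thread
}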
(In reply to Chris Jones [:cjones] [:warhammer] from comment #85)
>  - binding the channel to the off-main thread

Thanks for this example.

My situation is backwards with regard to the IPC connection creation: it starts on the child.  So I need to initialize the child first on a non-main thread.

The trick in the test is in these lines I presume:

child opens PTestOpensOpened;

virtual PTestOpensOpenedChild*
    AllocPTestOpensOpened(Transport* transport, ProcessId otherProcess) MOZ_OVERRIDE;

When is AllocPTestOpensOpened called?
That example does initiate the new IPC connection from the child side.

The channel is allocated from this call

http://mxr.mozilla.org/mozilla-central/source/ipc/ipdl/test/cxx/TestOpens.cpp#157

which is the interface generated from |child opens PTestOpensOpened|.

That function does some IPC magic behind the scenes.  When it's finished, your AllocPFoo*() functions will be called in both the parent and child processes and handed tokens that refer to the new IPC channel.  Those methods, implemented by you, are responsible for Open()ing the channel (which binds it to the thread it was opened on).

For that part you'll just want to copy the boilerplate ;).
Chris, thanks very much for the suggestions!

I decided to do this as a followup after bug 600307 landing.
Depends on: 600307
Note: after we land bug 600307 the situation will improve a lot.  However, there is still a chance of being blocked on a corner-case synchronous load of the cache (comment 65).  That will be completely removed by the work in this bug.
(This is one of the sources of b2g jank that we prefer not to talk about.)
Blocks: b2g-v-next
Just need to fix.
Whiteboard: [at-risk][see comment #43 for plan] → [at-risk][see comment #43 for plan][tech-bug]
Honza, what's the current situation here? Did anything change from comment 91 after the new implementation landed? What can we still do here? Comment 43 is clearly no longer the plan; having a plan, even a future one, would be useful.
Looks like bug 736144, the only dependency, is blocked on getting more telemetry.
Flags: needinfo?(honzab.moz)
Whiteboard: [at-risk][see comment #43 for plan][tech-bug] → [at-risk][tech-bug]
I don't work on this bug just because I'm really pretty occupied with network stuff (the http cache rewrite).

Blocking of the main thread on the parent may still happen, however with much lower probability than before bug 600307.  

What we need is to receive IPC messages in the parent process on a non-main thread - that's all - nothing more to do!  I have an idea how to do it, but the IPC API is hard.  

If there is anyone who could do it quickly then please do it.  I may not get to this for at least a month, I can say.
Flags: needinfo?(honzab.moz)
Thanks, I was going through the StorageJank bugs and trying to update their status and ensure there's a plan for them; this is good information to have here.
Honza, is comment 95 still relevant, should we just close this as WFM and file a separate bug for the IPC problem, since it's not related to main-thread IO?
Flags: needinfo?(honzab.moz)
(In reply to Marco Bonardo [::mak] from comment #98)
> Honza, is comment 95 still relevant, should we just close this as WFM and
> file a separate bug for the IPC problem, since it's not related to
> main-thread IO?

I'm sure Honza is ok with me answering this question...
Yeah, that comment is still relevant and I'm actually working on fixing the problem in bug 1350637.
Flags: needinfo?(honzab.moz)
Depends on: 1350637
(In reply to Jan Varga [:janv] from comment #99)
> (In reply to Marco Bonardo [::mak] from comment #98)
> > Honza, is comment 95 still relevant, should we just close this as WFM and
> > file a separate bug for the IPC problem, since it's not related to
> > main-thread IO?
> 
> I'm sure Honza is ok with me answering this question...
> Yeah, that comment is still relevant and I'm actually working on fixing the
> problem in bug 1350637.

Thanks, yes, I agree.
Please add the "perf" keyword.
Keywords: perf
See Also: → 1428169
Priority: -- → P2
Component: DOM → DOM: Web Storage

Marking works-for-me as fixed by bug 1350637, which :janv noted in comment 99 was the fix for avoiding the parent process main thread. (Things now go via PBackground in the legacy implementation, and will also go via PBackground when LocalStorage NextGen is fully landed and enabled.) The main thread of whatever content process is using localStorage can still block when the preload did not complete in time, but there's no avoiding that.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME