The sad state of affairs is that papercuts are the death of us all. We've come t...

wheels · on July 5, 2011

Your comment is wrong on so many levels:

• It's not even on topic. This post is not about general memory usage; it's about fragmentation in the JavaScript VM – i.e. that a system object existing in a general memory page precludes that page from being garbage collected / deallocated even though the lifetime of system objects is very different. This is not a trivial matter for which the only explanation is laziness.

• It's spoken like someone who's never seriously worked on a large open source project. Memory profiling and leak tracking are done widely and often (using valgrind, mostly).

• You speak as if you know how much memory a web browser should use. How do you know? Have you worked on modern web browsers? Or are you just, as I assume, by fiat deciding that 500 MB is too much?

• There's very often a speed / memory usage tradeoff. At present, especially for web browsers, users tend to prefer speed. (i.e. how many rendered pages do you cache per tab so that clicking the back button is next to instant?)

• The right time to profile and optimize is usually later in the development process. We all know the famous quote. Optimizing early tends to lead to ugly code that, amusingly, is harder to optimize later.

srean · on July 5, 2011

While not squarely on topic, I don't think the parent post was so off target, compared to the digressions and explorations that are common in discussions here, that it warranted an admonishment. I have not "worked on modern browsers", and that seems to disqualify me from expressing an opinion on what I, as a user of a piece of software, consider excessive memory use.

But this much I can say: even a few years back I could comfortably use a Linux desktop on 64MB of RAM. Now even a Debian box with X runs rather choppily on a system that has 256MB, and it is not clear to me in what way my computing experience is better now than the one I had on the 64MB system. Even browsers that are considered lean take up to 200MB of that 256 and still do not work smoothly. (FF 3.5 still does fine on this system, BTW.) Had my old system been augmented with some 3 gigs of RAM, it is possible that its performance would be worse than that of a modern system with equivalent RAM, but the fact that it is now common for a system to occupy more resources to accomplish the same level of activity is at times an irritant.

The usual counterargument is that RAM is cheap and one should just go and buy some or suffer otherwise. I think it is this assumption that is made or a requirement that is imposed by the newer versions of software that GP was complaining about.

hapless · on July 5, 2011

"Comfortably use" my eye. Web browsers have always been bulky by the standards of their day. I remember using IRIX on a 64MB SGI Indy and cursing. It thrashed swap furiously with Netscape 4.

Today's agony always seems new and fresh, but it never is. I still have some of those old systems sitting around to remind me. The "good old days" were pretty miserable.

Today I have Linux and Firefox and the exact same problems -- in 1G of RAM, which is a comparably low-spec system today to what the Indy was back then. The more things change, the more they stay the same.

jbert · on July 5, 2011

> • There's very often a speed / memory usage tradeoff. At present, especially for web browsers, users tend to prefer speed. (i.e. how many rendered pages do you cache per tab so that clicking the back button is next to instant?)

This is a good point, and the amount of memory used for such a cache is ideally "all of it, but no more".

Which is the same argument for buffer cache.

So I wonder if it's possible to use the buffer cache in this way?

Perhaps write the rendered image to a tmpfile. Then don't sync the write, and also call something like fadvise(FADV_DONTNEED). The app can then free the memory. (Clearly you could play a similar game with mmap().)

The idea would be to have the memory sitting in the system buffers. But the app can inform the system that these pages are discardable under memory pressure, rather than needing to be written to disk.

bzbarsky · on July 5, 2011

What's being cached in the "rendered page" case is not an image but an object graph in memory.

Writing that to disk involves either fixing up all the pointers on read or making sure you can read it all back into _exactly_ the same locations in your virtual address space or something. This can still improve performance over creating the object graph from scratch (cf. XDR for Gecko's JavaScript or Mozilla's fastload stuff for the user interface), but not nearly as much as just having the object graph there.

One other note: just because _you_ don't sync the write doesn't mean some other random part of your app, or some other app, won't.... and then you have extra disk traffic.
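To make the "fix up all the pointers on read" step concrete, here is a minimal Python sketch. It is a toy and not how Gecko does it: `Node`, `serialize`, and `deserialize` are made-up names for illustration. In-memory references are replaced by list indices on write and turned back into references on read, which is the extra work you pay compared to simply keeping the graph around.

```python
import json


class Node:
    """Hypothetical stand-in for one object in an in-memory graph."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def serialize(root):
    """Flatten the graph into records that hold indices, not references.

    Assumes every parent is itself reachable from `root` via `children`.
    """
    order, index = [], {}
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index:
            continue
        index[id(node)] = len(order)
        order.append(node)
        stack.extend(node.children)
    return [
        {
            "name": n.name,
            "children": [index[id(c)] for c in n.children],
            "parent": None if n.parent is None else index[id(n.parent)],
        }
        for n in order
    ]


def deserialize(records):
    """Rebuild the graph: allocate every node first, then fix up references."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        node.children = [nodes[i] for i in r["children"]]
        node.parent = None if r["parent"] is None else nodes[r["parent"]]
    return nodes[0]  # the root is always record 0


# Round trip through a JSON string, standing in for a file on disk.
root = Node("html")
body = root.add(Node("body"))
body.add(Node("p"))
copy = deserialize(json.loads(json.dumps(serialize(root))))
```

The copy has the same shape as the original but lives at new addresses, so anything that still held a reference to the old objects would not see it.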

jbert · on July 5, 2011

Well, you either serialise/deserialise your data structure to make it memory relative, or you could use mmap().

Allocate such pages with a malloc which works out of an anonymous mmap() backed pool, with madvise(MADV_DONTNEED).

The OS will then discard the pages under memory pressure and will give you back zero filled pages if they have been discarded.

If each object has a nonzero sentinel value and is smaller than a page, you can detect discarded objects.

Objects bigger than a page can be composed of a 'dictionary obj' which points to multiple objects - check the sentinel on each and discard/rebuild the whole lot if any are missing.

That last bit may be hard to retrofit, I agree.
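A rough sketch of the sentinel check, in Python. To keep it runnable anywhere, the kernel's discard is only simulated here by zero-filling a page of a `bytearray`; a real version would sit on an anonymous mmap() pool and rely on the OS handing back zero-filled pages. `DiscardablePool`, `SENTINEL`, and `simulate_discard` are names invented for this example, and the one-object-per-page layout is a simplifying assumption.

```python
PAGE_SIZE = 4096
SENTINEL = 0xA5   # any nonzero byte; a zero-filled page can never contain it
HEADER = 3        # 1 sentinel byte + 2 length bytes


class DiscardablePool:
    """Toy pool: one object per page, each page starting with a sentinel."""

    def __init__(self, pages):
        self.mem = bytearray(pages * PAGE_SIZE)
        self.next_page = 0

    def store(self, payload):
        """Copy `payload` into the next free page and return its page number."""
        if len(payload) > PAGE_SIZE - HEADER:
            raise ValueError("payload must fit in one page next to the header")
        if (self.next_page + 1) * PAGE_SIZE > len(self.mem):
            raise MemoryError("pool is full")
        page = self.next_page
        self.next_page += 1
        base = page * PAGE_SIZE
        self.mem[base] = SENTINEL
        self.mem[base + 1:base + 3] = len(payload).to_bytes(2, "little")
        self.mem[base + HEADER:base + HEADER + len(payload)] = payload
        return page

    def load(self, page):
        """Return the payload, or None if the page has been discarded."""
        base = page * PAGE_SIZE
        if self.mem[base] != SENTINEL:
            return None  # sentinel gone: caller has to rebuild the object
        length = int.from_bytes(self.mem[base + 1:base + 3], "little")
        return bytes(self.mem[base + HEADER:base + HEADER + length])

    def simulate_discard(self, page):
        """Stand-in for the OS dropping the page: it reads back as zeros."""
        base = page * PAGE_SIZE
        self.mem[base:base + PAGE_SIZE] = bytes(PAGE_SIZE)


pool = DiscardablePool(pages=4)
page = pool.store(b"rendered page state")
```

Callers treat a `None` from `load` as a cache miss and rebuild, which is why every access has to go through the pool rather than through a raw pointer.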

bzbarsky · on July 5, 2011

> Well, you either serialise/deserialise your data structure to make it memory relative

Yes, aka "fix up all the pointers".

> or you could use mmap()

That doesn't help: the objects in the DOM are already allocated on the heap. You just want to hold on to them and then drop those references at some point. Writing a pointer to a heap object to disk, even via mmaped memory areas, isn't going to really work well.

And you can't just drop the objects from the heap because they may have references from elsewhere to them.

jbert · on July 5, 2011

> That doesn't help: the objects in the DOM are already allocated on the heap. You just want to hold on to them and then drop those references at some point.

The idea is to have more than one heap, with different behaviour. Objects in one heap have a "can disappear at any time, but we can detect that" behaviour.

An 'object deep copy' is all it takes to move an object from one heap to another. Or if you prefer, you can allocate all such objects in the discardable heap but temporarily 'pin' an object by changing the madvise on its pages (this involves more page-level complexity of how stuff is laid out and how pages are shared between objects).

I'm not saying this is a 2 hour project, but it's the sort of thing which could be captured in a library without too much complexity.

> Writing a pointer to a heap object to disk, even via mmaped memory areas, isn't going to really work well.

You'd be surprised. I've done exactly this [1]. As long as your mmap'd addresses are stable, it's fine. Also note that in the case we're talking about, the pages never go to disk. They're in anonymous-backed mmap'd memory, which the kernel has been told to throw away instead of writing to disk (swap in this case, since this is an anon map).

[1] well, pretty close. I made a single-proc app multi-proc by making its key data structures allocate out of a pool controlled by a custom malloc, backed by a mmap()'d disk file. Other processes then attached to that file via mmap() at the same address and lo and behold, all the pointers to the nested data structures were good. (One proc had read/write access to this pool, the others were all readonly. Some additional locking was required. Your mileage may vary.)

You're right in that things won't work if something else has a ptr to these pages. But as long as all references to such pages go via a single "get_page_from_cache()" it's fine.

bzbarsky · on July 6, 2011

> You're right in that things won't work if something else has a ptr to these pages.

That's the problem, exactly. In a browser context it's pretty easy for something else (another frame, a web page in another tab or window) to have pointers into the DOM of pages that have been navigated away from. So when a page is evicted from the in-memory cache some of the information can just be destroyed. The layout objects, say. These are already allocated out of an arena, so the mmap approach may work there; it's worth looking into. But the DOM can't just be unmapped (and in fact is not arena-allocated for the same reason right now) unless you're willing to pin the whole mmaped area if something holds on to those objects, which brings us back to fragmentation issues.

ootachi · on July 5, 2011

The latter doesn't really solve the problem. Folks are still going to see that Firefox is using a lot of memory, even if the OS is now more inclined to swap out pages that are less important.

jbert · on July 5, 2011

I think it solves the technical problem, though I agree that there is potentially an issue of perception. (Note that the OS won't swap these pages, it will discard them under memory pressure).

(Although an argument could be made that the OS need not account pages which a process has marked as DONTNEED as "belonging" to that process. In as much as a page does "belong" to a process, anyway.)

mike-cardwell · on July 5, 2011

"There's very often a speed / memory usage tradeoff."

And yet Firefox seems to use about 4 or 5 times more memory than Chrome on my system whilst being noticeably slower as well...

If Chrome had all the addons that Firefox had, Firefox would be a dead browser.

I still use Firefox as my primary browser. For now.

itsnotvalid · on July 5, 2011

I have the inverse of your situation: Chrome always uses much more memory than Firefox.

william42 · on July 5, 2011

That's because 1) Chrome is better-designed 2) Chrome extensions are JS and not native code, meaning they're less likely to leak due to programmer error, and there are fewer of them, to boot

lmz · on July 5, 2011

> 2) Chrome extensions are JS and not native code, meaning they're less likely to leak due to programmer error, and there are fewer of them, to boot

Are there really that many Firefox extensions that use native code?

udoprog · on July 5, 2011

Side note: Firefox extensions aren't native code, they're written in JS.

EDIT: I do however agree that Chrome is better designed. Firefox has been through a lot of iterations through the years.

mnutt · on July 5, 2011

While most Firefox extensions are in JS, I believe you can bind to native C++ as well.

lmz · on July 5, 2011

Chrome also allows you to access native code in extensions using NPAPI plugins: http://code.google.com/chrome/extensions/npapi.html

z92 · on July 5, 2011

I remember that some time back, probably around 2004, there was a lot of noise regarding memory leaks in FF. And the expert-sounding ones started to claim that there was no memory leak and that the very large memory usage in FF was because of browser history and cache storage. Then one year later they all accepted that FF was actually leaking memory.

Since then I stopped accepting expert opinion blindly.

wheels · on July 5, 2011

Experts are sometimes wrong. That doesn't change the fact that they're right vastly more often than non-experts.

bzbarsky · on July 5, 2011

There is an important difference between "expert-sounding" and "expert".

Where memory leaks (or pretty much any performance metric) are involved, the more sure someone sounds, the less likely they are to be an expert unless they have done extensive measurements to back up their claims.

clobber · on July 5, 2011

I could be wrong, but wasn't a large part of this due to Flash?

starwed · on July 5, 2011

I think it turned out that a large part of it was due to, like, 100 different small things. There was never a memory leak.

nnethercote · on July 5, 2011

The key word here is "workload". A web browser is a programming environment. It would be silly to say "no single native program should use 500MB of RAM", and it's equally silly to say that no website (and by extension, the browser) should use that much.

That's not to say Firefox's memory usage is perfect, far from it. That's why we're making improvements like this one. See https://wiki.mozilla.org/Performance/MemShrink for more.

saulrh · on July 5, 2011

It's not just a programming environment, either; a web browser is an interpreter for some of the most poorly-written, highest-level code in existence. Not very conducive to good memory usage even theoretically.

mixmastamyk · on July 5, 2011

Perhaps, but wouldn't you agree it should get freed when you close the tabs?

nnethercote · on July 5, 2011

Yes, although it might not happen immediately. That's exactly what this fix helps with. And see https://bugzilla.mozilla.org/show_bug.cgi?id=668809 for more about this.

ahpeeyem · on July 5, 2011

But then the Ctrl+Shift+T feature to reopen closed tabs wouldn't work so well. Try it, you can reopen closed tabs as far back as you like; it's pretty awesome and has saved me more than once.

It doesn't work in Private Mode (for good reason); I wonder if browsing in private mode consumes less memory?

mixmastamyk · on July 5, 2011

I like that feature and use it. But it can be separated from the memory caching. Back button speed is nice too, but I could do without it.

There are a few options here ... keep the last one in memory, the last few in the disk cache, etc. Right now it appears to keep everything you've ever visited in RAM, which is clearly wrong.

__rkaup__ · on July 5, 2011

Hmmm... how about using Private Mode through a proxy to keep your history?

saulrh · on July 5, 2011

I believe that there are also some information-theoretical limits on the efficiency of garbage collection and memory allocation. Don't quote me on that, though.

demallien · on July 5, 2011

It's not laziness, it's all about avoiding premature optimisations. Personally I write the bulk of my code on embedded systems where we are severely constrained both in sheer CPU performance and in the amount of memory available. Nevertheless, I still start off writing code in what you might call a "lazy" style, but which I tend to think of as "the simplest, clearest way to write code to do the job". I assume I'll have the memory and the performance that I need. Having written the code, if that assumption turns out to be false, only then do I go back and try to improve performance characteristics: adding caching, optimising algorithms to avoid doing unnecessary calculations, and so on. The code is inevitably longer and less flexible after this work, which means that I have probably added bugs during the process. And that is why I don't start off doing things that way: I'm trying to keep the LOC count (and hence the bug count) down as much as possible. Memory and performance optimisations are done only when there is a demonstrated need for them, and I feel that this is the correct way to approach optimisation.

JoeAltmaier · on July 5, 2011

Yeah; sounds like a good process. But all too often the "try to improve performance characteristics" part is entirely skipped. It doesn't take much to run A/B tests and wonder why you are 5X over other similar apps. It was clearly never done, at least not by anybody in a position to help mitigate the issue.

I use a similar version of your process, adding one step: when the code is complete, examine the footprint and speed, and diligently explain to myself where the time/memory went. If it was wasted, do something about it immediately.

techdmn · on July 5, 2011

Obligatory Abrash quote: "In the long run, programs will be bigger and slower yet, but computers will be so fast and will have so much memory that no one will care." [1] I don't agree 100%, but he has a point. I'll say that resources are cheap enough they aren't always worth optimizing. On the other hand, it's easy to get caught on the wrong side of that. Always painful to discover that someone else's "plenty to spare" is your "just a bit short".

[1] http://www.phatcode.net/res/224/files/html/ch39/39-01.html

mcpherrinm · on July 5, 2011

What if I have 10 tabs open, each with 5 images that are 10 MB each? It's not an unreasonable workload.

Yet there's 500 MB, right there. Never? Whatsoever? I don't think so.

sophiebits · on July 5, 2011

I seriously doubt you're opening many pages with 10 MB images; even 1 MB is considered pretty large for a web image.

masklinn · on July 5, 2011

> even 1 MB is considered pretty large for a web image.

Compressed. Remember that in order to display it, the browser has to unpack it to what amounts to a bitmap. And newer, better compression formats (e.g. PNG) make that even worse in compressing the same bitmap file to a much smaller transmission size.

tsvk · on July 5, 2011

Did you consider compressed JPG/PNG on disk versus uncompressed raw image data bitmap in RAM?

A 1024x768 32-bit image is 1024x768x4 = 3,145,728 bytes when uncompressed.
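The same arithmetic as a small Python sketch, in case anyone wants to plug in other dimensions. `decoded_size_bytes` is a name made up here, and it ignores row padding and any extra copies a real browser might keep.

```python
def decoded_size_bytes(width, height, bits_per_pixel=32):
    """Size of an uncompressed bitmap, ignoring any row padding."""
    return width * height * bits_per_pixel // 8


size = decoded_size_bytes(1024, 768)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

The compressed size on disk does not enter into it at all: only the pixel dimensions and the bit depth do.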

mcpherrinm · on July 5, 2011

Right: the example is contrived. My point is mainly that it is actually within reach, and that workload needs to be taken into consideration when making any statement about how much memory a browser should be using.

epivosism · on July 5, 2011

It happens to me all the time - the "linked images" bookmarklet combined with EarthPorn reddit + autopager.

ootachi · on July 5, 2011

"Simply we're being cut to death by bloat and pure, sheer lazyness."

All these memory leaks being fixed are "pure, sheer lazyness"?

https://bugzilla.mozilla.org/buglist.cgi?keywords=mlk%2C%20&...

ak217 · on July 5, 2011

That's not what spitfire meant. He meant that the features responsible for these extra fields/leaks/fragmentation/etc. were not thoroughly performance tested or thought out with performance in mind, out of laziness on the part of the people who put them in.

gorbachev · on July 5, 2011

I agree with your principle, but not with the 500MB comment.

You don't know how people use their browsers.

For example, I have 8 tab groups open right now, each with 5 - 20 tabs. Let's say that's about 75 tabs. That's only 6.67MB each to fill 500MB. I'm regularly over 1GB. I'm sure a significant part of that is actually memory leaks, but that's beside the point.
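For anyone who wants to try other tab counts, the per-tab figure is just an even split. A tiny Python sketch, where `per_tab_budget_mb` is a name invented for illustration and 40 and 160 are simply 8 groups of 5 and 8 groups of 20 tabs:

```python
def per_tab_budget_mb(total_mb, tab_count):
    """Average memory available per tab if the total is split evenly."""
    return total_mb / tab_count


for tabs in (40, 75, 160):
    print(f"{tabs:3d} tabs -> {per_tab_budget_mb(500, tabs):.2f} MB each")
```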

michael_dorfman · on July 5, 2011

Wait, you have 75 tabs open? Doesn't the time it takes to locate the tab you are looking for exceed the time it would take to launch a new tab, and load the desired page?

Argorak · on July 5, 2011

The Firefox awesome bar has an awesome (haha) feature for that: if you type an address, it will detect whether you have matching tabs open and give you the option to switch instead. It works over multiple tab groups and windows. So: locating a tab is quicker and it keeps the state of the page.

jpr · on July 5, 2011

I presume that most people who have lots of tabs open, like I do, use addons that make finding tabs fast. I use Tab Kit to organize the tabs in trees, and Pentadactyl to give me fast search that is activated by pressing b. I can't imagine using a browser which doesn't have either or both of these features.

gnosis · on July 5, 2011

There's really no reason why browsers couldn't have an option to make tabs which aren't currently active (i.e. being viewed) be treated as bookmarks, which take up virtually no memory, since all they are is a URL and a bit of metadata.

This would, of course, slow down browsing when switching from tab to tab, since the entire page would need to be reloaded. But I would gladly give up some speed for memory on my old machine with only 3.7 gigs of RAM.

Before anyone asks why don't I just use bookmarks in the first place, let me say that I prefer the organization that tabs give me for pages I am currently working on. I do move those tabs to bookmarks if I use them long enough. But in the relatively short term I prefer tabs.

Another option that would be a compromise between the solution proposed above and keeping all tabs in memory would be to keep only the N most recently accessed tabs in memory, and treat the rest as bookmarks. That would make switching between those N tabs relatively responsive, but still use a lot less memory than having every page for every tab reside in memory. Especially for people like me, who regularly have over 100 tabs open.

Of course, the value of N should be user-configurable. With N=1, the second proposed solution would be equivalent to the first.
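Here is roughly what I have in mind, as a toy Python model and not a claim about how any browser is built. `TabManager`, `n_live`, and the `load_page` callback are hypothetical names: every tab always keeps its URL, but only the N most recently used tabs keep their page state.

```python
from collections import OrderedDict


class TabManager:
    """Toy model: at most `n_live` tabs keep page state; the rest keep a URL."""

    def __init__(self, n_live, load_page):
        if n_live < 1:
            raise ValueError("n_live must be at least 1")
        self.n_live = n_live
        self.load_page = load_page   # hypothetical callback: url -> page state
        self.urls = {}               # tab id -> URL (the "bookmark" part)
        self.live = OrderedDict()    # tab id -> page state, least recent first

    def open(self, tab_id, url):
        """Open a new tab and make it the active one."""
        self.urls[tab_id] = url
        return self.activate(tab_id)

    def activate(self, tab_id):
        """Switch to a tab, reloading it if its state was dropped."""
        if tab_id in self.live:
            self.live.move_to_end(tab_id)       # mark as most recently used
        else:
            self.live[tab_id] = self.load_page(self.urls[tab_id])
            while len(self.live) > self.n_live:
                self.live.popitem(last=False)   # drop least recently used
        return self.live[tab_id]
```

With `n_live=1` only the active tab holds any state, and switching to any other tab triggers a reload through `load_page`.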