Ask HN: Why does 'View Source' issue a new HTTP request?

spookylukey · on Dec 19, 2016

It's a bug: https://bugzilla.mozilla.org/show_bug.cgi?id=307089

Or, it's a memory-saving feature. Implementing "View source from cache" requires keeping the raw page HTML around, which you might not otherwise need after parsing. Except you probably will need it for all the developer tools to work, so this should probably just be considered a bug.

supergreg · on Dec 19, 2016

I wouldn't mind having to activate it in the developer tools and disable when I don't need it anymore and can feel the difference. Definitely easier than having to open a separate program like fiddler.

_ugfj · on Dec 19, 2016

How much memory is saved here? http://www.httparchive.org/interesting.php?a=All&l=Dec%202%2... 52 _kilobytes_? Really?

morley · on Dec 19, 2016

The browser would have to save the source text for EVERY tab it opens, regardless of whether you view source or not. That seems more onerous than making a separate request for the source code every once in a while.

threeseed · on Dec 19, 2016

Am I in some parallel universe here?

We are talking about hundreds of KBs of memory in total, which is nothing, and if it's a worry for mobile devices, we have the disk. It's not a feature that demands instant performance.

heartbreak · on Dec 19, 2016

> It's not a feature that demands instant performance.

Hence it's acceptable to request the document from the server when opening 'View Source'.

gbog · on Dec 19, 2016

No, because when you want to view the source, you mean the source of this very document, not the source you'd get by reloading the page, and the two are often different.

sp332 · on Dec 19, 2016

If you want to see the page as it's currently being rendered, looking in the developer tools seems more relevant anyway.

evilDagmar · on Dec 19, 2016

If they wanted that, they'd have looked at the developer tools to begin with.

Many of you folks are completely missing the point. The world wide web took off in large part because it was incredibly easy to learn HTML, because with every webpage if one wanted to know how it worked one could just look at the source code.

How the page is currently being rendered, what state the DOM might be in... These things do not matter to someone trying to view the source HTML for a page. They're looking to learn about the HTML. They're not going to get that by viewing pre-digested DOM information.

speeder · on Dec 19, 2016

Or take a website I was trying to fix: the source in View Source and the stuff in the dev tools never matched each other, even with JS disabled. In the end, after days, I gave up, and the page is in production, crazy bugs and all. I can't figure out why, in both Chrome and Firefox, the page ends up with lots of random "strong" and "span" tags that don't exist anywhere in the original source. The tags aren't even closed properly: some are never closed, some are closed multiple times.

jmcdiesel · on Dec 19, 2016

That's just it...

There are edge cases where a bug is intermittent... and is masked by something on the client side, which is especially likely with browser plugins. In fact, plugins were the cause of 2 of these for me. Some issue on page load was causing a bug, but then JS was changing the source away from what caused the bug, and a refresh wasn't guaranteed to return the same content (for one of them, this was a fast-changing log viewer)...

So you end up not being able to capture the initial state of the page... but the bug wouldn't show up with JS disabled, because the error is in the JS...

It's not a common case... but fetching from the server seems like MORE work for no reason when the data is already there...

evilDagmar · on Dec 19, 2016

No, it's not, and it's particularly inappropriate for any document being viewed as the result of a POST request.
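The POST problem can be sketched as a check a browser could run before re-sending a request. This is a minimal sketch, not how Firefox or Chrome actually decide. `may_refetch_for_view_source` is a hypothetical name. The set of safe methods is the one RFC 7231 defines.

```python
# Hypothetical check: may "View Source" silently re-send the original request?
# RFC 7231 defines GET, HEAD, OPTIONS and TRACE as "safe" methods; anything else
# (notably POST) may change server state when repeated.
SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE"})


def may_refetch_for_view_source(method: str) -> bool:
    """Return True only if re-sending the request should not have side effects."""
    return method.upper() in SAFE_METHODS


for m in ("GET", "post", "PUT"):
    print(m, may_refetch_for_view_source(m))
```

For anything that fails this check, the only faithful option is to show the body that was already received.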

EpicEng · on Dec 19, 2016

But who does this feature help exactly? A _tiny_ number of users. Chrome is currently using over 3GB on my machine. Anything they can do to trim that without affecting performance is worth it to me.

ComputerGuru · on Dec 19, 2016

3GB is how much memory the _rendered/executed_ source code takes. The source code is nothing. Store it LZ4 compressed in memory and it's less than a rounding error.
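You can check the "rounding error" claim on your own pages with a few lines of code. This sketch uses zlib from the Python standard library in place of LZ4, because LZ4 needs a third-party package. The sample page is made up, so the exact ratio will differ on real HTML.

```python
import zlib

# Made-up stand-in page; real markup tends to be repetitive too, though less so.
html = ('<div class="comment"><span class="user">someone</span>'
        "<p>Some comment text goes here.</p></div>\n") * 500
raw = html.encode("utf-8")

# zlib stands in for LZ4 here because it ships with Python.
packed = zlib.compress(raw, 1)  # level 1: the fastest setting

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
assert zlib.decompress(packed) == raw  # lossless, so View Source could show it exactly
```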

EpicEng · on Dec 20, 2016

Yes, I know that, but all of these little inefficiencies add up, and for what? A tiny portion of users who need the developer tools? Not worth it.

This kind of thinking is exactly what leads to our software being as slow or slower than that of two decades ago while running on machines hundreds or thousands of times more powerful.

amelius · on Dec 19, 2016

Yes, but if you have more than enough memory, then that's no problem.

So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

EDIT: Anyway, storing it in the browser's file cache would also do the trick, I suppose :)

Insanity · on Dec 19, 2016

Isn't that one of the issues nowadays, that people have the attitude "we have enough memory" and thus don't care about optimizing memory usage anymore?

Your suggested solution, about discarding when memory is needed, solves this problem. But it solves a problem that would be created by needlessly storing a lot more data than needed. So it's a solution to a problem that was not a problem to begin with.

But I know, if it's a few kb, it will not make a huge difference.

tmd83 · on Dec 19, 2016

No, this is really not a case of thinking we have enough memory. This is the realization that the static source code of a page is a rounding error compared to the memory overhead of a rendered page. Modern browsers are memory hogs; I would rather they focus on that.

It's also functionally broken, since I asked for the source of the current page, not of a reload.

iLoch · on Dec 19, 2016

100kb * 100 tabs is 10MB. I'll take my chances.

kahnpro · on Dec 19, 2016

So hide it behind a feature flag?

user5994461 · on Dec 19, 2016

> "we have enough memory"

Your phone is in disagreement.

Fnoord · on Dec 19, 2016

We're talking about a default feature which is disabled 'for average users'. Average users only have a few tabs open in their browser, not 100. Average users also don't have 3 GB of RAM, nor do they use View Source on a mobile phone; in fact, average users don't use the feature at all. So the default setting makes sense, even though a user who does use the feature may end up using several MB due to a reload (not cool on a limited data plan).

If you are saving 57 kB per open tab, that'd be ~5.7 MB with 100 tabs open. But if you have 100 tabs open on a mobile phone (!!), you have a bigger problem, and all those tabs are causing swapping anyway. In that sense, enabling the feature by default makes sense. And don't forget that some people don't have flat-rate internet.
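Here is the per-tab arithmetic, using figures quoted in this thread. These are 57 kB per page, 100 tabs, and the 1 GB phone with half reserved for the OS mentioned further down. They are the commenters' numbers, not measurements.

```python
# Figures quoted in this thread; kB and MB are decimal here (1 MB = 1000 kB).
page_kb = 57      # raw HTML kept per tab
tabs = 100
total_mb = page_kb * tabs / 1000
print(f"{tabs} tabs x {page_kb} kB = {total_mb:.1f} MB")  # 100 tabs x 57 kB = 5.7 MB

# The low-end phone described below: 1 GB, half of it reserved for the OS.
usable_mb = 1000 / 2
print(f"share of usable RAM: {total_mb / usable_mb:.1%}")  # share of usable RAM: 1.1%
```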

dagw · on Dec 19, 2016

My phone has a quite insane 3 GB of RAM.

mbertschler · on Dec 19, 2016

My phone also has the insane 3GB of RAM, but last week Chrome killed Spotify and vice versa (music stopped playing) as soon as I switched apps. Sadly, 3GB is not enough for Android and today's apps.

user5994461 · on Dec 19, 2016

The most common $150 phone from last year only has 1GB, half of which is reserved for the OS.

r721 · on Dec 19, 2016

Similar "tab discarding" feature already exists in Chrome, by the way:

https://developers.google.com/web/updates/2015/09/tab-discar...

tomtomtom777 · on Dec 19, 2016

> So, the browser should load the HTML page into a part of memory that can be discarded by the OS if the OS needs more memory. Actually I think it is strange that no such memory API exists in Unix.

It is trickier than you would think to determine what "needed" memory means. Does the OS need the disk cache? Or the contents of memory-mapped files?

It's not unlikely that you currently have some process active which has memory mapped a huge file. Does it need the content? Who knows.

evilDagmar · on Dec 19, 2016

You are talking about an OS-provided cache for the browser to store its own cache in? There's little sense in that.

Kiwikwi · on Dec 19, 2016

Moving memory management into the OS often makes sense, because the OS has the big cross-application picture, knows the system-wide memory pressure, and most importantly, already manages the memory of applications, by swapping between RAM and disk.

For the same reason, OS X and Android Linux both have systems for OS managed caches, and AFAIK Firefox already uses these: https://bugzilla.mozilla.org/show_bug.cgi?id=748598

The status on mainline Linux is a bit more nebulous (seems Android's ashmem has been upstreamed, but it's not directly usable on GNU/Linux systems?), and other efforts have stranded: https://lwn.net/Articles/602650/

For some more thoughts about memory management on OS level vs. application level, I can recommend this "random outburst" from the designer of the Varnish HTTP cache: https://www.varnish-cache.org/docs/trunk/phk/notes.html

godzilla82 · on Dec 19, 2016

But the amount of memory required for plain text source is very small compared to the memory allocated to render the page.

stavros · on Dec 19, 2016

What everyone is missing here is that one in ten thousand users needs this, so why optimize for the vast minority?

acqq · on Dec 19, 2016

> requires keeping around the raw page HTML

Isn't it already in the browser's cache in that state anyway?

And if the cache is small, the preservation doesn't even have to cover all tabs. If the last few pages can be retrieved from the cache, nobody would complain that older ones behave as they do now. You typically don't wonder "what was the source of the page from yesterday?" in an old tab, and even if you do, you wouldn't be surprised that yesterday's source has to be requested again. So I imagine the fix as simply: "if in the cache, get it from there, else request."
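Here is a rough sketch of "if in the cache, get it from there, else request", limited to the last few pages. It rests on assumptions: `RecentSourceCache` and the injected `fetch` callable are hypothetical, and a real browser cache is keyed on much more than the URL.

```python
from collections import OrderedDict
from typing import Callable


class RecentSourceCache:
    """Hypothetical: keep the raw HTML of only the last few pages loaded."""

    def __init__(self, capacity: int, fetch: Callable[[str], bytes]) -> None:
        self.capacity = capacity
        self.fetch = fetch  # fallback: re-request the page from the server
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def store(self, url: str, body: bytes) -> None:
        """Record the body exactly as received when the page loads."""
        self._entries[url] = body
        self._entries.move_to_end(url)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used page

    def view_source(self, url: str) -> bytes:
        """If in the cache, get it from there, else request."""
        if url in self._entries:
            self._entries.move_to_end(url)
            return self._entries[url]
        return self.fetch(url)


fetched = []


def fake_fetch(url: str) -> bytes:
    fetched.append(url)
    return b"<p>re-fetched</p>"


cache = RecentSourceCache(capacity=2, fetch=fake_fetch)
cache.store("https://a.example/", b"<p>a</p>")
cache.store("https://b.example/", b"<p>b</p>")
cache.store("https://c.example/", b"<p>c</p>")  # evicts a.example
print(cache.view_source("https://c.example/"))  # b'<p>c</p>' (no request)
print(cache.view_source("https://a.example/"))  # b'<p>re-fetched</p>'
```

Memory stays bounded by `capacity`, and older tabs simply behave as they do today.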

SerpentJoe · on Dec 19, 2016

> Isn't it already in the browser's cache in that state anyway?

Like disk cache as in Cache-Control? In most cases you wouldn't cache the HTML itself, but in cases where you do then your use case should already work as stated, since for the browser to do otherwise would imply the cache is being intentionally ignored for the view-source request.
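Whether a stored copy can exist at all depends on the response's Cache-Control header. This is a minimal sketch with hypothetical helper names. It only checks `no-store`, which RFC 7234 uses to forbid caches from storing the response. Its naive comma split would also mis-parse quoted arguments that contain commas.

```python
def cache_directives(header: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}.

    Simplification: splitting on commas breaks quoted arguments containing commas.
    """
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def response_may_be_stored(header: str) -> bool:
    """False when the server tells caches not to store the response at all."""
    return "no-store" not in cache_directives(header)


print(cache_directives("private, max-age=0, no-cache"))
print(response_may_be_stored("no-store, must-revalidate"))  # False
```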

acqq · on Dec 19, 2016

> In most cases you wouldn't cache the HTML itself

Why not, at least as long as that's the topmost tab? Wouldn't then the view-source-new-request problem be solved by just using the existing features?

bpicolo · on Dec 19, 2016

With devtools closed, it doesn't seem worthwhile then. The vast majority of users aren't using devtools.

riobard · on Dec 19, 2016

Why keep the original source in memory? Why not save it to disk?

criddell · on Dec 19, 2016

Maybe because it would prematurely age SSDs?

arpa · on Dec 20, 2016

Like anybody gives a damn about that. Spotify for sure doesn't.

mullsork · on Dec 19, 2016

Why would you need the original HTML for dev tools to work? Maybe there are ones I've never seen but the ones I use are using the DOM rather than the original HTML string.

CoffeeOnWrite · on Dec 19, 2016

You don't typically.

One case where you would View Source in addition to using Dev Tools is debugging how the browser is massaging your source HTML into the DOM, for example by inserting missing <tbody> elements. Validating your HTML mostly addresses this (I'm actually not sure about the <tbody> example, I would hope the validator at least issues a warning), but isn't something you necessarily want or can do with, for example, user-uploaded HTML snippets.

mullsork · on Dec 21, 2016

True, though I guess using some HTML validation could improve that. Not sure if <tbody> is mandatory but I think I've seen some tool complain about its absence.

akerro · on Dec 19, 2016

I showed a few people who have no idea about programming whatsoever how to hack websites to download .mp4 or .mp3 files from the website's source code. How would you show them that using the DOM?

g00gler · on Dec 19, 2016

How would you show them that using the HTML source?

You can use dev tools to view the network tab to find whatever URL Pandora is using.

You can use dev tools to find a media element's src attribute.

To see the JavaScript sources... You check the dev tools.

If anything, in the days of JavaScript, the HTML source will be missing a few things.

recursive · on Dec 19, 2016

If there's an audio tag, that element exists in the DOM and has a reference to the URL where you can download it. Is this a trick question?

akerro · on Dec 20, 2016

Yes, my mother doesn't know what DOM is or why there are so many buttons in devtools. ctrl+u -> ctrl+f -> mp3 is much easier to explain and repeat.
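The Ctrl+U, Ctrl+F routine can be sketched with Python's standard `html.parser`: collect the `src` of `<audio>`, `<video>` and `<source>` elements from the raw source. `MediaSrcFinder` and the sample page are hypothetical. As noted above, media URLs added later by JavaScript won't appear in the raw source at all.

```python
from html.parser import HTMLParser


class MediaSrcFinder(HTMLParser):
    """Hypothetical helper: collect src URLs of <audio>, <video> and <source>."""

    MEDIA_TAGS = {"audio", "video", "source"}

    def __init__(self) -> None:
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag in self.MEDIA_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


page = """<audio controls><source src="/media/song.mp3" type="audio/mpeg"></audio>
<video src="https://cdn.example/clip.mp4"></video>"""

finder = MediaSrcFinder()
finder.feed(page)
finder.close()
print(finder.urls)  # ['/media/song.mp3', 'https://cdn.example/clip.mp4']
```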

mullsork · on Dec 21, 2016

Right but then you're talking about viewing the HTML source right? In other words "View Source." I'm asking why dev tools would need access to the original source and not the DOM. It doesn't make sense to save the original source in memory to me once it's been parsed.

corobo · on Dec 19, 2016

ctrl+shift+i, ctrl+f "mp4"

koolba · on Dec 19, 2016

Wow, first raised in 2005. Nice!

burnbabyburn · on Dec 19, 2016

Couldn't you reconstruct the source from the internal dom representation anyway?

monk_e_boy · on Dec 19, 2016

No. The parser corrects and changes the source to make the DOM.

So it is usually impossible to find errors that way. Take something like:

    <p><div> </p></div>

The parser will correct this, and therefore the DOM will be well-formed. But what it does to fix this may break your site. I often found the CSS would be screwed up for many reasons (rules not matching the DOM structure, due to bugs all over our codebase).
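Errors like this are only visible in the raw source, so a tool has to look at the source before the parser repairs it. The sketch below flags end tags that don't match the innermost open element. It's a hypothetical checker built on `html.parser`, not the HTML5 tree-construction algorithm browsers use. It also ignores optional end tags such as an unclosed `<p>`.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Hypothetical checker: report end tags that don't match the open element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []     # names of currently open elements, innermost last
        self.problems = []  # "line:col message" strings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # treat self-closing syntax such as <br/> as opening nothing

    def handle_endtag(self, tag):
        line, col = self.getpos()
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        if tag in self.stack:
            self.problems.append(f"{line}:{col} </{tag}> while <{self.stack[-1]}> is open")
            # Recover by closing everything up to the most recent matching start tag.
            last = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[last:]
        else:
            self.problems.append(f"{line}:{col} stray </{tag}>")


checker = NestingChecker()
checker.feed("<p><div> </p></div>")
checker.close()
for problem in checker.problems:
    print(problem)
```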

OJFord · on Dec 19, 2016

Also, changes made by client-side JS.

burnbabyburn · on Dec 19, 2016

Oh, right.

oneeyedpigeon · on Dec 19, 2016

I wish View Source were more flexible. I would like:

* Raw: a view of the actual body as sent in the response. Although I'm not aware of the current situation, at least some browsers used to subtly alter what was sent, even for View Source (possibly related to validity corrections). I want a guarantee that what I'm viewing is what the server sent.

* View source as it is today (with a good common understanding of what that means), but a bit more powerful. Give me a cursor so I can copy from the keyboard, for crying out loud! Maybe even let me edit the source so I can work with static pages more easily.

* Something in between View Source and the DOM inspector. E.g. the original source, guaranteed to be untouched by JavaScript, but cleaned up for easier reading (given that the source returned by many websites nowadays is practically unreadable; take a look at this page, for example). Reformatting (where possible), maybe automatic expansion of any base href, consistent ordering of attributes, highlighting of errors, etc.
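The "consistent ordering of attributes" part could be sketched as a pass that re-emits the markup with each tag's attributes sorted. `AttributeSorter` is hypothetical. `html.parser` unescapes attribute values, so the sketch escapes them again on output. It does not attempt indentation or base-href expansion.

```python
from html import escape
from html.parser import HTMLParser


class AttributeSorter(HTMLParser):
    """Hypothetical: re-emit markup with each tag's attributes in alphabetical order.

    Text, entity references, comments and doctypes are passed through as written.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)  # keep &amp; etc. in text as written
        self.out = []

    def _start(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in sorted(attrs, key=lambda pair: pair[0]):
            # Attribute values arrive unescaped, so escape them again on output.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + (" />" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._start(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._start(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


s = AttributeSorter()
s.feed('<a title="x &amp; y" href="/item?id=1" class="storylink">HN</a>')
s.close()
print("".join(s.out))
# <a class="storylink" href="/item?id=1" title="x &amp; y">HN</a>
```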

LostCharacter · on Dec 19, 2016

To prevent a new browser request and to get a formatted version of the raw source in Chrome, you can open up the dev tools, go to sources, find the name of the page, and then click the curly brackets in the bottom left corner of the tools. It's still missing a good cursor implementation, though.

db48x · on Dec 19, 2016

Hit F7 for caret browsing.

arethuza · on Dec 19, 2016

Isn't a debugging proxy (like Fiddler) a good choice for the first of those?

nommm-nommm · on Dec 19, 2016

Dev tools in most browsers do the first, as does Fiddler, as mentioned in a sibling comment.

eptcyka · on Dec 19, 2016

You can always use curl. Chrome allows you to copy a full curl command (with all the arguments and cookies) to issue an identical HTTP request.

dagw · on Dec 19, 2016

Identical HTTP requests don't necessarily return identical data.

3pt14159 · on Dec 19, 2016

Provided the content hasn't changed. Sometimes you have a bug that goes away when you refresh the page.
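If the originally received body were kept, `difflib` could show how it differs from a refetched copy. This is a minimal sketch: `source_drift` and the sample pages are hypothetical. It presupposes exactly the stored original this thread is arguing about.

```python
import difflib


def source_drift(original: str, refetched: str) -> list:
    """Unified diff between the HTML that was rendered and a freshly fetched copy."""
    return list(difflib.unified_diff(
        original.splitlines(), refetched.splitlines(),
        fromfile="as rendered", tofile="re-fetched", lineterm=""))


rendered = "<p>Server time: 10:00:01</p>\n<p>Hello</p>"
fresh = "<p>Server time: 10:00:07</p>\n<p>Hello</p>"
for line in source_drift(rendered, fresh):
    print(line)
```

An empty result means the refetch happened to match. Any other output is exactly the case where View Source misleads.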

czep · on Dec 19, 2016

Once the DOM is ready, the browser has no need to keep the original source in memory or cache. For how many page views does the user want to view source? One in a thousand? You're asking the browser to waste space storing something that it will only rarely be asked to display.

phaser · on Dec 19, 2016

It could be garbage collected when the user closes the tab.

oneeyedpigeon · on Dec 19, 2016

They (well, Chrome at least) even do that for a POST, which is highly questionable.

jmkni · on Dec 19, 2016

I usually have Fiddler running constantly in the background.

If a page is slow to load and I want to see the HTML, I would find the actual HTTP request in fiddler and grab the HTML that way.

Alternatively, the element inspector doesn't re-issue the request, but it shows the compiled HTML, i.e. after DOM augmentation etc.

TheRealWatson · on Dec 19, 2016

Is there anything comparable to Fiddler for macOS? I heard you can run it in VM but that's just too much brute force for me.

callahad · on Dec 19, 2016

https://mitmproxy.org/ or https://charlesproxy.com/

fastball · on Dec 19, 2016

Fiddler is available for macOS.

TheRealWatson · on Dec 19, 2016

Interesting. I had no idea they had a beta version for macOS. I've gotta try it.

unknownUsers · on Dec 19, 2016

From what I can tell, the Telerik version is on Mac too.

spydum · on Dec 19, 2016

Burp Suite has an intercepting proxy and runs on Java.

nikbackm · on Dec 19, 2016

Maybe the HTML source is discarded after the browser has built the DOM object tree?

mwill · on Dec 19, 2016

I think it may just be because in both cases it opens in a new tab with the view-source modifier, so it treats it like you just opened a copy of the page in a new tab?

I believe if you open the inspector instead it does not issue a new request.

koolba · on Dec 19, 2016

> I think it may just be because in both cases it opens in a new tab with the view-source modifier, it treats it like you just opened a copy of the page in a new tab?

Interesting. I never paid attention to the resource prefix. Are those standardized at all? Both Firefox and Chrome use the same "view-source:" prefix.

> I believe if you open the inspector instead it does not issue a new request.

The inspector shows the current DOM, not the originally loaded HTML, which could be different. It's probably "good enough", as in most cases the original HTML would be the simpler one (say, a single div for a single-page app) and the live DOM would show what it looks like right now.

rohansingh · on Dec 19, 2016

It's not the prefix/scheme that's important, it's the fact that it's a new tab.

junke · on Dec 19, 2016

I seem to recall that it wasn't always the case.

INTPenis · on Dec 19, 2016

True, and I was shocked that the referenced Mozilla bug[1] is from 2006. Time really flies, because it feels like yesterday I could view source without reloading it.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=307089

samet · on Dec 19, 2016

Sweet memories:

http://imgur.com/a/p02cQ

samet · on Dec 19, 2016

Second that. That wasn't always the default behaviour.

schoen · on Dec 19, 2016

Perhaps pre-DOM, per other comments here?

karmakaze · on Dec 21, 2016

I once wondered this, but didn't think much of it. Now seeing the question, it seems quite clear. The DOM+CSS+javascript is pretty much like a computer running a (self-modifying) program. If it didn't do a fresh fetch, then what you'd be seeing is the contents of 'memory' after the program has been running some time. This is useful in itself, so we have Inspect. If you want to see the initial state, the program before execution begins, then you want to View its source(s).

jutaz · on Dec 19, 2016

I'm not exactly sure why browsers do that, but I suspect that's due to caching.

However, I would suggest that you use "Inspect element" and open up dev tools - this way you will see DOM exactly as it is rendered.

aplummer · on Dec 19, 2016

Often the DOM has been changed, and you want to know what the server actually sent.

jbm · on Dec 19, 2016

You can still do that with the inspect menu; if you look at the network history in Chrome, you can see the headers sent, the headers received and the body sent / received.

koolba · on Dec 19, 2016

> You can still do that with the inspect menu; if you look at the network history in Chrome, you can see the headers sent, the headers received and the body sent / received.

That only works if you have it open before the page loaded. It doesn't save information about network requests that occurred before the inspector was opened.

noir_lord · on Dec 19, 2016

It would be handy if it did though since I continually forget to open inspector before the request.

I'd love a "I'm a developer, store all the things!" setting

radarsat1 · on Dec 19, 2016

> I suspect that's due to caching.

Shouldn't caching have the opposite effect? I.e. it's treated as a new resource, but no new HTTP request is needed because it's in the cache.

anonymfus · on Dec 19, 2016

In Opera up to 12.x it does not.

bbarn · on Dec 19, 2016

I suppose in today's web ecosystem, the initial HTML state of the page is useless in almost all cases after the DOM loads. Rarely will you care about the pre-javascript page, and if you do, you're probably debugging a web site, and the extra load shouldn't matter.

Where it could get interesting though, is if the content changes before you view the source. In that case, you're out of luck I guess.

seanwilson · on Dec 19, 2016

I wouldn't think view-source is a highly used feature that benefits from optimising much and your scenario sounds rare. If a "view-source:..." URL is entered into the browser directly you would need to write code to grab the page from scratch anyway so doing this always makes the logic simpler.

msie · on Dec 19, 2016

Yeah, that always annoyed me. So far, I don't see any answer on this page. Really curious, though.

f0code · on Dec 19, 2016

The HTML returned from the already-performed HTTP request has been consumed in rendering the view in the browser. If you want to view the raw HTML instead, 'View Source' requests another copy of it, but does not render it.

WallWextra · on Dec 19, 2016

Same with save page, etc.

Bugseverywhere · on Dec 19, 2016

Bug? Wow