Googlebot's recent improvements might revolutionize web development

jacquesm · on May 27, 2014

One potential problem here is that google will use this to widen the gap between it and the 'one page apps' web and other search engines (such as duckduckgo) that can't match it in resources.

How strong of an advantage that will be in the long run is uncertain, I would rather see a web that ships pages with actual content in them than empty containers for a variety of reasons (most of which have to do with accessibility and the fact that not all clients are browsers or even capable of running javascript).

This 'new web' is going off in a direction that is harmful, coupled with the mobile app walled gardens it is turning back the clock in a hurry.

I'm fairly sure this is not the web that Tim Berners-Lee envisioned.

epistasis · on May 27, 2014

I agree 100%, and see other problems for users as well. The worst web pages I encounter are the javascript-constructed DOM pages. And by far the worst offender is the odious Google Web Toolkit.

It may be that I encounter some of these "modern" pages without knowing it because some dev has put in the time to make it work well, it seems that the vast majority are absolutely terrible. It's a small decrease in developer effort for a huge decrease in user satisfaction. I remember the terrible terrible #! days of Twitter with sadness.

Some developers have a tendency to go for the internally sophisticated/beautiful in preference to the best experience for the user. I hope that blog posts like this one don't let loose these developers' worst tendencies.

qyv · on May 27, 2014

I don't see this as a problem at all, here is why:

Google doesn't really have competitors in search, they just don't. I mean look at how we even define 'searching the internet' in our spoken language: 'google it'. Google has become the identity of search on the web in most people's minds. And to top it off, they are really, really good at it. Google getting better is going to widen the gap between them and everyone else, but the gap is already pretty damn wide. Was there really any chance of someone catching them in any foreseeable future?

But, the wider that gap gets the more motivation there becomes to not attack the gap, but to go in a different direction altogether. Nobody talks about duckduckgo because they're search results are better than google's, they talk about them because duckduckgo is all about your privacy. They found a different way to make a search that people want to use. The wider that gap gets the more motivated some will be to try something truly novel to compete with google.

tragic · on May 28, 2014

> Nobody talks about duckduckgo because they're search results are better than google's, they talk about them because duckduckgo is all about your privacy.

'Nobody' is such a strong word.

http://devblog.avdi.org/2014/02/16/why-duckduckgo-is-better-...

MichaelGG · on May 27, 2014

>I'm fairly sure this is not the web that Tim Berners-Lee envisioned.

So? None of these are convincing arguments for application developers. Having to rewrite an application to perform all its UI logic on the server side in addition to client side is a lot of work, for almost no benefit to the people paying to make the application.

As a user, so long URLs work so I can send locations to other people, then most of my accessibility scenarios are solved.

The rest is simply a lack of technology in other clients.

sergiosgc · on May 28, 2014

The problem is not about some God like commandment. It is about the original design of the Web, which we all believe was the reason for its success.

When you receive a GET request for an URL, and the browser tells you it accepts text/html, it is expected that you answer with the content stored at that URL in the format requested. It is not expected that you answer with an application that when run will eventually produce the content.

The correct way to do what this post is saying is to create a new mime type for this content delivery method. Then, if the browser actively tells you it accepts that content type, deliver it.

What the OP proposes is not text/html. It's something else.

marknutter · on May 27, 2014

I'm fairly sure the web has not been the web that Tim Berners-Lee envisioned for a long time now.

wutbrodo · on May 27, 2014

.... and that's a very good thing per se. The idea that the development of something as important and universal to the web should be limited to what one man (no matter how visionary) was able to envision a couple of decades ago is beyond bizarre.

iLoch · on May 27, 2014

It's not difficult to set up middleware that'll render the page for any clients that require it. (For instance, we can assume any client that identifies as "bot" that's not Google probably wants a pre-rendered page, which we can do quite effortlessly. Here's one implementation for Nodejs: https://prerender.io, or you can always roll your own with something like Phantom.js.

Touche · on May 27, 2014

Note that sending a different response to googlebot than what you send to normal users is a violation of Google's guidelines and can get your site penalized. Use at your own peril.

CMCDragonkai · on May 28, 2014

Not necessarily. This concept of HTML snapshots is actually suggested by Google as a solution.

See this: https://developers.google.com/webmasters/ajax-crawling/docs/...

We comply with Google's guidelines thing (https://snapsearch.io/documentation)

Touche · on May 28, 2014

No where in that article does it say it's ok to only serve the snapshot to googlebot. Serving different content to googlebot than what you serve to users is called cloaking and is against their guidelines: https://support.google.com/webmasters/answer/66355

I've invested a significant amount of time in this topic and would love if you were right, but I've never seen the money quote that it's ok to do this. In fact, everything I've read says that you have to treat search bots the same as you treat normal users.

CMCDragonkai · on May 28, 2014

The title of the article says "How do I create an HTML snapshot?"

The FAQ (https://developers.google.com/webmasters/ajax-crawling/docs/...) goes into more detail regarding the concept of _escaped_fragment_ which is used in so that your server can respond with a static snapshot instead of javascript.

Look at the diagram at the bottom of this page (https://developers.google.com/webmasters/ajax-crawling/docs/...)

It explicitly says "snapshots"

BorisMelnik · on May 27, 2014

wow this is amazing. would love to see an offshoot of this where it could render a sitemap, or even keep a live sitemap up to date via cron.d or something (just hoping out loud)

gildas · on May 29, 2014

You can use SEO4Ajax [1] to crawl your SPA and generate an up to date sitemap dynamically.

[1] http://www.seo4ajax.com/

MichaelGG · on May 27, 2014

We're using Brombone. You give them your sitemap, they do the prerendering and save it all. Then you proxy to them for Googlebot and others.

CMCDragonkai · on May 28, 2014

Dynamic constructions of sitemaps is surprisingly difficult. We would need to poll every single page you have, just to check if you potentially have a new link to a new page in your site. And everytime you add a new page, that's a whole another page to scrape and analyse.

Flenser · on May 28, 2014

At the head of the index it probably won't make much difference, big sites will render the initial page on the server, it's the long tail where it will be client only. It may make scaling a website easier though as frameworks will start making adding or moving rendering from client to server trivial. So the path from the long tail to the head will be easier to navigate. (if you'll excuse me mixing my metaphors)

frik · on May 28, 2014

duckduckgo is mainly a meta search engine (relies on search API of Yahoo that relies itself on Bing, Yandex, etc.). Plus it shows some related snippets from Wikipedia and other data-sources.

Several well known web search engines are now defunct or switched to meta-search business (like Yahoo with Bing data).

There are only a few international/world-wide search engines with a crawler:

Google, Bing, Yandex, Baidu, Gigablast, (Archive.org/Wayback Machine)

SixSigma · on May 27, 2014

Berners-Lee envisioned a decentralised peer-to-peer information sharing network where everyone was a server and a client.

theepauk · on May 27, 2014

One page apps that aren't crawlable don't want to be crawled, don't make the necessary work, or are simply incompetent. Making an ajax site crawlable isn't exactly rocket science.

The gap this will really widen is the one between sites that do the necessary work themselves, and those who don't.

workhere-io · on May 27, 2014

One potential problem here is that google will use this to widen the gap between it and the 'one page apps' web and other search engines (such as duckduckgo) that can't match it in resources.

There are free and open source tools available that would help search engines parse pages containing JS (PhantomJS comes to mind).

jefftk · on May 28, 2014

It's not just tools, it's the cost of all that parsing and executing in a mock browser environment.

andrenotgiant · on May 27, 2014

Taking a step back: The "Page" paradigm is still very much alive, despite these recent javascript parsing advances.

1. Google still needs a URL-addressable "PAGE" to which it can send Users.

2. This "PAGE" needs to be find-able via LINKS (javascript or HTML) and it needs to exist within a sensible hierarchy of a SITE.

3. This "PAGE" needs to have unique and significant content visible immediately to the user, and on a single topic, and it needs to be sufficiently different from other pages on the site so as not to be discarded as duplicate content.

mixonic · on May 27, 2014

I'd debate the phrase "step back". If you replace all your references to PAGE with URL, you get closer to a real meaning.

URLs for single-page applications are a serialization of application state. The fact that we now have an application platform (JavaScript/HTTP) providing sharable, mostly-human-readable state sharing (URLs) and is also indexed and searchable is nothing short of incredible.

Yes, the basic abstractions we use are the same. We will have URLs that address content in our applications. But now these are applications running on Google's own servers. Google is running my application (and hundreds of thousands more), and trying to understand what they mean to humans. This is a pretty amazing step forward.

Imagine Apple announcing it would run all iOS applications, interacting like a user to build a search index. IMO, this parallel shows what makes Google's commitment to running JavaScript apps exciting.

andrenotgiant · on May 27, 2014

The point I was trying to make is this:

With every new capability from Googlebot comes new opportunities for us to screw it up as developers.

If we were to replace PAGE with URL, and URL is simply a serialization of application STATE, we could easily end up with infinite URLs that lead to STATES that are not really that different, unique or appealing as answers to queries users type into Google.

When deciding how to build Search-accessible Web Apps, and specifically what to expose to Google, we need to keep in mind that Google likes PAGES that follow the requirements I detailed above.

Flenser · on May 27, 2014

> these are applications running on Google's own servers. Google is running my application (and hundreds of thousands more)

Which is also very beneficial for Google as they'll likely be the only company doing that for a while, and the one able to do it for the most sites for a long time to come, maintaining Google's search index lead.

workhere-io · on May 27, 2014

As I mentioned in the post, all these problems can be solved by using real paths/URLs and changing them dynamically using pushState.

prophead · on May 27, 2014

But the onus is still on the developer to choose _what_ gets a unique URL and what does not.

It might be good for user deeplinking capabilities to change the URL every time any type of state change is made (for example sorting a list by date instead of name) - But exposing that many URLs to Google would be bad.

(This is the modern equivalent of the age-old "infinite calendar" problem that Googlebot had to deal with when dynamic calendar apps let you navigate to dates 2 millennia in the future.)

workhere-io · on May 27, 2014

I agree with you; developers definitely have to think about the URLs they're exposing to Googlebot. But this is essentially no different from how things were before. Your example with sorting a list by date instead of name would be done with a query string (which Google does index to a point), e.g. /users?sortBy=date=&from=392. This can obviously create quite a lot of links to the same content, and developers should know how to handle this situation. Again, not different from before - single page apps don't change anything in this regard.

blauwbilgorgel · on May 28, 2014

Create web applications, rank as a web application.

Create web pages, rank as a web page.

This is a band-aid by Google. Developers created inaccessible websites (JS-only, no HTML fallback) and Google still wanted to give those sites a chance to be in the index. Like when Google made it possible to index text inside .swf movies. This did not mean that flash sites suddenly ranked alongside accessible websites. No, it only meant that you could now find content with a very targeted search query.

Don't think you are gaining any SEO-benefit from one-page JS-only applications, just because Google made it possible for you to start ranking.

And don't forget your responsibility as a web developer to create accessible content. Forgetting progressive enhancement, fallbacks, a noscript explanation for why you need JS, ARIA is devolution. If Google can index your site, but a blind user has a problem with your bouncy Ajax widget, then you failed catering to all your users. If you lazily let Google repair your mistakes, then soon you will be a Google-only website.

mixonic · on May 28, 2014

100% FUD.

There is no evidence that Google is going to punish my website for being rendered with JavaScript, as you imply with your first two comments.

Google is indexing the HTML generated by JavaScript, and the links in that HTML. Not some non-web custom format like SWF.

JavaScript driven sites work just fine with modern screen-readers. https://developer.mozilla.org/en-US/docs/Web/Accessibility/A... and in 2014 97.6% of screen-readers ran JavaScript http://webaim.org/projects/screenreadersurvey5/#javascript

In 2013, 92 out of 93 visitors to a UK government webpage supported JavaScript: https://gds.blog.gov.uk/2013/10/21/how-many-people-are-missi... And mixed into that 1.1% were users getting broken JS, behind firewalls, disabling JS, etc.

Google making this change does not force you to build a JavaScript-driven website, but it does make it more attractive .

blauwbilgorgel · on May 28, 2014

If I wanted to imply that Google will punish your website for being rendered with JavaScript, I probably would have said so. It would likely be false too, as it is less of a punishment, than it is not maximizing your chance to rank (to put your best foot forward as a website).

Accessibility is not a numbers game. In many countries it is a legal requirement. And adhering to the WCAG means providing non-JS fallbacks or progressive enhancement. RMS not being able to access your content is an accessibility issue too, it does not have to involve a disability. It can be technical in nature, like disabling JS or being behind a corporate firewall, or your browser not supporting pushstate.

If you want to look at stats, take a look at the stats and surveys on accessibility of dynamic web applications. Just because your screenreader supports JavaScript does not mean you have no accessibility issues due to JavaScript. Rich internet applications should use WAI-ARIA. I don't think people who create websites without a fallback (avoiding this issue entirely), will worry about creating websites with ARIA-support. And if they do care about such accessibility, they should also provide a non-ARIA non-JS fallback.

Google making this change makes it possible to have your non-fallback JS-only application be indexed. It does not make it more attractive from an SEO or accessibility viewpoint.

mixonic · on May 28, 2014

Web accessibility, as we commonly use the term, pertains to creating a website that disabled users can interact with an navigate. It does not pertain to those who choose to or are forced to disable JavaScript (the RMS example). Creating an accessible site is a challenge regardless of what technologies you pick, for sure. Just as saying "just because your screenreader supports JavaScript does not mean you have no accessibility issues", just because your website uses JavaScript doesn't mean you have accessibility issues. A plain HTML website can have accessibility issues. So can a JavaScript one.

AFAIK, nothing in WCAG says you must have a non-JavaScript fallback to adhere to their standard. If you can back that up I am all ears, I would be interested to read it.

> Google making this change makes it possible to have your non-fallback JS-only application be indexed. It does not make it more attractive from an SEO or accessibility viewpoint.

The attractiveness of JS heavy development is not in an inherent SEO or accessibility benefit. Absolutely true.

The benefit is a development style that is more productive, giving me more time as a developer to focus on solving the problem at hand, be it business logic, SEO, or accessibility. You can debate this benefit, but don't imply that single-page apps cannot have SEO on par with HTML sites and good accessibility.

blauwbilgorgel · on May 29, 2014

>Web accessibility, as we commonly use the term, pertains to creating a website that disabled users can interact with an navigate. It does not pertain to those who choose to or are forced to disable JavaScript...

Often web accessibility focuses on people with a disability, correct. Accessibility, like I said, really is more than that, though. From the Wiki: Accessibility is the degree to which a product, device, service, or environment is available to as many people as possible. Hence it does pertain to those who choose to or are forced to disable JavaScript. It literally means _as many people as possible_, RMS included. Even the WCAG do not solely focus on assistive technologies, but include "a wide variety of user agents".

The comment "just because your screenreader supports JavaScript does not mean you have no accessibility issues" was in reply to your statistics on JS-support for screenreaders. 98% of screenreaders supporting JavaScript is moot when less than 75% of browsers support pushState. In other words: You leave much more than 2% of users incapable of accessing your content. WebAim Surveys show that people have increasingly more trouble accessing content on JS-heavy social sites and dynamic web applications.

> A plain HTML website can have accessibility issues. So can a JavaScript one.

A JavaScript site can have a problem. If you serve it without a fallback (under the assumption that 98% of your users can access it that way) then it has a problem for sure. I have nothing against JavaScript. I have a problem with JavaScript sites that don't provide a fallback or weren't build according to progressive enhancement principles.

>AFAIK, nothing in WCAG says you must have a non-JavaScript fallback to adhere to their standard.

It said so specifically in WCAG 1. WCAG 2 is more ambiguous. You can have a no-fallback application that requires JavaScript provided: You can not show the content in any other way (a fallback is impossible), and you clearly explain in <noscript> why JavaScript is required.

Where a fallback IS possible, not providing one lowers accessibility. This is the relevant principle:

Principle 4: Robust - Content must be robust enough that it can be interpreted reliably by a wide variety of user agents, including assistive technologies.

If you do not provide a fall-back and require JavaScript then your content can not be interpreted reliably by a wide variety of user agents. Not providing a fall-back goes against this principle.

Relevant guideline:

Guideline 4.1 Compatible: Maximize compatibility with current and future user agents, including assistive technologies.

JS-only non-fallback sites do not maximize compatibility, they minimize it, breaking this guideline.

Government 508 guidelines for accessibility:

When possible, the functionality should be available without requiring JavaScript. When this is not possible, the functionality should fail gracefully (i.e., inform the user that JavaScript is required).

Webaccessibility.com best practices:

Ensure elements that use ARIA provide non-ARIA fallback accessible content.

Since you should markup your rich web apps with ARIA, and you should provide a non-ARIA accessible fallback, you should provide an accessible fallback for your rich web app.

I do know that this can be a point of debate, and that is fine. It is up for interpretation what "maximize compatibility" means to you. If you have legal obligations to maximize compatibility (like government organizations in The Netherlands) then this becomes a harder rule.

> The benefit is a development style that is more productive, giving me more time as a developer to focus on solving the problem at hand, be it business logic, SEO, or accessibility.

I really don't understand this way of thinking. If you want to spend time on accessibility, start with a fallback, don't create a website without a fallback and then cheer on the idea that now you have time left to fix the problem you created a few minutes before that...

If you want to solve problems with SEO, don't start out by creating one :D

> but don't imply that single-page apps cannot have SEO on par with HTML sites and good accessibility.

Good accessibility means good SEO. No fallback means poor accessibility. Draw your own conclusions (Socrates is mortal?).

workhere-io · on May 28, 2014

Don't think you are gaining any SEO-benefit from one-page JS-only applications, just because Google made it possible for you to start ranking.

No one is expecting to get any SEO benefits that "normal" pages don't have. We are expecting to get the same chance of ranking as normal pages.

You mentioned that single page apps might rank differently or worse than normal pages. Do you have any source for that? (A source that is current, since Googlebot's improvements are quite new).

blauwbilgorgel · on May 28, 2014

>We are expecting to get the same chance of ranking as normal pages.

Then you should probably adjust this expectation. You say in your article:

>While having this sort of HTML fallback was technically possible, it added a lot of extra work to public-facing single page apps, to the point where many developers dropped the idea...

A JS-driven site with an HTML fallback is a normal page. Then you don't need any tricks or force Google to run your application and hopefully make pages out of them. Start with the fall-back and enhance.

This is a serious mistake with consequences. Tor bundle and Firefox shipped with JavaScript support, because disabling JS broke too much of the current web. It causes accessibility issues (remember when Twitter changed to hash-bang URL's?), if not for Googlebot, then for regular users (From the Webmaster Guidelines):

>Following these guidelines will help Google find, index, and rank your site.

>Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.

>Make pages primarily for users, not for search engines.

I am still going on the assumption that you created a one-page application without a time-consuming fallback, and you rely on Google to make rankable pages from them. Then you leave some users standing in the cold, so why deserve to rank equal to a user-friendly accessible web page?

> ... single page apps might rank differently or worse than normal pages ...

From the original article, the most current source on this:

> Sometimes things don't go perfectly during rendering, which may negatively impact search results for your site.

> It's always a good idea to have your site degrade gracefully. This will help users enjoy your content even if their browser doesn't have compatible JavaScript implementations. It will also help visitors with JavaScript disabled or off, as well as search engines that can't execute JavaScript yet.

> Sometimes the JavaScript may be too complex or arcane for us to execute, in which case we can’t render the page fully and accurately.

> Some JavaScript removes content from the page rather than adding, which prevents us from indexing the content.

In the SEO community Googlebot's improvements were noted for a while now. See for example: http://ipullrank.com/googlebot-is-chrome/

Single page websites or application-as-content-website's are not popular among SEO's. One reason for this is that it doesn't allow for fine-grained control on keyword targeting, and keeping the site canonical, and it can waste domain authority when you have less targeted pages in the index than you can rank for. Experiment and find out for yourself.

rcsorensen · on May 27, 2014

Sending the framework of a page to your users and expecting them to do all the heavy lifting and slow loading of constructing the page and fetching the data is still rather unfriendly if you can afford a server to construct it.

If you love your users, give them HTML and let the Javascript enhance it.

Projects like Facebook's React ( http://facebook.github.io/react/docs/top-level-api.html#reac... ) and Rendr (https://github.com/rendrjs/rendr/) let you use server rendering as well as the single page technologies on the client side.

workhere-io · on May 27, 2014

Clientside rendering doesn't need to be heavy at all. In fact, you could do it with just $.getJSON('/api/users', function(data) { $('#users').text(data.content) }.

Sure, that requires jQuery, but most "normal" sites require jQuery, too.

rcsorensen · on May 27, 2014

Sure!

And now your poor little mobile user has to wait for the page to download, then to execute javascript, then to wait for the API response, then wait for the DOM to update.

Or you could put it in the HTML, and then they just have to wait for the page to download.

workhere-io · on May 27, 2014

For each new page on your site that your user loads, the benefit of single page apps becomes greater: Now she only has to load a bit of JSON, not a full page. So essentially single page apps are a tiny bit slower on the first page load, but much quicker on subsequent page loads.

rcsorensen · on May 27, 2014

I'm suggesting the hybrid approach here, exemplified by the Rendr and React.

Give the full html of the first page, load the JS necessary. You get the best of both worlds.

On the first request, the client receives a rendered html page, gets to read the page, then JS functionality is attached.

On any subsequent pages, you do a simple JSON request and update the DOM.

The approaches come together without sacrificing anything. You get to write a single code base using JS (with a node.js backend doing string manipulation so it doesn't even need a DOM), the user gets a page without having to wait for external resource loads, and all page transitions past that are a simple data-fetch away.

marknutter · on May 27, 2014

The lifting ain't that heavy. And besides, wouldn't you want your server spending its precious cycles on things the clients absolutely cannot handle?

rcsorensen · on May 27, 2014

Like `ssorallen says, slow CPUs and limited battery life is an issue. In addition, latency on mobile networks is rough and should be avoided for your users whenever possible.

You have two choices when taking an SPA approach:

1) Give your users a webpage that is usable, readable, and navigable as soon as they get it.

2) Give your users the skeleton of a webpage that they then have to accept the latency of javascript parsing and execution, and then the latency of data fetches.

we have the ability to do #1.

rgbrgb · on May 28, 2014

SPAs reduce the latency of data fetches because each new resource is a few bytes of JSON rather than some kb of HTML.

rcsorensen · on May 28, 2014

Of subsequent data fetches. Not the first page load. Using JSON fetches to update documents also gives you the ability to not throw away the DOM/CSS and maintain the execution context.

If you use the right technology, you can give a usable application out of the gate without waiting and use JSON data fetches for every subsequent request.

ssorallen · on May 27, 2014

The lifting can be heavy for mobile devices with slow CPUs and limited battery life unlike the servers running your site. Also if your server renders the site and some of the pages are public, the server can cache the HTML and let the web server serve cached HTML rather than render the page in the web framework for each request.

acdha · on May 27, 2014

> The lifting can be heavy for mobile devices with slow CPUs and limited battery life unlike the servers running your site.

The reverse is also frequently true: if client rendering can improve cacheability or reduce the data going over the wire the radio savings will pay for a LOT of text updates. Similarly, if you can transfer a large listing without creating thousands of DOM nodes the results can be a wash depending on exactly how much data, the browser, etc.

There isn't a single right answer here - it really depends on the application and good analysis.

marknutter · on May 27, 2014

It doesn't have to be heavy, even for mobile devices. Decrease your dependency on large libraries, minify and cache everything, and only load what they need. Once they load it the first time and cache it then all they ever need is a bit of JSON here and there.

tragic · on May 28, 2014

Like many, I am suspicious of the rather overbearing claims made on behalf of the SPA architecture.

I just launched a website. It's a weekly periodical with political analysis, word-count on articles 1500-6000. It needs to carve up the content in a few different ways (categories, issue numbers etc), decorate an article with links to other relevant content, and provide a nice CMS for non-tech people to use. So it's on Django, with the regulation sprinkling of JQuery. (If it were only techies updating it, you could probably do it with a static site generator...)

To me, the idea that you'd try and force something that is plainly a big collection of pages into a 'single page' is just philosophically bizarre, like printing Moby Dick on a square mile of paper, using some amazing origami skills to present it to the reader, all in order to save a bit of effort at the paper mill.

The googlebot business is one aspect of a bigger issue, which is that a website needs to be consumable by a host of different clients. I don't see how you can do the SPA thing without making major assumptions about those clients.

Sometimes, of course, those assumptions can be justified - it depends on the job. And Angular etc are enormously fun to play with, and handled well can enable a great UX for certain jobs. But I don't think it's 'the future'. It's another tool in the box.

Relevant here, a nice talk by John Alsopp from Full Frontal 2012:

https://www.youtube.com/watch?v=KTqIGKmCqd0

EDIT: clarification

xpose2000 · on May 27, 2014

I'm not sure if this announcement changes anything. The bottom line is to make apps for the end user. Google is simply saying that those best practices are now crawlable in a way that is very mature. The same rules still apply.

A simple guide can be found here: https://developers.google.com/webmasters/ajax-crawling/. Although I suspect it needs to be updated since its from 2012.

If you create an application, make sure it alters the URL when applicable. For simple apps, the following repos will be useful:

The old way, that still works: https://github.com/asual/jquery-address

The better way, preferred: https://github.com/browserstate/history.js or https://github.com/defunkt/jquery-pjax.... not sure which is better to be honest. Feel free to chime in.

rpedela · on May 27, 2014

fixed link: https://github.com/defunkt/jquery-pjax

ssorallen · on May 27, 2014

> Single page apps are not a new concept, but up until now they were typically a bad solution for public websites that depend on hits from search engines

If your users (I'm talking humans, not bots) have to download a mountain of JavaScript and execute it before seeing any content, your site is slower than it could be for everyone. We should stop saying that "single page apps", i.e. sites rendered in JavaScript in a browser, are bad because they can't be scraped by a bot. They are bad for EVERYONE who wants to view the site because of the network and CPU time it takes to download the assets and render the site in the browser.

scotth · on May 27, 2014

Doesn't it all depend on how long they spend on the site? If it's one page and bounce, sure, that's a terrible experience. More pages than that? Now we're starting to see savings.

ssorallen · on May 28, 2014

If you serve a rendered page of HTML with CSS links, browsers can progressively render the page as it is downloaded. Users will notice that on the first page load, particularly on higher latency connections where round trips for resources like JavaScript files are expensive.

scotth · on May 28, 2014

My point still stands. Time to load is only part of the equation.

acqq · on May 27, 2014

So you want to have the URL for every content but you don't want to provide the content as HTML, instead expecting that once the page is by client, the client only then separately loads the content? Only because it's easier to you to program, you want to deliver me the content much slower than you can?

There is some strange logic there.

marknutter · on May 27, 2014

Slower in some respects and faster in others. Suppose I cache the data I get back from the server for page 1 and page 2 of content. Now, if the user switches between page 1 and page 2 they don't needlessly ask the server for the HTML every time like they do when relying on the server to render templates.

And I'm not sure where you're getting that it's "easier to program" single page apps than it is to simply rely on the server to render html on the server. The fact that it's not easy is the very reason we have so many competing front-end frameworks to solve the problem elegantly.

workhere-io · on May 27, 2014

The whole point is to make the experience better for the user: When going to each new page on the site only involves fetching a bit of JSON and not an entire HTML page including header, footer, JS, CSS, etc., that makes the user experience faster. Add to that the fact that since the front page HTML now no longer contains any dynamic elements (you would get those dynamic elements via JS), you can put your front page HTML (the one HTML page there is) on a CDN and get faster load times.

As for speed: Sure there are exceptions when developers go crazy with tons of heavy JS, but that doesn't need to be the case at all.

Touche · on May 27, 2014

Only the first load is slower. If you click on any links within the page they will render much faster as you skip the whole full page load thing.

grey-area · on May 27, 2014

With the new improvements to Googlebot, single page apps will likely advance from being niche solutions for non-public websites to being the default way to build websites. A website will contain a single HTML page (typically heavily cached and served via a CDN). The JS on that page will then fetch content (as JSON) from the server and change the path as necessary using pushState.

I find the cheerleading for single page websites disconcerting and the proposed benefits unconvincing. Why should this be the default way to build websites? A few desultory upsides are presented without a full consideration of the multiple downsides to client-side development.

The biggest advantage of thick-client architecture is sending less data to the client and, if you like using javascript, writing everything in js, but there are multiple downsides compared to more traditional thin-client websites - load times which depend more on client capabilities (hugely variable and out of your control) than servers, dependence on js on the client, loading pages while your content is placed in the dom by js, forcing everyone to write in js instead of switching language on the server whenever they like, ignoring the simple document model of html served at predictable URIs, which has served the web so well and means you can use dynamic or static documents, full documents can be cached for very quick serving and by intermediaries, etc, etc. Of course some of these can be overcome, but there are serious obstacles, and the advantages are meagre to non-existent unless you enjoy javascript and feel its the only language you'll ever need.

For someone who doesn't like working in js, and/or doesn't have a huge amount of logic already in js (many websites work just fine with some limited ajax), trying to force every website into the procrustean bed of client-side development is not an appealing prospect. I can see why it appeals to those who have already invested in js frameworks, but predictions of its future dominance on the web, like predictions that eworld, activex or mobile would replace the web, are overblown.

I suspect the birth and death of Javascript will be a footnote in the history of the web, rather than taking it over as this article suggests. If anything we should be looking to replace our dependence on js, not making it mandatory.

needs · on May 27, 2014

I can't believe that single page webapp are easier to write than true old website. Maybe you can gain some performance improvement but if you use a framework you will loose this very little gain. Those who claims that developpement is easier with framework on a single page have too learn programming, because for most case, the "old" way works very well and is incredibly faster than a bloated javascript page.

I really, really dislike this approache of doing website, it make the code and the design hard to understand and cost a lot of problem when comes the time to debug or made major changes.

There are no magic when using javascript, it will slow down the client, and manipulating DOM is very slow. Doing things server side cost nothing compared to javascript. Remember that loading a webpage is very fast when you have only a CSS and html code, because it very easy to put it in cache and doing some pretty nice optimizations on it.

With frameworks, making "webapp" become a huge nigthmare, things becomes overly complex and bloated, the request as to pass trough a lot of layers before end up somewhere, and the developpement is not faster than have a custom code. Good framework don't make good programmer.

When the article say "put the CSS inside a <style> tag on the page - and the JS inside a <script> tag", it's just horrible, fuck it.

workhere-io · on May 27, 2014

Those who claims that developpement is easier with framework on a single page have too learn programming, because for most case, the "old" way works very well and is incredibly faster than a bloated javascript page.

Who says the page will become bloated with JS just because you use clientside loading? The mechanism I'm talking about can be done with something like 10 lines of JS or less. No one's saying you have to use AngularJS with every web page you make.

When the article say "put the CSS inside a <style> tag on the page - and the JS inside a <script> tag", it's just horrible, fuck it.

First of all, this is not a requirement of single page apps, just an option. Secondly, when you're developing, you would still have your JS and CSS in separate files. Your compiler would then minify the whole thing and put it inside your minified HTML file.

CMCDragonkai · on May 28, 2014

Not every single search engine will be able to scale a JS virtual machine for all the pages they need to index. There are also social network bots that you might like supporting such as Facebook and Twitter which will not be able to crawl javascript either.

At any case, if you want to have a solution to this SEO problem now, I created SnapSearch (https://snapsearch.io)

bhartzer · on May 27, 2014

I wouldn't actually call this "recent" improvements. I mean, Google has been handling JavaScript for years now. And they're just now coming out and publicly saying it. Which is typical Google.

workhere-io · on May 27, 2014

What they were saying before was that you always need a HTML fallback for JS-generated content. Now it seems they're saying you don't necessarily need to.

drakaal · on May 28, 2014

Optimism rather than fact.

There is a lot more to it. I am pretty well known as an SEO, and while I would love this to be true it isn't.

Google's improvements to GoogleBot are mostly targeting spam, and obfuscation of content. The ideas is not to discover content as much as it is to avoid having content hidden.

Previously you could have a webpage that appeared to be about Puppies, but then used any number o dynamic methods to instead show naked cam girls. Google worked to fix this.

Google is now doing some indexing of named anchors, and this allows for linking to a page with in a page as it were. But that is a Long ways from building indexable single page applications.

-Brandon Wirtz SEO (formerly Greatest Living American) http://www.blackwaterops.com

basseq · on May 27, 2014

This is a great improvement, but I'm struck by two things:

1. The state of JavaScript-only application development is still nascent. The number of JS-only sites I see that are buggy, don't use PushState correctly, or have other shortcomings is growing faster than the overall trend. Not that it can't be done well, but if your JavaScript-only "app" is really just a standard website, you might want to re-think your approach.

2. There has to be a better way. There are distinct benefits to approaches in caching content and providing feedback, but JavaScript seems to be a kluge-y approach. It reminds me of frames back in the day. Some of this is browser support; some of this is lack of standardization; some is perhaps a missing piece of the HTML spec; etc.

workhere-io · on May 27, 2014

Author here. Some of you are saying that this will lead to bloated, JS-heavy websites. I disagree. The JS necessary for making a single page app can be done with something like 10 lines of JS (plus jQuery or something similar, but that is already included in most normal pages anyway).

A single page app isn't JS-heavy by definition, and a "normal" page (with HTML generated on the server) can easily be JS-heavy. It all depends on how you program it. Just keep in mind that single page apps don't necessarily need to use heavy frontend frameworks such as Knockout, Ember or AngularJS.

CHY872 · on May 27, 2014

This seems like an odd thing to be shouting about.

As far as I can see, throughout this thread the performance benefits of single page apps are touted as being fantastic, making it worthwhile to use the new technology etc.

When has performing operations efficiently ever been the domain of the web? Websites in my experience have the worst performance of almost any software I use! I've seen developers cite 200ms or longer to load a page as being a good benchmark - that seems pretty awful to me.

If getting this tiny performance improvement (which often results in poorer performance on the first load (not ideal for many)) is so critical, why do the same developers not invest in writing more performant server apps? Yes, often the database is a bottleneck, but these problems can in general be worked around (either by use of faster queries or caching etc).

Why attempt to get a small performance benefit by saving 30-odd kB of HTML on each page load (static and so essentially free for the server), when one could get a much larger performance benefit by optimising the backend?

Almost all serious sites will still see their page load being limited by the time it takes to produce the page. It's possible to write really fast websites (try http://forum.dlang.org/) but no one seems to do it :(

If anything but almost all of your website is static, you won't be saving all that much time.

workhere-io · on May 27, 2014

If anything but almost all of your website is static, you won't be saving all that much time.

Single page apps can easily be static (static HTML page + static JSON). The point of this would be to decrease the download size for each new page visited by the user.

CHY872 · on May 27, 2014

I think you missed my point. In each web page downloaded there's a bunch of (basically constant) static data - to download your javascript files, and set up your document - your template (or similar). This is the only data that single page apps can eliminate - everything else must either be queried from the server or can already be cached.

Some sites obviously inline CSS or JavaScript, but that can be eliminated if necessary (and only affects the first page load anyway).

This information is free to generate on the server side, so it's not slowing down that computation at all (it's just a stringbuilder function, essentially). Furthermore, the transfer time is generally not the deciding factor - it's the server side time to put the rest of the information together.

To give one example, I went to a typical website - the Guardian (it's a fairly standard high-traffic news website). Chrome informs me that in order to request one article, it took 160ms to load the html - 140ms of waiting and 20ms of downloading. Now, the RTT is about 14ms, so that's about 110ms of generating the web page and 20ms of actually downloading it. It's about 30kB of compressed HTML (150kB uncompressed), most of it's 'static content' - inlined CSS and JS.

Them using the single page model would reduce the page download time (apart from the first page) by an absolute maximum of 20ms - which means that the time to load each page has been reduced by about 12%.

This is fine, but almost all of the data is just the result of string concatenations and formatting - i.e. free processing (or at least almost-free processing). It's getting the rest of the data together that's somehow taking the 100ms (or crap implementations).

The cost of moving data around on websites is typically small compared to the actual production time of the content. That's why we see people preferring to inline huge amounts of CSS etc on each web page and having people download it time after time - because it's only about 10kB compressed the data transfer is inconsequential, and normally is dominated by the RTT.

Spending all the time writing these frameworks because of performance benefits is a fallacy - the data still has to be generated somewhere, and if it happens dynamically it's slow as hell. The savings can never become that great - at most they lead to 20-30ms of improvements if bandwidth is acceptable.

Writing the frameworks because they make development easier is a much more reasonable argument.

This still all detracts away from the fact that non-static websites are typically dog slow and they shouldn't be.

PinguTS · on May 28, 2014

The single page app has another _huge_ drawback.

The reload via JS fails silently, when you are on a bad Internet connection, like I am currently here in Nepal. You can not simply do a reload, like with a simple HTML page.

For example, Facebook is here unusable, because of the same issue. It works only, when you request the mobile site of Facebook in the browser.

So please, get rid of that damn JS, if you care about your user base and usability.

BTW: that problem also happens on bad hotel Wifi in the US.

adamconroy · on May 28, 2014

Somewhat off topic, but how do single page apps deal with people hacking the JS? For example, if as a particular user I am only allowed to perform certain functions within the app, and that functionality is contained in the JS, then it doesn't seem like it would be very hard to modify the JS to enable the functionality I shoudn't be allowed to use.

ailox · on May 28, 2014

Usually the functionality of the app exists in the backend, which would be server side. No matter what you do on the frontend, there should be no way for you to trigger actions in the backend you were not authorized to perform.

allendoerfer · on May 28, 2014

This news article seems to come up every few years. Nevertheless, the niche of apps, which can profit of this, is quite small.

Either you have a highly interaction-heavy web-app, where it makes sense to execute most of the code on the client and deliver the content as JSON or you have a content-heavy website, where it makes sense to deliver cached content to the client.

There are some apps in between, which are highly interactive and content heavy like web versions of social apps. For them additionally the question arises, if they want to be crawled by Google or Google wants to index their content.

To profit from this, you need an app, which content the users search for and interact with several times after they have found it. So i guess, "revolutionize" seems a bit much to me.

slashdotaccount · on May 28, 2014

Please use this instead: https://en.wikipedia.org/wiki/Progressive_enhancement

SquareWheel · on May 28, 2014

But we're talking about making AJAX pages robot-friendly, not making regular pages mobile-friendly.

kayoone · on May 27, 2014

Loading off most of the work to the client has its downsides too. If that was a reality i would assume that mobile devices would require quite a bit more battery power to render basic webpages.

h1karu · on May 28, 2014

google is not the only search engine.

workhere-io · on May 28, 2014

Which I emphasized in the post :)

justinph · on May 27, 2014

We're still using hypertext transfer protocol, right? You need to send hypertext down the wire.

We shouldn't let one company, google, dictate how the web works, simply because of their proprietary technological innovation.

o_____________o · on May 27, 2014

Roads are meant for horses, right? We shouldn't let one company, Ford, dictate how the roads work.

My man, if we only used infrastructure and technology in the way it was originally intended and narrowly imagined, the world would be a dim place.

justinph · on May 27, 2014

Ok, I see your point.

But, have Yahoo or Bing or DuckDuckGo made the transition to be able to crawl the web with a full JS & DOM rendering engine? I doubt it. By eschewing that compatibility we're setting a very high bar for what any competitor to google would have to achieve.

I like google. I just don't think it's good to have one company own a market so completely.

o_____________o · on May 27, 2014

I agree that diversity is important, but if Google is owning the market via innovation, they deserve to own the market.

In this case, they're closing an ever widening rift between human consumable content and that which is targeted for search engines. The data that comes down the pipe can still be semantic, it's just glued together differently.

Web technology is presently outpacing the ability of indexing services. The onus is on Google's competitors now; they need to catch up. Luckily js rendering on the server isn't an academic problem, it's one of resource allocation.

workhere-io · on May 27, 2014

But, have Yahoo or Bing or DuckDuckGo made the transition to be able to crawl the web with a full JS & DOM rendering engine?

They can just use PhantomJS (http://phantomjs.org/), which is free and open source.

voltagex_ · on May 28, 2014

They could, but I wonder what it'd take to scale it to crawling that number of pages.

I think only Bing would have the cash and resources to build that.

sigzero · on May 27, 2014

I don't really like google and I agree.

Yetanfou · on May 27, 2014

Google is not dictating anything so there is no need to vent your dislike, of, Google, in that way. If anything, this releases developers from a constraint on how sites are built.

That said, I'd rather see more server-side HTML instead of client-side JS when it comes to the web. If you're developing a game, client-side JS is fine. If you're serving textual content with the odd image, please serve it in the way the web was won: HTML. Use CSS if you feel the need for some 'style', but remember that perfection is reached when there is nothing left to take away, rather than nothing more to add.

fixermark · on May 27, 2014

If my product is well-formatted data, why should I build a server that needs to know how to vend that data in a machine-readable visual-formatting-agnostic format (such as JSON or XML) and also a targeted-for-human-consumption format such as HTML? It's a reasonable architectural decision to build one server that knows only how to vend JSON and an associated viewer (that happens to use HTML and JavaScript for presentation purposes) that knows how to consume and render that JSON.

pessimizer · on May 27, 2014

Because it pushes work onto the client, and the client has an indeterminate number of applications competing for the same indeterminate speed and size of CPU and RAM. You know exactly what your situation is on the server, and you can decouple data and presentation without putting them on different sides of the wire.

fixermark · on May 27, 2014

The counter-point is that (a) the number of applications competing for resources is semi-determinate, constrained by the user's patience and willingness to upgrade and (b) the number of users the server must support is similarly semi-determinant, constrained by the popularity of the service, size of data a user could want to manipulate, and the desired speed of those manipulations. There's an argument to be made for the economy of scale of pushing some of the presentation work to the side of the wire where there is a user who wants to consume the data; I wouldn't want to have to use, say, Google Spreadsheets as a server-side-only product.

Note that we can reduce this line of thinking to absurdity if we substitute JavaScript with, say, video; some clients are incapable of rendering video, some clients are constrained in the size of video they can render. Do we therefore never stream video down the wire, and require the client to grab the whole video element wholesale as a single request-response? Of course not; users don't want to wait that long. So developers make an educated guess on what their clients' user-agents can do and set a sane bar for size and scope of video content. A similar sane bar can be set for richness of client-side executable code interactions.

pessimizer · on May 27, 2014

I'm not saying that you shouldn't architect that way, I'm just giving reasons why you might not.

>semi-determinate, constrained by the user's patience and willingness to upgrade

The user can vaguely determine it, but you can't. You can't guarantee a user experience without relying on the user not to be viewing your page on a two year old cellphone that they're simultaneously listening to music on and running three other js-heavy pages.

>the number of users the server must support is similarly semi-determinant, constrained by the popularity of the service, size of data a user could want to manipulate, and the desired speed of those manipulations.

You can assign whatever arbitrary amount of resources you want to your site on the server side, and allow in whatever limited (or unlimited) number of clients/visitors that you want to, so it's completely deterministic. The amount or resources utilized (whether on the client or on the server) will likely not be very deterministic, but you can have algorithmic problems whether on a thick client, on a backend data server, or on a backend presentation server.

ksk · on May 27, 2014

Well said. Too many websites these days are JS heavy.

wutbrodo · on May 27, 2014

> We shouldn't let one company, google, dictate how the web works, simply because of their proprietary technological innovation.

What? This article is about how a constraint due to a single company (and to a lesser extent, other search engines) is being _relaxed_. Before this, single-page apps were a riskier proposition because of many sites' reliance on Google traffic. Now that we're moving closer to that technical limitation being overcome, this is one _less_ constraint "dictated" by Google that site owners have to deal with.

jfoutz · on May 27, 2014

Not to sound like a complete troll but, uh, how should i send this gif to the client?