Static sites served from the filesystem directly through any reasonable webserver are always going to be quick. 100k hits per month is nothing and a hackernews spike should not even be noticed unless you are serving up video or something large.
My site sits on a $5 DO droplet and I have never even come close to hitting any kind of limit despite the occasional traffic spike.
Dynamic content management systems are a convenience for the developer/maintainer of the site but push additional requirements onto the hosting CPU (and the client for javascript heavy sites). The cost of these requirements is often not clear but usually much larger than you might expect.
Yeah, serving static files is unbelievably cheap these days. Some of my clients have nginx serving 1B+ requests per month on a single box, and the CPU doesn't even break a sweat. nginx can easily saturate a gigabit link with very little effort.
Would be kind of interesting just to have a point of reference (even if it ends up being super beefy and overkill since you mentioned CPU load is low).
A run-of-the-mill quad-core Xeon E3 with some SATA SSDs. Even that is overkill as the CPU usage never goes above 1 core. You could probably get the same job done with a $20 droplet if you're frugal, or $40 with room to spare.
The point is that the hardware doesn’t matter. RAM is faster than Ethernet.
The bulk of the load will be negotiating the TLS sessions, and modern ciphers are both fast and hardware accelerated, and that’s assuming something upstream isn’t already taking care of the TLS.
Yes, although in the case of WordPress, it's also because it's really badly architected: no internal cache, so many requests for just one page, comments are loaded even when you don't read them, etc.
The first time I peeked at wp code I was flabbergasted at how cavalier they were with piles of inefficient queries.
At the time I was tasked with evaluating it as a "cheaper" replacement for an expensive, managed CMS. My conclusion was it would end up being more expensive to accommodate all of this inefficiency with caching galore, not to mention the support burden and the loss of dynamic behaviour on the sites.
I looked again last year, assuming it had greatly improved and replaced all that with gRPC or GraphQL, and nope, still horrendously designed.
I wonder if it's intentional or just a case of support for backwards compatibility.
Optimising WordPress is a known and solved problem. I don't mean to undermine your comment but it is trivial for a web infrastructure ops guy to deliver on. There is a lot of fud out there on the topic so it's easy not to see the diamonds in the rough.
That's not the point. The point is it's not built in, because the architecture decisions were unsound.
Now, don't get me wrong, I use and like WordPress. There is no more versatile yet approachable blog/CMS/whatever out there, the ecosystem is huge and it's my go-to tool when I want a quick and dirty solution.
But it doesn't change the fact it has been badly designed from the start.
Popularity in software doesn't reward good design. It rewards solving problems.
It's harder to scale properly over time, because with every new post the aggregated pages take longer to regenerate. Obviously this can be somewhat lightened with some hard choices (e.g. only have aggregated pages by fixed period), but it is a tradeoff that dynamic sites typically don't have to worry about.
Anybody who was around during the first blogwave in the early 2000s knows the score: after a while, with static generators you end up waiting a lot after hitting “publish”. And god help you when you change the template over the whole site.
Of course, dynamic systems have their own tradeoffs, but they tend to scale better with content volume. You can then slap a simple cache in front and be done with it until views get into the hundreds of millions.
Just think of the immense amount of work a compiler is doing to compile a decently sized C++ or Rust binary.
If there was a need for it, static site generators could be scaled to tens or hundreds of thousands of pages and still finish in a few seconds. Caching partial results is always an option.
There are generators that are decently performant, like Hugo (Go) [1] or Zola (Rust) [2].
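On the "caching partial results" point, here is a minimal sketch of incremental rebuilding based on content hashes. It is an illustration only, not how Hugo or Zola actually do it, and the posts/ and public/ directory names are assumptions:

```python
# Toy incremental-rebuild sketch: only re-render pages whose markdown
# source has changed since the last build. Paths and the cache file
# name are assumptions for illustration.
import hashlib
import json
import pathlib

SRC = pathlib.Path("posts")          # assumed source directory of .md files
OUT = pathlib.Path("public")         # assumed output directory
CACHE_FILE = pathlib.Path(".build-cache.json")

def render(markdown_text: str) -> str:
    # Stand-in for a real markdown renderer.
    return "<html><body><pre>" + markdown_text + "</pre></body></html>"

cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
OUT.mkdir(exist_ok=True)

rebuilt = 0
for src in SRC.glob("*.md"):
    text = src.read_text()
    digest = hashlib.sha256(text.encode()).hexdigest()
    if cache.get(src.name) == digest:
        continue                      # unchanged since last build, skip it
    (OUT / (src.stem + ".html")).write_text(render(text))
    cache[src.name] = digest
    rebuilt += 1

CACHE_FILE.write_text(json.dumps(cache))
print(f"rebuilt {rebuilt} page(s)")
```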
You are not wrong but it hardly matters for most blogs/product web sites.
For example, I generate my blog from markdown with a custom generator written in not-very-optimised Python. There are about 400 pages currently and it only takes about 2 seconds on a 6 year old MacBook to completely build the entire site.
Even if I write another 1000 blog posts each year, generating the site will still take less than a minute for the foreseeable future. Paying the cost of building the pages up front like this is still much better than having my users pay it by waiting tens of seconds for the page to slowly be dredged out of a database.
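To show how little such a generator needs to be, here is a minimal sketch in the same spirit (not the commenter's actual code). It assumes the third-party markdown package and a posts/ directory of .md files:

```python
# Minimal sketch of a markdown-to-HTML blog generator with one aggregated
# index page. Directory names and the page template are assumptions.
import pathlib
import time

import markdown   # pip install markdown

SRC = pathlib.Path("posts")
OUT = pathlib.Path("public")
PAGE = "<!doctype html><html><body>{body}</body></html>"

start = time.perf_counter()
OUT.mkdir(exist_ok=True)

links = []
for src in sorted(SRC.glob("*.md")):
    body = markdown.markdown(src.read_text())
    (OUT / (src.stem + ".html")).write_text(PAGE.format(body=body))
    links.append(f'<li><a href="{src.stem}.html">{src.stem}</a></li>')

# Regenerating the aggregated index is cheap at a few hundred pages.
(OUT / "index.html").write_text(PAGE.format(body="<ul>" + "".join(links) + "</ul>"))

print(f"built {len(links)} pages in {time.perf_counter() - start:.2f}s")
```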
There are sites that do have tens of thousands of pages, and perhaps a content management system is appropriate for them. Horses for courses and all that.
Can't you serve a static site purely from memory and avoid hitting the filesystem altogether? If the site is not huge then all the pages could easily fit in memory and be served from a memory cache.
When you serve from the filesystem, your content is usually in the filesystem cache in practice, and then it's being served from memory.
Adding a memory cache on top of the filesystem cache is probably a waste of time for no gain (unless your workload has a really specific access pattern, maybe…).
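For the curious, an explicit in-memory setup could look like the toy sketch below. As noted, the OS page cache usually gives you the same effect for free, so this is illustration rather than a recommendation; the public/ directory and port are assumptions:

```python
# Toy sketch: preload every file of a small static site into a dict and
# serve straight from RAM. Not production-grade (no caching headers,
# single-threaded), just the idea.
import http.server
import mimetypes
import os

SITE_ROOT = "public"                  # assumed build-output directory
cache = {}
for root, _, files in os.walk(SITE_ROOT):
    for name in files:
        path = os.path.join(root, name)
        url = "/" + os.path.relpath(path, SITE_ROOT).replace(os.sep, "/")
        with open(path, "rb") as f:
            cache[url] = f.read()

class MemoryHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path if self.path != "/" else "/index.html"
        body = cache.get(path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        ctype, _ = mimetypes.guess_type(path)
        self.send_header("Content-Type", ctype or "application/octet-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), MemoryHandler).serve_forever()
```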
I think the most amazing thing that has enabled startups to grow is the existence of $5 VPSes at hosts like Digital Ocean or any of the other ones out there. However, one of the things that seems to kill performance these days is the massive amount of code abstraction layers we put in place, which unfortunately increases the overhead by a massive amount.
In our case, I started and successfully exited a crypto trading service (tradedash.io, which was sold to Bittrex.com) that handled billions of dollars in transactions using nothing more than 2x $5 VPS. Our entire tech stack cost us less than $75 per month. The backend API handled massive amounts of trading data, as you can imagine, using those two servers, and they never even got close to full utilization, even when tens of thousands of orders were placed and cancelled at the same exact second. We had thousands upon thousands of users hammering our servers with requests every second of every day and the droplets did just fine. At the end of the day, it all comes down to building out your infrastructure properly.
Clean your code base, use only what you need when you need it, and make sure things are as fast and secure as possible. Don't over-complicate things for yourself and don't add abstraction layers just for the sake of it. That to me is the essence of good software, and the main benefit of that is less overall headache, as I don't have to go in and engineer a solution every time something breaks in some random library or abstraction layer. Scale will also be a fortunate side-effect of that as well.
PHP/MySQL and React/Redux. We also used Electron for the front-end desktop software. Electron certainly comes with a ton of baggage but upon evaluating all other options, we decided to go with it because it was the best fit for our team's experience.
Other than that, there were actually very few tech pieces involved. I am an extreme minimalist by nature so I kept things as simple as possible in order to minimize the surface and complexity of the code base.
Overall, it took us 9 months in total to build the application from the prototype phase to a full-blown trading system. We did it so fast because we kept things simple and low in complexity.
I don't understand why this is on the front page of Hacker News. 100k monthly page views is 139 per hour. Even if you assume the peak rate is 10x the average, that's still only one page load every couple of seconds. A standard install of WordPress running off a Raspberry Pi on your home cable connection could handle that quite easily.
Since the site in question is a personal blog that isn't updated daily, the 100k page views likely aren't uniformly distributed across the entire month, but occur in a single spike at a certain time, e.g. when a link to it reaches the HN front page.
In that scenario, the majority of hits will happen in the <24-hour period when the link is on Page 1.
The rest of the month could see daily traffic in the hundreds, and it might still amount to 100k+ if added to the traffic obtained during the viral spike.
It really depends on your target market. If it's a consumer service in a couple of time zones, you'll get the bulk of your traffic in the evenings.
E.g. my personal finance traffic would be mostly on weekday evenings. These were usually 9-5 working people, so not a ton of traffic overnight or on holidays. Weekdays had more traffic than weekends.
I switched to static some years ago because the VPS running Drupal kept crashing with 100k visitors per month.
I keep telling people 'you probably don't need a massive server if you're just running static pages', and again this seems confirmed (for some use cases).
A few questions:
1. Do you have any protection against DDoS? I currently implement my own custom solution; people seem to be transitioning towards Cloudflare, but I see such a move as dangerous. What's your mitigation and/or opinion?
2. You mention the ease of spinning up a new server - I personally just run a bash script for this. How do you automate the transference of your URL over to a new IP?
3. Have you ever experienced slow down and if so, how did this affect your site?
4. What are the most resource intensive aspects of your website? (I.e. What is using your bandwidth/CPU/RAM?)
Regarding DDoS protection, it's simply not possible to implement a reasonable defense against anything but the most naive attacks without spending a ludicrous amount of money. I, for instance, was working on a site that got targeted, and at one point tried to load-balance the traffic across _30_ machines with efficient blocking rules, and even then the attack was saturating their connections.
After giving up, I tried Cloudflare. A few clicks later, they just ate the entire attack for me. All on their free tier.
That said, I strongly suspect Cloudflare probably works hand-in-hand with the NSA or some shit, and it's a dangerous trend. But I just don't think there exists a viable alternative, unless you're a multi-million-dollar internet company.
> at one point tried to load-balance the traffic across _30_
> machines with efficient blocking rules, and even then the attack
> was saturating their connections.
I have a slightly more complex "load-balancing" method; as long as the network bandwidth itself can hold out (we're hosted in the cloud), I've been able to deal with all attacks so far. My current method is to keep a "whitelist" and a "blacklist" lookup table to quickly filter incoming connections. If you end up on the blacklist (reasons include too many connections over a time period, bad requests, too many requests without authentication, etc.), you instantly get killed and not a single byte gets sent. If you're on the whitelist (reasons include being an already-auth'd user, low traffic density, unique requests, etc.), you shortcut some additional checks. Connections in neither list (potentially either attackers or new users) go through an additional pre-generated check to "test for humanness" (the check and answer are statically generated, so it adds no connection overhead). Both lists decay over time, and connections on either list can be re-assigned if they start behaving badly. The additional checks are mostly only activated under high load.
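As a rough illustration of that kind of list-based filtering (not the commenter's actual system; the thresholds, decay times, and the "challenge" step are invented for the sketch), the core bookkeeping can be quite small:

```python
# Toy sketch of whitelist/blacklist connection filtering with decay.
# All constants are arbitrary illustrations, not tuned values.
import time
from collections import defaultdict

WINDOW = 10.0            # seconds over which we count requests
BLACKLIST_RPS = 50       # above this rate inside the window -> blacklist
DECAY_AFTER = 300.0      # seconds before a listing expires

hits = defaultdict(list)          # ip -> recent request timestamps
blacklist = {}                    # ip -> time it was listed
whitelist = {}                    # ip -> time it was listed

def decide(ip: str, authenticated: bool) -> str:
    """Return 'drop', 'allow', or 'challenge' for an incoming connection."""
    now = time.monotonic()

    # Decay old listings so misbehaving/reformed clients get re-assessed.
    for table in (blacklist, whitelist):
        for addr in [a for a, t in table.items() if now - t > DECAY_AFTER]:
            del table[addr]

    if ip in blacklist:
        return "drop"             # kill instantly, send nothing back

    # Track the request rate inside a sliding window.
    hits[ip] = [t for t in hits[ip] if now - t < WINDOW] + [now]
    if len(hits[ip]) / WINDOW > BLACKLIST_RPS:
        blacklist[ip] = now
        return "drop"

    if authenticated:
        whitelist[ip] = now       # known-good clients skip extra checks
    if ip in whitelist:
        return "allow"

    return "challenge"            # unknown client: run the cheap humanness check
```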
After that there is a resource management layer that is essentially "service temporarily unavailable" on anything too heavy (large file GET/POST, large database read/write, non-essential database writes are dropped, etc, etc).
This is all then run through a high-level network simulation to test where the bottlenecks will be. Each "release" goes through the test before making it into production.
I can't give too many details, but there is also a slightly malicious protection layer - if we detect somebody using a known/obvious bot (which is harmful) we have a few methods to crash them (based on known vulnerabilities). Some of these also act against aggressive search bots.
I have a beautiful graph sitting somewhere showing an attack ramping up and then mostly disappearing when the defense was triggered. It was quite worrying at the time because we just assumed our system went down or we had accidentally started blocking real users.
> That said, I strongly suspect cloudflare probably works
> hand-in-hand with the NSA or some shit, and it's a dangerous
> trend.
I also highly suspect this.
> But I just don't think there exists a viable alternative, unless
Your defense mechanism is actually the exact thing described as protection against a simple DDoS attack.
With a REAL DDoS attack you will get so much incoming traffic that your bandwidth is saturated before any software comes into effect. A real DDoS is a hardware problem, and it therefore can't be solved by software.
> Your defense mechanism is actually the exact thing
> described as protection against a simple DDoS attack. With
> a REAL DDoS attack [..]
I'm not entirely sure what is really classified as a "simple" vs "real" DDoS attack.
> With a REAL DDoS attack you will get so much incoming
> traffic that your bandwidth is saturated before any
> software comes into effect.
I specifically said: "as long as the network bandwidth itself can hold out". If the network is saturated, it's saturated - it's game over, for anybody. But you'll find that most services will die way before this limit is reached. On a WordPress site for example you'll usually find the database will give out long before the bandwidth is saturated.
Even with legitimate users coming through some great filter like CloudFlare, it doesn't prevent a hug-of-death if you don't have some "smart" application-level handling of high numbers of users. Like the Linux OOM killer, when you hit your limits, the only choice left is to try to quickly and intelligently start killing without affecting the well-behaved.
Not to mention his solution really does nothing against ping, UDP, and TCP traffic. It's really only effective against http traffic. Which means his upstream or provider is probably going to cut him loose for the duration of the attack. A single person cannot effectively block a concerted DDOS.
> Not to mention his solution really does nothing against
> ping, UDP, and TCP traffic.
Please bear in mind that the description is heavily simplified. If you're talking about bandwidth, of course this is the limit.
> It's really only effective against http traffic.
How so?
> Which means his upstream or provider is probably going to
> cut him loose for the duration of the attack.
There's nothing we can do about that, but being in the cloud offers at least some protection. DDoS'ing some random Pi at home is a little different from DDoS'ing an AWS server.
> A single person cannot effectively block a concerted DDOS.
Of course not, but that doesn't mean you have to make it easy.
The issue is that TLS handshakes are actually quite expensive. Assume roughly 3 MHz of a modern non-hyperthreaded core for each TLS handshake per second you want to do, and each handshake is also less than 40 kbit of traffic in each direction. That's a measly 2.5 Gbit/s symmetric on one of AMD's 64-core flagship processors. They can normally push out ~1 Tbit/s of unencrypted HTTP if the SSDs keep up, or about 20-50% of that if they have to encrypt it.
These numbers are all rough, and they assume bare-bones, inflexible C code that barely complies with the RFCs.
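Reading those figures as roughly 3 million cycles and 40 kbit of traffic per handshake, the 2.5 Gbit/s number falls out of a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope check of the figures quoted above. All inputs are
# the parent comment's rough assumptions, not measured values.
cores = 64                      # AMD flagship core count
clock_hz = 3.0e9                # assume ~3 GHz per core
cycles_per_handshake = 3.0e6    # "about 3MHz ... for each rps"
bits_per_handshake = 40e3       # "<40kbit ... in each direction"

handshakes_per_sec = cores * clock_hz / cycles_per_handshake
throughput_gbit = handshakes_per_sec * bits_per_handshake / 1e9

print(f"{handshakes_per_sec:,.0f} handshakes/s")        # 64,000
print(f"~{throughput_gbit:.1f} Gbit/s per direction")   # ~2.6
```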
> The issue is that TLS handshakes are actually quite
> expensive.
Agreed and a very good point. IPs that end up on the "blacklist" are rejected before any handshake occurs. The description I gave was really quite simplified.
It's pretty simple: don't say Cloudflare are Nazis and you will be allowed on their platform. They have been very vocal about being neutral in almost every case.
Why though? If you don't like a service, move elsewhere. They're not obligated to serve your site. It's the same as any brick and mortar business. If my clients attack me, I cut them loose.
I disagree that calling CloudFlare Nazis is framing them, as the purpose (presumably) is not to incriminate. Criticizing them by likening their behaviour to that of Nazis, or suggesting that the company is filled with fascists, may in reality be false, but it's a valid form of criticism regardless.
There's an overblown meme floating around that cloudflare is a man-in-the-middle attack on your SSL connection. But even if you don't use cloudflare, your SSL connection typically terminates at your cloud provider's load balancers anyway. Would people then say the load balancer is a MITM attack? At some point you just have to trust the cloud infrastructure you paid for.
That's not the issue. The issue is that as more of the web uses cloudflare, it becomes a gatekeeper of the web. Any person that cloudflare kicks out will be cut off from much of the web. (It's already happening, with people having to solve captcha like it's their day job)
Even if Cloudflare doesn't misuse their power, something worse happens: Cloudflare becomes an easy off switch for laws, warrants and copyright trolls to track people or kick both content and people off the internet.
I'll go one step further and simply call it a voluntary (by server owners) centralization of the web. Blocking the Daily Stormer (rightly or wrongly) shows that they are a political entity.
Or that they're a business that doesn't want that kind of publicity. Just because you have a loudspeaker and a podium at your business doesn't mean you need to let everyone walking into your office use it. Just as Chick-fil-A chooses not to be open on Sundays, Walmart chooses to sell bullets but not guns, and Dunkin chose to make their company name really stupid sounding, they're allowed to make business decisions, good, bad, or otherwise, because they think it will be the best thing for their bottom line.
CF have booted off a neonazi site, and general filtering will very likely occur at the DNS level. At the very least, some filtering of internet content probably isn’t the worst thing in the world.
I'm not disregarding the possibility or risk, but there are far greater problems that deserve our community's attention more than CF.
So, rhetorical question: how many people don't know that and have login credentials sent over a plaintext link between CF and your site (inside your provider's network, but even then)?
This might be impressive for that person's business, but I don't think there's anything technically exciting about this. 100k/month is one page view every 25 seconds on average.
> I don't think there's anything technically exciting about this. 100k/month is one page view every 25 seconds on average.
The numbers aren't meant to brag or impress anyone. It's just an example showing real numbers based on my current site's traffic.
I wrote it because over the past year at least 40 people have asked me how I host my site and what the process looks like to build it. Now when I get future emails, tweets or Youtube comments I have a place to link folks instead of having to wing a half-assed individual response.
Also in a world where everyone thinks you need a multi-galaxy hyper inverted Kubernetes cluster with DeepMind-level AI to auto-scale to infinity I think it's a breath of fresh air to know you don't need to do that to host a lot of different types of content.
> in a world where everyone thinks you need a multi-galaxy hyper inverted Kubernetes cluster with DeepMind-level AI to auto-scale to infinity
In a world where you are promoting 100k views for $5 a month perhaps the people asking you for advice will be given a chance to consider the faster alternatives for exactly $0
Basic off the shelf webhosting with a half dozen middlemen taking their cut can handle that perfectly well :/
A static website with 100K page views/month can happily be served without breaking a sweat by a little NUC sitting in one's basement. Not sure what is so exciting about that particular number.
A little NUC sitting in one's basement should not be underestimated.
A modern NUC has 4 cores / 8 threads, an NVMe SSD, and offers performance equivalent to a $300/month Azure/AWS VM. There are backup/availability/connection concerns, but in terms of performance, bare metal in one's basement can't be beaten.
As I have a 1 Gbps symmetric connection, I serve some APIs right from my basement. One API in particular has no problem serving hundreds of requests per second from a NUC with the exact specs you just mentioned. As for backups, I use two things:
1. Internal scheduled backups to hard drives that are physically disconnected most of the time.
2. A standby redundant synchronized server deployed somewhere else for peanuts. It is not as performant as the NUC, but it will do while any emergency on the main one is fixed.
If you're trying to spec hosting hardware this isn't helpful, since you have to plan for serving requests at peak time. The real question is what max views/sec the site receives in a given year.
We were looking to rewrite our WordPress site as a static site last year in the hopes of reducing dev and hosting costs. Our prelim research led us to Hugo, but our discussion [1] with them, including a founder of Netlify, led to us sticking with WordPress, since we have over 2M posts and can get 1,000+ new posts a day, which would lead to an overly complicated build process and delay publishing posts.
Static site hosting is usually just S3 + CDN, so why wouldn't a CDN in front of WordPress, caching every page, accomplish the same thing for you? No need to change the editorial process to an entirely new system.
This site would be roughly $free to run through CloudFront on AWS, which would scale more-or-less infinitely at a sublinear cost.
Their free tier covers 50 GB of transfer and 2 million requests. If you need an extra 50 GB, as indicated elsewhere in the thread, it would cost about $4.25 a month, less if it's not the full 50.
I'm looking to move off of a cheap shared hosting provider and onto AWS for my ten-accidental-hits-a-month personal (static) sites; it doesn't make sense to even consider running a webserver for a solved problem like this.
Use S3 and CloudFront to serve static content and I doubt you'd break out of the free tier.
If you need dynamic content served from a CMS, put your CMS's public API behind CloudFront and bust the cache when the owner makes changes.
For websites that don't get edited a lot, you can use Lambda to store the website content as a JSON file which sits on S3 (somewhere like /content.json).
If you have multiple editors, use Dynamo or Firebase as the content store (again, behind CloudFront).
With every one of these options, the most expensive part would be the domain name.
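A minimal sketch of the Lambda-plus-S3 idea above, assuming boto3, a made-up bucket name and distribution ID, and an event shape defined by whatever editor UI calls the function:

```python
# Sketch of a Lambda handler that writes the site content to S3 as JSON
# and busts the CloudFront cache. Bucket name and distribution ID are
# placeholders; error handling and auth are omitted.
import json
import time

import boto3

s3 = boto3.client("s3")
cloudfront = boto3.client("cloudfront")

BUCKET = "example-static-site"          # placeholder
DISTRIBUTION_ID = "EXXXXXXXXXXXXX"      # placeholder

def handler(event, context):
    # Whatever the editor submitted becomes the new /content.json.
    s3.put_object(
        Bucket=BUCKET,
        Key="content.json",
        Body=json.dumps(event["content"]).encode(),
        ContentType="application/json",
    )
    # Invalidate the cached copy so readers see the change promptly.
    cloudfront.create_invalidation(
        DistributionId=DISTRIBUTION_ID,
        InvalidationBatch={
            "Paths": {"Quantity": 1, "Items": ["/content.json"]},
            "CallerReference": str(time.time()),
        },
    )
    return {"statusCode": 200}
```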
I ran a small single page JavaScript app that at its peak had 10K visitors per day. I served all the static assets directly from a public S3 bucket fronted by Cloudflare (for caching, domain name resolution).
For my use case it was the simplest (and cheapest) thing that could possibly work.
If it's just nginx serving files you can do this with a 386. I mean, seriously, what's so special about this? Maybe I'm missing something. Pretty sure nginx can serve thousands of static files per second with gzip+SSL on the crappiest ARM VPS available. Stick that in Online.net or OVH and you will pay no bandwidth fees.
Of course, if your baseline is a WordPress site where every page weighs 3 MB and takes 2 seconds to generate when there's no server load, then this might seem like black magic to you.
You do realise you're reading Hacker News, right? Where the majority of this generation likely grew up in the industry using AWS / Heroku / "your other managed server as a service"™.
I'm pretty close to reaching their free tier's cap on outgoing bandwidth. They only give you 100 GB.
To buy another 100 GB I would have to pay $20 per month (4x as expensive).
Also I don't feel comfortable basing my entire business on Netlify. My blog and course landing pages are how I earn a living. It's the same reason why I don't use GitHub pages. I just don't like the idea of having to adhere to their TOS (even if I'm not doing anything wrong) and also be limited to their free tier's traffic constraints.
I'm sure they are a good company and I wish nothing but the best for them but I just don't see them as a good fit for me given the above.
How do you view your Netlify usage while on the free tier? Your comment made me curious about my own usage, but I can't see a way to determine my current usage through the Netlify website.
According to my DO stats I average ~60 GB per month in outgoing transfer.
I guess it's because I have a lot of photo gallery images of travel trips and they get caught up in image search results. I'm not sure why it's so high to be honest.
It is going to go up quite a lot too, because now that I have a podcast, each episode is around 50 MB for the mp3. I might end up putting the mp3s on a CDN in the end. Probably DO Spaces.
These two don't play nicely together. That is, you can't use Cloudflare as a CDN with Netlify without Netlify complaining about not being able to provision your SSL certificate.
Even a Raspberry Pi can serve 100k page views in a matter of minutes (seconds?). But in a world of serverless-cloud-bigdata, developers seem to have forgotten just how powerful modern computers are, and how much we have already been doing for decades with less powerful hardware.
Just to give you an idea: in 1997 Rob Malda grew Slashdot to 100k pageviews a day on a single server [0] [1].
I run a WP website with 500k monthly page views (~800 blog posts). How would you go about implementing a static site (Gatsby would be my choice), knowing that WooCommerce and Sensei (Courses) are installed? WP REST?
Cloudflare + App Engine is working well for my mostly static site. The combination is completely free for any amount of traffic, and scales practically to infinity with literally zero effort.
When I see a statistic quoted per month, it's probably not very impressive once you convert it to a smaller unit like per hour or per second.
I've used a $20 DigitalOcean droplet in the past that served 5 million pageviews per month, but that number is not very impressive at all.
At peak, there were at most 2,500 people online (number via Google Analytics), and that's not a lot.
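To make the unit conversion concrete, using the 5 million/month figure above and an arbitrary 10x peak-to-average factor:

```python
# Converting a "per month" figure into less flattering units. The 10x
# peak-to-average factor is an arbitrary illustration, not a measurement.
monthly_views = 5_000_000
seconds_per_month = 30 * 24 * 3600

avg_per_sec = monthly_views / seconds_per_month
print(f"{avg_per_sec:.1f} views/s on average")           # ~1.9
print(f"{avg_per_sec * 10:.0f} views/s at a 10x peak")   # ~19
```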
I know I could have done better with other solutions but most of the editors are familiar with WordPress so I had to keep using it and optimize the heck out of it.
Why the hell wouldn’t you use a CDN? Throwing up files on S3 and then putting your site behind a CDN has to be way cheaper and more available than serving it out of some EC2 server running Nginx 24/7.