The Architecture of Mailinator (2007)

zinxq · on June 24, 2018

Surprised to see this here now after 11 years.

Needless to say, this is incredibly out of date. Mailinator now runs across several servers with a real-time websocket system tied to Redis channels for incoming emails. The multiple server system is more for fault-tolerance than scaling issues.

The Websocket-to-redis connection allows incoming emails to be available the instant they arrive (fully) at Mailinator. Instantly - no polling front-end or otherwise. Public email storage is still in-memory with a system that chunks emails into common parts and reuses them. A reference counting garbage collector cleans them up once no email in the system is still using that part. I tried compression a million different ways but nothing beat reusing parts of emails.

http://mailinator.blogspot.com/2012/02/how-mailinator-compre...
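To make the idea concrete, here is a minimal sketch of a reference-counted store that keeps each distinct part once. It assumes the "parts" are individual lines (a guess based on a later reply in this thread); `DedupStore` and its method names are hypothetical and this is not Mailinator's actual code.

```python
class DedupStore:
    """Keeps one copy of each distinct line and frees it when no email uses it."""

    def __init__(self):
        self._parts = {}   # line text -> [canonical string, reference count]
        self._emails = {}  # email id -> list of canonical line strings

    def put(self, email_id, body):
        if email_id in self._emails:
            self.delete(email_id)  # replacing an email releases its old parts first
        parts = []
        for line in body.splitlines(keepends=True):
            entry = self._parts.setdefault(line, [line, 0])
            entry[1] += 1            # one more email references this part
            parts.append(entry[0])   # share the canonical string object
        self._emails[email_id] = parts

    def get(self, email_id):
        return "".join(self._emails[email_id])

    def delete(self, email_id):
        for line in self._emails.pop(email_id):
            entry = self._parts[line]
            entry[1] -= 1
            if entry[1] == 0:        # last reference gone: reclaim the part
                del self._parts[line]

    def unique_parts(self):
        return len(self._parts)
```

Two emails that share a footer line store that line once, and the line is only dropped when the second email is deleted too.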

The 2007 article is a valid picture of things then, but the system has come a long, long way since that time.

flashmob · on June 24, 2018

Hi! I run a "copycat" site that perhaps you may be referring to in that post, GuerrillaMail.com (well, actually, I didn't copy it myself, but acquired it from another guy many years back), and I've also been using Redis to store all incoming mail in memory. RAM is cheap these days and you can find decently priced servers with >= 128GB easily. Haven't moved over to websockets or "chunking" / deduplication yet, but the architecture and UI need an update in that direction to make it more instant. Thanks for some ideas ;-)

Actually, like you, I've also hand-rolled most of it myself, replacing the previous guy's architecture. It's been a lot of fun. No frameworks, no bootstrap, just a few dependencies here and there. Started with PHP, but now prefer to use Go for anything new. Also hand-rolled the SMTP server, which turned into a project of its own: https://github.com/flashmob/go-guerrilla

Dagger2 · on July 3, 2018

Is guerrillamail.com supposed to be displaying just a "Welcome to nginx" page?

Screenshot on the left: https://nat64check.ipv6-lab.net/v6score/measurement-30830/

newman314 · on June 24, 2018

Thanks for the service! It's very useful.

However, I find that quite a few of the providers have wised up to the domains, and more often than not I am now unable to use a Mailinator address for signups (even with the alternate domains).

zinxq · on June 24, 2018

Indeed. I thought about a blog post on how to configure haproxy to proxy SMTP to Mailinator and stay undetectable.

But Mailinator's use case has morphed over the years. It's now used a great deal as a tool for QA and UAT testing. That use case has spawned private domains and API access.

The same thought process applies to services that wish to ban it: if they really want to, that's their business.

(You can of course sign-up for a Mailinator account using a Mailinator email)

fla · on June 24, 2018

> Public email storage is still in-memory with a system that chunks emails into common parts and reuses them. A reference counting garbage collector cleans them up once no email in the system is still using that part.

That sounds really cool. Any chance you'll write a blog post about it ?

sova · on June 24, 2018

Looks like he did: http://mailinator.blogspot.com/2012/02/how-mailinator-compre...

fla · on June 25, 2018

Thanks !

jorangreef · on June 25, 2018

Have you considered content-dependent chunking for your deduplication?

At Ronomon, we use https://github.com/ronomon/deduplication
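For anyone unfamiliar with the term, here is a rough sketch of content-dependent chunking: cut points are chosen from a rolling hash of the bytes themselves, so an insertion early in a message tends to shift only nearby boundaries. This uses a simple gear-style hash with made-up parameters (`mask`, `min_size`, `max_size`); it is not the algorithm or API of the linked library.

```python
import random

_rng = random.Random(0)  # fixed seed so chunk boundaries are reproducible
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte


def chunks(data, mask=0x3F, min_size=16, max_size=256):
    """Split bytes where the rolling hash hits a pattern, within size limits."""
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left ages out old bytes, so h depends on recent content only.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if size >= max_size or (size >= min_size and (h & mask) == 0):
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])  # trailing partial chunk
    return out
```

Each chunk can then be hashed and stored once, the same way fixed parts would be.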

andris9 · on June 24, 2018

Very cool! In WildDuck IMAP server I use a bit different deduplication method (I deduplicate decoded attachments, not lines) that does not give 90% reduction but still more than 50%: https://github.com/nodemailer/wildduck/blob/master/README.md...

anothergoogler · on June 24, 2018

Thanks for the writeup. What are you using to handle the WS connections on the server?

zinxq · on June 24, 2018

import javax.websocket.MessageHandler;

Stayed away from frameworks. Mailinator suffers from "do it yourself" syndrome where I wanted to write most things myself for the educational value. No Spring, no web frameworks, straight servlets. Custom SMTP server.

(It does use Redis pub/sub but I did write my own first, Redis was just too easy not to rely on)
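For readers wondering what a hand-written pub/sub amounts to, a minimal in-process sketch follows. It only illustrates the fan-out pattern (one publish, every subscriber of that channel gets a copy); the `PubSub` class and the `inbox:joe` channel name are hypothetical, and neither Mailinator's code nor the Redis client API is shown here.

```python
import queue
import threading
from collections import defaultdict


class PubSub:
    """Tiny in-process publish/subscribe hub: one queue per subscriber."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subs = defaultdict(list)  # channel name -> subscriber queues

    def subscribe(self, channel):
        q = queue.Queue()
        with self._lock:
            self._subs[channel].append(q)
        return q

    def unsubscribe(self, channel, q):
        with self._lock:
            self._subs[channel].remove(q)

    def publish(self, channel, message):
        with self._lock:
            targets = list(self._subs.get(channel, ()))
        for q in targets:
            q.put(message)  # push to every subscriber, no polling needed
        return len(targets)  # number of subscribers that received it
```

A websocket handler would hold one such queue per open inbox and forward whatever arrives on it to the browser.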

rufugee · on June 24, 2018

Reminds me of the Plenty of Fish story in terms of avoiding frameworks and rolling it yourself. In hindsight, would you consider this approach an accelerant or a deterrent? I know I've often found myself thinking about Plenty of Fish when I'm on page 400 of the latest 1000 page hip framework book :-)

bluedino · on June 24, 2018

Is it really written in Java?

I love when popular services are written in Java, PHP, or even something crazier like ColdFusion. Just to spite the naysayers.

keypusher · on June 25, 2018

Almost every large tech company still uses Java in some way on the backend, and I doubt that will change soon. Google, Twitter, Amazon, Netflix, etc. PHP has a significantly worse reputation, although Facebook is known for a very large PHP footprint. ColdFusion, well... I haven't seen anyone in recent years using that in production.

nickpsecurity · on June 25, 2018

That same attitude is why I kept pushing people to do one in Fortran. Then this showed up:

https://news.ycombinator.com/item?id=11938405

Now, we need a useful app in Unlambda that people actually paid for. Then, you show them the code and tutorials when they're interested in doing extensions. Observe, record, and share their reactions. (villainous laughter)

http://www.madore.org/~david/programs/unlambda/#intro

augbog · on June 24, 2018

Woohoo Job security though right?

willvarfar · on June 24, 2018

Some related previous discussion (2012): https://news.ycombinator.com/item?id=3617074

And a blog post by me in reply to those comments, which is how I remembered it :) https://williamedwardscoder.tumblr.com/post/18065079081/cogs...

swah · on June 24, 2018

Even though Paul says that the service now runs on multiple nodes, I think the general idea of your post is even more important than 10 years ago.

(I miss blogs..)

efiecho · on June 24, 2018

Very useful service that I have used for many years. However, in the updated version of Mailinator I can't find the "Alternate Inbox Name" anymore. Does anyone know why this function was removed?

detaro · on June 24, 2018

It still shows alternate domains for me on the front page, or am I misunderstanding what you are looking for?

efiecho · on June 24, 2018

Previously, if you created the inbox e.g. joe@mailinator.com an "Alternate Inbox" name would be generated e.g. M8R-yrtvm01@mailinator.com. You could hand out this address and the mail would still end up at joe@mailinator.com. This would make the inbox a little more private.

rando444 · on June 25, 2018

A relatively unique name coupled with an alternate domain seems like it would solve roughly the same problem, no?

kchr · on June 25, 2018

Or even more spammy. ;-)

privateSFacct · on June 24, 2018

Before I used mailinator I was using the + addressing feature at google. I just did username+site@ . The thing I found (at least back a fairly long time ago) - a lot of folks you don't even think DO in fact sell your emails or get bought and sell your email, or maybe it's their order processing or shopping cart software. I couldn't fully figure it out.
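As a small illustration of why the + trick helps trace leaks: the tag after the + survives in whatever address the spam is sent to, so it can be split back out. This sketch assumes the simple `user+tag@domain` form; `signup_tag` is a hypothetical helper, not part of any mail library.

```python
def signup_tag(address):
    """Return (base address, tag) for a plus-addressed email, tag None if absent."""
    local, _, domain = address.rpartition("@")
    base, plus, tag = local.partition("+")
    return (f"{base}@{domain}", tag if plus else None)
```

Spam arriving at `username+shop@example.com` would yield the tag `shop`, pointing at the signup that leaked the address.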

It seems to matter a lot less now. I assume my email is out there, but google does a remarkable job filtering my inbox (paying customer if that makes a difference). And I've added DMARC / SPF / DKIM as well to my outbounds (went from thousands of forged outbound to basically none so presumably spammers are also watching some of these signals for whatever reason).

I always loved how fast mailinator was, and once I had that for all those random quick orders I used it. Now I've just started using amazon for just about everything - so less of a need for the mailinator emails.

svdr · on June 24, 2018

If you use DMARC, your domain is less interesting for spammers, because lots of providers will now bounce the emails they send using your domain.

cosmie · on June 25, 2018

> I couldn't fully figure it out.

In a previous career, I spent a lot of time sourcing, vetting, and managing data vendors. I didn't buy data from shady vendors (our clientele would have skewered us, and I leveraged that fact any time I was pushed to favorably evaluate a shady broker), but in the course of my job I got a lot of insight into the ways things like that work. Here are just a handful of ways that it could happen, only the first two of which the website owner has any influence over:

- Unscrupulous analytics providers. In the course of collecting analytics for the website owner, these things can passively (or actively) capture things like form submission data and query string parameters. An innocuous clause in the ToS could be allowing them to package that up and sell it, separate from the core service they were offering the website owner. It's not just an issue with free analytics services; a website owner could be using an expensive service with the expectation of privacy, yet still have this happening to their site data.

- Unscrupulous plugin/service providers. Think stuff like AddThis[1][2], which the website owner used to provide a particular and potentially useful functionality to their site, without realizing that the plugin/service provider was using that privileged access to their user base to do unintended things (like setting ad tracking cookies, like AddThis does). Again, this isn't limited to free services; paid services do stuff like this too.

- Users give access to their email data directly, and whatever they gave that access to is collecting your info and reselling it. Unroll.me is a really good example of this[3]. If you use Gmail, it's good to periodically check your Permissions page[4], and click on each app to be sure of what does and doesn't have any access to your email data (it only takes read, not write access to siphon off stuff). Note though that you still might not be safe - those permissions are for things you've granted permissions to through Google. That's not always a prereq - for example, if your email is synced to your phone, an app on your phone could have permission to access that data on your phone and indirectly read all your incoming emails that way, and won't show up on that Google Permissions page.

- For every Onavo that gets publicly called out[5], there are countless other "helpful" tools that people install to MitM their own traffic. Most of them are owned by companies far less publicly visible and smaller than Facebook, making them far more likely to do shady stuff with the data they collect. At least Facebook keeps their collected data internal, since it's in their interest not to sell it to competitors.

- AdWare which comes bundled as part of the installers for downloads on sites like CNet, FileHippo, SourceForge, etc. is not just used for ads. That toolbar it's installing as part of its installation flow isn't just able to do things like usurp your default search or inject ads into pages. It can also see everything you type and everything you submit on every page you visit (or local PDF that you happen to open in that browser, which has even deeper privacy implications). While hardly anyone on HN is likely to be much impacted by this particular vector, they tend to get caught by the next one.

- Useful, non-adware browser plugins[6][7]. It was downloaded for (and is used for) something specific, is generally helpful for that, and is not at all considered some spammy adware toolbar that actively makes unwanted changes to your webpages and traffic. But oh hey you also gave it access to all of your browser data, and they're having a good ol' time leveraging that access passively. Developers aren't immune to this, and generally love browser plugins. I can't recall what it was at the moment, but last year there was a big news splash about some popular extension in the developer community getting bought and having some minor change made to integrate in the new owner's paid service offering. The code was on Github, and people went and forked it from before that particular commit had been made. But when was the last time you audited/sanity checked the current code running in all of your browser extensions? Just because it played nice before doesn't mean it does now. And for non-developers, they never think to (or have the capacity to) look into that in the first place.

That's not even an exhaustive list. But it provides a bit of context around just how ridiculous it is to keep up with all the ways your privacy gets exploited that could lead to your email ending up on an email list. And generally, the exploitation doesn't involve directly marketing to you in such a traceable way as you getting an actual email, but rather in selling market research, competitive intelligence, or targeting data/capabilities to AdTech companies. None of which you ever get exposed to as an end user, so you never even become cognizant of the volume of transactions involving exploiting your data.

[1] http://www.addthis.com/data-solutions

[2] Not saying that AddThis sells emails, it's just a well known widget used on a lot of sites that has a revenue stream most site owners and end users aren't aware of. Which both compromises user privacy and got them a buyout from Adobe for $200mm.

[3] https://www.nytimes.com/2017/04/24/technology/personal-data-...

[4] https://myaccount.google.com/permissions

[5] https://www.csoonline.com/article/3254571/security/facebooks...

[6] https://www.bleepingcomputer.com/news/security/-particle-chr...

[7] https://www.express.co.uk/life-style/science-technology/9718...

sorokod · on June 24, 2018

The AgingHashmap is basically a rate limiting device. Nowadays, Guava provides something like this "out of the box".
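Roughly what I mean, as a sketch: a map whose entries expire after a time window, used to count hits per key. The names (`AgingCounter`, `allow`, `purge`) and the fixed-window policy are my own assumptions for illustration; this is neither the article's class nor Guava's API.

```python
import time


class AgingCounter:
    """Counts hits per key and forgets a key once its window has aged out."""

    def __init__(self, ttl_seconds, limit, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.limit = limit
        self.clock = clock   # injectable so the behaviour can be tested
        self._entries = {}   # key -> (window start time, hits in window)

    def allow(self, key):
        now = self.clock()
        started, hits = self._entries.get(key, (now, 0))
        if now - started >= self.ttl:   # old window expired: start a fresh one
            started, hits = now, 0
        hits += 1
        self._entries[key] = (started, hits)
        return hits <= self.limit

    def purge(self):
        """Drop expired entries so memory use stays bounded; returns how many."""
        now = self.clock()
        stale = [k for k, (started, _) in self._entries.items()
                 if now - started >= self.ttl]
        for k in stale:
            del self._entries[k]
        return len(stale)
```

Keyed by sender IP, for example, `allow` would start returning False once an address exceeds the limit within the window.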

Oh, and thanks for Mailinator.

CapacitorSet · on June 24, 2018

I must say I'm surprised a Java application could handle 2.5M emails/day (29/second) in less than 1 GB of RAM.

brianwawok · on June 24, 2018

Why?

Would it surprise you to learn that many of the world's financial markets run on a handful of Java servers? Message rates of thousands of messages per server are not uncommon.

Java is very fast. It's very weird that people in 2018 think it is not.

tonyarkles · on June 24, 2018

I think one of the reasons people are startled by the RAM usage figures here is that a Spring Boot app with a bunch of extra jars/"starters" added to it idles at ~500-600MB without any load at all. And can take 30-60 seconds from "java -jar..." to being able to successfully handle its first inbound HTTP request.

brianwawok · on June 25, 2018

Well, idle memory usage is not a super useful metric to measure for an app that uses a GC. And in fact the comment was about speed... speed and memory usage can often have inverse relations (think caches)

Startup time is not super useful when using a framework with 200,000 lines of code. Try creating a webserver using Java + a servlet, you will be serving requests in < .5 seconds on a modern machine...

tonyarkles · on June 25, 2018

Oh, don't get me wrong, I totally agree. You can definitely create lean & mean stuff in Java, it just doesn't seem to generally be the Culture of Java.

And while idle memory usage isn't an amazing metric, it's still pretty indicative of bloat. For us, in the current state of things, memory usage is what determines our cloud costs. Setting heap maximums and things seems to somewhat work, although the Spring apps with a max heap set to 512MB still seem to be able to overrun their cgroup max memory of 1GB. There are the new experimental options to have the JVM look at its cgroup limits and manage memory based on that, but due to reasons we don't have a new enough JVM yet.

For me and my tastes, Go's pretty tough to beat in the overall productivity/efficiency/reliability set of tradeoffs. I fully admit that that's a taste thing. It's not even that it's amazing at any particular thing, it's actually not really that amazing at all. It just doesn't have anything that particularly annoys me. And the community is awesome: they really don't do the 200kloc framework stuff; the attitude is generally "why do you need all that?" and I agree.

dTal · on June 25, 2018

Conversely, I'd say that if you can manage to use over a gigabyte of memory to handle a couple of megabytes per second of traffic, you're probably doing it horribly wrong.

bluedino · on June 24, 2018

Is there a container/vm out there that's ready to serve up a private Mailinator-type instance that you can just point a domain to?

0x6877 · on June 25, 2018

I'm working on something similar but it's still a work in progress. And it's not quite a "point a domain to it and you're done" type of setup just yet. It takes advantage of mailgun and uses webhooks rather than hosting an email server yourself. https://github.com/haydenwoodhead/burner.kiwi

dylz · on June 24, 2018

https://github.com/m242/maildrop is old, but usable.

heinrichf · on June 24, 2018

(2007)

sctb · on June 24, 2018

Thanks, updated!