I could see 20% false positives on spam for Linus equating to 0.1% of false positives across the board since I suspect the people emailing Linus are 200 times more likely to be running their own mail server than the general public.
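The arithmetic behind that estimate, as a rough sketch. Both inputs are the guesses from the comment above, not measurements, and the calculation assumes the false-positive rate scales linearly with how often one's correspondents run their own servers.

```python
def general_false_positive_rate(observed_rate, exposure_ratio):
    """Scale one user's false-positive rate down by how much more exposed
    that user is to the suspected trigger (private mail servers)."""
    return observed_rate / exposure_ratio

# 20% for Linus, whose senders are assumed 200x more likely to self-host.
rate = general_false_positive_rate(0.20, 200)
print(f"{rate:.1%}")  # 0.1%
```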
Yes, Google seems to have flipped some switch or pushed some change that takes well established domains and senders from that domain and for reasons that are not well understood written them off as spam. In theory they are getting a huge 'not spam' signal back at HQ but I agree with Linus that they screwed up big time. Stuff they should know wasn't spam and have in the past not classified as spam, now suddenly is. Algorithm update fail.
This started hitting several of my mailing lists a while back. Not just personal servers, but several major web mail providers, most notably Yahoo and AOL. Some theories floated around that it had to do with a change to header/certificates, but I don't recall if I saw any actual confirmation of that.
For some senders I don't even have the "report as spam" button, even though most of the shit I get from them is ads I didn't ask for. Just because it's from a well recognized company domain.
And the moment someone accidentally clicks the SPAM button you'll find yourself with weeks of pain on a low volume mail server.
Because, as an individual, you won't qualify for their FBL service and "mysteriously" you'll have weeks of everyone saying you end up in their spam folder.
And they don't tell you whether you meet those criteria, until after you go to the trouble of logging in and serving a DNS TXT record for ownership verification, as I just found. Granted, I didn't expect to qualify, but it would have been nice if they'd told me up front.
Yes, but at least they're using the same verification mechanism as elsewhere so if you have already added the domain to Google Webmaster for example under the same Google Account, it will be automatically verified.
Fun, but it doesn't scale. Google can only do this because they have a near-monopoly on email. What would you say if I gave you tools to whitelist yourself on my email server? You'd tell me to get my spamfilter straight, or more likely, simply ignore me.
I'm not against Gmail, just like I'm not against Outlook.com or Yahoo mail or something. It's just that providing tools like this only works for players in a power position (i.e. Google) who can afford to ignore small players (i.e. me), and what's more, it further strengthens their power position: the better they can detect spam, the more people will start using Gmail (the postmaster tools are there to help people prove they are good, thus helping Gmail distinguish).
I had exactly the same setup, but I had to create 10 fake Gmail accounts, add the email address to the address book and flag several emails as "not spam" before it was useable. Google Mail just ignores the fact that there are private people who want to have their own mail server to be independent from Gmail.
Were they set up initially, or after you noticed problems? I wonder if prior messages routed to the spam folder that people haven't marked as non-spam count against you for a certain prior period.
I added them after I noticed problems really. Postfix and Dovecot have somewhat of a learning curve (I started up and trashed a few VMs before I got it right). I ended up using iRedMail; the defaults are pretty much Gmail ready.
I don't know about prior messages counting against you, but given what I've seen it seems to make sense. Without insider info we can only speculate.
This hit us at work really badly: we use GApps, and everything from our internal list server (which we can't replace with GMail aliases because $REASONS) was being sent to spam ... and there was literally no way to whitelist this across the company automatically.
You know how Google treats its free customers with utter contempt? I can assure you, they treat paying customers with the same contempt.
Pretty ironic, then, that many suggestions in comments (on the source) are "you should just run your own mail server". Maybe two 'wrongs' do make a right..
It's what happens when you can control such a large portion of a resource. Just imagine, Google can at any moment blacklist just about anyone. Most websites and companies can do nothing to stop it.
Haven't a clue what you mean. I run several mail servers and have no issues whatsoever with GMail. And I run through all the spam filtering, too, and only recall one that was inadvertently marked as spam but, iirc, the mailer was the problem.
I wrote a daemon that monitors a mailbox that users can submit spam to, for spam that makes it through the other filtering layers. The daemon finds the original message in the user's mail directory, reads the headers, tracks down the originating network using whois, and then blackholes the network and emails me a notification about it.
Some networks are whitelisted for practical purposes. Google's one of them.
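A minimal sketch of the decision step in a daemon like that, not the actual daemon. It assumes the connecting IP appears in square brackets in the topmost Received header (the one written by your own server), it leaves out the whois lookup and the blackholing itself, and the whitelist entry is a documentation range standing in for a real provider.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical whitelist; 192.0.2.0/24 is a documentation range (RFC 5737).
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]

# Matches "[198.51.100.7]" as well as "[IPv6:2001:db8::1]".
IP_IN_BRACKETS = re.compile(r"\[(?:IPv6:)?([0-9A-Fa-f:.]+)\]")

def originating_ip(raw_message):
    """Return the first parseable bracketed IP in the Received headers, top down."""
    msg = message_from_string(raw_message)
    for header in msg.get_all("Received", []):
        match = IP_IN_BRACKETS.search(header)
        if match:
            try:
                return ipaddress.ip_address(match.group(1))
            except ValueError:
                continue  # bracketed text was not an IP address
    return None

def decide(raw_message):
    """Return 'skip', 'whitelisted' or 'blackhole' for a reported message."""
    ip = originating_ip(raw_message)
    if ip is None:
        return "skip"
    if any(ip in net for net in WHITELIST):
        return "whitelisted"  # worth logging too, so these reports are not lost
    return "blackhole"

sample = (
    "Received: from mail.example.net (mail.example.net [198.51.100.7])\r\n"
    "\tby mx.example.org with ESMTP; Mon, 1 Jan 2024 00:00:00 +0000\r\n"
    "Subject: Get ranked higher: example.com\r\n"
    "\r\n"
    "body\r\n"
)
print(decide(sample))
```

Returning a distinct value for whitelisted networks makes it easy to log those reports instead of silently dropping them.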
If necessary, I could provide a list of the subject lines of the emails that have been reported. A YC company, Zenefits, is one offender that comes immediately to mind.
While we're waiting for Monday, here are some subjects of spam emails I've received from Gmail's SMTP servers recently. Mentions of my domains replaced by example.com:
Example.com SEO Issues
Tough Times With Example.com? Needs Attention!!?
Get ranked higher: Example.com
Google optimization for Example.com
Get ranked higher : example.com
Website Audit Report to increase website traffic
Example.com - audit report now available
Give a glamorous new look to your website
Web Design Proposal example.com
Organic SEO Promotion For Example.com
Back. Here's a recent selection of subject lines, extracted from spam daemon replies. The "Zenefits + Problem" ones are funny .. my dba is "No Problem", and they're not smart enough to correctly handle that I guess, which makes sense, since they're also dumb enough to spam me. I've never replied to any of their messages.
"Re: Zenefits + Problem"
"Higher Targeted Traffic: Associatedtechs.Com"
"Re:re: UL,CE,ETL Split-core Current Transformer(0.333V or 1A/5A output) ,Rogowski coil,hall AC-DC transducer"
"Poor support processes could be costing you customers"
"Get ranked higher:"
"Higher Targeted Traffic: Associatedtechs.Com"
"Re: Zenefits + Problem"
"Re: Zenefits + Problem"
"Your Website...!"
"Get ranked higher: associatedtechs.com"
"Google Update for: associatedtechs.com"
"How to increase your website traffic and generating leads??"
"Zenefits + Problem"
"Mobile Apps Development"
"Digital marketing proposal- www.associatedtechs.com"
This isn't a complete list, sadly, since it turns out the daemon isn't logging ban attempts against whitelisted networks. I should fix it so that it does.
I decided to do a quick check of my gmail; there were 151 emails in spam, and literally not one I would want in my inbox.
It seems entirely likely that Google is weighting whether something comes from a private mail server very heavily, and Linus, being who he is, gets a lot of email from private servers.
Not saying this isn't a problem, but it probably doesn't affect 99% of email users.
Thanks for pointing out the real numbers. In my defence, I was thinking "conservative for the near future", so the next 5 years or so. And of course giving 9M users a bad experience isn't that great either.
If we're being pedantic, it's probably worth noting that the "99%" was a completely arbitrary assumption, and trying to run statistics based on one anecdote and an unfounded assumption is not going to produce accurate results.
True. I suppose the main point is that with a billion users, even some "small fraction" of users having problems, is going to translate to a lot of people having problems.
Honestly, if Google took support even half-seriously (or: they considered eg: users of gmail their customers, rather than just their advertisers) -- these kinds of issues wouldn't be so bad.
I do think it's just a question of time before Google relegates itself to irrelevance through a strictly inferior product though.
True. We send thousands of emails a week and all of a sudden a large portion weren't even delivered. Turns out it's all DMARC and once those policies were fixed (assigned at all) the delivery denials ended.
Oddly, I have had the opposite problem. This past week I have had to delete 4 or 5 obvious V!@g&a!!!!!-type spam emails from my priority tab of my inbox. I can't remember the last time I had that happen. Something isn't working quite right.
Have been seeing the same thing here as well. With spam messages coming through as normal mail.
I have also been having any messages that are replies to messages I send being diverted to my spam folder randomly. It is happening on both my personal gmail account and on a business account.
It seems lately Google has really been less focused on the core components that made them successful in the first place. I have found their search results seem to be returning more spam sites than before. My vote is to get Matt Cutts to come back and start cleaning up spam again :)
True, but the spam filter could certainly learn to weight private mail servers as ham on a per-user basis. Perhaps the learning algorithm can't generalize the feature to new unseen private mail servers. However, Google certainly has the engineers who can add that functionality; they just need to properly manage them to get it done.
Yeah, it's part of why I gave up on running my own mail server and signed up for Google Apps back when it was free. It's just not worth fighting the spam filters for side projects.
It was probably something along the lines of "Well consumers are what matters and they all use major services".
Speaking as someone who is at least 200 times more likely than the general gmail-using public to receive mail from completely normal, mainstream Chinese email addresses... I'm still mad at gmail for just assuming mail from China must be spam. It's not spam!
(There's been improvement - for example, recently I received mail from someone I'd corresponded with in the past, and it wasn't initially marked as spam. Gmail used to be more aggressive than that, such that it would be marked as spam unless it was a direct reply to an email I had sent.)
(...for extra irony, that recent message was "I'm stuck in England and can't get home without a few thousand dollars". Her account had been hacked.)
Value judgements aside, that's not what I'm postulating. It's more like "unknown" mail servers (meaning any mail server with low enough volume that Gmail doesn't have an opinion about it yet).
What about using it as one of many criteria (maybe whether the email seems to contain gibberish being another, which patch files might be classified as)?
Out of all the people in the world who regularly get mailed "gibberish" patchfiles, and not only fail to mark them as spam, but continue to interact regularly with the senders - do you not think it's reasonable to assume Gmail might notice Linus has been doing this for two or three decades?
I wonder what other forms of "gibberish" Gmail classifies as spam? GPG encrypted mail? Mail containing public SSL keys or CSRs? Any foreign language not regularly heard in Bro-ville, South Bay?
What about the possibility that 'vanity' email servers are more likely to have something non-obvious misconfigured? Maybe, for example, they send spam rejections back to the envelope from address (generating backscatter which looks like spam), rather than rejecting spam within the SMTP session?
FWIW, I just checked my mail from the last week, and had 4 false positives, out of ~80 mails. That's WAY higher than I've traditionally had with Gmail, where I might expect that many in a year.
Yeah, whatever the problem here is, it's almost certainly very user-specific, depending on the mix of email they get, and Linus is probably somewhat atypical.
I saw his original post on G+, and of course immediately went and checked my gmail spam folder.... and... no false positives at all, 100% correctly identified spam.
He said it mostly affected mailing list messages. His subscriptions come from a single server, so random spam classification of messages in the middle of some thread doesn't really make sense. Not as a sending-server issue anyway.
You're probably right. Mailing lists (of the discussion group variety, not marketing mail) often have problems with spam filter false positives, most commonly due to DMARC policies.
There's not really a great solution to that at the moment - either you technically violate RFCs by having your discussion group software modify some headers, or you deal with other kinds of breakage.
Doing header rewrites is effective for reducing FPs due to DMARC, but adoption is far from universal - off the top of my head I'm not even sure if Mailman supports that at the moment.
You only have to rewrite headers if your mailing list is actually modifying the mails i.e. doing a MITM attack on the mail flow. Some mailing list admins feel very strongly about footers, subject line tags etc and then claim they "must" rewrite the From header, but I am not sure it's technically required.
Discussion groups retransmit messages, which is enough to fail authentication in a lot of cases.
Here's an example: you have an address @google.com, which has a DMARC policy of 'quarantine'. You send a message from this address to a discussion group, which in the process, resends your message from a non-google server, thus failing DMARC.
Google's DMARC policy says that if an ISP receives a message from a @google.com From address and the message fails DMARC, that ISP should place the message in the spam folder.
So it boils down to: does a list operator change the From address in distribution group mail to use a list address they own in order to pass DMARC, or do you deal with the filtering consequences of failing DMARC for many domains?
The whole point of DKIM is that messages can be relayed without breaking authentication, because it uses digital signatures instead of the sending IP. So I think it wouldn't break.
... IF the body is not modified, and the header signature matches, AND headers retain DMARC alignment... the reality is that retransmittal (as opposed to just relaying) almost always does one or more of these.
Here's an example from a Google email engineer's recent post to the Mailop list, which is running Mailman software.
Authentication-Results: mx.google.com;
spf=neutral (google.com: 2001:41c8:51:83:feff:ff:fe00:a0b is neither permitted nor denied by best guess record for domain of mailop-bounces@mailop.org) smtp.mail=mailop-bounces@mailop.org;
dkim=neutral (body hash did not verify) header.i=@google.com;
dmarc=fail (p=REJECT dis=NONE) header.from=google.com
That message says the body was modified. The solution is simple: don't do that. Your original message said DKIM breaks if you simply relay mail, but it isn't correct.
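For anyone wanting to check their own list traffic, the Authentication-Results header quoted above can be picked apart mechanically. This is a deliberately simple sketch that only looks for the spf, dkim and dmarc entries; it is not a full parser for the header grammar, and the function name is made up for illustration.

```python
import re

# Header value copied from the Mailop example quoted in the thread.
AUTH_RESULTS = (
    "mx.google.com; "
    "spf=neutral (google.com: 2001:41c8:51:83:feff:ff:fe00:a0b is neither "
    "permitted nor denied by best guess record for domain of "
    "mailop-bounces@mailop.org) smtp.mail=mailop-bounces@mailop.org; "
    "dkim=neutral (body hash did not verify) header.i=@google.com; "
    "dmarc=fail (p=REJECT dis=NONE) header.from=google.com"
)

# method=result, optionally followed by a parenthesised comment.
METHOD = re.compile(r"\b(spf|dkim|dmarc)=(\w+)(?:\s+\(([^)]*)\))?")

def parse_auth_results(value):
    """Map each method to (result, comment); comment is '' when absent."""
    return {
        m.group(1): (m.group(2), m.group(3) or "")
        for m in METHOD.finditer(value)
    }

results = parse_auth_results(AUTH_RESULTS)
for method, (result, comment) in results.items():
    print(method, result, comment)
```

The dkim entry's comment ("body hash did not verify") is what points to a modified body rather than a relaying problem.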
The way you worded the sentence is classically used to indicate "well, this is an anomaly, because he is not a normal user, and is maybe even doing something absolutely abnormal, making this a trade off between the needs of the many vs the needs of the few, one he can easily bypass by doing something different inside his niche"; it took me reading a few of your later replies before I felt you were just providing an explanation for how Google's math could be flawed, as opposed to providing an explanation for why this might be an acceptable casualty.
Well, as he said, some emails in the middle of threads were marked as Spam
Funny thing that happened to me, a mail from Google was marked as Spam. This was a long time ago, and it was from a mailing list, but apart from that, it was a legitimate mail.
This happens because of DMARC. It does actually make sense, in a way.
DMARC allows a domain to say "email that claims to come from my domain must be signed by me. If it doesn't, burn it with fire, no exceptions". So Gmail is only following the instructions laid out by the sending domain.
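Roughly how a receiver could turn a published DMARC policy into a disposition. This is a sketch under assumptions: the record strings are made up, only the `p` tag is considered, and real receivers can apply local overrides (the header quoted elsewhere in this thread shows `p=REJECT dis=NONE`).

```python
def parse_dmarc(record):
    """Split a DMARC TXT record such as 'v=DMARC1; p=reject' into a tag dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def disposition(record, dmarc_passed):
    """Return what the sending domain asks receivers to do with the message."""
    if dmarc_passed:
        return "deliver"
    policy = parse_dmarc(record).get("p", "none").lower()
    actions = {"none": "deliver", "quarantine": "spam folder", "reject": "reject"}
    return actions.get(policy, "deliver")

# Hypothetical records, not the published policy of any real domain.
print(disposition("v=DMARC1; p=quarantine", dmarc_passed=False))
print(disposition("v=DMARC1; p=reject; rua=mailto:reports@example.com", dmarc_passed=False))
```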
This is helping to make the email ecosystem a lot more robust by ending the problem of From forging. Ordinary users rarely realise that the From header is otherwise meaningless so phishing them can be very easy.
However it does not play well with mailman's default settings, and a lot of mailing list admins refuse to help the email ecosystem become more secure (whilst often PGP signing their own mails, doh). So DMARC creates a lot of noise in the technical community from people who have to/want to use mailman based lists.
I had problems this week with mailing lists routed through Apache servers, for which I had filters created, etc. And this happens not only for mailing lists.
Very funny: right after I saw this, I decided to check my spam folder just to see if anything important had been filtered out, and I saw an email sent by Google marked as spam:
This email is sent by Google when logging in to a Google account from a new machine. They tag their own email as spam ...
Hi xxx,
Your Google Account xxx@gmail.com was just used to sign in from Chrome on Windows.
Don't recognize this activity?
Review your recently used devices now.
Why are we sending this? We take security very seriously and we want to keep you in the loop on important actions in your account.
We were unable to determine whether you have used this browser or device with your account before. This can happen when you sign in for the first time on a new computer, phone or browser, when you use your browser's incognito or private browsing mode or clear your cookies, or when somebody else is accessing your account.
Best,
Yeah, good point. I guess I was feeling unnecessarily cynical.
If I were on the jury, I'd definitely buy it. Their counsel would have to really bone up the argument that it was such equal treatment, they were flagging their own messages. And the prosecution would have to really stretch things, probably entering into conspiratorial territory in order to make a case.
Then again, I've found myself perplexed by jury decisions on tech-related cases more than once. Although having sat on a jury, I can see how such decisions might be made.
Another great one is when you're having a conversation with someone, and after several back and forth emails, the next one goes to spam. Sent from the same device, same headers, etc. Presumably it triggered some keywords. But I mean, come on! Obviously if I've replied to this person four times in a row, their fifth email is not spam.
I realize that everything takes time to implement, and developer time is not infinite, but this one seems like pretty low-hanging fruit.
Not long ago, I found in my spam mailbox several emails sent by Google's recruiters (from @google.com domain). I was, at the time, going through their interview process.
Yes, my first reaction was also to think those were fake Google emails, but I checked the login time and the From address, and they are legit (I got two emails like this filtered this week).
Those emails were sent from no-reply@accounts.I.google.com
I run my own mail server with full SPF, DKIM and SRS, routing the mail through a relay at a reputable VPS provider on high-reputation IPs. Over the last few months, there seems to be this pattern where if I email someone @gmail who I've never mailed before, they don't seem to ever get it. I wonder if this is the issue.
Ditto, this. Some recipients find mails in the spam folder, others claim they aren't even in there, but have simply vanished. All were accepted for delivery by gsmtp.
I had a problem with my personal server for a little while; all my SPF, DKIM, etc was all set up for IPv4. I have a couple IPv6 interfaces on my machine, and my mailserver was delivering to GMail over IPv6!!
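That failure mode is easy to reproduce on paper. The sketch below only looks at literal `ip4:`/`ip6:` mechanisms in an SPF record (it ignores `a`, `mx`, `include` and friends), and the record and addresses are made-up documentation values.

```python
import ipaddress

def spf_covers(record, sender_ip):
    """True if sender_ip matches a literal ip4:/ip6: mechanism in the record."""
    ip = ipaddress.ip_address(sender_ip)
    for term in record.split():
        term = term.lstrip("+")  # '+' is the default (pass) qualifier
        if term.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:  # always False when IP versions differ
                return True
    return False

# Hypothetical record that only lists the server's IPv4 address.
record = "v=spf1 ip4:203.0.113.10 -all"
print(spf_covers(record, "203.0.113.10"))   # delivery over IPv4
print(spf_covers(record, "2001:db8::25"))   # same host delivering over IPv6
```

Either publish the IPv6 address too (for example an `ip6:` mechanism) or make the mail server deliver over IPv4 only.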
This is one of the reasons I use Google Apps for Business for email (that and I find running mail servers tedious) as deliverability is consistently simpler.
Having to use a Google product instead of your own mail server, in order to have your email delivered to Google customers, sounds like anti-competitive business practices to me. You'd think they'd be a wee bit more careful with something like that given EU interest in them...
It also isn't enough. Some of our gmail users' email (sent through the gmail smtp server with oauth) will be marked as spam in gmail. Very frustrating!
Your mail server isn't real competition to Gmail. Yahoo and Outlook probably are, but considering Gmail handles their mail fine, I'm not sure "anti-competitive" is the right angle here.
I doubt there are very many lawyers on the spam protection team. I also doubt the people on that team have any intention of harming minor-league competition. So it's not particularly surprising that they aren't more careful about avoiding something that a) is not well-known to them, and b) they don't think they are doing.
The problem with this apart from the cost is that all the other big email servers are more likely to mark your email as spam if it comes out of google apps :(
Do you actually mean Google Apps (i.e. Gmail) or are you thinking of Google App Engine? Because I don't think most other providers particularly distrust Gmail.
I also run like this and have had no problems. I don't commonly send emails though (mainly receive) so it is possible that I just haven't sent enough to see the issues.
I've had a problem with false positives in my spam folder for months. A large percentage of the email newsletters I subscribe to end up in my spam folder every day, and clicking Notspam doesn't help. I can Notspam a certain newsletter every day for a week, and then the next day that same newsletter will end up in my Spam folder once again.
I'm starting to think that Notspam signals have no effect at an individual level. Either that or the button is simply a placebo.
Fortunately, the false positives for personal correspondence from individuals are still extremely rare, at least for me.
Yep, Gmail's spam filters work based on the collective judgment of Gmail users. The core of your and Linus's problem is that a lot of people use the 'mark as spam' button as an unsubscribe button for mailing lists.
I find this trend of "follow the majority" quite disturbing - it's as if they're implicitly saying that everyone should think the same way and punishing those who don't follow. What's spam to me may not be spam to you, and vice-versa.
Then again, having a personalised spam filter for each user would probably consume a huge amount of resources...
Not sure why you are down-voted. Perhaps because everyone who runs their own mail generally runs individual filtering per account. Typically SpamAssassin will score an email, but filtering (based on that, and other criteria) is up to the individual user (eg: by having a white-list, choosing the spam score to treat as spam etc).
As mentioned further up, some scoring works well for many users, but not for all, such as marking eg: Russian/Chinese/Not-spoken-here-by-most language as spam.
I really see no reason for why Google should be so bad at classifying email as they apparently are.
That idea sounds compelling at first, but the data doesn't support it. There are plenty of email marketers who are focused on a non-technical audience (who presumably use 'mark as spam' to unsubscribe frequently) and which have no problems with spam folder placement.
There's a spectrum, and if a given sender looks considerably worse than average, they're more likely to get filtered.
If anything, if a newsletter is getting filtered, it's more likely to be the marketing manager's fault - perhaps they don't adequately monitor deliverability, or they don't test their content, or they don't use activity segmentation... etc.
Come to think of it, if they're going to have a "Never send it to Spam" flag in filter config, shouldn't it default to TRUE? If you're taking the trouble to set up a filter, it's probably because you are interested in those messages.
How easy is it to unsubscribe from those newsletters? Is there a one-click unsubscribe link that doesn't require you to login or enter anything before unsubscribing you?
If I've subscribed to a mailing list or newsletter and there isn't a one-click unsubscribe I'll click the Spam button to get it out of my inbox instead of going through their procedure.
One-Click Unsubscribe is paramount for mailing lists and newsletters not getting marked as Spam.
If your newsletter has a good reputation then clicking "mark as spam" in Gmail prompts the user to automatically unsubscribe instead of marking spam, or to do both. If your newsletter has marginal or bad reputation or does not offer automatic unsubscription then that doesn't appear.
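On the sending side, offering automatic unsubscription usually means adding list headers to each message. The sketch below builds such a message with Python's standard library. Treat it as an assumption that these particular headers are what drives Gmail's prompt; the thread doesn't say. The addresses, URL and token are invented.

```python
from email.message import EmailMessage

def newsletter(to_addr, token):
    """Build a newsletter message that advertises one-click unsubscribe."""
    msg = EmailMessage()
    msg["From"] = "news@example.com"
    msg["To"] = to_addr
    msg["Subject"] = "Weekly update"
    # An HTTPS endpoint plus a mailto: fallback, both carrying a per-recipient
    # token so no login is needed to unsubscribe.
    msg["List-Unsubscribe"] = (
        f"<https://example.com/unsub?t={token}>, "
        f"<mailto:unsub@example.com?subject={token}>"
    )
    # Signals that a bare POST to the HTTPS URL is enough (RFC 8058).
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content("This week's news goes here.")
    return msg

message = newsletter("reader@example.org", "abc123")
print(message["List-Unsubscribe"])
```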
Happened to me as well.
Some of the emails were just "updates" that I like to receive, but if they get lost it's not a big deal.
But a couple of them were very important, and to make things worse, they were answers to emails on which I was CC'd. So, a colleague of mine sends an email to someone and CCs me. The second person answers, and that mail is marked as spam for me but not for the person who wrote the original email.
Doesn't it make sense that an answer in a legitimate conversation is by default a legitimate email?
A big problem I've had is that many of these are "business relationship/transactional" e-mails, which play by different rules.
My address is my first name + last initial (neither of which are all that uncommon), and this is made much worse by Gmail's idiotic ignoring of periods in addresses. There is a dude in Denver, CO who is absolutely convinced his e-mail address is tyler.e@gmail.com. It isn't. I'm really sick of getting his AT&T and car insurance e-mails.
This happens to me all the time as well. I also have firstname.lastname@gmail.com and apparently a lot of other people seem to think they do as well.
Or at least, they have firstname.lastname1@gmail.com and people easily forget to add the number.
I wish there was a better way to deal with this type of situation other than constantly sending "please fix your address book" emails. Email is a broken system.
I'm not sure how you go from: Unless you have Google Apps for Business (or whatever), there are no vanity domains for gmail; to: email is a broken system?
Gmail.com is certainly broken in the sense that they want to cram 10 billion users into a single domain. It's ridiculous marketing/brand-motivated UX failure.
Since forever most mail services had a few vanity-domains, so people could get first.last@wherever.com. But no, Google doesn't want to provide email, they want to provide "Google Mail".
Apologies for the rant, but I can't stand it when big companies create problems through stupidity.
I don't believe I said anything about vanity domains, so I don't know what you mean by that.
I meant that email is broken in the sense that when some stranger mistakenly thinks that your email belongs to them, and continues to give it out or sign it up for mailing lists, you have absolutely no recourse. If you have an email address that, like mine, is easily mistaken for other ones, you get incorrectly-addressed personal emails many times a day. There is no way to find the actual intended recipient or get in contact with that person to say "hey you seem to be confused, stop using my email address". And there is no really good way to filter those emails, since after all they are coming to your correct address.
I think this is the kind of problem that's difficult to appreciate unless it happens to you frequently.
The problem with email is that anyone can email you if they have your address.. that's why we have so much spam. I don't know what the solution is, but it would be much better if the recipient had to opt-in to the conversation somehow as well.
I believe he means that if tizz.dogg@gmail.com was already taken, Google should offer tizz.dogg@loopyloop.com rather than tizz.dogg1@gmail.com. In fact they shouldn't even show that as an option.
Since Google is now a domain registrar they could create the new domains on the fly.
Indeed. Google/Gmail do two strange things: a) While they support the age-old username+whatever@gmail.com in order for users to hand out special addresses to mailing lists etc (eg: user+facebook@gmail.com, user+lkml@gmail.com) -- they don't distinguish on dots in the username (so username == user.name == u.ser.name etc). And b) they don't offer other domains than gmail.com, which leads to strange things like smith1, smith79 etc.
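The two address rules described there (plus tags, and dots being ignored) can be sketched as a normalisation function. The function name is made up, and lowercasing the local part is an extra assumption, on the basis that Gmail addresses are not case sensitive.

```python
def canonical_gmail(address):
    """Collapse a gmail.com address to the mailbox it is delivered to."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain != "gmail.com":
        return address  # other providers may treat dots and '+' differently
    local = local.split("+", 1)[0]             # drop the +whatever tag
    local = local.replace(".", "").lower()     # dots are not significant
    return f"{local}@{domain}"

for addr in ("user.name@gmail.com", "u.ser.name@gmail.com",
             "username+lkml@gmail.com", "user.name@example.com"):
    print(addr, "->", canonical_gmail(addr))
```

This is also why a mis-typed dotted variant of someone else's address still lands in your inbox: every dotted spelling maps to the same mailbox.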
As for there being "no recourse" -- apart from spam, that's just wrong. It's much faster to reply with a "This is not your Smith"-mail, than it is to write a "return to sender" on an envelope. Same thing for getting phone calls from a different timezone etc.
[ed: I do agree that it's a bit more difficult with people that don't know their own address -- still think it should be quicker to reach their contacts via email than via comparable means.]
>> It's much faster to reply with a "This is not your Smith"-mail
Right, and I've sent literally hundreds of those emails. They almost never do any good, because while one person may fix your address in their contacts list, the original person who gave out the faulty address is still out there, unaware that they're giving out bad info. I always ask if the email sender can tell the intended recipient about this when they figure out the right address, but that rarely works. Anyway, I know this is a very specific problem that only affects a small fraction of people, but it's extremely annoying.
I don't really see how allowing other domains would help.. that just shifts the issue to the domain string instead of the user string. I guess it gives people more options. But one of the main benefits of gmail addresses is that it's so common. Everybody knows it, so nobody ever misspells the 'gmail' part at least.
I have a rather uncommon first.last combination. Still, there is a woman in a flyover state married to a guy with my same name who seems to think my gmail address is his. I am building up quite the profile of their family. Thanks to Gmail I know where he works, what car she drives, where they went to college and where they fill prescriptions. Fortunately for them, I have no desire to use any of this information.
It is annoying however especially since most of the spam in my spam folder is addressed to her through my email address.
I suppose I could, but I've seen MANY variations, and tbh there are enough mis-sends to tylere@gmail.com that it wouldn't really help.
TBH I hardly use personal e-mail these days, it's basically a bucket that receipts and confirmations get dumped into, in which case search works well enough. Most actual conversation is done via Facebook or IM, etc.
Which is presumably done so that if you forward the newsletter to someone else they can't (accidentally or maliciously) unsubscribe you by clicking the link.
When it's trivial to re-subscribe if you want to, I'd prefer that they do include a one-click unsubscribe. I'll deal with anyone who maliciously unsubscribes me from a mailing list that I want to be subscribed to, and this will highlight who my "good" friends are anyway.
I'd assume cybojanek is talking about majordomo-like mailing lists, the sort used by lots of open source projects, which typically require an email sent to a specific address with "unsubscribe" in the subject line and so on and so forth.
Curiously, having to email <mailinglist>-unsubscribe@server.com to unsubscribe from a mailing list is, to me, preferable to having to log in to a website in order to unsubscribe from the same mailing list.
I think it's because I'm beginning to resent the idea of every website and its dog requiring that I have a user account before I'm allowed to even browse the content.
You're not alone. I much prefer majordomo to any other web-based mailing list system. I just chalked it up to a greybeard quirk; HN at large seems to be the opposite.
Relatively little. Gmail's filtering system just seems to be increasingly erratic, so one Meetup group email ends up in the Spam filter whilst another appears with high importance. If anything, the problem appears to be the opposite: de-emphasising the sending organisation.
How much of this is caused by marketing mailing lists sending messages to people who never subscribed?
I know I get a fair amount of unsolicited marketing list messages that do have an Unsubscribe link, which I click, but I also mark as spam because I never subscribed to it in the first place (of course, I'm also not using Google Mail, I'm using FastMail with a personalized SpamAssassin filter, but I assume it will still influence the global default SpamAssassin filter).
Sounds like there should be another choice after you report something as spam: Spam - ads I didn't ask for, Spam - fraud attempt, Spam - I unsubscribed 5 times but still get updates, Spam - my ex keeps stalking me, etc.
After reading this, I went through my spam folder and it's looking overall quite okay EXCEPT that all the comments on Google+ in response to things I posted these past few weeks are marked as spam. All of them.
Yup: Gmail is marking comments originating from Google+ and written by legitimate users as spam.
Wow, just checked my spam folder and there was an important email marked as spam by Gmail. It was from a known contact I had already corresponded with.
Every spam filter has false positives. You should make a habit of checking your Spam folder periodically, whatever email service you use (unless you have spam filtering turned off).
What would be really fantastic is if Google let you set your own spam threshold. I can't even imagine it would be too difficult. Presumably they determine a numeric 'spam likelihood' number for each incoming email already, so this would just mean being able to customize the threshold that that number is compared against. Obviously entering it numerically wouldn't be very user friendly, but even 5 levels from most to least aggressive (like Spam Assassin and such offer) would be extremely helpful.
Even better would be if you could have different handling for different levels, like black-holing or auto-trashing the absolutely-definitely spam, making it easier to occasionally scan the regular spam box. I get something over 1000 spam emails per day, so it's just not feasible to give even a cursory look over them to find the false positives. I can't even imagine what it would be like for someone like Linus.
Unfortunately, that would draw attention to the fact that the spam filter isn't perfect, and would require users to make choices with tradeoffs, so I can't imagine it's a very attractive option for Google.
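The tiered handling suggested above can be sketched in a few lines. This is only an illustration: the 0-to-1 "spam likelihood" score, the five level names, and every cut-off value are made up here, and nothing is known about how Gmail actually scores or routes mail.

```python
# Hypothetical cut-offs on a 0..1 spam-likelihood score (all values invented).
LEVELS = {
    "least aggressive": 0.99,
    "low": 0.95,
    "medium": 0.90,
    "high": 0.75,
    "most aggressive": 0.50,
}

# Above this score a message counts as "absolutely-definitely spam" (assumed value).
DISCARD_AT = 0.999


def route(score: float, level: str = "medium") -> str:
    """Return where a message goes, given its score and the user's chosen level."""
    if score >= DISCARD_AT:
        return "discard"  # black-holed, never shown in the spam folder
    if score >= LEVELS[level]:
        return "spam"     # kept in the spam folder for occasional review
    return "inbox"


print(route(0.92))                      # flagged at the default level
print(route(0.92, "least aggressive"))  # delivered at the most lenient level
```

With a separate discard tier, the regular spam folder only holds the borderline cases, which is the part worth scanning for false positives.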
I am subscribed to the wine-devel, wine-bugs and wine-patches mailing lists (https://www.winehq.org/forums). Having the exact same issue.
It seems to very easily flag discussion about .exes as spam, it's really disappointing. It's been several years and the filters haven't improved, despite me religiously flagging spam/not spam in those lists.
In the end I just gave up and set up filters to specifically prevent marking incoming emails on those lists as spam. It misses the odd linkedin invitation, but it's not like it was catching it before...
You are "flagging spam/not spam in those lists" but Google may then associate that list (not the content) as being spam or not. Google sees that mail comes from the list and decides based on that.
This is why I never mark mailing list email as spam, even if it is in fact spam.
The unfortunate thing about gmail is it's gotten worse than AOL was to deal with on the spam front. I feel like there are historical lessons worth learning regarding where all those @aol.com users went.
1. Everything became "spam"
2. They got to a point where they believed they were the standard
3. Nobody could do anything about it
I can think of many places where this same situation has played out. It's yet to work long term without disastrous results after a reign of technical darkness. That doesn't seem to stop people from thinking it won't happen to them.
It's fine to aggressively fight spam. If you choose to err on the side of false positives then it's in your own interest to provide reasonable recourse. If not, you've left a very large gap that somebody else will come in and fill. Just as Google did.
It might not be related, but I've been seeing some spam in my gmail inbox in the past month. It seems that something has upset the balance. For example, this went to my inbox:
Same. After years of almost zero spam, recently I've been getting stuff like that about once a week in my primary inbox. Very annoying when my phone dings only to show me obvious spam.
I have the same problem: before, these emails would go into the SPAM folder, but now even some obvious "YOU HAVE WON THE LOTTERY!" emails are arriving directly in the inbox.
Have you checked the raw text of the mail? Maybe it's just disguised as gmail but sent from a third-party server. Or spammers found a way to spam from within gmail, or hacked gmail. Both have happened at least once.
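For anyone who wants to check the raw message as suggested above, here is a rough sketch using Python's standard `email` package. The sample message, the host names, and the `looks_spoofed` helper are all invented for illustration. The substring test on `Authentication-Results` is deliberately crude: that header is added by the receiving server, and its exact contents vary.

```python
from email import message_from_string
from email.utils import parseaddr

# Invented sample: claims to be from gmail.com but was relayed by another host.
RAW = """\
Received: from mail.example.net (mail.example.net [203.0.113.7])
\tby mx.example.com with ESMTP; Mon, 20 Jul 2015 10:00:00 +0000
Authentication-Results: mx.example.com; spf=fail smtp.mailfrom=gmail.com; dkim=none
From: "Lottery" <winner@gmail.com>
Subject: YOU HAVE WON

Claim your prize.
"""


def summarize(raw: str) -> dict:
    """Pull out the headers that show where a message really came from."""
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "received": msg.get_all("Received", []),
        "auth": msg.get_all("Authentication-Results", []),
    }


def looks_spoofed(info: dict, claimed_domain: str = "gmail.com") -> bool:
    """Crude check: sender claims the domain, but SPF or DKIM is reported as failed."""
    claims = info["from"].lower().endswith("@" + claimed_domain)
    failed = any("spf=fail" in a or "dkim=fail" in a for a in info["auth"])
    return claims and failed


info = summarize(RAW)
print(info["from"], looks_spoofed(info))
```

In Gmail the raw text is available through "Show original"; the `Received` lines read bottom to top, from the first hop to the last.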
A few months ago, I wrote a simple Android app and put it on the Play Store. Now every two weeks I receive unsolicited spammy emails about ad campaigns and increasing user awareness. Funny how those get through the filter.
They also started delaying or outright rejecting some mail more aggressively so you don't even get to find it in spam. A few days ago I received a confirmation code from my CA sent to hostmaster@ the next morning after I requested it.
What's even worse they rejected email to postmaster@. I know you can adjust the spam filter sensitivity somewhere in Google Apps but come on, you should not reject any mail to postmaster by default.
The real shocker is that Linus Torvalds uses gmail, where you have no control over anything regarding your account (exhibit A: look an awesome new spam filter which you can't turn off!). I would never have thought he'd do that.
FWIW after reading this article, and comments here, I decided to check on the gmail account of a small not-for-profit organization I belong to (I'm the unofficial IT guy). I was shocked, there were 4127 spam messages, and just 98 unread items in the inbox. Slogging through the spam I did find some non-spam mail, but altogether that was <1% of the 4127 spam items.
Of course gmail deletes spam more than 30 days old, so how does it happen that an obscure educational non-profit gets over 4000 spam messages a month? Gmail must be a huge spam magnet, but it's still a mystery how those messages find their way into this spam bucket. (Unless in the past somebody had abused the account and the email address is on a thousand spammers' lists...)
In any case hard to be certain what criteria the spam filter uses to declare a piece to be "spam". Not all the misclassified emails were sent from "private" servers, it would be useful if it was more clearly specified.
Subscribing to high volume mailing lists from gmail is something I would actually advise against.
I had legitimate emails bouncing because the mailing list had put me over the maximum number of emails that a free Google account can receive in a day. I didn't even know there was such a limit until I hit it.
I now run my own mail server and have none of these or the other problems outlined here.
I've been on the other end of this for a few days - basically any company I do business with who has their e-mail on Google simply doesn't get my e-mails. Sometimes I get an active bounce (i.e., a reject due to my originating address), sometimes... Nothing. No pattern, either. Same destination, different behaviors.
The mail simply does not reach my suppliers'/partners' inboxes, and as a result we're all losing time and patience with this.
The really funny thing? Some of those people I work with are @google.com.
(And yeah, my corporate domain is clean, SPF'd up to the wazoo, etc.)
I've been actively interviewing for the past few weeks, and I've noticed that a very large number of emails from companies and recruiters (mostly recruiters) have been marked as Spam or at least shunted off to a non-Inbox folder. I don't have any specific custom filters in place, so this is 100% Gmail's doing. I find that interesting and frustrating -- and none of these companies/recruiters have ever seen this before! Seems like a relatively new phenomenon, at least for these people.
This is mainly due to the fact that so many recruiters have poor practices. I have a rule that if I'm contacted by a recruiter (unsolicited) I politely ask them to remove my details from their database. I label their email as unsubscribed and archive the email in Gmail.
When I get a second email from a recruiter that previously has been marked as unsubscribed then I click that 'mark as spam' button.
I've been doing this for eight years, since I'm quite happy with my work situation and have actively tried to remove myself from their databases over that time. However my CV still seems to be floating around, even though I've also deleted it from every online job portal I was signed up to.
At some point most recruiters need to feel that pain, because they just don't listen.
I'm guilty of this too, but it would really be great if we as a society could develop some impulse against putting all of our eggs in the first shiny basket we see.
Been having similar issues as of late. A few very important emails got classified as spam; not nearly 20%, but still enough to compromise my confidence in the system.
Linus's 20% is not a false positive rate -- how has nobody caught this? Let's assume that "the 0.1% false positive rate [Google] tried to make such a big deal about last week" is actually a false positive rate by the normal definition.
The false positive rate is defined as (the number of false positives over (false positives + true negatives)). However, Linus quite obviously calculated the 20% as (false positives over positives), i.e. (200 / 1000). If Linus happened to have 200,000 true negatives (i.e., non-spam messages that were not flagged as spam), a number that I'm making up because he did not disclose one, then his false positive rate would be (200 / 200,200) ~= 0.1%.
Think about it... whether his email address was harvested by all the spambots in the world or none at all would have no effect on the fact that out of 200,200 legit messages, 200 were incorrectly flagged. This is why the false positive formula doesn't even include true positives! The 800 true positives (actual spam messages) don't matter to the formula. Therefore, neither do the 1000 total (true+false) positives. Don't divide by them.
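The two ratios being contrasted above can be computed side by side. This sketch reuses the comment's numbers, including its made-up 200,000 true negatives. The name "false discovery rate" for false positives over all flagged messages is standard terminology rather than something the comment uses.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """Share of legitimate mail that was wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)


def false_discovery_rate(fp: int, tp: int) -> float:
    """Share of flagged mail that was actually legitimate: FP / (FP + TP)."""
    return fp / (fp + tp)


# 1000 messages in the spam folder, 200 of them legitimate;
# 200,000 legitimate messages delivered normally (hypothetical figure).
fp, tp, tn = 200, 800, 200_000

print(f"{false_discovery_rate(fp, tp):.1%}")  # 20.0%
print(f"{false_positive_rate(fp, tn):.2%}")   # 0.10%
```

Both numbers describe the same mailbox, so a 20% figure measured over the spam folder does not by itself contradict a 0.1% false positive rate.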
I've been fighting this for months. I run a simple mail forwarder for our local school PTA (so that president@pta gets to whomever is the current president, etc).
I've enabled every possible thing to make it work (DKIM, SPF, who the hell knows) and mails forwarded to my members with gmail (or Google domain hosted) accounts are always getting the forwarded email in their spam folders.
I had a similar issue at some point, where for a month or two, quite obvious "not spam" emails were getting caught in the filter. Nobody else on the internet seemed to be having the issue, and then it suddenly one day stopped happening again. I rarely mark emails as spam/not spam, so I don't think it was anything I did.
I looked at my own spam folder, there were 51 messages. ~15 of them were from recruiters and probably meant for me, ~20 of them were newsletters I had probably signed up for, but didn't miss at all and the rest were split evenly for people trying to sell me prescriptions and people trying to use my bank account to hold money. Overall, 50% of the messages were actually FOR me, but it's totally fine that I didn't see them. I'm definitely wondering how many of Linus' messages were actually important enough to not belong in spam. I understand that he probably receives a lot more mail than I do, but I'm wondering how many of those messages he actually missed. Other than the ones that are part of threads he's replied to.
> We also recognize that not all inboxes are alike. So while your neighbor may love weekly email newsletters, you may loathe them. With advances in machine learning, the spam filter can now reflect these individual preferences.
I really wonder how these preferences are reflected if someone hasn't been using spam filters much. At most maybe a dozen spam emails a year escape the filter, and the rest of my email goes through (all mailing lists... I am on a lot of mailing lists), so I get about 50+ emails per day just from DLs. So I barely ever need to mark something as spam, or even move something to another folder, so how does Google know what my preference is? It sounds like people who don't actively mark spam are less likely to be affected...
From looking at that, it appears to be a set of tools that high-volume senders of email can use to help ensure they don't get incorrectly flagged as spam. It seems to do nothing to help high-volume receivers of email.
The false positives with GMail really suck. Case in point: literally all Microsoft emails relating to billing on some of my products simply go to spam (my credit card was recently updated and I forgot to update it everywhere). So thanks to this post I noticed and can go fix it!
Checked all my spam boxes on my various gmail accounts, seems nothing bad has happened with mine.
At least they're nowhere near as bad as Outlook. I have one of my domains on their free Live Domains (grandfathered plan, can't get it for free anymore, similar to google apps) - and 90% of my emails end up in spam, even if they're from a reputable company with sane mail setups such as Digital Ocean, Github or even Google.
To make it worse, with Outlook you can't turn off the spam filter, and it's known that Microsoft sometimes SILENTLY drops emails for various reasons so they never even make it to your spam box...
Sadly I've yet to find any decent replacement mail service for my domains that's free (or very cheap) and of decent quality.
A Googler on HN was kind enough to get in touch with me and help bring the problem to the attention of the right people at Google, who fixed things on their end and made a few suggestions on my end. Presumably, Linus Torvalds will get about 1000x as much Google love as I did.
Good idea to check often though: I just discovered a few LiberWriter customers in my own spam filter... :-/
All of the emails my mom sends me are tagged as spam. I have a label that forces them to go to my inbox, but I still get a warning that gmail identified her emails as spam, and there's no clear way of over-riding it.
I fear SMTP is going to go the way of RSS at the hands of these giant corps. Closed protocols within machine gun lined walled gardens are the future. Sorry old idealistic computer hippies, we've failed you.
I think that the parent is talking about how much power Google has over mail delivery within its own system, which encompasses a larger number of people. If Google decided to switch to a non standard protocol, then smtp'ers would be left high and dry as Microsoft and Yahoo possibly follow suit to maintain an advantage.
Speculation of course but it does point out the problem with Google becoming too big.
SMTP is the only mainstream mail protocol that leaves the end user in control. That must bug companies like Google something fierce.
I still run Outlook 2010 with the SpamBayes plugin (the one inspired by Paul Graham's "A Plan for Spam"). It's far from perfect -- a whitelist would sure be nice, and how hard can it be to automatically refrain from marking anything as spam if I've emailed that sender before? -- but the fact that I can train it myself and adjust its classification parameters covers a multitude of sins.
I've resisted a lot of peer pressure to use server-side email over the years, both commercial and FOSS. I fully expect to be using SpamBayes in 10 years, probably with a wheezing, clunking copy of Outlook in a VM.
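The "never flag someone I've emailed before" rule asked for above is simple to sketch. This is a hypothetical wrapper, not how SpamBayes or any other filter works; the function names and sample addresses are invented. It also trusts the From header, which can be forged, so a real filter would want more than this.

```python
from email.utils import parseaddr


def build_allowlist(sent_to_headers):
    """Collect lower-cased addresses from the To headers of mail the user has sent."""
    addresses = (parseaddr(header)[1].lower() for header in sent_to_headers)
    return {address for address in addresses if address}


def is_spam(from_header, classifier_says_spam, allowlist):
    """Override the classifier's verdict for senders the user has written to."""
    sender = parseaddr(from_header)[1].lower()
    if sender in allowlist:
        return False
    return classifier_says_spam


allowlist = build_allowlist(["Mom <mom@example.org>", "bob@example.com"])
print(is_spam("MOM <Mom@Example.org>", True, allowlist))       # known contact
print(is_spam("Stranger <x@spam.example>", True, allowlist))   # unknown sender
```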
Interesting. A few days ago I stopped receiving pull request emails as well. Granted I routinely delete these after I process them - but I was alarmed when I checked my spam and saw them there.
I've had the same experience with the LKML lately (been subscribed for about 8 months, generally spam filter has had v. few false positives), can't remember exact numbers, but big, big chunks have been incorrectly labelled.
There seems to be a fair bit of spam sent to the LKML, I don't know whether there's been more lately, but perhaps the large amount of email sent to many people for the LKML and the fact there's a decent amount of real spam sent there, combined with a more aggressive setting is an explanation?
Data point: of the 28 messages in my spam folder just now (recently cleared down after picking out real LKML messages), 8 are LKML, though all are legitimately spam atm.
This is an individual who, in response to broken version control systems wrote his own version control system which he uses, primarily, to control versions of the operating system kernel that he wrote.
How is it possible that the above described person is moving his wrist around going clickity-click with a mouse just to read an email ?
Even if gmail keyboard shortcuts were anywhere near as good or fast as alpine (they aren't) you still have big bloated responsive AJAX/js pages loading for every single action.
With alpine over ssh I can use email on a bad 2G cellular connection with no hassle at all while google won't even load the inbox.
We've been seeing a huge number of messages sent through Google Groups via our Google Apps domain flagged as spam. I'm not sure what has triggered it lately, but it's almost impossible for us to communicate through our Google Groups email addresses anymore.
Gmail spam filter is definitely not working well for me. Obvious spam is considered legit and emails from senders that I constantly report as "not spam" still get into the Junk Mail folder. Outlook.com's spam filter is doing a way better job for me!
Every week I usually get about 200 spams. However, last week the number was really high, some 600, and I deleted all of them without checking. Now I'm worried that I deleted some legit email :(
Gmail is not my primary email, but still I do get important emails.
Funny, I used to get a ton of false positives in my gmail Spam folder (mailing lists, marketing I subscribed to, forwards from another address) but with the recent changes, I have just 1 false positive of 476 in Spam.
I wouldn't be surprised if somewhere at Google, some team are looking at Gmail stats and congratulating themselves on how much more "spam" they've blocked with their latest algorithm.
I could concur on the false positives -- I've had to repeatedly salvage mailing list posts marked as spam. Normally from the mailing lists where I lurk and do not actively contribute, but still...
All I see in my spam folder is people trying to sell me sex, drugs, credit, and fake Ray-Ban sunglasses. A quick review of 200 spam messages found one false positive from a whitewater rafting company.
I get about 600 emails a day because I forward everything* from all my email addresses through a single email account, and I only see 9 spam messages in my spam box, and they are all spam.
Just about the only thing that google persistently misclassifies as spam for me are logwatch emails. There must be a large number of people who don't know how to turn them off...
There are a lot of variables when it comes to spam detection ... I guess Google has oversimplified some controls, and a lot of private servers with good reputations are locked out.
In our company (we are an email hosting provider), a lot of time and human resources are spent monitoring and working with the output of the spam filters. I guess it is too soon for this kind of work to be replaced by AI.
So it seems this is clearly happening. The question is what to do about it?
1: Switch from gmail. (I've done this - using Fastmail)
2: Increase publicity to drive Google to change
3: It's arguably abuse of monopoly power --> take it to court
Spam detection is pretty much solved, with blacklists and whitelists. In some countries it's illegal to send spam, so you can safely add them to a whitelist.
Gmail is terrible in so many ways - Being randomly locked out of your account, the clunky user-unfriendly interface, the difficulty of marking certain senders as spam, ...