Facebook already has access to a much larger repository of photos going back over a decade, along with decent face recognition, so it could build a far bigger dataset than it would by resorting to a hashtag challenge. But I guess that wouldn't be newsworthy.
This was addressed in the article:
In various versions of the meme, people were instructed to post their first profile picture alongside their current profile picture, or a picture from 10 years ago alongside their current profile picture. So, yes: These profile pictures exist, they’ve got upload time stamps, many people have a lot of them, and for the most part they’re publicly accessible.
But let's play out this idea.
Imagine that you wanted to train a facial recognition algorithm on age-related characteristics and, more specifically, on age progression (e.g., how people are likely to look as they get older). Ideally, you'd want a broad and rigorous dataset with lots of people's pictures. It would help if you knew they were taken a fixed number of years apart—say, 10 years.
...
In other words, it would help if you had a clean, simple, helpfully labeled set of then-and-now photos.
It tries and fails to address it. I could barely count the number of zeros I would have to put in front of the 1 in the percentage of Facebook's photo data this meme covers. And a good portion of their dataset will have EXIF timestamps. Training an algorithm on the meme alone would be an insane waste of their dataset.
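To make the timestamp point concrete, here is a rough sketch of pulling then-and-now pairs out of photos that already carry a person label and a capture date. This is hypothetical and not anything Facebook has described; the Photo record, the function name, and the 180-day tolerance are assumptions for illustration only:

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass
class Photo:
    # Hypothetical record: who is in the photo and when it was taken.
    person_id: str
    taken: date


def then_and_now_pairs(photos, years=10, tolerance_days=180):
    """Return (older, newer) pairs of the same person taken about `years` apart."""
    by_person = {}
    for p in photos:
        by_person.setdefault(p.person_id, []).append(p)

    pairs = []
    for group in by_person.values():
        # Sort so the first element of every pair is the older photo.
        group.sort(key=lambda p: p.taken)
        for a, b in combinations(group, 2):
            gap = (b.taken - a.taken).days
            # 365.25 days per year roughly accounts for leap years.
            if abs(gap - years * 365.25) <= tolerance_days:
                pairs.append((a, b))
    return pairs
```

With one person photographed in 2009, 2014 and 2019, only the 2009/2019 combination survives the filter; no hashtag is needed once the dates exist.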
It used to be the case (still is?) that the dates on uploaded photos weren't applied to the photo album. I remember having to go through holiday snaps and change the date on each from the upload date to the actual date. The images were also resized down from what was uploaded.
So, if they have the originals with the full EXIF data, I'd like to be able to use that for my old photos!
Chances are they do, and you won't be able to use them.
Why would that be so? Because it profits Facebook, which is the only reason Facebook exists. It profits them to keep the originals with EXIF for data mining (you gave them permission to do so while also giving them the data), and it profits them not to make those originals available to you, to save on bandwidth and processing costs.
Eh, I think not. There is a reason they resize the photos.
To save space. Even at facebook scale the amount of space they save by doing this must be enormous.
Additionally, images used in AI are usually scaled down a lot more, to 224x224 for something like ResNet-50, which means they don't need your high-quality original; the smaller copies they generated are fine.
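For a sense of scale, here is a sketch of the common "resize the shorter side to 256, then center-crop 224x224" recipe used for ImageNet-style models like ResNet-50. It only computes the geometry; the 256/224 values are the usual convention, assumed here rather than anything Facebook has disclosed:

```python
def resize_then_center_crop(width, height, resize_short=256, crop=224):
    """Return the intermediate (w, h) and the crop box (left, top, right, bottom)."""
    # Scale so the shorter side becomes `resize_short`, keeping the aspect ratio.
    scale = resize_short / min(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Take a `crop` x `crop` square from the middle of the resized image.
    left = (new_w - crop) // 2
    top = (new_h - crop) // 2
    return (new_w, new_h), (left, top, left + crop, top + crop)
```

A 1600x1200 (2 MP) phone photo ends up as a 341x256 intermediate and a 224x224 crop, roughly 2.6% of the original pixel count, so a modest downscaled copy already carries more than such a model would use.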
While I'm sure they already have a great dataset going back a long way, most of it will come from the small fragment of the world that already had internet access 10 years ago. In the meantime, a lot of people in other parts of the world have come online and may have old pictures of themselves.
Lots of people have said this, but Facebook supports uploading directly from phones to its servers, right? So I would be shocked if they didn't trust the time tags from direct uploads and the metadata set by the camera (which they strip before displaying, but surely they ingest it) more than they trust upload times.
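A minimal sketch of "trust the camera's timestamp over the upload time", assuming the EXIF DateTimeOriginal value has already been extracted as its standard "YYYY:MM:DD HH:MM:SS" string. The function name and the fallback policy are illustrative assumptions, not Facebook's pipeline:

```python
from datetime import datetime

# Layout of the EXIF DateTimeOriginal tag, e.g. "2009:01:15 14:03:22".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def best_capture_time(exif_datetime_original, upload_time):
    """Prefer the camera's capture time; fall back to the upload time."""
    if exif_datetime_original:
        try:
            return datetime.strptime(exif_datetime_original.strip(), EXIF_FORMAT), "exif"
        except ValueError:
            # Malformed or placeholder values such as "0000:00:00 00:00:00".
            pass
    return upload_time, "upload"
```

Returning the source alongside the timestamp lets downstream code weight camera-reported dates differently from upload dates.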
Also, people tag themselves all the time, and if you upload a photo with people, you can tell quickly that they already identify people with high accuracy, because they suggest tags for you.
Seems you're missing one thing here: rewind 10 years and the iPhone 3G had only just come out that year, so not many people were using phones to take pictures, and even fewer were uploading them, since their data plans didn't allow it even where coverage did.
You're really reaching now. People were regularly using phone cameras long before smartphones. By the time smartphones were a few generations in it was ultra common, and a big benefit of them (that even non-technical people could understand) was that you could use Wi-Fi whenever possible and avoid data charges (which really weren't much higher then than now in most markets; "unlimited" plans were more common too). Also, the 3G has a 2 MP camera, so we're talking about pictures that are 2-3 MB at most. The suggestion that people were shy about uploading relatively small photos to Facebook circa 2008, and that this supports this flimsy story in any way, is sheer nonsense.
I'm no fan of Facebook - am a long-term outright refusenik actually - but the conspiracy theories are getting out of hand. There is zero substance to this article, it's wildly speculative clickbait.
In 2008 Facebook itself was in its infancy. Orkut was still the largest social network, only toppled around 2011. The majority of the world was definitely not uploading any phone pictures anywhere.
That's quite an overstatement, to put it very mildly. Facebook was allowing open non-.edu signups by 2006, and the buzz around it from its school success was immense. By 2008 it certainly wasn't seeing a critical mass of boomers and other late(st) adopters, but it was still huge by any measure: 100 million users, and growing with unprecedented speed.
People were absolutely already uploading phone pictures to FB and other sites by then; I think there may even have been Facebook apps shipping on non-smartphones by that time, it was one of the earliest things carriers used to flog data plans. I agree that "the majority of the world" wasn't uploading phone pictures anywhere by then, but then I'd be surprised if that rather high bar has been reached today either.
The matter at hand is collecting user pictures for mass-scale machine learning. I and everyone I know didn't even join FB until 2011-12. 100M is nothing compared to the current 2.2B user base, who are posting annotated 10-year-old pictures of themselves. This is to counter the parent comment that "they already have this data anyway", not an absolute statement on FB growth.
The percentage of people lying about it, at least before the stage where people start piling on with jokes, is probably much lower than the general noise in the data set if you were to try to find the photos you want without help.
I know that Facebook's actions mean that it no longer deserves the benefit of the doubt but this seems like a non story that someone really wants to be a story.
Thinking about it, let's say 10,000 people respond: is that even enough data to move the needle? Which photo do you use for the old vs. the recent? There is a lot of manual cleanup needed for this to be a decent dataset. Basic common sense says this is a non-story.
I did my post-undergrad research in 2000 on neural nets, and datasets were our biggest limiting factor; second was computation time. 10,000 data points was a huge set back then and still wasn't enough for most tasks.
You continue to post unsubstantive and/or uncivil comments after we've asked you many times to stop. So, we've banned the account. We're happy to unban accounts if you email us at hn@ycombinator.com and we believe this will change.
Why do they need to be Indian? Sure the data would reflect that a plurality of global low wage workers are from India, but there are also plenty of low wage workers who are not ethnically Indian or geographically based in India (such as all of the Americans working on r/beermoney).
Did you really just belittle my 60+ hour work weeks and 45 minute commutes as some frivolity?
Do I have a choice about whether or not I have to hold down a job? No. No I don’t.
Do I have a choice about the level of competition I must bring to the table to simply tread water? No. No I don’t.
If I had a choice, I’d never program another piece of code again, but hey. Yeah, beer money. Like I have a life I enjoy enough to drink beer.
I did not interpret matte_black's comment as belittling at all, the post being responded to literally talked about people on the r/beermoney subreddit.
"My flippant tweet began to pick up traction. My intent wasn't to claim that the meme is inherently dangerous."
He spun an offhand comment into full-on opinion manipulation, because we're now reading his article on the topic that not only has a clickbait headline, it seems to imply there's something more to the story, when in fact there isn't.
So now we've got all this digital ink spilled on the (entirely hypothetical) topic, and plenty of eyeballs buying it with their attention. But all of it is vapor, even at the admission of the authors.
That's the issue with the internet nowadays, to be honest. As soon as you get a bit of traction with a tweet or a blog post, people try to milk it to market themselves or serve other narcissistic interests.
It's not the internet. It's media in general. They need to create an artificial story to make a profit. Now that the internet is a profit center, the media and its tactics have found a welcome home on the internet.
Hence the "arctic blast" about to "ravage the East Coast". Or, to someone who has lived in the Northeast, just winter. Or any other superlative clickbait. Everything is a crisis, everything is a disaster.
You are replying to an article written by someone named "Kate O'Neill", in which they include a photograph of themselves via Twitter, and you are referring to them as "he"?
Facebook and Google's facial recognition software is so advanced that they have no real use for photos of people explicitly tagged 10 years apart.
Google Photos has been able to track my goddaughter from literally her first photo (when she looked like an alien) to now (5 years later), with about 2 photos per year.
The subtle point here is that people have become so suspicious of these platforms that everything they do is observed with a sharper eye on privacy. Speaking of which... I wonder how Portal is doing?
A funny trend I've been seeing is posts from /r/conspiracy being mined by journalists for their stories. This exact idea was posted a few days ago to Reddit, and is not this tech writer's idea.
Reporters combing Reddit for story ideas has been going on for years and years. Heck, it might be worth an experiment to see how quickly someone can start from scratch and put together a portfolio of clips this way. The pitches practically come prewritten.
I saw it as well. Honestly, I think journalists (the kind that are closer to glorified bloggers) have been doing this for quite a while. I can't remember for sure what site it was, maybe Cracked, but I remember looking at the recent articles and it looked like a tl;dr of some of the top Reddit posts of the week.
This research has been in existence for over a decade[1]. Clearly this would be a valuable dataset, but as others have mentioned, Facebook probably already has the most valuable dataset. Realistically, the biggest hurdle in modeling aging is in children. The bones, muscles, and everything else are so elastic that it is difficult to accurately predict how they will look. The primary use for this tech has been catching high-value individuals who have gone into hiding, or children kidnapped into human trafficking (hence the focus on modeling child face growth).
This entire thing sounds more like someone made a joke about "Big Brother always watching" and people without a real understanding of what's possible freaked out when they realized it is.
This is idle speculation from somebody who has no idea what they're talking about. Because it was published on Wired it has gone viral.
It's just technical enough that most people who don't have a clue think that it might be right so they spread it.
Anyone who knows about data processing, programming, or AI knows that it's a very stupid idea due to easy-to-implement fault tolerance (such as random dropout) in machine learning models.
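For readers unfamiliar with the dropout mentioned above, here is a minimal sketch of the standard "inverted dropout" technique on a plain list of activations. It only shows what dropout does mechanically (randomly zeroing units during training and rescaling the rest); how much it helps with noisy or mislabeled meme photos is the commenter's claim and is not demonstrated here:

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # At inference time dropout is a no-op.
        return list(activations)
    keep = 1.0 - p
    # Scaling by 1/keep keeps the expected value of each activation unchanged.
    return [a / keep if rng.random() >= p else 0.0 for a in activations]
```

Rescaling during training (rather than at inference) is why the function can simply return its input unchanged when `training` is False.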
This seems more likely to be a marketing move to me than a covert request for AI training data. They’re struggling with engagement, so they seeded the 10 Year Challenge causing users to invoke the powerful emotion of nostalgia, made easy thanks to Facebook keeping all of your photos safe ;)
This is saying a lot about what people think of Facebook these days. I don’t believe that FB wants to gather data here. This is probably an idea coming from their marketing department. But hey, why would you not think they’re evil after everything they’ve done?
What I like about this tweet is that society is starting to change and realize what could be done with information that is shared. Good to see more critical thinking evolve when it comes to social media.