A Nixon deepfake, a 'moon disaster' speech and an information ecosystem at risk (scientificamerican.com)
126 points by mrkn1 on Aug 14, 2020 | 78 comments



This YouTube channel has lots of other great examples of what's possible with voice synthesis today: https://www.youtube.com/channel/UCRt-fquxnij9wDnFJnpPS2Q

Some of my favorites include Sinatra singing ABBA's Dancing Queen [1], six presidents rapping the N.W.A classic [2], and Milton Friedman rapping 50 Cent's P.I.M.P. [3]

[1] https://www.youtube.com/watch?v=zo_w4KGifug

[2] https://www.youtube.com/watch?v=mAZVp-n-5TM

[3] https://www.youtube.com/watch?v=4mUYMvuNIas


I'm struggling to understand how they're great examples. They sound absolutely atrocious and nowhere near believable. They sound really, really bad, like something an amateur with zero editing experience could put together with cheap equipment. They're so bad they're difficult to listen to, especially the Sinatra one.


> I'm struggling to understand how they're great examples.

Taking 'Six U.S. Presidents read "Fuck Tha Police" by N.W.A.' I certainly agree there are long sections where the synthesis is obvious - stilted sounding, overlong pauses between words, changes in tone in the middle of a sentence, background noises / changing audio quality accompanying different words and speakers, and so on.

But there are sections where it manages to generate several words in a row without those problems. Like at 23 seconds, the section "so help your black ass? / Ya goddamn right / Well, won't you tell everybody what the fuck you gotta say?" sounds pretty realistic.

If you blame the bad sections on lack of training data for older presidents, take the good sections as proof of what's possible, and imagine the entire recording will be that good in 3-5 years, then it's an impressive demo.


Agreed those are not the best examples.

This one (JFK reading the Rick & Morty copypasta) strikes me as much more convincing however:

https://www.youtube.com/watch?v=BkH2zEHJ2Ng


The Trump/Darth Plagueis copypasta got a chuckle out of me.

https://www.youtube.com/watch?v=LEzIAixNkFI


Agreed; the Nixon deepfake in the OP is pretty good, but all three of these sound immediately artificial.


This sounds like a GPT3 response :)

Note I said "great examples of what's possible today" not "great examples of JFK recording NWA's Fuck the Police".


> This sounds like a GPT3 response :)

If you're going to try to insult someone, own it and leave off the smiley. Don't hide behind that weak passive aggressiveness.


My apologies, it wasn't meant as an insult. I was trying to make a point and added a smiley face because I thought it would certainly be misconstrued as an insult without it.


I think the team behind this comment thread accidentally substituted the GPT instance tuned for Reddit for the one tuned for HN.


Why would you prefer actual aggression to passive aggression?

For what it's worth, I don't think the OP was intending to be insulting; this seemed more like an inside joke designed to foster camaraderie.


People who act like bot appraisers on the first sign of disagreement only foster annoyance.


Skin in the game.


> Why would you prefer actual aggression to passive aggression?

Probably for the same reason I disliked the Sinatra content so much, too obviously fake.

The whole of the parent comment was that single remark I quoted, before they edited the comment.

They were intending to lob a passive aggressive insult on the lighter side. They didn't like my comment, so they fired off a shot about my comment being equivalent to a questionably human, possibly mediocre GPT3 text.

I don't mind that sort of mild insult in response, I just prefer it without the unnecessary facade of the smiley.


I thought they were saying the defense was like defenses of gpt-3, that it will be good someday, or it's good if you are very selective about the output, not that your comment was poorly written.

One of those readings is just musing about the state of all this "almost there" tech, the other is just randomly insulting someone's writing.

That does seem like pretty dangerous ambiguity. Akiselev may have written the blue/gold-dress of comments, where one reading is mean-spirited. I doubt it was intentional, but it could have been clearer.


It's pretty cool, though I prefer H. W. Bush's feature with the Geto Boys [1], and in terms of presidential rap, Bill isn't the best Clinton [2].

[1] https://www.youtube.com/watch?v=6IJCFc_qkHw&t=213

[2] https://www.youtube.com/watch?v=7NltUHA22oQ


This is a fake, sure. Maybe it's not perfect, but the software is improving. So within a few years, I'm guessing that fakes will be indistinguishable from true recordings.

But this was based on the text of an actual speech written for President Nixon, to be delivered if the mission failed. So isn't it likely that he practiced it, just in case? And if he did, it might have been recorded, more or less accidentally, as Reagan's quip about nuking Russia was.

And so as others note, it becomes crucial to look at the historical context.


> As Reagan's quip about nuking Russia was.

Oh this is interesting, do you have any sources where I could read more about it?


While testing an audio setup in 1984, Reagan joked that Russia had been outlawed and “we begin bombing in 5 minutes”.

https://en.m.wikipedia.org/wiki/We_begin_bombing_in_five_min...


It is one thing to make apocalyptic predictions. They could be reasonable conclusions from the possibilities, unreasonable ones, or anywhere in between, and they can be discussed on their own merit. Heck, even why someone personally concludes something can be discussed, considered, and understood, even if you disagree or think they're wrong. But the "here is how you should feel" leading bullshit is a clear run-around for critical faculties that takes in far too many people.

Nothing makes me write a source off as a bad actor faster than panic mongering that /tells you/ how to feel instead of leaving you to decide. It is a sure sign of a manipulator, because scared people are easier to manipulate. My reaction may be unduly harsh, caustic, or sharp, but I have absolutely lost patience with this tactic.


With you 100%, but unfortunately it works too well. All of the top grossing news outlets abuse the hell out of this technique. I don't believe it'll get any better, and now more than ever people will need to learn to think about where their news is coming from.


I am reading 2,000-year-old Cicero, and he is actually very insightful on the art of oratory. As you say, an issue can be judged on its own merit.

However critical thought is not a skill we have invested in, going by what is happening in the world, and the US in particular.


I don't think that the prediction "deep fakes pose a risk to the information ecosystem" can be described as apocalyptic. I feel like I'm missing something, as this article seems like a fairly unopinionated puff piece.


So...you are saying that no one should ever make an impassioned argument?


The Onion's take on apocalyptic predictions: https://politics.theonion.com/conservatives-warn-radical-kam...


Did everyone forget about Prop 8 out here?


I was kind of expecting this to demo some step forward, but it has all the same tells as the last several high-profile deepfake demos.

> Voice is clipped

> Chin

> head bobbing


Knowing the speech was a fake and paying attention to it "skeptically" I noticed the double head-bob as an odd thing which seemed suspiciously convenient as a transition between cut-n-pastes.

Now I'm going to have to go and watch archival Nixon footage to see if the same thing is there or not :)


> head bobbing

I wonder if you could use a GAN to do a post-production fix on them. Teach it what actual people's head movements look like, and then get it to stabilize the image as a second pass.
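A full GAN is beyond a comment sketch, but even a simple second pass over tracked head positions shows the stabilization idea. This is a toy illustration with made-up coordinates, not any real deepfake pipeline:

```python
# Toy second-pass stabilizer: smooth a jittery track of head positions
# with a centered moving average. A real pipeline would track facial
# landmarks per frame and re-warp the frames; this only shows the
# smoothing step on synthetic (x, y) data.

def smooth_track(positions, window=5):
    """Centered moving average over a list of (x, y) head positions."""
    half = window // 2
    smoothed = []
    for i in range(len(positions)):
        lo = max(0, i - half)
        hi = min(len(positions), i + half + 1)
        xs = [p[0] for p in positions[lo:hi]]
        ys = [p[1] for p in positions[lo:hi]]
        smoothed.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return smoothed

def jitter(positions):
    """Total frame-to-frame movement: a rough 'head bob' score."""
    return sum(abs(b[0] - a[0]) + abs(b[1] - a[1])
               for a, b in zip(positions, positions[1:]))

# A track that bobs up and down around y = 100, like the double head-bob
track = [(i, 100 + (5 if i % 2 else -5)) for i in range(20)]
print(jitter(smooth_track(track)) < jitter(track))  # True: bob reduced
```

Whether that would look natural, rather than eerily smooth, is exactly the kind of thing a GAN discriminator trained on real head movements could judge.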


For now, and you're looking for it.

I remember when cellphone cameras first came out going "ffs, who'd want that? it's a tiny, low quality photo".


What made it 'real' was the actual speech itself. It would have been hard to 'fake' that.

The 3 minutes of 'intro blast off' (that could have been 10 seconds) and the Flash-like landing page are really problematic.


An AI can't fake a speech about a subject they never spoke about... yet. Though, to be honest, I think it wouldn't be hard.

1) Collect a corpus of speeches.

2) Use style-synthesis techniques, just like in the art-style synthesis demos.

3) Input a speech about a spaceship disaster and run it in the "style" of Nixon.
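As a toy stand-in for those three steps (a real system would use a neural style-synthesis model; the corpus and all output here are made up), a word-level Markov chain can "learn" a speaker's verbal tics from a corpus and regenerate new text in roughly that style:

```python
import random

# Step 1: collect a corpus. Step 2: "learn" the style (here, just
# word-to-successor statistics). Step 3: generate new text in that style.
# A Markov chain is a crude stand-in for real style synthesis.

def train(corpus_text):
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus_text.split()
    model = {}
    for a, b in zip(words, words[1:]):
        model.setdefault(a, []).append(b)
    return model

def generate(model, seed, length=12, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducible output
    out = [seed]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Invented mini-corpus of Nixon-flavored phrases
corpus = ("let me make one thing perfectly clear "
          "let me say this about that "
          "let me make this perfectly clear")
model = train(corpus)
print(generate(model, "let"))
```

The hard part, of course, is step 2: making the output coherent about a subject the speaker never addressed, which is where large language models come in.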


> An AI can't fake a speech

Nonsense.

"It is with a heavy heart that we must inform you that Apollo 11 has failed to return from the moon. We have lost contact with our crew, Neil Armstrong and Edwin Aldrin. When they left earth seven hours ago their fate was unknown but now it is certain. They died as men should, ready to die for what they believed. This loss is a tremendous loss for our nation, for their families, and for mankind but their sacrifice was not made in vain: It is a testimony to courage, determination, and human achievement that will be remembered by all who witnessed it today and for all those who come for all time so long as man walks this world or any other. For this tragedy will not mark the end of space exploration; it will strengthen our resolve to continue advancing space technology and human knowledge. We have enjoyed the fruits of their labor and shared the pride of their achievement just as we now share their isolation and grief. We will honor their memory by continuing that work and by taking that faith and that dream into a future they will not live to see but helped to build. We will remember them and we will remember what they stood for: The greatness of man and the hope for a better tomorrow."

This isn't a first shot output, I had it retry a bit and guided it a little, mostly to get it to write a longer speech instead of a short quote. I think it's much better than I could have written on my own in a minute or two.


It's an impressive draft, better than what I expected, but it needs a lot of polishing. For example:

> We have enjoyed the fruits of their labor

Or this:

> This loss is a tremendous loss for our nation, for their families, and for mankind

the official speech uses an increasing order, and puts "friends" as an intermediate step between "family" and "nation". You want to put "family" so you don't sound insensitive or like you're hitting below the belt. You want to put "nation" because you want to squeeze a few votes from the tragedy and also avoid being blamed. You want to put "mankind" because you want to hide that this was a stunt in the middle of the Cold War. (The official speech says "people of the world".)

There is a small risk that GPT-3 is just retrieving a distorted version of the speech from the multiple sources that were available, as if you or I were forced to rewrite it from memory. Let's try a different scenario, like: "Yuri Gagarin got toasted during reentry, and for some reason we decided not to hide it."


You're kidding me. It's not a great speech but it is incredible for a machine-generated speech. Unbelievable. I love it!


I know that discourse is at an all time low, but you could try reading past the 6th word in my post.

In other words: Try generating a speech about "Apollo 11 disaster" without previously existing text corpus about any space disasters.


Wow, did you make this using GPT-3? Is the first sentence yours and the rest generated?


All the text in quotes was GPT-3 generated-- with a little help from me (e.g. when it went in the wrong direction-- e.g. ending the speech too early-- I made it go back and try again or clipped out the dumb part and had it continue).

I prompted it with some mission description and said that a speech had been prepared for President Nixon in case the astronauts were stranded.

GPT3 doesn't yet do consistently GREAT output without some guidance, partially as an artefact of the generation procedure. But with a little help it does very well.

The issue is that if you just take the most likely symbol, it'll rapidly go into a loop of just copying text or other degenerate behaviour. So instead, everything uses the model by sampling it: taking less likely choices by chance, weighted by the model output. Unfortunately, that means that an unlucky draw will occasionally paint it into a corner. If you see that happening you can just go back and try again, and you get much better output.
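The greedy-versus-sampling point is easy to show with a toy next-token distribution (the tokens and probabilities here are invented; a real model emits tens of thousands of probabilities per step):

```python
import random

# Given one next-token distribution, greedy decoding (argmax) is
# deterministic and, on a repetitive model, loops; sampling weighted
# by the model's probabilities yields varied continuations.

next_token_probs = {"the": 0.5, "moon": 0.3, "mission": 0.15, "grief": 0.05}

def greedy(probs):
    """Always take the single most likely token."""
    return max(probs, key=probs.get)

def sample(probs, rng):
    """Draw a token at random, weighted by the model's probabilities."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(42)
print([greedy(next_token_probs) for _ in range(5)])       # ['the'] * 5
print([sample(next_token_probs, rng) for _ in range(5)])  # varied draws
```

Tricks like temperature and top-k/top-p sampling sit between these two extremes, trading variety against the risk of those unlucky corner-painting draws.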

Whether that is a fair comparison depends on what your application is... if you need it to run unsupervised, it isn't consistently great. If you just need a first draft out of it, or some raw ideas to turn an hour-long writing task into a five-minute one, it's great for that.

I don't think this kind of manual assistance is much of a cheat either-- a real speech writer also gets exactly this sort of help from others.

[And FWIW, I did this via the GPT3 based mode in the ai dungeon video game. ... I don't have access to the GPT3 API.]


I remember reading that 1 minute of an important speech of a politician requires about an hour of preparation by a professional speech writer. And I guess that a failure speech is even more difficult to write.

The current level of AI like GPT-3 can generate fluffy text, very good fluffy text, but still can't generate a text for this kind of speech.


> but still can't generate a text for this kind of speech.

I disagree. Maybe the output I gave above wouldn't quite pass for the output of one of the greatest speech writers of the time, but most speeches are rubbish and I think what you can get out of GPT3 well operated is not at all rubbish.


Yes, but I wonder if that would have been specific enough.


I was expecting the Walter Cronkite part to be the fake thing here in a clever twist.

They'd pass through the Nixon questions and then say "Alright, but what about Walter Cronkite?" What I'd really advocate is using another contemporary big-three anchor from the era as the deepfake, then showing the real Cronkite footage during the quiz at the end.

That's the real lesson IMHO, not if you can tell when you're prompted to listen for it, but when you aren't.


People have been doing this since the cannonballs photo in the Crimean War.

If you think a fully fabricated video is the key to lying to people I have a poker game you can join.

Here's a discussion of how far you can distort video today, in a way that is way more subtle and harder to rebut than a total fabrication:

https://youtu.be/tEGqepsFTbI


If deepfakes are so good, why do they always look and sound so crap?


They don't need to be perfect. They just need to be convincing.

Consider MP3 files. Any audiophile will tell you that a compressed MP3 file is a piss-poor representation of the true sound. The best experience is to listen to a band live, then vinyl, then FLAC.

And yet the majority of people listen to MP3 files. They strike the right balance of file size and sound clarity for the overwhelming majority of systems, doubly so since online streaming took off. So now that people have become accustomed to the sound of MP3, they are not used to anything else. They are convinced that an MP3 file is the "true sound" of the song.

Right now deepfake technology is in its infancy. Personally I think it is improving at an alarming rate. If I shared the moon video on Facebook, I am willing to bet most people would think it is real. They don't notice the clipped speech or the subtle double head nods. Their minds are not as critical of video as you and I are. So to them, they are convinced it is real.

What happens when only machines can detect if something is authentic or a fake? What happens when not all site administrators scan for videos and fail to mark them as illegitimate? What happens when courts use deepfakes unknowingly as evidence to convict someone, or impeach a president, or fire a worker for sexual harassment? We have already seen what happens when "verified" twitter accounts are compromised - what if a CEO puts out a video announcing some controversial new endeavor, or admission to fraud?

There are very real concerns about this technology and I believe it will very soon become a weapon that takes misinformation and turns it into very real consequences.


Misinformation is already a huge issue; deepfakes can only exacerbate and compound the issue in ways we can't even imagine at this time.

Imagine deepfake used to create false alibis, 911 calls, etc... That's probably not even the tip of the iceberg in coming years.

The fact we went in KNOWING it's a deepfake gave us an unmeasurable advantage/bias. For uninformed individuals that "stumble" upon this on the web, one can only imagine how it'll play out. This thought actually terrifies me... imagine a hypothetical (and conservative) 10% overall improvement AND production-cost reduction in this video every 6-12 months.

@SamuelAdams: Wish this site had a messaging system. Would be interesting to have a 1 on 1 discussion on this topic with you.


You mention phone calls. Right now robocalls are very common. But what if those random calls legitimately sounded like your parents, or your spouse, or your kids. Furthermore, what if they sounded like they were in real trouble and really needed money fast? This technology opens up whole new arenas to more legitimate scams.


This is already being done to scam high profile targets.

Example: https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice...


I'm sure deepfakes will be used in fake kidnappings, social engineering, scams, etc. I can't begin to imagine the scale of damage this can cause. The reality is that a "well" executed 10 second deepfake can cause public panic, etc. The potential for abuse is just too high.

It's not a matter of if, it's a matter of WHEN this is economically and technologically available to the masses. The next 5-10 years are going to be a very confusing time to be alive...


How is the best experience to listen to a band live? I have had literally the exact opposite experience in life. Concerts are lots of fun, but when it comes to listening to the actual music at the concert, it's unquestionably not even close to the quality of a sound booth.


They definitely don't all look crap, although speech style transfer is still pretty rudimentary. The first point I'd make is that you can mask most of the defects of current deepfakes simply by degrading the recording quality. A mediocre voice deepfake sounds pretty damned credible when played through a cellphone with poor signal and a bit of background noise. A mediocre video deepfake appears wholly believable if you post-process it to look like low-res CCTV. For adversarial applications, this would tend to increase rather than decrease credibility - we would expect a covert recording of someone doing something illegal or shameful to be of poor technical quality.
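The degradation trick is easy to sketch: downsample a clean signal and add noise, as a low-quality covert recording would. The sample rates and noise level here are illustrative guesses, and a sine tone stands in for synthesized speech:

```python
import math
import random

# Degrade a "clean" audio signal the way a bad cellphone recording would:
# throw away samples (crude downsampling) and add uniform noise. The same
# degradation that ruins a real recording also hides a deepfake's seams.

def degrade(samples, keep_every=4, noise=0.05, rng=None):
    rng = rng or random.Random(0)
    kept = samples[::keep_every]  # keep 1 in 4 samples: quarter the rate
    return [s + rng.uniform(-noise, noise) for s in kept]

# One second of a 440 Hz tone at 8 kHz, standing in for fake speech
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
lofi = degrade(clean)
print(len(lofi))  # 2000 samples left after downsampling
```

A real attacker would also band-limit, compress, and re-encode the file; every lossy step destroys exactly the high-frequency detail where synthesis artifacts live.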

Secondly, we do legitimately need to be concerned about what deepfakes will look like in five or ten years' time. We're making substantial algorithmic improvements in efficiency and quality, coupled with a multi-billion-dollar race to improve computational performance for deep learning tasks.

Deepfakes may never improve (unlikely), they may gradually improve with better algorithms and more compute power, or there may be a sudden breakthrough in either area that makes 100% convincing deepfakes commonplace. We could wait until that moment to start thinking about social and technological countermeasures, but I wouldn't recommend it. If there's anything to be learned from 2020, it's that we should be investing a lot more in preparing for low-probability/high-magnitude risks.

https://www.youtube.com/watch?v=Ho9h0ouemWQ

https://intelligence.org/2017/10/13/fire-alarm/


You're getting down voted, but I agree. While the footage represents a technological marvel, it doesn't fill me with dread that there is some sort of deep fake genie waiting to jump out of the bottle.

I think the choice of speech is also quite telling. They chose a target with low video and audio resolution and a lot of interference that could mask some of the imperfections of their algorithm. Despite all this, oddness induced by the deepfake process is still quite evident.


This is absolutely remarkable.


Definitely felt a little off, but if I didn't know that this was a deepfake, I would probably accept it as real. As usual, I don't know if that fits more in to the "remarkable" or "disturbing" category.


Eh, I'm not entirely sure I would have. The effect is near identical to when you have one person behind another, pretending to be the front person's arms and hands - the motions just don't match up right.

Maybe people who use faces more than full body language are more affected by this particular one?


I kind of agree, but it's hard to separate knowing that it's fake from my reaction to it. If this were presented without that context, I might feel like it's off, but not necessarily question it, especially if it came from a source I believed to be legitimate.


I also think, on top of all your points (which I agree with), keeping it shorter would make it more convincing. I think with the current usage of social media, short clips created through some synthetic means will be harder and harder to identify.


It _felt_ real. Disturbing.


I remember watching a TV ad where Michael Jordan around the age he was when he retired played 1-on-1 against himself as a college player. This was 15-20 years ago & I wrote an essay for high school English class on how that ad shows we can't believe our eyes when it comes to what we see on TV. It was visually much more convincing than the examples I see here, so I can't help but think deepfakes haven't quite caught up with professional editing, at least not yet.


For the majority of those on HN who didn't witness the live event this is a pretty good summary. It really took me back to that day as our entire family was glued to our black and white TV watching the man known to all as Uncle Walter. Watching President Nixon twitch a bit was the only tell that this wasn't real.

Got me imagining a video where Eisenhower confesses D-Day was a failure; the statement he actually wrote does exist.


It works as a PoC, and it could fool me if I saw it at the edge of my vision, on my kitchen's TV. But otherwise it looks odd, and most importantly it sounds like the weirdest vocoder in existence.

We still can't truly fake instruments, never mind the human voice.


> Beyond this illegal and harmful use of face swapping

Is face-swapping illegal? What law is it breaking?


I really hate the fake static in this video - as a 7-yr old I watched the moon landing in 1969 and nobody's TV was that bad! If I had a TV that bad I would either repair it or throw it away!


My website has 40+ celebrity and cartoon voices:

https://vo.codes

I'm working on voice conversion (voice to voice for TeamSpeak/Discord) now.


Any chance you would be willing to share the (finetuned) models? For science? I’m interested in detection and want to look at artifacts common to speech audio generated by different models all based on the same overall implementation.



Deepfake stuff is not a threat, it's a vaccine. There finally is a chance people will realize they can believe nothing and nobody, news lie and mislead and they absolutely have to triple-check everything before they take that in consideration.


This destroys democracy. Because you can't check everything yourself, or even a tiny amount of things, so you have no meaningful basis on which to tick that box on the ballot.


Is it that much worse than the effect that photoshop had? We have been able to fake pictures/images for a long time now. What changed is that people no longer blindly trust pictures. I assume something similar will happen with video.


You never had it. It has always been an illusion. Now this just got harder to ignore.


Great. Now what?


Trust in media with a reputation. Trust in journalism to unearth attempts at deceit. Quality journalism and reputation will become more important and powerful than ever.


The "can't check everything yourself, or even a tiny amount of things" argument applies to everything. It's even more work to verify - for example - a small segment of a real hour-long Trump rally, carefully chopped off in the middle of a sentence to push a bullshit interpretation of what's being said, like I caught Snopes doing a while back.

Hell, even text suffers from this problem. Back when the UK (briefly) hit their target of 100,000 Covid-19 tests a day, the BBC News website pushed a bullshit claim that Germany was already averaging that many a month earlier. This played well into all the existing narratives about British exceptionalism, Brexit, inferiority to Europe, and the incompetence of our Covid response. It was also trivially verifiable as untrue - the German testing numbers were up on the RKI testing website, in English, and not only were they nowhere near that a month earlier, they were still well below it when the BBC published that claim. Someone at the BBC had mistaken the number of tests German labs had the capacity to process a day for the actual number of tests, which was particularly bad since literally every other part of Germany's testing process was more of a bottleneck than lab capacity. The BBC then kept this claim up and prominently linked to the article it was in on their front page for a month after they knew it was untrue. A large proportion of the UK population probably saw it. How many people do you think spotted the error? (They've since decided that because they've memory-holed it, they don't have to append a correction.)


What is the Democratic country you're from? Didn't know they existed outside a few small tribes.


3m06s to 3m15s seemed a bit cheesy and stagey


Has anything positive come out of the world of IT in the last 20 years?


It happened


Bullshit. Easily 80% of broadcast news today is fake, either overtly, or by omission. It only exists to advance a narrative, not to inform or support debate. As Denzel Washington quipped: "If you don't watch the news, you're uninformed. If you do watch the news, you're misinformed."

The ability to produce fake news artificially as well won't tip the scales even slightly, because nobody with a brain trusts the "news" anyway.



