Programming Software Technology

Adobe Is Working On 'Photoshop For Audio' That Will Let You Add Words Someone Never Said (theverge.com) 161

An anonymous reader quotes a report from The Verge: Adobe is working on a new piece of software that would act like a Photoshop for audio, according to Adobe developer Zeyu Jin, who spoke at the Adobe MAX conference in San Diego, California today. The software is codenamed Project VoCo, and it's not clear at this time when it will materialize as a commercial product. The standout feature, however, is the ability to add words not originally found in the audio file. Like Photoshop, Project VoCo is designed to be a state-of-the-art audio editing application. Beyond your standard speech editing and noise cancellation features, Project VoCo can also apparently generate new words using a speaker's recorded voice. Essentially, the software can understand the makeup of a person's voice and replicate it, so long as there's about 20 minutes of recorded speech. In Jin's demo, the developer showcased how Project VoCo let him add a word to a sentence in a near-perfect replication of the speaker, according to Creative Bloq. So similar to how Photoshop ushered in a new era of editing and image creation, this tool could transform how audio engineers work with sound, polish clips, and clean up recordings and podcasts. "When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative," reads an official Adobe statement. "We have developed a technology called Project VoCo in which you can simply type in the word or words that you would like to change or insert into the voiceover. The algorithm does the rest and makes it sound like the original speaker said those words."
This discussion has been archived. No new comments can be posted.

  • by jenningsthecat ( 1525947 ) on Friday November 04, 2016 @08:11AM (#53211993)

    When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative...

    When recording suspects, police would often like to change or insert a word or a few words in order to manufacture evidence by changing part of the narrative.

    FTFY

    OTOH, if it's really good enough to be undetectable, it might cause a lot of legitimate and unaltered recordings to be thrown out of court on the grounds of reasonable doubt.

    • by John Smith ( 4340437 ) on Friday November 04, 2016 @08:28AM (#53212099)
      I never said that. It was clearly SoundShopped in.
      • by dwywit ( 1109409 )

        This sounds shopped. I can tell by the {samples/notes/octaves/pitch/frequency/timbre/vibrato}

    • Police already know how to prove this sort of thing hasn't been tampered with: it's called "the chain of evidence". These techniques are simply going to have to be applied to recorded testimony, that's all.

      • Police already know how to prove this sort of thing hasn't been tampered with: it's called "the chain of evidence". These techniques are simply going to have to be applied to recorded testimony, that's all.

        First, it's called "Chain of Custody", and it already exists for recorded evidence.

        Second, when the "Custody" continues to be in the LEA's hands, who's to say that someone at the LEA didn't replace the recording in the Evidence Bag with a SoundShopped one?

      • by Anonymous Coward

        Or you know, shouting stuff that isn't happening. STOP RESISTING, GET YOUR HANDS OUT OF YOUR POCKETS, when the person is on the ground handcuffed and being beaten...

    • When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative...

      When recording suspects, police would often like to change or insert a word or a few words in order to manufacture evidence by changing part of the narrative.

      FTFY

      OTOH, if it's really good enough to be undetectable, it might cause a lot of legitimate and unaltered recordings to be thrown out of court on the grounds of reasonable doubt.

      It'll add a whole new level to "Its a shop. I have a lot of experience with shops and also you can tell by the audiopixels".

    • When recording voiceovers, dialog, and narration, people would often like to change or insert a word or a few words due to either a mistake they made or simply because they would like to change part of the narrative...

      When recording suspects, police would often like to change or insert a word or a few words in order to manufacture evidence by changing part of the narrative.

      heheh yeah the advertised use-case scenario is about as believable as private mode in browsers designed for shopping for gifts for your wife without her knowledge. But then again if your wife is regularly checking your internet history you probably have other issues to deal with.

  • Isn't that Adobe Audition?
    • Yeah, pretty much. Though I'd argue that, at least for singers, the photoshop of audio is Melodyne (or autotune, though most pros I know use the former).

  • ...just in time for the next US Presidential Election cycle.
  • by sinij ( 911942 ) on Friday November 04, 2016 @08:24AM (#53212069)
    I have the perfect name for this future product - Adobe Trump. This way when you use it to make people say awful things you are trumping them.
    • by zuki ( 845560 )
      I knew someone was going to beat me to suggesting something like this...
      • by sinij ( 911942 )
        As a side note, it will be interesting to see what this will do to politics. This gives so much plausible deniability to politicians. 47% comment? Edited! Grab by the *&$#? Edited! Then it is paid experts vs. paid shills and the public will not be able to tell the difference.
        • Between the sheer auditability of our digital pasts, the ubiquity of recording equipment, and the ease with which 'embarrassments' could be fabricated... does that lead to a post-shaming society? Or the opposite, and we'll all be buying Reputation Insurance in case we become unemployable because our social media got hacked and spouted off obscenity?
          • by sinij ( 911942 )

            does that lead to a post-shaming society?

            Sir, I will have you know that you are a shameless optimist. We don't tolerate your kind around these parts. Please show yourself out.

        • by e3m4n ( 947977 )

          yes, you will have experts at lip reading and body language arguing both for and against the authenticity of a recording. So just like attorneys, these paid 'experts' are going to be the only winners in this shit. In the end you will still be as confused as ever, but a lot poorer as a result.

  • I am pretty sure it was posted on Slashdot (can't find it), but the Boston Globe reported in 2002 that scientists at MIT could convincingly alter video to make it appear that someone said something they didn't, with only 2 minutes of footage:

    http://www.rense.com/general25... [rense.com]

    (Link to article on Boston Globe is dead.) They couldn't alter the audio convincingly, or at least didn't try. However, I also recall seeing on Slashdot (10+ years ago; also can't find it) that someone (Bell? MIT?) could take about 2,000

    • Post-truth politics won't matter when someone releases convincingly altered video and audio of a public figure doing something that they never did.

      There will be a need for cryptographically verifiable video, where you can prove what camera video came from, and what happened (or didn't happen) to it during the editing process. It's not impossible. It's just hard and there will be a gap between needing it (now) and when it happens.

      • by Anonymous Coward

        Cryptographically verifiable video's been available for well over a decade from some professional DVRs. It's just not in common use. Typical IP cameras and the like don't sign their video, but a good recording device usually will.
        You can still spoof the video or audio feed into the recorder, but the path from the DVR to the courts can be demonstrated to be unaltered.
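
        To make the "path from the recorder to the courts" point concrete, here is a minimal sketch of tamper-evident recording, not a description of what any real DVR does. It assumes the recording is split into segments and that the recorder holds a secret key; the function names `sign_segments` and `verify_segments` are hypothetical. It uses an HMAC chain from the Python standard library as a stand-in for the public-key signatures a real device would more plausibly use.

```python
import hashlib
import hmac

GENESIS = b"\x00" * 32  # fixed starting value for the chain (an assumption)


def sign_segments(segments, key):
    """Return one HMAC-SHA256 tag per segment, each chained to the previous tag."""
    tags = []
    prev = GENESIS
    for seg in segments:
        # Mixing in the previous tag binds each segment to its position.
        tag = hmac.new(key, prev + seg, hashlib.sha256).digest()
        tags.append(tag)
        prev = tag
    return tags


def verify_segments(segments, tags, key):
    """Recompute the chain and report whether every segment still matches."""
    if len(segments) != len(tags):
        return False
    prev = GENESIS
    for seg, tag in zip(segments, tags):
        expected = hmac.new(key, prev + seg, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False
        prev = tag
    return True
```

        Editing or reordering a segment after signing makes verification fail. As the comment above says, this does nothing about a spoofed feed going into the recorder, and this sketch does not detect matching segments and tags being trimmed off the end together.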

  • Can do this already (Score:3, Informative)

    by The Eight-Bit Link ( 2447312 ) on Friday November 04, 2016 @08:56AM (#53212251)
    I did this sort of thing for a class project with Audacity. The person I was working with constantly flubbed their lines, so I had to stitch their lines together using things they didn't screw up until I had completed lines. It's really not hard, this just automates the process.
    • by barakn ( 641218 )

      You apparently didn't even bother to read the summary. You stitched together things that were said. This creates things that weren't said.

  • Others have called out how this will impact politicians and law enforcement. On a slightly different note, how many voice actors have recorded twenty minutes of dialog in the past? How many of their contracts give them control over how the movie studios use those recordings or mandate that the studios give them royalties for using "remixed" versions of their voices?

    For example why pay (a bunch of money to) Mark Hamill to provide the voice for a new animated version of the Joker when you can use this tool
    • Even if that were true, the voice actor certainly has control over their name.

      The studio would not be able to claim the actor plays a part in any of the promotional material for the movie.

      • Sequels.

        Especially low-budget / direct-to-video. The characters are established, so you only need to sell them on its existence. No need to use the actor's name. The actor signed away their "likeness" for promotion of the original movie. One could always argue that the sequel is "promotion" for sales of the original movie on DVD/Blu-Ray.

        • by dwywit ( 1109409 )

          Contracts are about to get a bit more specific - you can always use a different actor in a sequel, but big names (e.g. Mark Hamill) aren't going to sign contracts allowing a production studio to continue to use their voice over someone else's face.

      • by dwywit ( 1109409 )

        Also, what's to stop the actor from publicly stating that he or she didn't record the audio for this particular project?

        There's always contracts, I suppose, but any decent agent will include clauses prohibiting the production company from creating an entirely "synthetic" performance - small alterations would be OK, it'll save a lot of money having to do pickups.

    • will argue that it's different enough

      I'm sure the original contracts included provisions for using the artist's "likeness." Most people assumed photos, but who's to say they can't just interpret the contract to have already included these rights?

    • On a slightly different note, how many voice actors have recorded twenty minutes of dialog in the past? How many of their contracts give them control over how the movie studios use those recordings or mandate that the studios give them royalties for using "remixed" versions of their voices?

      "In a world..."

      (And, of course, the obligatory link. [youtube.com] I can't hear that phrase without thinking of this. Although Pablo Francisco does a pretty good job. [youtube.com])

  • Currently, politicians and the powerful elites are rarely heard from in person, anyway. We get to see the results of their secret meetings and closed-door sessions through carefully crafted press releases and the societal changes we see every day. The controllers, I'm sure, positively love this technology, because it will give them an additional outlet to turn the screws on the little guys.

    Think about it, ubiquitous mobile video was probably the last tool that was still on the side of the people. Now, when

  • Like Photoshop, Project VoCo is designed to be a state-of-the-art audio editing application.

    It's in TFA, so I guess /. isn't to blame. Nice job, Verge editor.

  • Our entire historical "record" is suspect, not even audio can be believed anymore....

  • Innocent my ass. (Score:4, Insightful)

    by geekmux ( 1040042 ) on Friday November 04, 2016 @09:07AM (#53212307)

    "...So similar to how Photoshop ushered in a new era of editing and image creation, this tool could transform how audio engineers work with sound, polish clips, and clean up recordings and podcasts."

    Enough with this "innocent" sales bullshit. I am far more concerned about how this tool can and will be used against me, in a court of law, forcing me to hire enough expertise to defend against shit I never said.

    The average citizen can't even remotely afford a good legal defense these days. This is going to make that even more difficult by having to hire appropriate audio experts to analyze audio recordings to determine if they've been manipulated or not.

    And no, this isn't like Photoshop, where often the only tool that is necessary to validate manipulation is the human eye and common sense (yeah, I'm talking to you magazine editors, who still feel the need to digitally alter some of the most naturally beautiful humans on the planet.)

    • by ghoul ( 157158 )

      Well this will just lead to all audio evidence being thrown out and only eyewitness accounts being considered evidence.

      • Because it's so terribly difficult to tamper with perception and memory.

      • Well this will just lead to all audio evidence being thrown out and only eyewitness accounts being considered evidence.

        Sorry, but you are fucking delusional.

        In a courtroom, all audio will be accepted as authentic, and the accused will be responsible for financing the experts to prove otherwise.

        This will be done in this way in order to support the business community that offers expert testimony and expert analysis to refute otherwise, much in the same way that the legal community only recognizes those with a medical degree when providing legal testimony against related cases.

        In short, a courtroom isn't going to recognize you

  • by Joe_Dragon ( 2206452 ) on Friday November 04, 2016 @09:19AM (#53212359)

    voice authentication systems to be bypassed with ease now.

  • How long before we get Adobe DNA editor where DNA sequences from blood found at the crime scene can be edited to match the DNA of the suspect?

  • by RogueWarrior65 ( 678876 ) on Friday November 04, 2016 @09:55AM (#53212633)

    Can it add subliminal suggestive messages too? Of course, I'm being facetious but when, not if, this technology is misused, wouldn't it be a good idea to embed subliminal audio watermarking so that juries and the media will know that the audio is faked?

  • by ripvlan ( 2609033 ) on Friday November 04, 2016 @10:06AM (#53212701)

    The audio from Neil Armstrong can finally be corrected.

  • > Project VoCo can also apparently generate new words using a speaker's recorded voice

    In the Godfather saga, mafia top dog Don Vito di Corleone refused to ever speak on the phone, for fear of the FBI recording his words and editing the tapes into fake conversations. His wisdom retro-actively justifies the high position he achieved in the Cosa Nostra.

    In other news:

    > the software can understand the makeup of a person's voice and replicate it, so long as there's about 20 minutes of recorded speech

    The sam

  • What if this technology is already available and being used in combination with man-in-the-middle attacks for the modification of communications in real time? A state-sponsored malicious actor can even start wars between unsuspected countries. Governments don't need to wait for Adobe to write software for their cyberwar arsenals.

    The only way to (try and) guard against this that I can think of is cryptographically signing and verifying all important communications, whether between country leaders or between

  • So you say you have this technology that creates words that people never actually said.
    But it sounds just like their voice.
    Clearly this is how Skynet begins!

    Sarah: "No, I can't tell you where I am mom. I was told not to say."
    Mom: "Oh, but honey, I need to know where I can reach you. You tell me to hide out here in the cabin like some kind of fugitive and you won't tell me what's going on? I am worried sick dear."
    Sarah: "Ok. Here's the number...."
    Mom: "Ok. Go ahead.... Uh-huh. I've got it."
    Sarah: "I love yo

  • If this happens...audio recordings should no longer be considered as valid or legal evidence. Now someone can actually not have said something but have a recording of them "saying it".

    Of course, should Trump win...there will be no fair trial and just a police state of Judge Dredds running around killing people for breathing wrong.
  • ...and I notice she was sitting on her SWEET CAN I GRAB HER SWEET CAN

  • Hopefully this will mean we'll see more John Wayne movies...
  • Lol, no way will this ever be abused or used for nefarious purposes. *cough*

    It's getting to the point where no amount of "evidence" will be able to "prove" or "disprove" anything.

    I have incontrovertible photos, video, and audio that show you killed Bob Smith, and you have incontrovertible photos, video, and audio that show you didn't. As for 3rd party witnesses, maybe their audio/video data was hacked and modified, and maybe it wasn't. Who can say?

  • ...similar to how Photoshop ushered in a new era of editing and image creation, this tool could transform how audio engineers work with sound, polish clips, and clean up recordings and podcasts.

    And similar to how Creative Cloud has made vassals out of people who work on photographs, videos, websites, and presentations, Project VoCo will do the same to people who work with audio.

  • A giant leap backwards for forensics

  • Releasing software allowing the editing of spoken words in audio recordings is probably the best way to ensure people know this capability exists. Everyone knows about Photoshop and the kinds of things it makes possible so that the old phrase "the camera never lies" is known to be obsolete. If audio speech editing capabilities were somehow kept from the public the potential for abuse would be much greater.
