Show HN: PDF to Podcast – Convert Any PDF into a Podcast Episode

simonw · 2024-06-13T00:18:09 1718237889

I always go straight for the prompt with this kind of thing - it's here: https://github.com/knowsuchagency/pdf-to-podcast/blob/512bfb...

It starts like this:

    Your task is to take the input text provided and turn it into
    an engaging, informative podcast dialogue. The input text may
    be messy or unstructured, as it could come from a variety of
    sources like PDFs or web pages. Don't worry about the
    formatting issues or any irrelevant information; your goal is
    to extract the key points and interesting facts that could be
    discussed in a podcast.

The way this uses different OpenAI TTS voices for the different roles is really neat!
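A rough sketch of the voice-per-role idea follows. It is not the project's code. It assumes the generated script puts each turn on one `Speaker: text` line, and it only handles the speaker-to-voice mapping. The voice names are OpenAI's preset TTS voices, and the actual TTS call is left out.

```python
import re
from itertools import cycle

# OpenAI's preset TTS voice names; the order here is arbitrary.
VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"]

# Assumed script format: one "Speaker: text" line per turn.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+)$")


def assign_voices(script: str, voices: list[str] = VOICES) -> list[tuple[str, str, str]]:
    """Split a 'Speaker: text' script into (speaker, voice, text) segments.

    Each new speaker gets the next voice in the list, so a given speaker
    keeps the same voice for the whole episode.
    """
    voice_iter = cycle(voices)
    speaker_voice: dict[str, str] = {}
    segments = []
    for line in script.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything not in "Name: text" form
        speaker, text = match.group(1), match.group(2).strip()
        if speaker not in speaker_voice:
            speaker_voice[speaker] = next(voice_iter)
        segments.append((speaker, speaker_voice[speaker], text))
    return segments
```

Each segment's text and voice could then be sent to the speech endpoint (with OpenAI's Python SDK, `client.audio.speech.create(model="tts-1", voice=..., input=...)`), and the resulting clips concatenated in order. Turns that span several lines would need a smarter parser than this one.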

zachthewf · 2024-06-13T02:37:12 1718246232

I wonder what impact (if any) the leading spaces on each line of the multiline string have. They're an artifact of wanting to keep the prompt pretty within the code.

Hopefully not much, but I've heard horror stories about trailing spaces...

simonw · 2024-06-13T03:09:25 1718248165

As far as I can tell that only really affects the smaller models - GPT-4 / Claude / Gemini all seem pretty much impervious to weird whitespace in my experience.
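If the leading whitespace is a concern, Python's standard library can strip it while the source stays indented. This is a minimal sketch with a shortened prompt, not the project's actual code:

```python
import textwrap


def build_prompt() -> str:
    # Indented to match the surrounding code, as it would be in a real module.
    prompt = """
        Your task is to take the input text provided and turn it into
        an engaging, informative podcast dialogue.
    """
    # textwrap.dedent removes the whitespace prefix shared by every line;
    # .strip() then drops the leading and trailing blank lines.
    return textwrap.dedent(prompt).strip()
```

`inspect.cleandoc` does a similar job (it is what Python uses to clean up docstrings), and both approaches leave relative indentation inside the prompt intact.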

cchance · 2024-06-13T02:39:31 1718246371

I imagine you could push this even further by specifying the names of the researcher and interviewer, and giving details of the structure of the episode.
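One way to do that is a prompt template that fixes the speaker names and spells out an episode outline. The helper below is hypothetical, and so is its wording; picking obviously fictional names also avoids the model borrowing a real person's identity.

```python
def build_dialogue_prompt(host: str, guest: str, sections: list[str]) -> str:
    """Hypothetical helper: pin the speaker names and the episode structure."""
    # Number the outline so the model can follow it section by section.
    outline = "\n".join(f"{i}. {section}" for i, section in enumerate(sections, start=1))
    return (
        "Turn the input text into a podcast dialogue.\n"
        f"The host is named {host} and the guest is named {guest}; "
        "use only these two speakers and prefix every line with the speaker's name.\n"
        "Follow this episode structure:\n"
        f"{outline}"
    )
```

For example, `build_dialogue_prompt("Alex", "Sam", ["Intro", "Key findings", "Wrap-up"])` ends with a numbered three-part outline.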

cush · 2024-06-13T02:17:52 1718245072

It might be a good idea to toss some kind of audio disclaimer at the beginning of the podcast that cites the source and says the audio is completely fabricated. Reason being, the "Attention is All You Need" example on your site has Anya Sharma (an actual AI researcher who is unrelated to the Attention paper) on as a guest. Not sure if this is intentional or a hallucination, but it seems like a huge liability.

weare138 · 2024-06-13T03:42:09 1718250129

I tried the example physics article and it just made up a physicist to 'interview' that wasn't mentioned in the article.

xzjis · 2024-06-13T09:27:34 1718270854

Human podcasters hallucinate too.

cush · 2024-06-13T13:59:43 1718287183

Yeah definitely. I specifically only have dead presidents on my webdev podcast. When I interviewed George Bush Sr. last week, he said the housing crisis is overblown and we should all be focusing on moving our React apps to Vue

wenbin · 2024-06-13T00:35:56 1718238956

Awesome project!

However, I find that when I realize a podcast is generated using AI and synthetic audio, I immediately lose interest. For me, the value of podcasts lies in authentic human conversations, and AI-generated content just doesn’t have the same appeal.

Probably it's just me being obsessed with old-school podcasts, though. I do believe there are listeners (not sure if many or few) who don't mind if a podcast is AI-generated.

malloryerik · 2024-06-13T02:47:18 1718246838

Funny, I've been using even primitive text-to-speech on PDFs for years, and while nothing compares to an excellent human reader, I often find TTS better than a mediocre human reading. This is mainly because I don't get upset at (and then have to forgive) a machine when it says "Loovree" instead of "Louvre", or when, in an economic history book, it pronounces John Maynard Keynes as "Keens" (it should sound like "Kaynes"). Also, the dead neutrality of a machine's reading can jar me less than a numbskull and/or phony human rendition. I must say, though, that excellent voice actors are heaven to me.

stufffer · 2024-06-13T15:28:43 1718292523

What extensions/apps would you recommend?

I have tried to set up something similar with a text-to-speech browser extension, but I lose my place if I have to close and reopen.

malloryerik · 2024-06-14T03:17:00 1718335020

On a Mac, with a PDF I just select, say, a chapter and let it read. Footnotes can be a problem. I usually use iOS now, and although I wrote "PDF" before, I realize I mostly use .epub files. You can set up iOS to read entire pages. I use the built-in iOS Books app and have it set so a two-finger swipe from the top of a page starts reading. It will usually turn pages by itself but can be a bit janky. I chose a good-quality voice and spent ten or twenty minutes rigging it up in Settings.

It's all far from perfect.

satvikpendem · 2024-06-13T04:08:02 1718251682

That's interesting. For me, podcasts are just news articles or books that I don't have the time to sit down and read. The only time I listen to podcasts and audiobooks is when I am walking around or doing chores. Yes, many podcasts have a human element that is nice, but just as many are still useful without a human; for those, I'm primarily there for the information itself, not for who conveys it.

keiferski · 2024-06-13T06:25:26 1718259926

It’s almost certainly the case that the most profitable and popular podcasts are ones built around the personality of their host(s) and not because the content is merely in audio form. So while this tool is useful for listening to information instead of reading it, the likelihood of a major podcast being entirely AI-generated is pretty low.

spaceship__sun · 2024-06-13T01:57:20 1718243840

Just a tangent: fans are obsessed with certain artists, say, TSwift, because of their personality rather than purely their voice and lyrics. That's why concerts are so fucking popular.

leobg · 2024-06-13T07:40:43 1718264443

I tried the same thing for my kids:

Take some article or book written for adults. Maybe some archaeological discovery, interesting stuff from HN. Or science books from the 1960s.

Then have it turned into a conversation between a father and his curious seven-year-old daughter, and convert it to audio with two different speakers.

While it’s been fun to build this, I never ended up letting my kids use it. It just feels wrong. The educational equivalent of Harlow’s monkeys.

randomcarbloke · 2024-06-19T08:31:00 1718785860

Why does this feel wrong? It seems like, supervised, it could be very beneficial.

david1542 · 2024-06-12T12:47:50 1718196470

Looks good. As other people said, it's risky to give you my OpenAI key, so I'd make the app run locally, maybe with React. Moreover, it'd be good to show an approximate price up front. It's kinda scary to click "Submit" and later on see that I was charged $3 by OpenAI.

rahimnathwani · 2024-06-13T00:19:38 1718237978

The page has a link to the code, so I guess you can self-host it: https://github.com/knowsuchagency/pdf-to-podcast

edward-ca · 2024-06-12T16:16:26 1718208986

Looks like a fun project!

Do you have any samples of the audio? It would be great to hear what it's like before trying it out.

Also, have you considered doing this all in client-side JS? It would be a good way to protect the API key (at least in this demo case).

elicash · 2024-06-13T01:45:56 1718243156

At the bottom of the page, there are examples.

eggbrain · 2024-06-12T16:13:28 1718208808

I think it would probably help to take the PDF up front, do a combination of checking the DPI and page count to get an estimated word count (as OCRing to get an exact word count might be costly on your end), and then return back a “price preview” at which point the customer just pays the price to get their podcast.

Like others have mentioned, I’d be scared to accidentally upload a 100 page PDF only for it to cost me $100 without me really knowing up front.
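A page-count-based "price preview" could look roughly like this. It is only a sketch: the words-per-page figure and the per-1,000-words rate are placeholder assumptions, not real OpenAI prices, and a real quote would have to be derived from current LLM and TTS pricing plus any margin.

```python
from dataclasses import dataclass


@dataclass
class PricePreview:
    pages: int
    estimated_words: int
    estimated_cost_usd: float


def price_preview(page_count: int,
                  words_per_page: int = 500,
                  usd_per_1k_words: float = 0.10) -> PricePreview:
    """Rough up-front quote from the page count alone (no OCR).

    Both defaults are placeholders: words_per_page varies a lot with layout,
    and usd_per_1k_words stands in for the combined LLM + TTS cost.
    """
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    words = page_count * words_per_page
    cost = round(words / 1000 * usd_per_1k_words, 2)
    return PricePreview(page_count, words, cost)
```

The page count itself is cheap to get without OCR; with the `pypdf` library, for example, `len(PdfReader(path).pages)`. The user would confirm the preview before any API calls are made.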

unraveller · 2024-06-13T08:19:15 1718266755

Sounds exactly like the way the Simply News podcast is put together. It's 100% AI for each topic (AI, tech, business, science, etc.) and combines multiple recent papers/stories into hundreds of daily podcasts.

https://simply-ai.podbean.com

https://www.simplynews.ai

InfiniteLoup · 2024-06-12T08:05:02 1718179502

Love the idea, as I never find enough time to sit down and read but could listen to it while running or commuting. However, I'm hesitant to hand over my OpenAI API key to a website that's not under my control. No idea how the trust problem can be solved, though.

WhackyIdeas · 2024-06-12T23:57:08 1718236628

All I can think of is some form of middle man.

An escrow agent.

mikae1 · 2024-06-13T05:20:18 1718256018

Cool! But the really cool thing would be a service that converts the contents of a text RSS/Atom feed into a podcast with a podcast feed. Imagine your favorite blogs being podcasts that you could listen to on the go.

Swizec · 2024-06-13T05:29:47 1718256587

I’ve been poking at this on and off! Got to the point where you can use a CLI command to turn any URL into a podcast episode. Even describes images and embedded code snippets so you can get the full experience.

Got distracted by other priorities so I haven’t done the RSS bits yet. In large part because that’s just boring old engineering stuff instead of playing with new toys. But I intend to get back and finish this thing by the time I start training for my next marathon. Need lots of listening material when that happens :)

Until then, hope this helps: https://github.com/Swizec/rss-to-podcast
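The feed side of the RSS-to-podcast idea can be fairly small once the audio exists. This sketch, which is not taken from the linked repo, assumes each episode's MP3 is already hosted at a public URL and emits a plain RSS 2.0 feed with `<enclosure>` elements. Podcast directories usually also expect iTunes-namespace tags, which are omitted here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_podcast_feed(title: str, link: str, episodes: list[dict]) -> str:
    """Build an RSS 2.0 podcast feed.

    Each episode dict is assumed to have: title, audio_url,
    size_bytes, and published (a timezone-aware datetime).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Audio versions of {title}"
    for ep in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = ep["title"]
        ET.SubElement(item, "guid").text = ep["audio_url"]
        # RSS 2.0 uses RFC 822-style dates, which email.utils produces.
        ET.SubElement(item, "pubDate").text = format_datetime(ep["published"])
        # The enclosure is what podcast apps actually download.
        ET.SubElement(item, "enclosure", url=ep["audio_url"],
                      length=str(ep["size_bytes"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
```

Regenerating this file whenever a new blog post has been converted, and serving it from a static URL, is enough for most podcast players to subscribe.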

rubiquity · 2024-06-13T05:14:05 1718255645

How do you tell it where to put the Athletic Greens advertisements?

archsurface · 2024-06-13T01:12:20 1718241140

I never listen to a podcast at less than 1.5x because there is already too much crap conversation and I only want the nuggets of value, so I would only use TTS for listening to text.

m4rc3lv · 2024-06-13T07:06:55 1718262415

This is cool! It works nicely. Too bad the audio is only in English, even if you submit a PDF in another language.

WalterBright · 2024-06-13T02:34:04 1718246044

Should add an option to get Swedish Chef output, Bork Bork!

iancmceachern · 2024-06-13T02:37:12 1718246232

I want this too

elphinstone · 2024-06-12T08:26:25 1718180785

Can I download as an mp3 for later playback or archiving?

ofcrpls · 2024-06-13T00:04:59 1718237099

If the examples are anything to go by, then yes, they provide a link to download an MP3.

cchance · 2024-06-13T02:38:24 1718246304

Can you imagine this, with an RVC pass to do voice transfer... what a time to be alive.

Just wondering: why choose OpenAI TTS instead of ElevenLabs?

iJohnDoe · 2024-06-13T05:41:53 1718257313

Congrats on launch! Brilliant.

spaceship__sun · 2024-06-13T01:57:41 1718243861

Nice work, but I gotta provide my own OAI key? Why not just run one of the API demos at that point?

davidw · 2024-06-13T00:36:56 1718239016

Can someone make something going the other way?

I don't like podcasts. I tune out after about 30 seconds of chit-chat and intros and blah blah blah, and end up missing stuff, and I can't search for it or copy and paste it.

contingencies · 2024-06-13T00:44:11 1718239451

Agreed. It's so annoying having good content buried in audio. For interviews of note on YouTube, last week I cracked and spent 2 hours writing a yt-dl-based ripper that converts the whole thing into a linkified HTML page via intermediate VTT subtitles. It opens the resulting transcript in the browser so you can easily scan it and click anywhere you want to see the video, and the video opens in a new window at exactly that point. Not perfect, but it saves AGES.
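The VTT-to-linked-transcript step might look something like the sketch below. It is not the commenter's script. It assumes the `.vtt` file has already been downloaded (yt-dlp's `--write-auto-subs` and `--skip-download` options can do that) and links each caption cue to YouTube's `&t=<seconds>s` start-time parameter.

```python
import html
import re

# WebVTT cue timing line, e.g. "00:01:02.500 --> 00:01:05.000" (hours optional).
TIMING_RE = re.compile(r"^(?:(\d+):)?(\d{2}):(\d{2})\.\d{3}\s+-->")
# Inline tags found in auto-generated captions, e.g. <c> or <00:00:01.000>.
TAG_RE = re.compile(r"<[^>]+>")


def vtt_to_html(vtt_text: str, video_id: str) -> str:
    """Turn a WebVTT file into an HTML page where each cue links to its timestamp."""
    paragraphs = []
    # Cues are separated by blank lines; header and NOTE blocks have no timing line.
    for block in re.split(r"\n\s*\n", vtt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING_RE.match(line)
            if not match:
                continue  # cue identifier, or a block that is not a cue
            hours, minutes, secs = match.groups()
            start = int(hours or 0) * 3600 + int(minutes) * 60 + int(secs)
            raw = " ".join(cue_line.strip() for cue_line in lines[i + 1:] if cue_line.strip())
            text = html.unescape(TAG_RE.sub("", raw))
            if text:
                url = f"https://www.youtube.com/watch?v={video_id}&t={start}s"
                paragraphs.append(
                    f'<p><a href="{html.escape(url)}" target="_blank">{html.escape(text)}</a></p>'
                )
            break  # a cue has exactly one timing line
    return "<!DOCTYPE html>\n<html><body>\n" + "\n".join(paragraphs) + "\n</body></html>"
```

`target="_blank"` gives the "opens in a new window at exactly that point" behavior. Auto-generated captions often repeat text across overlapping cues, so a deduplication pass would be a natural next step.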

simonw · 2024-06-13T00:39:30 1718239170

I run MacWhisper on my laptop, and often dump podcast MP3s into it, extract the Whisper transcript and then feed that through a long context model like Claude 3 Haiku/Opus or Gemini Pro 1.5/Gemini Flash using my https://llm.datasette.io/ tool to answer questions against that transcript.
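A minimal sketch of the "ask questions against a transcript" step, driving the `llm` CLI from Python. It assumes `llm` is installed and on PATH, and that the model ID used here (`claude-3-haiku`) is provided by an installed plugin. Model IDs depend on which plugins you have, and `llm models` lists them. The transcript goes in on stdin and the question is passed as a system prompt with `-s`.

```python
import shutil
import subprocess


def build_llm_command(question: str, model: str = "claude-3-haiku") -> list[str]:
    # -m selects the model; -s sends the question as the system prompt,
    # while the transcript itself arrives on stdin.
    return ["llm", "-m", model, "-s", question]


def ask_transcript(transcript_path: str, question: str,
                   model: str = "claude-3-haiku") -> str:
    """Pipe a transcript file into the llm CLI and return the model's answer."""
    if shutil.which("llm") is None:
        raise RuntimeError("the llm CLI is not on PATH (try: pip install llm)")
    with open(transcript_path, encoding="utf-8") as f:
        transcript = f.read()
    result = subprocess.run(
        build_llm_command(question, model),
        input=transcript,
        capture_output=True,
        text=True,
        check=True,  # raise if llm exits with an error (e.g. missing API key)
    )
    return result.stdout
```

The shell equivalent is roughly `cat transcript.txt | llm -m claude-3-haiku -s 'What did they say about X?'`. With a long-context model, the whole transcript fits in one prompt, so no chunking is needed.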

barfbagginus · 2024-06-13T00:24:43 1718238283

I don't like podcasters because they usually muddle through stuff and approach things in a kind of non-productive superficial way that drives easy engagement rather than hard work results.

That said, if it's a topic that I'm really, really ignorant about, a little podcast/YouTube can be helpful. For example, Yannic Kilcher's YouTube videos, especially how he annotates and breaks down the math equations, can be very useful if the paper's domain is new to me.

I think about it as pre-reading the paper.

A more focused first and second reading mode, may I propose, would add even more value. In these modes, the paper would be read more faithfully.

A problem that text to speech has when you feed it a regular PDF is that it will choke on titles, headings, footers, inline citations, page numbers, acronyms, abbreviations, numerical tables, charts, and diagrams.

So I would like to build or see something that conversationally reads the PDF as if it were a peer reading to me, unpacking abbreviations, mentioning titles and authors and years of citations (when I want that), describing charts, and perhaps even letting me interrupt to discuss specific misunderstandings I'm having.

There's obviously a challenge that reading a paper is an active engagement depending on your own knowledge state. We might gloss over formulas, footnotes, and citations on a first read, for example.

Still, a low-hanging fruit would be a converter mode that accurately strips out page numbers and headers. There is little in this world more aggravating than listening to a 30-page paper and having to hear the paper's title and authors repeated an additional 15 times because it's reading the header.
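A simple heuristic gets surprisingly far on that header problem: treat a line as a running header or footer if it shows up near the top or bottom of many pages, and drop edge lines that are just page numbers. This is a sketch under assumptions. It expects the text already extracted one string per page, and the `edge_lines` and `min_fraction` thresholds are arbitrary guesses to tune.

```python
import re
from collections import Counter

# A line that is only a page number: "7", "Page 7", "7 / 30", "Page 7 of 30".
PAGE_NUMBER_RE = re.compile(r"^\s*(?:page\s+)?\d+(?:\s*(?:/|of)\s*\d+)?\s*$", re.IGNORECASE)


def strip_running_headers(pages: list[str], edge_lines: int = 3,
                          min_fraction: float = 0.5) -> list[str]:
    """Remove repeated header/footer lines and bare page numbers from page edges.

    A line counts as a running header/footer if it appears within the first
    or last `edge_lines` lines (edge_lines >= 1) of at least `min_fraction`
    of the pages. Only edge lines are removed, so body text is left alone.
    """
    split_pages = [page.splitlines() for page in pages]

    # Count each distinct non-blank edge line at most once per page.
    counts = Counter()
    for lines in split_pages:
        edges = {line.strip() for line in lines[:edge_lines] + lines[-edge_lines:] if line.strip()}
        counts.update(edges)
    threshold = max(2, min_fraction * len(pages))  # never strip from a 1-page doc
    repeated = {line for line, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in split_pages:
        last_edge_start = len(lines) - edge_lines
        kept = []
        for i, line in enumerate(lines):
            at_edge = i < edge_lines or i >= last_edge_start
            if at_edge and (line.strip() in repeated or PAGE_NUMBER_RE.match(line)):
                continue  # running header/footer or bare page number
            kept.append(line)
        cleaned.append("\n".join(kept))
    return cleaned
```

Restricting removal to page edges is the main design choice: it keeps the title when it legitimately appears in the body and avoids deleting numeric table cells.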

barfbagginus · 2024-06-13T07:27:04 1718263624

Stop downvoting me you delirious sheeple!

I accidentally wrote 'podcasters' instead of 'podcasts'.

I mean, I'll grant that podcasters are the scum of the Earth, but I didn't intentionally mean to insult them there. [Here I'm just doing it for fun, lol.]

And I swear to God and warn you!

You all are going to make me start a podcast if you end up downvoting this comment too! Is that what the world needs!?? For me to start an AI generated podcast!?!! Don't make me do it!!!