In this tutorial, we will use Whisper to transcribe a YouTube video. We will use the Python package Pytube to download the video's audio as an MP4 file. You can find the Pytube repository here.
First, we need to install the Pytube library. You can do this by running the following command in your terminal: pip install pytube
For this tutorial, I'll be using this "Python in 100 Seconds" video.
Next, we import Pytube, pass it the link to the YouTube video, and download the audio as an MP4 file:
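A minimal sketch of this step, assuming the pytube API (YouTube, streams.get_audio_only() and Stream.download()). The download_audio helper and the VIDEO_URL placeholder are hypothetical names, so replace the placeholder with the link to your video. get_audio_only() picks the highest-bitrate audio-only stream, which pytube requests as MP4 by default, so no separate conversion step is needed.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the URL of the video to transcribe
# (for example, the "Python in 100 Seconds" video used in this tutorial).
VIDEO_URL = "https://www.youtube.com/watch?v=VIDEO_ID"


def download_audio(url: str, output_dir: str = ".") -> Path:
    """Download the audio-only MP4 stream of a YouTube video and return its path."""
    from pytube import YouTube  # provided by the pytube package

    yt = YouTube(url)
    # Highest-bitrate audio-only stream with the "mp4" subtype (pytube's default)
    stream = yt.streams.get_audio_only()
    if stream is None:
        raise RuntimeError(f"No audio-only stream found for {url}")
    # download() saves the file (named after the video title) and returns its full path
    return Path(stream.download(output_path=output_dir))
```

Calling download_audio(VIDEO_URL) saves the file in the current directory and returns its path, which you can later hand straight to Whisper.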
The output is a file named after the video title, saved in your current directory. In our case, the file is named Python in 100 Seconds.mp4. The next step is to convert the audio into text, which Whisper lets us do in three lines of code: first we install and import Whisper, then we load the model, and finally we transcribe the audio file.
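Installing the Whisper library (pip install -U openai-whisper). Whisper also requires the ffmpeg command-line tool to be installed on your system.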
Next, we load the model. We'll use the "base" model for this tutorial; you can find more information about the available models here. Each model trades accuracy against speed and compute: larger models are generally more accurate but slower and more resource-intensive.
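A small sketch of model loading, assuming the openai-whisper package: whisper.available_models() lists the checkpoint names your installed version knows about, and whisper.load_model() downloads the weights on first use and caches them locally. The load_whisper_model helper is a hypothetical name used only for illustration.

```python
def load_whisper_model(name: str = "base"):
    """Load a Whisper checkpoint by name, e.g. "tiny", "base", "small", "medium"."""
    import whisper  # provided by the openai-whisper package

    # Fail early with a helpful message if the name is not a known checkpoint
    available = whisper.available_models()
    if name not in available:
        raise ValueError(f"Unknown model {name!r}; choose from {available}")
    # Weights are downloaded the first time a model is requested, then reused
    model = whisper.load_model(name)
    print(f"Loaded Whisper {name!r} on {model.device}")
    return model
```

For example, load_whisper_model("base") returns the model used in the rest of this tutorial; trying "tiny" or "small" instead is a quick way to see the speed/accuracy tradeoff for yourself.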
Finally, we transcribe the audio file and print the output.
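A sketch of the transcription step, assuming the openai-whisper API: model.transcribe() returns a dictionary whose "text" key holds the full transcript and whose "segments" key holds timed chunks with "start", "end" and "text" fields. The helpers transcribe_and_print and format_timestamp are hypothetical names, and the timestamped printout is an optional extra on top of printing the plain text.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS.mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def transcribe_and_print(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper, print the result, and return the text."""
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_name)
    # transcribe() decodes the file via ffmpeg and returns a dict of results
    result = model.transcribe(audio_path)
    print(result["text"])
    # Optional: one line per segment with its start and end times
    for seg in result["segments"]:
        start, end = format_timestamp(seg["start"]), format_timestamp(seg["end"])
        print(f"[{start} -> {end}] {seg['text'].strip()}")
    return result["text"]
```

Running transcribe_and_print("Python in 100 Seconds.mp4") prints the transcript of the downloaded file, followed by one timestamped line per segment.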