2023-11-13

Let's explain what Whisper is: OpenAI's artificial intelligence system for transcribing audio files to text. There are many tools that transcribe audio to text, but most of them tend to fail. This AI, whose v3 version has just been presented, aims to offer the best results.

We will start this article by explaining in simple terms what Whisper is and how it works internally. Then we will finish by showing you two ways to use it for free to transcribe your audio.

What is Whisper

Whisper is a technology that uses artificial intelligence to transcribe audio. You upload an audio file to the system, and it analyzes everything that is said in the recording and writes it out as text so you don't have to.

In some jobs, such as journalism, many professionals have to transcribe interviews. This is normally a tedious task: you listen to the audio and write down everything that is said, stopping from time to time and investing a large amount of time and effort. With this tool, the transcription is done by an AI.

Most classic free tools tend to make too many errors: they confuse words, put them in the wrong place, and even invent numbers or leave out expressions. You end up having to review everything, so they don't save much time.

What OpenAI proposes is a much more reliable tool for making your transcriptions. It is not free of occasional errors, but it is much more accurate than most alternatives, and it is very fast. What's more, it can be used for free.

How Whisper works





Whisper, in its current third version, is an automatic speech recognition (ASR) system. It uses artificial intelligence to process an audio file that you send it, analyze the content, detect all the words that are spoken, and then write out in text what is said in the audio.

To achieve this, the third version has been trained with more than a million hours of audio, considerably more than the 680,000 hours used for the second version. As a result, errors have been reduced by between 10 and 20 percent.

Currently, Whisper has an error rate of less than 5% when transcribing Spanish, which makes it one of the best tools for the job. It can also transcribe English and other languages, and even detect when you switch from one language to another during the conversation in the audio.
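
To give an idea of what an error rate like this means: the article does not say which metric is behind the figure, but speech recognition systems are commonly measured with the word error rate (WER), which is the number of word substitutions, insertions and deletions needed to turn the transcription into the correct text, divided by the number of words in the correct text. The following is a minimal sketch of that calculation, assuming WER is the metric in question; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # a reference word is missing (deletion)
                curr[j - 1] + 1,     # an extra word was written (insertion)
                prev[j - 1] + cost,  # same word, or one word swapped for another
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")  # 16.7%
```

Under this definition, an error rate below 5% means fewer than one wrong word in every twenty.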

One of its advantages is that it correctly interprets pauses in the conversation, using them to place commas and periods according to the length of each pause.

Whisper is an AI model, a foundation on which applications and services can be built. In other words, a company can create a website and connect it to this model through its API to build a transcription tool or a translator.

To this end, Whisper is available in several sizes, so that it can be included in different types of applications depending on your needs. They range from a version that needs less than 1 GB of VRAM and has 39 million parameters to the largest model, with 1.55 billion parameters and requirements of about 10 GB of VRAM.
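
As a rough illustration of how an application might choose among these sizes, the sketch below picks the largest model that fits in a given amount of VRAM. The smallest and largest entries match the figures above; the intermediate sizes are approximate values from the project's public documentation and may change between releases. The function name is hypothetical.

```python
from typing import Optional

# (model name, parameters, approximate VRAM needed in GB).
# Approximate figures; check the project's own documentation before relying on them.
MODEL_SIZES = [
    ("tiny", 39_000_000, 1),
    ("base", 74_000_000, 1),
    ("small", 244_000_000, 2),
    ("medium", 769_000_000, 5),
    ("large", 1_550_000_000, 10),
]


def largest_model_that_fits(vram_gb: float) -> Optional[str]:
    """Return the name of the biggest model whose VRAM estimate fits, or None."""
    best = None
    for name, _params, needed_gb in MODEL_SIZES:  # ordered from smallest to largest
        if needed_gb <= vram_gb:
            best = name
    return best


print(largest_model_that_fits(4))   # small
print(largest_model_that_fits(12))  # large
```

In general, larger models transcribe more accurately but run more slowly, so the biggest model that fits is not always the best choice for a real-time application.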

How to use Whisper





Whisper is an open source AI, and it has a GitHub page with technical instructions on how to download and run it. This requires somewhat advanced knowledge, so it is not well suited to less experienced users.
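
For readers who are comfortable with Python, running the downloaded model can look roughly like the sketch below. It assumes the `openai-whisper` package and the `ffmpeg` tool are installed, and that the `large-v3` model name is available in the installed release; the file name `interview.mp3` and the helper functions are made up for illustration. Treat it as a starting point, not as the official procedure, and follow the GitHub instructions for the details.

```python
def transcribe_file(path: str, model_name: str = "large-v3") -> dict:
    """Transcribe an audio file with a locally downloaded Whisper model."""
    # Imported here so the formatting helpers below work without the package.
    import whisper  # third-party: pip install -U openai-whisper (needs ffmpeg)

    model = whisper.load_model(model_name)  # downloads the weights on first use
    return model.transcribe(path)           # dict with "text" and "segments"


def format_timestamp(seconds: float) -> str:
    """Turn a number of seconds into HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def format_segments(segments) -> str:
    """Render segments (dicts with "start", "end", "text") one per line."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


# Example use (requires the package, ffmpeg and an audio file):
# result = transcribe_file("interview.mp3")
# print(result["text"])
# print(format_segments(result["segments"]))
```

The timestamped output is handy for interviews, since it lets you jump back to the exact moment in the recording to check a doubtful sentence.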

Alternatively, you can use Whisper at replicate.com/openai/whisper. Because Whisper is open source, third parties can download it and offer it on their own websites, and Replicate is a portal where you can use various artificial intelligence models, including Whisper.

On this website, you can upload the audio file you want and choose the model you want to use. For example, you can use the v3 model in any of its versions. You can use it for free with your files, although advanced use requires you to register.

Image | Bogomil Mihaylov
