Is there a program that translates English speech? How to convert audio or video into printed text. My transcription experience

Hello fellow freelancers!

I think yesterday's article made everything clear. Let's move on.

Today I want to tell you about programs that can greatly simplify the whole transcription process. There is more than one transcription program you can use, and there are several easy ways to convert audio and video into text.

I will explain in detail how to do transcription, and which methods to use, in a separate article. Today is just a detailed review of these programs with all their advantages and disadvantages.

Express Scribe program

Unfortunately, there is no Russian version of this program, but it is very simple, intuitive and free.

Main advantages:

  • A convenient field for typing text, so there is no need to switch between the player and a text document.
  • Adjustable playback speed, so you can type along as if taking dictation.
  • Customizable hotkeys to play, pause and rewind recordings.
  • Works with Microsoft Word.
  • Insertion of time codes.

Disadvantages:

  • The interface is in English, although this does not get in the way at all.

In the next article I will explain in detail how to work in it and which hotkeys to use.

LossPlay program


A simple player for transcription that is also free.

Main advantages:

  • Customizable hotkeys.
  • Plays both audio and video files.
  • Adjustable playback speed.
  • Insertion of time codes.
  • Customizable rollback after a pause.
  • Works with Microsoft Word.

Disadvantages:

  • Sometimes you have to switch between windows.

3 video lessons on working with LossPlay

Lesson 1

Getting to know the player: how to install it and how it works.

Lesson 2

Inserting time codes into the transcribed text.

Lesson 3

How to work faster as a transcriber by fine-tuning the program.

Speechpad online service

A very simple online speech recognition service. With it, you can dictate text by voice and then edit it, save it, or copy it into a text document.

In fact, Google Docs, which has its own voice typing function, can serve the same purpose as this service.

These are the programs that make a transcriber's work easier. You can share your opinion in the comments below or tell us what you use yourself. Good luck to you all, and see you in the next article.

It should be said right away that there is no program that automatically recognizes speech and converts it into text. At least not yet. Therefore, recordings are currently transcribed only manually. This article reviews programs that make this difficult process more convenient, faster and more accurate.

RPlayer V1.4. This program has many features for processing audio files, but here we will only look at using it to transcribe audio.

To make transcription easier, the program provides a simple text editor with an audio player at the top of the window. It uses the following key combinations: left Alt + Down arrow stops playback, and left Alt + Up arrow resumes playback from five seconds before the point where it was stopped (a very convenient function for a transcriber). These key combinations work from any Windows program, so you can, for example, type the transcript in Microsoft Word.
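To make the stop-and-rewind rule concrete, here is a small Python sketch of how such a resume position could be calculated. It is only an illustration under my own assumptions, not RPlayer's actual code: the function name `resume_position` is hypothetical, and I assume the rollback is clamped so that it never goes before the start of the file.

```python
def resume_position(stopped_at: float, rollback: float = 5.0) -> float:
    """Return the second at which playback resumes after a stop.

    Playback restarts `rollback` seconds before the point where it was
    stopped, but never before the start of the recording (an assumption).
    """
    if rollback < 0:
        raise ValueError("rollback must be non-negative")
    return max(0.0, stopped_at - rollback)


if __name__ == "__main__":
    # Stopped 12.5 s into the recording: playback resumes at 7.5 s.
    print(resume_position(12.5))
    # Stopped 3 s in: the rollback is clamped to the start of the file.
    print(resume_position(3.0))
```

The same rule applies to any player with a configurable "skip back" interval: only the `rollback` value changes.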

The program is easy to use: load a recording, start playback and type the text, using the key combinations above to stop and resume.

RPlayer V1.4, a program for processing audio recordings and transcribing them into text, is distributed free of charge; you can download it and find more detailed information about it on the developer's website.

Dragon Dictate is an American speech recognition program (for English, naturally), used to convert speech into text and to give voice commands to a computer. The Russian-language programs "Dictation", "Combat", "Gorynych" and "Dictograph" were based on Dragon Dictate.

Frankly, all of these Russian-language programs leave much to be desired. You will have to spend a lot of time setting them up: adjusting them to the timbre of your voice and adding new words to the dictionary. The more you work with them, the more they "get used" to your voice and understand you, but this takes time, and a lot of it.

It is unlikely that any of them can be considered a full-fledged program for converting speech into text. Even under ideal conditions, with no noise and clear pronunciation, they make a lot of errors. In addition, you constantly have to stop, review the text entered and correct inaccuracies. As a result, transcription is about half as fast as typing manually on the keyboard.

As for transcribing interviews, seminars, conferences, etc., these programs are not suitable at all, since they only understand the speech of their "owner". Those who want to get to know the various "Gorynych" programs better will find plenty of free versions online.

The "Caesar" transcriber from the Speech Technology Center. A convenient, easy-to-use program: you work directly in Microsoft Word (2003 and 2007). It has noise reduction and voice slowing functions, which sometimes help a lot when transcribing "difficult" recordings.

Conversely, for those with extensive transcription experience and fast typing, there is a playback acceleration function. Playback can be controlled from the transcriber panel or with a special foot pedal supplied with the program. Automatic spell checking helps you avoid mistakes and typos.

It is very convenient that you can enter the participants' names in advance and then insert them with a single keystroke, which also speeds up the work. "Caesar" can transcribe audio in all common formats. The program has only one drawback: it is paid.

You can find more detailed information about the program on the Speech Technology Center website.

Instead of "Caesar", you can just as successfully use the AIMP audio player, an excellent free option that you can download from its official website.

In the settings, configure the "Skip back a little" and "Skip forward a little" functions: they let you replay the last fragment or jump forward by one fragment.

Open the equalizer, then reduce the Speed value and increase the Pitch value. Playback will slow down, but the pitch of the voice (if you choose the Pitch value correctly) will stay the same. Try to choose these two parameters so that you can type almost in sync with the audio, only occasionally stopping the recording. Once everything is set up properly, typing will take much less time.
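Why does Pitch need to go up when Speed goes down? A rough Python sketch, assuming that AIMP's Speed control behaves like simple resampling (tempo and pitch change together, which is what the advice above implies): playing at a factor `speed` multiplies every frequency by that factor, so the pitch has to be raised by the inverse ratio. The function names are hypothetical, and AIMP may express Pitch in its own units, so treat the semitone figure as a guide rather than a value to type in.

```python
import math


def pitch_compensation(speed: float) -> tuple[float, float]:
    """Pitch correction needed after a speed change that also shifts pitch.

    Assumes the Speed control scales tempo and pitch together (like
    resampling), so playing at `speed` multiplies every frequency by
    `speed`. Returns (frequency ratio, semitones) to raise the pitch by.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    ratio = 1.0 / speed
    semitones = 12.0 * math.log2(ratio)  # 12 semitones per doubling of frequency
    return ratio, semitones


def listening_time(duration_s: float, speed: float) -> float:
    """Wall-clock seconds needed to play `duration_s` seconds of audio at `speed`."""
    return duration_s / speed


if __name__ == "__main__":
    ratio, semitones = pitch_compensation(0.8)
    print(f"raise pitch by x{ratio:.2f} ({semitones:.2f} semitones)")
    # A 2.5-minute (150 s) fragment played at 80% speed:
    print(f"{listening_time(150, 0.8):.1f} s")
```

The second function is a reminder of the trade-off: every step down in speed makes each listening pass longer, so the goal is the slowest speed you actually need, not the slowest available.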

The audio transcription program Express Scribe is easy to download for free. It supports a huge number of recording formats and integrates with Microsoft Word. Rewinding takes a single keypress, and the rewind interval can be set to any number of seconds.

You can also change the playback speed, and there is a noise reduction function. The program is good and easy to learn even though the interface is in English, so we can safely recommend it. TextService actively uses this program to transcribe interviews, round tables, conferences, seminars, etc.

We hope this article is useful and helps you choose programs for transcribing audio recordings. Given the snail's pace at which speech recognition is developing, it seems that stenographers, typists and operators will be needed for a very long time. And maybe that is for the better, because no program can fully understand human speech or correctly interpret its semantic and emotional emphasis. Plus, stenographers and typists keep their jobs: a blow against unemployment :)

For our part, we would like to remind you that we are ready to transcribe your audio recordings on any topic at any time of the day or night, which will, of course, save you time.

08/23/2014. I unexpectedly discovered that hidden features of Windows 7 and 8 can be used for transcription. It may not always work, but it worked on two of my computers: an old laptop and a new all-in-one. The steps are as follows: open the Control Panel, select Sound, and go to the Recording tab. Right-click there and, in the context menu that appears, select "Show disabled devices".

A hidden stereo mixer will appear. Enable it and then make it the default recording device.

After this, a level meter will appear next to the mixer, indicating that it is being used for recording.

And that's it: we can start converting audio into text in the transcription module while hearing the sound from the speakers, and no audio repeater is needed.

User Victor shared his experience of installing a stereo mixer when it is missing from the system.

Using a virtual cable

Today I found a free replacement for the Virtual Audio Cable (VAC) program. The alternative is distributed almost for free (under a somewhat strange donationware license) on the website

True, the audio repeater program offered there did not work for me, but I managed to create a virtual cable and was able to recognize audio without a microphone.

When converting audio through programs that create a virtual audio cable, an unpleasant thing happens: the text accumulates in the preview field and never reaches the result field. After about 5 minutes of recognition, the program stops and displays a "network error" message. Since the error lies outside the notepad's code, it cannot simply be fixed (although it can be worked around, for example, by turning recording off at regular intervals).
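The workaround of turning recording off at intervals can be planned in advance. Below is a hypothetical Python helper that splits a recording into consecutive segments, so you know when to stop and restart recognition. The 4-minute (240-second) segment length is my assumption, chosen to stay safely under the roughly 5-minute limit described above; the code only calculates the boundaries and does not control the notepad.

```python
def recording_segments(total_s: float, max_segment_s: float = 240.0) -> list[tuple[float, float]]:
    """Split a recording of total_s seconds into consecutive (start, end) segments.

    No segment is longer than max_segment_s; the last one may be shorter.
    """
    if total_s < 0 or max_segment_s <= 0:
        raise ValueError("total_s must be >= 0 and max_segment_s must be > 0")
    segments: list[tuple[float, float]] = []
    start = 0.0
    while start < total_s:
        end = min(start + max_segment_s, total_s)
        segments.append((start, end))
        start = end
    return segments


if __name__ == "__main__":
    three_hours = 3 * 60 * 60  # 10,800 seconds
    parts = recording_segments(three_hours)
    print(len(parts), parts[:2])
```

For a 3-hour recording this gives 45 segments of 4 minutes each.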

07.11.13. Added a forced transfer of text from the preliminary results to the result field when the text length exceeds 300 characters. The problem is now practically solved. (12/17/2014: there is now a dedicated Phrase buffer length field.)
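The idea behind this fix can be sketched as a small buffer model in Python. This is a hypothetical illustration of the rule "move the preview text into the result once it exceeds N characters", not the notepad's real code: the class name `PhraseBuffer` and its methods are my own, and `max_len` plays the role of the phrase buffer length (300 characters by default).

```python
class PhraseBuffer:
    """Hypothetical model of a preview field that is flushed into a result field."""

    def __init__(self, max_len: int = 300) -> None:
        self.max_len = max_len       # plays the role of the phrase buffer length
        self.preview = ""            # text recognized but not yet committed
        self.result: list[str] = []  # committed text

    def add(self, text: str) -> None:
        """Append recognized text to the preview and flush it if it is too long."""
        self.preview = f"{self.preview} {text}".strip()
        if len(self.preview) > self.max_len:
            self.flush()

    def flush(self) -> None:
        """Move the preview into the result even if the phrase is unfinished."""
        if self.preview:
            self.result.append(self.preview)
            self.preview = ""


if __name__ == "__main__":
    buf = PhraseBuffer(max_len=20)
    buf.add("once upon a time")        # 16 characters: stays in the preview
    buf.add("there lived an old man")  # now 39 characters: forced transfer
    print(buf.result, repr(buf.preview))
```

A smaller limit commits text sooner but cuts phrases more often; a larger one keeps phrases intact but risks losing more text if recognition stops.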

12/15/2013. For comparison, here are the results of transcribing an MP3 of a 2.5-minute excerpt from a recording of Pushkin's fairy tale, downloaded from a popular site. The recording's bit rate was 128 kbps, and the speakers and microphone were entirely ordinary.

Result of transcription via speakers and microphone

Result of transcription via the VB-CABLE program

Setting up a virtual cable

1. Download the virtual cable, unpack it into a folder and run either VBCABLE_Setup.exe or VBCABLE_Setup_x64.exe (depending on whether your Windows is 32-bit or 64-bit).

2. Open the recording device management window and make CABLE Output the default device.

3. Open the playback device management window and make CABLE Input the default device.

4. Now you can start transcribing. After these changes, the sound will go from the audio output to the recording input, and the microphone will stop working. To get it working again, you need to undo these changes.

Using a Physical Cable

2.06.2014. User Vladimir Gusev suggested using a 3.5 mm jack to 3.5 mm jack cable for transcription. One end of the cable goes into the speaker output and the other into the microphone input. The quality with this method is close to that of VB-CABLE, but without the unpleasant effect of text accumulating in the preview buffer. He also suggests using a splitter cable so you can monitor the sound.

We were asked a question on Facebook:
“I need to transcribe 3 hours of voice recordings to work with the text. I tried uploading the audio file with a picture to YouTube and using its automatic captions, but the result was gibberish. Tell me, how can I solve this technically? Thank you!
Alexander Konovalov”

Alexander, there is a simple technical solution, but the result will depend entirely on the quality of your recording. Let me explain what kind of quality I mean.

In recent years, Russian speech recognition technology has made significant progress. The error rate has dropped so much that it has become easier to dictate text into a special mobile app or online service and then manually correct individual "typos" than to type the whole text on the keyboard.

But for the recognition system's artificial intelligence to do its job, the user must do theirs: speak into the microphone clearly and at an even pace, avoid loud background noise and, if possible, use a stereo headset or an external clip-on microphone (for recognition quality it is important that the microphone always stays at the same distance from your lips and that you speak at the same volume). Naturally, the better the audio equipment, the better the result.

These conditions are easy to meet if, instead of speaking to the online recognition service directly, you use a voice recorder as an intermediary. Such a "personal secretary" is especially indispensable when you have no internet access. Naturally, it is better to use at least an inexpensive professional voice recorder rather than the recorder built into a cheap MP3 player or smartphone. This gives your recordings a much better chance of being recognized when you "feed" them to the speech recognition service.

It is difficult, but you can persuade the person you are interviewing to follow these rules (one more tip: if your kit has no external clip-on microphone, at least keep the recorder next to the interviewee rather than next to yourself).

But automatic "note-taking" of acceptable quality at a conference or seminar is, in my opinion, almost unrealistic (after all, you cannot control the speakers' speech or the audience's reactions). There is, however, a rather interesting option: turning professionally recorded audio lectures and audiobooks into text (provided they have no background music or noise).

Let's hope your voice recording is of high enough quality to be transcribed automatically.

If not, you can transcribe a recording of almost any quality in semi-automatic mode.

And in a number of situations, paradoxically, manual transcription will save you the most time and effort. More precisely, the method I myself have been using for ten years. 🙂

So, let's take them in order.

1. Automatic speech recognition

Many people advise transcribing voice recordings on YouTube. But this method makes you waste time uploading the audio file with a background image, and then cleaning the timestamps out of the resulting text. Meanwhile, this time is easy to save. 🙂
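If you do use YouTube, at least the timestamp cleanup does not have to be manual. A minimal Python sketch, assuming the copied transcript has each timestamp on its own line in the form M:SS, MM:SS or H:MM:SS (check what your copy actually looks like, since this format is an assumption): it drops those lines and joins the rest into one paragraph.

```python
import re

# Assumed timestamp format: a line holding only "M:SS", "MM:SS" or "H:MM:SS".
TIMESTAMP_LINE = re.compile(r"^\s*\d{1,2}(?::\d{2}){1,2}\s*$")


def strip_timestamps(transcript: str) -> str:
    """Drop timestamp-only lines and join the remaining text into one paragraph."""
    kept = [
        line.strip()
        for line in transcript.splitlines()
        if line.strip() and not TIMESTAMP_LINE.match(line)
    ]
    return " ".join(kept)


if __name__ == "__main__":
    raw = "0:00\nhello everyone\n0:04\ntoday we talk about\n1:02:33\ntranscription"
    print(strip_timestamps(raw))
```

Punctuation still has to be added by hand, just as with the other automatic methods.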

You can recognize audio recordings directly on your computer using one of the online services that run on Google's recognition engine (I recommend …). All you need is a little trick: instead of your voice coming from the microphone, redirect the audio stream played by your computer's player to the service.

This trick is called a software stereo mixer (it is usually used to record music on a computer or to stream it from a computer to the internet).

The stereo mixer was included in Windows XP but was removed from later versions of the operating system (reportedly for copyright protection: to stop gamers from ripping music from games, etc.). However, a stereo mixer often comes with sound card drivers (for example, for the Realtek cards built into motherboards). If you cannot find the stereo mixer on your PC by following the steps below, try reinstalling the audio drivers from the CD that came with the motherboard or from its manufacturer's website.

If that does not help, install an alternative program, for example the free VB-CABLE Virtual Audio Device, which the owner of the service mentioned above recommends.

Step one. Disable the microphone as a recording device and enable the stereo mixer (or the VB-CABLE virtual cable) instead.

To do this, click the speaker icon in the lower right corner (near the clock), or open the "Sound" section of the Control Panel. On the "Recording" tab of the window that opens, right-click and check "Show disabled devices" and "Show disconnected devices". Right-click the microphone icon and select "Disable" (in general, disable all devices marked with a green icon).

Right-click the stereo mixer icon and select "Enable". A green mark will appear on the icon, indicating that the stereo mixer has become the default device.

If you decide to use VB-CABLE, enable it on the "Recording" tab in the same way, and also on the "Playback" tab.

Step two. Start playing the recording in any player (if you need to transcribe the audio track of a video, you can launch a video player instead). At the same time, open the service in the Chrome browser and click its "Enable recording" button. If the recording is of good enough quality, you will see the service turn speech into meaningful text close to the original right before your eyes. True, without punctuation marks, which you will have to add yourself.

I recommend AIMP as the audio player; it is discussed in more detail in the third subsection. For now, I will just note that this player can slow down a recording without distorting speech and can correct some other defects. This can somewhat improve recognition of lower-quality recordings. (Some people even advise pre-processing bad recordings in professional audio editors. However, in my opinion, that is too time-consuming for most users, who would type the text by hand much faster. :)

2. Semi-automatic speech recognition

Everything is simple here. If the recording is of poor quality and recognition "chokes" or the service makes too many errors, help it along by inserting yourself into the chain: "audio player - narrator - recognition system".

Your task is to listen to the recorded speech through headphones while dictating it into a microphone to an online recognition service. (Of course, in this case you do not switch from the microphone to the stereo mixer or virtual cable in the list of recording devices, as in the previous section.) As an alternative to the online services mentioned above, you can use smartphone apps such as the free Yandex.Dictation or the dictation feature on an iPhone with iOS 8 or later.

Note that in semi-automatic mode you can dictate punctuation marks as you go, which the services cannot yet place on their own in automatic mode.

If you manage to dictate in sync with the recording playing in the player, the draft transcription will take about as long as the recording itself (not counting the time spent afterwards correcting spelling and grammatical errors). But even working in the cycle "listen to a phrase, dictate it, listen to the next phrase, dictate it" can save you a lot of time compared with traditional typing.

I recommend the same AIMP as the audio player. First, you can use it to slow playback down to a speed at which you are comfortable dictating simultaneously. Second, this player can jump back a set number of seconds, which is sometimes necessary to hear an unclear phrase better.

3. Transcribing a voice recording manually

In practice, you may find that dictating in semi-automatic mode tires you out too quickly. Or the service makes too many mistakes with your voice. Or, thanks to your fast typing, it is much easier for you to produce finished, corrected text on the keyboard than by dictation. Or your voice recorder, headset microphone or sound card does not deliver sound quality the service can handle. Or maybe you simply cannot dictate out loud in your office or at home.

In all these cases, my own method of manual transcription will help you (listen to the recording in AIMP, type the text in Word). It will let you turn your recording into text faster than many professional journalists whose typing speed is similar to yours! And you will spend far less effort and nerves than they do. 🙂

Why is so much energy and time wasted when transcribing audio recordings the traditional way? Because the user makes a lot of unnecessary movements.

The user constantly reaches for either the voice recorder or the computer keyboard: stop playback, type the passage just heard into a text editor, start playback again, rewind an unclear section, and so on.

An ordinary software player on a computer does not make things much easier: the user has to keep minimizing and restoring Word, stopping and starting the player, and dragging the player's slider back and forth to find an unclear fragment and then return to the last place listened to.

To cut down on this and other wasted time, specialized IT companies develop software and hardware transcription systems. These are fairly expensive solutions for professionals: journalists, court stenographers, investigators, etc. But in fact, for our purposes only two functions are needed:

  • the ability to slow down playback of a voice recording without distorting it or lowering its pitch (many players let you slow playback down, but, alas, the human voice then turns into a monstrous robotic drone that is hard to listen to for long);
  • the ability to stop the recording or roll it back a set number of seconds and then resume, without interrupting your typing or minimizing the text editor window.

In my time I tested dozens of audio programs and found only two affordable paid applications that met these requirements. I bought one of them. Then I searched a bit more for my dear readers 🙂 and found a wonderful free solution: the AIMP player, which I still use myself.

“In the AIMP settings, find the Global Keys section and reassign Stop/Start to the Escape (Esc) key. Believe me, this is the most convenient option: you do not have to think about it, and your finger will not accidentally hit other keys. Assign "Move backward a little" and "Move forward a little" to Ctrl + the back and forward arrow keys respectively (your keyboard has four arrow keys; choose two of them). This function lets you replay the last fragment or skip ahead a little.

Then open the equalizer, reduce the Speed and Tempo values and increase the Pitch value. You will notice that playback slows down, but the pitch of the voice (if you choose a good Pitch value) stays the same. Choose these parameters so that you can type almost in sync with playback, only occasionally stopping it.

Once everything is set up, typing will take you less time and your hands will tire less. You will be able to transcribe the audio recording calmly and comfortably, practically without lifting your fingers from the keyboard.”

I can only add that if the recording is not of very high quality, you can try to improve its playback by experimenting with other settings in AIMP's Sound Effects Manager.

The number of seconds by which the hotkeys move you backward or forward through a recording is set in the "Player" section of the "Settings" window (opened with Ctrl + P).

May you save more time on routine tasks and use it productively for important things! 🙂 And do not forget to re-enable the microphone in the list of recording devices before your next Skype call! 😉

3 ways to transcribe voice recordings: speech recognition, dictation, manual mode

Hello, friends. This is the last article in the series about the transcriber profession, in which I will tell you how a beginner can do transcription as simply and quickly as possible.

I will show you using one of the programs we discussed yesterday as an example. I will also share an interesting way to transcribe recordings using speech recognition.

Method 1

Express Scribe is a professional program used by almost everyone who transcribes audio and video recordings. It has all the features you need.

After you install and launch the program, you will see its main window.

Unfortunately, it has no Russian interface, but everything in it is clear and no special settings are required. Just install it and go.

The convenience of this program is that you do not need to switch between the player and a text document: you can listen to the recording and type the text in one place.

Step 1. To load the files you need to transcribe, click the "Load" button or simply drag them from their folder into the top pane.

Step 2. Learn the hotkeys you will need, or write yourself a cheat sheet.

Standard hotkey settings:

  • F9 - play.
  • F4 - pause.
  • F10 - play at normal speed.
  • F2 - play at slow speed (50%).
  • F3 - play at fast speed (150%).
  • F7 - rewind.
  • F8 - fast forward.

Conveniently, the play and pause keys are meant for different hands, and after a while muscle memory will remember them easily.

Step 3. In the lower right corner of the program window, set a playback speed that is comfortable for you. You can slow it down to the point where you can type without pausing.
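How slow is "slow enough"? As a rough rule of thumb (my own assumption, not a feature of Express Scribe), playback should deliver words no faster than you type them, so the speed factor is your typing rate divided by the speaker's rate. The Python sketch below does that arithmetic; the function name is hypothetical, both rates are values you measure yourself, and the 50% to 150% bounds are simply borrowed from the F2/F3 presets listed above.

```python
def comfortable_speed(typing_wpm: float, speech_wpm: float,
                      min_speed: float = 0.5, max_speed: float = 1.5) -> float:
    """Playback speed (1.0 = normal) at which speech comes no faster than you type.

    The min/max bounds are assumptions borrowed from the 50% and 150% presets.
    """
    if typing_wpm <= 0 or speech_wpm <= 0:
        raise ValueError("both rates must be positive")
    ideal = typing_wpm / speech_wpm
    return max(min_speed, min(max_speed, ideal))


if __name__ == "__main__":
    # Hypothetical rates: you type 90 words/min, the speaker talks at 120 words/min.
    print(f"{comfortable_speed(90, 120):.0%}")
```

With those example rates the result is 75%. If the result hits the 50% floor, you will still need occasional pauses even at the slowest preset.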

Step 4. You can start transcribing.

You can also adjust the audio channels of a recording to make the sound better and clearer: just start playback and move the sliders until the quality is best.

Step 5. Once you have transcribed the audio, copy the resulting text into a Word document to save and edit it later.

Method 2

The second method is not to type the text yourself, but to have it typed automatically by voice recognition services.

In Google Docs, this function is under "Tools" -> "Voice typing..." or can be launched with the keyboard shortcut Ctrl+Shift+S.


These are two very simple methods that will help you transcribe recordings and earn money online (for some people, their first).

The transcriber profession is very simple and anyone can do it, so you cannot earn much from it. I encourage you to look at other interesting professions in the book I recently reviewed.

If you have any questions or suggestions, you can always leave them in the comments below this article. I wish you good luck mastering this field and a good income from remote work!