How to create subtitles automatically: is there a program that generates subtitles using speech to text?

I have a video that I want to create subtitles for. Is there a program that can do rudimentary speech-to-text in order to:

  1. set the correct start/stop of each individual subtitle
  2. create rudimentary text subtitles (by converting the speech to text)

I know about GNOME Subtitles. However, creating subtitles with it takes a lot of effort: you have to set the start and stop of each sentence yourself.

YouTube has the functions listed above (it creates rudimentary text subtitles with correct timings from the spoken text). However, I would prefer not to upload the video to YouTube to get my subtitles. Is it possible to create subtitles efficiently in Ubuntu?

Update: I only plan to use external subtitles; they do not need to be hard-coded into the video. My biggest requirement is that the program automatically finds the start/stop of each sentence, so that I only have to write in the text.
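To illustrate what "automatically finding the start/stop of each sentence" can mean in practice, here is a minimal Python sketch of silence-based segmentation. It is not how YouTube or any tool named in this thread works; the function name, the per-frame loudness input and the threshold values are all assumptions made for illustration.

```python
def find_speech_segments(energies, frame_sec=0.1, threshold=0.02, min_silence_frames=3):
    """Return (start, end) times in seconds for runs of frames at or above threshold.

    `energies` is a hypothetical list of per-frame loudness values.
    A run is closed only after `min_silence_frames` consecutive quiet frames,
    so short pauses inside a sentence do not split it.
    """
    segments = []
    start = None   # index of the first loud frame of the current segment
    quiet = 0      # consecutive quiet frames seen since the last loud frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_silence_frames:
                end = i - quiet + 1  # index just past the last loud frame
                segments.append((start * frame_sec, end * frame_sec))
                start = None
                quiet = 0
    if start is not None:
        # The audio ended while a segment was still open
        end = len(energies) - quiet
        segments.append((start * frame_sec, end * frame_sec))
    return segments
```

Each returned pair could serve as the start/stop of one subtitle, leaving only the text to be typed in. Real speech is messier than a loudness threshold, so the timings would still need manual correction.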

Update #2: Speech-to-text software is available for Linux in the CMU Sphinx package. Reportedly, CMU Sphinx can be used together with a captioning program. Also, one captioning tool (a web tool, http://groups.) is said to know about this CMU Sphinx feature; however, there is no reference in the latest source code to where they added CMU Sphinx. The quest goes on to find a program that uses CMU Sphinx for rudimentary speech-to-text (and also gets the timings right), just like YouTube does.

I used Aegisub on Windows a few years ago and was very happy with it. Apparently it is also available for Linux. It is fairly self-explanatory.

Aegisub only creates a subtitle file, such as an .srt file. To combine video and subtitles into hard-coded subtitles, you still need a second program.
On Windows I used VirtualDub, but it is not available for Linux. You can find a suitable program on Wikipedia.
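For reference, an .srt file is plain text: each cue has a number, a timing line of the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and the text, followed by a blank line. The Python sketch below writes that layout; the function names are made up for illustration and are not part of Aegisub or any other tool mentioned here.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues):
    """Build .srt content from an iterable of (start_sec, end_sec, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by one blank line
    return "\n".join(blocks)
```

Saving the returned string with a .srt extension should give a file that most players and subtitle editors can load next to the video.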

There are also other subtitle editors.

I don't remember Aegisub having a function that automatically sets the beginning and end of a spoken sentence in a subtitle file, and I don't see any mention of such a feature on its site. However, with keyboard shortcuts it is quite easy to set these times manually.

Is there any program that has such a feature (in any OS)?

I haven't found a way to get a subtitle program to automatically add rudimentary subtitles by analyzing the voices in the video.

So the alternative I use is:

  1. Upload the video to YouTube (privately, for example) and use the built-in tool to automatically generate rudimentary subtitles.
  2. Alternatively, add the video to a subtitling tool and manually create the time frames for each sentence if the automated way in YouTube does not work or its suggestions are too far off.
  3. Use GNOME Subtitles (in Software Center) to clean up the subtitles and correct any timings.
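One common timing correction is a constant offset, where every subtitle appears a little too early or too late. Subtitle editors can do this from their menus; the Python sketch below only shows the idea on the raw .srt text. The function name is hypothetical and the code assumes standard HH:MM:SS,mmm timestamps.

```python
import re

TIME_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text, offset_ms):
    """Shift every cue timing by offset_ms (may be negative), clamping at zero."""
    def shift(match):
        h, m, s, ms = (int(group) for group in match.groups())
        total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    shifted = []
    for line in srt_text.split("\n"):
        # Only touch timing lines, so timestamps quoted in the spoken text stay intact
        if "-->" in line:
            line = TIME_RE.sub(shift, line)
        shifted.append(line)
    return "\n".join(shifted)
```

For example, shift_srt(text, 750) delays all subtitles by three quarters of a second, and a negative offset moves them earlier.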

I personally like GNOME Subtitles, which is available in the repositories.

sudo apt-get install gnome-subtitles

Okay, I found a tool that looks nice and resembles Subtitle Workshop: Subtitle Editor (apt-get install subtitleeditor).

Compared with GNOME Subtitles, Subtitle Editor looks more advanced.

For KDE, a good subtitle editor is Subtitle Composer. Install it using the command

sudo apt-get install subtitlecomposer


Conversion settings

Camtasia Studio has the ability to automatically create subtitles by converting speech to text.

The capabilities of this conversion depend on the operating system installed on the computer. To check them, open the Tools menu in the Camtasia Studio window, select Speech, and then choose the Speech properties command in the submenu (Fig. 8.16).

Currently, Microsoft Windows does not have modules for Russian speech recognition.

To start the conversion process, go to the Captions tab (Fig. 8.1) and press the Speech-to-text button. The first time you launch it, a window appears (Fig. 8.17) with links that lead to the settings of the speech recognition module.

Please note that this module belongs to the operating system, not to Camtasia Studio. It can be configured only if the recognition language matches the operating system interface language. However, speech can be converted into subtitle text in any case, provided that the operating system contains the necessary module.

If you plan to recognize and convert your own speech into subtitles, you can click the Start voice training link (Fig. 8.17) to train the recognition module.

  • In the next window that appears, click the Next button.
  • In the following window (Fig. 8.18), read aloud the phrases that appear there. There will be a lot of phrases. The training process may take 30 minutes or more.

When you have finished, a window will appear (Fig. 8.19). You can continue training by pressing the More Training button. To end the training, press the Next button.

This feature is not yet available in the new Creator Studio. To go to the classic interface, click Classic version in the menu on the left.

If you want your content to be understandable to all viewers, add subtitles to it. You can enable automatic subtitle generation on YouTube. It is based on speech recognition technology using machine learning algorithms.

Automatic creation of video subtitles

Automatic subtitle creation is available for the following languages: English, Spanish, Italian, Korean, German, Dutch, Portuguese, Russian, French and Japanese.

Subtitles will be added to the video if the feature is available for your language. Please note that processing time depends on the complexity of the audio track. Therefore, subtitles may not appear immediately.

Our specialists are constantly improving the technology, but sometimes speech is recognized incorrectly. Typically, errors in automatic captions are due to incorrect pronunciation, accent or dialect of the speaker, or extraneous noise. In this regard, we advise you to view the subtitles that were created automatically and, if necessary, edit them.

Here's how to check subtitles:

  1. Log in to your account and click the channel icon in the upper right corner of the page. Go to Creator Studio, open the Video Manager section and select Videos.
  2. Find the video you want and click the drop-down menu to the right of the Edit button.
  3. Select Subtitles.
  4. Find the subtitles that were generated automatically. They are easy to recognize: in the "Published" list to the right of the video, the label "(automatic)" is shown next to the language of such subtitles.
  5. Read the subtitles and edit or remove them if necessary.


Problems with automatic subtitle creation may occur for one of the following reasons:

  • Subtitles are not yet available as the complex audio track is still being processed.
  • Automatic subtitle generation is not available for the selected language.
  • The video is too long.
  • The video has low audio quality or contains speech that YouTube cannot recognize.
  • The video begins with a long sequence without sound.
  • Several people speak at once.

Automatic creation of subtitles for broadcasts

You can enable automatic subtitles only for English-language broadcasts with normal latency.

Automatic subtitles are not saved after the end of the broadcast. When the recorded broadcast is viewed later, they are generated anew.

Read more about subtitles for live broadcasts.

Let's imagine a situation where there is a video in German (Japanese, Korean, English) and you need to quickly find out what is being said. But your ability to understand that language when spoken is very poorly developed or absent. What to do?

Let's talk about some tricks that can be useful in such a situation.

1. Download subtitles

Having a text version of the video helps a lot in this situation. It can be copied into Google Translate or read with a dictionary.

Copy the link to the video and paste it into the form on Downsub. It works with YouTube, DramaFever, ViKi, DailyMotion, OnDemandKorea, Drama, Vlive and VIU. Subtitles are downloaded as an .srt file, which you can open with any text editor.

The service offers automatic translations into other languages. They can also be downloaded as .srt files. But if you know at least a little of the language spoken in the video, it is better to download the subtitles in the original language and translate them carefully yourself.
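Before pasting a downloaded .srt file into a translator, it helps to strip the cue numbers and timing lines so that only the spoken text remains. A minimal Python sketch of that step follows; the function name is made up for illustration and the code assumes a conventionally formatted .srt file.

```python
def srt_to_text(srt_text):
    """Return only the spoken text of an .srt file, one cue per line."""
    cues = []
    # Cues are separated by blank lines; normalize Windows line endings first
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.splitlines()
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]   # drop the cue number
        if lines and "-->" in lines[0]:
            lines = lines[1:]   # drop the timing line
        if lines:
            cues.append(" ".join(line.strip() for line in lines))
    return "\n".join(cues)
```

The result is plain text that can be copied into Google Translate or read with a dictionary, as described above.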

2. Looking for videos with subtitles

If Downsub reports that the video has no subtitles, you can try searching YouTube for a copy of the video that does have them. This can be done using advanced search.

3. Automatically create subtitles

If there is no version of the video with subtitles on YouTube, you can upload the original file to your own channel (don't forget to set the visibility to "Unlisted" or "Private") and use the automatic subtitle creation function.

If you do not have the source file of the video, but only a link to it, try downloading it first using a video download website.

How do you create subtitles automatically? Very simply. YouTube tries, on its own, to transcribe all videos in Russian, English, French, German, Spanish, Italian, Dutch, Portuguese, Korean and Japanese into text.

A link to the automatically generated subtitles appears some time after the video is uploaded. For a three-minute video, the wait was more than five minutes. Subtitles for videos posted on your own channel can be downloaded directly from YouTube.

Sometimes the text turns out to be quite close to what is said in the video. But if the speech is hard to make out, the result can be both funny and surprising. YouTube Help warns that subtitles may not be generated automatically for videos of very poor quality.

This method is not a bad life hack for students. Asked to watch a three-hour lecture video? An automatically created subtitle file and Command+F will help you quickly find which parts of the video cover the topics you need.
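The same search can be scripted. The Python sketch below lists the start time of every cue in an .srt file that mentions a keyword; the function name is hypothetical and the code assumes a conventionally formatted file.

```python
def find_in_subtitles(srt_text, keyword):
    """Return start timestamps of cues whose text contains keyword (case-insensitive)."""
    hits = []
    needle = keyword.lower()
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is the spoken text of this cue
                text = " ".join(lines[i + 1:])
                if needle in text.lower():
                    hits.append(line.split("-->")[0].strip())
                break
    return hits
```

The returned timestamps show where to jump in the video. Automatic subtitles contain recognition errors, so a term may be missed if it was transcribed incorrectly.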

4. Convert audio to video

If you don’t have a video, but only an audio recording, you can convert it to .mp4 and upload the resulting file to YouTube.
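The text does not name a converter. One possible way, assuming the ffmpeg command-line tool is installed (an assumption, not something from the original article), is to pair the audio with a plain black picture. The Python sketch below builds such a command; the function names and the default frame size are illustrative.

```python
import subprocess


def audio_to_mp4_command(audio_path, output_path, size="1280x720"):
    """Build an ffmpeg command that pairs the audio with a black video track."""
    return [
        "ffmpeg",
        "-f", "lavfi", "-i", f"color=c=black:s={size}",  # generated black picture
        "-i", audio_path,                                # the audio recording
        "-shortest",                                     # stop when the audio ends
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        output_path,
    ]


def convert(audio_path, output_path):
    """Run the conversion; requires ffmpeg to be available on the PATH."""
    subprocess.run(audio_to_mp4_command(audio_path, output_path), check=True)
```

Calling convert("lecture.mp3", "lecture.mp4") should produce a file that can be uploaded like any other video.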

5. Convert speech to text

If you do not need to translate the entire video, but only to understand short individual fragments, it is more convenient to use the SpeechLogger plugin for the Google Chrome browser.

Naturally, you can also use it to convert the entire video into text. But the plugin is most convenient for working with pieces of text (recording one phrase at a time and immediately correcting errors).

As with automatic subtitles, the quality of the result is a lottery. The unhurried speech of a person with good diction on a simple everyday topic can be recognized perfectly, while a fast monologue with background noise may be ignored by the plugin entirely.

6. Change playback settings

This method is very simple, but very effective. If you reduce the playback speed by half, the sound is perceived completely differently. This applies not only to people, but also to speech recognition plugins and applications: the slower the pace, the fewer mistakes they make.

7. Awakening your brain

This method is suitable for those who understand printed text in a foreign language very well, but are a little slow in understanding spoken language. This happens when you have to read articles and books every day, but watch videos/listen to audio much less often or almost never.

Before watching a video on a certain topic, take several articles on the same topic (so that more of the terms come up) and listen to them using the SpeakIt plugin. While listening, read the text carefully and match it to the sound. In the extension settings, you can change the female voice to a male one, which sounds more pleasant and clearer.

For some, 20 minutes is enough to awaken the skills of understanding oral speech (provided that you have practiced them at some point), but for others it is significantly longer. The effect is the same as when visiting another country. At first there is a slight shock, but within a few days all the words and phrases that were once learned are remembered and the speech of the people around gradually turns from background noise into something meaningful and understandable.

In conclusion, it is worth recalling that translation is a fairly inexpensive service. For English, it will cost no more than 100 rubles per minute of audio/video transcription, plus 200-400 rubles per page of translated text. For other languages it will be a little more expensive.
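As a rough worked example of those figures, the Python sketch below estimates a price range for English. The function name and the sample numbers are illustrative, and since the article gives 100 rubles per minute as an upper bound, the result is a ceiling rather than a quote.

```python
def translation_cost_range(minutes, pages, per_minute=100, per_page=(200, 400)):
    """Return (low, high) cost in rubles: transcription plus translation."""
    transcription = minutes * per_minute  # upper bound from the article
    low = transcription + pages * per_page[0]
    high = transcription + pages * per_page[1]
    return low, high


# A 10-minute video whose transcript fills 5 pages
print(translation_cost_range(10, 5))  # (2000, 3000)
```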