Voice control programs for the computer. Voice control does not work: what could be the reasons and how to fix it? How to train speech recognition and improve accuracy

A person approached me with a request to write a program that would allow him to control a computer mouse with his voice. At the time I could not even imagine that an almost completely paralyzed person, who cannot even turn his head and can only talk, is capable of such vigorous activity: helping himself and others to live an active life, gaining new knowledge and skills, working and earning money, communicating with people around the world, and taking part in a social project competition.

Here are a couple of links to sites whose author and/or ideological inspirer is this person, Alexander Makarchuk from the city of Borisov, Belarus:

To work on the computer, Alexander used the “Vocal Joystick” program, developed by students at the University of Washington, funded by the National Science Foundation (NSF). See melodi.ee.washington.edu/vj

I could not resist

By the way, on the university website (http://www.washington.edu/) 90% of the articles are about money. It's hard to find anything about scientific work. Here, for example, are excerpts from the first page: “Tom, a university graduate, used to eat mushrooms and had difficulty paying his rent. Now he is a senior manager at an IT company and lends money to a university,” “Big Data helps the homeless,” “The company has committed to pay $5 million for a new academic building.”

Am I the only one who finds this annoying?


The program was made in 2005-2009 and worked well on Windows XP. In more recent versions of Windows the program may freeze, which is unacceptable for a person who cannot get up from his chair and restart it. Therefore, the program had to be redone.

The source code is not available; there are only individual publications that describe the technologies on which it is based (MFCC, MLP; read about this in the second part).

A new program was written in its image and likeness (this took about three months).

Actually, you can see how it works:

You can download the program and/or view the source code.

You don’t need to perform any special actions to install the program: just click on it and run it. The only caveat is that in some cases it must be run as an administrator (for example, when working with the virtual keyboard “Comfort Keys Pro”):

It's probably worth mentioning here other things I've previously done to make it possible to operate a computer hands-free.

If you have the ability to turn your head, a head-mounted gyroscope may be a good alternative to the eViacam. You will get fast and accurate cursor positioning and independence from lighting.

If you can only move the pupils of your eyes, then you can use a gaze direction tracker and a program for it (this may be difficult if you wear glasses).

Part II. How does it work?

From published materials about the Vocal Joystick program, it was known that it works as follows:
  1. Cutting the audio stream into frames of 25 milliseconds with an overlap of 10 milliseconds
  2. Receiving 13 cepstral coefficients (MFCC) for each frame
  3. Verifying that one of the 6 stored sounds (4 vowels and 2 consonants) is pronounced using a multilayer perceptron (MLP)
  4. Translating found sounds into mouse movements/clicks
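Step 1 of this list can be sketched as follows. This is only an illustration, not the code of either program: the function name, the 16 kHz sample rate, and the reading of "overlap" as the number of milliseconds shared by two neighbouring frames are all assumptions.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, overlap_ms=10):
    """Cut a list of samples into fixed-length frames that overlap."""
    frame_len = sample_rate * frame_ms // 1000
    # Step between the starts of two neighbouring frames.
    hop = sample_rate * (frame_ms - overlap_ms) // 1000
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# One second of dummy audio at an assumed 16 kHz sample rate.
frames = split_into_frames(list(range(16000)), sample_rate=16000)
print(len(frames), len(frames[0]))
```

If the published figure of 10 milliseconds is actually the step between frames rather than the shared part, only the `hop` line would change.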
The first task is notable only for the fact that, to solve it in real time, three additional threads had to be introduced into the program, since reading data from the microphone, processing the sound, and playing sound through the sound card occur asynchronously.

The last task is simply accomplished using the SendInput function.

It seems to me that the second and third problems are of greatest interest. So.

Task No. 2. Obtaining 13 cepstral coefficients

In case anyone is not in the know, the main problem of recognizing sounds by a computer is the following: it is difficult to compare two sounds, since two sound waves that look dissimilar in shape may sound similar from a human perspective.

And among those involved in speech recognition, there is a search for the “philosopher’s stone” - a set of features that would unambiguously classify a sound wave.

Of those features that are available to the general public and described in textbooks, the most widely used are the so-called Mel-Frequency Cepstral Coefficients (MFCC).

Their history is such that they were originally intended for something completely different, namely, suppressing echo in a signal. (An educational article on this topic was written by the respected Oppenheim and Schafer, may there be joy in the homes of these noble men. See A. V. Oppenheim and R. W. Schafer, “From Frequency to Quefrency: A History of the Cepstrum”.)

But man is designed in such a way that he is inclined to use what he knows best. And those who worked on speech signals came up with the idea of using a ready-made compact representation of the signal in the form of MFCC. It turned out that, in general, it works. (One of my friends, a specialist in ventilation systems, when I asked him how to build a summerhouse, suggested using ventilation ducts. Simply because he knew them better than other building materials.)

Are MFCCs a good classifier for sounds? I would not say so. The same sound, pronounced by me into different microphones, ends up in different regions of the space of MFCC coefficients, whereas an ideal classifier would place them side by side. This is why, in particular, you must retrain the program when you change the microphone.

This is just one of the projections of the 13-dimensional MFCC space into 3-dimensional space, but it shows what I mean: the red, purple and blue points were obtained from different microphones (Plantronics, built-in microphone array, Jabra), but the sound pronounced was the same.

However, since I can’t offer anything better, I’ll also use the standard method: calculation of MFCC coefficients.

In order not to make mistakes in the implementation, the first versions of the program were based on code from the well-known program CMU Sphinx, more precisely, its implementation in C, called pocketsphinx, developed at Carnegie Mellon University (peace be with them both! (c) Hottabych).

The pocketsphinx source codes are open, but the problem is that if you use them, you must write text in your program (both in the source code and in the executable module) containing, among other things, the following:

 * This work was supported in part by funding from the Defense Advanced
 * Research Projects Agency and the National Science Foundation of the
 * United States of America, and the CMU Sphinx Speech Consortium.

This seemed unacceptable to me, and I had to rewrite the code. This affected the performance of the program (for the better, by the way, although the “readability” of the code suffered somewhat), largely thanks to the use of the “Intel Performance Primitives” libraries, but I also optimized some things myself, such as the MEL filter. Nevertheless, checks on test data showed that the resulting MFCC coefficients fully match those obtained using, for example, the sphinx_fe utility.

In sphinxbase programs, the calculation of MFCC coefficients is carried out in the following steps:

Step | sphinxbase function | The essence of the operation
1 | fe_pre_emphasis | Most of the previous sample (for example, 0.97 of its value) is subtracted from the current sample. A primitive filter that attenuates low frequencies.
2 | fe_hamming_window | Hamming window: introduces attenuation at the beginning and end of the frame
3 | fe_fft_real | Fast Fourier Transform
4 | fe_spec2magnitude | From the usual spectrum we obtain the power spectrum, losing the phase
5 | fe_mel_spec | We group the frequencies of the spectrum [for example, 256 pieces] into 40 piles, using the MEL scale and weighting coefficients
6 | fe_mel_cep | We take the logarithm and apply the DCT2 transformation to the 40 values from the previous step. We keep the first 13 values of the result.
There are several variants of DCT2 (HTK, legacy, classic), differing in the constant by which we divide the resulting coefficients and a special constant for the zero coefficient. You can choose any option, it won’t change the essence.
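The six steps can be sketched in plain Python as follows. This is a rough illustration and not the sphinxbase code or the author's rewrite: a naive DFT stands in for the FFT, the mel formula with the constants 2595 and 700 is one common convention chosen here as an assumption, the filters are simple triangles, and the DCT2 is left unnormalised. All function names are hypothetical.

```python
import cmath
import math


def pre_emphasis(frame, alpha=0.97):
    # Step 1: subtract alpha times the previous sample from the current one.
    return [frame[0]] + [frame[i] - alpha * frame[i - 1] for i in range(1, len(frame))]


def hamming(frame):
    # Step 2: attenuate the beginning and the end of the frame.
    n = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def power_spectrum(frame):
    # Steps 3-4: naive O(n^2) DFT instead of an FFT; only |X[k]|^2 is kept.
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(spec, sample_rate, n_filters=40):
    # Step 5: triangular filters whose centres are equally spaced in mels.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = (sample_rate / 2) / (len(spec) - 1)
    piles = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        total = 0.0
        for b, power in enumerate(spec):
            f = b * bin_hz
            if lo < f <= mid:
                total += power * (f - lo) / (mid - lo)
            elif mid < f < hi:
                total += power * (hi - f) / (hi - mid)
        piles.append(total)
    return piles


def dct2(values, n_out=13):
    # Step 6 (second half): unnormalised DCT-II, first n_out coefficients.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(frame, sample_rate):
    spec = power_spectrum(hamming(pre_emphasis(frame)))
    log_mel = [math.log(max(e, 1e-10)) for e in mel_filterbank(spec, sample_rate)]
    return dct2(log_mel)


# A 25 ms frame of a 440 Hz tone at an assumed 16 kHz sample rate.
frame = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(400)]
coeffs = mfcc(frame, 16000)
print(len(coeffs))
```

The `max(e, 1e-10)` floor is only there to keep the logarithm defined for an empty pile; a real front end would handle this in its own way.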

These steps also include functions that allow you to separate the signal from noise and from silence, such as fe_track_snr, fe_vad_hangover, but we don’t need them, and we won’t be distracted by them.

The following substitutions were made for the steps to obtain the MFCC coefficients:

Task No. 3. Checking that one of the 6 memorized sounds is being pronounced

The original Vocal Joystick program used a multilayer perceptron (MLP) for classification - a neural network without newfangled bells and whistles.

Let's see how justified the use of a neural network is here.

Let's remember what neurons do in artificial neural networks.

If a neuron has N inputs, then the neuron divides the N-dimensional space in half, slicing it with a hyperplane. In one half of the space it fires (gives a positive answer), and in the other it does not.

Let's look at the [practically] simplest option - a neuron with two inputs. It will naturally divide two-dimensional space in half.

Let the input be the values X1 and X2, which the neuron multiplies by the weighting coefficients W1 and W2, adding the free term C.


In total, at the output of the neuron (let’s denote it as Y) we get:

Y=X1*W1+X2*W2+C

(let’s skip the subtleties about sigmoid functions for now)

We consider that the neuron fires when Y>0. The straight line given by the equation 0=X1*W1+X2*W2+C precisely divides the space into a part where Y>0, and a part where Y<0.

Let us illustrate what has been said with specific numbers.

Let W1=1, W2=1, C=-5;
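With these numbers the neuron fires for every point above the straight line X1 + X2 = 5. A minimal sketch (the function name is made up for the illustration, and the sigmoid is skipped, as in the text):

```python
def neuron(x1, x2, w1=1.0, w2=1.0, c=-5.0):
    """Return True when the neuron fires, i.e. Y = X1*W1 + X2*W2 + C > 0."""
    y = x1 * w1 + x2 * w2 + c
    return y > 0


print(neuron(1, 1))  # Y = -3: the point is below the line
print(neuron(4, 3))  # Y = 2: the point is above the line
```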

Now let's see how we can organize a neural network that would fire in a certain area of space, a spot, relatively speaking, and not fire anywhere else.

It can be seen from the figure that in order to outline an area in two-dimensional space, we need at least 3 straight lines, that is, the 3 neurons that correspond to them.

We will combine these three neurons together using another layer, obtaining a multilayer neural network (MLP).
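As a sketch of this idea, the weights below are picked by hand (they are hypothetical, not trained and not taken from the program) so that three first-layer neurons bound a triangle, and a second-layer neuron fires only when all three of them fire:

```python
def fires(inputs, weights, bias):
    """A threshold neuron: weighted sum plus free term, compared with zero."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias > 0


# Three straight lines bounding the triangle x > 0, y > 0, x + y < 4.
FIRST_LAYER = [((1, 0), 0), ((0, 1), 0), ((-1, -1), 4)]


def inside_spot(x, y):
    hidden = [1 if fires((x, y), w, b) else 0 for w, b in FIRST_LAYER]
    # Second layer: acts as a logical AND over the three first-layer outputs.
    return fires(hidden, (1, 1, 1), -2.5)


print(inside_spot(1, 1), inside_spot(3, 3))
```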

And if we need the neural network to work in two areas of space, then we will need at least three more neurons (4,5,6 in the figures):

And here you can’t do without a third layer:
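A sketch of such a three-layer arrangement, again with hand-picked hypothetical weights: six first-layer neurons draw the lines, two second-layer neurons each close one spot, and a single third-layer neuron fires if either spot fired.

```python
def fires(inputs, weights, bias):
    """A threshold neuron: weighted sum plus free term, compared with zero."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias > 0


# Neurons 1-3 bound one triangle, neurons 4-6 bound another.
SPOT_A = [((1, 0), 0), ((0, 1), 0), ((-1, -1), 4)]       # x > 0, y > 0, x + y < 4
SPOT_B = [((1, 0), -10), ((0, 1), -10), ((-1, -1), 24)]  # x > 10, y > 10, x + y < 24


def network(x, y):
    # Layer 1: six neurons, one per straight line.
    h1 = [int(fires((x, y), w, b)) for w, b in SPOT_A + SPOT_B]
    # Layer 2: one AND neuron per spot.
    in_a = int(fires(h1[:3], (1, 1, 1), -2.5))
    in_b = int(fires(h1[3:], (1, 1, 1), -2.5))
    # Layer 3: an OR neuron over the two spots.
    return fires((in_a, in_b), (1, 1), -0.5)


print(network(1, 1), network(11, 11), network(6, 6))
```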

And the third layer is almost Deep Learning...

Now let's turn to another example for help. Let our neural network produce a positive response on the red dots, and a negative response on the blue dots.

If I were asked to cut red from blue in straight lines, I would do it something like this:

But the neural network does not know a priori how many straight lines (neurons) it will need. This parameter must be set before training the network. And a person does this based on... intuition or trial and error.

If we select too few neurons in the first layer (three, for example), we can get a cut like this, which will give a lot of errors (the erroneous area is shaded):

But even if the number of neurons is sufficient, as a result of training the network may “fail to converge,” that is, reach some stable state that is far from optimal, when the percentage of errors is high. Like here, the top crossbar rests on two humps and won’t move away from them. And underneath there is a large area that generates errors:

Again, the possibility of such cases depends on the initial conditions of training and the sequence of training, that is, on random factors:

- What do you think, would that wheel, if it happened, reach Moscow or not?
- What do you think, will the neural network work or not?

There is another unpleasant moment associated with neural networks. Their "forgetfulness".

If you start feeding the network only blue dots, and stop feeding red ones, then it can easily grab a piece of the red area for itself, moving its borders there:

If neural networks have so many shortcomings, and a person can draw boundaries much more efficiently than a neural network, then why use them at all?

And there is one small but very significant detail.

I can very well separate the red heart from the blue background with straight line segments in two-dimensional space.

I can quite well separate the statue of Venus from the three-dimensional space surrounding it with planes.

But in four-dimensional space I can’t do anything, sorry. And in the 13th dimension - even more so.

But for a neural network, the dimension of space is not an obstacle. I laughed at it in low-dimensional spaces, but as soon as I went beyond the ordinary, it easily beat me.

Nevertheless, the question is still open: how justified is the use of a neural network in this particular task, taking into account the disadvantages of neural networks listed above.

Let's forget for a second that our MFCC coefficients are in 13-dimensional space, and imagine that they are two-dimensional, that is, points on a plane. How could one separate one sound from another in this case?

Let the MFCC points of sound 1 have a standard deviation R1, which [roughly] means that the points that do not deviate too far from the mean, the most characteristic points, are inside a circle with a radius R1. In the same way, the points that we trust in sound 2 are located inside a circle with radius R2.

Attention, question: where to draw a straight line that would best separate sound 1 from sound 2?

The answer suggests itself: in the middle between the boundaries of the circles. Any objections? No objections.
Correction: In the program, this boundary divides the segment connecting the centers of the circles in the ratio R1:R2, which is more correct.

And finally, let's not forget that somewhere in space there is a point that represents complete silence in MFCC space. No, it's not 13 zeros, as it might seem. This is one point that cannot have a standard deviation. And the straight lines with which we cut it off from our three sounds can be drawn directly along the boundaries of the circles:

In the figure below, each sound corresponds to a piece of space of its own color, and we can always say which sound this or that point in space belongs to (or not to any):

Well, okay, now let's remember that space is 13-dimensional, and what was good to draw on paper now turns out to be something that does not fit into the human brain.

Yes, but not quite. Fortunately, in space of any dimension there remain such concepts as a point, a straight line, a [hyper]plane, a [hyper]sphere.

We repeat all the same actions in 13-dimensional space: we find the dispersion, determine the radii of the [hyper]spheres, connect their centers with a straight line, cut it with a [hyper]plane at a point equally distant from the boundaries of the [hyper]spheres.
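These actions look the same in code for any number of dimensions. The sketch below is an illustration rather than the program's source: the radius is taken as the root-mean-square distance from the mean (an assumed reading of "standard deviation"), and the cut point follows the correction above, dividing the segment between the centres in the ratio R1:R2. The function names are hypothetical; `math.dist` needs Python 3.8 or later.

```python
import math


def centre_and_radius(points):
    """Mean point and RMS distance from it, used as the hypersphere radius."""
    dim = len(points[0])
    centre = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = math.sqrt(sum(math.dist(p, centre) ** 2 for p in points) / len(points))
    return centre, radius


def separating_plane(c1, r1, c2, r2):
    """Hyperplane perpendicular to the line c1 -> c2, cutting it in the ratio r1:r2."""
    normal = [b - a for a, b in zip(c1, c2)]
    t = r1 / (r1 + r2)  # assumes the two radii are not both zero
    cut = [a + t * n for a, n in zip(c1, normal)]
    offset = sum(n * p for n, p in zip(normal, cut))
    return normal, offset


def is_sound_2(point, plane):
    """True if the point lies on the sound-2 side of the hyperplane."""
    normal, offset = plane
    return sum(n * x for n, x in zip(normal, point)) > offset


# Two-dimensional toy data; 13-dimensional MFCC vectors work the same way.
plane = separating_plane([0.0, 0.0], 1.0, [4.0, 0.0], 3.0)
print(is_sound_2([0.5, 0.0], plane), is_sound_2([2.0, 0.0], plane))
```

To cut at a point equally distant from the two boundaries instead, only the line computing `t` would change.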

No neural network can more correctly separate one sound from another.

Here, however, a reservation should be made. All this is true if the information about sound is a cloud of points that deviate from the average equally in all directions, that is, it fits well into the hypersphere. If this cloud were a complex figure, for example, a 13-dimensional curved sausage, then all the above reasoning would be incorrect. And perhaps, with proper training, the neural network could show its strengths here.

But I wouldn't risk it. I would rather use, for example, sets of normal distributions (GMM), which, by the way, is what CMU Sphinx does. It’s always more pleasant when you understand which specific algorithm led to the result. Not like with a neural network: the Oracle, after many hours of stewing over the training data, tells you that the requested sound is sound #3. (It especially bothers me when people try to entrust control of a car to a neural network. How, in an unusual situation, can one then understand why the car turned left and not right? Did the Almighty Neuron command it?)

But sets of normal distributions are a separate large topic that is beyond the scope of this article.

I hope that the article was useful and/or made your brain squeak.

Very soon, all equipment, from phones to kettles, will be equipped with voice control. The technology has been available for a long time, and now the secret laboratories of large corporations are working to improve it. But you can take advantage of these future technologies today and control computer equipment with your voice.

Voice control of your phone

For several years now, smartphones on the most popular platforms (Android, iOS, Windows Phone) have had a built-in voice control system.


Siri is one of the best embodiments of artificial intelligence in modern technology. Siri is a voice assistant built into iPhone 4S smartphones that understands human speech and can conduct a dialogue with the owner of the smartphone. Siri allows you to control the basic functions of your smartphone, create tasks, search for any information, etc.


The video I prepared for you will tell you about Siri better. This is an excerpt from the iPhone 4S presentation, at the point where one of the iPhone developers talks about Siri:




Today, in Android smartphones, voice control is in no way inferior to Siri (in some places even superior) and performs almost the same tasks.

Voice control of your computer

In addition to the phone, you can also teach your computer to understand commands. Windows Vista and Windows 7 also have a built-in voice control system, but it is not yet available in the Russian version of the operating system. To use the English voice control system, for example, your operating system must be the Ultimate or Enterprise edition and have the English language pack installed. But despite all these limitations, there are other ways to start controlling your computer with your voice.


Typle is one of the best programs for creating your own voice commands for the computer. You record a voice command and assign an action to be performed after it is spoken. Typle does its job quite well. True, commands have to be given in a clear, mechanical voice so that the program can recognize them. The program can also sometimes mistake extraneous sounds for a voice command. So do not be surprised if, after installing and configuring Typle, inexplicable events begin to occur on your computer.


Voice control.rf is a cloud service, with its program Speaker, from Russian developers with very good speech recognition. Speaker understands human speech much better than Typle. Another advantage of the program over Typle is that it begins to “listen” for commands only after a command key is pressed; at the moment this is the mouse wheel. Thanks to this, the program will not execute commands when it is not supposed to. But in my opinion, using the wheel as a command key is not entirely convenient, because it is often used for other things.


Voice control in the Opera browser. For fans of the Opera Internet browser, there is built-in voice control that allows you to control the main functions of the browser with your voice. Opera does not have the ability to create your own commands, but uses existing commands in English. But I think that few people will be interested in such functionality, when using a mouse and keyboard you can perform all the same actions with no less speed.

Voice control on Google

Voice control in Google services deserves special attention. Everyone knows that Google always creates high-quality products and services. Many have become convinced of this by starting, for example, to use Gmail. At the moment, there are two options that I know of for voice control of Google services.


The first one is searching for information using voice in the Google search engine. Helps you work much faster with the search engine.


The second is Google Translate, which allows you to dictate text (for now, only in English) and automatically receive a translation into the desired language.


It is quite convenient to use voice input in Google Translate when reading text from an English textbook or, for example, product packaging, to quickly translate the necessary information into Russian.

Voice control in Google Chrome

The OWeb extension complements the existing voice control functions in Google Chrome. OWeb adds the ability to dictate text by voice on almost all sites where text input is expected: in search forms, in contact forms, in comment fields, etc. This is certainly not Siri, but it is also a great way to free your hands and save time on typing.


Watch the video in which I will show you the capabilities of the Oweb extension and examples of its use:



For people with disabilities, as well as simply for sybarites, OS developers have created voice control of the computer. It allows the user to enter information using their voice. After pronouncing certain words, the device begins speech recognition - converting the audio signal into digital information. After the entered information is correctly recognized, the program proceeds to the specified action algorithm - performs the function that is attached to a particular command.
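The last stage, going from a recognized phrase to the function attached to it, can be pictured as a lookup table. The sketch below is purely illustrative: the command words and the functions are made up, and the speech recognition itself is assumed to have already produced the text.

```python
def open_file(name):
    return f"opening {name}"


def search(query):
    return f"searching for {query}"


# Hypothetical command table: recognized first word -> function to call.
COMMANDS = {"open": open_file, "search": search}


def handle(recognized_text):
    """Run the action attached to the first word of the recognized phrase."""
    word, _, rest = recognized_text.strip().lower().partition(" ")
    action = COMMANDS.get(word)
    if action is None:
        return None  # not a known command: do nothing
    return action(rest)


print(handle("Open report.txt"))
```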

Everything is quite simple. Speech is not always recognized correctly, so the computer voice control program is not intensively used to solve complex problems of operating system management. It is used to perform basic functions: opening and closing files, local and network search, etc.

History of voice control development

  • The first voice recognition system, Audrey, was created in the 50s of the twentieth century. She deciphered only numbers spoken in one voice.
  • In 1962, the first word recognition system was created. She deciphered 15 English words.
  • With the development of computers, the Dragon Dictate program was developed in 1990. It recognized up to 100 words per minute, but was expensive.
  • In the late 2000s, the Google Voice Search speech recognition application appeared on the iPhone. In 2010, voice search was added to Android.
  • Siri was included in the iPhone 4S software in early October 2011.
  • Cortana, a voice assistant for Windows, was introduced in 2014.

Cortana and voice input capabilities today

Cortana is a virtual assistant in the Windows operating system. The service helps the user plan tasks and reminds the user of them.
For a specific request, the service helps collect the relevant information, structures it clearly, and presents it to the user in as processed a form as possible.
Interestingly, as soon as it is turned on, the virtual assistant starts collecting information about the requests entered and personal data, trying to adapt as much as possible to each individual user.


Voice control of a Windows 7 computer through the virtual assistant is impossible: it is integrated only into Windows 10. Unfortunately, the developers did not bother to release a Russian-language version.
The main role is played by search, which in Windows 10 can be opened through “Start”. This function handles almost any request. If speech is not recognized, you can type the command into the pop-up window, and the voice control program will read the text.

The unpleasant thing is that it collects all the data entered via the keyboard and sends it to Microsoft.

Third party programs

Typle

After the installation is completed, proceed to the next step - create an account. Here you need to come up with a key phrase, after which an activation notification will sound.


Next you will need to come up with and create voice commands, regardless of their purpose. The "dog" command can launch an application or perform a completely different action.




You just need to create a voice command and assign it to a specific action. The program is suitable only for basic operations such as opening files and folders; its functionality is limited.

Speaker

The functionality here is wider than in Typle.


Voice control of a Windows 10 computer gives the user the ability to open and close files, take screenshots, and turn off the PC.


Speech recognition takes quite a long time, over 3-4 seconds. This is due to the fact that speech is first converted into text, and commands are recognized by the computer from text information.

Laitis

This is a free program that allows you to both control your PC and dictate text. After installation, you need to register and then you can use it for your pleasure.


The autocorrect function when typing is interesting. You can say "Quotes" and the corresponding symbol will appear in the text.

Voice control capabilities via Yandex.string

By using this application, you can perform local or network searches for information and files, restart or shut down your computer. There is a function to open programs and sites.
To use the program, you must first download and install it.

But during installation, you should uncheck the boxes next to the items where the software manufacturer suggests installing a browser and changing its settings. Otherwise, the installation will take longer and the configuration in the browser will change.
Ultimately, the line is placed near the Start button. Say “Listen to Yandex” and a window will open.

Speak out the request.

After a pause, a search bar will open in the browser. It's fun to manage your search this way.
In general, voice control of a computer is not yet as developed as we imagine it. But the functions available today are already impressive and significantly help to move to a new level of PC use.

Have a great day!

Technology development does not stand still, providing more and more opportunities to users. One of these functions, which has already begun to move from the category of new products into our daily lives, is voice control of devices. It is especially popular among people with disabilities. Let's find out what methods you can use to enter commands by voice on computers with Windows 7.

If Windows 10 has a utility called Cortana already built into the system, which allows you to control your computer with your voice, then earlier operating systems, including Windows 7, do not have such an internal tool. Therefore, in our case, the only option to organize voice control is to install third-party programs. We will talk about various representatives of such software in this article.

Method 1: Typle

One of the most popular programs that provides voice control over a computer on Windows 7 is Typle.

  1. After downloading, activate the executable file of this application to begin the installation procedure on your computer. In the installer's welcome screen, click "Next".
  2. Next, the license agreement is displayed in English. To accept its terms, click "I agree".
  3. Then a window appears where you can specify the installation directory of the application. You should not change the default setting without a good reason. To start the installation, simply click "Install".
  4. After this, the installation will be completed within just a few seconds.
  5. A window will open informing you that the installation was completed successfully. To launch the program immediately after installation and place its icon in the start menu, check the boxes next to "Run Typle" and "Launch Typle on Startup" respectively. If you do not want this, uncheck the corresponding box. To exit the installation window, click "Finish".
  6. If you left the launch box checked when finishing the installer, the Typle interface window will open immediately after it closes. First, you will need to add a new user to the program. To do this, click the "Add user" icon on the toolbar. This icon shows a human face and a «+» sign.
  7. Then enter the profile name in the "Enter your name" field. You can enter anything you like here. In the "Enter Keyword" field, specify a word denoting the action, for example, "Open". After this, click the red button and, after the beep, say this word into the microphone. After you say it, click the same button again, and then click "Add".
  8. Then a dialog box will open with the question “Would you like to add this user?”. Click "Yes".
  9. The username and the keyword attached to it will now appear in the main Typle window. Now click the "Add command" icon, which shows a hand with a green «+» sign.
  10. A window will open in which you will need to select what exactly you will launch via voice command:
    • Programs;
    • Internet bookmarks;
    • Windows files.

    When you check the box next to an item, the elements of the selected category are displayed. If you want to view the full set, check the box next to "Select all". Then select the item in the list that you want to launch by voice. Its name will be displayed in the "Command" field. Then click the "Record" button with a red circle to the right of this field and, after the beep, say the phrase displayed in it. After that, click "Add".

  11. A dialog box will open asking “Would you like to add this command?”. Click "Yes".
  12. After this, exit the window for adding a command phrase by clicking the button "Close".
  13. This completes adding a voice command. To launch the desired program by voice, press "Start Talking".
  14. A dialog box will open stating: "The current file has been modified. Do you want to record the changes?". Click "Yes".
  15. The file save window appears. Go to the directory where you intend to save the object with the tc extension. In the "File name" field, enter any name you like. Click "Save".
  16. Now, if you say into the microphone the expression that appears in the "Command" field, the application or other object listed opposite it in the "Actions" area will start.
  17. In a completely similar way, you can write down other command phrases that will be used to launch applications or perform certain actions.

The main disadvantage of this method is that the developers do not currently support the Typle program and it cannot be downloaded from the official website. In addition, the recognition of Russian speech is not always correct.

Method 2: Speaker

The next application that will help you control your computer with your voice is called Speaker.

  1. After downloading, run the installation file. The welcome window of the Speaker "Installation Wizard" will appear. Just click "Next".
  2. The license agreement window appears. If you wish, read it, then set the radio button to "I accept…" and click "Next".
  3. In the next window you can specify the installation directory. By default, this is the standard application directory, and this parameter does not need to be changed unless necessary. Click "Next".
  4. Next, a window will open where you can set the name of the application icon in the "Start" menu. The default is "Speaker". You can leave this name or replace it with any other. Then click "Next".
  5. Now a window will open where, by checking the corresponding box, you can place the program icon on the "Desktop". If you don't need this, uncheck the box and click "Next".
  6. After this, a window will open with a brief summary of the installation parameters based on the information entered in the previous steps. To start the installation, click "Install".
  7. The Speaker installation procedure will be performed.
  8. When it finishes, the "Setup Wizard" will display a message indicating successful installation. If you need the program to start immediately after closing the installer, leave the corresponding box checked. Click "Finish".
  9. This will open a small Speaker application window. It will say that to recognize voice you need to press the middle mouse button (the scroll wheel) or the Ctrl key. To add new commands, click the «+» sign in this window.
  10. A window for adding a new command phrase opens. It works on the same principles as in the previous program, but offers broader functionality. First, choose the type of action you want to perform by clicking the drop-down list.
  11. The list that opens contains the following options:
    • Turn off the computer;
    • Restart the computer;
    • Change the keyboard layout (language);
    • Take a screenshot;
    • Add a link or file.
  12. The first four actions need no further clarification, but if you choose the last option you must indicate which link or file to open. To do so, drag the object you want to open by voice command (an executable file, a document, etc.) into the field above, or enter a link to a site. The address will then be opened in the default browser.
  13. Next, in the field on the right side of the window, enter the command phrase that will trigger the assigned action when spoken. Click the "Add" button.
  14. The command will then be added. In this way you can add a virtually unlimited number of command phrases. To view the list, click "My commands".
  15. A window opens with the list of entered command phrases. If necessary, you can remove any of them from the list by clicking "Delete".
  16. The program runs in the tray. To perform an action from the command list, press Ctrl or the mouse wheel and say the corresponding command phrase. The required action will be performed.
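
The command list built in the steps above can be thought of as a lookup table from typed phrases to actions. The following is a minimal Python sketch of that idea, not Speaker's actual implementation: the `CommandTable` class, the action names and the example URL are all hypothetical. The only real system commands referenced are the standard Windows `shutdown /s` and `shutdown /r` command lines, which are built here but never executed.

```python
# Hypothetical action names mirroring the five options in the drop-down list.
SHUTDOWN = "shutdown"
RESTART = "restart"
SWITCH_LAYOUT = "switch_layout"
SCREENSHOT = "screenshot"
OPEN_TARGET = "open_target"


def normalize(phrase):
    """Lower-case the phrase and collapse whitespace so lookups are forgiving."""
    return " ".join(phrase.lower().split())


class CommandTable:
    """Illustrative phrase -> action table (an assumption, not Speaker's code)."""

    def __init__(self):
        self._commands = {}

    def add(self, phrase, action, target=None):
        # Only the "link or file" action needs an extra argument.
        if action == OPEN_TARGET and not target:
            raise ValueError("open_target needs a file path or URL")
        self._commands[normalize(phrase)] = (action, target)

    def delete(self, phrase):
        self._commands.pop(normalize(phrase), None)

    def phrases(self):
        return sorted(self._commands)

    def lookup(self, recognized_text):
        """Return (action, target) for the recognized text, or None."""
        return self._commands.get(normalize(recognized_text))


def power_command(action):
    """Build (but do not run) the Windows command line for a power action."""
    if action == SHUTDOWN:
        return ["shutdown", "/s", "/t", "0"]
    if action == RESTART:
        return ["shutdown", "/r", "/t", "0"]
    return None


table = CommandTable()
table.add("Turn off the computer", SHUTDOWN)
table.add("open mail", OPEN_TARGET, target="https://example.com/mail")
print(table.lookup("  OPEN   Mail "))   # ('open_target', 'https://example.com/mail')
print(power_command(SHUTDOWN))          # ['shutdown', '/s', '/t', '0']
```

Normalizing both the stored phrase and the recognized text keeps the match tolerant of differences in case and spacing, which matters when the phrase is typed in once and later compared against recognizer output.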

Unfortunately, this program, like the previous one, is no longer supported by its developers and cannot be downloaded from the official website. Another disadvantage is that the application matches a voice command against the text you typed in, rather than against a voice sample recorded in advance, as was the case with Typle. This means that an operation takes longer to complete. In addition, Speaker is unstable and does not work correctly on all systems. Overall, though, it provides much more control over your computer than Typle does.

Method 3: Laitis

The next program, the purpose of which is to control computers running Windows 7 with your voice, is called Laitis.

  1. Laitis is convenient in that you only need to run the installation file, and the entire installation is performed in the background without your involvement. In addition, unlike the previous applications, this tool provides a fairly large set of ready-made command phrases, far more varied than those of the competitors described above. For example, you can navigate around a page. To view the list of prepared phrases, go to the "Commands" tab.
  2. In the window that opens, all commands are divided into collections, each corresponding to a specific program or area of use:
    • Google Chrome (41 commands);
    • VKontakte (82);
    • Windows programs (62);
    • Windows hotkeys (30);
    • Skype (5);
    • YouTube HTML5 (55);
    • Working with text (20);
    • Websites (23);
    • Laitis Settings (16);
    • Adaptive commands (4);
    • Services (9);
    • Mouse and keyboard (44);
    • Communication (0);
    • AutoCorrect (0);
    • Word 2017 rus (107).

    Each collection, in turn, is divided into categories. The commands themselves are written in the categories, and the same action can be performed by pronouncing several variants of command expressions.

  3. When you click a command, a pop-up window shows the complete list of voice phrases that correspond to it and the actions it triggers. Clicking the pencil icon lets you edit the command.
  4. All command phrases displayed in the window are available immediately after Laitis is launched. Simply say the appropriate phrase into the microphone. If necessary, you can add new collections, categories and commands by clicking the «+» sign in the appropriate places.
  5. To add a new command phrase, in the window that opens, enter under "Voice commands" the phrase that should trigger the action.
  6. All possible combinations of this phrase will be added automatically. Click the "Condition" icon.
  7. A list of conditions will open where you can select the appropriate one.
  8. After the condition is displayed in the window, click the "Action" or "Web Action" icon, depending on the purpose.
  9. Select a specific action from the list that opens.
  10. If you choose to go to a web page, you will have to additionally specify its address. After all the necessary manipulations have been completed, press "Save changes".
  11. The command phrase will be added to the list and ready to be used. To do this, just say it into the microphone.
  12. In addition, on the "Settings" tab you can select a speech recognition service and a speech synthesis service from the lists. This is useful if the default services cannot cope with the load or are unavailable for some other reason. You can also set some other parameters here.
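
The hierarchy described above (collections, divided into categories, holding commands that each accept several phrase variants plus a condition and an action) can be modelled with a small data structure. This is only a sketch under stated assumptions: the class names, the `condition` and `action` strings and the matching rule are hypothetical and are not taken from Laitis itself.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    phrases: list          # several spoken variants that trigger the same action
    action: str            # hypothetical action identifier, e.g. "scroll_down"
    condition: str = ""    # hypothetical condition, e.g. the active program


@dataclass
class Collection:
    name: str
    categories: dict = field(default_factory=dict)   # category name -> [Command]

    def add(self, category, command):
        self.categories.setdefault(category, []).append(command)

    def count(self):
        # Number of commands in the collection, as shown next to its name.
        return sum(len(cmds) for cmds in self.categories.values())


def find_action(collections, spoken, active_condition=""):
    """Return the action of the first command whose phrase and condition match."""
    spoken = " ".join(spoken.lower().split())
    for collection in collections:
        for commands in collection.categories.values():
            for cmd in commands:
                # A command with a condition only applies when that condition holds.
                if cmd.condition and cmd.condition != active_condition:
                    continue
                if spoken in (p.lower() for p in cmd.phrases):
                    return cmd.action
    return None


chrome = Collection("Google Chrome")
chrome.add("Navigation",
           Command(["scroll down", "page down"], "scroll_down", condition="chrome"))
print(chrome.count())                                                 # 1
print(find_action([chrome], "Page Down", active_condition="chrome"))  # scroll_down
print(find_action([chrome], "Page Down", active_condition="skype"))   # None
```

Attaching the condition to the command rather than to the phrase lets the same words trigger different actions depending on which program is active.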

In general, using Laitis to control Windows 7 by voice offers far more options for operating a PC than any of the other programs described in this article. With this tool you can assign almost any action on your computer. It is also important that the developers currently actively support and update the software.

Method 4: "Alice"

One of the new developments that allows you to control Windows 7 with your voice is the voice assistant from Yandex - “Alice”.

  1. Run the program installation file. It will perform the installation and configuration procedure in the background without your direct participation.
  2. After installation completes, the "Alice" area appears on the "Toolbar".
  3. To activate the voice assistant, click the microphone icon or say "Hello Alice".
  4. After this, a window will open where you will be asked to pronounce the command by voice.
  5. To view the list of commands that this program can perform, you need to click on the question mark in the current window.
  6. A list of options will open. To find out which phrase you need to say to perform a specific action, click on the corresponding list item.
  7. A list of the commands you can speak into the microphone to perform that action is displayed. Unfortunately, the current version of Alice does not allow adding new voice phrases and corresponding actions, so you can use only the existing options. However, Yandex is constantly developing and improving the product, so new features may well appear soon.

Although the developers of Windows 7 did not provide a built-in mechanism for controlling the computer by voice, this capability can be added using third-party software. There are many applications for this purpose. Some are as simple as possible and are designed for the most frequent operations. Others are very advanced: they contain a huge base of command phrases and also let you add new phrases and actions, bringing voice control as close as possible to standard control with a mouse and keyboard. The choice of a specific application depends on what you intend to use it for and how often.

Today, voice assistants have become an integral part of life. Every day more and more people choose virtual assistants over the mouse and keyboard. Artificial intelligence helps solve simple tasks through voice input: the assistant recognizes the spoken request and then acts on it. Speak clearly and distinctly so that the assistant carries out the request correctly. It can suggest a route, read the news of the day, find music, show the weather or answer a simple question. The most common voice assistants for PCs are Cortana, Typle, Speaker, Ok Google and Gorynych.

Cortana for Windows

Cortana is a voice assistant created by Microsoft and integrated into the operating system. It is intended primarily for Windows, but is also available as an application on iOS, Android, Xbox One, Windows Phone and Microsoft Band. Cortana helps you organize and plan tasks for a given period, reminds you of things to do, and provides information on request. It also has built-in functionality for answering common questions using Bing search. It can plan routes and report road conditions, which helps you avoid being late. You can enter requests by voice or as text from the keyboard. Cortana can also keep up a conversation: it sings songs and tells jokes, so it is not without a sense of humor.

One notable feature is predicting what the user wants. If you grant access to personal data, Microsoft's virtual assistant will “adapt” to you by constantly analyzing your actions: the places you like to visit, your preferences, your long-term interests, hobbies and more.

The Cortana virtual assistant is tightly integrated with the operating system and can control Windows 10 and individual applications while you work. It helps you read your emails, track your location, check your contacts, monitor your calendar, and manage reminders and music across multiple music apps, adjusting audio to your preferences.

It is possible to synchronize multiple devices. Cortana will stay up to date on multiple computers at the same time.

Alice Yandex (desktop version)

Alice is a voice assistant from Yandex. It can show the weather forecast, find information about public places, find music, convert currencies, do simple calculations and carry on a conversation. The program is very young and is being improved all the time. Alice lets you hold a dialogue by text or by voice. It is able to understand the meaning of your phrases: if you ask “Where can I shop here?”, it understands this as “Where are the shops here?”

The Yandex search bar, also known as Yandex.Stroka (Yandex.String), sits in the taskbar of the Windows operating system. It searches the Internet for what the user enters by voice or as text. The user can also give a command to open any folder or document on the computer. The program is opened by clicking the Stroka button or by using hotkeys. In essence, it is a special case of Alice for PC. To save space on the taskbar (about 8 cm), the search field can be replaced with a microphone icon.

Typle - voice control of the computer

The program is developed for the Windows operating system. No knowledge of English is required, but the program cannot control a media player and does not accept text input. Because of its limited feature set, it seems less efficient and functional than its counterparts: it can only open programs and web pages. It also interprets background noise as commands, which may cause the computer to behave incorrectly. On the other hand, the assistant carries out assigned tasks quickly. After installing the program, you can define a key phrase that signals the start of a command.
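
One common way to keep background noise from being treated as a command is to require both a key phrase at the start of the utterance and a minimum recognition confidence. The sketch below illustrates that general technique only; it is not how Typle works internally. It assumes a recognizer that returns the recognized text together with a confidence score between 0 and 1, and the function name, the key phrase "computer" and the 0.6 threshold are all hypothetical.

```python
def extract_command(recognized_text, confidence,
                    key_phrase="computer", min_confidence=0.6):
    """Return the command part of an utterance, or None if it should be ignored.

    All parameters are illustrative assumptions: `confidence` is expected to be
    a score in [0, 1] supplied by whatever speech recognizer is in use.
    """
    if confidence < min_confidence:
        return None  # likely background noise or unclear speech

    words = recognized_text.lower().split()
    key = key_phrase.lower().split()
    if words[:len(key)] != key:
        return None  # the utterance does not start with the key phrase

    command = " ".join(words[len(key):])
    return command or None  # the key phrase alone is not a command


print(extract_command("Computer open calculator", 0.9))   # open calculator
print(extract_command("open calculator", 0.9))            # None
print(extract_command("Computer open calculator", 0.3))   # None
```

Raising the threshold reduces false triggers at the cost of ignoring more genuine commands, so the value would need tuning for a given microphone and room.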

Gorynych

Gorynych is a Russian-developed voice assistant for controlling PCs running Windows 7, XP and Vista. Its size is 30.4 MB, and it supports Russian and English. It is based on the Dragon Dictate application, which was created by Western specialists. The assistant performs all the standard commands that its counterparts do. Using voice, the user can open any folder on the computer, much as in Typle. A distinctive feature of Gorynych is dictating text into Word. The drawback of this function is that the user’s speech must be clear and free of defects. Over time, the program learns the voice of the computer's owner and begins to execute commands faster.

Speaker - voice control of the computer

Speaker is a voice assistant program for the Windows operating system. It differs from others in its wider functionality: the user can open and close various folders on the PC and take a screenshot. Speaker requires a stable Internet connection. The program is activated from the keyboard, which is not always convenient. Voice recognition leaves much to be desired: processing speech takes 5 seconds, which is a long time. The program converts speech to text.

Ok Google for PC

Ok Google is a voice assistant and at the same time part of a search engine. It has many functions: scheduling events (setting reminders), tracking mail, opening any website, searching for music, finding addresses of public places, and so on. A distinctive feature is that, after executing a command, the program supplements the result with additional information on its own. Its advantages are that it is free and stable. Its disadvantage is the detailed configuration it requires. The assistant is built into the Google Chrome browser and is available for PC, Android and iOS.

Siri on computer

Siri is a voice assistant that works on Apple devices: the iPhone, iPad and iPod touch running iOS, and computers running macOS Sierra. On Apple devices, Siri is installed by default; all you need to do is activate it in the device settings.

Using an emulator, you can install it on Windows 7-10; the file size is 79 MB. The program converts human speech and then gives the user recommendations. Like the others, this American assistant can perform simple commands. It “understands” Russian speech very well. An Internet connection is required for it to work.
