Virtual Jarvis. Marvel JARVIS – personal assistant for iOS devices

Today we will talk about speech. Would you like to control your computer with your voice, without using your fingers? Or, as they say, by the power of thought? True, we will not control the computer by the power of thought, but by the power of our voice it is quite possible.

Typle is one of the best programs available today for controlling a computer by voice. The comments about it on download sites largely agree on this.

It does have its shortcomings, but more on that later. By the way, if you are interested, read my review.

You can download the program here: http://freesoft.ru/type

How to use it? First, let's launch it and look at the main control buttons:

The program welcomes us and immediately gives us tips on how to use Typle. First, click the “Add” button and record a word, for example “open”. To do this, say the word into the microphone:

Then click “Add”. We have now saved the word “open” in the program in our own voice. You can record any other words the same way. The main thing is not to mix them up.

The next step is to add commands. To do this, go to this item:

Then we check the box next to the item we need:

Select a program, application or action and click the red record button. If the computer has accepted our voice, click “Add”:

Now one voice command is visible in our profile. In this case, it is the one that opens 7-Zip:

Now click the final “start talking” button:

Then say the phrase “open Seven Zip”. In my case everything works, and 7-Zip opens. Remember the phrase “Open sesame”? This is roughly the same thing.

The program does not always work reliably. The mighty Russian language has not yet been fully mastered by linguist programmers... But it is still nice when the computer listens to you.

So for experimenting and for simple curiosity, Typle is entirely suitable.

In this video you can see the history of the creation of the first voice engines and what else we need to work on:

There are other, similar programs with rather frightening names, such as Gorynych, Perpetuum, Dictograph and Voice Commander. But they are all “not quite right”: they do not stand up to criticism as worthy programs.

It took me 5 minutes to master this program. That is quite a long time (I usually figure out such programs in 1-2 minutes). If you have any questions, write. See you soon, friends :)!

Most users know that Siri is considered the most popular personal assistant and question-answering technology on iOS devices. Fortunately, Siri is not the only such system on the market. Fans of science fiction and Marvel comics are offered the personal assistant JARVIS from the movie “Iron Man”.

Anyone who has seen “Iron Man” probably knows Tony Stark’s butler, whose name is Jarvis. With this program, the user can call on a virtual servant on their own portable device. In addition, JARVIS is a distinctive product that uses the voice and image of the Jarvis character.

The JARVIS utility starts with audio instructions on how to use and control it. After that, the user needs to indicate their gender (so that the virtual assistant can address the device owner correctly) and choose the unit for temperature readings (kelvins, degrees Fahrenheit or, of course, Celsius).

A detailed list of commands can be found by touching the icon in the upper corner of the display. All commands must begin with the address “Jarvis” and usually contain one word (for example, “Jarvis, weather forecast”). JARVIS can also notify the device owner about upcoming meetings and display the current time. You can also create various audio reminders in the program.

It is worth noting that the JARVIS utility provides additional features for owners of the blockbuster “Iron Man” on optical disc. For example, the user can control playback of the movie with the help of this virtual butler.


The thought of my own “Jarvis”, and of controlling the equipment in the house by voice, had not left me for a long time. Finally, I got around to building this miracle. I didn’t have to think long about the “brains”: the Raspberry Pi fits perfectly.

So, the hardware:

Raspberry Pi 3 Model B
Logitech USB camera

Implementation

Our assistant will work on the Alexa/Hub principle:

Activate offline on a specific word
Recognize the command in the cloud
Execute the command
Report that the work is done, or read out the requested information
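
The four steps above can be sketched as a single function. This is only an illustration of the control flow, not the script used later in this post: the function name run_once and all of its parameters are hypothetical, and the recording, recognition and speech steps are passed in as plain callables so the sketch runs without a microphone or network.

```python
def run_once(record, is_wake_word, recognize, handlers, say):
    """Run one pass of the four steps; return the recognized command or None."""
    # Step 1: offline activation on a short recording.
    if not is_wake_word(record()):
        return None
    # Step 2: cloud recognition of a second recording.
    command = recognize(record())
    if not command:
        return None
    # Step 3: execute the matching action, if there is one.
    action = handlers.get(command)
    if action is None:
        say("Unknown command")
        return command
    result = action()
    # Step 4: confirm, or read out the information the action returned.
    say(result if result else "Done")
    return command


# Usage with stand-in callables: the "recordings" are just strings here.
spoken = []
clips = iter(["jarvis", "time"])
run_once(
    record=lambda: next(clips),
    is_wake_word=lambda clip: clip == "jarvis",
    recognize=lambda clip: clip,
    handlers={"time": lambda: "It is noon"},
    say=spoken.append,
)
print(spoken)
```

Keeping the steps as separate callables makes it easy to swap one recognizer or speech service for another without touching the loop.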

Because my camera is supported out of the box, I didn’t have to mess with drivers, so let’s move straight to the software part.

Offline activation

Activation is handled by CMU Sphinx. Everything would be fine, but out of the box recognition is very slow: it takes more than 10 seconds, which is completely unsuitable. To solve the problem, you need to clear the dictionary of unnecessary words.

Install everything we need:

pip3 install SpeechRecognition
pip3 install pocketsphinx
Next, open the dictionary:

sudo nano /usr/local/lib/python3.4/dist-packages/speech_recognition/pocketsphinx-data/en-US/pronounciation-dictionary.dict
Delete everything except the entry for “jarvis” that we need:

jarvis JH AA R V AH S
Now pocketsphinx recognizes the word quite quickly.
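
Trimming the dictionary by hand in an editor is tedious, so the same step could be scripted. Below is a minimal sketch, assuming each dictionary line has the form "word PHONEME PHONEME ..." and that alternative pronunciations are written as "word(2)". The helper name filter_dictionary is hypothetical, and the sample entries (other than the jarvis line above) are made up for the demonstration.

```python
def filter_dictionary(lines, keep_words):
    """Keep only the entries whose word (first field) is in keep_words."""
    keep = {word.lower() for word in keep_words}
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Alternative pronunciations look like "word(2)"; compare the base word.
        base = fields[0].split("(")[0].lower()
        if base in keep:
            kept.append(line.rstrip("\n"))
    return kept


# Made-up sample lines in the assumed format.
sample = [
    "jarred JH AA R D\n",
    "jarvis JH AA R V AH S\n",
    "jarvis(2) JH AA R V IH S\n",
    "jasmine JH AE Z M AH N\n",
]
for entry in filter_dictionary(sample, ["jarvis"]):
    print(entry)
```

To apply it to the real file, read the lines from a backup copy of the dictionary and write the filtered result back, so the full dictionary can be restored later.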

Speech recognition

At first the idea was to use Google’s service, especially since it is supported in SpeechRecognition. But as it turned out, Google charges for it and does not work with private individuals.

Fortunately, Yandex also provides this service, free of charge and in an extremely simple form.

We register and receive an API key. All the work can be done with curl.

curl -X POST -H "Content-Type: audio/x-wav" --data-binary "@file" "https://asr.yandex.net/asr_xml?uuid=ya_uid&key=yf_api_key&topic=queries"
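
The service answers with XML, and the recognized text has to be pulled out of it. Here is a minimal sketch of that step using the standard library. The only thing taken from this post is that the text sits in a "variant" element; the root element name and the attributes in the sample are assumptions about the response format, and best_variant is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def best_variant(xml_text):
    """Return the text of the first <variant> element, or None if there is none."""
    root = ET.fromstring(xml_text)
    variant = root.find(".//variant")
    if variant is None or not variant.text:
        return None
    # Normalize so the result can be compared with lowercase command phrases.
    return variant.text.strip().lower()


# Assumed shape of a successful answer (illustrative only).
sample = (
    '<recognitionResults success="1">'
    '<variant confidence="1">turn on the light</variant>'
    "</recognitionResults>"
)
print(best_variant(sample))
```

Returning None when nothing was recognized lets the caller simply skip the command step for that recording.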

Speech synthesis

Here Yandex will help us again. We send the text and receive a file with synthesized speech in response:

curl "https://tts.voicetech.yandex.net/generate?format=wav&lang=ru-RU&speaker=zahar&emotion=good&key=ya_api_key" -G --data-urlencode "text=text" > file
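
If the request is built from Python rather than typed by hand, the text has to be URL-encoded, which is what --data-urlencode does for curl. A small sketch of the same idea with the standard library follows. The endpoint and query parameters are the ones from the curl command above; the helper name build_tts_url and the placeholder key are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TTS_ENDPOINT = "https://tts.voicetech.yandex.net/generate"


def build_tts_url(text, api_key, speaker="zahar", emotion="good"):
    """Build the GET URL for the synthesis request with all values escaped."""
    params = {
        "format": "wav",
        "lang": "ru-RU",
        "speaker": speaker,
        "emotion": emotion,
        "key": api_key,
        "text": text,
    }
    return TTS_ENDPOINT + "?" + urlencode(params)


url = build_tts_url("light & sound: 50%", "ya_api_key")
print(url)
# Round trip: decoding the query gives back the original text.
print(parse_qs(urlsplit(url).query)["text"][0])
```

Escaping the text this way also keeps characters such as quotes and ampersands from breaking the command when the URL is passed to a shell.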

Jarvis

We put everything together and get this script.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import random
from xml.dom import minidom

import speech_recognition as sr

r = sr.Recognizer()
ya_uuid = ""
ya_api_key = ""

# os.system('echo "Assist+ent za+ushchen" | festival --tts --language russian')


def convert_ya_asr_to_key():
    # Return the first recognition variant from the Yandex answer, or False.
    xmldoc = minidom.parse("./asr_answer.xml")
    itemlist = xmldoc.getElementsByTagName("variant")
    if len(itemlist) > 0:
        return itemlist[0].firstChild.nodeValue
    else:
        return False


def jarvis_on():
    # Return True if the activation word is heard in send.wav.
    with sr.WavFile("send.wav") as source:
        audio = r.record(source)
    try:
        t = r.recognize_sphinx(audio)
        print(t)
    except (LookupError, sr.UnknownValueError):
        print("Could not understand audio")
        return False
    return t == "jarvis"


def jarvis_say(phrase):
    # Synthesize the phrase with Yandex and play the resulting file.
    os.system('curl "https://tts.voicetech.yandex.net/generate?format=wav&lang=ru-RU&speaker=zahar&emotion=good&key=' + ya_api_key + '" -G --data-urlencode "text=' + phrase + '" > jarvis_speech.wav')
    os.system("aplay jarvis_speech.wav")


def jarvis_say_good():
    # Confirm a completed command with a random phrase.
    phrases = ["Done", "Ready", "Yes", "Yes, sir", "Anything else?"]
    randitem = random.choice(phrases)
    jarvis_say(randitem)


try:
    while True:
        # Record 3 seconds and check them for the activation word.
        os.system("arecord -B --buffer-time=1000000 -f dat -r 16000 -d 3 -D plughw:1,0 send.wav")
        if jarvis_on():
            os.system("aplay jarvis_on.wav")
            # Record the command itself and send it to Yandex.
            os.system("arecord -B --buffer-time=1000000 -f dat -r 16000 -d 3 -D plughw:1,0 send.wav")
            os.system('curl -X POST -H "Content-Type: audio/x-wav" --data-binary "@send.wav" "https://asr.yandex.net/asr_xml?uuid=' + ya_uuid + '&key=' + ya_api_key + '&topic=queries" > asr_answer.xml')
            command_key = convert_ya_asr_to_key()
            if command_key:
                if command_key in ["key_word", "key_word1", "key_word2"]:
                    os.system("")
                    jarvis_say_good()
                    continue
except Exception:
    jarvis_say("Something went wrong")
What is going on here? We start an infinite loop, record three seconds with arecord and send the recording to Sphinx for recognition. If the word “jarvis” is found in the file,

if jarvis_on():
we play the pre-recorded activation notification file.

Then we record another 3 seconds and send them to Yandex; in response we receive our command. Next, we perform actions based on the command.

That's all. You can come up with a great variety of execution scenarios.
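
As the number of scenarios grows, a chain of if statements becomes hard to maintain. One possible alternative, shown here only as a sketch, is a table that pairs the accepted phrases with a command line. The names COMMANDS and find_command are hypothetical; the paths and scene ID are borrowed from the Philips Hue example later in this post.

```python
import shlex

# Hypothetical table: each entry pairs the accepted phrases with a shell command.
COMMANDS = [
    (("turn on the light", "light"),
     "python3 /home/pi/smarthome/hue/hue.py a1167aa91-on-0"),
    (("turn off the lights",),
     "python3 /home/pi/smarthome/hue/hue.py off"),
]


def find_command(text, table=COMMANDS):
    """Return the argument list for the first entry listing this phrase, else None."""
    # Lowercase and collapse whitespace so small recognition differences still match.
    phrase = " ".join(text.lower().split())
    for phrases, command in table:
        if phrase in phrases:
            return shlex.split(command)
    return None


print(find_command("  Turn on   the LIGHT "))
print(find_command("make coffee"))
```

The returned list can be handed to subprocess.run, which avoids building shell strings by hand; adding a new scenario then means adding one line to the table.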

Use cases

Now some examples from my actual use.

Philips Hue

Install:

pip install phue
In the Hue app we set a static IP:

Then run:

#!/usr/bin/python
import sys
from phue import Bridge

b = Bridge("192.168.0.100")  # Enter bridge IP here.

# If running for the first time, press button on bridge and run with b.connect() uncommented
# b.connect()

print(b.get_scene())
We write down the IDs of the scenes we need, such as “470d4c3c8-on-0”.

Final script:

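#!/usr/bin/python
import sys
from phue import Bridge

b = Bridge("192.168.0.100")  # Enter bridge IP here.

# If running for the first time, press button on bridge and run with b.connect() uncommented
# b.connect()

# Assumption: the light IDs are missing from the original listing; replace with your own.
LIGHT_IDS = [1, 2]

if sys.argv[1] == "off":
    b.set_light(LIGHT_IDS, "on", False)
else:
    # The argument is a scene ID; 1 is the group ID.
    b.activate_scene(1, sys.argv[1])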
In Jarvis we add:

if command_key in ["turn on the light", "light"]:
    os.system("python3 /home/pi/smarthome/hue/hue.py a1167aa91-on-0")
    jarvis_say_good()
    continue
if command_key in ["dim the lights"]:
    os.system("python3 /home/pi/smarthome/hue/hue.py ac637e2f0-on-0")
    jarvis_say_good()
    continue
if command_key in ["turn off the lights"]:
    os.system("python3 /home/pi/smarthome/hue/hue.py off")
    jarvis_say_good()
    continue

LG TV

Let's take the script from here. After the first launch and entering the pairing code, the code itself does not change, so you can cut this part out of the script and leave only the control part.

In Jarvis we add:

# 1 - POWER
# 24 - VOLUME_UP
# 25 - VOLUME_DOWN
# 400 - 3D_VIDEO
if command_key in ["turn off the TV"]:
    os.system("python3 /home/pi/smarthome/TV/tv2.py 1")
    jarvis_say_good()
    continue
if command_key in ["turn up the volume", "louder"]:
    os.system("python3 /home/pi/smarthome/TV/tv2.py 24")
    jarvis_say_good()
    continue

Radio

sudo apt-get install mpg123
In Jarvis we add:

if command_key in ["news", "turn off the news", "what's happening"]:
    os.system("mpg123 URL")
    continue
You can also install homebridge and control everything through Siri, for the times when shouting to Jarvis is not an option.

As for the quality of speech recognition, it’s not Alexa, of course, but at a distance of up to 5 meters the percentage of correct hits is decent. The main problem is that speech from the TV/speakers is recorded along with commands and interferes with recognition.

That's all, thank you.
