Almost Jarvis: Zuckerberg built a home AI like the one in "Iron Man"

Mark Zuckerberg has built Jarvis, an artificial intelligence modeled on the one in Iron Man. He runs the Facebook CEO's home, plays music for him, and launches his trademark gray T-shirts from a special cannon. We answered the main questions about Zuckerberg's AI and translated his original post about how he built Jarvis.

A year ago, Zuckerberg set himself the goal of building an AI

At the beginning of each year, Mark Zuckerberg sets himself a goal for the next 12 months. In 2010, it was to learn Mandarin (a variety of Chinese); in 2015, it was to read two books a month.

This year, Zuckerberg committed to building an artificial intelligence like the one in Iron Man. The plan was for it to control the lighting, cameras, and music in his house.

On Monday, December 19, the Facebook founder announced that the project was complete and published a post describing how he built Jarvis (named after Iron Man's AI assistant).

What can Jarvis do?

Pretty much everything you'd expect from an AI connected to a "smart home": it turns the lights and music on and off, makes toast, and opens the door (thanks to facial recognition technology). Jarvis also launches Zuckerberg's signature gray T-shirts from a specially modified cannon.

Jarvis's functions also include less practical abilities. For example, Zuckerberg taught him a simple game: he or his wife Priscilla asks the artificial intelligence “who should be tickled,” and Jarvis randomly replies “Max” or “Beast” (the names of their daughter and dog, respectively).

How did Zuckerberg create Jarvis?

In his post, Zuckerberg divided the work on Jarvis into five major parts: the connected home, natural language, face and object recognition, a Facebook Messenger bot, and speech recognition.

First, Jarvis needs access to a network of connected devices throughout the house (lights, cameras, appliances).

Second, the AI must understand natural language, that is, requests like "play something by Kanye West."

Third, Jarvis needs to recognize faces so he can tell Zuckerberg about visitors or work out where family members are in the house.

Fourth, Zuckerberg wanted to be able to talk to Jarvis not just from one device but from his phone, wherever he was. To do this, he built a chatbot on Facebook Messenger.

Finally, Jarvis had to recognize spoken language and reply by voice.

“Artificial intelligence is both closer and further than we think”

As the Facebook CEO noted, his main goal in building Jarvis was to learn more about the current state of artificial intelligence. According to him, AI can already do impressive things: drive cars, cure diseases, and discover planets.

However, the limitation of today's AI lies with people themselves: we don't yet know what intelligence is, and until we answer that question, we won't be able to create real AI.

Who voices Jarvis? (updated)

Zuckerberg shared a video showing Jarvis in action. The video also reveals that the AI is voiced by the actor Morgan Freeman.

In October, Zuckerberg asked on his Facebook page whom he should invite to voice Jarvis. People suggested Morgan Freeman, the astrophysicist Neil deGrasse Tyson, and, yes, Iron Man himself, Robert Downey Jr.

Downey Jr. replied to the thread and seemed to agree, on the condition that the fee go to Paul Bettany (who voices Jarvis in the Iron Man films).

In the end, however, the job went to Freeman.

Translation of Zuckerberg's post in which he explains the Jarvis development process

My personal challenge for 2016 was to build a simple artificial intelligence to run my home, like Jarvis in Iron Man.

My goal was to learn about the state of artificial intelligence, and it turns out we've come further than many people realize, although we're still a long way from the finish line. Challenges like this always teach me more than I expect, and this project was no exception: it also taught me about the internal tools Facebook engineers use every day and gave me a general understanding of smart homes.

Over this year I have built a simple AI that I can talk to on my phone and computer. He controls my home, including the lighting, temperature, music, and security; he learns my tastes and habits; he learns new words and concepts; and he even entertains Max [Zuckerberg's daughter, ed. note]. He uses several artificial intelligence techniques, including natural language processing, speech and facial recognition, and machine learning, all written in Python, PHP, and Objective-C. In this post, I'll explain what I built and what I learned along the way.

Video of Zuckerberg demonstrating Jarvis's work

Getting started: connecting the home

In some ways, this challenge was easier than I expected. In fact, my running challenge (running 365 miles in 2016) took more time overall. But one aspect that turned out to be much harder than I expected was simply connecting and communicating with all the different systems in my house.

Before I could build any AI, I first needed to write code to connect all these systems, which speak different languages and protocols. We [the Zuckerberg family] use Crestron for the lights, thermostat, and doors, Sonos with Spotify for music, a Samsung TV, Nest for the cameras, and, of course, Facebook's systems for my work. In most cases, I had to reverse engineer the APIs of these systems just to get them to respond to commands like turning on the lights or starting the music.
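The post doesn't show how these systems were tied together. One common approach, sketched here as an assumption rather than Zuckerberg's actual code, is an adapter layer. Each vendor system is wrapped behind the same small interface, so the rest of the AI only ever says "on" or "off." The class names are hypothetical, and the adapters only simulate devices in memory instead of calling any real Crestron or Sonos API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Device(ABC):
    """Hypothetical common interface that every home system is wrapped behind."""

    @abstractmethod
    def turn_on(self) -> str: ...

    @abstractmethod
    def turn_off(self) -> str: ...


class LightingAdapter(Device):
    """Simulated lighting zone; a real adapter would call the vendor's API here."""

    def __init__(self, room: str) -> None:
        self.room = room
        self.is_on = False

    def turn_on(self) -> str:
        self.is_on = True
        return f"lights on in {self.room}"

    def turn_off(self) -> str:
        self.is_on = False
        return f"lights off in {self.room}"


class MusicAdapter(Device):
    """Simulated speaker zone; playback state is only tracked in memory."""

    def __init__(self, room: str, playlist: str) -> None:
        self.room = room
        self.playlist = playlist
        self.playing = False

    def turn_on(self) -> str:
        self.playing = True
        return f"playing {self.playlist} in {self.room}"

    def turn_off(self) -> str:
        self.playing = False
        return f"music stopped in {self.room}"


class Home:
    """Routes uniform on/off commands to whichever adapter owns a device name."""

    def __init__(self) -> None:
        self.devices: Dict[str, Device] = {}

    def register(self, name: str, device: Device) -> None:
        self.devices[name] = device

    def command(self, name: str, action: str) -> str:
        device = self.devices[name]
        if action == "on":
            return device.turn_on()
        if action == "off":
            return device.turn_off()
        raise ValueError(f"unknown action: {action!r}")


home = Home()
home.register("bedroom lights", LightingAdapter("bedroom"))
home.register("kitchen music", MusicAdapter("kitchen", "Morning Mix"))
print(home.command("bedroom lights", "on"))  # lights on in bedroom
```

With this design, adding another system (the TV, say) means writing one more adapter. The code that interprets requests stays untouched.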

A further complication is that many devices aren't connected to the internet at all. Some can be controlled with internet-connected power switches that turn them on and off remotely, but often that isn't enough. For example, I had a hard time finding a toaster that would let me push the bread down while the power was off, so that it would start toasting automatically when the power came on. I ended up buying an old toaster from the 1950s and rigging it up with a connected switch. I made similar hardware modifications to the food dispenser for Beast [Zuckerberg's dog] and to the gray T-shirt cannon.

In order for assistants like Jarvis to control everything in our homes, we need more connected devices, and the industry needs to develop common APIs and standards for devices to talk to each other.

Natural language

Once I had written the code that lets my computer control my entire house, the next step was communication: I wanted to talk to the computer and the house the same way I talk to anyone else. This was a two-step process: first I taught it to understand text messages, and then I added speech-to-text and voice responses.

I started with simple keywords like "bedroom," "lights," and "on": the computer looked for these words in a sentence and, if it found them, turned on the lights in the bedroom. It soon became clear that it also had to learn synonyms, since "living room" and "family room" mean the same thing in our house. That meant teaching it to learn new words and concepts.
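Here is a minimal sketch of the keyword-plus-synonyms approach described above. It assumes a tiny hard-coded vocabulary; the room names, the SYNONYMS table, and the function names are illustrative and do not come from Jarvis itself. The sketch maps synonyms to one canonical word, then looks for a room, a device, and an action.

```python
import re
from typing import Optional, Tuple

# Hypothetical vocabulary; tuples keep the match order deterministic.
ROOMS = ("bedroom", "living room", "kitchen")
DEVICES = ("lights", "music")
ACTIONS = ("on", "off")

# Synonym table: several surface words map to one canonical concept.
SYNONYMS = {"family room": "living room", "lamp": "lights", "light": "lights"}


def learn_synonym(word: str, concept: str) -> None:
    """Teach the system a new word for a concept it already knows."""
    SYNONYMS[word.lower()] = concept


def normalize(sentence: str) -> str:
    text = sentence.lower()
    # Longer phrases first, whole words only, so "light" never rewrites "lights".
    for word in sorted(SYNONYMS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(word)}\b", SYNONYMS[word], text)
    return text


def parse(sentence: str) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (room, device, action); any slot not found is None."""
    text = normalize(sentence)
    words = set(re.findall(r"[a-z]+", text))
    room = next((r for r in ROOMS if r in text), None)
    device = next((d for d in DEVICES if d in words), None)
    action = next((a for a in ACTIONS if a in words), None)
    return room, device, action


print(parse("Turn on the lamp in the family room"))  # ('living room', 'lights', 'on')
learn_synonym("den", "living room")
print(parse("den music on"))  # ('living room', 'music', 'on')
```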

Understanding context is important for any AI. For example, when I tell Jarvis to turn on the air conditioning in "my office," that means something completely different from when Priscilla [Zuckerberg's wife] asks him to do the same thing. That one caused some problems! Or, if you ask him to dim the lights or play a song without naming a room, he needs to know where you are; otherwise the music might start playing in Max's room just as she's napping. Oops.
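The context rule above can be sketched as a three-step room resolver. This is a hypothetical illustration, and several pieces are assumptions: the table of personal places, the room names, and the current_room mapping. In practice that mapping might come from the cameras knowing who is in which room.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical data: what "my office" means depends on who is speaking.
PERSONAL_PLACES: Dict[str, Dict[str, str]] = {
    "mark": {"my office": "mark's office"},
    "priscilla": {"my office": "priscilla's office"},
}
KNOWN_ROOMS = ("kitchen", "bedroom", "max's room")


@dataclass
class Request:
    speaker: str  # assumed to be known already, e.g. from which account sent it
    text: str


def resolve_room(req: Request, current_room: Dict[str, str]) -> Optional[str]:
    """Pick the room a request refers to, using the speaker as context."""
    text = req.text.lower()
    # 1. Possessive phrases are interpreted per speaker.
    for phrase, room in PERSONAL_PLACES.get(req.speaker, {}).items():
        if phrase in text:
            return room
    # 2. Next, an explicitly named room.
    for room in KNOWN_ROOMS:
        if room in text:
            return room
    # 3. Otherwise, wherever the speaker is right now (None if unknown).
    return current_room.get(req.speaker)


where = {"mark": "kitchen", "priscilla": "bedroom"}
print(resolve_room(Request("mark", "Turn on the AC in my office"), where))  # mark's office
print(resolve_room(Request("priscilla", "Play a song"), where))  # bedroom
```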

Music is a more interesting and complex domain for natural language because there are too many artists, songs, and albums for simple keyword search to work. Lights can only be turned on or off, but when you say "play X," even small variations can mean completely different things. Take three requests about Adele: "play someone like you," "play someone like adele," and "play some adele." They sound similar, but each belongs to a different category of request. The first asks for a specific song, the second asks for artists similar to Adele, and the third asks for a playlist of Adele's best songs. Through a system of positive and negative feedback, I taught my AI to tell these requests apart.
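The post doesn't say which learning method Jarvis uses for this. As one plausible stand-in, here is a small multiclass perceptron that learns the three Adele request types purely from feedback. When it guesses wrong and is told the right category, it shifts weight toward that category. The label names, the word and word-pair features, and the hard-coded artist list are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

LABELS = ("song", "similar_artist", "artist_playlist")


def features(query: str, artists: Set[str]) -> Set[str]:
    """Word and word-pair features; known artist names collapse to one ARTIST token."""
    words = query.lower().split()
    if words and words[0] == "play":
        words = words[1:]
    tokens = ["ARTIST" if w in artists else w for w in words]
    feats = set(tokens)
    feats.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return feats


class QueryClassifier:
    """Multiclass perceptron trained only from feedback on its own guesses."""

    def __init__(self, artists: Iterable[str]) -> None:
        self.artists = {a.lower() for a in artists}
        self.weights: Dict[str, Dict[str, float]] = {
            label: defaultdict(float) for label in LABELS
        }

    def _best(self, feats: Set[str]) -> str:
        # max() keeps the first label on ties, so ties resolve in LABELS order.
        return max(LABELS, key=lambda lab: sum(self.weights[lab][f] for f in feats))

    def predict(self, query: str) -> str:
        return self._best(features(query, self.artists))

    def feedback(self, query: str, correct: str) -> None:
        """A wrong guess plus the right label shifts weights; a right guess changes nothing."""
        feats = features(query, self.artists)
        guess = self._best(feats)
        if guess != correct:
            for f in feats:
                self.weights[correct][f] += 1.0
                self.weights[guess][f] -= 1.0


examples = [
    ("play someone like you", "song"),
    ("play someone like adele", "similar_artist"),
    ("play some adele", "artist_playlist"),
]
clf = QueryClassifier(artists=["Adele"])
for _ in range(5):  # a few passes of feedback over the three requests
    for query, label in examples:
        clf.feedback(query, label)
print([clf.predict(q) for q, _ in examples])
```

A real system would need far more examples and richer features (multi-word artist names, song titles from a catalog). The point here is only that corrective feedback alone can separate requests that look almost identical.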

The more context the AI has, the better it handles open-ended requests. Now, if I ask Jarvis to "play music," he looks at what I've listened to before and, more often than not, picks exactly what I want to hear. If the mood isn't right, I can just tell him something like "that's not light music, play something light," and he will reclassify the song and adjust right away. He also distinguishes between me and Priscilla and gives each of us our own recommendations. Overall, I've found that we use open-ended requests much more often than specific ones.
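Here is a minimal sketch of how listening history and mood corrections could combine. It assumes play counts stand in for "what I want to hear" and that a correction simply rules a song out for that mood. This is a deliberately simple stand-in for whatever Jarvis actually does; the class, method names, and songs are hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional, Set


class MusicMemory:
    """Hypothetical per-person listening history plus mood corrections."""

    def __init__(self) -> None:
        self.plays: DefaultDict[str, Counter] = defaultdict(Counter)  # person -> song -> plays
        self.not_mood: DefaultDict[str, Set[str]] = defaultdict(set)  # song -> moods it doesn't fit

    def record_play(self, person: str, song: str) -> None:
        self.plays[person][song] += 1

    def reject(self, song: str, mood: str) -> None:
        """Feedback such as "that's not light music" reclassifies the song."""
        self.not_mood[song].add(mood)

    def recommend(self, person: str, mood: Optional[str] = None) -> Optional[str]:
        """This person's most-played song that hasn't been ruled out for the mood."""
        for song, _count in self.plays[person].most_common():
            if mood is None or mood not in self.not_mood[song]:
                return song
        return None


memory = MusicMemory()
for song in ["Hello", "Hello", "Hello", "Yellow", "Yellow", "Clair de Lune"]:
    memory.record_play("mark", song)
memory.record_play("priscilla", "Clair de Lune")

first = memory.recommend("mark")            # "play music" -> most-played song
memory.reject(first, "light")               # "that's not light music, play something light"
second = memory.recommend("mark", "light")  # next candidate not ruled out
print(first, second, memory.recommend("priscilla"))  # Hello Yellow Clair de Lune
```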

Object and face recognition

About one-third of the human brain is dedicated to vision, and many important AI problems involve understanding what's happening in photos and videos. These include tracking (e.g., is Max awake and moving around in her crib?), object recognition (is that Beast or the rug?), and face recognition (who is at the door?).

Face recognition is an especially difficult version of object recognition because most people look relatively similar (it's much easier for a computer to tell apart two random objects, such as a sandwich and a house). But Facebook has become very good at recognizing faces to identify when your friends appear in your photos. The same technology lets an AI work out which of your friends is at your door.

To do this, I installed several cameras at my door that capture images from different angles. Today's AI systems can't identify people from the back of their heads, so having multiple angles makes it more likely that the computer sees a face. I built a simple server that continuously watches the cameras and runs a two-step process. First, it runs face detection to see whether anyone has come into view. Second, if it finds a face, it runs face recognition to identify who it is. Once it has identified the person, it checks a list to confirm that I'm expecting that person today; if I am, it lets them in and tells me they've arrived.
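The door logic in the paragraph above can be sketched as follows. Real face detection and recognition models are out of scope here, so they are passed in as functions. The detect_faces and recognize parameters and the toy fakes at the bottom are hypothetical placeholders. The decision rules are a simplified reading of the text: an expected guest is let in, and anyone else only triggers a notification.

```python
from dataclasses import dataclass
from typing import Any, Callable, Collection, Iterable, List, Optional


@dataclass
class DoorDecision:
    person: Optional[str]  # None when a face was seen but not recognized
    unlock: bool
    message: str


def check_door(
    frames: Iterable[Any],                      # one frame per camera angle
    detect_faces: Callable[[Any], List[Any]],   # step 1 (hypothetical detector)
    recognize: Callable[[Any], Optional[str]],  # step 2 (hypothetical recognizer)
    expected_today: Collection[str],
) -> Optional[DoorDecision]:
    """Two-step check: detect faces first, run recognition only on detected faces."""
    saw_face = False
    for frame in frames:
        for face in detect_faces(frame):
            saw_face = True
            name = recognize(face)
            if name is None:
                continue
            if name in expected_today:
                return DoorDecision(name, True, f"{name} has arrived; door unlocked")
            return DoorDecision(name, False, f"{name} is at the door but not expected")
    if saw_face:
        return DoorDecision(None, False, "unrecognized visitor at the door")
    return None  # nobody visible from any angle


# Toy stand-ins so the sketch runs: frames are strings, "face:*" frames contain a face.
def fake_detect(frame: str) -> List[str]:
    return [frame] if frame.startswith("face:") else []


def fake_recognize(face: str) -> Optional[str]:
    return {"face:priscilla": "Priscilla", "face:chris": "Chris"}.get(face)


print(check_door(["back-of-head", "face:priscilla"], fake_detect, fake_recognize, {"Priscilla"}))
```

Running recognition only on frames where a face was detected mirrors the two-step order described above. It also explains why the extra camera angles matter: a frame showing only the back of a head yields no face to recognize.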

This kind of visual AI system is useful for a number of things. For example, it knows when Max wakes up so it can start playing her music or a Mandarin lesson. It also solves the context problem by knowing which room we're in, so it can respond correctly to open-ended requests like "turn on the lights." Like most aspects of this AI, vision is most useful when it informs a broader model of the world and connects with other abilities, like knowing who your friends are and opening the door for them when they arrive. The more context the system has, the smarter it becomes.

Chatbot in Messenger

I programmed Jarvis on my computer, but to be truly useful, he needed to be reachable from anywhere. That meant the main way to communicate with him had to be my phone, rather than a device installed in my home.

I started by building a Messenger chatbot to communicate with Jarvis, because that's much easier than building a separate app. Messenger has a simple bot framework that handles a lot for you automatically, including working on both iOS and Android, supporting text, images, and audio, delivering notifications, and more. You can learn more about the bot framework at

I can send anything to the Jarvis bot, and it immediately passes it to the Jarvis server, which processes the request. I can also send voice clips; the server transcribes them into text and then carries out the request. During the day, if someone comes to my house, Jarvis texts me to tell me who's there, or texts me when I need to do something.
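The key design point here is that voice and text converge on the same request path. A minimal sketch follows, assuming a simplified message shape ({"text": ...} or {"audio": ...}) that is not the actual Messenger webhook format. Transcription and request handling are passed in as hypothetical functions.

```python
from typing import Callable, Dict


def handle_message(
    message: Dict[str, str],
    speech_to_text: Callable[[str], str],  # hypothetical transcription service
    execute: Callable[[str], str],         # hypothetical Jarvis request handler
) -> str:
    """Turn a text or voice message into one text request, then run it."""
    if message.get("text"):
        request = message["text"]
    elif message.get("audio"):
        # Voice clips are transcribed first, then handled exactly like typed text.
        request = speech_to_text(message["audio"])
    else:
        return "Sorry, I can only handle text and voice messages."
    return execute(request)


# Toy stand-ins so the sketch runs end to end.
transcripts = {"clip-001.m4a": "turn off the kitchen lights"}
reply = handle_message({"audio": "clip-001.m4a"}, transcripts.__getitem__, lambda r: f"OK: {r}")
print(reply)  # OK: turn off the kitchen lights
```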

One of the things that surprised me most is that when I can choose between speaking and texting with Jarvis, I text much more than I expected. There are several reasons, but the main one is that texting doesn't disturb the people around me. If I'm asking for something that involves them, like playing music for all of us, I use my voice, but most of the time I prefer to text. Likewise, when Jarvis communicates with me, I prefer text to voice, because voice can be disruptive, while text gives you more control over when you look at it. Even when I speak to Jarvis on my phone, I prefer that he display his response as text.

This preference for text over voice matches a pattern we see in Messenger and WhatsApp, where the volume of text messages is growing much faster than the volume of voice communication. It suggests that future AI products can't rely on voice alone [as, for example, Amazon Echo does] and also need a private messaging interface. I've always been optimistic about AI bots, but my experience with Jarvis has made me even more confident that we'll all be interacting with bots like Jarvis in the future.

Although I think text will be more important for communicating with future AIs, I still believe voice plays a very important role. Its biggest advantage is speed: you don't need to take out your phone, open an app, and start typing; you just talk.

To enable voice for Jarvis, I needed to build a dedicated app that listens continuously to what I say. The Messenger bot is great for many things, but not for constantly listening to my speech. My dedicated Jarvis app lets me put my phone on a table and have it listen. I can also place several phones running the Jarvis app around the house so I can talk to Jarvis from any room.

This is similar to Amazon's vision with its Echo voice assistant, but I found that I very often want to talk to Jarvis when I'm away from home. That's why having the phone as the primary interface, rather than a dedicated home device, is critical.

I built the first version of the Jarvis app for iOS, and I plan to build an Android version soon. I hadn't built an iOS app since 2012, and one of my main observations is that the tools we've built at Facebook since then, both for developing apps like this and for speech recognition, are very impressive.

Speech recognition has improved significantly in recent years, but no AI system can yet understand conversational speech on the fly. Speech recognition relies on both listening to what you say and predicting what you'll say next, which is why structured speech is much easier to understand than unstructured conversation.
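To make the idea of "predicting what you'll say next" concrete, here is a toy bigram model. Production recognizers use far larger language models, so treat this only as an illustration of the principle. Trained on short, structured commands, it predicts the next word confidently, which is what makes structured speech easier to recognize than open conversation.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Optional


def train_bigrams(sentences: Iterable[str]) -> DefaultDict[str, Counter]:
    """Count how often each word follows each previous word ("<s>" marks the start)."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts: DefaultDict[str, Counter], prev: str, word: str) -> float:
    """Estimated P(word | prev) from the bigram counts."""
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0


def predict_next(counts: DefaultDict[str, Counter], prev: str) -> Optional[str]:
    """Most frequent word seen after prev, or None if prev was never followed."""
    if not counts[prev]:
        return None
    return counts[prev].most_common(1)[0][0]


commands = ["turn on the lights", "turn off the lights", "turn on the music"]
model = train_bigrams(commands)
print(predict_next(model, "turn"))                    # on
print(next_word_probability(model, "the", "lights"))  # 2/3
```

On a corpus of free-form conversation, the counts after any given word spread over many more candidates. Each prediction then carries less certainty, which is one way to see why conversational speech is harder.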

Another interesting limitation of speech recognition systems, and of machine learning systems in general, is that they are optimized for specific problems. For example, understanding a person talking to a computer is a different problem from understanding people talking to each other. If you train a system on data from Google of people speaking to a search engine, it will perform relatively worse on Facebook, where people talk to each other.

Jarvis's speech recognition is designed for speaking at close range, unlike the Echo, which you can talk to from across the room. These systems are more specialized than they appear, which suggests we are further from general [AI] systems than it might seem.

On a psychological level, once you can speak to a system, you attribute more emotional depth to it than to a computer you interact with through text or a graphical interface. One interesting thing I noticed after adding voice to Jarvis is that I wanted him to have more humor. Partly so he could play with Max and entertain her, and partly so he would feel more like part of [our family].

I taught him fun little games, like the one where Priscilla or I ask who we should tickle and he randomly answers "Max" or "Beast." Just for fun, I also added some classic lines like "I'm sorry, Priscilla. I'm afraid I can't do that." [a reference to the HAL 9000 computer in Stanley Kubrick's film 2001: A Space Odyssey].

There are many more things that can be explored in terms of voice. AI technology is already good enough to make a great product, and it will only get better in the coming years. At the same time, I think that the best products will be ones that you can take with you and use privately anywhere.

Facebook's engineering environment [or a bit of advertising from Zuckerberg, ed. note]

As Facebook's CEO, I no longer write code in our internal environment. I never stopped coding, though; I just do it now for personal projects like Jarvis. I expected to learn a lot about the state of AI, but I had no idea I would also learn what it's like to be an engineer at Facebook. In short, it's impressive.

My experience of ramping up in the Facebook codebase is probably similar to what most of our new engineers go through. I was constantly impressed by how well organized the code is and how easy it is to find what you need, whether that's facial or speech recognition, the chatbot framework, or iOS development.

The open source Nuclide packages we built to work with GitHub's Atom editor make development much easier. Buck, the build system we created for large projects, also saved me a lot of time. And our open source fastText library for text classification is worth a look if you're interested in AI development; more generally, it's worth exploring the Facebook Research GitHub repository.

One of our values is moving fast. That means you should be able to come here [to Facebook] and build an app faster than you could anywhere else. You should be able to use our AI infrastructure and tools to build things that would take much longer on your own. Building internal tools that make engineering more efficient is important for any technology company, and it's something we take very seriously. So I encourage you to try our tools too; it won't hurt.

Next steps

Even though this challenge is coming to an end, I'm confident that I will continue to work on improving Jarvis, as I use it every day and constantly find new features I'd like to add.

In the near future my next steps will be to build an Android application, configure Jarvis voice terminals in more rooms around the house and connect more equipment. I'd love to have Jarvis control my Big Green Egg and help me cook, but that would require more advanced modifications than the T-shirt gun hardware.

Long term, I'd like to teach Jarvis to learn new functions on his own, rather than having to program him for specific tasks each time. If I spent another year on this challenge, I would focus on learning how [machine] learning works.

Finally, it would be interesting to find ways to make Jarvis available to the world. I thought about open-sourcing his code, but right now it's too closely tied to my own home, its devices, and my network configuration. If I ever build a layer that abstracts away more of the home automation, I may release that. Or, of course, it could become the foundation for a completely new product.


Developing Jarvis was an interesting intellectual challenge that gave me more experience working with AI tools in areas that are important to our future.

I've previously predicted that within 5 to 10 years we'll have AI systems that are more accurate than people in each of our senses (vision, hearing, smell, and so on) as well as in areas like language. It's impressive how powerful these tools have already become, and this year has only strengthened my confidence in that prediction.

At the same time, we are still far from understanding how learning works. Everything I did this year (natural language, facial recognition, and speech recognition) is a variant of the same fundamental pattern recognition techniques. We know how to show a computer many examples of something so it learns to tell them apart, but we still don't know how to take an idea from one domain and apply it to a completely different one [for example, applying techniques from facial recognition to speech recognition].

For example, I spent about 100 hours building Jarvis this year, and now I have a pretty good system that understands me and does a lot of things. But even if I spent another 1,000 hours, I probably couldn't build a system that learns new skills on its own; that would require a fundamental breakthrough in AI.

In some ways, AI is both closer and further away than we imagine. It is closer in that it can already perform very powerful tasks: driving cars, curing diseases, discovering planets, and understanding media. Each of these has a huge impact on the world today, but we are still figuring out what real intelligence is.

Overall, it was a huge challenge. Challenges like these always teach me more than I initially expect. This year I expected to learn about AI, but I also learned about smart homes and Facebook's internal engineering environment. That's what makes challenges like this interesting. Thanks for following along, and I'm looking forward to sharing my next challenge in a few weeks.

Most users know Siri as the most popular personal assistant and question-answering technology on iOS devices. Fortunately, Siri is not the only option on the market. Fans of science fiction and Marvel comics, for example, can try JARVIS, a personal assistant modeled on the one in the movie "Iron Man."

Anyone who has seen "Iron Man" probably knows Tony Stark's butler, Jarvis. With this app, users can have a virtual butler of their own on their mobile device. The JARVIS app is also distinctive in that it uses the voice and image of the Jarvis character.

The JARVIS app starts with the usual spoken instructions on how to use and control it. Next, the user specifies their gender (so the virtual assistant can address them correctly) and the unit for temperature readings (kelvins, Fahrenheit, or, of course, Celsius).

A detailed list of commands is available by tapping the icon in the top corner of the screen. Every command must begin with the address "Jarvis" and usually consists of one keyword (for example, "Jarvis, weather forecast"). JARVIS can also remind the user of upcoming meetings and display the current time. You can also create various audio reminders in the app.
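The command format described above, an address followed by a short keyword, is easy to parse. Here is a hedged sketch of how such a parser might look. The app's actual implementation isn't documented here, so KNOWN_COMMANDS and parse_command are hypothetical names.

```python
import re
from typing import Optional

# Hypothetical command set; the real list is whatever the app's help screen shows.
KNOWN_COMMANDS = {"weather forecast", "time", "meetings"}

# "Jarvis" as a whole word at the start, then optional commas/spaces, then the command.
WAKE_PATTERN = re.compile(r"^\s*jarvis\b[\s,]*(.*)$", re.IGNORECASE)


def parse_command(utterance: str) -> Optional[str]:
    """Return the command after the "Jarvis" address, or None if missing or unknown."""
    match = WAKE_PATTERN.match(utterance)
    if not match:
        return None  # every command must start with the address "Jarvis"
    command = match.group(1).strip().rstrip(".!?").lower()
    return command if command in KNOWN_COMMANDS else None


print(parse_command("Jarvis, weather forecast"))  # weather forecast
print(parse_command("weather forecast"))          # None
```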

Notably, the JARVIS app offers extra features to owners of the "Iron Man" disc. For example, they can use the virtual butler to control playback of the movie.


Fictional biography

A war hero, Jarvis served as a pilot in the Royal Air Force. After moving to the United States, he became the butler to Howard and Maria Stark, and after their deaths he continued to serve their son, Tony.

Outside of comics

Television


  • As Stark's butler, Edwin Jarvis makes a brief appearance in the animated film Ultimate Avengers.
  • Jarvis appeared in a more significant role in Ultimate Avengers 2, where he was voiced by Fred Tatasciore.

Marvel Cinematic Universe

In the 2008 film Iron Man, JARVIS appears as the artificial intelligence butler of Tony Stark's mansion and is also uploaded into his armor so he can communicate with Tony directly. He jokes and makes sarcastic remarks about his creator's recklessness, but he still cares about Tony's well-being. Paul Bettany, who voiced JARVIS, admits that he had little to no idea what the character was about and agreed to do the voice work only as a favor to his friend Jon Favreau, the film's director. Bettany went on to voice JARVIS in Iron Man 2, The Avengers, and Iron Man 3.

The series "Agent Carter," which premiered on January 6, 2015, follows the adventures of Peggy Carter, Captain America's love interest. The series is part of the wider Marvel Cinematic Universe (Hayley Atwell and Dominic Cooper reprise their roles as Peggy Carter and Howard Stark, respectively, from the feature film Captain America: The First Avenger). Peggy's main ally is Howard Stark's butler, Edwin Jarvis, whose loyalty to Stark makes him a reliable partner for Peggy. Jarvis is played by English actor James D'Arcy.

Video games

  • Edwin Jarvis appears in Marvel: Ultimate Alliance, voiced by Philip Proctor. He appears in Stark Tower and also has dialogue with Deadpool, Iron Man, Spider-Woman, and Captain America.
  • JARVIS appears in the Iron Man video game based on the film, voiced by Gillon Stevenson. He acts as a source of information for the player, informing him of any messages he needs to be aware of.
  • In the sequel game Iron Man 2, JARVIS is voiced by Andrew Chaykin.


A short discussion of Tony Stark's genius in the Marvel Cinematic Universe.

He has won the hearts of longtime fans and of people who had never heard of the billionaire in a metal suit. He is the key to the commercial success of Marvel Studios' films. He is Tony Stark: philanthropist, genius... and so on; you've heard the phrase a hundred times.

Many people, both in real life and on screen, including the US government, see the Iron Man suit as a huge threat to national security. And indeed, a rather capricious billionaire with unlimited resources races around the world in a suit that can withstand entire armies and, as we saw in "The Avengers," even hold its own against the God of Thunder.

We don't dispute that the Iron Man suit is Tony's very, very, very cool invention, one that lets him save the world almost every day, but let's take a look at another exhibit from his workshop.

At the end of "The Avengers" (SPOILER ahead), Tony Stark, wearing the Mark VII suit, intercepts a nuclear missile launched by the military and sends it through the portal the Chitauri forces used to invade Earth. During the flight, Jarvis, the hero's personal computer assistant, suggests that Tony call Pepper, because he understands that Tony might not survive. The machine, the computer, whatever you want to call it, offered a chance for one last goodbye so that nothing would be left unsaid between Tony and Pepper! Jarvis showed feelings, demonstrated empathy, and proved that he was alive.

Of course, the arc reactor Tony invented is also a major technological breakthrough, but is it comparable to creating real artificial intelligence? In effect, Tony became the creator of a new form of life, and nobody paid attention. We have seen many times how Jarvis hacks into the secure networks of other companies and institutions, whether S.H.I.E.L.D. in "The Avengers" or Justin Hammer's systems in "Iron Man 2." There, admittedly, the hero used something like a smartphone, but we're sure it couldn't have happened without his faithful butler. Yet those around Tony refused to notice this ace up the billionaire's sleeve.

Proud Tony Stark, who spent his whole life making weapons and whose relationships rarely lasted more than one night, created for himself an ideal friend and partner... he created his own conscience. He named it Jarvis and gave it the voice of a famous British actor. Isn't this the greatest achievement of Stark's life?! What's more, Jarvis can remotely control multiple Iron Man suits without Tony's involvement. Combined with his ability to break into secure networks, this makes the computer butler something like a polite, talking version of Skynet. He could paralyze every computer system in the world within hours while suppressing any resistance with various models of the Iron Man suit, never forgetting to apologize politely with his characteristic British tact.

JARVIS stands for Just A Rather Very Intelligent System, but that description is far from the truth. Jarvis's capabilities are practically limitless, and perhaps someday this will come back to haunt Tony Stark and his allies.