Zero-knowledge proof. A thought experiment with time machines. Okay, so what does this all mean?

Zero-knowledge proofs of confidential information are best explained with a specific example. Suppose there is a cave. The entrance to the cave is at point A, and at point B the cave branches into two passages, C and D. The cave holds a secret: only those who know the magic words can open the door located between C and D.

Anton knows the magic words, Boris does not. Anton wants to prove to Boris that he knows them, but in a way that leaves Boris in the dark about the words themselves. To do so, Anton can use the following protocol:

1. Boris stands at point A.

2. Anton, at his own choice, approaches the door either from point C or from point D.

3. Boris moves to point B.

4. Boris orders Anton to emerge either through the left passage to the door or through the right one.

5. Anton obeys Boris's order, using the magic words to pass through the door if necessary.

6. Steps 1-5 are repeated n times, where n is a protocol parameter.
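The soundness of this protocol is easy to check with a short simulation. The sketch below is my own illustration, not part of the original text: it models a cheating Anton who does not know the magic words and so can only emerge from the side he entered, passing a round only when Boris's challenge happens to match his earlier choice.

```python
import random

def cheating_round(rng):
    """One round of the cave protocol with an Anton who does NOT know the words."""
    anton_side = rng.choice("CD")    # side Anton walks in from
    boris_demand = rng.choice("CD")  # side Boris later orders him to exit from
    # Without the magic words Anton can only exit where he entered,
    # so he passes the round only if Boris's demand matches his guess.
    return anton_side == boris_demand

rng = random.Random(2024)
trials = 100_000
per_round = sum(cheating_round(rng) for _ in range(trials)) / trials
print(f"per-round success rate ≈ {per_round:.3f}")  # close to 0.5

n = 16  # number of repetitions used in the text
print(f"chance of fooling Boris in all {n} rounds = 1/{2**n}")  # 1/65536
```

With n = 16 a cheater survives all rounds only once in 65536 attempts, which is why the text considers 16 repetitions sufficient in practice.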

Let's assume that Boris has a video camera with which he records all of Anton's disappearances into the depths of the cave and all of his subsequent reappearances. Can the recordings of all n experiments that Boris performed together with Anton serve as proof, for a third person (say, Vladimir), that Anton knows the magic words?

Hardly. Vladimir can never be sure that Anton did not tell Boris his intentions in advance each time, so that Boris could then order him to come out from exactly the side of the door that Anton had entered. Or that the recording was not edited to remove all the failed experiments in which Anton was unable to carry out Boris's orders.

This means that Boris cannot convince Vladimir, who was not personally present at the experiments in the cave, that Anton really demonstrated his knowledge of the secret. And that is precisely what makes Anton's proof protocol zero-knowledge. If Anton does not know the magic words that open the door, then Boris learns nothing by watching him. If Anton does know the magic words, then even a detailed video recording of the experiments will not help Boris: first, because when watching it he will only see what he has already seen live; and second, because a recording falsified by Boris is almost impossible to distinguish from a genuine one.

The zero-knowledge proof protocol works because, without knowing the magic words, Anton can only exit from the side he entered. Therefore Anton can deceive Boris only by guessing in advance which side Boris will ask him to come out of, which succeeds in 50% of cases. If the number of experiments is n, Anton passes all the tests by luck in only one case out of 2^n. In practice one can settle for n = 16: if Anton correctly carries out Boris's orders in all 16 rounds, Boris can be confident that Anton really does know the magic words.

The cave example is illustrative but has a significant flaw: it would be far simpler for Boris to stand at point B and watch Anton enter on one side and emerge from the opposite one. A zero-knowledge proof protocol is simply not needed here.

So let's assume that what Anton knows is not some magic words like "Open Sesame." Instead, Anton holds more interesting information: he was the first to find the solution to some hard problem. To prove this fact, Anton does not have to show his solution to anyone, including Boris. All he needs to do is apply the following zero-knowledge proof protocol:

1. Anton uses the information he holds and a freshly generated random number to reduce the hard problem to another hard problem isomorphic to the original one. He then solves this new problem.

2. Anton runs a bit-commitment protocol on the solution found in step 1, so that later, if Boris needs to examine that solution, he can reliably verify that the solution Anton presents really is the one obtained in step 1.

3. Anton shows the new hard problem to Boris.

4. Boris asks Anton either to prove that the two hard problems (old and new) are isomorphic, or to reveal the solution from step 1 and prove that it really is a solution of the problem to which Anton reduced the original one.

5. Anton fulfills Boris's request.

6. Anton and Boris repeat steps 1-5 n times, where n is a protocol parameter.

The hard problems, the method of reducing one problem to another, and the random numbers should, as far as possible, be chosen so that Boris gains no information about the solution of the original problem, even after many repetitions of the protocol.

Not all hard problems can be used in zero-knowledge proofs, but most of them are quite suitable. Examples include finding a Hamiltonian cycle (a closed path that visits every vertex of a graph exactly once) in a connected graph, and determining graph isomorphism (two graphs are isomorphic if they differ only in the names of their vertices).
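The graph-isomorphism notion can be made concrete with a small sketch (the graphs and names here are my own illustration): two graphs are isomorphic if some renaming of vertices maps the edge set of one exactly onto the edge set of the other.

```python
def is_isomorphism(mapping, edges_a, edges_b):
    """Check whether `mapping` (dict vertex -> vertex) turns graph A into graph B."""
    renamed = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
    return renamed == {frozenset(e) for e in edges_b}

# Two triangles that differ only in the names of their vertices.
edges_a = [(1, 2), (2, 3), (3, 1)]
edges_b = [("x", "y"), ("y", "z"), ("z", "x")]
mapping = {1: "x", 2: "y", 3: "z"}
print(is_isomorphism(mapping, edges_a, edges_b))  # True
```

Verifying a given renaming is easy; what is believed hard in general is finding such a renaming (or a Hamiltonian cycle) for large graphs, and that gap is what the proofs exploit.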

Post-quantum zero-knowledge proofs are zero-knowledge proof protocols that are quantum-resistant.

A system or protocol is considered quantum-resistant if it cannot be compromised using the computing power of a quantum computer. The term "quantum resistance" comes from post-quantum cryptography.


Statement of the information security problem

Quantum computers offer enormous computing power, and the field is currently developing rapidly. The growth of research in this area, welcome in itself, undermines the reliability of modern cryptosystems in use, for example those based on integer factorization or the discrete logarithm problem.

ZKBoo

ZKBoo builds on secure multi-party computation (MPC). Its idea is to replace general MPC with a (2,3)-decomposition of the circuit being proved.

The proof is constructed according to the following scheme:

1) P computes f using the decomposition and records 3 states (views);

2) P sends the commitments and the outputs of each view, applying the Fiat-Shamir heuristic;

3) The resulting challenge determines which 2 of the 3 views are opened in each of the N runs;

4) V checks that the outputs reconstruct the result correctly, that each of the 2 opened views was computed correctly, and that the challenge was computed correctly.

The number of runs is chosen so that the soundness error is negligible, that is, so that the probability of a cheating prover surviving every run is minimal.

ZKB++

ZKB++ is a modified version of ZKBoo with the following optimizations:

1) Decomposition: in View1 and View2, P need only include k(i), since x(i) can be deterministically recomputed by V;

2) Omitting the input shares: input shares generated pseudorandomly from k(i) need not be included in the view when e = 1. They only need to be sent when e = 2 or e = 3, because the third share cannot be derived from the initial data, being generated pseudorandomly from k(i);

3) Not sending the commitments;

4) No additional randomness for the commitments;

5) Omitting the output: it can be computed from the rest of the proof;

6) Omitting the state in the view: V can recompute the state given only the random seeds k(e) and k(e+1).

The complete ZKB++ circuit is shown in Figure 1.

The optimizations applied to ZKBoo roughly halve the proof size in ZKB++.

Beautiful terminology is one of the perks of modern cryptography. Terms like "hard-core predicate", "trapdoor function", or "impossible differential cryptanalysis" sound like punk band names or Tumblr nicknames. And yet one term outshines them all: "zero knowledge".

But the beauty of the term "zero knowledge" leads to its abuse. People treat it as a synonym for "very, very secure," and so it gets attached to the names of security systems and anonymity networks that actually have nothing to do with zero-knowledge protocols.

We say all this to emphasize the main point: zero-knowledge proofs are among the most powerful tools cryptography has ever developed, yet they remain poorly understood. Let's try to give a non-mathematical explanation of what makes ZK (zero knowledge) so special, and look at some current ZK protocols.

Origin of zero knowledge

Before Goldwasser and her colleagues, most work in this area focused on the soundness of proof systems, that is, on cases where a malicious Prover tries to trick the Verifier into accepting a false statement. But Goldwasser, Micali and Rackoff looked at the opposite side of the problem. Instead of worrying only about the Prover, they asked: what happens if you don't trust the Verifier?

They were especially concerned about information leakage. Specifically, they asked: how much additional information does the Verifier learn in the course of being convinced that a statement is true?

It is important to note that this is not just of theoretical interest; there are real practical applications in which these questions matter.

Here is one of them: imagine a real-world client that wants to log into a web server using a password. The standard "real-world" approach stores a hashed version of the password on the server. A login in such a system can be seen as a kind of "proof" that the hash of the supplied password matches the stored hash, and, more importantly, as "proof" that the client actually knows the password.

Most real-world systems implement this "proof" in the worst possible way. The client simply sends the original password to the server, which recomputes its hash and compares it with the stored value. The problem is obvious: the server receives my password in the form most attractive to attackers, plain text. The user can only pray that the server is never compromised.

What Goldwasser, Micali, and Rackoff proposed was the hope of new methods of proof. If fully realized, zero-knowledge proofs could solve the problem above by proving the statement without divulging a single bit of information beyond the fact that "this statement is true."

"Real World" Example

So far we have been talking about rather abstract things. Let's move on and give a concrete (slightly crazy) example of a zero-knowledge protocol.

To help explain this example, let's imagine I'm a telecommunications mogul in the process of rolling out a new cellular network. The structure of my network is shown in the graph below. Each vertex of the graph represents a cellular transmitter, and the connecting lines (edges) show where two cells overlap; in those locations the transmitters would interfere with each other. Luckily, my network design allows each tower to be configured to one of three frequency bands to avoid such interference.

Thus, the task of deploying my network comes down to assigning a band to each tower so that no two overlapping cells share a band. If we use colors to represent the frequency bands, we can quickly work out one solution to this problem:


Of course, many of you have already noticed that what I've described is just a special case of a famous theoretical problem, the "graph three-coloring problem." You may also know its most interesting feature: for some graphs it is very hard to find a solution, or even to determine whether a solution exists. Three-coloring a graph (and answering whether a three-coloring exists at all) is known to be an NP-complete problem.

It goes without saying that the toy puzzle above is easy to solve. But what if it weren't? Imagine, for example, that my cellular network were very large and complex, so large that I lacked the computing power to find a solution. In that case it would make sense to delegate the problem to someone with enough computing power. For example, I could ask Google to solve it.

But there is one problem.

Let's assume Google devotes as much computing power as needed to find a correct coloring of my graph. I'm certainly not going to pay them until I'm sure they have a solution. At the same time, Google isn't going to hand me a copy of the solution until I've paid. We're at a dead end.

In real life, common sense says the answer to this dilemma involves lawyers and escrow accounts. But we're talking about cryptography, not real life. And if you've ever read cryptographers' work, you know that the only way to find a solution is to come up with a completely crazy technical solution.

A crazy technical solution (with hats)

Let's imagine that Google's engineers consulted Silvio Micali at MIT, who in turn consulted his colleagues Oded Goldreich and Avi Wigderson. Together they developed a protocol so beautiful that it doesn't even require computers, only a large warehouse stocked with colored pencils and paper. You will also need a great many hats.

Let's see how it works

First, I'll enter the warehouse and cover the floor with paper laid out to represent my cellular network graph. Then I'll leave. Google can now come in, shuffle its collection of pencils, choose any three colors (say red/blue/purple, as in this example), and color in its solution. Note that the specific colors used don't matter.

Before leaving the warehouse, Google covers each vertex with a hat. When I return, this is all I will see:


Obviously, this approach perfectly protects Google's secret. But it doesn't help me at all. For all I know, Google filled in the colors at random, and therefore incorrectly. Or maybe they didn't fill in anything at all.

To convince me that the problem really was solved, Google gives me the chance to "spot-check" its coloring. I am allowed to pick, at random, one edge of the graph (that is, one line between two adjacent hats). Google then removes those two hats, showing me a small part of the solution:


Please note that my experiment has two possible outcomes:

If the two revealed vertices have the same color (or no color at all!), then I know for certain that Google is lying to me, and I won't pay them a cent. If the two revealed vertices have different colors, then Google has not been caught cheating, at least this time.

I hope the first outcome is obvious. The second requires more explanation: even if the test succeeds, Google can still be cheating, because I only looked under two hats. If the graph has E distinct edges, Google could present an invalid coloring with just one bad edge. After a single test they would fool me with probability (E-1)/E (which for 1000 edges is 99.9%).

Luckily, Google has the answer to this question. We'll just run the protocol again!

We take fresh paper and create a new copy of the graph. Google now picks a new random shuffle of three colored pencils and fills in the graph with its correct solution, but using this new random choice of three colors.

Again we perform the hat routine. I come back and pick a new random edge. By the logic above, if all goes well this time, I'll be rather more confident that Google is telling the truth, because to fool me Google would have to get lucky twice in a row.

That can happen, but its probability is lower. The chance of Google fooling me twice in a row is ((E-1)/E)^2 (about 99.8% for our 1000-edge example).

Fortunately, we don't have to stop at two attempts. We will try again and again until we are convinced that Google is, with a high degree of probability, telling the truth.

I urge you not to take my word for it. Try the accompanying JavaScript demo and see for yourself.

Note that I'm never completely sure Google is being honest; there is always a tiny chance they are cheating. But after a large number of iterations (E^2, say), I can drive the probability that Google has fooled me down to a negligible level, low enough for practical purposes. After that, I can safely hand Google my money.
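The arithmetic behind this confidence can be checked directly. A small sketch (the numbers are chosen to match the 1000-edge example above, and the value is computed in log space because it underflows an ordinary float):

```python
import math

E = 1000  # number of edges in the example graph

# Probability that a cheater with one bad edge survives a single check.
p_single = (E - 1) / E

# log10 of the probability of surviving E^2 independent checks.
log10_p_all = E**2 * math.log10(p_single)
print(f"survives one check with probability {p_single:.3f}")
print(f"survives E^2 checks with probability ≈ 10^{log10_p_all:.0f}")
```

For E = 1000 the survival probability after E^2 rounds is on the order of 10^-434, which is what makes the repeated protocol convincing in practice.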

It is important that Google is protected here too. If I try to learn something by saving and analyzing transcripts across runs of the protocol, I will fail, because Google uses fresh random colors in every iteration. The scraps of information I do receive give me nothing, and there is no way for me to link the data from different runs together.

What makes this example "zero knowledge"?

I claim that this protocol leaks no information about Google's solution. But don't take my word for it! The first rule of cryptography is never to believe such claims without proof.

Goldwasser, Micali, and Rackoff proposed three criteria that every zero-knowledge protocol must satisfy. Informally, they are:

Completeness. If Google is telling the truth, it will eventually convince me (at least with high probability).

Soundness. Google can convince me only if it is actually telling the truth.

Zero-knowledge (yes, that is the criterion's real name). I learn nothing beyond the fact that Google has a valid solution.

We have already seen the argument for completeness: the protocol will eventually convince me (with a negligible chance of error) if I run it enough times. Soundness is also fairly easy to show: if Google tries to cheat me, I will detect it in the vast majority of runs.

The hardest property to establish is "zero knowledge." To understand it, we'll need a rather strange thought experiment.

Thought experiment with time machines

Let's start with a crazy hypothesis. Imagine that Google's engineers are not as capable as everyone thinks. They work on the problem for weeks and cannot find a solution. Twelve hours before the deadline, Google gets desperate. They decide to trick me into believing they have a coloring of the graph, even though they don't.

Their idea is to drop by the GoogleX workshop and borrow Google's prototype time machine. The original plan was to travel back a few years and use the extra working time to solve the problem. Unfortunately, like many Google prototypes, the time machine has limitations: it can only travel back four and a half minutes.

So the time machine is useless for gaining extra working time. Yet it turns out that even this very limited capability can be used to deceive me.


The plan is devilishly simple. Since Google doesn't know a valid coloring of the graph, they simply color the paper at random and cover the vertices with hats. If, by luck, the two vertices I check turn out to be different colors, everyone breathes a sigh of relief and we continue, with me believing everything is fine.

Suppose I lift a pair of hats and discover two vertices of the same color. Under the normal protocol, Google would now be caught. But they are using a time machine: whenever Google lands in an awkward position, a specially appointed Google employee pulls a lever, time rewinds four and a half minutes, and the Google team colors the graph with a completely new random coloring. Then time runs forward and they try again.

Essentially, the time machine lets Google "fix" any failed check without my suspecting a thing. Since a bad outcome occurs only about a third of the time, the expected running time of the protocol (from Google's perspective) is only modestly longer than that of an honest run. From my perspective, I don't even know the time travel is happening.

This last point is the most important. From my point of view, the interaction looks exactly the same whether or not a time machine is involved. And yet it's worth stressing again: in the time-machine version, Google has absolutely no information about how to color the graph.

And what follows from this?

What we've just shown is a pair of scenarios. In a world where time moves only forward and no one can fool me with a time machine, the hat protocol is sound: after E^2 rounds I should be convinced (not with certainty, but with only a negligible probability of being cheated) that the graph is properly colored and Google really has a solution.

If time can be manipulated, in particular if Google can "rewind" it, then Google can fake a convincing run of the protocol even with no information at all about how the graph should be colored.

From my point of view, what is the difference between the two runs? Nothing: they have the same statistical distribution and convey exactly the same amount of useful information.

Believe it or not, this proves something very important.

Specifically, suppose I (the Verifier) had some strategy for "extracting" useful information about Google's coloring from an honest run of the protocol. Then my strategy would have to work just as well against the time-machine run, since from my point of view the two runs are statistically identical: I physically cannot tell them apart.

Thus, the amount of information I receive in the "real experiment" and in the "time-machine experiment" is exactly the same. But the amount of information Google puts into the time-machine experiment is exactly zero. Therefore, even in the real world there is no information leakage. All that remains is to show that computer scientists have a time machine (shhh, it's a secret).

How to get rid of hats and time machines

Of course, we wouldn't actually want to run the protocol using hats. And even Google (most likely) doesn't have a real time machine.

To tie all these things together, we need to move the protocol into the digital world. For that we need a digital equivalent of a "hat": something that hides a digital value while simultaneously "binding" its creator to it (creating a "commitment"), so that they cannot change their mind later.

Luckily, we have a perfect tool for the job: the digital commitment scheme. A commitment scheme lets one party create a "commitment" to a message while keeping it secret, and later open the commitment to reveal what is inside. Commitments can be built from various components, including strong cryptographic hash functions.

With a commitment scheme in hand, we have all the components needed to run our zero-knowledge protocol electronically. The Prover first encodes the vertex coloring as a set of digital messages (for example, the numbers 0, 1, 2), then generates a digital commitment to each one. These commitments are sent to the Verifier. When the Verifier challenges an edge, the Prover opens the commitments corresponding to that edge's two vertices.
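One round of this electronic protocol can be sketched as follows. This is a simplified illustration using a hash-based commitment; the graph, names and details are my own, not a production implementation:

```python
import hashlib
import random
import secrets

def commit(value: str) -> tuple[str, str]:
    """Hash-based commitment: C = hash(salt || value)."""
    salt = secrets.token_hex(16)
    c = hashlib.sha256((salt + value).encode()).hexdigest()
    return c, salt

def open_ok(c: str, salt: str, value: str) -> bool:
    """Verify an opening by recomputing the hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest() == c

# A toy graph and a valid 3-coloring (colors encoded as "0", "1", "2").
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
coloring = {"A": "0", "B": "1", "C": "2", "D": "0"}

# Prover: randomly permute the colors, then commit to every vertex.
perm = {"0": "1", "1": "2", "2": "0"}  # one possible shuffle of the colors
colored = {v: perm[c] for v, c in coloring.items()}
commitments = {v: commit(c) for v, c in colored.items()}

# Verifier: challenge one random edge.
u, v = random.choice(edges)

# Prover opens the two corresponding commitments.
(cu, su), (cv, sv) = commitments[u], commitments[v]

# Verifier: both openings must be valid and the two colors must differ.
accept = (open_ok(cu, su, colored[u])
          and open_ok(cv, sv, colored[v])
          and colored[u] != colored[v])
print("round accepted:", accept)  # True for a valid coloring
```

Repeating this round with fresh color permutations and fresh commitments plays exactly the role of the hats and the warehouse.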

This is how we managed to get rid of the hats. But how do we prove that this protocol is zero-knowledge?

But we are now in the digital world, so we don't need an actual time machine to show that the protocol is zero-knowledge. The key trick is that the protocol runs not between two people, but between two computer programs (or, more formally, two probabilistic Turing machines).

Now we can prove the following theorem: if a Verifier program could extract useful information from an honest run of the protocol, it would obtain just as much useful information from a "cheating" run in which the Prover put no information in to begin with.

We are talking about computer programs, and programs can "go back in time." Consider, for example, a virtual machine that can take snapshots: the machine follows one execution path, is restored to an earlier snapshot, and then execution branches down a new path.


Even without virtual machines, any computer program can be "rewound" to a previous state simply by running it again from the beginning with the same input data. Provided the inputs (including any input random numbers) are fixed, the program always follows the same execution path. So we can rewind a program by restarting it and "forking" its execution once it reaches the desired point.
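This rewinding trick is easy to demonstrate. In the sketch below (my own illustration), a "program" draws its random choices from a seeded generator, so restarting it with the same seed reproduces the exact same execution path:

```python
import random

def run_program(seed: int, steps: int = 5):
    """A 'program' whose execution path is fully determined by its inputs."""
    rng = random.Random(seed)  # all randomness comes from this fixed seed
    return [rng.randint(0, 9) for _ in range(steps)]

first_run = run_program(seed=42)
rewound_run = run_program(seed=42)  # "rewind": restart with identical inputs

# Same inputs (including the random numbers) => same execution path.
print(first_run == rewound_run)  # True
```

Changing the seed at some step is exactly the "fork": execution proceeds identically up to that point and then branches down a new path.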

Ultimately, what we get can be stated as a theorem: if there exists a Verifier program that successfully extracts useful information from the Prover by interacting with the protocol, then the Prover can use program rewinding while feeding the Verifier random solutions. The logic is the same as before: if the Verifier can extract information from a real run, then it extracts exactly as much from a fake, rewinding-based run. But the fake run contains no useful data, so there is nothing to extract. Hence the information the Verifier can extract is always zero.

What follows from all this?

To summarize, the protocol is complete and sound. Soundness holds in any situation in which the two parties are not conspiring with each other.

At the same time, the protocol is zero-knowledge. To prove this, we showed that any program the Verifier might run to extract information would extract just as much from a run containing no meaningful information at all. This yields an obvious contradiction, which tells us the protocol leaks no information in either situation.

This has an important consequence. Since anyone can fabricate a fake transcript (as in the example where Google tried to convince me it had a solution), I cannot replay a transcript to prove anything to a third party (say, a judge). The judge would point out that he cannot be sure the recording was made honestly and was not edited (just as in the Google time-machine example). So the transcript itself carries no information. The protocol is meaningful only if I participated myself and can be sure it all happened in real time.

Proofs for all of NP!

For those who have made it this far, I have important news. Three-coloring a cellular network is interesting in itself, but that's not the point. The really interesting thing about the three-coloring problem is that it is NP-complete: any other problem in the class NP can be reduced to it.

Simply put, Goldreich, Micali and Wigderson proved that "efficient" zero-knowledge proofs exist for a wide class of useful problems, many of them far more interesting than assigning frequencies in a cellular network.

You simply take the statement (in NP) you want to prove and translate it into an instance of the three-coloring problem. From that point on, you can run the digital version of our hat protocol.

Instead of a conclusion

Of course, it would be incredibly foolish to use this particular protocol in practice. Its cost includes the total size of the original statement and witness, plus the cost of translating the problem into a graph, plus the E^2 protocol runs needed to convince the Verifier. In theory this is "efficient," since the total cost of the proof is polynomial in the input length, but in practice it is far too expensive.

So far we have only proven that such proofs are possible. We still have to find proofs practical enough for real use.

Notes

* Formally, the purpose of an interactive proof is to convince the Verifier that a particular string belongs to some language. The Prover is typically assumed to have unbounded computational power, while the Verifier is computationally limited.

** This example is based on the original solution of Goldwasser, Micali and Rackoff, and on the teaching example with hats devised by Silvio Micali.

*** A simple example of a commitment can be constructed with a hash function. To commit to a value x, we generate a (sufficiently long) string of random numbers called a "salt," and output the commitment C = hash(salt || x). To open the commitment, we reveal x and the salt; anyone can then verify that the original commitment is valid by recomputing the hash. This is secure under some moderately strong assumptions about the hash function itself.
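This note's construction translates directly into code. A brief sketch, assuming SHA-256 as the hash:

```python
import hashlib
import secrets

def make_commitment(x: str) -> tuple[str, str]:
    """Commit to x: pick a random salt and output C = hash(salt || x)."""
    salt = secrets.token_hex(32)  # a sufficiently long random string
    commitment = hashlib.sha256((salt + x).encode()).hexdigest()
    return commitment, salt

def check_commitment(commitment: str, salt: str, x: str) -> bool:
    """Anyone can verify an opening by recomputing the hash."""
    return hashlib.sha256((salt + x).encode()).hexdigest() == commitment

c, salt = make_commitment("the graph is 3-colorable")
print(check_commitment(c, salt, "the graph is 3-colorable"))  # True
print(check_commitment(c, salt, "some other message"))        # False
```

The random salt provides hiding (the hash reveals nothing about x until opened), while the hash provides binding (the committer cannot later open to a different value).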

We have already looked at what a zk-SNARK, a zero-knowledge proof protocol, is in simple terms; in this article we would like to dig deeper into its technical side.

Zero-Knowledge Proof Protocol: How does it work?

So, as a reminder, a zero-knowledge proof lets you prove that you possess certain information without disclosing its contents. Implementing this concept requires three algorithms, which we will denote GK, P and V. Let's consider what each is for:

  • GK is a key generator that takes a secret parameter α and a program C as input and generates two keys: a proving key PK for the prover and a verification key VK for the verifier. Both keys are public to all parties involved in the proof.
  • P is the prover, who needs three types of input data: 1) the proving key PK; 2) a random input X, which is publicly available to the parties; 3) the statement to be proven without revealing its content, the witness W. The algorithm P generates a proof satisfying Proof = P(PK; X; W).
  • V is the verifier algorithm, which returns a Boolean value: TRUE or FALSE. The verifier takes the inputs V(VK; X; Proof) and determines from them whether the proof is true or false.
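The shape of this interface can be sketched in code. The toy below only mimics the GK/P/V signatures with a trivial hash check; every name in it is illustrative, it is not zero-knowledge, and no part of it is a real SNARK construction:

```python
import hashlib

def C(x: str, w: str) -> bool:
    """The 'program' being proven: here, 'x is the SHA-256 hash of w'."""
    return hashlib.sha256(w.encode()).hexdigest() == x

def GK(alpha: str, program_id: str):
    """Toy key generator driven by the secret parameter alpha.
    In this toy PK and VK coincide; in a real SNARK they differ."""
    key = hashlib.sha256((alpha + "|" + program_id).encode()).hexdigest()
    return key, key  # (PK, VK)

def P(pk: str, x: str, w: str) -> str:
    """Toy prover: Proof = P(PK; X; W). It checks the witness locally;
    a real SNARK enforces this check cryptographically rather than on trust."""
    if not C(x, w):
        raise ValueError("witness W does not satisfy the statement")
    return hashlib.sha256((pk + "|" + x).encode()).hexdigest()

def V(vk: str, x: str, proof: str) -> bool:
    """Toy verifier: decides V(VK; X; Proof) without ever seeing W."""
    return proof == hashlib.sha256((vk + "|" + x).encode()).hexdigest()

w = "my secret witness"
x = hashlib.sha256(w.encode()).hexdigest()  # public input
pk, vk = GK(alpha="toxic-waste", program_id="sha256-preimage")
proof = P(pk, x, w)
print(V(vk, x, proof))  # True
```

Note that even in this toy, anyone who learns alpha can rerun GK and forge an accepted "proof" without any witness, which is exactly the danger of the parameter α.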

This is how a zero-knowledge proof protocol works. The one thing worth discussing separately is the parameter α mentioned in the description of GK. This parameter significantly complicates the use of zk-SNARKs in real applications, because anyone who knows it can create a false Proof that the system will accept as TRUE. The developers of ZCash, a cryptocurrency built on zero-knowledge technology, struggled with this problem for a very long time.

One of the best parts of modern cryptography is its beautiful terminology. You could found any number of punk bands with names like Hardcore Predicate, Trapdoor Function or Impossible Differential Cryptanalysis. Yet there is one term that surpasses all the others: Zero Knowledge Proof.

The term "zero knowledge" is so attractive that it causes problems. People use it incorrectly, assuming that zero knowledge is a synonym for "very, very reliable security." Because of this it gets attached to everything, including encryption systems and anonymization networks, that actually has nothing to do with zero-knowledge protocols.

I write this to emphasize that zero-knowledge proofs are among the most powerful tools cryptographers have ever invented, and, unfortunately, they are just as poorly understood. In this series of posts I will try to describe clearly what zero-knowledge proofs are and what makes them so special. We'll also look at some zero-knowledge protocols used in the real world.

History of zero knowledge

The concept of “zero knowledge” was first proposed in the 1980s by MIT researchers Shafi Goldwasser, Silvio Micali and Charles Rackoff. They studied the problems associated with interactive proof systems - theoretical systems in which one party (the "Prover") exchanges messages with a second party (the "Verifier") trying to convince it of the truth of some mathematical statement.*

Before Goldwasser and her colleagues, most work in this area focused on the soundness of the proof system. In other words, researchers considered situations in which a malicious Prover tries to "trick" the Verifier into believing a false statement. Goldwasser, Micali and Rackoff turned the problem on its head: instead of worrying only about the Prover, they asked what happens if you don't trust the Verifier.

The specific problem they posed was information leakage: how much extra information does the Verifier learn during the course of a proof, beyond the mere fact that the statement is true?

It is important to note that this was not just a matter of theoretical interest; such problems have real-world applications. Here's one scenario: imagine a real-world user wants to log into a web server using a password. The standard "realistic" approach to this problem involves storing a hashed version of the password on the server. A login can thus be viewed as a kind of "proof" that the stored hash is the result of applying a hash function to some password, and that the client actually knows that password.

Most real systems implement this "proof" in the worst possible way: the client simply sends the original password to the server, which hashes it and compares the result with the stored value. The drawback of this approach is obvious: the server learns the client's plaintext password. Modern password hygiene therefore rests largely on the assumption that the server is not compromised.
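A minimal sketch of that flawed pattern (illustrative only: it assumes SHA-256 and deliberately omits the salting and slow password hashes that real systems should use), showing that the server necessarily handles the plaintext:

```python
import hashlib

def register(password: str) -> str:
    # The server stores only the hash, never the password itself.
    return hashlib.sha256(password.encode()).hexdigest()

def login(stored_hash: str, password: str) -> bool:
    # The flaw: the client sent the plaintext, so the server now holds it.
    return hashlib.sha256(password.encode()).hexdigest() == stored_hash

stored = register("hunter2")
print(login(stored, "hunter2"))  # True: "proof" that the client knows the password
print(login(stored, "wrong"))    # False
```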

What Goldwasser, Micali, and Rackoff proposed offered new hope: zero-knowledge proofs that provably reveal no information beyond the single bit meaning "this statement is true."

"Real World" Example

Our discussion is fairly abstract for now, so let's look at a "real world" example of a (slightly crazy) zero-knowledge protocol.

Imagine I'm a telecom tycoon rolling out a new cellular network. My network structure is shown in the figure below. Each vertex in this graph represents a radio tower, and the edges of the graph indicate the locations where the cells overlap, that is, where signal reception may be difficult. Luckily, to prevent interference, I can assign each tower to one of three different frequency bands.

So the network deployment problem is to assign frequency bands to the towers so that no two overlapping cells use the same band. Representing the frequency bands as colors, we can quickly sketch one solution to the problem:

Of course, many of you have already realized that this is just an instance of the famous theoretical problem of coloring a graph with three colors. This problem is interesting because for some graphs it is hard to find a solution, or even to know whether one exists. In fact, three-coloring—more precisely, deciding whether a particular graph can be colored with three colors—is in the complexity class NP.
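Note the asymmetry that puts the problem in NP: finding a coloring may be hard, but checking a proposed one is trivial. A short sketch (the five-edge graph below is a hypothetical toy network, not the one in the figure):

```python
def is_valid_3_coloring(edges, coloring):
    # A coloring is valid iff no edge joins two same-colored vertices.
    return all(coloring[u] != coloring[v] for u, v in edges)

# Hypothetical 4-tower network: a square with one diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
good = {0: "red", 1: "green", 2: "blue", 3: "green"}
bad  = {0: "red", 1: "red",   2: "blue", 3: "green"}

print(is_valid_3_coloring(edges, good))  # True
print(is_valid_3_coloring(edges, bad))   # False: edge (0, 1) is monochrome
```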

Obviously our toy example is easy to solve by hand, but what if it weren't? Imagine, for example, that my cellular network is so large and complex that the computing resources at my disposal are insufficient to find a solution. In that case it would be desirable to outsource the problem to someone with more computing power. For example, I could ask my friends at Google to solve it for me from a specification.

But this leads to a problem.

Let's say Google dedicates a significant portion of its computing infrastructure to finding a coloring of my graph. Of course, I'm not going to pay them until I know they actually found one; at the same time, Google is not going to hand me a copy of the solution until I pay. We are at a dead end.

There is probably a real-life way out of this impasse, involving lawyers and escrow accounts, but this is not a blog about real life; it's a blog about cryptography. And if you've ever read any crypto research, you know that the right way to solve this problem is to come up with an absolutely crazy technical solution.

Crazy technical solution (with hats!)

Note that I can never be absolutely sure that Google is honest: there will always be at least a tiny chance that Google is cheating me. However, after a large number of iterations (namely E^2, where E is the number of edges in the graph) my confidence rises to the point where the probability that Google is cheating becomes negligible, small enough to be ignored for all practical purposes. Only then can I safely give Google my money.
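The arithmetic behind that confidence is worth seeing. A fake coloring must contain at least one bad edge, so a uniformly random edge challenge catches a cheater with probability at least 1/E, and after E^2 independent rounds the chance of surviving every challenge is at most (1 - 1/E)^(E^2), roughly e^(-E). A quick sketch with a hypothetical edge count:

```python
import math

E = 20                            # hypothetical number of edges
rounds = E * E                    # E^2 challenges

survival = (1 - 1 / E) ** rounds  # chance a cheater passes every round
print(f"{survival:.3e}")          # on the order of e^-E, i.e. negligible
print(f"{math.exp(-E):.3e}")      # the e^-E approximation, for comparison
```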

Google needs to be safe too. Even if I try to learn something about Google's solution by keeping notes between runs of the protocol, it won't matter: I am foiled because Google randomizes its choice of colors between iterations. The limited information I receive is useless, and I have no way to correlate data across iterations.

What makes this "zero knowledge"?

I previously stated that this protocol prevents Google's solution from leaking, but don't let me off that easy! The first rule of modern cryptography is: never trust people who make such claims without proof.

Goldwasser, Micali, and Rackoff proposed three properties that any zero-knowledge protocol must satisfy. Informally, they can be expressed as follows:

  1. Completeness. If Google is telling the truth, it will eventually convince me (at least with high probability).
  2. Soundness. Google can convince me only if it is actually telling the truth.
  3. Zero-knowledgeness. I learn nothing else about Google's solution.

We have already discussed the argument for completeness: after enough iterations, Google will eventually convince me (with a negligible chance of error). Demonstrating the soundness of this protocol is also fairly easy: if Google ever tries to cheat, I will almost certainly catch it.

It's the third property that's problematic here, but to understand this we need to do a very strange thought experiment.

Thought experiment (with time machines)

Let's start with a crazy hypothesis. Imagine that Google's engineers aren't as capable as everyone thinks. They work on my problem for weeks and months, but they can't find a solution. With 12 hours left before the checking session, the Googlers grow desperate. They decide to trick me into thinking they have a coloring of the graph when in fact they don't.

Their idea is to infiltrate a GoogleX workshop and borrow Google's time machine prototype. They initially plan to travel back in time several years to use the extra work time to solve the problem. Unfortunately, like most Google prototypes, the time machine has some limitations: it only allows you to travel back in time by four and a half minutes.

Thus, the option of using a time machine to obtain additional working time is eliminated. However, it turns out that even this very limited technology can still be used to trick me.

I have no idea what's going on here, but I thought the picture fit.

The plan is devilishly simple. Since Google doesn't actually know a valid coloring of the graph, it simply colors the paper with random colors and then puts on the hats. If by pure luck I happen to pick a pair of differently colored vertices, everyone breathes a sigh of relief and we continue the protocol. So far so good.

Inevitably, however, sooner or later I will lift a pair of hats, discover two vertices of the same color, and catch Google cheating. This is where the time machine comes in. Whenever Google finds itself in this awkward situation, it simply erases it: a designated Googler flips the switch, "rewinds" time by about four minutes, and the Google team completely recolors the graph with a fresh random coloring. They then let time run forward again and allow me to try once more.

Essentially, the time machine lets Google "recover" from any mishap during the execution of their fake protocol, so that everything looks normal to me. Since a bad challenge occurs in only about 1/3 of attempts, the expected running time of the protocol (from Google's point of view) is only moderately longer than that of an honest run. As for me, I don't even suspect that time travel is taking place.
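That 1/3 figure is easy to check empirically: under a uniformly random coloring, the two endpoints of any fixed edge receive different colors with probability 2/3, so each rewind attempt succeeds about two times out of three. A quick simulation (purely illustrative, with a fixed seed):

```python
import random

random.seed(1)

trials = 100_000
passes = 0
for _ in range(trials):
    # Each "rewind": recolor both endpoints of the challenged edge at random.
    u, v = random.randrange(3), random.randrange(3)
    if u != v:               # the edge challenge succeeds
        passes += 1

print(passes / trials)       # close to 2/3
```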

This last point is the most important. From my point of view, unaware that a time machine is involved, the interaction is exactly the same as the real one; statistically, they are identical. And yet it is worth stressing again that in the time-machine version, Google possesses absolutely no information about how to color the graph.

What's all this for?

What we just saw is an example of a simulation. Note that in a world where time only moves forward and no one can trick me with a time machine, the hat protocol is sound: after E^2 rounds I should be convinced (up to a negligible probability) that the graph really is colorable and that Google holds a valid solution.

We've also just shown that if time doesn't only flow forward, that is, if Google can "rewind" my perception of time, then Google can fake a run of the protocol even with no information about the actual coloring of the graph.

From my point of view, what is the difference between the two versions of the protocol? Considering their statistical distributions, none: both convey approximately the same amount of useful information.

Believe it or not, this proves something very important.

Suppose I (the Verifier) have some strategy that "extracts" useful information about Google's coloring after observing an execution of the honest protocol. Then my strategy should work equally well when I am being fooled by the time machine. From my point of view the protocol runs are statistically identical; I physically cannot tell the difference.

So if the amount of information I can extract in the "real experiment" and the "time-machine experiment" is identical, yet the amount of information Google supplies in the "time-machine experiment" is exactly zero, then even in the real world the protocol must not leak any useful information. All that remains is to show that computer scientists have time machines. We do! (It's a closely guarded secret.)

Getting rid of hats (and time machines)

Of course, we don't actually want to run the protocol with hats, and even Google doesn't have a real time machine (probably).

To tie everything together, we first need to move our protocol into the digital world. This requires the digital equivalent of a "hat": something that both hides a digital value and simultaneously "binds" ("commits") its creator to it, so that he cannot change his mind after the fact.

Luckily, we have the perfect tool for this: the digital commitment scheme. A commitment scheme allows one party to "commit" to a specific message by sealing it in a "secret envelope", and later to "open" the envelope and reveal what's inside. Commitments can be built from a variety of ingredients, including (strong) cryptographic hash functions.***

With a commitment scheme in hand, we have everything we need to run the zero-knowledge protocol electronically. The Prover first encodes the vertex colors as a set of digital messages (for example, the numbers 0, 1, 2), then generates a digital commitment to each one. These commitments are sent to the Verifier. When the Verifier challenges an edge, the Prover simply opens the commitments corresponding to its two vertices.
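A minimal sketch of one such round, using the salted-hash commitment described in footnote ***. The triangle graph, the color encoding, and all function names are illustrative assumptions, not any real library's API:

```python
import hashlib
import os
import random

def commit(value: str) -> tuple[str, bytes]:
    # C = Hash(salt || value); the salt is the opening information.
    salt = os.urandom(16)
    return hashlib.sha256(salt + value.encode()).hexdigest(), salt

def check_opening(c: str, salt: bytes, value: str) -> bool:
    return hashlib.sha256(salt + value.encode()).hexdigest() == c

# The Prover's secret: a valid 3-coloring of a triangle (toy graph).
edges = [(0, 1), (1, 2), (2, 0)]
coloring = {0: 0, 1: 1, 2: 2}            # colors encoded as 0, 1, 2

# Each round the Prover permutes the palette, then commits to every vertex.
perm = random.sample([0, 1, 2], 3)
committed = {v: commit(str(perm[c])) for v, c in coloring.items()}

# The Verifier challenges one random edge; the Prover opens only those two.
u, v = random.choice(edges)
(cu, su), (cv, sv) = committed[u], committed[v]
opened_u, opened_v = str(perm[coloring[u]]), str(perm[coloring[v]])
assert check_opening(cu, su, opened_u) and check_opening(cv, sv, opened_v)
assert opened_u != opened_v              # endpoints differ: round passed
```

Because the palette is freshly permuted every round, the two opened colors are a uniformly random pair of distinct colors, which is why transcripts from different rounds cannot be correlated.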

So we've eliminated the hats, but how can we prove that it's a zero-knowledge protocol?

Fortunately, we are now in the digital world, and we don't need a real time machine to prove claims about this protocol. The key trick is to stipulate that the protocol runs not between two people but between two different computer programs (or, in more formal terms, probabilistic Turing machines).

We can now prove the following theorem: if a computer program (playing the Verifier) could extract useful information after participating in a run of the protocol, then one could use a "time machine" on that program to make it extract the same amount of useful information from a "fake" run in which the Prover provides no information at all.

And since we are now talking about computer programs, it is obvious that rewinding time is not at all an extraordinary possibility. In fact, we rewind computer programs all the time. Let's take, for example, software for virtual machines with a snapshot function.

An example of rewinding via virtual machine snapshots. The original VM is run forward, rolled back to an earlier snapshot, and then a different execution branch is followed.

Even without fancy virtual machine software, any computer program can be "rewound" to an earlier state simply by starting it again from the beginning and feeding it exactly the same inputs. If the inputs, including all random numbers, are fixed, the program will always follow the same execution path. You can thus rewind a program by restarting it and "forking" its execution once it reaches the desired point.
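A tiny illustration of that point (a toy stand-in for a Verifier program, not any real protocol code): with its randomness fixed by a seed, re-running the program reproduces its execution path exactly, which is all that "rewinding" means here.

```python
import random

def verifier_choices(seed: int, steps: int) -> list[int]:
    # A toy "Verifier": its challenges depend only on its fixed randomness.
    rng = random.Random(seed)
    return [rng.randrange(3) for _ in range(steps)]

full_run = verifier_choices(seed=42, steps=10)
# "Rewind" to step 5: restart with identical input and stop where we fork.
rewound = verifier_choices(seed=42, steps=5)
assert rewound == full_run[:5]   # same prefix: we are back at step 5
```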

Ultimately we get the following theorem: if there is any Verifier program that successfully extracts information by interactively running this protocol with some Prover, then we can simply use the rewinding trick on that program, committing to a random solution and then "fooling" the Verifier by rewinding its execution until we pass the challenge. The same logic as before applies: if such a Verifier could extract information from a real run of the protocol, it could extract the same amount from a simulated, rewinding-based run. But since no information enters the simulated run, there is nothing to extract. Therefore the amount of information the Verifier can extract is zero.

Okay, so what does all this mean?

So, from the analysis above we know the protocol is complete and sound. The soundness argument holds whenever we know that nobody is tampering with time, that is, that the Verifier's program runs normally and nobody rewinds its execution.

At the same time, the protocol is also zero-knowledge. To prove this, we showed that any Verifier program that successfully extracts information must also be able to extract information from a rewound run in which no information is available in the first place. This is an obvious contradiction, and it tells us that the protocol cannot leak information in either situation.

All this has an important benefit. Since anyone can easily "fake" a protocol transcript, even after Google has proven to me that it has a solution, I cannot replay the transcript to prove anything to anyone else (say, a judge). A judge would have no guarantee that the transcript was recorded honestly and that I didn't edit it, just as Google could have done with its time machine. The transcript itself therefore carries no information. The protocol is only meaningful if I participated in it myself and am sure it happened in real time.

Proofs for all of NP!

If you've read this far, I'm sure you're ready for the big news: three-coloring cellular networks isn't actually that interesting a problem, at least not in itself.

The 3-coloring problem is interesting primarily because it is NP-complete. Informally, what is surprising about such problems is that any other problem in the class NP can be converted into an instance of this one.

This result, in one fell swoop thanks to Goldreich, Micali and Wigderson, proves that "efficient" zero-knowledge proofs exist for a broad class of useful statements, many of them much more interesting than assigning frequencies to cellular networks. You simply find a statement (in NP) you want to prove, such as our hash function example above, and convert it into an instance of the 3-coloring problem. Then you run the digital version of the hat protocol.

Let's sum it up

Of course, actually running this protocol for interesting statements would be strange and silly, because it would require an enormous amount of work. In theory it is "efficient", since the total cost of the proof grows polynomially with the input size, but in practice it would be something else entirely.

Thus, so far we have only shown that such proofs are possible. Finding proofs practical enough to use in the real world is another matter.

In later posts I will talk about some of them, namely efficient proofs for various useful statements, and give examples from real applications where such techniques are used. Also, at one reader's request, I will explain why I dislike the SRP (Secure Remote Password) protocol so much.

Notes

* Formally, the goal of an interactive proof is to convince the Verifier that a particular string belongs to some language. The Prover is typically assumed to be very (even unboundedly) powerful, while the Verifier is limited in computing resources.

** This example is based on the original solution of Goldwasser, Micali and Rackoff, and the teaching example with hats is based on an explanation given by Silvio Micali. Any errors are mine alone.

*** A simple commitment can be built from a hash function. To commit to a value "x", generate a (sufficiently long) string of random bytes, called the "salt", and output the commitment C = Hash(salt || x). To open the commitment, simply reveal "x" and the salt. Anyone can then check that the original commitment is valid by recomputing the hash. This is secure as long as the hash function meets certain (moderately strong) requirements.

Matthew Green