## Sharing a Secret

There is more to cryptography than simply hiding information. One very useful extension of the field is into the area of information dissemination. Imagine that you run a retail store with several employees. This store has a code for its alarm system that must be set every night and cleared every morning. Obviously, you can’t trust just any employee with the alarm code, because then they could enter the store at any time! But you’re also a very busy person who can’t be bothered with actually opening and closing the store every day. You would like to make a rule that at least two people must be present at the store during opening and closing to prevent theft. How are you going to implement this rule? Fortunately you learned enough algebra in high school to do this! You can split the key code into distinct halves and hand one out to each employee. This way two employees would have to join their parts together in order to open or close the door.

That example isn’t very interesting. How about enforcing the following rule: four employees can open the door, or two employees and one manager, or two managers. Now we need to split the key more finely, so that each manager gets two shares, each employee gets one, and it takes four shares to reassemble the key. Now that we understand the problem, let’s discuss the math.

The trick is called the Shamir threshold scheme and uses Lagrange interpolation, together with the fact that two points are required to determine a line, three points a quadratic, and so on. You simply define a polynomial with the secret number as the constant term and random numbers for the remaining coefficients. Let *w* be the number of participants in the system, and *t* be the number of shares needed to reassemble a secret message *M*. Construct the following polynomial of degree t − 1 (all arithmetic is done modulo a prime *n*):

f(x) = M + a_1·x + a_2·x^2 + … + a_(t−1)·x^(t−1) (mod n)

This polynomial defines a curve, and the shares of the secret will be points along it. Calculate a point for each of the *w* participants: (1, f(1)), (2, f(2)), (3, f(3)), …, (w, f(w)). Each person now has a unique point on the curve. Any *t* of them can then pool their points and calculate the Lagrange coefficients to reassemble the polynomial, thus yielding the secret.

Lagrange interpolation states that given a set of *t* points (x_1, y_1), …, (x_t, y_t) with distinct x values, the unique polynomial of degree t − 1 passing through them is

f(x) = y_1·l_1(x) + y_2·l_2(x) + … + y_t·l_t(x)

where each basis polynomial is

l_j(x) = ∏ over m ≠ j of (x − x_m) / (x_j − x_m)

Since the secret is the constant term f(0), the participants only need the constants l_j(0). Wikipedia has a good page on Lagrange interpolation if you need more of an explanation than what I gave.

Let’s see an example. We will take the first scenario I outlined: there are 10 employees, and at least two must be present to lock and unlock the door. Let’s say the key code for the door is 25.

First we generate a polynomial: f(x) = 25 + 73x. The constant term is the secret; the other coefficient is a random number.

The second step is to generate unique points for each participant:

(Note: All of the arithmetic here is done modulo a prime *n*)

Modulus n: 163

(1, 98)

(2, 8)

(3, 81)

(4, 154)

(5, 64)

(6, 137)

(7, 47)

(8, 120)

(9, 30)

(10, 103)

As you can see, none of the participants knows the secret number, but any two of their points together determine it. Taking points (1, 98) and (2, 8) gives the Lagrange constants {2, −1} (equivalently {−324, −1}, since −324 ≡ 2 mod 163). Multiply 2 by 98 and −1 by 8, add the products, and reduce mod *n* = 163, and you will be left with the secret, 25.
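The share-and-reconstruct procedure can be sketched in Java using the values from this example (the class and method names are my own; the modular inverse is computed via Fermat's little theorem, which works because n = 163 is prime):

```java
public class Shamir {
    static final long N = 163;             // prime modulus from the example
    static final long[] COEFFS = {25, 73}; // f(x) = 25 + 73x; 25 is the secret

    // Evaluate f(x) mod N to produce the share for participant x.
    static long share(long x) {
        long y = 0, power = 1;
        for (long c : COEFFS) {
            y = (y + c * power) % N;
            power = (power * x) % N;
        }
        return y;
    }

    // Modular inverse via Fermat's little theorem: a^(N-2) mod N.
    static long inverse(long a) {
        long result = 1, base = ((a % N) + N) % N, exp = N - 2;
        while (exp > 0) {
            if ((exp & 1) == 1) result = (result * base) % N;
            base = (base * base) % N;
            exp >>= 1;
        }
        return result;
    }

    // Recombine shares with Lagrange interpolation evaluated at x = 0.
    static long reconstruct(long[] xs, long[] ys) {
        long secret = 0;
        for (int j = 0; j < xs.length; j++) {
            long lj = 1; // Lagrange constant l_j(0)
            for (int m = 0; m < xs.length; m++) {
                if (m == j) continue;
                long num = ((-xs[m]) % N + N) % N;
                long den = ((xs[j] - xs[m]) % N + N) % N;
                lj = (lj * num % N) * inverse(den) % N;
            }
            secret = (secret + ys[j] * lj) % N;
        }
        return secret;
    }

    public static void main(String[] args) {
        System.out.println(share(1)); // 98
        System.out.println(share(2)); // 8
        System.out.println(reconstruct(new long[]{1, 2},
                                       new long[]{98, 8})); // 25
    }
}
```

Running it reproduces the table of shares above and recovers the secret from any two of them.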

A more complicated example: 10 participants, 4 shares are required to reassemble. Same secret, 25.

Modulus n: 181

Polynomial: 25 + 119x + 159x^2 + 106x^3

(1, 47)

(2, 118)

(3, 150)

(4, 55)

(5, 107)

(6, 37)

(7, 119)

(8, 84)

(9, 25)

(10, 35)

Now if you try to reassemble the secret with fewer than four points, you will get an incorrect value. Points (1, 47), (5, 107), and (9, 25) will produce a wrong secret, and points (3, 150) and (6, 37) will give you 82. Only if you select four or more points along the graph will you be able to calculate the correct secret.

Here is some Java code if you would like to play with this.

If you just want to see the program run, here is a pre-built JAR.

To run the JAR, use the command `java -jar secret.jar`, provided you have a version of the JRE installed.

## Zero-Knowledge Proofs

I think I’m going to make Sundays “Learn About Cryptology Day”. Here’s the first installment: Zero-Knowledge Proofs. A ZKP is an interactive process in which one party can prove that they know a secret without revealing any information about that secret. To do this, a ZKP must satisfy three criteria:

- Must be **complete**, in the sense that if a person follows the protocol and passes, then they have successfully proved that they are honest (within reasonable probability)
- Must be **sound**, such that a dishonest party cannot fool anyone (within reasonable probability)
- Must be **zero-knowledge**, meaning it cannot reveal any information about the secret, thus a dishonest party cannot learn anything while faking the process

### Cave Example

To demonstrate a hypothetical system I need to introduce some characters. ZKPs are usually two-party systems: there is a *prover* (Peggy) whose goal is to convince a *verifier* (Victor) of something. In this example, Peggy knows a secret password that opens a door in a cave. The cave is circular, and the door on the far end blocks the path.

Victor would like to purchase this password from Peggy, but he doesn’t want to give her any money until he is convinced that she knows the password. Peggy is not about to tell him anything about the password, since that would devalue it. Together, they devise a system that will convince Victor that Peggy knows the password while satisfying Peggy’s need for secrecy.

Peggy enters the cave by selecting a path, A or B, at random. Victor does not know which direction she chose. Once Peggy is beyond the bend, Victor enters the front of the cave, picks a direction, A or B, and shouts it to Peggy. She must exit the cave in that direction. If she has to pass the door to do this, then she must unlock the door with the password. There is a probability of 0.5 that Victor will choose the path that Peggy entered, in which case she can simply turn around and leave without needing the password at all. This might at first seem like it ruins the whole procedure, but consider the probability of a Peggy who doesn’t know the password successfully anticipating Victor’s request, say, 100 times in a row (0.5^100 ≈ 7.9×10^-31). The chances of this happening are sufficiently low to be considered within reasonable probability. The protocol is shown below (images from Wikipedia)
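One way to see why repetition matters is to simulate the protocol. The Java sketch below is my own illustration (the class and method names are invented); it models a Peggy who does not know the password, so she passes a round only when Victor happens to call the path she entered:

```java
import java.util.Random;

public class CaveProtocol {
    // One round: Peggy enters by a random path; Victor calls a random path.
    // A prover WITHOUT the password passes only if the called path matches
    // the one she entered (probability 1/2).
    static boolean cheaterPassesRound(Random rng) {
        int entered = rng.nextInt(2);
        int called = rng.nextInt(2);
        return entered == called;
    }

    // Probability that a cheating prover survives all `rounds` challenges.
    static double cheatProbability(int rounds) {
        return Math.pow(0.5, rounds);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        int rounds = 20;
        boolean fooledVictor = true;
        for (int i = 0; i < rounds; i++) {
            if (!cheaterPassesRound(rng)) {
                fooledVictor = false;
                break;
            }
        }
        System.out.println("Cheater fooled Victor for " + rounds
                + " rounds: " + fooledVictor);
        System.out.println("Chance of that: " + cheatProbability(rounds));
    }
}
```

Even at 20 rounds a cheater's odds are below one in a million; at 100 rounds they are astronomically small.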

### NP Problems

A more realistic implementation of ZKP is done with NP-complete problems. NP-complete problems are a set of problems that are computationally difficult to solve, but easy to verify given the question and the solution. They are a subset of the NP (**N**on-deterministic **P**olynomial time) class of problems. A common implementation of ZKP with an NP-complete problem uses isomorphic graph transformations and Hamiltonian cycles. Given a graph *G*, a Hamiltonian cycle is a path around *G* that visits each vertex exactly once and returns to the starting vertex. A graph may have many different Hamiltonian cycles, provided it has one at all. In this example, Peggy knows a graph *G* and a Hamiltonian cycle for it. Victor knows the graph, but not the cycle. Peggy aims to convince Victor that she knows the cycle without revealing anything about it.

- Peggy creates an isomorphic graph of *G*, call it *H*.
- Victor now asks Peggy to do one of two things: show that *H* is isomorphic to *G*, or give a Hamiltonian cycle for *H*. By revealing the cycle of *H*, Peggy proves that she knows the cycle of *G*, because it is trivial to transform the cycle of *G* into that of *H* if *G* and *H* are isomorphic.
- If Victor asked Peggy to show that *H* is isomorphic to *G*, then she must provide the vertex translations needed to transform *H* into *G*.
- If Victor asked Peggy for a Hamiltonian cycle for *H*, then she transforms the cycle that she has for *G* and sends it to Victor. This reveals nothing about the cycle of *G*, since Victor won’t have the vertex translations and doesn’t know exactly how *H* is isomorphic to *G*.

This protocol is exactly the same as the cave example, with a probability of 0.5 that a dishonest Peggy could fool Victor at each step. Any NP-complete problem can be adapted and used in a ZKP, provided there hasn’t been a breakthrough that allows NP-complete problems to be solved efficiently (P = NP). The security of these systems relies on the number of times the protocol is repeated, each round making it more and more difficult for Peggy to fool Victor.
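The round structure above can be sketched in Java. This is a simplified illustration of my own: the example graph, the known cycle, and the method names are invented, and the commitment step that a real protocol needs (Peggy must commit to *H*'s edges before seeing the challenge) is omitted:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class GraphZKP {
    // Small example graph G with a known Hamiltonian cycle 0-1-2-3-4-0.
    static final int N = 5;
    static final boolean[][] G = new boolean[N][N];
    static final int[] CYCLE = {0, 1, 2, 3, 4};
    static {
        for (int i = 0; i < N; i++) addEdge(i, (i + 1) % N); // cycle edges
        addEdge(0, 2); // an extra edge so the cycle isn't the whole graph
    }
    static void addEdge(int u, int v) { G[u][v] = true; G[v][u] = true; }

    // Apply a vertex permutation to a graph, producing an isomorphic copy.
    static boolean[][] permute(boolean[][] g, int[] pi) {
        boolean[][] h = new boolean[N][N];
        for (int u = 0; u < N; u++)
            for (int v = 0; v < N; v++)
                h[pi[u]][pi[v]] = g[u][v];
        return h;
    }

    // One round of the protocol; returns true if Victor accepts.
    static boolean round(Random rng) {
        // Peggy: pick a random permutation pi and build H = pi(G).
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < N; i++) perm.add(i);
        Collections.shuffle(perm, rng);
        int[] pi = new int[N];
        for (int i = 0; i < N; i++) pi[i] = perm.get(i);
        boolean[][] h = permute(G, pi);

        if (rng.nextBoolean()) {
            // Challenge A: reveal pi; Victor checks H really is pi(G).
            return Arrays.deepEquals(h, permute(G, pi));
        } else {
            // Challenge B: reveal the cycle in H; Victor checks each edge.
            for (int i = 0; i < N; i++) {
                int u = pi[CYCLE[i]], v = pi[CYCLE[(i + 1) % N]];
                if (!h[u][v]) return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        boolean accepted = true;
        for (int i = 0; i < 20; i++) accepted &= round(rng);
        System.out.println("Honest Peggy accepted in all rounds: " + accepted);
    }
}
```

An honest Peggy passes every round; a Peggy who knew neither the permutation nor a cycle would be caught with probability 0.5 per round.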

There are many adaptations of ZKP, including a non-interactive version, and research is ongoing trying to find ways to incorporate these into authentication systems and the Internet. Post questions in the comments section below.

## Big-Oh

The other day I was reading a certain social news aggregator and I came across a link to a blog post about a new functional language being built on .Net, called F# [I did it with .Net: Why I love F#]. I haven’t looked into F# very much so I can’t say much about it. I can tell you that it is a functional language being developed at Microsoft Research, based on the OCaml language and using the .Net 2.0 runtime. What was more interesting was the conversation that the submission generated. The post was about the time complexity of a certain function in F# compared to a more traditional imperative language. The discussion brought to light how confusing time-complexity measurement can be, since it seemed most of the commenters didn’t know what they were talking about. So gather round folks for a riveting discussion about algorithm performance rating.

When computer scientists say that a certain algorithm is *fast* they are usually talking about its asymptotic time complexity. That is to say, they are measuring the number of steps an algorithm takes with respect to its input size. To simplify things, we imagine a computer where each basic step takes a constant amount of time. Now we can judge an algorithm based on how many times a basic step is performed. What we are most interested in is the upper bound of the growth rate, in other words, the number of steps it takes in the worst-case scenario. This is often written in Big-Oh notation (e.g. O(n), to be read as “order n”). Here is a graph that shows the growth rate of the most common measurements:

O(1), O(log N), O(N), O(N^2), O(2^N)

What this graph shows is the time it takes an algorithm to complete as the input size increases. As you can see, O(2^N) grows very rapidly; an algorithm in that class is very, very slow. Linear time, O(N), is considered acceptable for most problem sets, and algorithm researchers are thrilled if they can get it any lower.
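The gap between these growth rates can be made concrete with a short Java program (my own illustration, not from the original post) that tabulates the step counts for each class as N grows:

```java
public class GrowthRates {
    public static void main(String[] args) {
        // Steps taken by an algorithm in each complexity class,
        // as the input size N doubles.
        System.out.printf("%6s %10s %10s %12s %16s%n",
                "N", "log N", "N", "N^2", "2^N");
        for (int n : new int[]{1, 2, 4, 8, 16, 32}) {
            System.out.printf("%6d %10.1f %10d %12d %16d%n",
                    n, Math.log(n) / Math.log(2), n,
                    (long) n * n, 1L << n);
        }
    }
}
```

By N = 32 the O(2^N) column has already passed four billion steps while O(log N) is sitting at five.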

I’m going to try to explain this with a real example and some code. To do this I need to introduce a data structure known as a *linked list*. The point of a linked list is to hold a collection of similar pieces of data in an efficient way. The data structure consists of a series of nodes, each of which has a value and a pointer to the next node in the list. Here is a node object in C#:

```csharp
public class Node
{
    public int value;
    public Node next;

    public Node()
    {
        next = null;
    }
}
```

To create a list of these nodes, all you need to do is keep a pointer to the first node, or the HEAD of the list. This way the linked list can be scattered throughout memory, and nodes can be added and removed very efficiently. Here’s a picture to demonstrate:

If we start at the HEAD node and follow all of the pointers, then the list would be:

1 7 10 -5 12

Like any data structure, linked lists are good for certain tasks and bad for others. For example, to find a particular node you would have to start at the HEAD and walk through the list until it is found. In the worst case you would have to walk through the entire list, which would take *n* steps (*n* being the number of nodes). This operation has a time complexity of O(n). Here are some typical find functions in C#:

```csharp
class LinkedList
{
    Node head; // dummy node; head.next is the first element

    int getIndex(int value)
    {
        int index = 0;
        Node temp = head.next;
        while (temp != null)
        {
            if (temp.value == value)
                return index;
            temp = temp.next;
            index++;
        }
        return -1;
    }

    Node getNode(int index)
    {
        int counter = 0;
        Node temp = head.next;
        while (temp != null)
        {
            if (counter == index)
                return temp;
            temp = temp.next;
            counter++;
        }
        return null;
    }
}
```

As you can see, both of these functions require some looping. In fact, they will loop until either the requested node is found or the end of the list is reached. Obviously, this is better than an O(2^N) algorithm, but it still isn’t all that fast. To contrast, adding and removing elements are two operations that linked lists perform very quickly. To add a node we simply adjust what the HEAD node is pointing to, and to delete a node we copy the value and pointer from the next node in the list. This is kind of hard to imagine, so here are some diagrams:

First, adding a node:

Second, deleting a node:

The difference between these functions and the previous two is that they don’t have to loop through the list when they run. The worst case in either function is a fixed number of steps, so they get a time complexity of O(1), otherwise known as constant time. Here is some code:

```csharp
class LinkedList
{
    Node head; // dummy node; head.next is the first element

    void Add(int value)
    {
        Node newNode = new Node();
        newNode.value = value;
        newNode.next = head.next;
        head.next = newNode; // link the new node in at the front
    }

    void Remove(Node n)
    {
        // Copy the next node's contents into n, then unlink the next node.
        // (Note: the last node in the list cannot be removed this way.)
        if (n.next != null)
        {
            n.value = n.next.value;
            n.next = n.next.next;
        }
    }
}
```

Is it obvious now how deleting and adding a node is faster than finding one? How would you rate a function that removes a particular node at a specified index? More code:

```csharp
void removeAt(int index)
{
    Node toRemove = this.getNode(index); // O(n)
    this.Remove(toRemove);               // O(1)
}
```

Code reuse makes this function easy! The first line calls the function getNode(index). We’ve already learned that finding a node at a particular index is an O(n) operation. The second line removes that node, which we just said is an O(1) operation. Thus, this function has a time complexity of O(n) + O(1). When adding time complexities together, we only care about the largest term and throw out the smaller ones. This is because the rating is supposed to indicate how quickly the running time grows, and smaller terms don’t add anything to the upper bound. That being said, the time complexity of the removeAt(index) function is O(n). Get it?

Now these are just the basics of algorithm analysis. This was intended merely as an introduction to the area, and a quick explanation of how we rate algorithms for those who don’t already know. There is an entire field of research devoted to optimizing current solution sets. Computer scientists strive to find the perfect algorithm and data structure for a problem, and without optimal data structures and algorithms you would never be able to find an email in a 5GB inbox, or have an object in Halo obey the laws of physics.

## Public Key Cryptography – RSA

In this post I plan to discuss the concept of public key cryptography and explain how the RSA algorithm works. The inspiration for this post is twofold: first, I’m currently working on an encrypted messaging application named Scytale which should be released relatively soon, and secondly, I find the subject particularly fascinating.

Crypto-systems fall into two separate categories, symmetric and asymmetric. Symmetric systems use a single key for both encrypting and decrypting. A simple example of this is the Caesar shift cipher, which takes the alphabet and shifts it a certain number of characters. Below is an example of a Caesar cipher with a shift of 3 characters.

Alphabet: abcdefghijklmnopqrstuvwxyz

Shift: defghijklmnopqrstuvwxyzabc

attack at dawn = dwwdfn dw gdzq

The symmetric key used here would be 3. This key would have to be known by both parties involved, and kept secret from any enemies. Of course, this is a trivial example and not a very secure way to encrypt any information. A more modern example would be the previous encryption standard DES, or the current standard Rijndael (AES).
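A shift cipher like this is easy to sketch in Java (a toy illustration of my own, here using a shift of 3; the class and method names are invented):

```java
public class CaesarCipher {
    // Shift each lowercase letter forward by `key` places, wrapping at 'z'.
    static String encrypt(String plaintext, int key) {
        StringBuilder out = new StringBuilder();
        for (char c : plaintext.toCharArray()) {
            if (c >= 'a' && c <= 'z')
                out.append((char) ('a' + (c - 'a' + key) % 26));
            else
                out.append(c); // leave spaces and punctuation alone
        }
        return out.toString();
    }

    // Decryption is just a shift in the opposite direction.
    static String decrypt(String ciphertext, int key) {
        return encrypt(ciphertext, 26 - key);
    }

    public static void main(String[] args) {
        String cipher = encrypt("attack at dawn", 3);
        System.out.println(cipher);             // dwwdfn dw gdzq
        System.out.println(decrypt(cipher, 3)); // attack at dawn
    }
}
```

Note that both sides share the same key, 3, which is exactly what makes the system symmetric.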

Asymmetric crypto-systems are an entirely different breed. They are called asymmetric because one key is used for encryption and another for decryption. The parties involved can post their respective encryption keys publicly, and keep their decryption keys to themselves. These systems are deeply rooted in mathematics and number theory. In this post I will explain one of the more popular asymmetric systems, RSA.

The security of RSA is based on the difficulty of calculating the prime factors of large composite numbers. The system has three components: Key generation, encryption, and decryption.

### Key Generation

Each party that wishes to receive encrypted messages must first generate a key. This key will have two parts, a public key and a private key. To generate the key, two sufficiently large prime numbers are selected, *p* and *q*. These two primes are multiplied together to form the composite number *n*. The size of *n* is the strength of the system: if the person wants 2048-bit encryption, then *n* must be a 2048-bit number.

The encrypting exponent *e* is now selected. This can be any number, as long as it is greater than 1, less than the Euler totient of *n*, and coprime to the Euler totient of *n*. The Euler totient of a number *n* is defined to be the number of positive integers less than *n* that are coprime to *n*. Because *n* is a composite number generated from two primes, it is very easy to calculate its Euler totient, (p-1)(q-1), if the two prime factors are known.

Now the decrypting exponent *d* must be calculated. *d* is simply the inverse of *e* modulo the Euler totient of *n*. The modulus operator (mod) is simply the remainder of division: 5 mod 2 is 1, because 5/2 = 2 with 1 remaining. The inverse in modular arithmetic is trivial to find if the modding integer m (in this case the Euler totient of n) is known. I wrote a Powershell script to solve the inverse mod which can be found here. If the modding integer is not known, however, it is quite difficult. Herein lies the security of RSA: for *d* to be calculated, totient(n) must be known, and for totient(n) to be calculated, the prime factors of n (p and q) must be known. Now the participants can publish their public key {e, n} while making sure to keep their private key {d, p, q} to themselves.
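The key-generation steps can be sketched with Java's BigInteger class. This is a sketch of my own (the method name is invented, and choosing e = 65537 rather than a random value is my assumption; it is the conventional choice in practice):

```java
import java.math.BigInteger;
import java.security.SecureRandom;

public class RsaKeyGen {
    // Returns {n, e, d}: the public modulus, the encrypting exponent,
    // and the decrypting exponent.
    static BigInteger[] generate(int bits, SecureRandom rng) {
        // 1. Pick two large primes p and q; n = p*q.
        BigInteger p = BigInteger.probablePrime(bits / 2, rng);
        BigInteger q = BigInteger.probablePrime(bits / 2, rng);
        BigInteger n = p.multiply(q);

        // 2. The Euler totient is easy given the factors: (p-1)(q-1).
        BigInteger phi = p.subtract(BigInteger.ONE)
                          .multiply(q.subtract(BigInteger.ONE));

        // 3. e must be coprime to the totient; 65537 is the usual choice.
        BigInteger e = BigInteger.valueOf(65537);
        while (!phi.gcd(e).equals(BigInteger.ONE))
            e = e.add(BigInteger.valueOf(2));

        // 4. d is the inverse of e modulo the totient.
        BigInteger d = e.modInverse(phi);
        return new BigInteger[]{n, e, d};
    }

    public static void main(String[] args) {
        BigInteger[] key = generate(1024, new SecureRandom());
        System.out.println("public key:  {" + key[1] + ", " + key[0] + "}");
        // d (along with p and q) stays private.
    }
}
```

Everything hard here is hidden inside `modInverse`, which only works because we know phi, which in turn we only know because we know the factors of n.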

### Encryption

A plaintext message M is converted into a series of numbers (usually base 64) and split into blocks, each numerically smaller than *n*. Each block is encrypted separately, and the results are combined to produce the cipher text. To encrypt a block, it is raised to the exponent *e* and reduced mod *n*. This is generally a quick calculation because *e* is usually small.

### Decryption

A cipher text C is broken into blocks the size of *n*, and each block is decrypted separately: it is raised to the decrypting exponent *d* and reduced mod *n*. This operation can be done quite quickly by utilizing the Chinese Remainder Theorem (CRT) and Fermat’s little theorem. CRT states that if the modding integer can be broken into prime factors, then the operation can be split into two smaller operations and the results recombined. This is shown below:

m_1 = C^(d mod (p-1)) mod p

m_2 = C^(d mod (q-1)) mod q

h = (q^-1 mod p)·(m_1 - m_2) mod p

M = m_2 + h·q

Yes, that looks pretty ugly, but it is significantly faster than computing the exponentiation without the theorem. Because p and q are prime, d can be reduced to d mod (p-1) or d mod (q-1), depending on the equation (this is Fermat’s little theorem at work). Also, since C will be larger than p and q, C can be quickly reduced to C mod p or C mod q. This greatly reduces the time needed to compute the plain text.

Let’s do an example.
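Here is a minimal sketch in Java using the tiny textbook primes p = 61 and q = 53; these values are far too small to be secure and are chosen only so the numbers are easy to follow:

```java
import java.math.BigInteger;

public class RsaToyExample {
    public static void main(String[] args) {
        // Toy key: n = 61 * 53 = 3233, totient = 60 * 52 = 3120.
        BigInteger p = BigInteger.valueOf(61);
        BigInteger q = BigInteger.valueOf(53);
        BigInteger n = p.multiply(q);
        BigInteger phi = BigInteger.valueOf(60 * 52);
        BigInteger e = BigInteger.valueOf(17);
        BigInteger d = e.modInverse(phi); // 2753

        // Encryption: C = M^e mod n.
        BigInteger m = BigInteger.valueOf(65);
        BigInteger c = m.modPow(e, n);
        System.out.println("ciphertext: " + c); // 2790

        // Plain decryption: M = C^d mod n.
        System.out.println("decrypted:  " + c.modPow(d, n)); // 65

        // CRT decryption: two small exponentiations, then recombine.
        BigInteger one = BigInteger.ONE;
        BigInteger m1 = c.modPow(d.mod(p.subtract(one)), p);
        BigInteger m2 = c.modPow(d.mod(q.subtract(one)), q);
        BigInteger h = q.modInverse(p).multiply(m1.subtract(m2)).mod(p);
        System.out.println("via CRT:    " + m2.add(h.multiply(q))); // 65
    }
}
```

Both decryption routes recover the plaintext 65; the CRT route just does its heavy lifting with numbers half the size.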

Simple, eh?

RSA has its drawbacks, though. For starters, the cipher text is padded up to multiples of the size of *n*, meaning the message can get large quickly. Also, the security of the system is based on the difficulty of factoring *n* into its two prime factors. This is a very difficult problem if *n* is large. The problem is, how large does *n* have to be? Moduli of several hundred bits have already been factored, and 1024-bit keys are no longer considered safe for the long term. For a message to be secure, a key size of 2048 bits to 4096 bits is recommended. As technology improves, the key size will have to grow accordingly. Elliptic curve cryptography, which offers comparable security with much smaller keys, is now being used in place of standard RSA to combat the key size problem. I will introduce elliptic curve cryptography in a later post.

Later I will present several other crypto-systems and introduce Scytale when it is complete.