The Recycle Bin

A repository of comments, code, and opinions.


Big-Oh


The other day I was reading a certain social news aggregator and I came across a link to a blog post about a new functional language built on .Net, called F# [I did it with .Net: Why I love F#].  I haven’t looked into F# very much, so I can’t say much about it.  I can tell you that it is a functional language being developed at Microsoft Research, based on the OCaml language and running on the .Net 2.0 runtime.  What was more interesting was the conversation that the submission generated.  The post was about the time complexity of a certain function in F# compared to a more traditional imperative language.  The discussion brought to light how confusing time-complexity measurement can be, since it seemed most of the commenters didn’t know what they were talking about.  So gather round, folks, for a riveting discussion of algorithm performance ratings.

When computer scientists say that a certain algorithm is fast, they are usually talking about its asymptotic time complexity.  That is to say, they are measuring the number of steps an algorithm takes with respect to its input size.  To simplify things, we imagine a computer where each basic step takes a constant amount of time.  Now we can judge an algorithm by how many times a basic step is performed.  What we are most interested in is the upper bound of the growth rate, in other words, the number of steps it takes in the worst case.  This is often written in Big-Oh notation (e.g. O(n), read as “order n”).  Here is a graph that shows the growth rate of the most common measurements:

[Graph: time to complete versus input size for O(1), O(logN), O(N), O(N^2), and O(2^N)]

What this graph shows is the time it takes an algorithm to complete as the input size increases.  As you can see, O(2^N) grows very rapidly; this is an example of a very slow algorithm.  Linear time, O(N), is considered acceptable for most problem sets, and algorithm researchers are thrilled if they can get it any lower.
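
To make “counting basic steps” concrete, here is a small sketch of my own (not from the original post) where each method literally counts the steps it performs for an input of size n.  Doubling n leaves the O(1) count alone, roughly doubles the O(N) count, and roughly quadruples the O(N^2) count:

class GrowthDemo
{
    // O(1): the work does not depend on n at all.
    static int ConstantSteps(int n)
    {
        return 1;
    }

    // O(N): one basic step per element.
    static int LinearSteps(int n)
    {
        int steps = 0;
        for (int i = 0; i < n; i++)
            steps++;
        return steps;
    }

    // O(N^2): a nested loop does n steps for each of the n elements.
    static int QuadraticSteps(int n)
    {
        int steps = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                steps++;
        return steps;
    }

    static void Main()
    {
        foreach (int n in new int[] { 10, 20, 40 })
            System.Console.WriteLine("n={0}: O(1)={1}, O(N)={2}, O(N^2)={3}",
                n, ConstantSteps(n), LinearSteps(n), QuadraticSteps(n));
    }
}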

I’m going to try to explain this with a real example and some code.  To do this I need to introduce a data structure known as a linked list.  The point of a linked list is to hold a collection of similar pieces of data in an efficient way.  The data structure consists of a series of nodes, each of which has a value and a pointer to the next node in the list.  Here is a node object in C#:

// A single node in the list: it stores one value and a pointer
// to the next node (null when this is the last node).
public class Node
{
    public int value;
    public Node next;

    public Node()
    {
        next = null;
    }
}

To create a list of these nodes, all you need to do is keep a pointer to the first node, or the HEAD of the list.  This way the linked list can be scattered throughout memory, and nodes can be added and removed very efficiently.  Here’s a picture to demonstrate:

[Diagram: a chain of nodes starting at HEAD, each pointing to the next]

If we start at the HEAD node and follow all of the pointers, then the list would be:

1 7 10 -5 12
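
As a quick sketch of my own (the ListDemo class is not from the original post), here is how you could build that exact list with the Node class above and then walk it from HEAD by following the next pointers:

class ListDemo
{
    static void Main()
    {
        int[] values = { 1, 7, 10, -5, 12 };

        // Build the list back to front so each new node points at the current front.
        Node head = null;
        for (int i = values.Length - 1; i >= 0; i--)
        {
            Node n = new Node();
            n.value = values[i];
            n.next = head;   // link the new node in front of the old front
            head = n;        // HEAD now points at the new node
        }

        // Walk the list from HEAD, following the next pointers.
        for (Node temp = head; temp != null; temp = temp.next)
            System.Console.Write(temp.value + " ");   // prints: 1 7 10 -5 12
    }
}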

Like any data structure, linked lists are good for certain tasks and bad for others.  For example, to find a particular node you have to start at the HEAD and walk through the list until it is found.  In the worst case you would have to walk through the entire list, which takes n steps (n being the number of nodes).  This operation therefore has a time complexity of O(n).  Here are some typical find functions in C#:

class LinkedList
{
    // Dummy head node: the first real element lives at head.next.
    Node head = new Node();

    // Walks the list looking for a value and returns its index, or -1 if it isn't there.
    // In the worst case it visits every node, so this is O(n).
    int getIndex(int value)
    {
        int index = 0;
        Node temp = head.next;
        while (temp != null)
        {
            if (temp.value == value)
                return index;
            temp = temp.next;
            index++;
        }
        return -1;
    }

    // Walks the list counting nodes until it reaches the requested index;
    // returns null if the index is past the end of the list.  Also O(n).
    Node getNode(int index)
    {
        int counter = 0;
        Node temp = head.next;
        while (temp != null)
        {
            if (counter == index)
                return temp;
            temp = temp.next;
            counter++;
        }
        return null;
    }
}

As you can see, both of these functions require some looping.  In fact, they will loop until either the requested node is found or the end of the list is reached.  Obviously, this is better than an O(2^N) algorithm, but it still isn’t all that fast.  In contrast, adding and removing elements are two operations that linked lists perform very quickly.  To add a node we simply adjust what the HEAD node is pointing to, and to delete a node we copy the value and pointer from the next node in the list.  This is kind of hard to picture, so here are some diagrams:

First, adding a node:

[Diagram: adding a node by pointing HEAD at the new node, which points to the old first node]

Second, deleting a node:

[Diagram: deleting a node by copying the next node’s value and pointer over it]

The difference between these functions and the previous two is that they don’t have to loop through the list when they run.  In the worst case either function takes only a constant number of steps, so they get a time complexity of O(1), otherwise known as constant time.  Here is some code:

class LinkedList
{
    // Dummy head node, as above: the first real element lives at head.next.
    Node head = new Node();

    // Inserts at the front of the list by re-pointing HEAD -- no loop, so O(1).
    void Add(int value)
    {
        Node newNode = new Node();
        newNode.value = value;
        newNode.next = head.next;
        head.next = newNode;   // HEAD now points at the new node
    }

    // Removes a node in O(1) by copying the next node's contents over it and
    // unlinking that next node.  Note: this trick cannot unlink the very last
    // node without a pointer to the node before it.
    void Remove(Node n)
    {
        if (n.next == null) n = null;
        else
        {
            n.value = n.next.value;
            n.next = n.next.next;
        }
    }
}

Is it obvious now why deleting and adding a node is faster than finding one?  How would you rate a function that removes the node at a specified index?  More code:

// Removes the node at a given index by combining the two operations above.
void removeAt(int index)
{
    Node toRemove = this.getNode(index);   // O(n) walk to find the node
    if (toRemove != null)
        this.Remove(toRemove);             // O(1) removal
}

Code reuse makes this function easy!  The first line calls getNode(index).  We’ve already learned that finding the node at a particular index is an O(n) problem.  The next line removes that node, which we just said is an O(1) problem.  Thus, this function has a time complexity of O(n) + O(1).  When adding time complexities together, we only care about the largest term and throw out the smaller ones.  This is because the rating is supposed to indicate how quickly the running time grows, and the smaller terms don’t add anything to the upper bound.  That being said, the time complexity of the removeAt(index) function is O(n).  Get it?
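
To round this out, here is a hypothetical usage sketch of my own.  It assumes the Add, getIndex, getNode, Remove, and removeAt methods above are merged into a single LinkedList class and made public, which the snippets in this post don’t spell out:

class RemoveAtDemo
{
    static void Main()
    {
        LinkedList list = new LinkedList();

        // Each Add is O(1): it only re-points HEAD, no matter how long the list is.
        list.Add(12);
        list.Add(-5);
        list.Add(10);
        list.Add(7);
        list.Add(1);                     // list is now 1 7 10 -5 12

        // getIndex is O(n): it may have to walk the whole list before finding the value.
        int index = list.getIndex(10);   // 2

        // removeAt is O(n) + O(1), which simplifies to O(n): the walk in getNode dominates.
        list.removeAt(index);            // list is now 1 7 -5 12
    }
}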

Now, these are just the basics of algorithm analysis.  This was intended merely as an introduction to the area, and a quick explanation of how we rate algorithms for those who don’t already know.  There is an entire field of research devoted to optimizing current solution sets.  Computer scientists strive to find the perfect algorithm and data structure for a problem, and without optimal data structures and algorithms you would never be able to find an email in a 5GB inbox, or have an object in Halo obey the laws of physics.

Written by Nathan

March 11, 2008 at 12:14 am

Posted in C#, programming

