Skip lists: A probabilistic alternative to balanced trees. Once again about skiplist... Uneven distribution of requests

An interesting type of data structure for an effective implementation of the ordered dictionary ADT is skip^cnucoK (skip list). This data structure organizes objects in a random order such

such that searching and updating are completed on average in 0(log n) time, where n is the number of dictionary objects. Interestingly, the concept of averaged time complexity used here does not depend on the possible distribution of keys in the input. On the contrary, it is dictated by the use of random number generators in the implementation of the input process to facilitate the process of deciding where to insert a new object. The execution time in this case will be equal to the average of the execution time of entering any random numbers used as input objects.

Random number generation methods are built into most modern computers, as they are widely used in computer games, cryptography, and computer simulations. Some methods, called pseudorandom number generators, generate numbers pseudorandomly according to some law, starting from the so-called seed. Other methods use hardware to extract "truly" random numbers. In either case, the computer has numbers that meet the requirements for random numbers in the given analysis.

The main advantage of using random numbers in the data structure and in creating an algorithm is the simplicity and reliability of the resulting structures and methods. In this way, it is possible to create a simple randomized data structure called a skip list, which provides a logarithmic search time cost similar to that of a binary search algorithm. However, the expected time cost of a skip list will be greater than that of a binary search in a lookup table. On the other hand, when updating a dictionary, a skip list is much faster than lookup tables.

Let the skip-list S for the dictionary D consist of a series of lists (iSo, S\, Si, З/j). Each list S\ stores a set of dictionary objects D by keys in non-decreasing order, plus objects with two special keys, written as “-oo” and “+oo”, where “-oo” means less and “+oo” means more than any possible key that may be in D. In addition, the lists in S meet the following requirements:

The list S 0 contains each object of the dictionary D (plus special objects with the keys “-oo” and “+oo”);

For / = 1, …, h – 1, the list Si contains (in addition to “-oo” and “+oo”) a randomly generated set of objects from the list S t _ ь

The list S h contains only “-oo” and “+oo”.

An example of such a skip list is shown in Fig. 8.9, which visually represents the list S with a list at the base and lists S\, ..., S^ above it. We denote the height of the list S as h.

Rice. 8.9. Skip list example

Intuitively, the lists are organized this way; so that?/+/ contains at least every second object 5/. As will be shown when considering the input method, objects in St+\ are selected randomly from objects in Si in such a way that each selected object 5/ is included in 5/ + \ with probability 1/2. Figuratively speaking, whether or not to place an object from in Si + 1 is determined depending on which side the tossed coin lands on - heads or tails. Thus, we can assume that S\ contains about n/2 objects, S2 - n/4, and, accordingly, Sj - about n/2′ objects. In other words, the height h of list S may be about logn. Although, halving the number of objects from one list to another is not a mandatory requirement in the form of an explicit skip-list property. On the contrary, random selection is more important.

Using the positional abstraction applied to lists and trees, we can think of a skip list as a two-dimensional collection of positions, organized horizontally into levels and vertically into towers. Each level is a list S^ and each tower is a set of positions storing the same object and located on top of each other in the lists. Skip list positions can be traversed using the following operations:

after(/?): returns the position next to the same level; before(/?): returns the position preceding p at the same level; below(/?): returns the position located below p in the same tower; above(/?): Returns the position above p in the same tower.

Let's establish that the above operations must return null if the requested position does not exist. Without going into details, note that a skip list can be easily implemented using a connected structure in such a way that the pass methods require 0(1) time (if there is position p in the list). Such a connected structure is a collection of h doubly linked lists, which are in turn doubly linked lists.

The skip list structure allows simple search algorithms to be applied to dictionaries. In fact, all skip-list search algorithms are based on the rather elegant SkipSearch method, which takes a key to and finds the object in the skip-list S with the largest key (which may be "-oo") less than or equal to k. Let's say we have the desired key to The SkipSearch method sets the position of p to the top-left position in the skip list S. That is, p is set to the position of the special object with the key "-oo" in S^. Then the following is done:

1) if S.below(/?) is null, then the search ends - the largest object in S with a key less than or equal to the desired key k is found at the level below. Otherwise, we go down one level in this tower, setting p S.be \ow(p);

2) from position p we move to the right (forward) until we find ourselves in the extreme right position of the current level, where keu(/?)< к. Такой шаг называется сканированием вперед (scan forward). Указанная позиция существует всегда, поскольку каждый уровень имеет специальные ключи «+оо» и «-оо». И на самом деле, после шага вправо по текущему уровню р может остаться на исходном месте. В любом случае после этого снова повторяется предыдущее действие.

Rice. 8.10. An example of searching in a skip list. The 50 positions examined during the search for the key are highlighted with a dashed designation

Code fragment 8.4 contains a description of the skip list search algorithm. Having such a method, it is easy to implement the findElement(/:) operation. The operation p^-SkipSearch(A;) is performed and the equality key(p)~ k is checked. If ^equals, /?.element is returned. Otherwise, a NO_SUCH_KEY signaling message is returned.

SkipSearch(/:) algorithm:

Input: search key.

Output: The position p in S whose object has the largest key less than or equal to k.

Assume that p is the top-left position in S (consisting of at least two levels), while below (p) * null do

р below(p) (scan down) while key(after(p))< к do

Let p after(p) (scan forward) return p

Code snippet 8.4. Search algorithm in skip list S

It turns out that the estimated execution time of the SkipSearch algorithm is 0(log n). Before justifying this, we present the implementation of skip list update methods

SKIP is a system for monitoring the execution of orders based on MS SharePoint, which allows you to fully automate the process of working with orders. This is a thoughtful, out-of-the-box solution for organizing work with errands. It is suitable for work both in large and geographically distributed companies, and for medium-sized companies due to the possibility of providing the most flexible configuration of all modules.

The SKIP system is based on the Microsoft SharePoint platform, which automatically means it can be integrated with Microsoft products, including Microsoft Office.

System functionality

The SKIP system is a “boxed” solution and in this version contains a basic set of functionality necessary to automate work with orders:

  • Assignment, execution, control of instructions;
  • Tracking the status of order execution;
  • Ability to create nested (“child”) orders.

List of instructions with color indication

At the same time, the presented functionality is implemented in such a way as to provide the user with the widest possible possibilities for working with the system:

  • Cataloging of orders (one order can be located in different folders);
  • Filtering lists of orders;
  • Export lists of orders to MS Excel;
  • Performance discipline reports;
  • Color indication of orders depending on the execution time and status of the order;
  • Possibility of attaching an arbitrary number of files of any format to the Order Card;
  • Integration with Outlook calendars;
  • Customizable notifications about the assignment and progress of work with the order;
  • System for replacing employees during vacations or business trips;
  • Creating periodic orders (or orders with a schedule) for events that have a certain period (meetings, appointments, etc.);
  • Displaying deadlines for completing orders on a Gantt chart;
  • And so on

List of tasks with Gantt chart

There is a fairly widespread opinion that completing various test tasks helps to very quickly raise your professional level. I myself sometimes like to dig up some tricky test question and solve it, in order to be constantly on my toes, as they say. Once I was completing a competitive assignment for an internship at a company, the problem seemed funny and interesting to me, here is its short text:
Imagine that your whining colleague has come to talk about his difficult task - he needs not just to sort a set of integers in ascending order, but to output all the elements of the ordered set from L to R inclusive!
You stated that this is an elementary task and that it will take you ten minutes to write a solution in C#. Well, or an hour. Or two. Or Alyonka chocolate bar

It is assumed that duplicates are allowed in the set, and the number of elements will not be more than 10^6.

There are several comments on the evaluation of the solution:

Your code will be evaluated and tested by three programmers:
  • Bill will run your solution on tests no larger than 10Kb.
  • In Stephen's tests, the number of requests will be no more than 10^5, while the number of add requests will be no more than 100.
  • In Mark's tests, the number of requests will be no more than 10^5.
The solution can be very interesting, so I thought it necessary to describe it.

Solution

Let us have an abstract storage.

Let's denote Add(e) - adding an element to the store, and Range(l, r) - taking a slice from l to r elements.

A trivial storage option could be like this:

  • The basis of the storage will be a dynamic array ordered in ascending order.
  • With each insertion, a binary search finds the position where the element must be inserted.
  • When requesting Range(l, r), we will simply take a slice of the array from l to r.
Let's evaluate the complexity of the approach based on a dynamic array.

C Range(l, r) - taking a slice can be estimated as O(r - l).

C Add(e) - insertion in the worst case will work in O(n), where n is the number of elements. At n ~ 10^6, insertion is the bottleneck. Below in the article an improved storage option will be proposed.

An example of the source code can be viewed.

Skiplist

Skiplist is a randomized alternative to search trees that is based on multiple linked lists. It was invented by William Pugh in 1989. Search, insert, and delete operations are performed in logarithmically random time.

This data structure is not widely known (by the way, quite a review about it is written about it on Habré), although it has good asymptotic estimates. Out of curiosity, I wanted to implement it, especially since I had a suitable task.

Let's say we have a sorted singly linked list:

In the worst case, the search is performed in O(n). How can it be speeded up?

In one of the video lectures that I watched while working on the problem, there was a wonderful example about express lines in New York:

  • Express lines connect some stations
  • Regular lines connect all stations
  • There are regular lines between common high-speed lines stations
The idea is this: we can add a certain number of levels with a certain number of nodes in them, and the nodes in the levels should be evenly distributed. Let's consider an example when each of the overlying levels contains half of the elements from the underlying level:


The example shows an ideal SkipList, in reality everything looks similar, but a little different :)

Search

This is how the search happens. Let's say we're looking for the 72nd element:

Insert

With the insert, everything is a little more complicated. In order to insert an element, we need to understand where to insert it in the lowest list and at the same time push it to a certain number of higher levels. But how many levels should each specific element be pushed through?

It is proposed to solve this this way: when inserting, we add a new element to the lowest level and start tossing a coin, until it falls out, we push the element to the next higher level.
Let's try to insert element - 67. First, let's find where in the underlying list it needs to be inserted:

I assumed that the coin fell out twice in a row. This means you need to push the element up two levels:

Access by index

To access by index, it is proposed to do the following: mark each transition with the number of nodes that lie under it:

Once we get access to the i-th element (by the way, we get it in O(ln n)), making a slice does not seem difficult.

Let it be necessary to find Range(5, 7). First we get the element at index five:


And now Range(5, 7):

About implementation

It seems like a natural implementation for the SkipList node to look like this:
SkipListNode(
int Element;
SkipListNodeNext;
int Width;
}
Actually, that's what was done.

But in C#, the array also stores its length, and I wanted to do this in as little memory as possible (as it turned out, in the conditions of the problem everything was not evaluated so strictly). At the same time, I wanted to make sure that the implementation of SkipList and the extended RB Tree took up approximately the same amount of memory.

The answer to how to reduce memory consumption was unexpectedly found upon closer inspection from the java.util.concurrent package.

2D skiplist

Let there be a singly linked list of all elements in one dimension. The second will contain “express lines” for transitions with links to the lower level.
ListNode(
int Element;
ListNodeNext;
}

Lane (
Lane Right;
Lane Down;
ListNode Node;
}

Dishonest Coin

You can also use a “dishonest” coin to reduce memory consumption: reduce the likelihood of pushing an element to the next level. William Pugh's article looked at a cross-section of several push probability values. When considering the values ​​of ½ and ¼, in practice, approximately the same search time was obtained while reducing memory consumption.

A little about random number generation

Digging into the guts of ConcurrentSkipListMap, I noticed that random numbers are generated as follows:
int randomLevel() (
int x = randomSeed;
x ^= x<< 13;
x ^= x >>> 17;
randomSeed = x ^= x<< 5;
if ((x & 0x80000001) != 0)
return 0;
int level = 1;
while (((x >>>= 1) & 1) != 0) ++level;
return level;
}
You can read more about generating pseudorandom numbers using XOR in this article. I didn’t notice any particular increase in speed, so I didn’t use it.

You can look at the source of the resulting repository.

All together can be downloaded from googlecode.com (Pagination project).

Tests

Three types of storage were used:
  1. ArrayBased (dynamic array)
  2. SkipListBased (SkipList with parameter ¼)
  3. RBTreeBased (red-black tree: implementation of a friend of mine who did a similar task).
Three types of tests were carried out for the insertion of 10^6 elements:
  1. Elements sorted in ascending order
  2. Items sorted in descending order
  3. Random elements
Tests were carried out on a machine with i5, 8gb ram, ssd and Windows 7 x64.
Results:
Array RBTree SkipList
Random 127033 ms 1020 ms 1737ms
Ordered 108ms 457ms 536ms
Ordered by descending 256337ms 358ms 407ms
Quite expected results. You can see that inserting into an array, when an element is inserted somewhere other than at the end, is the slowest. At the same time, SkipList is slower than RBTree.

Measurements were also taken: how much each storage takes up in memory with 10^6 elements inserted into it. We used a studio profiler; for simplicity, we ran the following code:

var storage = ...
for (var i = 0; i< 1000000; ++i)
storage.Add(i);
Results:
Array RBTree SkipList
Total bytes allocated 8,389,066 bytes 24,000,060 bytes 23,985,598 bytes
There are also quite expected results: storage on a dynamic array took up the least amount of memory, and SkipList and RBTree took approximately the same amount.

Happy ending with “Alenka”

A whining colleague, according to the terms of the task, bet me a chocolate bar. My solution was accepted with the maximum score. I hope this article will be useful to someone. If you have any questions, I will be glad to answer.

P.S.: I had an internship at SKB Kontur. This is to avoid answering the same questions =)

7 answers

Since you mentioned a List that is both indexable (I'm assuming you want faster retrieval) and to allow duplicates, I would suggest you use a custom set with a LinkedList or ArrayList.

You need to have a base set, like a HashSet, and keep adding values ​​to it. If you encounter a duplicate, the value of this set should point to the list. So, you will have both quick search and, of course, you will save your objects as a psuedo collection.

This should give you good extraction efficiency. Ideally, if your Keys are not duplicated, you will achieve O(1) in search speed.

This answer is 3 years late, but I hope it will be useful to those who want a list of Java skips from now on :)

This solution allows for duplication as you requested. I'm following roughly the guide here http://igoro.com/archive/skip-lists-are-fascinating, so the difficulties are similar to that delete costs O(nlogn) - how I did it "Don't try to use doubly connected nodes, I'm guessing this will result in delete to O(logn).

The code contains: an interface, a skip list that implements the interface, and a node class. It is also general.

You can customize the parameter LEVELS for performance, but remember the tradeoff between space and time.

> search(T data); ) public class SkipList > implements SkipableList ( public static void main(String args) ( SkipList sl = new SkipList<>(); int data = (4,2,7,0,9,1,3,7,3,4,5,6,0,2,8);<>for (int i: data) ( sl.insert(i); ) sl.print();<>sl.search(4);< LEVELS; i++) { if (rand.nextInt((int) Math.pow(2, i)) == 0) { //insert with prob = 1/(2^i) insert(SkipNode, i); } } } @Override public boolean delete(T target) { System.out.println("Deleting " + target.toString()); SkipNodesl.delete(9);< LEVELS; i++) { head.refreshAfterDelete(i); } System.out.println(); return true; } @Override public SkipNodesl.print();< LEVELS; i++) { head.print(i); } System.out.println(); } private void insert(SkipNode sl.insert(69); > sl.print(); current = this.getNext(level); while (current != null && current.getNext(level) != null) ( if (current.getNext(level).data == null) ( SkipNode < 1) { if (current.data.equals(data)) { result = current; break; } current = current.getNext(level); } return result; } void insert(SkipNoderesult = null; SkipNode< 1) { this.setNext(SkipNode, level); SkipNode.setNext(current, level); return; } while (current.getNext(level) != null && current.data.compareTo(SkipNode.data) < 1 && current.getNext(level).data.compareTo(SkipNode.data) < 1) { current = current.getNext(level); } SkipNodeSkipNode, int level) ( SkipNode current = this.getNext(level);

if (current == null) ( this.setNext(SkipNode, level); return; ) if (SkipNode.data.compareTo(current.data) successor = current.getNext(level);:

current.setNext(SkipNode, level); SkipNode.setNext(successor, level);) void print(int level) ( System.out.print("level " + level + ": ["); int length = 0; SkipNode current = this.getNext(level); while (current != null) ( length++; System.out.print(current.data.toString() + " "); current = current.getNext(level); ) System.out.println("], length: " + length);

) )

You can use the below code to make your own

basic skipper 1) Do

Public class MyClass ( public static void main(String args) ( Skiplist skiplist=new Skiplist(); Node n1=new Node(); Node n2=new Node(); Node n3=new Node(); Node n4=new Node (); Node n5=new Node(); n1.setData(2); n4.setData(4); (5); skiplist.insert(n1); skiplist.insert(n3); skiplist.insert(n5); n6); /*print all nodes*/ skiplist.display(); System.out.println(); /* print only fast lane node*/ skiplist.displayFast(); one_next; //contain pointer to next node private Node two_next; //pointer to node after the very next node public int getData() ( return data; ) public void setData(int data) ( this.data = data; ) public Node getOne_next() ( return one_next; ) public void setOne_next(Node one_next) ( this.one_next = one_next; ) public Node getTwo_next() ( return two_next; ) public void setTwo_next(Node two_next) ( this.two_next = two_next;

) ) class Skiplist( Node start; //start pointer to skip list Node head; Node temp_next; //pointer to store last used fast lane node Node end; //end of skip list int length; public Skiplist())( start= new Node(); end=new Node(); length=0; temp_next=start; ) public void insert(Node node)( /*if skip list is empty */ if(length==0)( start.setOne_next( node); node.setOne_next(end); temp_next.setTwo_next(end); head=start; length++; prev=temp; temp=temp.getOne_next(); ) /*add a fast lane pointer for even no of nodes*/ if(length%2==0)( prev.setOne_next(node); node.setOne_next(end) ; temp_next.setTwo_next(node); temp_next=node; node.setTwo_next(end); /*odd no of node will not contain fast lane pointer*/ else( prev.setOne_next(node); node.setOne_next(end); ) ) ) public void display())( System.out.println("--Simple Traversal--"); Node temp=start.getOne_next(); while(temp != end)( System.out.print(temp. getData()+"=>");

temp=temp.getOne_next();

1 = > 2 = > 3 = > 4 = > 5 = > 6 = >

) ) public void displayFast())( System.out.println("--Fast Lane Traversal--"); Node temp=start.getTwo_next(); while(temp !=end)( System.out.print(temp. getData()+"==>"); temp=temp.getTwo_next();

2 == > 4 == > 6 == >

Conclusion:

New ConcurrentSkipListSet<>(new ExampleComparator()); public class ExampleComparator implements Comparator (//your impl)

You can create a comparator that will make your SkipListSet behave like a regular list.

I don't claim that this is my own implementation. I just can't remember where I found it. If you know, let me know and I'll update. This works well for me:

Public class SkipList > implements Iterable (Node _head = new Node<>(null, 33); private final Random rand = new Random(); private int _levels = 1; private AtomicInteger size = new AtomicInteger(0); ///

/// Inserts a value into the skip list. /// public void insert(T value) ( ​​// Determine the level of the new node. Generate a random number R. The // number of // 1-bits before we encounter the first 0-bit is the level of the node. / / Since R is // 32-bit, the level can be at most 32. int level = 0; size.incrementAndGet(); for (int R = rand.nextInt(); (R & 1) == 1; R >>= 1) ( level++; if (level == _levels) ( _levels++; break; ) ) // Insert this node into the skip list Node newNode = new Node<>(value, level + 1); Node<= level) { newNode.next[i] = cur.next[i]; cur.next[i] = newNode; } } } /// cur = _head; for (int i = _levels - 1; i >= 0; i--) ( for (; cur.next[i] != null; cur = cur.next[i]) ( if (cur.next[i] .getValue().compareTo(value) > 0) break ) if (i /// Returns whether a particular value already exists in the skip list /// public boolean contains(T value) ( ​​Node cur = _head; for (int i = _levels - 1; i >= 0; i--) ( for (; cur.next[i] != null; cur = cur.next[i]) ( if (cur.next[i] .getValue().compareTo(value) > 0) break; if (cur.next[i].getValue().compareTo(value) == 0) return true; ) /// iterator() ( return new SkipListIterator(this, 0); ) public int size() ( return size.get(); ) public Double toArray() ( Double a = new Double; int i = 0; for (T t: this) ( a[i] = (Double) t; i++; ) return a) ) class Node > ( public Node next; public N value; @SuppressWarnings("unchecked") public Node(N value, int level) ( this.value = value; next = new Node; ) public N getValue() ( return value; ) public Node getNext() ( return next; ) public Node getNext(int level) ( return next; ) public void setNext(Node next) ( this.next = next; ) ) class SkipListIterator > implements Iterator ( SkipList list; Node current;

int level; public SkipListIterator(SkipList

list, int level) ( this.list = list; this.current = list._head; this.level = level; ) public boolean hasNext() ( return current.getNext(level) != null; ) public E next() ( current = current.getNext(level); return current.getValue(); ) public void remove() throws UnsupportedOperationException ( throw new UnsupportedOperationException(); ) ) Fixed a bug in the implementation provided by @PoweredByRice. It threw an NPE for cases where the remote node was the first node. Other updates include renamed variable names and reverse printing of skip list order. Import java.util.Random; interface SkipableList > ( int LEVELS = 5; boolean delete(T target); void print(); void insert(T data); SkipNode sl.print(); search(T data); ) class SkipNode > ( N data; @SuppressWarnings("unchecked") SkipNode ) new SkipNode; SkipNode(N data) ( this.data = data; ) void refreshAfterDelete(int level) ( SkipNode current = this; while (current != null && current.getNext(level) != null) ( if (current.getNext(level).data == null) ( SkipNode while (current != null && current.getNext(level) != null) ( if (current.getNext(level).data == null) ( SkipNode successor = current.getNext(level).getNext(level);< 1) { if (current.data.equals(data)) { result = current; break; } current = current.getNext(level); } return result; } void insert(SkipNodecurrent.setNext(successor, level); return;< 1) { this.setNext(skipNode, level); skipNode.setNext(current, level); return; } while (current.getNext(level) != null && current.data.compareTo(skipNode.data) < 1 && current.getNext(level).data.compareTo(skipNode.data) < 1) { current = current.getNext(level); } SkipNodesuccessor = current.getNext(level); current.setNext(skipNode, level); > implements SkipableList skipNode.setNext(successor, level); int data = (4,2,7,0,9,1,3,7,3,4,5,6,0,2,8);<>) void print(int level) ( System.out.print("level " + level + ": [ "); int length = 0; SkipNode current = this.getNext(level);<>sl.search(4);< LEVELS; i++) { if (rand.nextInt((int) Math.pow(2, i)) == 0) { //insert with prob = 1/(2^i) insert(skipNode, i); } } } @Override public boolean delete(T target) { System.out.println("Deleting " + target); SkipNodewhile (current != null) ( length++; System.out.print(current.data + " "); current = current.getNext(level); ) System.out.println("], length: " + length);< LEVELS; i++) { head.refreshAfterDelete(i); } System.out.println("deleted..."); return true; } @Override public SkipNode) ) public class SkipList ( private final SkipNode (null); private final Random rand = new Random(); sl = new SkipList<>@Override public void insert(T data) ( SkipNode