BPlusTree Multithreaded Key-Value NoSQL Store

A B+ tree or B plus tree is a type of tree which represents sorted data in a way that allows for efficient insertion, retrieval and removal of records, each of which is identified by a key. It is a dynamic, multilevel index, with maximum and minimum bounds on the number of keys in each index segment (usually called a "block" or "node"). In a B+ tree, in contrast to a B-tree, all records are stored at the leaf level of the tree; only keys are stored in interior nodes.

The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context—in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

The NTFS, ReiserFS, NSS, XFS, and JFS filesystems all use this type of tree for metadata indexing. Relational database management systems such as IBM DB2, Informix,Microsoft SQL Server and others support this type of tree for table indices. Key-value database management systems such as CouchDB support this type of tree for data access.

Roger Knapp has a very carefully designed 100% managed code implementation that he calls CSharpTest.Net.BPlusTree. You can check out Roger's offerings here  http://csharptest.net/. After considering several offerings including redis (no 100% managed code version) and RaptorDb, I settled on CsharpTest.Net.BplusTree. I really liked RaptorDb, but in some tests I got data corruption. Then I read Knapp's review post on it, which confirmed my suspicions that it's just not ready for prime time.

CSharpTest.Net.BPlusTree's real strengths are that it fully supports multithreading, and has superior select capability - able to perform well in excess of 510,000 operations per second, and over 1,157,000 operations using 4 concurrent threads. Most importantly, it is truly threadsafe, no matter what you throw at it, and has plenty of configuration options.

Among other recent additions, it features:

Collections.LListNode<T> - a doubly linked list implementation that can support asynchronous iteration.
Collections.SynchronizedDictionary/SynchronizedList to support synchronization of a list/dictionary given a locking strategy from the Synchronization namespace.
IO.ClampedStream to provide an IO stream aggregation for a subset of the provided stream.
IO.Crc32 to provide calculation of a CRC32 value from bytes or strings.
IO.FileStreamFactory - an IFactory<Stream> producer of streams for a given file.
IO.FragmentedFile -  an underpinning of the B+Tree implementation that provides sub-allocations within a single file.
IO.SharedMemoryStream - a block allocating memory stream that can be simultaneously used by multiple threads at the same time.
IO.StreamCache - a pool of open file streams that a thread can open and close without the overhead of actually opening or closing the underlying file streams.
Interfaces.IFactory<T> provides a simple generic factory interface for supplying instances of type T.
Interfaces.ITransactable provides a simple transaction interface.
IpcChannel.IpcEventChannel provides a cross domain/process connectionless communication built on events.
Serialization.ISerializer<T> provides a simple interface for an object that can read and write an instance of type T to and from a stream.
Serialization.PrimitiveSerializer provides basic implementation of the ISerializer<T> interface for the primitive types.
Serialization.VariantNumberSerializer provides a protobuffer-like encoding for numeric types.
Threading.WaitAndContinueList - a work list based on WaitHandles and resulting actions so that multiple activities can be performed on a single thread.
Threading.WaitAndContinueWorker - a single worker thread that processes a WaitAndContinueList.
WorkQueue and WorkQueue<T> provide simple thread pool processing of tasks that the caller can wait for completion on.
Utils.ObjectKeepAlive - a simple object to track references to other instances to avoid garbage collection.
Utils.WeakReference<T> - a derivation of WeakReference that is type-safe.

The big advantage of using BPlusTree is that it is a custom key-value Dictionary<K,V> backed on the disk drive. You can store anything you want, as long as you can write a Serializer for it. And that's easy. Here's one I wrote to serialize a Person Class that sports an Addresses property of type IList<Address>:

public class PersonSerializer : ISerializer<Person>
       public void WriteTo(Person value, System.IO.Stream stream)
           BinaryFormatter bf = new BinaryFormatter();
           bf.Serialize(stream, value);

       public Person ReadFrom(System.IO.Stream stream)
           BinaryFormatter bf = new BinaryFormatter();
           return (Person) bf.Deserialize(stream);

Of course, you do not have to use the BinaryFormatter, which is not very lightweight.
  In order to test out CSharpTest.Net.BPlusTree, I converted the entire source solution (There's a NuGet package now) to .NET Framework 4. Then I removed everything except the BPlusTree and the Library projects, and added my own Windows Forms BPlusTreeSerialize test app.  I'll reproduce the single form's code class which really contains all of the logic:

namespace BplusTreeSerialize
    public partial class Form1 : Form
        BPlusTree<int, CSharpTest.Net.Serialization.Person>.Options options = new BPlusTree<int, CSharpTest.Net.Serialization.Person>.Options(PrimitiveSerializer.Int32, new PersonSerializer())
            CreateFile = CreatePolicy.IfNeeded,FileName = @"C:\Temp\Storage.dat"

         public Form1()

         private void button1_Click(object sender, EventArgs e)
            Stopwatch sw = new Stopwatch();
            int i = 0;
            // add 10000 persons, each with 2 addresses
            using (BPlusTree<int, CSharpTest.Net.Serialization.Person> map = new BPlusTree<int, CSharpTest.Net.Serialization.Person>(options))
                  for (i = 0; i < 10000; i++)
                   List<Address> addresses = new List<Address>();
                    Address addr = new Address(i, i.ToString() + " Raptor St.", "Raptor", "ME", "91101", "Personal");
                    Address addr2 = new Address(i, i.ToString() + " Raptor St.", "Raptor", "ME", "91101", "Business");
                    Person p = new Person(i.ToString(), "Mr", "Joe", "Blow" + i.ToString(), "joe@blow.com",
                                             "123-123-1234", addresses);
                     map.Add(i, p);
                     if (i % 1000 == 0)
                        label5.Text = i.ToString();
            var t = sw.ElapsedMilliseconds.ToString();
            label4.Text = "10,000 Items stored and Index Saved in " +t + " ms.";

         private void button2_Click(object sender, EventArgs e)
             using (BPlusTree<int, CSharpTest.Net.Serialization.Person> map = new BPlusTree<int, CSharpTest.Net.Serialization.Person>(options))
                var kvp = map.FirstOrDefault(x => x.Key == int.Parse( this.txtItem.Text));
                CSharpTest.Net.Serialization.Person p2 = kvp.Value;
               List<Person> lp = new List<Person>();
                 this.dataGridView1.DataSource = lp;
                 this.dataGridView2.DataSource = (List<Address>) p2.Addresses;

The only other additions to create this were the Person and Address classes, and the PersonSerializer that needs to be plugged into the initialization method of the tree.

Roger has many other features in his full source tree including IPC libraries; I encourage developers who are interested in NoSQL solutions to examine it carefully.

You can download the Visual Studio 2010 solution for this test app here. The solution includes the Windows Forms test app as well as a complete ASP.NET app showing how to use this arrangement for the web.

By Peter Bromberg   Popularity  (4945 Views)