The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented
storage context—in particular, filesystems. This is primarily because unlike
binary search trees, B+ trees have very high fanout (typically on the order of
100 or more), which reduces the number of I/O operations required to find an
element in the tree.
The NTFS, ReiserFS, NSS, XFS, and JFS filesystems all use this type of tree for metadata
indexing. Relational database management systems such as IBM DB2, Informix,Microsoft
SQL Server and others support this type of tree for table indices. Key-value
database management systems such as CouchDB support this type of tree for data
access.
Roger Knapp has a very carefully designed 100% managed code implementation that he
calls CSharpTest.Net.BPlusTree. You can check out Roger's offerings here http://csharptest.net/. After considering several offerings including redis (no 100% managed code version)
and RaptorDb, I settled on CsharpTest.Net.BplusTree. I really liked RaptorDb,
but in some tests I got data corruption. Then I read Knapp's review post
on it, which confirmed my suspicions that it's just not ready for prime time.
CSharpTest.Net.BPlusTree's real strengths are that it fully supports multithreading,
and has superior select capability - able to perform well in excess of 510,000
operations per second, and over 1,157,000 operations using 4 concurrent threads.
Most importantly, it is truly threadsafe, no matter what you throw at it, and
has plenty of configuration options.
Among other recent additions, it features:
Collections.LListNode<T> - a doubly linked list implementation that can support asynchronous iteration.
Collections.SynchronizedDictionary/SynchronizedList to support synchronization of a list/dictionary given a locking strategy from the
Synchronization namespace.
IO.ClampedStream to provide an IO stream aggregation for a subset of the provided stream.
IO.Crc32 to provide calculation of a CRC32 value from bytes or strings.
IO.FileStreamFactory - an IFactory<Stream> producer of streams for a given file.
IO.FragmentedFile - an underpinning of the B+Tree implementation that provides sub-allocations within
a single file.
IO.SharedMemoryStream - a block allocating memory stream that can be simultaneously used by multiple threads
at the same time.
IO.StreamCache - a pool of open file streams that a thread can open and close without the overhead
of actually opening or closing the underlying file streams.
Interfaces.IFactory<T> provides a simple generic factory interface for supplying instances of type T.
Interfaces.ITransactable provides a simple transaction interface.
IpcChannel.IpcEventChannel provides a cross domain/process connectionless communication built on events.
Serialization.ISerializer<T> provides a simple interface for an object that can read and write an instance of
type T to and from a stream.
Serialization.PrimitiveSerializer provides basic implementation of the ISerializer<T> interface for the primitive
types.
Serialization.VariantNumberSerializer provides a protobuffer-like encoding for numeric types.
Threading.WaitAndContinueList - a work list based on WaitHandles and resulting actions so that multiple activities
can be performed on a single thread.
Threading.WaitAndContinueWorker - a single worker thread that processes a WaitAndContinueList.
WorkQueue and WorkQueue<T> provide simple thread pool processing of tasks that the caller can wait for completion
on.
Utils.ObjectKeepAlive - a simple object to track references to other instances to avoid garbage collection.
Utils.WeakReference<T> - a derivation of WeakReference that is type-safe.
The big advantage of using BPlusTree is that it is a custom key-value Dictionary<K,V>
backed on the disk drive. You can store anything you want, as long as you can
write a Serializer for it. And that's easy. Here's one I wrote to serialize
a Person Class that sports an Addresses property of type IList<Address>:
public class PersonSerializer : ISerializer<Person>
{
public void WriteTo(Person value, System.IO.Stream stream)
{
BinaryFormatter bf = new BinaryFormatter();
bf.Serialize(stream, value);
}
public Person ReadFrom(System.IO.Stream stream)
{
BinaryFormatter bf = new BinaryFormatter();
return (Person) bf.Deserialize(stream);
}
}
Of course, you do not have to use the BinaryFormatter, which is not very lightweight.
In order to test out CSharpTest.Net.BPlusTree, I converted the entire source solution
(There's a NuGet package now) to .NET Framework 4. Then I removed everything
except the BPlusTree and the Library projects, and added my own Windows Forms
BPlusTreeSerialize test app. I'll reproduce the single form's code class
which really contains all of the logic:
namespace BplusTreeSerialize
{
public partial class Form1 : Form
{
BPlusTree<int, CSharpTest.Net.Serialization.Person>.Options options
= new BPlusTree<int, CSharpTest.Net.Serialization.Person>.Options(PrimitiveSerializer.Int32,
new PersonSerializer())
{
CreateFile = CreatePolicy.IfNeeded,FileName = @"C:\Temp\Storage.dat"
};
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
Stopwatch sw = new Stopwatch();
int i = 0;
sw.Start();
// add 10000 persons, each with 2 addresses
using (BPlusTree<int, CSharpTest.Net.Serialization.Person> map = new BPlusTree<int, CSharpTest.Net.Serialization.Person>(options))
{
for (i = 0; i < 10000; i++)
{
List<Address> addresses = new List<Address>();
Address addr = new Address(i, i.ToString() + " Raptor St.", "Raptor", "ME", "91101", "Personal");
Address addr2 = new Address(i, i.ToString() + " Raptor St.", "Raptor", "ME", "91101", "Business");
addresses.Add(addr);
addresses.Add(addr2);
Person p = new Person(i.ToString(), "Mr", "Joe", "Blow" + i.ToString(), "joe@blow.com",
"123-123-1234", addresses);
map.Add(i, p);
if (i % 1000 == 0)
{
label5.Text = i.ToString();
Application.DoEvents();
}
}
}
sw.Stop();
var t = sw.ElapsedMilliseconds.ToString();
label4.Text = "10,000 Items stored and Index Saved in " +t + " ms.";
}
private void button2_Click(object sender, EventArgs e)
{
using (BPlusTree<int, CSharpTest.Net.Serialization.Person> map = new BPlusTree<int, CSharpTest.Net.Serialization.Person>(options))
{
var kvp = map.FirstOrDefault(x => x.Key == int.Parse( this.txtItem.Text));
CSharpTest.Net.Serialization.Person p2 = kvp.Value;
List<Person> lp = new List<Person>();
lp.Add(p2);
this.dataGridView1.DataSource = lp;
this.dataGridView2.DataSource = (List<Address>) p2.Addresses;
}
}
}
}
The only other additions to create this were the Person and Address classes, and
the PersonSerializer that needs to be plugged into the initialization method
of the tree.
Roger has many other features in his full source tree including IPC libraries; I
encourage developers who are interested in NoSQL solutions to examine it carefully.
You can download the Visual Studio 2010 solution for this test app here. The solution includes the Windows Forms test app as well as a complete ASP.NET
app showing how to use this arrangement for the web.