In my previous articles, I introduced RavenDB, the document database. We performed basic CRUD operations and saw how data can be persisted to RavenDB. Today we will go one step further and see how we can index data in RavenDB.
As a pre-requisite, it is a good idea to go through the introductory posts first.
Introducing ‘Chirpy’ - the Twitter Clone
Before we dive into RavenDB indexes, let’s set the context for ourselves. We all know that Twitter searches can at times be frustrating! Let us go ahead and build a clone for ourselves with powerful yet simple search capabilities. Since we need performance and scale-out capability to handle our ultra-popular app, we choose RavenDB over other available databases.
Update: Check out ChirpyHQ in the next part of this article: RavenHQ and ASP.NET MVC – Moving your Data to the Cloud using AppHarbor
As of now, we have very basic functionality that involves sending out a ‘chirp’ message and adding hashtags to keywords. We want to show ‘trending’ topics by monitoring the current hashtag counts. We should also be able to search for words or phrases in old Chirps.
For future enhancement we want to be able to Reply to a chirp and represent ‘chirp’ replies hierarchically. Finally we would like to search Chirps via Hashtag.
The complete source code is available here. Feel free to download and run it as you walk through the article.
The ‘Chirp’ Document Structure
With the above requirement in mind, we build our document structure as follows:
{
    "Value": "Introducing #Chirpy - the new #microblogging platform",
    "UserId": 0,
    "InReplyToId": -1,
    "ChirpReplies": null,
    "Tags": [
        "#Chirpy",
        "#microblogging"
    ]
}
This is a JSON representation of an object which is basically a set of <key,value> pairs. These are defined as follows:
- Value: The text of the chirp as is.
- UserId: The numeric identifier for the User who sent out the chirp. The actual User is a separate document and is reserved for future use.
- InReplyToId: The numeric Id of the Chirp to which the current Chirp is a reply
- ChirpReplies: A ‘reserved’ field to be used for nesting replies to the current Chirp
- Tags: Array of strings representing each HashTag
We can map the above easily to a POCO as follows:
public class Chirp
{
    public int Id { get; set; }
    public string Value { get; set; }
    public int UserId { get; set; }
    public int InReplyToId { get; set; }
    public IList<int> ChirpReplies { get; set; }
    public IList<string> Tags { get; set; }
}
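To make the mapping concrete, here is a minimal sketch of how a Chirp might be populated before it is stored. The ChirpFactory helper and its regular expression are illustrative assumptions and are not taken from the Chirpy source; the sketch only shows that Tags can be derived from the hashtags found in Value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Chirp
{
    public int Id { get; set; }
    public string Value { get; set; }
    public int UserId { get; set; }
    public int InReplyToId { get; set; }
    public IList<int> ChirpReplies { get; set; }
    public IList<string> Tags { get; set; }
}

// Hypothetical helper, not part of the original sample.
public static class ChirpFactory
{
    // Assumption: a hashtag is '#' followed by one or more word characters.
    private static readonly Regex HashTag = new Regex(@"#\w+");

    public static Chirp FromText(string text, int userId)
    {
        return new Chirp
        {
            Value = text,
            UserId = userId,
            InReplyToId = -1,     // mirrors the sample document above
            ChirpReplies = null,  // reserved field, unused for now
            Tags = HashTag.Matches(text)
                          .Cast<Match>()
                          .Select(m => m.Value)
                          .ToList()
        };
    }
}

public static class Program
{
    public static void Main()
    {
        var chirp = ChirpFactory.FromText(
            "Introducing #Chirpy - the new #microblogging platform", 0);

        // Prints: #Chirpy, #microblogging
        Console.WriteLine(string.Join(", ", chirp.Tags));
    }
}
```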
With the schema in place, we will create a new database on the Raven Server, which listens on port 8080. So in a dev environment, once the server is running, it should be available at http://localhost:8080/. If you get an error, navigate to <solution>\packages\RavenDB.1.0.888\server and execute Raven.Server.exe.
By default all documents go to a database called ‘Default’. We will use the RavenDB Management System to create a database called ‘Chirpy’. Alternatively, the following piece of code ensures that the database is created on first run if it is not already present:
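The original listing is not reproduced here, so what follows is only a minimal sketch of what such start-up code could look like with the RavenDB .NET client of that era. It assumes the Raven.Client assemblies are referenced and that the server runs at the URL mentioned above; the RavenStore class name is hypothetical.

```csharp
using Raven.Client;
using Raven.Client.Document;
using Raven.Client.Extensions; // brings in the EnsureDatabaseExists extension method

// Hypothetical holder for the application's document store.
public static class RavenStore
{
    public const string DatabaseName = "Chirpy";

    public static IDocumentStore Create()
    {
        var store = new DocumentStore
        {
            Url = "http://localhost:8080" // assumption: local dev server
        };
        store.Initialize();

        // Creates the 'Chirpy' database on the server if it is not there yet.
        store.DatabaseCommands.EnsureDatabaseExists(DatabaseName);
        return store;
    }

    // Sessions are opened against the named database rather than 'Default'.
    public static IDocumentSession OpenSession(IDocumentStore store)
    {
        return store.OpenSession(DatabaseName);
    }
}
```

A document store is expensive to create, so an application would normally call Create once at start-up and reuse the instance.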
Creating Indexes using Map-Reduce and LINQ
Map-Reduce has become a popular technique in Key-Value or Document databases, where a query is split into two phases (functions): Map and Reduce. In the Map phase, data is separated as per the grouping field or Key field. In the Reduce phase, aggregations are done for each grouping created in the Map phase.
Different platforms support different languages for defining Map-Reduce queries. RavenDB uses LINQ.
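The two phases can be tried out with plain LINQ to Objects before involving the database. The sketch below runs entirely in memory and does not use RavenDB; the MapReduceDemo class and the sample tag lists are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MapReduceDemo
{
    public static Dictionary<string, int> CountTags(IEnumerable<IList<string>> tagLists)
    {
        // Map: emit one (Tag, Count = 1) entry for every tag occurrence.
        var mapped = from tags in tagLists
                     from tag in tags
                     select new { Tag = tag, Count = 1 };

        // Reduce: group the mapped entries by Tag and sum the counts per group.
        var reduced = from entry in mapped
                      group entry by entry.Tag into g
                      select new { Tag = g.Key, Count = g.Sum(x => x.Count) };

        return reduced.ToDictionary(r => r.Tag, r => r.Count);
    }

    public static void Main()
    {
        var chirps = new List<IList<string>>
        {
            new List<string> { "#Chirpy", "#microblogging" },
            new List<string> { "#Chirpy" }
        };

        foreach (var pair in CountTags(chirps))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

With this sample data, #Chirpy ends up with a count of 2 and #microblogging with a count of 1.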
Creating a Map-Reduce function from the RavenDB Management System
Step 1: Make sure the RavenDB server is running and navigate to the Management Console (it’s a Silverlight application).
Step 2: From the Databases pull down, select ‘Chirpy’
Step 3: Select the Index tab on top and then click on “Create an Index”
Step 4: Provide a Name, the Map LINQ query and the Reduce LINQ query
The Name
The name is similar to namespaces or folders with the first part being the collection that we want to build the index on and the second part being the name of the Index.
So Chirps is a collection of Chirp ‘documents’. Raven picks the name based on the name of the document class that you are persisting and pluralizes it. Chirp is the name of our Document class hence a collection called Chirps was created by RavenDB.
The Map Function
- Let’s look at the Map Query in detail. In the first line we are referring to all the documents in the Chirps collection. Documents are accessed through the docs keyword, followed by the Collection Name.
- The rest of the query is standard LINQ, where we are selecting each HashTag in each Chirp.
- We are also creating a property called Count with the value 1. This way every tag has a count of 1 in the map phase. All the tags go into a ‘results’ output collection. Note that if the same tag is present in two Chirps, it appears twice in the ‘results’ collection, but both entries have the same key (tag).
At the end of the Mapping phase, we have extracted the list of all Hashtags associated with all Chirps. Let’s move on to the reduce phase.
The Reduce Query
The reduce query starts with the results collection and groups it by Tag. It then selects the group key (the Tag) and the sum of Count for that Tag group. The output is a JSON document with the Tag name and a count showing how many times the Tag is present.
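The screenshot with the actual queries is not available here. Going only by the description above, the text typed into the Map and Reduce boxes would look roughly like the following sketch; the property names Tag and Count come from the article, while the range variable names are arbitrary.

```csharp
// Map: one entry per hashtag per Chirp, each with Count = 1
from chirp in docs.Chirps
from tag in chirp.Tags
select new { Tag = tag, Count = 1 }

// Reduce: group the mapped entries by Tag and sum their counts
from result in results
group result by result.Tag into g
select new { Tag = g.Key, Count = g.Sum(x => x.Count) }
```

These are index definitions evaluated by the server, not a standalone C# program. Note that the Map and Reduce outputs have the same shape (Tag and Count), which lets RavenDB re-run the reduce step over partial results.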
Under the Hood
I think it’s now clear how RavenDB does indexing, but how does it store the Indexes? Simple enough: RavenDB builds a new document type for the output of the reduce phase and saves one instance for each result returned.
Eventual Consistency of RavenDB Indexes
Although RavenDB is a Document database, it supports ACID properties for database writes. RavenDB indexes, however, are updated on a parallel thread and follow BASE semantics (Basically Available, Soft-state, Eventually-consistent). Consider a scenario where we have millions of Chirps on our site and RavenDB has all of them indexed for hashtags. Now when ten new Chirps with new hashtags are added, RavenDB kicks off the Indexing thread to update the Index and marks the Index as stale. The Index remains marked stale until the new entries have been indexed.
Unlike an RDBMS, RavenDB does not prevent you from querying an Index while it is being updated. When you query a stale Index, Raven clearly marks the results as stale, implying that Indexing is in progress and the records ‘may’ not reflect the latest status. Once the Indexing completes, the data is consistent again. This is why RavenDB’s Indexes are called “Eventually consistent”.
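On the client side, staleness can be inspected per query. The sketch below assumes the RavenDB .NET client is referenced and that an index has been saved under the hypothetical name "Chirps/TagCount"; the TagCount class and the StaleAwareQueries helper are also illustrative and not from the Chirpy source.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

// Assumed shape of one reduce result (Tag and Count, as described earlier).
public class TagCount
{
    public string Tag { get; set; }
    public int Count { get; set; }
}

public static class StaleAwareQueries
{
    // Hypothetical index name; use whatever name the index was saved under.
    private const string IndexName = "Chirps/TagCount";

    // Returns whatever the index currently holds and reports whether it was stale.
    public static IList<TagCount> ReadPossiblyStale(IDocumentSession session, out bool isStale)
    {
        RavenQueryStatistics stats;
        var results = session.Query<TagCount>(IndexName)
                             .Statistics(out stats)
                             .ToList();
        isStale = stats.IsStale;
        return results;
    }

    // Asks the client to wait (up to its timeout) until the index has caught up
    // with everything written before the query was issued.
    public static IList<TagCount> ReadFresh(IDocumentSession session)
    {
        return session.Query<TagCount>(IndexName)
                      .Customize(x => x.WaitForNonStaleResultsAsOfNow())
                      .ToList();
    }
}
```

For a trending list, slightly stale results are usually acceptable, so the first method is the more natural fit; waiting for fresh results trades response time for accuracy.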
Using the Index in ASP.NET
So we have now defined an Index in RavenDB, and we will consume it in our ASP.NET application. The new document type that is created for the Index can be represented on the application side either by a POCO or by a ‘dynamic’ object. The dynamic object saves us from declaring an explicit class; however, it is ineffective for filtering the result set based on a pre-determined value of one of its properties.
The POCO class looks as follows. It maps the indexed document properties to Tag and Count.

The Repository code to get the data is as follows
We query the Index by specifying the Index Name that we had specified earlier and for all the Data Items returned we convert them into a Domain object, which is another POCO.
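The original POCO and repository listings are not reproduced here. As a minimal sketch under stated assumptions, they could look like the following: TagCount, TrendingTag, TagRepository and the index name "Chirps/TagCount" are all hypothetical names, and the code assumes the RavenDB .NET client is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

// POCO matching one entry of the index: a Tag and its Count.
public class TagCount
{
    public string Tag { get; set; }
    public int Count { get; set; }
}

// Hypothetical domain object shown on the 'Trending Tags' page.
public class TrendingTag
{
    public string Name { get; set; }
    public int Mentions { get; set; }
}

public class TagRepository
{
    private readonly IDocumentStore store;

    public TagRepository(IDocumentStore store)
    {
        this.store = store;
    }

    public IList<TrendingTag> GetTrendingTags()
    {
        using (var session = store.OpenSession("Chirpy"))
        {
            // Query the named index and materialize the results first,
            // then sort and convert to domain objects in memory.
            return session.Query<TagCount>("Chirps/TagCount")
                          .ToList()
                          .OrderByDescending(t => t.Count)
                          .Select(t => new TrendingTag { Name = t.Tag, Mentions = t.Count })
                          .ToList();
        }
    }
}
```

Keeping the index POCO separate from the domain object means the page does not depend on the exact shape of the index output.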
The ‘Trending Tags’ page shows a list of Tags and their mention count at the moment. It looks as follows:
The search results page in our ASP.NET Application looks as follows:
These results are from the following query, which essentially uses the default index to look through all documents and filters by looking into each Tag collection for the document and the provided search query.
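The query itself is not reproduced here. A minimal sketch of a tag filter written with the RavenDB LINQ provider could look like this; the ChirpSearch class is a hypothetical name, and the code assumes the RavenDB .NET client is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;

public class Chirp
{
    public int Id { get; set; }
    public string Value { get; set; }
    public int UserId { get; set; }
    public int InReplyToId { get; set; }
    public IList<int> ChirpReplies { get; set; }
    public IList<string> Tags { get; set; }
}

// Hypothetical helper, not part of the original sample.
public static class ChirpSearch
{
    // No index name is given, so the server picks or creates a suitable index.
    public static IList<Chirp> ByTag(IDocumentSession session, string tag)
    {
        return session.Query<Chirp>()
                      .Where(c => c.Tags.Any(t => t == tag))
                      .ToList();
    }
}
```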

A little later, we will see how it can be improved using a generic text Index.
Dynamic Indexes in RavenDB
We saw how we can create Indexes on RavenDB and consume them from ASP.NET MVC. However, there may be scenarios where we want to run searches for which no Index is defined. How does RavenDB handle such searches? The answer is Dynamic Indexes. RavenDB looks at an incoming query and checks whether there is an Index for it; if not, it creates a dynamic Index and returns the result. It keeps the dynamic Index around so that it does not incur the cost of rebuilding the Index every time. RavenDB intelligently improves on the Index as queries are fired against it.
Searching all Chirps using Dynamic Search
Searching from the Management Console
- Start the Management Console and Click on the Index Tab and select Dynamic Query

- Click on ‘All Docs’ and select Chirps to select the collection we want to search
- In the query window provide the query – Tags:#Chirp and click Execute
As we can see, Chirps with the Tag #chirp (case insensitive) are shown in the results.
Dynamic Search != Full Text Search
This dynamic search is not a full text search. That is, if we select the field Value and ask it to search for ‘chirp’, it will not return any result even though there are Chirps with that text in them.

You can do a wildcard search like Value:This* to get the following result. However, doing Value:*This* is not advisable because it results in ignoring the indexes and doing a full document store scan (similar to a full table scan in an RDBMS).
Text Search without Full Table Scan
We can do text searches over certain fields in a RavenDB document store by defining an Index as follows:
This defines a new Index that has a single field called Query, which is an array with values from the chirp.Value field and the chirp.Tags field. The fact that Tags is a collection is not a problem: RavenDB boxes them into a two-dimensional array and will search in them too.
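The index definition itself is not reproduced here. Based on the description above, the Map query would look roughly like this sketch (a map-only index, so there is no Reduce part); the range variable name is arbitrary.

```csharp
// Map: expose the chirp text and its tags through a single searchable field
from chirp in docs.Chirps
select new { Query = new object[] { chirp.Value, chirp.Tags } }
```

This is an index definition evaluated by the server, not a standalone program. One assumption worth stating: for word-level matches inside the chirp text, the Query field would presumably have to be marked as analyzed in the index's field settings, since a field indexed as a single term only matches on its whole value.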
Once the Index is ready, we add a new Chirp as highlighted below:
Once this Chirp is in, we can do a ‘text’ search for the word ‘fulltext’ as follows:
Similarly, a search for ravendb will return Chirps that have RavenDB as a tag as well as those that have it in the Chirp text.
Fuzzy Searches and Lucene Indexes
RavenDB internally makes heavy use of the Lucene Search engine. Lucene is a powerful text search engine ported to .NET from the original Apache Java project.
Since RavenDB internally uses Lucene for text searches, all of Lucene’s fuzzy search logic is automatically included in RavenDB. For example, in the above text search, if we wanted to search for fultext instead of fulltext, we could try the query
Query:fultext
This would not return any results because the word fultext is not there in any of the Chirps. However, if we wanted to tone the accuracy down using Lucene syntax, we could issue the search query
Query:fultext~0.8
We would get the result as follows
By adding the expression ‘~0.8’ we are tuning down the accuracy level (which is by default 1 or full word search). This is a straight Lucene feature that becomes available for us. Feel free to investigate more Lucene functions that can be used with RavenDB.
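To get a feel for why fultext matches fulltext at 0.8, the sketch below computes an edit-distance-based similarity in plain C#. It is an approximation of how classic Lucene fuzzy matching scores terms (one minus the edit distance divided by the length of the shorter term); the exact formula used by the Lucene version bundled with RavenDB is an assumption here, and the FuzzyDemo class is purely illustrative.

```csharp
using System;

public static class FuzzyDemo
{
    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions or substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }

    // Assumed scoring: 1 - distance / length of the shorter term.
    public static double Similarity(string a, string b)
    {
        int shortest = Math.Min(a.Length, b.Length);
        if (shortest == 0) return a.Length == b.Length ? 1.0 : 0.0;
        return 1.0 - (double)EditDistance(a, b) / shortest;
    }

    public static void Main()
    {
        // One missing letter over seven characters: 1 - 1/7, roughly 0.857.
        Console.WriteLine(Similarity("fultext", "fulltext"));
    }
}
```

Under this scoring, fultext is one edit away from fulltext and scores about 0.857, which is above the 0.8 threshold, whereas a term two edits away such as fultex would score about 0.667 and be rejected.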
Going Forward
As much as I would love to go on about the power and simplicity of RavenDB, we bring this post to a close. We saw how powerful and simple RavenDB’s searching and Indexing are. The source code is available for you to play around with.
We will look at more features of RavenDB in the near future. Hope you enjoyed the article. Do let us know if you have questions, suggestions or feedback.
Update: Thanks to Richard Fawcett (https://github.com/yeurch), Chirpy is now available with an MVC3 Web Project as well as the original MVC4 Beta Web Project. They are two different solutions; you can download the entire code from https://github.com/dotnetcurry/Chirpy/zipball/master
This article has been editorially reviewed by Suprotim Agarwal.
Sumit is a .NET consultant and has been working on Microsoft Technologies since his college days. He edits, he codes and he manages content when at work. C# is his first love, but he is often seen flirting with Java and Objective C. You can follow him on Twitter at @sumitkm or email him at sumitkm [at] gmail