Using RavenDB and ASP.NET MVC 4 to create a Twitter Clone Chirpy

Posted by: Sumit Maitra, on 5/5/2012, in Category ASP.NET MVC
Abstract: In this article, we explore the power and simplicity of RavenDB's searching and indexing. We create a sample ASP.NET MVC app called Chirpy to demonstrate how we can index data in RavenDB.

In my previous articles, I introduced RavenDB, the document database. We performed basic CRUD operations and saw how data can be persisted to RavenDB. Today we will go one step further and see how we can index data in RavenDB.

As a pre-requisite, it is a good idea to go through the introductory posts first.

Introducing ‘Chirpy’ - the Twitter Clone

Before we dive into RavenDB indexes, let’s set the context for ourselves. We all know that Twitter searches can at times be frustrating! Let us go ahead and build a clone for ourselves with powerful yet simple search capabilities. Since we need performance and scale-out capability to handle our ultra-popular app, we choose RavenDB over other available databases.

Update: Check out ChirpyHQ in the next part of this article, ‘RavenHQ and ASP.NET MVC – Moving your Data to the Cloud using AppHarbor’.

As of now, we have very basic functionality that involves sending out a ‘chirp’ message and adding hashtags to keywords. We want to create ‘trending’ topics by monitoring hashtag counts. We should also be able to search for words or phrases in old Chirps.

As a future enhancement, we want to be able to reply to a Chirp and represent ‘chirp’ replies hierarchically. Finally, we would like to search Chirps via Hashtag.

The complete source code is available here. Feel free to download and run it as you walk through the article.

The ‘Chirp’ Document Structure

With the above requirement in mind, we build our document structure as follows:

{
   "Value": "Introducing #Chirpy - the new #microblogging platform",
   "UserId": 0,
   "InReplyToId": -1,
   "ChirpReplies": null,
   "Tags": [
     "#Chirpy",
     "#microblogging"
   ]
}

This is a JSON representation of an object which is basically a set of <key,value> pairs. These are defined as follows:

  • Value: The text of the chirp as is.
  • UserId: The numeric identifier for the User who sent out the chirp. The actual User is a separate document and is reserved for future use.
  • InReplyToId: The numeric Id of the Chirp to which the current Chirp is a reply.
  • ChirpReplies: A ‘reserved’ field to be used for nesting replies to the current Chirp.
  • Tags: Array of strings representing each HashTag.

We can map the above easily to a POCO as follows:

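using System.Collections.Generic;

public class Chirp
{
    public int Id { get; set; }                   // Document identifier
    public string Value { get; set; }             // Text of the chirp
    public int UserId { get; set; }               // Sender (reserved for future use)
    public int InReplyToId { get; set; }          // Chirp that this one replies to
    public IList<int> ChirpReplies { get; set; }  // Reserved for nested replies
    public IList<string> Tags { get; set; }       // Hashtags used in the chirp
}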
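The article does not show how the Tags list is filled from the chirp text, so the following is only a minimal sketch of one way to do it, assuming a hashtag is a '#' followed by letters, digits or underscores. The HashtagParser class and its ExtractTags method are hypothetical names, not part of the Chirpy source.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls hashtags out of the chirp text.
public static class HashtagParser
{
    // Assumption: a hashtag is '#' followed by one or more word characters.
    private static readonly Regex TagPattern = new Regex(@"#\w+", RegexOptions.Compiled);

    // Returns each distinct hashtag (including the leading '#'),
    // in the order in which it first appears in the text.
    public static IList<string> ExtractTags(string chirpText)
    {
        if (string.IsNullOrEmpty(chirpText))
        {
            return new List<string>();
        }

        return TagPattern.Matches(chirpText)
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            .ToList();
    }
}
```

Calling HashtagParser.ExtractTags on the sample chirp text above would give the two entries shown in the Tags array of the JSON document.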

With the schema in place, we will create a new Database on the Raven Server, which is available on port 8080. In a dev environment, once the server is running, it should be available at http://localhost:8080/. If you get an error, navigate to <solution>\packages\RavenDB.1.0.888\server and execute Raven.Server.exe.

By default all documents go to a database called ‘Default’. We will use the RavenDB Management System to create a database called ‘Chirpy’. Alternatively, the following piece of code ensures that the Database is created on first run if it is not present already:

[Image: ensure-database-create]

Creating Indexes using Map-Reduce and LINQ

Map-Reduce has become a popular technique in Key-Value or Document databases where a query is split into two phases (functions): Map and Reduce. In the Map phase, data is separated as per the grouping field or Key field. In the Reduce phase, aggregations are done for each grouping created in the Map phase.

Different platforms support different languages for defining Map-Reduce queries. RavenDB uses LINQ.

Creating a Map-Reduce function from the RavenDB Management System

Step 1: Make sure the RavenDB server is running and navigate to the Management Console (it’s a Silverlight application).

Step 2: From the Databases drop-down, select ‘Chirpy’.

[Image: selecting-chirpy-database]

Step 3: Select the Index tab on top and then click on “Create an Index”.

[Image: navigate-to-create-an-index]

Step 4: Provide a Name, the Map LINQ query and the Reduce LINQ query.

[Image: map-reduce-tag-count]

The Name

The name works like a namespace or folder path: the first part is the collection that we want to build the index on and the second part is the name of the Index.

So Chirps is a collection of Chirp ‘documents’. Raven picks the collection name from the name of the document class that you are persisting and pluralizes it. Chirp is the name of our Document class, hence RavenDB created a collection called Chirps.

The Map Function
  • Let’s look at the Map Query in detail. In the first line we are referring to all the documents in the Chirps collection. Documents are accessed through the docs keyword, followed by the Collection Name.
  • The rest of the query is standard LINQ where we select each HashTag in each Chirp.
  • We also create a property called Count with the value 1. This way every tag has a count of 1 in the map phase. All the tags go into a ‘results’ output collection. Note that if the same tag is present in two Chirps, it appears twice in the ‘results’ collection, but both entries have the same key (tag).

At the end of the Mapping phase, we have extracted the list of all Hashtags associated with all Chirps. Let’s move on to the reduce phase.
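The Map step described above can be tried out with plain LINQ over an in-memory list. This is only a sketch that mirrors the description: it runs against ordinary objects rather than the RavenDB server, and the TagCount and TagCountMap names are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Chirp
{
    public string Value { get; set; }
    public IList<string> Tags { get; set; }
}

// Hypothetical shape of one Map output entry.
public class TagCount
{
    public string Tag { get; set; }
    public int Count { get; set; }
}

public static class TagCountMap
{
    // In-memory stand-in for the Map query: emits one { Tag, Count = 1 }
    // entry for every tag of every chirp. Chirps without tags emit nothing.
    public static List<TagCount> Map(IEnumerable<Chirp> chirps)
    {
        return (from chirp in chirps
                from tag in (chirp.Tags ?? new List<string>())
                select new TagCount { Tag = tag, Count = 1 }).ToList();
    }
}
```

Two chirps that both carry #Chirpy produce two separate #Chirpy entries here, each with Count = 1, which is exactly the duplication the Reduce phase is meant to collapse.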

The Reduce Query

The reduce query starts with the results collection and groups it by Tag. It then selects the group key (the Tag) and the sum of the Count values within that Tag group. The output is a JSON document with the Tag name and a count showing how many times the Tag is present.
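As a rough illustration of the Reduce step, here is the same grouping written as plain LINQ over in-memory Map results. It is a sketch under the assumption that each Map entry has a Tag and a Count property; TagCount and TagCountReduce are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape shared by Map output and Reduce output.
public class TagCount
{
    public string Tag { get; set; }
    public int Count { get; set; }
}

public static class TagCountReduce
{
    // In-memory stand-in for the Reduce query: groups the entries by Tag
    // and sums the Count values inside each group.
    public static List<TagCount> Reduce(IEnumerable<TagCount> results)
    {
        return (from result in results
                group result by result.Tag into g
                select new TagCount { Tag = g.Key, Count = g.Sum(x => x.Count) }).ToList();
    }
}
```

Because the output has the same shape as the input, running Reduce again over already reduced entries leaves the totals unchanged, which is what makes it possible to combine partial results.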

Under the Hood

I think it’s now clear how RavenDB does indexing, but how does it store the Indexes? Simple enough: RavenDB builds a new document type for the output of the reduce phase and saves one document instance for each result returned.

Eventual Consistency of RavenDB Indexes

RavenDB, though a Document database, supports ACID properties for database writes. However, RavenDB indexes are updated on a parallel thread and follow BASE semantics (in Document Databases, BASE stands for Basically Available, Soft-state, Eventually-consistent). Consider a scenario where we have millions of Chirps on our site and RavenDB has all of them indexed for hashtags. When ten new Chirps with new hashtags are added, RavenDB kicks off the Indexing thread to update the Index and marks the Index as stale. The Index remains marked stale until it is done indexing the new entries.

Unlike an RDBMS, RavenDB does not prevent you from querying the existing Indexes in the meantime. When you query a stale Index, Raven clearly marks the Index as stale, implying that Indexing is in progress and the records ‘may’ not reflect the latest status. Once the Indexing completes, the data is consistent again. This is why RavenDB’s Indexes are called “Eventually consistent”.
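To make the idea of a stale index concrete, here is a toy model. It is not how RavenDB is implemented: ToyTagIndex is a hypothetical class in which writes are stored immediately, the tag counts are only updated when CatchUp() is called (standing in for the background indexing thread), and queries always answer straight away together with a staleness flag.

```csharp
using System.Collections.Generic;

// Toy model only: illustrates "query now, consistent later".
public class ToyTagIndex
{
    // The "documents": every stored tag lands here synchronously.
    private readonly List<string> _storedTags = new List<string>();

    // The "index": tag counts that lag behind until CatchUp() runs.
    private readonly Dictionary<string, int> _counts = new Dictionary<string, int>();

    // How many stored tags are already reflected in the index.
    private int _indexedUpTo;

    public bool IsStale
    {
        get { return _indexedUpTo < _storedTags.Count; }
    }

    public void Store(string tag)
    {
        _storedTags.Add(tag);
    }

    // Never blocks: returns what the index currently knows and reports staleness.
    public int Query(string tag, out bool isStale)
    {
        isStale = IsStale;
        int count;
        return _counts.TryGetValue(tag, out count) ? count : 0;
    }

    // Stands in for the background indexing thread catching up with new writes.
    public void CatchUp()
    {
        for (; _indexedUpTo < _storedTags.Count; _indexedUpTo++)
        {
            string tag = _storedTags[_indexedUpTo];
            int current;
            _counts.TryGetValue(tag, out current);
            _counts[tag] = current + 1;
        }
    }
}
```

Right after Store, a Query for the new tag still returns the old count with the stale flag set; once CatchUp has run, the same Query returns the updated count and the flag is cleared.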

Using the Index in ASP.NET

So we have now defined an Index in RavenDB. We will consume it in our ASP.NET application. The new document type that is created for the Index can be represented on the server side either by creating a POCO or by using the ‘dynamic’ object. The dynamic object saves us from declaring an explicit class; however, it is ineffective for filtering the result set based on a pre-determined value of one of its properties.

The POCO class looks as follows. It maps the indexed document properties to Tag and Count.

[Image: tag-search-result-poco]

The Repository code to get the data is as follows:

[Image: get-all-chirp-repository-code]

We query the Index by the Index Name that we specified earlier, and we convert each data item returned into a Domain object, which is another POCO.
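The original POCO and repository code were published as images, so the following is only a sketch of the conversion step under assumed names. TagSearchResult (with the Tag and Count properties mentioned above), TrendingTag and TrendingTags.Top are hypothetical, and the input is an in-memory list rather than a live RavenDB query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of one index result: the Tag and its Count.
public class TagSearchResult
{
    public string Tag { get; set; }
    public int Count { get; set; }
}

// Hypothetical domain object handed to the view.
public class TrendingTag
{
    public string Name { get; set; }
    public int Mentions { get; set; }
}

public static class TrendingTags
{
    // Converts index results into domain objects, most mentioned first.
    // Ties are broken alphabetically so the order is stable.
    public static List<TrendingTag> Top(IEnumerable<TagSearchResult> indexResults, int howMany)
    {
        return indexResults
            .OrderByDescending(r => r.Count)
            .ThenBy(r => r.Tag, StringComparer.OrdinalIgnoreCase)
            .Take(howMany)
            .Select(r => new TrendingTag { Name = r.Tag, Mentions = r.Count })
            .ToList();
    }
}
```

Keeping the index result type separate from the domain type means the view never depends on the exact field names used in the index.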

The ‘Trending Tags’ page shows a list of Tags and their mention count at the moment. It looks as follows:

[Image: whats-trending]

The search results page in our ASP.NET Application looks as follows:

[Image: search-page]

These results come from the following query, which essentially uses the default index to look through all documents and filters them by checking each document’s Tags collection against the provided search query.

[Image: get-all-chirps-by-tag]
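Since the query itself was published as an image, here is a sketch of the same kind of filter written as LINQ over an in-memory list. It assumes a case-insensitive comparison of the whole tag; ChirpSearch.ByTag is a hypothetical helper and not the repository method from the Chirpy source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Chirp
{
    public string Value { get; set; }
    public IList<string> Tags { get; set; }
}

public static class ChirpSearch
{
    // Keeps the chirps whose Tags collection contains the given tag.
    // Assumption: the comparison ignores case and matches the whole tag.
    public static List<Chirp> ByTag(IEnumerable<Chirp> chirps, string tag)
    {
        return chirps
            .Where(c => c.Tags != null &&
                        c.Tags.Any(t => string.Equals(t, tag, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```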

A little later, we will see how it can be improved using a generic text Index.

Dynamic Indexes in RavenDB

We saw how we can create Indexes on RavenDB and consume them from ASP.NET MVC. However, there may be scenarios where we want to run searches for which no Index is defined. How does RavenDB handle such searches? The answer is Dynamic Indexes. RavenDB looks at an incoming query and checks if there is an Index for it; if not, it creates a dynamic Index and returns the result. It also keeps the dynamic Index around so that it does not incur the hit of rebuilding the Index every time. RavenDB intelligently improves on the Index as queries are fired against it.

Searching all Chirps using Dynamic Search

Searching from the Management Console

- Start the Management Console, click on the Index tab and select Dynamic Query.

[Image: navigate-dynamic-query]

- Click on ‘All Docs’ and select Chirps to choose the collection we want to search.

[Image: dynamic-search-select-collection]

- In the query window, provide the query Tags:#Chirp and click Execute.

[Image: dynamic-tag-search]

As we can see, Chirps with the Tag #chirp (case insensitive) are shown in the results.

Dynamic Search != Full Text Search

This dynamic search is not a full text search. If we select the field Value and search for ‘chirp’, it will not return any result even though there are Chirps with that text in them.

[Image: failed-dynamic-text-search]

You can do a wildcard search like Value:This* to get the following result. However, doing Value:*This* is not advisable, because a leading wildcard cannot use the index efficiently and results in a full scan (similar to full table scans in RDBMS systems).

[Image: wildcard-search]
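The difference between the two wildcard forms can be illustrated with a sorted list of terms. This is a simplified sketch, not RavenDB or Lucene code: TermLookup is a hypothetical class, and it assumes the terms are distinct and sorted with ordinal comparison. A prefix lookup can jump to the right place with a binary search, while a leading wildcard has to look at every term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermLookup
{
    // Prefix search (like This*): binary search finds the first candidate,
    // then we read forward only while terms still start with the prefix.
    // Assumes sortedTerms is distinct and sorted with StringComparer.Ordinal.
    public static List<string> StartsWith(string[] sortedTerms, string prefix)
    {
        int index = Array.BinarySearch(sortedTerms, prefix, StringComparer.Ordinal);
        if (index < 0)
        {
            index = ~index; // insertion point: first term greater than the prefix
        }

        var matches = new List<string>();
        for (int i = index;
             i < sortedTerms.Length && sortedTerms[i].StartsWith(prefix, StringComparison.Ordinal);
             i++)
        {
            matches.Add(sortedTerms[i]);
        }
        return matches;
    }

    // Leading wildcard (like *This*): ordering does not help,
    // so every single term has to be examined.
    public static List<string> Contains(string[] sortedTerms, string fragment)
    {
        return sortedTerms
            .Where(t => t.IndexOf(fragment, StringComparison.Ordinal) >= 0)
            .ToList();
    }
}
```

With a handful of terms the difference is invisible, but the prefix lookup touches only the matching range while the contains lookup grows with the total number of terms.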

 

Text Search without Full Table Scan

To do text searches over certain fields in a RavenDB document store, we define an Index as follows:

[Image: defining-text-search]

This defines a new Index with a single field called Query, which is an array holding the values of the chirp.Value and chirp.Tags fields. The fact that Tags is a collection is not a problem: RavenDB boxes the values into a two-dimensional array and searches them too.
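To show what a combined text field buys us, here is a toy inverted index built in memory. It is a simplification and not how RavenDB or Lucene tokenizes text: ToyTextIndex is a hypothetical class that splits Value on non-word characters, strips the '#' from tags, and ignores case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Chirp
{
    public int Id { get; set; }
    public string Value { get; set; }
    public IList<string> Tags { get; set; }
}

// Toy inverted index: maps each term to the Ids of the chirps containing it.
public class ToyTextIndex
{
    private readonly Dictionary<string, HashSet<int>> _terms =
        new Dictionary<string, HashSet<int>>(StringComparer.OrdinalIgnoreCase);

    public void Add(Chirp chirp)
    {
        // The combined "Query" field: words from Value plus every tag.
        var words = Regex.Split(chirp.Value ?? string.Empty, @"\W+");
        var tags = (chirp.Tags ?? new List<string>()).Select(t => t.TrimStart('#'));

        foreach (var term in words.Concat(tags).Where(t => t.Length > 0))
        {
            HashSet<int> ids;
            if (!_terms.TryGetValue(term, out ids))
            {
                ids = new HashSet<int>();
                _terms[term] = ids;
            }
            ids.Add(chirp.Id);
        }
    }

    // Whole-word lookup: returns the Ids of matching chirps in ascending order.
    public List<int> Search(string term)
    {
        HashSet<int> ids;
        return _terms.TryGetValue(term.TrimStart('#'), out ids)
            ? ids.OrderBy(id => id).ToList()
            : new List<int>();
    }
}
```

A search for ravendb against this toy index finds chirps that mention the word in their text as well as chirps that only carry it as a tag, without scanning every chirp at query time.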

Once the Index is ready, we add a new Chirp as highlighted below:

[Image: chirpy-chirp-index]

Once this Chirp is in, we can do a ‘text’ search for the word ‘fulltext’ as follows:

[Image: text-search-value]

Similarly, a search for ravendb will return Chirps that have RavenDB as a tag as well as Chirps that have it in their text.

[Image: text-search-tag-and-values]

 

Fuzzy Searches and Lucene Indexes

RavenDB internally makes heavy use of the Lucene Search engine. Lucene is a powerful text search engine ported to .NET from the original Apache Java project.

Since RavenDB internally uses Lucene for text searches, all of Lucene’s fuzzy search logic is automatically included in RavenDB. For example, in the above text search, suppose we searched for fultext instead of fulltext with the query

Query:fultext

This would not return any results because the word fultext is not in any of the Chirps. However, if we wanted to tone the accuracy down, we could use Lucene syntax and issue the search query

Query:fultext~0.8

We would get the following result:

[Image: lucene-fuzzy-search]

By adding the expression ‘~0.8’ we are tuning down the required accuracy: a term without the ‘~’ suffix has to match a whole word exactly, while the number after ‘~’ sets how similar a word must be to count as a match (values closer to 1 are stricter). This is a straight Lucene feature that becomes available to us. Feel free to investigate more Lucene functions that can be used with RavenDB.
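To get a feel for why fultext matches fulltext at 0.8, here is a sketch of an edit-distance based similarity. The formula used (one minus the edit distance divided by the length of the shorter word) is an assumption about how fuzzy matching of this kind is typically scored and is not taken from the RavenDB or Lucene source; FuzzyMatch is a hypothetical class.

```csharp
using System;

public static class FuzzyMatch
{
    // Classic Levenshtein distance: the number of single-character
    // insertions, deletions or substitutions needed to turn a into b.
    public static int EditDistance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                current[j] = Math.Min(substitution, Math.Min(previous[j] + 1, current[j - 1] + 1));
            }

            // The row just computed becomes the "previous" row for the next pass.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // Assumed scoring: 1.0 means identical, lower values mean more edits.
    public static double Similarity(string a, string b)
    {
        int shorter = Math.Min(a.Length, b.Length);
        if (shorter == 0)
        {
            return a.Length == b.Length ? 1.0 : 0.0;
        }
        return 1.0 - (double)EditDistance(a, b) / shorter;
    }

    public static bool IsMatch(string query, string term, double minimumSimilarity)
    {
        return Similarity(query, term) >= minimumSimilarity;
    }
}
```

Under this scoring, fultext and fulltext are one edit apart and the shorter word has seven letters, so the similarity is 1 - 1/7, roughly 0.86. That clears a threshold of 0.8 but would not clear 0.9.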

Going Forward

As much as I would love to go on about the power and simplicity of RavenDB, we bring this post to a close. We saw how powerful and simple RavenDB’s searching and Indexing are. The source code is available for you to play around with.

We will look at more features of RavenDB in the near future. Hope you enjoyed the article. Do let us know if you have questions, suggestions or feedback.

Update: Thanks to Richard Fawcett (https://github.com/yeurch), Chirpy is now available with an MVC3 Web Project as well as the original MVC4 Beta Web Project. They are two different solutions; you can download the entire code from https://github.com/dotnetcurry/Chirpy/zipball/master

Sumit is a .NET consultant and has been working on Microsoft Technologies since his college days. He edits, he codes and he manages content when at work. C# is his first love, but he is often seen flirting with Java and Objective C. You can follow him on twitter at @sumitkm or email him at sumitkm [at] gmail




User Feedback
Comment posted by Justin on Saturday, May 5, 2012 2:28 PM
Your sample code uses the repository pattern and it is recommended to not use the repository pattern with raven. See this post which explains why: http://novuscraft.com/blog/ravendb-and-the-repository-pattern
Comment posted by Justin on Saturday, May 5, 2012 2:37 PM
oops sorry the entire link was not pasted in, here it is: RavenDB and the repository pattern (http://novuscraft.com/blog/ravendb-and-the-repository-pattern).
Comment posted by Sumit on Saturday, May 5, 2012 5:21 PM
Hello Justin,

Thanks for the link. Yes, I am aware of the other pattern that Ayende also prefers. But as I type this I am working on the next version of Chirpy and the Repository pattern just saved me some refactoring.

Technically, the concern with the Repository pattern is that it's an extra layer to write, but any large scale system will eventually need at least one intermediate DTO. No harm in getting started with one.
The other concern about DocumentSession is valid. I'll update the code to reuse sessions better.

Design patterns are like opinions, as some wise man said - "have strong opinions, that are weakly held" :-).

Cheers,
Sumit.
P.S. This article I wrote earlier describes a 'repository-less' pattern http://www.dotnetcurry.com/ShowArticle.aspx?ID=787
Comment posted by angel on Sunday, May 6, 2012 11:37 PM
why I can't open it in VS2010?...the asp project doesn't open :(
Comment posted by Pierrick on Monday, May 7, 2012 1:38 AM
Really good article! I will surely use it to get started with RavenDB.
Concerning the repository layer I reckon it's fine not to have one in any demo or simple project as it wastes a lot of time. But as you said any large project will need to have that level of abstraction, it's just cleaner when the codebase gets bigger.
Thanks again
Comment posted by TheCodeJunkie on Monday, May 7, 2012 3:22 AM
'Chripy' has been used by these guys http://chirpy.codeplex.com/ as a project name for quite some time now. Suggest changing it to avoid confusion :)
Comment posted by Suprotim Agarwal on Monday, May 7, 2012 3:31 AM
Thanks @TheCodeJunkie for your suggestion. I guess for now we will call it The Twitter Clone Chirpy. For the next version, we will be more specific!
Comment posted by Sumit on Monday, May 7, 2012 4:21 AM
@Angel : The project uses the ASP.NET MVC 4 Beta. That could be a reason. Any other error messages that you can share will help.

@Pierrick : Glad you enjoyed it.

@TheCodeJunkie : Oh my, they have an awesome mascot too :), we definitely don't want to create confusion with that project.
Comment posted by martonx on Thursday, May 10, 2012 5:50 AM
Good articele. Could you tell us about different NoSQL DB engines. For example mongoDB vs ravenDB vs anything else.
Why did you choose ravenDB?
Comment posted by mikko on Thursday, May 10, 2012 1:56 PM
Good stuff and brilliantly written
Comment posted by Miss Mir on Friday, May 11, 2012 12:24 AM
i liked it.. but not understood it really welll..m new to .net framework> hopes that in future it will help me
Comment posted by swayaminfotech on Monday, May 14, 2012 7:49 AM
thanks posting....
Comment posted by Sumit on Monday, May 14, 2012 1:39 PM
@martonx now that you have requested it, we will definitely put it on our list to do a MongoDB article. RavenDB was not 'selected' as such; we have been following its growth and it's an exciting platform that's built on .NET from the ground up. It is maturing well, and as true blue .NET enthusiasts we enjoy a nice document db platform that is designed for horizontal scale out in .NET.
@mikko @swayaminfotech glad you enjoyed it.
@Miss Mir you can try some of the articles I wrote earlier introducing RavenDB as well as the Dependency Injection articles.
