March 8, 2012

Reference lists in RavenDB

When using a relational database such as SQL Server it is very common to store reference lists (e.g. a list of Countries) in their own tables. Then, when we need to reference a Country from an entity, we would just hold a reference to its identifier.

In fact, we could even model these lists in our domain and use our fancy ORM to automatically wire up the association. For example:

public class Vacancy
{
    public int Id { get; set; }
    public string Title { get; set; }
    public Industry Industry { get; set; }
}

public class Industry
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string ResourceLink { get; set; }
}

We could then load a Vacancy object and navigate directly to its industry:

var vacancy = repo.Get<Vacancy>(21);
var resourceLink = vacancy.Industry.ResourceLink;

In the database, there would be a one-to-many relationship from the Industry table to the Vacancy table.

Enter RavenDB

When I first started working with RavenDB (a document database), the lack of relations was actually a benefit. It encourages you to follow DDD (Domain-Driven Design) practices and removes much of the friction encountered when trying to persist complex domain objects to a relational store. Value objects now sit where they should be, in the same document as their root aggregate.

It is possible to reference other documents. This makes sense when you need to reference other aggregates. Take the typical customer / order example - both of these entities are aggregates and should be separate documents. In RavenDB we can just store a “CustomerId” reference within the Order document.

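In fact, if we want to load the customer’s details along with the order in a single round trip, we can do the following:

// Fetch the order and ask the server to send the referenced customer with it
var order = 
    session.Include<Order>(o => o.CustomerId).Load(123);
// Served from the session, so no second request is made
var customer =
    session.Load<Customer>(order.CustomerId);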

Whilst I’m happy with the above approach for referencing other aggregates, I didn’t like it for reference lists.

A reference list item such as a Country means nothing by itself. The logical solution then is to just store the reference list item within the entity document:

{
    Name: "Ben Foster",
    Country: { Name: "France" }
}

But what if the name of a Country changes?

Get over it!

This was actually the first thing that crossed my mind when doing the above. But seriously, how often is a country name, or a state name, going to change? If we were working with an ever-changing set of data, we wouldn’t call them reference lists.

When you start using a document database you may need to make a few compromises. In this case, I would rather not have to make an additional call to load every piece of reference data on an entity. The cost is that if the data changes in our reference list we haven’t got any kind of referential integrity - but then we could quite easily use a listener for that sort of thing.

Complex reference data

In some cases you may have complex reference data that is subject to change. In these cases, create a “Reference” class that you store within your entity documents that contains the most commonly used information, along with an identifier that allows you to navigate to the full reference data if you need it. Going back to the Vacancy/Industry example at the beginning of this post:

public class Vacancy
{
    public string Id { get; set; }
    public string Title { get; set; }
    public IndustryReference Industry { get; set; }
}

public class IndustryReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Industry
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string ResourceLink { get; set; }
    // other properties
}

Of course, you could also create an index that includes the additional Industry fields - the solution you go with will largely depend on how often you need access to this information.
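As a rough sketch of how the embedded reference might be populated, here is one way to build an IndustryReference from a full Industry when assigning it to a Vacancy. The ToReference method is a hypothetical helper of my own (not part of RavenDB), and the id and link values are placeholders.

```csharp
using System;

public class Industry
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string ResourceLink { get; set; }

    // Hypothetical helper: copy only the commonly used fields
    public IndustryReference ToReference()
    {
        return new IndustryReference { Id = Id, Name = Name };
    }
}

public class IndustryReference
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Vacancy
{
    public string Id { get; set; }
    public string Title { get; set; }
    public IndustryReference Industry { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var industry = new Industry
        {
            Id = "industries/1",                         // placeholder id
            Name = "Software",
            ResourceLink = "http://example.com/software" // placeholder link
        };

        var vacancy = new Vacancy
        {
            Title = "Developer",
            Industry = industry.ToReference()
        };

        // The name travels with the vacancy, so no second document is needed
        Console.WriteLine(vacancy.Industry.Name);

        // The id is what we would pass to session.Load<Industry>(...)
        // when the full reference data (e.g. ResourceLink) is required
        Console.WriteLine(vacancy.Industry.Id);
    }
}
```

Keeping the copy in one place means there is only one spot to change if we later decide to denormalize another field.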

Storing reference data

Ayende did a post on modelling reference data in RavenDB. Often reference data is little more than a list of string values used to populate a select list. To have a separate document for each value is overkill. In Ayende’s post he stores the list of States within a single document.

I like this approach, so I created a generic way to handle reference lists:

public class ReferenceList<T> where T : ReferenceListItem
{
    protected ReferenceList() { } // for RavenDB's benefit

    public ReferenceList(string id)
    {
        if (string.IsNullOrEmpty(id))
            throw new ArgumentNullException("id");

        Id = "referencelists/" + id;
        Items = new List<T>();
    }
    
    public string Id { get; private set; }
    public ICollection<T> Items { get; private set; }

    public void AddItem(T item)
    {
        if (item == null)
            throw new ArgumentNullException("item");

        Items.Add(item);
    }
}

public abstract class ReferenceListItem
{
    public string Id { get; set; }

    public ReferenceListItem(string id)
    {
        Id = id;
    }
}

public class Country : ReferenceListItem
{
    public string Name { get; set; }
    public string IsoCode { get; set; }
    public StateProvince[] States { get; set; }

    public Country(string name, string isoCode) : base(isoCode)
    {
        this.Name = name;
        this.IsoCode = isoCode;
    }

    public class StateProvince
    {
        public string Name { get; set; }
        public string Abbreviation { get; set; }
    }
}

We can then create a new reference list like so:

var countryList = new ReferenceList<Country>("countries");
countryList.AddItem(
    new Country("United States", "840")
    {
        States = new[] { 
            new Country.StateProvince { Name = "Maryland", Abbreviation = "MD" },
            new Country.StateProvince { Name = "Massachusetts", Abbreviation = "MA" }
        }
    }
);

This will create the following document:

{
  "Items": [
    {
      "Name": "United States",
      "IsoCode": "840",
      "States": [
        {
          "Name": "Maryland",
          "Abbreviation": "MD"
        },
        {
          "Name": "Massachusetts",
          "Abbreviation": "MA"
        }
      ],
      "Id": "840"
    }
  ]
}

The only problem is that RavenDB will create a document collection named “ReferenceListOfCountries” (the default naming convention for generic types).

We can override this with the following convention:

documentStore.Conventions.FindTypeTagName = type =>
    IsReferenceList(type)
        ? "ReferenceLists"
        : DocumentConvention.DefaultTypeTagName(type);

// True for any closed ReferenceList<T>, e.g. ReferenceList<Country>
private static bool IsReferenceList(Type type)
{
    return type.IsGenericType &&
        type.GetGenericTypeDefinition() == typeof(ReferenceList<>);
}

Now when we save the Country list it will be added to the “ReferenceLists” document collection.
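If you want to sanity-check the type test without a running server, something like the following should do. It is only a sketch: the cut-down classes and the TagNames.Find method are stand-ins of my own, and type.Name is used in place of RavenDB's default tag name so the example has no dependency on the client library.

```csharp
using System;
using System.Collections.Generic;

public abstract class ReferenceListItem
{
    public string Id { get; set; }
}

public class Country : ReferenceListItem
{
    public string Name { get; set; }
}

public class ReferenceList<T> where T : ReferenceListItem
{
    public List<T> Items = new List<T>();
}

public static class TagNames
{
    // Same check as the convention above
    public static bool IsReferenceList(Type type)
    {
        return type.IsGenericType &&
            type.GetGenericTypeDefinition() == typeof(ReferenceList<>);
    }

    // Stand-in for the convention: type.Name replaces RavenDB's default here
    public static string Find(Type type)
    {
        return IsReferenceList(type) ? "ReferenceLists" : type.Name;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TagNames.Find(typeof(ReferenceList<Country>)));   // ReferenceLists
        Console.WriteLine(TagNames.Find(typeof(Country)));                  // Country
        Console.WriteLine(TagNames.IsReferenceList(typeof(List<Country>))); // False
    }
}
```

One thing to be aware of: the check compares the generic type definition directly, so a non-generic class that inherits from ReferenceList<Country> would not match and would fall back to the default naming.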

We can now view our full country reference list with a single GET:

http://localhost:8080/databases/ravendbtests/docs/referencelists/countries

Retrieving the list using the .NET client is equally trivial:

IEnumerable<Country> countryList = 
    Session.Load<ReferenceList<Country>>("referencelists/countries")
    .Items;

We could even wrap this into an extension method:

public static class Extensions
{
    public static IEnumerable<T> LoadReferenceList<T>
        (this IDocumentSession session, string referenceListName) where T : ReferenceListItem
    {
        var refList = session.Load<ReferenceList<T>>("referencelists/" + referenceListName);
        return refList != null ? refList.Items : Enumerable.Empty<T>(); 
    }
}

Usage:

var countryList = Session.LoadReferenceList<Country>("countries");

I’ll be putting this concept into production in the coming weeks, so I will no doubt post again about how to handle the situation when reference list data does change.

© 2022 Ben Foster