The case for document-oriented databases

Last week, I started working with AngularJS for a real-world project - I had played with Angular before, mainly to see how it differs from KnockoutJS which I had been using intensively before. This post is not about AngularJS itself (I may write one later on when I have a better perspective about the framework), I’ll just say that my first impressions about it are:

it lacks good documentation - the existing documentation doesn’t exactly have the right “feel”, and is incomplete
when coming from Knockout, Angular feels rather invasive, given that it is a “full-blown” framework rather than “just” a MVVM mapping tool
my current impression is that the learning curve looks like a roller-coaster - there’s moments of intense productivity interrupted with high amounts of “not getting anywhere, why on earth am I using this sh*t”

That being said, there’s one thing that AngularJS seems to get really right: it makes it possible to map documents to the UI without imposing its own design philosophy on the said document. With KnockoutJS for example, you’d have to make sure certain fields were provided as arrays, and generally had to know really well how the different bindings worked before knowing into what shape to bring your data so that the UI would do what you wanted.

Let’s say for example that you want to make it possible to add advertisements about wine on an application (who wouldn’t). To spice things up a bit, let’s say that the content should be multilingual. We’d have a model looking like this:

{
    "translations": {
        "fr": {
            "title": "Côtes de Provence rosé",
            "description": "Robe claire et limpide, le nez en accords avec la bouche, dévoile des  notes originales et plaisantes de fraise écrasée et de fruits rouges."
        },
        "en" {
            "title": "Rosé wine Côtes de Provence",
            "description": "..."
        }
    },
    "url": "http://www.cotes-de-provence-domaine-grand-rouviere.com/c%C3%B4tes-de-provence-ros%C3%A9/"
}

So our document mainly consists of a link and to multilingual content (in reality we may want to add more things to it, such as special characteristics, the price, etc., but let’s keep things simple for the sake of the example).

It is possible to write up the user-interface for the above model like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23



<div class="row">
 <h3>Wines</h3>

 <div>
  <div control-group label="Language">
   <select id="languageCode" name="languageCode" ng-model="$parent.currentLang"
           ng-options="l.value as l.label for l in languages"></select>
  </div>
  <div control-group label="Title">
   <input type="text" id="title" name="title" ng-model="wine.translations[$parent.currentLang].title" required>
   <span validation-error-for="required">Required</span>
  </div>
  <div control-group label="Description">
   <textarea id="description" name="description" class="field span6" rows="6"
             ng-model="wine.translations[$parent.currentLang].description"></textarea>
  </div>
  <div control-group label="Link">
   <input type="text" id="site" name="site" ng-model="wine.url" required>
  </div>

 </div>
</div>

Some comments about this code:

the control-group and validation-error-for directives are custom directives to render boilerplate Twitter Bootstrap) markup, inspired by this article
the ng-options directive fetches the languages and displays them appropriately in the language drop-down. The languages are made available in the root scope of the application and have as value [{value: "en", label: "English"}, {value: "fr", label: "French"]

What I find really interesting in comparison to KnockoutJS is the possibility to bind values against a map: using the binding expression wine.translations[$parent.currentLang], we can dynamically navigate the document hierarchy without having to add any special lookup code. Of course, when e.g. creating a new document, we need to provide a scaffold for those documents, where translation would be an empty object, but this is a small price to pay in comparison to the flexibility this provides in regards to data design.

So what does this all have to do with document-oriented databases? Well, let’s see what we need to represent the above object in a relational database, using e.g. Slick. Given that we have multi-lingual content, we have several options as to how to model it, let’s go for an approach where each entity requiring multi-lingual content gets its own translation table. Using lifted embedding, we now get the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27


case class Wine(
  id: Option[Long],
  url: String)

case class WineTranslation(
    id: Option[Long],
    wineId: Long,
    title: String,
    description: String,
    languageCode: String)

object Wines extends Table[Wine]("WINES") {
    def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
    def url = column[String]("URL")

    def * =  id.? ~ url <> (Wine, Wine.unapply _)
}

object WineTranslations extends Table[WineTranslation]("WINE_TRANSLATIONS") {
    def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
    def wineId = column[Long]("WINE_ID")
    def title = column[String]("TITLE")
    def description = column[String]("DESCRIPTION")
    def languageCode = column[String]("LANGUAGE_CODE")

    def * =  id.? ~ wineId ~ title ~ description ~ languageCode <> (WineTranslation, WineTranslation.unapply _)
}

Next, let’s see how to turn this into the document we want to display. For this purpose, we need to create a model on the server-side to represent the document to which to map:

1
2
3
4
5
6
7
8
9


case class WineView(
  url: String,
  translations: Map[String, WineTranslationView]
)

case class WineTranslationView(
  title: String,
  description: String
)

Now, let’s write the code to get back a full representation of a Wine, including the translations, added to the Wines table object defined above:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13


def findOne(id: Long): Option[(Wine, Seq[WineTranslation]] = DB.withSession { implicit session =>
  val query = for {
    w <- Wines if w.id === id
    t <- WineTranslations if t.wineId === w.id
  } yield (w, t)

  query.
    list(). // fetch results
    groupBy(_._1). // group by wine
    map { case (wine, group) =>
      (wine, group.map(_._2)) // turn this into a tuple of (wine, translations)
    }.headOption
}

Great, now we’ve got our data, let’s give that to the client (assuming we use the Play framework):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


def wine(id: Long) = Action { implicit request =>
  Wines.findOne(id).map { case (wine, translations) =>
    val view = WineView(
      url = wine.url,
      translations = translations.map(t => (t.languageCode, WineTranslationView(t.title, t.description)))
    )
    Ok(Json.toJson(view))
  } getOrElse {
    NotFound
  }
}

If we want to create or update a new wine entry, we need to do the mapping the other way around:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


def update(id: Long) = Action { implicit request =>
  Wines.findOne(id).map { case (wine, translations) =>
    request.body.asJson.flatMap { body =>
      Json.fromJson[WineView](body).asOpt.map { updated =>
        WineTranslations.deleteWineTranslations(id)
        updated.translations.map { case (lang, translationView) =>
          val t = WineTranslation(title = translationView.title, description = translationView.description, languageCode = lang)
          WineTranslations.insert(t)
        }
        val updatedWine = wine.copy(url = updated.url)
        Wines.update(id, updatedWine)
        Ok
      }
    } getOrElse {
      BadRequest
    }
  } getOrElse {
    NotFound
  }
}

I think that by now you may guess what I’m getting at: these mapping gymnastics to go from a hierarchical document to a relational database are extremely verbose. Let’s see in contrast how the same would look like by putting things into e.g. MongoDB using the Salat case class mapper.

First, the data definition:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12


case class Wine(
  id: Option[ObjectId],
  url: String,
  translations: Map[String, WineTranslation]
)

case class WineTranslation(
  title: String,
  description: String
)

object Wines extends SalatDAO[Wine, ObjectId](wineCollection)

Next, the controller code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20


def wine(id: ObjectId) = Action { implicit request =>
  Wines.findOneById(id).map { wine =>
    Ok(Json.toJson(wine))
  }
}

def update(id: ObjectId) = Action { implicit request =>
  Wines.findOneById(id).map { existing =>
    request.body.asJson.flatMap { body =>
      Json.fromJson[Wine](body).asOpt.map { updated =>
        Wines.save(updated)
        Ok
      }
    } getOrElse {
      BadRequest
    }
  } getOrElse {
    NotFound
  }
}

Much simpler, isn’t it?

I think what this boils down to is the following: AngularJS and other modern client-side frameworks are built around the fundamental concept of manipulating and working with hierarchical documents. Relational databases, on the other hand, aren’t all that great at representing hierarchies - those need to be turned into relations, which are fundamentally different things. Of course, there is a way to get from one paradigm to the other, however, as we’ve seen, it is up to the developer to do the mapping from one to the other. In the Java space, this mapping has traditionally been done using ORMs such as Hibernate, however, these tools bring another set of problems with them, as soon as the data model gets bigger.

I’m pretty sure that I’m doing some things the wrong way in regards to Slick; the query above can probably be simplified somewhat. Yet the issue remains that overall, representing a document in a relational database does require a non-trivial amount of mapping effort.

On the other hand, relational databases are (still to this day) percevied as safe and easy to operate, and the hype that came with the NoSQL movement has essentially focused on how fast and scalable those technologies are. And yet, I think that at the end of the day, there’s only a few players that really require the speed and scalability that’s been hyped so much, and a lot more that could benefit from using document stores for their “document” aspect.

Contents