Do we need the database at all? – Event Sourcing!

In my previous blog post I reasoned about how the database could be hidden behind an ORM tool, so that logic could be written with lambdas (in Java or Scala) instead of SQL. The ORM tool transforms the data model (residing in the database) into an object model (consisting of Java/Scala objects).
I pinpointed some big advantages of this, but I also mentioned an even bigger advantage of the object model that I will address in this post:
Event Sourcing. Let me explain what it is with a simple example.

Example: The dry cleaner

The dry cleaner is a place where you can leave your clothes to get them cleaned. In return for your clothes you receive a receipt. When the clothes are cleaned, you can get them back by showing the receipt.
A simple object model for this could be:
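For illustration, a Scala sketch with that shape (the Garment and Receipt types are placeholders of my own, used only for illustration):

// what the customer hands in, and what the customer gets back in return
case class Garment(description: String)
case class Receipt(number: Int)

trait DryCleaner {
  // leave clothes, receive a receipt in return
  def leave(garments: Seq[Garment]): Receipt
  // hand in the receipt, get the clothes back
  def retrieve(receipt: Receipt): Seq[Garment]
  // a read-only view of the clothes currently at the dry cleaner
  def inventory: Map[Receipt, Seq[Garment]]
}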

(more about how the DryCleaner trait could be implemented later)
The DryCleaner trait has methods for leaving and retrieving clothes, as well as a method for looking at the dry cleaner's current inventory of clothes.
A simple example of using this model:
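For instance (still a sketch; InMemoryDryCleaner is a made-up name for whatever implementation is used):

val dryCleaner: DryCleaner = new InMemoryDryCleaner   // hypothetical implementation

val receipt1 = dryCleaner.leave(Seq(Garment("3 shirts")))
val receipt2 = dryCleaner.leave(Seq(Garment("1 suit"), Garment("2 ties")))

dryCleaner.retrieve(receipt1)   // the shirts are picked up again
dryCleaner.inventory            // the suit and the ties are still there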

The implementation of the DryCleaner could be backed by a database using an ORM tool, but let’s consider another alternative right now. What if all modifying methods of the DryCleaner implementation wrote their arguments to a log file? (The leave() and retrieve() methods modify the object model, while the inventory() method is read-only.)
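Concretely, a naive wrapper could do the writing by hand (a sketch only; further down I describe how to avoid writing code like this for every entity):

import java.io.{FileWriter, PrintWriter}
import java.time.Instant

// wrap any DryCleaner and append every modifying call as one row in a text file
class LoggingDryCleaner(inner: DryCleaner, logFile: String) extends DryCleaner {
  private var serialNo = 0

  private def log(entry: String): Unit = {
    serialNo += 1
    val out = new PrintWriter(new FileWriter(logFile, true))  // append, never overwrite
    try out.println(s"$serialNo  ${Instant.now()}  $entry") finally out.close()
  }

  def leave(garments: Seq[Garment]): Receipt = {
    val receipt = inner.leave(garments)
    log(s"leave ${garments.map(_.description).mkString(", ")} -> receipt ${receipt.number}")
    receipt
  }

  def retrieve(receipt: Receipt): Seq[Garment] = {
    log(s"retrieve receipt ${receipt.number}")
    inner.retrieve(receipt)
  }

  // inventory is read-only and therefore not logged
  def inventory: Map[Receipt, Seq[Garment]] = inner.inventory
}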

The Event Log

What if all (modifying) operations on the object model resulted in a log entry? What if all modifications were stored as “events” in a log?
The example above could yield the following log:
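For illustration, the calls in the example above could produce rows like these (the exact format and the timestamps are my own invention; only the content matters):

1  2015-08-28T08:45:10Z  leave 3 shirts -> receipt 1
2  2015-08-28T08:52:33Z  leave 1 suit, 2 ties -> receipt 2
3  2015-08-28T09:30:05Z  retrieve receipt 1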

Note: the receipt number need not be present on “leave” rows, but it improves readability.

This log contains all the information from the method calls in the example, plus a serial number for each call and a timestamp. (The timestamps in this example have been modified to improve readability; the code example above would of course execute instantly.)
Could you recreate the object model from the information contained in this log?
Yes!
How could the object model be recreated?
By using the information on each log row to call the right method on the DryCleaner object.
Does this work for complex object models?
Yes, the object model can be as complex as you need. Below I will describe how to automate the process of reading and writing this log.
In that case, can you do without a database?
Yes, you can choose to store this log in a text file if you would like.
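To make “calling the right method” concrete, a hand-written replay could look like this (LogRow is a made-up type representing one parsed row of the log; the automated variant is described further down):

import java.time.Instant

// one parsed row of the log (the field names are my own)
case class LogRow(serialNo: Int, timestamp: Instant, event: String,
                  garments: Seq[Garment], receipt: Option[Receipt])

// recreate the object model by replaying every row, in order, against a fresh DryCleaner
def replay(rows: Seq[LogRow], dryCleaner: DryCleaner): Unit =
  rows.sortBy(_.serialNo).foreach { row =>
    row.event match {
      case "leave"    => dryCleaner.leave(row.garments)
      case "retrieve" => dryCleaner.retrieve(row.receipt.get)
    }
  }

This assumes that receipts are numbered deterministically, so that replaying the leave calls in the same order produces the same receipt numbers as the first time around.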

This is called Event Sourcing, and here are some benefits of Event Sourcing:

  • The object model becomes portable, in the form of a simple text file.
  • Human readable. If your events have good names, the resulting log is easy for a human to understand.
  • The log itself is a very elaborate audit log.
  • No database data model needed. Even if you store this log in a database, you do not need to model the entities in a data model.
  • A time machine. You know everything that has happened and when (unlike with a data model in a relational database).

The time machine

What if I need to know the inventory of the dry cleaner at a specific point in time?
If I want to know the inventory of the dry cleaner on 2015-08-28 at 09:00, I simply load all events that have a timestamp earlier than this time. Thus we will have recreated the exact state of the dry cleaner at that moment!
In fact, we can recreate the state of all involved objects for any time we choose!
This is an extremely useful feature of event sourcing. You can return to any point in history to see exactly what happened.
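In code, the time machine is nothing more than a filter in front of the replay sketched earlier (again using the made-up LogRow and InMemoryDryCleaner names):

import java.time.Instant

// rows: Seq[LogRow] parsed from the log file
val cutoff = Instant.parse("2015-08-28T09:00:00Z")
val dryCleanerAt0900: DryCleaner = new InMemoryDryCleaner   // a fresh, empty instance
replay(rows.filter(_.timestamp.isBefore(cutoff)), dryCleanerAt0900)
// dryCleanerAt0900.inventory now shows the state exactly as of 09:00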

Additional benefits of event sourcing

  • External systems can listen to the events (collect statistics or react to events)
  • Extreme performance. All operations on the object model will be in memory (no need to go to a database). This will be extremely fast.
  • Performance for persistence. All events are immutable and are always added, never removed. You can use an append-only operation for storing them.
  • Simplification. The events are simple. No complex domain objects or graphs of objects need to be saved. The object model can of course be as complex as needed; the complexity is not transferred to the persistence mechanism.
  • Testing. It is easy to set up test data: a pre-recorded event log can be used. No messy mocking needed.
  • Flexibility of design. The events do not describe the domain objects, so the object model can be redesigned without any need to migrate existing data to the new design.

And some Cons

  • You might need to support ‘old’ events for a long time. You can easily add new types of events, but removing old ones is harder, because you still have to be able to read old event logs. There are ways to remedy this (by making snapshots, for example), but it can still be a pain.
  • Event Sourcing is a less well-known technology than an RDBMS, so it poses a learning curve for most software developers. It might also be harder to sell to an organization.

Automate the reading and writing of the event log

How do you implement this? Do you have to implement logging to the event log in every (modifying) method? And to parse the log and apply the events again, do you need to implement that for every class?
No, the implementation can be done through reflection, once and for all, for all entities. You can use a naming convention or annotations to mark which methods modify the object model.
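To give a feel for the mechanism (this is not our actual class, just an illustration that uses a naming convention rather than annotations), a generic event can record the method name and its arguments, and replay can find the method via reflection:

// a generic event: which entity, which modifying method, and the logged arguments
case class Event(entityId: String, methodName: String, args: Seq[AnyRef])

// apply one event to its target entity by looking the method up via reflection
def applyEvent(target: AnyRef, event: Event): AnyRef = {
  val method = target.getClass.getMethods
    .find(m => m.getName == event.methodName && m.getParameterCount == event.args.size)
    .getOrElse(sys.error(s"${target.getClass.getName} has no method ${event.methodName}"))
  method.invoke(target, event.args: _*)
}

The writing side can be handled just as generically, so no entity needs its own logging code.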
In our bookkeeping application we implement the logging to the event log through a single class of 100 lines of code. All our 120 entities use this single class for all event writing and reading. This is an extremely big win: only 100 lines of code to handle all persistence for a very big application. All the Slick code (see my last blog post) could also be removed! We can now concentrate on writing application logic!

Application of Event Sourcing at Speedledger

Our main product at Speedledger is our bookkeeping application, and bookkeeping seems to be a perfect match for Event Sourcing as described above.

  • We can restore any customer's bookkeeping to any point in time, and thus we can let our customers do this themselves in some cases (however, there are some legal aspects to what you can do with your bookkeeping).
  • We can easily transfer data for one customer from one system to another. For example, we can transfer customer data from our production environment to the system used by our support personnel simply by exporting the event log as a text file. The support person can then view an exact copy of the customer's data in their system while helping the customer.
  • Our support personnel can use the event log itself when helping customers: they see exactly what the customer has done in their bookkeeping. The support personnel have no specific technical training but can still read and use the event log, since it only describes bookkeeping events (and not in a very technical way).

You need a database!

Even though it is possible to store the events in a text file and skip the database entirely, I do not recommend this. A database has many other advantages (transactions, for instance). The point is that you do not need a relational database or SQL. You could use something much more lightweight, like a document-oriented database. And the data model can be extremely simple: you just need a place to store the event log.

100% consistent object model and no bugs (next blog post)

What about database constraints? The database helps you preserve data integrity by using constraints. If you do not use a real data model (or a database at all), how can you preserve data integrity?

In my next blog post I will describe how to preserve data integrity in an object model with Event Sourcing, and in a much more complete way than you can hope to achieve with database constraints. This will in turn lead to a manner of coding that makes bugs much less likely.

I want to write lambdas, not SQL

When lambdas were introduced in Java, I was over the moon. I thought it was the best feature ever… so useful! Though lambdas are no longer a new feature in Java (they have been in production for ~1.5 years), I have not used them nearly as much as I wanted. Why? Because the application I write most of my code for uses plain old JDBC. Why is that a problem? Why can I not use lambdas anyway?

Look at the two examples below; they do the same thing, one in SQL and one with lambdas:

SQL

private static final String GET_CALCULATED_RESULT =
    "SELECT SUM(vertrans_amount) AS verifSum " +
    "FROM verif_trans vt, verifs v, periods p, accounts a, financial_years fy " +
    "WHERE vt.acc_id = a.acc_id AND a.acc_num >= 3000 AND a.acc_num < 9000 " +
    "AND vt.ver_id = v.ver_id AND v.period_id = p.period_id " +
    "AND p.finyear_id = fy.finyear_id AND fy.finyear_id = ? ";

Lambdas / Object model

SLMoney totSum = allVerifications.stream()
    .flatMap(verification -> verification.getVerifTransactions())
    .filter(verifTrans -> verifTrans.getAccount().isProfitAndLossAccount())
    .map(verifTrans -> verifTrans.getAmount())
    .collect(SLMoney.sum());

The two examples above do the same thing, why is the lambda version better?

Here are some reasons; the really good ones require some more explaining (more about that later):

  • The lambda version can be parsed by the IDE and the compiler (which will tell me if I make a syntactic error or misspell something).
  • The lambda version can easily be tested in a unit test. The JDBC version requires testing against a real database that must first be populated with test data.
  • The lambda version requires no extra code around it. The JDBC code is best written in a separate class with a DAO interface in front to keep the Java code readable.
  • The lambda version is written in Java; the other is not (it is only wrapped in Java).

Object model – the key to using lambdas

To efficiently use lambdas on our domain, we need a real object model: an object model where our entities have real relations to each other, not relations through database IDs but real Java references. Like this:

organization.getFirstFinancialYear().getAccounts().get(0);

And not like this:

FinancialYear financialYear = financialYearDao.getFinancialYear(organization.getFirstFinancialYearId());
List<Account> accounts = accountsDao.getAccounts(financialYear.getId());
Account account = accounts.get(0);

One way of getting what I want

It would be tempting to throw away the application and start all over again with the object model approach. But for several reasons, a big rewrite of our application is not the best option. Instead, we’ll have to “dig where we stand”. When we introduce an object model, the “old” code must still work in parallel. The new object model can be used when implementing new features.

The obvious way to introduce the object model is an ORM tool! This way we can keep the database while still exposing it as an object model.

An ORM tool would have to have the following characteristics:

  • It must be uncached, because it will have to work in parallel with old code written for JDBC.
  • I would like the entities to be immutable (why? More about this in my next blog post: Event Sourcing).
  • I would like the object model NOT to mirror the data model entirely. A naive ORM tool would map database tables to objects that exactly mimic the tables with all their attributes. I do not want this, because I would like to hide some attributes that are not needed in an object model (like database IDs and unused columns). And in some cases, I want to remodel the entities and the relations between them in a way that does not directly map to the database.

My choice fell on Slick, a Scala ORM framework. And as an extra bonus, no SQL whatsoever is needed, since Slick lets me write all database queries/updates/inserts in pure Scala (with lambdas!).
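As a small taste, here is a sketch in the Slick 3.x style, with column names borrowed from the SQL example above (it is not our production mapping, just an illustration):

import slick.driver.H2Driver.api._   // driver import in the Slick 3.0/3.1 style (H2 chosen only for the sketch)

case class Account(id: Long, accNum: Int)

class Accounts(tag: Tag) extends Table[Account](tag, "accounts") {
  def id     = column[Long]("acc_id", O.PrimaryKey)
  def accNum = column[Int]("acc_num")
  def * = (id, accNum) <> (Account.tupled, Account.unapply)
}

val accounts = TableQuery[Accounts]

// "profit and loss accounts" expressed as a plain Scala filter instead of SQL
val profitAndLossAccounts = accounts.filter(a => a.accNum >= 3000 && a.accNum < 9000)

The entities can be immutable case classes, and the mapping does not have to expose every column of the table.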

20% Project

I told my boss Pål about my ideas, how we could simplify things, reduce the number of lines-of-code (LOC) and make the work more fun. Being a visionary guy (and an ex-developer himself), Pål immediately saw the potential. We discussed the idea and decided that I could dedicate some time to work on this. I would dedicate one day a week to this project, a so-called “20% project”.

Results

I have now spent 60 hours implementing the object model in our flagship product (the accounting application). And it really looks promising! We can now write new functionality using lambdas (in Java or Scala), like the lambda example shown earlier. The amount of code needed to write a new service in the application is reduced to a fraction, while readability is improved.

Exporting the object model in JSON

A really cool side effect of this is that we can now export customer data from the database as JSON! Since the object model is entirely made of Java/Scala objects, I can use reflection to export all the data in the model as JSON (and then import it again). We have attempted this before (extracting customer data for one customer as JSON or XML), but doing it directly from the database has been too complex (there are 120 database tables involved). Now it is trivial: the class that serializes the entire object model as JSON is 58 LOC long, and the deserializer is 73 LOC.

Number of code lines left (a thought experiment)

Code is a liability. That is the view of my colleagues and me. If some LOC can be removed, they should be (as long as readability is maintained). Less code means fewer bugs and less code to maintain. Therefore it is interesting to see whether, and by how much, we could shrink the code if we rewrote the entire application using the new object model and lambdas.

Of course, all the old JDBC code could be removed. All other Java code could be rewritten using lambdas (I assume this code would shrink to ~1/3 of its original size, which I think is a conservative estimate). Some code would be added for the new Scala Slick database classes and entities.

Total accounting code 161857
JDBC classes (SQL + Java) -17606
DAO interfaces -5181
JDBC test classes -16869
Old Model objects -27419
All other code -2/3 * 94782
Scala Slick DB classes +4572
New entities +6508
Total LOC left 42474

LOC left: 26% of the original

That’s a really big win by itself! But wait till you see what comes next…

Next blog post

In my next blog post, I will explain the really big advantages of the object model, namely a way to introduce Event Sourcing and remove the database entirely.