On the rare occasion that I walk into the engineer pit, I am usually hard-pressed to get more than a grunt, a smile and a quick glance away from the computer screens. I get it, though: “plugged-in” engineers are focused creatures with little to no tolerance for outside distraction. Recently, however, I was surprised by the heated discussion I walked into when I came to call them for lunch. The great “database debacle” of 2014, I like to call it. Intrigued but perplexed, I listened and learned some interesting details, particularly regarding MongoDB and Cassandra. Our lead engineer helped me out afterwards with a brief Q&A to clarify some of the key differences and similarities between the two, along with some pros and cons. The basic differences are listed below:
Q) If price were not a factor, which would you choose to use for your own company and why?
A) Each has its own use cases. If you’re looking to consume large volumes of data and wish to have linear scalability, Cassandra is your database. If write speed is not a concern, and ease of development is more important, go with Mongo. Cassandra is more complex to use, and more sensitive to queries (in fact, one large query can very easily bring down a node). I wouldn’t suggest building a webapp on Cassandra; it’s more appropriate for a big data system.
Q) Which do you feel is easier to work on?
A) Mongo is easier to develop on.
Q) In your opinion what is the biggest pro and biggest con for each?
A) Biggest pros:
Cassandra: write performance; idempotency is easy to maintain (you don’t need to do a query before an insertion); durability; linear scalability; no single point of failure.
Mongo: ease of development; it stores BSON (basically JSON), which is easy to manage and extremely useful when working with web applications; super simple and powerful indexing.
Biggest cons:
Cassandra: queries are slow and can kill nodes; an in-depth understanding of the database is required to manage it effectively.
Mongo: the global write lock limits its use for big data applications.
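The idempotency point above can be illustrated with a toy model. A Cassandra write to an existing primary key overwrites that row rather than adding a second one, so a repeated event needs no read-before-write check. The sketch below is not a Cassandra client: a plain Python dict stands in for a table keyed by primary key, and the event fields (`event_id`, `sku`, `qty`) are hypothetical names chosen for illustration.

```python
# Toy stand-in for a table keyed by primary key.
# This only models upsert semantics; it does not talk to Cassandra.

def upsert(table, event):
    """Write an event under its primary key; a repeat overwrites the same row."""
    table[event["event_id"]] = event


def append(log, event):
    """Naive append-only write, for contrast; a repeat creates a duplicate."""
    log.append(event)


# Hypothetical events; the second delivery of "e1" simulates a replayed event.
deliveries = [
    {"event_id": "e1", "sku": "A100", "qty": 2},
    {"event_id": "e2", "sku": "B200", "qty": 1},
    {"event_id": "e1", "sku": "A100", "qty": 2},
]

table, log = {}, []
for event in deliveries:
    upsert(table, event)  # no "does this row exist?" query needed first
    append(log, event)

print(len(table))  # 2 rows: the replay of "e1" landed on the same key
print(len(log))    # 3 entries: the append-only store kept the duplicate
```

The design choice that makes this work is picking a primary key that is the same for every delivery of the same event, so that a repeat is an overwrite and not a new record.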
Q) Why are Cassandra and Mongo the best fit for Rubikloud?
A) Cassandra is used for our consumption layer: we consume large volumes of data at a very high velocity, and that data needs somewhere to be written. Cassandra is extremely efficient at handling writes. Also, because we’re using Storm, we need the idempotency properties provided by Cassandra. Storm guarantees at-least-once processing, so the database may receive the same event multiple times and needs to handle it in an idempotent way so there aren’t duplicate records. Mongo is great for our data scientists. All results of our aggregation systems are stored in Mongo for the scientists to explore, run tests on, and develop against. It’s also great for the front end, because most results stored in Mongo are already in a JSON-like format and can be sent practically as-is.
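To make the “practically as-is” remark concrete, here is a minimal sketch of handing a stored result to a front end without reshaping it. The document contents and the `to_response_body` helper are hypothetical, and no database is involved. One assumption is worth flagging: a real Mongo document’s `_id` is usually an ObjectId, which Python’s `json` module cannot serialize on its own, so a plain string `_id` is used here to keep the example self-contained.

```python
import json

# Hypothetical aggregation result, shaped like a document read back from Mongo.
# Assumption: _id is a plain string here, not an ObjectId.
result = {
    "_id": "store-42:2014-06",
    "store": 42,
    "month": "2014-06",
    "top_skus": [
        {"sku": "A100", "units": 310},
        {"sku": "B200", "units": 275},
    ],
}


def to_response_body(doc):
    """Serialize a stored document for an HTTP response without reshaping it."""
    return json.dumps(doc)


body = to_response_body(result)

# The round trip preserves the nested structure exactly.
assert json.loads(body) == result
```

Nested lists and sub-documents pass straight through, which is the practical benefit for the front end; only BSON-specific types such as ObjectId or dates need a conversion step first.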