We interviewed Hanneli Tavante, one of our speakers at ConFoo Vancouver 2016. Her presentation is titled “Cassandra - Crash Course”. Hanneli is a developer and Open Source contributor. She organizes various community meetups. She is also passionate about mathematics and machine learning. She lives in Brazil.
What advantages does Cassandra offer over the common database management systems?
If you have large amounts of data in a relational database, you have probably struggled with some of these problems: consistency under the master/slave model, sharding, availability when querying a sharded database, denormalisation, and others. Cassandra is built around the idea of linear scalability on commodity hardware, which means that when the number of requests increases, you just add more machines, and it works! You do not need to redistribute your data by hand. There is no single point of failure, so if uptime means a lot to your business, Cassandra is possibly a great option. Last but not least, it offers incredible speed on writes and reads, flexibility over the CAP theorem, and a fantastic community supporting the database.
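The "flexibility over the CAP theorem" mentioned above is commonly understood as tunable consistency: each read or write can wait for a different number of replicas. The interview does not spell this out, so the sketch below is an illustration under that assumption. It uses the usual overlap rule (reads plus writes must exceed the replication factor) and hypothetical function names; it does not call any Cassandra API.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


n = 3            # replication factor, chosen only for the example
q = quorum(n)    # 2 of 3 replicas

# Quorum writes + quorum reads overlap, so a read always reaches a fresh replica.
print(read_sees_latest_write(n, write_replicas=q, read_replicas=q))  # True

# Waiting for a single replica on both sides is faster but may return stale data.
print(read_sees_latest_write(n, write_replicas=1, read_replicas=1))  # False
```

The trade-off is the point: waiting for fewer replicas favours speed and availability, waiting for more favours consistency.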
Can a non-DBA set up a Cassandra cluster?
Absolutely. There are pre-built instances on Amazon and Azure. The tutorials are very accurate and, if you have questions, the community is active. DataStax also provides a paid enterprise version of Cassandra (together with other resources) for those who want support. However, it is important to have an expert in operations and in cloud or hardware (depending on your choice) nearby.
Is it worth switching an existing project to Cassandra, and what effort would that require?
You will most likely want to start with a hybrid model. Begin by migrating the critical parts of the system to Cassandra, to provide better throughput to your clients or to increase your availability. If you notice that your business could be fully migrated to the Cassandra data model, you can go ahead and make a full change as well. The biggest effort might be the paradigm change: the relational data model no longer applies in Cassandra. The learning curve is pretty smooth, so after one or two months of practice, the team should be able to adopt and use Cassandra.
What is the data model used by Cassandra?
Strictly speaking, Cassandra is a distributed key-value store. It uses a masterless architecture, in which the nodes are distributed and structurally identical. There is no notion of master/slave. Think of a ring with several nodes in it, each of them with equal responsibilities. This is the idea behind Cassandra.
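The ring picture can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the class name, the node names, the single token per node, and the use of MD5 as the hash are all assumptions made for illustration. It only shows that every node is treated identically and that a key's replicas are found by walking clockwise around the ring.

```python
import bisect
import hashlib


class TokenRing:
    """Toy masterless ring: identical nodes, each owning a slice of the hash space."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # One token per node (a simplification chosen for this sketch).
        self._ring = sorted((self._token(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 serves only as a stable, evenly spread hash here (an assumption).
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, partition_key):
        """Return the nodes holding this key, walking clockwise from its token."""
        key_token = self._token(partition_key)
        # First node whose token is greater than the key's token; wrap past the end.
        start = bisect.bisect_right(self._tokens, key_token) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes; any of them can serve the request
```

No node in the sketch has a special role: removing one simply shifts its slice of the ring to its neighbours, which is why there is no single point of failure.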
What resources would you recommend to beginners?
DataStax Academy is a great resource with self-paced courses. Planet Cassandra is another excellent website, with lots of tutorials and blog posts from the community and also from Cassandra maintainers and core committers. There are also several books with excellent content and use cases.
Don't forget to register for your nearest ConFoo conference and follow us on Twitter for more blog posts.