Apache Cassandra – the Highly Scalable NoSQL Database

03-24-2017-Cassandra

The NoSQL database movement took off because traditional relational databases don’t work as well in the highly distributed environments typical of today’s Web infrastructure. We recently covered Neo4j, a NoSQL graph database, here on the blog; it serves those looking to find relationships between records within huge Big Data stores.

This time, we turn our attention to Apache Cassandra. Built on a wide-column storage model, in which rows are located by a partition key much as values are located by keys in a key-value store, Cassandra offers high scalability and low latency across widely distributed data centers. Read on to see whether this NoSQL database makes sense for your organization’s data management needs.

The Genesis of Cassandra

Cassandra began as an internal project at Facebook, where it originally powered the social network’s Inbox Search feature. Facebook released the project as open source in 2008, and it became a top-level Apache Software Foundation project in 2010 after about a year in the Apache Incubator.

The latest release of Cassandra, 3.10, became available in February of this year. As a fully open source database, it is free to download, and its cross-platform support for most popular operating systems makes it worth evaluating in a pilot project at your organization. Driver support exists for many popular programming languages, including Java, Python, Node.js, Go, and C++.

Enterprises looking for a commercial NoSQL solution built on Cassandra should look at DataStax’s offerings; the company is known as the leading commercial provider of support for the database.

Cassandra’s Features and Functionality

High performance in distributed deployments is Cassandra’s major calling card. DataStax provides a white paper summarizing third-party benchmarks of several popular NoSQL databases (MongoDB, Couchbase, HBase), in which Cassandra came out as the top performer by a wide margin. Cassandra also handles replication and fault tolerance across multiple data centers, an important feature in today’s global business landscape.

Scalability also distinguishes Cassandra from similar NoSQL products. Many enterprise users run massive production deployments, highlighted by Apple’s 10 petabytes of data spread over 75,000 nodes; Netflix stores 420 terabytes of data across 2,500 nodes. Deployments of that size show why Cassandra has become a popular choice for the largest Big Data workloads.
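To make the scalability point a little more concrete, here is a minimal Python sketch of the idea behind how Cassandra spreads data: each node owns a set of tokens on a hash ring, and a row’s first replica lands on the node owning the first token at or after the hash of its partition key. This is a toy under stated assumptions, not Cassandra’s implementation: the `TokenRing` class, the node names, and the 64 virtual tokens per node are hypothetical (Cassandra’s own knob is `num_tokens` in `cassandra.yaml`), and it hashes with MD5 for simplicity, whereas Cassandra’s default partitioner is Murmur3Partitioner.

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    """Map a key to a ring position (MD5 for simplicity; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy consistent-hashing ring where each node owns several virtual tokens."""

    def __init__(self, nodes, vnodes_per_node=64):
        # Sorted (token, node) pairs; each node appears vnodes_per_node times.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, partition_key: str) -> str:
        """Return the node owning the first token at or after the key's token, wrapping around."""
        idx = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


if __name__ == "__main__":
    ring = TokenRing([f"node{i}" for i in range(1, 6)])
    # Count how many of 10,000 hypothetical partition keys each node owns.
    load = Counter(ring.owner(f"user:{n}") for n in range(10_000))
    for node, count in sorted(load.items()):
        print(node, count)
```

Because each node owns many small token ranges, the keys end up spread roughly evenly, and no central coordinator has to decide where a given row lives.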

Cassandra’s peer-to-peer architecture has no single point of failure: any node can serve a request, so access to the data does not hinge on one master server that could fail or become a traffic bottleneck. Because every node plays the same role and data can be replicated across data centers, an entire data center can go offline without losing data, as long as the other data centers hold replicas. This kind of durability makes Cassandra very attractive to businesses running mission-critical applications, and built-in support for multiple data centers is another plus.
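Whether a data center can go dark without data loss depends on how many replicas each data center keeps. The Python sketch below, loosely modeled on the idea behind Cassandra’s NetworkTopologyStrategy, places two replicas of each partition in each of two data centers and then checks that every key is still readable with one whole data center offline. The node names, the `REPLICATION` map, and the helper functions are hypothetical, and real replica placement also accounts for racks and virtual nodes, which this toy ignores.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 here purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


# Hypothetical six-node cluster: node name -> data center.
NODES = {
    "dc1-n1": "dc1", "dc1-n2": "dc1", "dc1-n3": "dc1",
    "dc2-n1": "dc2", "dc2-n2": "dc2", "dc2-n3": "dc2",
}
# Hypothetical replica count per data center.
REPLICATION = {"dc1": 2, "dc2": 2}

# One token per node keeps the toy simple.
RING = sorted((token(node), node) for node in NODES)
TOKENS = [t for t, _ in RING]


def replicas(partition_key):
    """Walk the ring clockwise from the key's token until every DC has its replica quota."""
    needed = dict(REPLICATION)
    chosen = []
    start = bisect.bisect_left(TOKENS, token(partition_key))
    for step in range(len(RING)):
        node = RING[(start + step) % len(RING)][1]
        dc = NODES[node]
        if needed.get(dc, 0) > 0:
            chosen.append(node)
            needed[dc] -= 1
        if all(count == 0 for count in needed.values()):
            break
    return chosen


def readable_after_outage(partition_key, offline_dc):
    """True if at least one replica of the key lives outside the failed data center."""
    return any(NODES[node] != offline_dc for node in replicas(partition_key))


if __name__ == "__main__":
    keys = [f"order:{i}" for i in range(1000)]
    print(all(readable_after_outage(k, "dc1") for k in keys))  # True
```

Changing the hypothetical map to `{"dc1": 2, "dc2": 0}` makes the same check fail, since every replica would then live in `dc1`; that is why the "no data loss" property is a consequence of the replication settings rather than of the node layout alone.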

Adding new servers to a deployment is also a breeze, according to DataStax’s lead Cassandra evangelist, Patrick McFadin. “You simply boot up a new machine and tell Cassandra where the other nodes are and it takes care of the rest,” said McFadin.
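In practice, "tell Cassandra where the other nodes are" means listing one or more seed nodes (the `seeds` setting in `cassandra.yaml`) that the new machine contacts to join the cluster. Part of why the rest is manageable is that only some data has to move. This minimal Python sketch uses a hypothetical `owners` helper on a toy hash ring (MD5 tokens, 64 virtual tokens per node, both assumptions) to show that when a fifth node joins, the only keys that change owner are the ones the new node takes over.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a ring position (MD5 for illustration only)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def owners(nodes, keys, vnodes_per_node=64):
    """Map each key to the node owning the first ring token at or after the key's token."""
    ring = sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes_per_node))
    tokens = [t for t, _ in ring]
    return {k: ring[bisect.bisect_left(tokens, token(k)) % len(ring)][1] for k in keys}


if __name__ == "__main__":
    keys = [f"user:{n}" for n in range(10_000)]
    old_nodes = ["node1", "node2", "node3", "node4"]
    before = owners(old_nodes, keys)
    after = owners(old_nodes + ["node5"], keys)  # hypothetical new node joins
    moved = [k for k in keys if before[k] != after[k]]
    print(f"{len(moved) / len(keys):.1%} of keys moved")
    print(all(after[k] == "node5" for k in moved))  # True
```

Roughly one fifth of the keys move, all of them to the new node, while every other key stays put. In a real cluster the joining node streams its new token ranges from existing replicas during bootstrap; this sketch only computes which keys would be affected.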

Superior horizontal scalability, combined with ease of administration, makes Cassandra a worthy option for businesses looking to embrace NoSQL for their modern database needs. Its driver support for most popular languages lets developers come up to speed quickly. This is one open source database worth checking out.

Stay tuned to the Betica Blog for additional dispatches from the wide world of software development. As always – thanks for reading!