What is Cassandra?

Apache Cassandra is an open-source distributed database for managing large amounts of data across multiple servers and ensuring no single point of failure. Cassandra offers continuous availability, operational simplicity, and easy data distribution across multiple data centers and cloud availability zones. Its built-for-scale architecture means that it maintains reliable read/write performance with massive amounts of data, expands horizontally with no downtime, and can handle thousands of concurrent users or operations per second.

Sending Cassandra Metrics

Use collectd and the generic-jmx plugin to capture and report Cassandra metrics, including request throughput, errors, latency, compaction activity, and hint activity. SignalFx provides built-in Cassandra monitoring dashboards that display useful metrics by cluster and by individual node in production.

Cassandra Monitoring

Every node in Cassandra is essentially identical and plays the same role. This means that any node can respond to any request, but it also makes troubleshooting harder when issues arise. The key aspects of monitoring Cassandra are the resources used by each node, response latencies for requests, requests to offline nodes, and the compaction process.

Monitoring Load for Each Node: Recovering from running out of capacity is difficult and costly due to the large amount of data stored on each individual Cassandra node. Two key indicators of capacity are CPU load and free disk space, and knowing the baseline consumption of these resources can help determine where and when resources should be adjusted.
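
As a rough illustration of the capacity check described above, the following Python sketch flags a node whose free disk space or CPU load crosses a threshold. The function names and the threshold values (25% free disk, 0.8 load per core) are assumptions chosen for the example, not recommended settings or part of any SignalFx or Cassandra API.

```python
import shutil
from typing import List


def free_disk_fraction(path: str) -> float:
    """Return the fraction of the filesystem holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def capacity_warnings(
    free_fraction: float,
    load_per_core: float,
    min_free_fraction: float = 0.25,  # hypothetical threshold
    max_load_per_core: float = 0.8,   # hypothetical threshold
) -> List[str]:
    """Return a warning for each capacity indicator that is out of bounds."""
    warnings = []
    if free_fraction < min_free_fraction:
        warnings.append("low free disk")
    if load_per_core > max_load_per_core:
        warnings.append("high CPU load")
    return warnings
```

On a Unix host, `load_per_core` could be derived as `os.getloadavg()[0] / os.cpu_count()`, and `free_disk_fraction` would be pointed at the volume that holds the Cassandra data directory.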

Measuring Latency in Read/Write Requests: Cassandra read and write latencies can have a large impact on application performance, and therefore are important metrics to monitor. An increase in read or write latency may indicate an emerging performance issue.

Handling Requests with Offline Nodes: When a node goes offline, the other nodes in a Cassandra cluster store “hints” about the writes it missed. After the offline node returns, the other nodes can send the hints to help it catch up on those writes. Typically, the number of active hints will be zero. A sustained non-zero number of hints indicates an issue with one or more nodes.
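
One way to tell a sustained hint count from a brief blip is to require several consecutive non-zero samples. The sketch below assumes the hint counts have already been collected into a list, oldest first; the function name and the default of five samples are illustrative assumptions.

```python
from typing import Sequence


def hints_sustained(samples: Sequence[int], min_consecutive: int = 5) -> bool:
    """Return True if the most recent `min_consecutive` samples are all non-zero."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        return False  # not enough history to call it sustained
    return all(count > 0 for count in samples[-min_consecutive:])
```

A single zero sample inside the window resets the verdict, so a node that briefly catches up does not trigger the check.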

An Efficient Compaction Process: The compaction process can have a large impact on Cassandra server performance and application performance. The number of pending compaction tasks should be low, and a sustained increase is a sign of emerging performance issues.
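
A sustained increase in pending compactions can be approximated by comparing the average of the most recent window of samples with the average of the window before it. This is a minimal sketch under that assumption; the window size and the `min_growth` threshold are hypothetical and would need tuning per cluster.

```python
from statistics import mean
from typing import Sequence


def pending_compactions_rising(
    samples: Sequence[float],
    window: int = 5,
    min_growth: float = 1.0,  # hypothetical threshold, in pending tasks
) -> bool:
    """Return True if the recent window averages at least `min_growth` more
    pending tasks than the window immediately before it."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(samples) < 2 * window:
        return False  # need two full windows to compare
    recent = mean(samples[-window:])
    previous = mean(samples[-2 * window:-window])
    return recent - previous >= min_growth
```

Averaging over windows smooths out the short spikes that normal compaction bursts produce.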

The SignalFx Difference

Intelligent Alerts: Applying analytics to alert rules helps to determine whether a change is normal or a threat to performance. Measuring the rate of change of compaction activity gives an indication of how much capacity the cluster will require. SignalFx makes it easy to add a duration condition, so that an alert fires only when a change persists over a set time period, which gives additional context for understanding the change.
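
The idea of combining a rate of change with a duration condition can be sketched in plain Python. This is not the SignalFx detector API; the function names, the sample format, and the thresholds are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def rates_of_change(samples: Sequence[Sample]) -> List[Sample]:
    """Return (timestamp, per-second rate) for each pair of consecutive samples."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append((t1, (v1 - v0) / (t1 - t0)))
    return rates


def breached_for_duration(
    rates: Sequence[Sample], threshold: float, duration_s: float
) -> bool:
    """Return True if the rate has stayed above `threshold` continuously for
    at least `duration_s` seconds, up to and including the latest sample."""
    if not rates:
        return False
    breach_start: Optional[float] = None
    for t, rate in rates:
        if rate > threshold:
            if breach_start is None:
                breach_start = t  # start of the current breach
        else:
            breach_start = None  # breach interrupted, start over
    return breach_start is not None and rates[-1][0] - breach_start >= duration_s
```

For example, with samples taken every 60 seconds, `breached_for_duration(rates_of_change(samples), threshold=2.0, duration_s=120)` reports a breach only once the rate has exceeded 2.0 per second for two minutes without interruption.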

Monitoring from Cluster to Node: With SignalFx, you can monitor from the cluster down to an individual Cassandra node in a single dashboard. Instant visibility gives you both the flexibility to gain a service-wide view of performance and the power to explore individual nodes within the cluster.

Instant Insight: SignalFx provides out-of-the-box insight for Cassandra with built-in dashboards to capture and display the most relevant metrics. Immediately subscribe to relevant alerts for key Cassandra metrics, created by your team or pre-built by SignalFx. You can create and change detectors at any time, and starting from recommended detectors at the click of a button significantly reduces your time to productivity and customization.

Select Cassandra Metrics

Read Operations
Read Timeouts
Read Unavailables
Write Operations
Write Timeouts
Write Unavailables
Range Slice Operations
Range Slice Timeouts
Range Slice Unavailables
Range Slice Latency
Read Latency
Write Latency
Total Pending Hints
Compaction Operations Waiting to Run
Storage Used for Cassandra Data

Start Your Cassandra Monitoring Trial

Try SignalFx for 14 days. No credit card required.