October 28, 2016
Cassandra Monitoring
Choice of Tool
New Relic
http://newrelic.com/plugins/3legs/113
Monitor Cassandra statistics using the 3legs plugin. Metrics include Read and Write latency (global and per host), Cache statistics, Pending compactions, flushes and more.
DataStax OpsCenter
http://www.datastax.com/documentation/opscenter/4.1/pdf/opscuserguide41.pdf
DataStax OpsCenter is a visual management and monitoring solution for Apache Cassandra and DataStax Enterprise.
The DataStax agents are installed on the Real-time (Cassandra), Analytics (Hadoop), and Search (Solr) nodes. They use Java Management Extensions (JMX) to monitor and manage each node. Cassandra exposes a number of statistics and management operations through JMX. Using JMX, OpsCenter obtains metrics from a cluster and issues node administration commands, such as flushing SSTables or running a repair. It monitors a variety of metrics at the system and Cassandra cluster level, including performance and column family metrics.
JConsole
JConsole is a graphical tool for monitoring the Java Virtual Machine (JVM) and Java applications on either a local or a remote machine.
It uses the underlying features of the JVM, through Java Management Extensions (JMX) technology, to report on the performance and resource consumption of applications running on the Java platform.
JConsole ships with the Java Development Kit (JDK); start the graphical console with the “jconsole” command.
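Since Cassandra exposes its statistics over JMX, JConsole can be pointed at a node directly. The sketch below only builds the connection strings; it assumes the JMX port 7199 mentioned later in this post, and cass-node1 is a hypothetical hostname.

```shell
#!/bin/sh
# Cassandra JMX port, as used elsewhere in this post.
CASSANDRA_JMX_PORT=7199

# Print the host:port form that jconsole accepts as a connection argument.
jmx_target() {
  printf '%s:%s\n' "$1" "${2:-$CASSANDRA_JMX_PORT}"
}

# Print the equivalent full JMX service URL (standard RMI connector form).
jmx_service_url() {
  printf 'service:jmx:rmi:///jndi/rmi://%s/jmxrmi\n' "$(jmx_target "$@")"
}

# Usage (cass-node1 is hypothetical; needs a JDK and a graphical display):
#   jconsole "$(jmx_target cass-node1)"
```

A remote connection only works if the node's JMX port is reachable from the machine running JConsole.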
Nagios
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=3819&cf_id=24
For Cassandra, the following metrics are monitored:
Storage Load – The amount of data stored on each Cassandra node
Pending Tasks – The number of basic task stages waiting to run
Active Tasks – The number of basic task stages currently running
Blocked Tasks – The number of basic task stages blocked from running
Pending Internal Tasks – The number of internal task stages waiting to run
Active Internal Tasks – The number of internal task stages currently running
And the following JVM metrics:
Active JVM Threads
JVM Heap memory usage
JVM Non-Heap memory usage
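To illustrate how a metric such as Pending Tasks turns into a Nagios status, here is a minimal sketch. It is not the plugin linked above (which reads JMX); it swaps in the output of nodetool tpstats as its data source, and the thresholds 50 and 200 are made-up values.

```shell
#!/bin/sh
# Sum the "Pending" column (3rd field) of the thread-pool rows read from stdin.
# Header and "Message type / Dropped" rows are skipped by the field checks.
sum_pending() {
  awk 'NF >= 5 && $3 ~ /^[0-9]+$/ { total += $3 } END { print total + 0 }'
}

# Print a Nagios status line and return the matching plugin exit code
# (0 = OK, 1 = WARNING, 2 = CRITICAL).
check_threshold() {
  value=$1 warn=$2 crit=$3
  if [ "$value" -ge "$crit" ]; then
    echo "CRITICAL - pending tasks: $value"; return 2
  elif [ "$value" -ge "$warn" ]; then
    echo "WARNING - pending tasks: $value"; return 1
  fi
  echo "OK - pending tasks: $value"; return 0
}

# Usage on a Cassandra node (thresholds are illustrative):
#   check_threshold "$(nodetool tpstats | sum_pending)" 50 200
```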
Monitoring through DataStax OpsCenter
Installing OpsCenter Console 4.1
Configure /etc/yum.repos.d/datastax.repo
sudo yum install opscenter
service opscenterd start
Connect to the console using http://<hostname>:8888
Add an existing Cassandra cluster from the console by clicking “Add Cluster” and entering the IP of one of the Cassandra cluster nodes (JMX Port: 7199, Thrift Port: 9160).
All the nodes in the cluster are now identified.
/etc/opscenter/opscenterd.conf is the configuration file
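For reference, the repository file from the first step might be generated as sketched below. The section name, description and baseurl are assumptions based on DataStax's community install instructions of that era and are not taken from this post; check them against the current documentation before use.

```shell
#!/bin/sh
# Print a yum repository definition for the DataStax packages.
# ASSUMPTION: section name, description and baseurl are illustrative values.
render_datastax_repo() {
  cat <<'EOF'
[opscenter]
name = DataStax Repository
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
EOF
}

# Usage:
#   render_datastax_repo | sudo tee /etc/yum.repos.d/datastax.repo
```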
Installing OpsCenter Agent
Add the DataStax Yum repository in the /etc/yum.repos.d/datastax.repo file
yum install datastax-agent
In address.yaml set stomp_interface to the IP address that OpsCenter is using.
echo "stomp_interface: 10.132.56.93" | sudo tee -a /var/lib/datastax-agent/conf/address.yaml
If SSL communication is enabled in /etc/opscenter/opscenterd.conf, enable SSL in address.yaml as well.
echo "use_ssl: 1" | sudo tee -a /var/lib/datastax-agent/conf/address.yaml
Start the DataStax agent.
service datastax-agent start
Perform the above steps on each node in the cluster.
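Because tee -a appends a new line on every run, repeating the steps above can leave duplicate keys in address.yaml. A minimal sketch of an idempotent alternative follows; set_yaml_key is a hypothetical helper, not part of the DataStax agent, and it only handles flat "key: value" lines.

```shell
#!/bin/sh
# Set "key: value" in a flat YAML file: replace the line if the key already
# exists, append it otherwise. Values containing "|" or "&" are not handled.
set_yaml_key() {
  file=$1 key=$2 value=$3
  touch "$file"
  if grep -q "^${key}:" "$file"; then
    sed "s|^${key}:.*|${key}: ${value}|" "$file" > "${file}.tmp" &&
      mv "${file}.tmp" "$file"
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (run as root on each node):
#   set_yaml_key /var/lib/datastax-agent/conf/address.yaml stomp_interface 10.132.56.93
#   set_yaml_key /var/lib/datastax-agent/conf/address.yaml use_ssl 1
```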
Configuring Access
python /usr/share/opscenter/bin/set_passwd.py <username> admin
service opscenterd restart
Enabling HTTPS
vi /etc/opscenter/opscenterd.conf
Uncomment the ssl lines in that file and save it.
Restart the OpsCenter daemon.
Access the console at https://<hostname>:8443/opscenter/index.html
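The manual edit of opscenterd.conf can also be scripted. The sketch below assumes the commented-out options are named ssl_keyfile, ssl_certfile and ssl_port; those names are an assumption about the stock file, so inspect your own copy first.

```shell
#!/bin/sh
# Uncomment ssl_keyfile, ssl_certfile and ssl_port lines read from stdin.
# ASSUMPTION: these option names match the stock [webserver] section.
# All other lines, including ordinary comments, pass through unchanged.
uncomment_ssl() {
  sed -E 's/^#[[:space:]]*(ssl_(keyfile|certfile|port)[[:space:]]*=)/\1/'
}

# Usage (keeps a backup of the original file):
#   cp /etc/opscenter/opscenterd.conf /etc/opscenter/opscenterd.conf.bak
#   uncomment_ssl < /etc/opscenter/opscenterd.conf.bak > /etc/opscenter/opscenterd.conf
```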
HA Setup for OpsCenter
Configure a VIP and attach it to the primary OpsCenter.
On the standby OpsCenter, copy all the necessary configs.
On the agent side, configure communication to the VIP.
Metrics and Alerts Configuration
All alerts configured in the console are stored in the OpsCenter Keyspace, in the column family Settings, under the rowkey global-cluster-alert-rules; each column represents one metric alert.
We need to identify the metrics and thresholds we want to monitor.
We are not using the mail alerting feature; instead, we use the Web URL alerting feature to send mails. This is because the mail alerting feature cannot be configured to send mails to different DLs based on the threshold, nor to customize the mail subject. (The subject is always static, whereas we want it in a format like alert_level:hostname – alert name.)
Configure Apache Web Server
yum install httpd
/sbin/service httpd start
Configuring PHP for Apache
yum install php --> to install the libphp5 module?
vi /etc/httpd/conf/httpd.conf --> and add the line below
LoadModule php5_module modules/libphp5.so
Configuring Sendmail
yum install sendmail
The relevant configuration files are sendmail.cf and submit.cf.
Configuring the Alerting Script
cat /etc/opscenter/event-plugins/posturl.conf
[posturl]
enabled=1
url=http://<hostname>/postOPSCevents.php
# levels can be comma delimited list of any of the following:
# DEBUG,INFO,WARN,ERROR,CRITICAL,ALERT
# If left empty, will listen for all levels
levels=WARN,ERROR,CRITICAL,ALERT
The postOPSCevents.php script sends warning alerts to the Warning mail ID and critical alerts to the Critical mail ID.
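The routing and subject logic can be sketched in a few lines. This is not the PHP script itself, only an illustration in shell: the two addresses are hypothetical, sending ERROR, CRITICAL and ALERT to the critical list is an assumption, and a plain hyphen stands in for the dash in the subject format.

```shell
#!/bin/sh
# Hypothetical distribution lists; replace with real addresses.
WARNING_DL="cassandra-warn@example.com"
CRITICAL_DL="cassandra-crit@example.com"

# Pick the recipient for an alert level. Only WARN goes to the warning list;
# ASSUMPTION: every other level is treated as critical.
alert_recipient() {
  case "$1" in
    WARN) echo "$WARNING_DL" ;;
    *)    echo "$CRITICAL_DL" ;;
  esac
}

# Build the subject in the "alert_level:hostname - alert name" format.
alert_subject() {
  printf '%s:%s - %s\n' "$1" "$2" "$3"
}

# Print a complete message (headers, blank line, body) suitable for "sendmail -t".
build_mail() {
  level=$1 host=$2 name=$3 body=$4
  printf 'To: %s\nSubject: %s\n\n%s\n' \
    "$(alert_recipient "$level")" \
    "$(alert_subject "$level" "$host" "$name")" \
    "$body"
}

# Usage (cass-node1 is a hypothetical hostname):
#   build_mail CRITICAL cass-node1 "Write latency" "details" | /usr/sbin/sendmail -t
```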
Monitoring through New Relic
https://github.com/threelegs/newrelic-plugins
Download the plugin from the link above onto one of the nodes in the Cassandra cluster. Alternatively, download it from the New Relic console, under the Plugins section.
ls /tmp/newrelic_3legs_plugin-0.0.2-cassandra.tar.gz
/tmp/newrelic_3legs_plugin-0.0.2-cassandra.tar.gz
cd /root/
tar -xzf /tmp/newrelic_3legs_plugin-0.0.2-cassandra.tar.gz
cd config
Configure activation.conf, application.conf, newrelic.properties, and /etc/init.d/newrelic_3legs
/sbin/service newrelic_3legs start
On the New Relic console, Cassandra now appears in the left-hand menu; clicking it shows the cluster we configured, PCS-Cassandra.
Even with the agent running on only one host, it showed metrics for all hosts.
Click on the Settings icon and configure thresholds for Read Latency, Write Latency and Down Hosts.