.. _cluster_metrics:

Cluster performance metrics
===========================
Cluster metrics are aggregated across all nodes in the cluster and are a good
way to monitor cluster performance at a high level. OpsCenter tracks a number
of cluster-wide metrics for read performance, write performance, memory, and
capacity.

Variations in cluster performance can signal potential issues that require
further investigation. For general performance monitoring, watch for spikes in
read and write latency along with an accumulation of pending operations.
Drilling down on high-demand column families can further pinpoint the source
of performance issues with your application.
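
As a rough illustration of watching for latency spikes, the following sketch
flags samples that are well above the average of the preceding samples. The
function name, the window size, and the factor are assumptions chosen for this
example and are not part of OpsCenter; the samples are assumed to be latency
values in milliseconds that you have exported yourself.

.. code-block:: python

    from statistics import mean


    def find_spikes(latencies_ms, window=5, factor=2.0):
        """Return indexes of samples larger than factor x the trailing mean.

        Hypothetical helper for illustration; window and factor are
        arbitrary defaults, not OpsCenter settings.
        """
        spikes = []
        for i in range(window, len(latencies_ms)):
            # Baseline is the mean of the `window` samples before sample i.
            baseline = mean(latencies_ms[i - window:i])
            if baseline > 0 and latencies_ms[i] > factor * baseline:
                spikes.append(i)
        return spikes


    samples = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 9.5, 2.0]
    print(find_spikes(samples))

A ratio against a trailing mean is only one simple heuristic. Acceptable
latency depends on your hardware and workload, so tune the thresholds against
your own baseline.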

.. _write_requests:

Write Requests
--------------
The number of write requests per second. Monitoring the number of
requests over a given time period can give you an idea of system write
workload and usage patterns.

.. _write_request_latency:
    
Write Request Latency
---------------------
The response time (in milliseconds) for successful write requests. The time
period starts when a node receives a client write request, and ends when the
node responds back to the client. Optimal or acceptable levels of write latency
vary widely according to your hardware, your network, and the nature of your
write load. For example, the performance for a write load consisting largely of
granular data at low consistency levels would be evaluated differently from a
load of large strings written at high consistency levels.

.. _read_requests:
    
Read Requests
-------------
The number of read requests per second. Monitoring the number of
requests over a given time period can give you an idea of system read workload
and usage patterns.

.. _read_request_latency:
    
Read Request Latency
--------------------
The response time (in milliseconds) for successful read requests. The time
period starts when a node receives a client read request, and ends when the
node responds back to the client. Optimal or acceptable levels of read latency
vary widely according to your hardware, your network, and the nature of your
application read patterns. For example, the use of secondary indexes, the size
of the data being requested, and the consistency level required by the client
can all impact read latency. An increase in read latency can signal I/O
contention. Reads can slow down when rows are fragmented across many SSTables
and compaction cannot keep up with the write load.

.. _jvm_memory_usage:
    
Cassandra JVM Memory Usage
--------------------------
The average amount of Java heap memory (in megabytes) being used by
Cassandra processes. By default, Cassandra starts the JVM with a heap size that
is half of available system memory, which still leaves an optimal amount of
memory for the OS disk cache. You may need to increase the amount of
heap memory if you have increased column family memtable or cache sizes and are
getting out-of-memory errors. If you monitor Cassandra Java processes with an
OS tool such as top, you may notice the total amount of memory in use exceeds
the maximum amount specified for the Java heap. This is because Java allocates
memory for other things besides the heap. It is not unusual for the total
memory consumption of the JVM to exceed the maximum value of heap memory.

.. _jvm-collection-count:

JVM CMS Collection Count
-------------------------
The number of concurrent mark-sweep (CMS) garbage collections performed by the JVM per second. These are large, resource-intensive collections. Typically, the collections occur every 5 to 30 seconds.
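
Because this metric is a per-second rate, the typical interval quoted above
shows up on the graph as a fraction. The sketch below converts an interval
between collections into a per-second rate; the function name is hypothetical
and used only for this example.

.. code-block:: python

    def collections_per_second(interval_seconds):
        """Convert the time between collections into a per-second rate."""
        if interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")
        return 1.0 / interval_seconds


    # One collection every 5 seconds and one every 30 seconds.
    print(collections_per_second(5))
    print(round(collections_per_second(30), 3))

Under these assumptions, a collection every 5 to 30 seconds corresponds to a
rate of roughly 0.03 to 0.2 collections per second.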

.. _jvm-collection-time:

JVM CMS Collection Time
-------------------------
The time spent on CMS garbage collection, in milliseconds per second (ms/sec).

.. note::
   The ms/sec unit is the number of milliseconds spent on garbage collection during each second that passes. For example, a value of 1 ms/sec (0.001 sec out of every second) means that 0.1% of the time is spent on garbage collection.
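
The conversion from ms/sec to a percentage of elapsed time can be sketched as
follows. The function name is hypothetical and used only for this example.

.. code-block:: python

    def gc_time_percent(ms_per_sec):
        """Convert GC time in ms/sec to a percentage of elapsed time.

        There are 1000 ms in a second, so percent = ms_per_sec / 1000 * 100,
        which simplifies to ms_per_sec / 10.
        """
        return ms_per_sec / 10.0


    print(gc_time_percent(1))    # 1 ms/sec
    print(gc_time_percent(50))   # 50 ms/sec

For instance, 50 ms/sec works out to 5 percent of each second being spent on
garbage collection.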

.. _jvm-parnew-collection-count:

JVM ParNew Collection Count
----------------------------
The number of parallel new-generation garbage collections performed by the JVM per second. These are small and not resource intensive. Normally, these collections occur several times per second under load.

.. _jvm-parnew-collection-time:

JVM ParNew Collection Time
---------------------------
The time spent performing ParNew garbage collections in ms/sec. The rest of the JVM is paused during ParNew garbage collection. A serious performance hit can result from spending a significant fraction of time on ParNew collections.

.. _data_size:
    
Data Size
---------
The size of column family data (in gigabytes) that has been
loaded/inserted into Cassandra, including any storage overhead and system
metadata. DataStax recommends that data size not exceed 70 percent of total
disk capacity to allow free space for maintenance operations such as compaction
and repair.
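
A minimal sketch of checking the 70 percent guideline is shown below. The
function name and the idea of passing in sizes by hand are assumptions for
illustration; in practice you would read the data size from the graph and the
capacity from your own disk inventory.

.. code-block:: python

    def exceeds_recommended_limit(data_size_gb, disk_capacity_gb, limit=0.70):
        """Return True if data size is above `limit` of total disk capacity.

        Hypothetical helper; the default limit mirrors the 70 percent
        guideline described above.
        """
        if disk_capacity_gb <= 0:
            raise ValueError("disk_capacity_gb must be positive")
        return data_size_gb / disk_capacity_gb > limit


    print(exceeds_recommended_limit(300, 500))  # 60 percent of capacity
    print(exceeds_recommended_limit(400, 500))  # 80 percent of capacity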

.. _total_bytes_compacted:

Total Bytes Compacted
----------------------
The amount of SSTable data compacted, in bytes per second.

.. _total_compactions:

Total Compactions
-------------------
The number of compactions (minor or major) performed per second.