.. _cf_metrics:

Column family performance metrics
=================================
Column family metrics allow you to drill down and locate specific areas of your
application workloads that are the source of performance issues. If you notice
a performance trend at the OS or cluster level, viewing column family metrics
can provide a more granular level of detail.

The metrics for KeyCache Hits, RowCache Hits and SSTable Size can only be
viewed on a single column family at a time. Otherwise, all column family
metrics are available for specific column families as well as for all column
families on a node.

In addition to monitoring read latency, write latency and load on a column
family, you should also monitor the hit rates on the key and row caches for
column families that rely on caching for performance. The more requests that
are served from the cache, the better response times will be.

OpsCenter 3.0 and later has been optimized to handle thousands of column
families efficiently. If a column family experiences a dramatic dip in
performance, check the Pending Tasks metric for a backlog of queued operations.

Viewing SSTable Size and SSTable Count for a specific column family (or counts
for all families) can help with compaction tuning.

.. _cf_local_writes:

CF: Local Writes
----------------
The write load on a column family measured in requests per second. This
metric includes all writes to a given column family, including write requests
forwarded from other nodes. This metric can be useful for tracking usage
patterns of your application.

.. _cf_local_write_latency:
    
CF: Local Write Latency
-----------------------
The response time in milliseconds for successful write requests on a
column family. The time period starts when nodes receive a write request, and
ends when nodes respond. Optimal or acceptable levels of write latency vary
widely according to your hardware, your network, and the nature of your write
load. For example, the performance for a write load consisting largely of
granular data at low consistency levels would be evaluated differently from a
load of large strings written at high consistency levels.

.. _cf_local_reads:
    
CF: Local Reads
---------------
The read load on a column family measured in requests per second. This
metric includes all reads to a given column family, including read requests
forwarded from other nodes. This metric can be useful for tracking usage
patterns of your application.

.. _cf_local_read_latency:
    
CF: Local Read Latency
----------------------
The response time in milliseconds for successful reads on a column
family. The time period starts when a node receives a read request, and ends
when the node responds. Optimal or acceptable levels of read latency vary
widely according to your hardware, your network, and the nature of your
application read patterns. For example, the use of secondary indexes, the size
of the data being requested, and the consistency level required by the client
can all impact read latency. An increase in read latency can signal I/O
contention. Reads can slow down when rows are fragmented across many SSTables
and compaction cannot keep up with the write load.

.. _cf_keycache_requests:
    
CF: KeyCache Requests
---------------------
The total number of read requests on the row key cache.

.. _cf_keycache_hits:
    
CF: KeyCache Hits
-----------------
The number of read requests that resulted in the requested row key being found
in the key cache.

.. _cf_keycache_hit_rate:
    
CF: KeyCache Hit Rate
---------------------
The percentage of key cache requests that resulted in a cache hit. This metric
indicates the effectiveness of the key cache for a given column family. The key
cache is used to find the exact location of a row on disk.
If a row is not in the key cache, a read operation will populate the key cache
after accessing the row on disk so subsequent reads of the row can benefit.
Each hit on a key cache can save one disk seek per SSTable. If the hits line
tracks close to the requests line, the column family is benefiting from
caching. If the hits fall far below the request rate, this suggests that you
could take actions to improve the performance benefit provided by the key
cache, such as adjusting the number of keys cached.
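
As a rough illustration of how the hit rate relates to the hits and requests
metrics, the following Python sketch computes the percentage from two sampled
counter values. The function name and the sample numbers are hypothetical and
are not part of OpsCenter or Cassandra. The same arithmetic applies to the row
cache.

.. code-block:: python

    def cache_hit_rate(hits, requests):
        """Return the hit rate as a percentage, or None if there were no requests."""
        if requests <= 0:
            # No cache lookups in the sample window, so the rate is undefined.
            return None
        return 100.0 * hits / requests


    # Hypothetical (hits, requests) samples for three column families.
    samples = {
        "users": (950, 1000),
        "events": (120, 1000),
        "archive": (0, 0),
    }

    for name, (hits, requests) in samples.items():
        rate = cache_hit_rate(hits, requests)
        label = "n/a" if rate is None else f"{rate:.1f}%"
        print(f"{name}: {label}")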

.. _cf_rowcache_requests:
    
CF: RowCache Requests
---------------------
The total number of read requests on the row cache. This metric is only
meaningful for column families with row caching configured (it is not enabled
by default).

.. _cf_rowcache_hits:
    
CF: RowCache Hits
-----------------
The number of read requests that resulted in the read being satisfied from the
row cache. This metric is only meaningful for column families with row caching
configured (it is not enabled by default).

.. _cf_rowcache_hit_rate:
    
CF: RowCache Hit Rate
---------------------
The percentage of row cache requests that resulted in a cache hit. This metric
indicates the effectiveness of the row cache for a given column family. It is
only meaningful for column families with row caching
configured (it is not enabled by default). The graph tracks the number of read
requests in relationship to the number of row cache hits. If the hits line
tracks close to the requests line, the column family is benefiting from
caching. If the hits fall far below the request rate, this suggests that you
could take actions to improve the performance benefit provided by the row
cache, such as adjusting the number of rows cached or modifying your data model
to isolate high-demand rows.

.. _cf_sstable_size:
    
CF: SSTable Size
----------------
The current size of the SSTables for a column family. It is expected
that SSTable size will grow over time with your write load, as compaction
processes continue doubling the size of SSTables. Using this metric together
with SSTable count, you can monitor the current state of compaction for a given
column family. Viewing these patterns can be helpful if you are considering
reconfiguring compaction settings to mitigate I/O contention.

.. _cf_sstable_count:
    
CF: SSTable Count
-----------------
The current number of SSTables for a column family. When column family
memtables are persisted to disk as SSTables, this metric increases to the
configured maximum before the compaction cycle is repeated. Using this metric
together with SSTable size, you can monitor the current state of compaction for
a given column family. Viewing these patterns can be helpful if you are
considering reconfiguring compaction settings to mitigate I/O contention.

.. _cf_pending_ops:
    
CF: Pending Reads/Writes
------------------------
The number of pending reads and writes on a column family. Pending
operations are an indication that Cassandra is not keeping up with the
workload. A value of zero indicates healthy throughput. If out-of-memory events
become an issue in your Cassandra cluster, it may help to check cluster-wide
pending tasks for operations that may be clogging throughput.

.. _cf_bloom_filter_space_used:

CF: Bloom Filter Space Used
---------------------------
The size of the bloom filter files on disk. The size grows with the number of
rows in a column family and is tunable through the per-column-family attribute
``bloom_filter_fp_chance``. Increasing the value of this attribute shrinks the
bloom filters at the expense of a higher number of false positives. Cassandra
reads the bloom filter files and stores them on the heap, so large bloom
filters can be expensive in terms of memory consumption.
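
The trade-off between ``bloom_filter_fp_chance`` and filter size can be
approximated with the textbook sizing formula for an optimally configured bloom
filter: roughly ``-ln(p) / (ln 2)^2`` bits per row for a false positive chance
``p``. The Python sketch below assumes that formula. Cassandra's actual on-disk
and on-heap sizes may differ, and the function names and row count are
hypothetical.

.. code-block:: python

    import math


    def bloom_bits_per_row(fp_chance):
        """Theoretical bits per row for an optimally sized bloom filter."""
        if not 0.0 < fp_chance < 1.0:
            raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
        return -math.log(fp_chance) / (math.log(2) ** 2)


    def estimated_bloom_filter_bytes(row_count, fp_chance):
        """Estimate the filter size in bytes under the textbook formula."""
        return math.ceil(row_count * bloom_bits_per_row(fp_chance) / 8)


    # Hypothetical column family with 100 million rows.
    rows = 100_000_000
    for chance in (0.001, 0.01, 0.1):
        size_mb = estimated_bloom_filter_bytes(rows, chance) / (1024 * 1024)
        print(f"fp_chance={chance}: about {size_mb:.0f} MB")

Under this approximation, raising the false positive chance by a factor of ten
removes a fixed number of bits per row, which is why larger values produce
noticeably smaller filters.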

.. note:: Bloom filters are used to avoid going to disk to try to read rows that do not actually exist.

.. _cf_bloom_filter_false_positives:

CF: Bloom Filter False Positives
--------------------------------
The absolute number of bloom filter false positives. A false positive occurs
when the bloom filter reports that a row exists, but the row does not actually
exist.

.. _cf_bloom_filter_false_positive_ratio:

CF: Bloom Filter False Positive Ratio
-------------------------------------
The fraction of all bloom filter checks resulting in a false positive. This
should normally be at or below 0.01. A higher reading indicates that the bloom
filter is likely too small.
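
The ratio described above can be sketched in Python as false positives divided
by total bloom filter checks, compared against the 0.01 guideline. The function
names and the sample counts are hypothetical and do not correspond to an
OpsCenter or Cassandra API.

.. code-block:: python

    def bloom_false_positive_ratio(false_positives, total_checks):
        """Return the fraction of checks that were false positives, or None."""
        if total_checks <= 0:
            # No bloom filter checks in the sample window.
            return None
        return false_positives / total_checks


    def filter_looks_undersized(ratio, threshold=0.01):
        """Flag a ratio above the guideline value stated in this section."""
        return ratio is not None and ratio > threshold


    # Hypothetical (false positives, total checks) samples.
    for false_positives, checks in [(5, 1000), (50, 1000)]:
        ratio = bloom_false_positive_ratio(false_positives, checks)
        print(ratio, filter_looks_undersized(ratio))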
