.. _pending_metrics:

Pending task metrics
====================
Pending task metrics track requests that have been received by a node, but are
waiting to be processed. An accumulation of pending tasks on a node can
indicate a potential bottleneck in performance and should be investigated.

Cassandra maintains distinct thread pools for different stages of execution.
Each of these thread pools provides granular statistics on the number of
pending tasks for that particular stage. Accumulating pending tasks indicate a
cluster that is not keeping up with its workload. Essentially, pending tasks
mean that work is backing up, which is usually caused by a
lack of (or failure of) cluster resources such as disk bandwidth, network
bandwidth or memory.
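
As a minimal sketch of how these per-pool counts might be checked, the
following Python example scans text shaped like the output of
``nodetool tpstats`` and reports every pool with a non-zero pending count. The
sample text is illustrative only (made-up values and a simplified column
layout), and ``pools_with_pending`` is a hypothetical helper name, not part of
Cassandra.

.. code-block:: python

    def pools_with_pending(tpstats_text):
        """Return {pool_name: pending} for every pool reporting pending tasks."""
        backlog = {}
        for line in tpstats_text.splitlines():
            fields = line.split()
            # Assumed row layout: name, active, pending, completed, ...
            # Header and blank lines fail the digit checks and are skipped.
            if len(fields) < 4:
                continue
            if not (fields[1].isdigit() and fields[2].isdigit()):
                continue
            pending = int(fields[2])
            if pending > 0:
                backlog[fields[0]] = pending
        return backlog


    # Illustrative sample only; real output has more columns and pools.
    SAMPLE = """\
    Pool Name                    Active   Pending      Completed
    ReadStage                         2        17         104528
    MutationStage                     0         0         982311
    FlushWriter                       1         3            412
    GossipStage                       0         0          55120
    """

    print(pools_with_pending(SAMPLE))

With this sample, only the pools whose pending column is above zero are
returned, which narrows an investigation to the stages that are backing up.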

Pending Task Metrics for Writes
-------------------------------
Pending tasks for the following metrics indicate that write requests are
arriving faster than they can be handled.

.. _flushes_pending:

Flushes Pending
^^^^^^^^^^^^^^^
The flush process flushes memtables to disk as SSTables. This metric shows the
number of memtables queued for the flush process. The optimal number of pending
flushes is 0 (or at most a very small number). A value greater than 0 indicates
either I/O contention or degrading disk performance (see disk metrics such as
disk latency, disk throughput, and disk utilization for indications of disk
health).

.. _flush_sorter_tasks_pending:
    
Flush Sorter Tasks Pending
^^^^^^^^^^^^^^^^^^^^^^^^^^
The flush sorter process performs the first step in the overall process of
flushing memtables to disk as SSTables.

.. _memtable_post_flushers_pending:
    
Memtable Post Flushers Pending
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The memtable post flush process performs the final step in the overall process
of flushing memtables to disk as SSTables.

.. _write_operations_pending:
    
Write Requests Pending
^^^^^^^^^^^^^^^^^^^^^^
The number of write requests that have arrived at the cluster but are
waiting to be handled. During low or moderate write load, you should see 0
pending write operations (or at most a very low number). A continuously high
number of pending writes signals a need to add capacity to your cluster or to
investigate disk I/O contention.
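
The difference between a brief spike and a continuously high backlog can be
made concrete with a small sketch. The Python example below assumes you have
already collected periodic samples of a pending count. The function name, the
threshold, and the run length are all hypothetical values chosen for
illustration, not recommendations.

.. code-block:: python

    def is_sustained_backlog(samples, threshold=10, min_consecutive=5):
        """Return True if `samples` stays above `threshold` for at least
        `min_consecutive` readings in a row.

        Both defaults are illustrative assumptions; tune them to your own
        workload and sampling interval.
        """
        run = 0
        for value in samples:
            # Extend the current run, or reset it when the backlog clears.
            run = run + 1 if value > threshold else 0
            if run >= min_consecutive:
                return True
        return False


    spike = [0, 40, 0, 0, 2, 0]                # one short burst
    sustained = [0, 12, 15, 30, 22, 18, 25]    # stays high

    print(is_sustained_backlog(spike))
    print(is_sustained_backlog(sustained))

Requiring several consecutive high readings is one simple way to avoid
reacting to a momentary burst of requests.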

.. _replicate_on_write_pending:
    
Replicate on Write Tasks Pending
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When an insert or update to a row is written, the affected row is replicated to
all other nodes that manage a replica for that row. This is called the
ReplicateOnWriteStage. This metric tracks the pending tasks related to this
stage of the write process. During low or moderate write load, you should see 0
pending replicate on write tasks (or at most a very low number). A
continuously high number signals a need to investigate disk I/O or network
contention problems.

Pending Task Metrics for Reads
------------------------------
Pending tasks for the following metrics indicate I/O contention, and can
manifest in degrading read performance.

.. _reads_pending:

Read Requests Pending
^^^^^^^^^^^^^^^^^^^^^
The number of read requests that have arrived at the cluster but are
waiting to be handled. During low or moderate read load, you should see 0
pending read operations (or at most a very low number). A continuously high
number of pending reads signals a need to add capacity to your cluster or to
investigate disk I/O contention. Pending reads can also indicate an application
design that is not accessing data in the most efficient way possible.

.. _read_repair_pending:
    
Read Repair Tasks Pending
^^^^^^^^^^^^^^^^^^^^^^^^^
The number of read repair operations that are queued and waiting for
system resources in order to run. The optimal number of pending read repairs is
0 (or at most a very small number). A value greater than 0 indicates that read
repair operations are in I/O contention with other operations. If this graph
shows high values for pending tasks, you may need to run a node repair to make
nodes consistent. Alternatively, for column families that can tolerate a
certain degree of stale data, you can lower the value of the column family
parameter read_repair_chance.

.. _compactions_pending:
    
Compactions Pending
^^^^^^^^^^^^^^^^^^^
An upper bound on the number of compactions that are queued and waiting for
system resources in order to run. This is a worst-case estimate, so the metric
is often misleading and can show an unrealistically high reading. The optimal
number of pending compactions is 0 (or at most a very small number). A value
greater than 0 indicates that read operations are in I/O contention with
compaction operations, which usually manifests itself as declining read
performance. This is usually caused by applications that perform frequent small
writes in combination with a steady stream of reads. If a node or cluster
frequently displays pending compactions, you may need to increase I/O capacity
by adding nodes to the cluster. You can also try to reduce I/O contention by
reducing the number of insert/update requests (for example, by having your
application batch writes), or reduce the number of SSTables created by
increasing the memtable size and the interval between flushes on your column
families.


Pending Task Metrics for Cluster Operations
-------------------------------------------
Pending tasks for the following metrics indicate a backup of cluster
operational processes such as those maintaining node consistency, system
schemas, fault detection, and inter-node communications. Pending tasks for
resource-intensive operations (such as repair, bootstrap or decommission) are
normal and expected while that operation is in progress, but should continue
decreasing at a steady rate in a healthy cluster.

.. _repair_tasks_pending:

Manual Repair Tasks Pending
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The number of operations still to be completed when you run
anti-entropy repair on a node. It will only show values greater than 0 when a
repair is in progress. Repair is a resource-intensive operation that is
executed in stages: comparing data between replicas, sending changed rows to
the replicas that need to be made consistent, deleting expired tombstones, and
rebuilding row indexes and bloom filters. Tracking the state of this metric can
help you determine the progress of a repair operation. It is not unusual to see
a large number of pending tasks when a repair is running, but you should see
the number of tasks progressively decreasing.
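
One way to check that a backlog is progressively decreasing is sketched below
in Python. It assumes you have a list of pending-task readings taken at
regular intervals during the repair. ``is_draining`` and its ``tolerance``
parameter are hypothetical names introduced for this illustration.

.. code-block:: python

    def is_draining(samples, tolerance=0):
        """Return True if the pending count never rises by more than
        `tolerance` between readings and ends lower than it started."""
        if len(samples) < 2:
            return False  # not enough readings to judge a trend
        # Compare each reading with the one that follows it.
        steps_ok = all(
            later <= earlier + tolerance
            for earlier, later in zip(samples, samples[1:])
        )
        return steps_ok and samples[-1] < samples[0]


    healthy = [120, 95, 95, 60, 12, 0]     # repair making steady progress
    stalled = [120, 118, 119, 121, 121]    # backlog is not shrinking

    print(is_draining(healthy))
    print(is_draining(stalled))

A small non-zero ``tolerance`` lets the check ignore minor fluctuations between
readings while still flagging a repair whose backlog is not shrinking.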

.. _gossip_pending:
    
Gossip Tasks Pending
^^^^^^^^^^^^^^^^^^^^
Cassandra uses a protocol called gossip to discover location and state
information about the other nodes participating in a Cassandra cluster. In
Cassandra, the gossip process runs once per second on each node and exchanges
state messages with up to three other nodes in the cluster. Gossip tasks
pending shows the number of gossip messages and acknowledgments queued and
waiting to be sent or received. The optimal number of pending gossip tasks is 0
(or at most a very small number). A value greater than 0 indicates possible
network problems (see network traffic for indications of network health).
    
.. _hinted_handoff_pending:

Hinted Handoff Pending
^^^^^^^^^^^^^^^^^^^^^^
While a node is offline, other nodes in the cluster will save hints about rows
that were updated during the time the node was unavailable. When a node comes
back online, its corresponding replicas will begin streaming the missed writes
to the node to catch it up. The hinted handoff pending metric tracks the number
of hints that are queued and waiting to be delivered once a failed node is back
online again. High numbers of pending hints are commonly seen when a node is
brought back online after some down time. Viewing this metric can help you
determine when the recovering node has been made consistent again. Hinted
handoff is an optional feature of Cassandra. Hints are saved for a configurable
period of time (an hour by default) before they are dropped. This prevents a
large accumulation of hints caused by extended node outages.

.. _internal_responses_pending:
    
Internal Responses Pending
^^^^^^^^^^^^^^^^^^^^^^^^^^
The number of pending tasks from various internal tasks such as nodes
joining and leaving the cluster.
    
.. _migrations_pending:

Migrations Pending
^^^^^^^^^^^^^^^^^^
The number of pending tasks from system methods that have modified the
schema. Schema updates have to be propagated to all nodes, so pending tasks for
this metric can manifest in schema disagreement errors.

.. _misc_pending:    

Misc. Tasks Pending
^^^^^^^^^^^^^^^^^^^
The number of pending tasks from other miscellaneous operations that
are not run frequently.

.. _request_responses_pending:
    
Request Responses Pending
^^^^^^^^^^^^^^^^^^^^^^^^^
The progress of rows of data being streamed, as seen from the *receiving* node.
Streaming of data between nodes happens during operations such as bootstrap and
decommission, when one node sends large numbers of rows to another node.

.. _streams_pending:
    
Streams Pending
^^^^^^^^^^^^^^^
The progress of rows of data being streamed from the *sending* node. Streaming
of data between nodes happens during operations such as bootstrap and
decommission, when one node sends large numbers of rows to another node.

