# Base Metrics

Contents

* [Base Metrics](/agents-guide/the-hosted-graphite-agent/base-metrics.md)
  * [Load](#load)
    * [CPU utilization](#cpu-utilization)
    * [Load average](#load-average)
    * [Processes](#processes)
  * [Memory](#memory)
    * [Activity](#activity)
    * [Virtual memory](#virtual-memory)
    * [Writeback](#writeback)
  * [Disk](#disk)
    * [iostat](#iostat)
    * [Capacity](#capacity)
  * [Network](#network)
    * [Interfaces](#interfaces)
    * [Sockets](#sockets)

We focus on the default “base” dashboard and also provide notes on related metrics not displayed there.

As Diamond collectors rely heavily on `/proc` data, many of the notes below are drawn from Linux kernel documentation, e.g. [proc.txt](https://www.kernel.org/doc/Documentation/filesystems/proc.txt).

We list metric units - percentage, count, bytes, etc. - in brackets after each metric description.

If you find anything unclear or incorrect here, please [let us know](mailto:help%40hostedgraphite.com)!

### [Load](#load)

#### [CPU utilization](#cpu-utilization)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_cpu.png" alt=""><figcaption><p>User &#x26; system CPU graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.cpu.cpuid.*
```

and represent percentages of time each *`cpuid`* spends in particular states.

We display two of the most interesting on the dashboard:

* `user`: normal processes executing in user mode (percentage);
* `system`: processes executing in kernel mode (percentage).

Others you can use in your own graphs or investigations:

* `nice`: niced processes executing in user mode (percentage);
* `idle`: nothing to do (percentage);
* `iowait`: waiting for I/O to complete; not really reliable, see the note in [proc.txt](https://www.kernel.org/doc/Documentation/filesystems/proc.txt) (percentage);
* `irq`: servicing interrupts (percentage);
* `softirq`: servicing software interrupts (percentage);
* `steal`: executing other virtual hosts (percentage);
* `guest`: running a normal virtual guest (percentage);
* `guest_nice`: running a niced virtual guest (percentage).
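
As a rough illustration of where such percentages come from, the sketch below turns two samples of a `/proc/stat` CPU line into per-state percentages. The sample values and the helper names `parse_cpu_line` and `cpu_percentages` are made up for this example; it is not necessarily how the agent computes its values.

```python
# Field order of the per-CPU lines in /proc/stat (see proc.txt).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")

# The kernel already counts guest time inside user and nice, so these two
# states are left out of the total to avoid counting them twice.
COUNTED_TWICE = ("guest", "guest_nice")


def parse_cpu_line(line):
    """Turn a line such as 'cpu0 4705 150 1120 ...' into (cpuid, {state: ticks})."""
    name, *values = line.split()
    return name, dict(zip(FIELDS, map(int, values)))


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(t for state, t in deltas.items() if state not in COUNTED_TWICE)
    if total == 0:
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * ticks / total for state, ticks in deltas.items()}


# Two hypothetical samples of the same CPU, taken a few seconds apart.
_, first = parse_cpu_line("cpu0 1000 0 500 8000 100 0 0 0 0 0")
_, second = parse_cpu_line("cpu0 1300 0 600 8500 200 0 0 0 0 0")
usage = cpu_percentages(first, second)
```

With these samples, 300 of the 1000 elapsed ticks were spent in `user` and 100 in `system`.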

#### [Load average](#load-average)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_loadavg.png" alt=""><figcaption><p>Load average graph</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.loadavg.*
```

Load average, [roughly speaking](https://prutser.wordpress.com/2012/05/28/understanding-linux-load-average-part-3/), is the average number of tasks waiting with “something to do” over a period of time:

* `01`: 1-minute load average (count);
* `05`: 5-minute load average (count);
* `15`: 15-minute load average (count).

Since the interpretation of load average is [affected](http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages) by the number of cores a machine has, you might like to use these “normalized” versions in your own graphs or investigations:

* `01_normalized`: 1-minute load average normalized by #cores (count);
* `05_normalized`: 5-minute load average normalized by #cores (count);
* `15_normalized`: 15-minute load average normalized by #cores (count).
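
The normalization itself is a simple division. A minimal sketch, with a hypothetical helper name and made-up numbers:

```python
def normalize_load(load, cores):
    """Divide a load average by the core count.

    A result near 1.0 roughly means there is about one runnable task per core.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


# Hypothetical reading: a 1-minute load of 6.0 on an 8-core machine.
normalized = normalize_load(6.0, 8)

# On a live Unix system the inputs could come from the standard library:
#   import os
#   one, five, fifteen = os.getloadavg()
#   cores = os.cpu_count()
```

A raw load of 6.0 looks alarming on a 2-core machine but is comfortable on 8 cores, which is why the normalized value is often easier to alert on.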

#### [Processes](#processes)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_processes.png" alt=""><figcaption><p>Processes graph</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.loadavg.*
```

These are simple “snapshot” counts of processes. Note that the number running is typically capped at the number of cores.

* `processes_total`: total number of processes on the system (count);
* `processes_running`: number of processes running (count).
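
On Linux, `/proc/loadavg` carries both the load averages and a `running/total` pair, which may be why these counters share the `loadavg` prefix. The sketch below parses a hypothetical line of that file; the function name is illustrative and the agent's own parsing may differ.

```python
def parse_loadavg(text):
    """Split the single line of /proc/loadavg into named values."""
    one, five, fifteen, entities, _last_pid = text.split()
    running, total = entities.split("/")
    return {
        "01": float(one),
        "05": float(five),
        "15": float(fifteen),
        "processes_running": int(running),
        "processes_total": int(total),
    }


# Hypothetical file contents.
sample = "0.20 0.18 0.12 1/80 11206\n"
values = parse_loadavg(sample)
```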

### [Memory](#memory)

#### [Activity](#activity)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_memory.png" alt=""><figcaption><p>Memory and swap graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.memory.*
```

In the “memory activity” graph, we display some of the metrics most relevant to physical memory usage:

* `MemTotal`: total usable ram, i.e. physical ram minus a few reserved bits and the kernel binary code (bytes);
* `MemAvailable`: an estimate of how much memory is available for starting new applications; requires kernel 3.14 or later (bytes);
* `Active`: memory used recently, usually not reclaimed unless absolutely necessary (bytes);
* `Cached`: in-memory cache for files read from the disk, i.e. the pagecache (bytes).

And “swap activity” displays:

* `SwapTotal`: the total amount of swap space configured (bytes);
* `SwapFree`: the amount of swap space available for use (bytes).

There are several other metrics available under `memory.*`. If you’re digging further, you can find out what they mean in the [docs for /proc/meminfo](https://www.kernel.org/doc/Documentation/filesystems/proc.txt).
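
For reference, `/proc/meminfo` reports sizes in kB, so values need scaling to match the byte units listed above. A minimal parsing sketch, using made-up sample contents and a hypothetical `parse_meminfo` helper:

```python
def parse_meminfo(text):
    """Map each /proc/meminfo key to a size in bytes (the file reports kB)."""
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Lines with a 'kB' suffix are sizes; others (e.g. HugePages_Total) are counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[key.strip()] = value
    return result


# Hypothetical excerpt of /proc/meminfo.
sample = """\
MemTotal:       16384000 kB
MemAvailable:    8192000 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Dirty:               120 kB
"""
mem = parse_meminfo(sample)
used_fraction = 1 - mem["MemAvailable"] / mem["MemTotal"]
swap_used = mem["SwapTotal"] - mem["SwapFree"]
```

Deriving "used" figures from `MemAvailable` and `SwapFree` like this is one common way to build an alerting threshold from the graphed metrics.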

#### [Virtual memory](#virtual-memory)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_vmstat.png" alt=""><figcaption><p>vmstat graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.vmstat.*
```

These are metrics from `/proc/vmstat` and give some insight into the activity of the Linux virtual memory system. Unfortunately, the counters are [a little underdocumented](https://access.redhat.com/solutions/1160343).

First, pages in and out:

* `pgpgin`: pages brought in from disk (count);
* `pgpgout`: pages written out to disk (count).

Note that because everything goes through the [page cache](https://en.wikipedia.org/wiki/Page_cache), these are recorded for essentially all pages read from or written to disk, so if you’re doing a lot of IO they’ll be elevated.

Next, [swap usage](https://wiki.archlinux.org/index.php/swap), which you generally want to keep low or nonexistent. See [this article](http://www.linuxjournal.com/article/8178) for more information.

* `pswpin`: pages brought in from swap space (count);
* `pswpout`: pages swapped out into swap space (count).

Finally, [page faults](https://en.wikipedia.org/wiki/Page_fault) made by the virtual memory system to page memory into process address spaces:

* `pgfault`: page faults of all kinds, both [minor](https://en.wikipedia.org/wiki/Page_fault#Minor) and major (count);
* `pgmajfault`: [major](https://en.wikipedia.org/wiki/Page_fault#Major) page faults (count).

Note that page faults will stimulate paging in, so you can expect these to correlate.
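
The raw counters in `/proc/vmstat` are cumulative since boot, so activity over an interval is the difference between two snapshots. The sketch below shows that calculation on made-up values; the helper names are hypothetical, and this page does not state whether the agent reports raw counts or rates.

```python
def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat into a dict of ints."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {name: int(value) for name, value in pairs}


def rates(before, after, seconds):
    """Per-second change of each counter present in both snapshots."""
    return {name: (after[name] - before[name]) / seconds
            for name in before if name in after}


# Hypothetical snapshots taken 10 seconds apart.
first = parse_vmstat(
    "pgpgin 1000\npgpgout 5000\npswpin 0\npswpout 0\npgfault 90000\npgmajfault 40\n")
second = parse_vmstat(
    "pgpgin 1500\npgpgout 5200\npswpin 0\npswpout 0\npgfault 91000\npgmajfault 50\n")
per_second = rates(first, second, 10)
```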

#### [Writeback](#writeback)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_writeback.png" alt=""><figcaption><p>Memory writeback graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.memory.*
```

* `Dirty`: memory waiting to be written back to disk (bytes).

When you change disk-backed memory in the page cache, it’s not written to disk immediately, just marked as “dirty”. This graph lets you see how much dirty memory builds up and is written back over time.

### [Disk](#disk)

#### [iostat](#iostat)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_diskthru.png" alt=""><figcaption><p>iostat graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.iostat.*
```

These metrics are per-disk, and are gathered from [/proc/diskstats](https://www.kernel.org/doc/Documentation/iostats.txt).

* `iops`: “I/O operations per second”, i.e. `reads` + `writes` (count);
* `write_byte_per_second`: bytes written per second (bytes per second);
* `read_byte_per_second`: bytes read per second (bytes per second);
* `util_percentage`: how much of the time the disk is performing I/O operations (percentage).

There are many other `iostat` metrics exported per disk; you can browse your metric tree to see which ones are available and compare them with [/proc/diskstats](https://www.kernel.org/doc/Documentation/iostats.txt) and [the ‘diskusage’ Diamond collector](https://github.com/python-diamond/Diamond/blob/master/src/collectors/diskusage/diskusage.py).
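
To make the relationship to `/proc/diskstats` concrete, here is a sketch that derives the four graphed values from two snapshots of one disk's line. The sample lines and helper names are invented for illustration, and the collector's actual arithmetic may differ in detail.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats_line(line):
    """Pick the counters needed here out of one /proc/diskstats line."""
    f = line.split()
    return {
        "device": f[2],
        "reads": int(f[3]),             # reads completed
        "sectors_read": int(f[5]),
        "writes": int(f[7]),            # writes completed
        "sectors_written": int(f[9]),
        "io_ms": int(f[12]),            # milliseconds spent doing I/O
    }


def disk_rates(before, after, seconds):
    """Derive per-second rates and utilization between two snapshots."""
    def delta(key):
        return after[key] - before[key]

    return {
        "iops": (delta("reads") + delta("writes")) / seconds,
        "read_byte_per_second": delta("sectors_read") * SECTOR_BYTES / seconds,
        "write_byte_per_second": delta("sectors_written") * SECTOR_BYTES / seconds,
        "util_percentage": 100.0 * delta("io_ms") / (seconds * 1000),
    }


# Hypothetical snapshots of the same disk, 5 seconds apart.
first = parse_diskstats_line("8 0 sda 1000 10 20000 500 2000 20 40000 900 0 700 1400")
second = parse_diskstats_line("8 0 sda 1100 10 22000 550 2400 20 48000 1000 0 1200 1600")
result = disk_rates(first, second, 5)
```

In this example the disk spent 500 ms of a 5000 ms window doing I/O, giving a utilization of 10%.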

#### [Capacity](#capacity)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_diskavail.png" alt=""><figcaption><p>Disk capacity graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.diskspace.*
```

Again, these metrics are per-disk.

* `byte_avail`: available bytes, i.e. space available for use by non-privileged users (bytes).

Beyond this graphed value, several others are available to you:

* `byte_free`: available bytes for the superuser (bytes);
* `byte_percentfree`: `byte_free` as a percentage of the total (percentage);
* `byte_used`: bytes used (bytes);
* `inodes_avail`: available inodes for use by non-privileged users (count);
* `inodes_free`: available inodes for the superuser (count);
* `inodes_percentfree`: `inodes_free` as a percentage of the total (percentage);
* `inodes_used`: inodes used (count).
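
The difference between "avail" and "free" mirrors the `f_bavail` and `f_bfree` fields of `statvfs`. The sketch below computes the byte-level figures from such block counts; the `capacity` helper and the filesystem numbers are hypothetical, and the formulas follow the descriptions above rather than the collector's source.

```python
def capacity(frsize, blocks, bfree, bavail):
    """Byte-level capacity figures from statvfs-style block counts."""
    total = blocks * frsize
    free = bfree * frsize               # includes blocks reserved for the superuser
    return {
        "byte_avail": bavail * frsize,  # usable by non-privileged users
        "byte_free": free,
        "byte_used": total - free,
        "byte_percentfree": 100.0 * free / total if total else 0.0,
    }


# Hypothetical filesystem: 4096-byte blocks, 1,000,000 blocks, 50,000 reserved.
figures = capacity(frsize=4096, blocks=1_000_000, bfree=400_000, bavail=350_000)

# On a live Unix system the inputs could come from os.statvfs:
#   import os
#   st = os.statvfs("/")
#   capacity(st.f_frsize, st.f_blocks, st.f_bfree, st.f_bavail)
```

The inode metrics follow the same pattern using `f_files`, `f_ffree`, and `f_favail`.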

### [Network](#network)

#### [Interfaces](#interfaces)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_interfaces.png" alt=""><figcaption><p>Network interface graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.network.*
```

These metrics are per-interface. We graph the following:

* `tx_packets`, `rx_packets`: packets transmitted, received (count);
* `tx_byte`, `rx_byte`: bytes transmitted, received (bytes);
* `tx_drop`, `rx_drop`: packets dropped by the driver on transmit, receive (count).

There are many other network metrics exported per interface; you can browse your metric tree to see which ones are available and compare them with [/proc/net/dev](http://www.onlamp.com/pub/a/linux/2000/11/16/LinuxAdmin.html), which is fairly self-explanatory, and [the ‘network’ Diamond collector](https://github.com/python-diamond/Diamond/blob/master/src/collectors/network/network.py).
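
For orientation, the sketch below parses a made-up `/proc/net/dev` listing into per-interface counters. The keys here follow the column names in the file's header (`rx_bytes`, `tx_drop`, and so on), which differ slightly from the agent's metric names such as `rx_byte`; the `parse_net_dev` helper is hypothetical.

```python
RX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop",
             "fifo", "colls", "carrier", "compressed")


def parse_net_dev(text):
    """Return {interface: {'rx_bytes': ..., 'tx_drop': ...}} from /proc/net/dev."""
    interfaces = {}
    for line in text.splitlines()[2:]:      # skip the two header lines
        name, _, counters = line.partition(":")
        values = [int(v) for v in counters.split()]
        if len(values) < 16:
            continue
        stats = {"rx_" + f: v for f, v in zip(RX_FIELDS, values[:8])}
        stats.update({"tx_" + f: v for f, v in zip(TX_FIELDS, values[8:16])})
        interfaces[name.strip()] = stats
    return interfaces


# Hypothetical file contents.
sample = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    0    3    0     0          0         0  2000000    3000    0    1    0     0       0          0
"""
net = parse_net_dev(sample)
```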

#### [Sockets](#sockets)

<figure><img src="https://www.hostedgraphite.com/docs/_images/base_sockets.png" alt=""><figcaption><p>Socket graphs</p></figcaption></figure>

These metrics are found under:

```
hg_agent.hostname.sockets.*
```

They’re drawn from `/proc/net/sockstat`, which is under-documented.

* `used`: total number of sockets [in kernel socket lists](http://elixir.free-electrons.com/linux/latest/source/net/socket.c#L169) (count);
* `tcp_inuse`: TCP sockets currently in use (count);
* `udp_inuse`: UDP sockets currently in use (count).

Others you can use in your own graphs or investigations:

* `tcp_mem`: the number of [pages](http://blog.tsunanet.net/2011/03/out-of-socket-memory.html) in use for TCP (count);
* `udp_mem`: the same for UDP (count);
* `tcp_alloc`: number of sockets allocated for TCP (count);
* `tcp_orphan`: sockets [not associated to file descriptors](http://blog.tsunanet.net/2011/03/out-of-socket-memory.html) (count);
* `tcp_tw`: sockets in `TIME_WAIT`, i.e. waiting after closing to handle packets still in the network (count).
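
Since the file is under-documented, a small parsing sketch may help show how its lines relate to the metric names. The sample contents are invented, and the key-naming scheme (`<protocol>_<field>`, with the `sockets:` line left unprefixed) is an assumption chosen to match the names listed above.

```python
def parse_sockstat(text):
    """Flatten /proc/net/sockstat into keys such as 'tcp_inuse' and 'tcp_tw'."""
    result = {}
    for line in text.splitlines():
        proto, _, rest = line.partition(":")
        tokens = rest.split()
        # Tokens alternate name, value: "inuse 8 orphan 0 ..."
        for name, value in zip(tokens[0::2], tokens[1::2]):
            key = name if proto == "sockets" else f"{proto.lower()}_{name}"
            result[key] = int(value)
    return result


# Hypothetical file contents.
sample = """\
sockets: used 285
TCP: inuse 8 orphan 0 tw 2 alloc 12 mem 1
UDP: inuse 4 mem 2
"""
stats = parse_sockstat(sample)
```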

