Base Metrics

This document describes the base system metrics exported by the Hosted Graphite agent.

Last updated 1 year ago


Contents

We focus on the default “base” dashboard and also provide notes on related metrics not displayed there.

We list metric units - percentage, count, bytes, etc. - in brackets after each metric description.

Load

CPU utilization

These metrics are found under:

hg_agent.hostname.cpu.cpuid.*

and represent percentages of time each cpuid spends in particular states.

We display two of the most interesting on the dashboard:

  • user: normal processes executing in user mode (percentage);

  • system: processes executing in kernel mode (percentage).

Others you can use in your own graphs or investigations:

  • nice: niced processes executing in user mode (percentage);

  • idle: nothing to do (percentage);

  • irq: servicing interrupts (percentage);

  • softirq: servicing software interrupts (percentage);

  • steal: executing other virtual hosts (percentage);

  • guest: running a normal virtual guest (percentage);

  • guest_nice: running a niced virtual guest (percentage).
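As a rough illustration of how per-state percentages like these can be derived, the sketch below compares two samples of cumulative tick counters, such as those on a cpu line of /proc/stat. The function name and the sample numbers are hypothetical, and this is not necessarily how the agent computes its values.

```python
def cpu_state_percentages(prev, curr):
    """Return the percentage of elapsed CPU time spent in each state.

    `prev` and `curr` map state names (user, system, idle, ...) to
    cumulative tick counts taken from the same CPU at two points in time.
    """
    deltas = {state: curr[state] - prev.get(state, 0) for state in curr}
    total = sum(deltas.values())
    if total <= 0:
        # No ticks elapsed between the samples; report zero everywhere.
        return {state: 0.0 for state in deltas}
    return {state: 100.0 * delta / total for state, delta in deltas.items()}


# Two hypothetical samples of one CPU, taken a few seconds apart.
# Guest counters are left out: Linux already includes guest time in user,
# so summing them as well would count that time twice.
before = {"user": 1000, "system": 500, "idle": 8000, "iowait": 500}
after = {"user": 1300, "system": 600, "idle": 8500, "iowait": 600}
percentages = cpu_state_percentages(before, after)
print(percentages)
```

Because each value is a share of the same elapsed interval, the percentages for one cpuid add up to 100.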

Load average

These metrics are found under:

hg_agent.hostname.loadavg.*
  • 01: 1-minute load average (count);

  • 05: 5-minute load average (count);

  • 15: 15-minute load average (count).

  • 01_normalized: 1-minute load average normalized by #cores (count);

  • 05_normalized: 5-minute load average normalized by #cores (count);

  • 15_normalized: 15-minute load average normalized by #cores (count).
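The normalized values are the load averages divided by the number of cores. A minimal sketch of that arithmetic, using hypothetical helper names and Python's standard os module rather than anything taken from the agent:

```python
import os


def normalize_load(load, cores):
    """Divide a load average by the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


def current_normalized_load():
    """Return the 1, 5 and 15-minute load averages divided by core count.

    Uses os.getloadavg(), which is only available on Unix-like systems.
    """
    cores = os.cpu_count() or 1
    return tuple(normalize_load(value, cores) for value in os.getloadavg())


# A load of 2.0 on a 4-core machine means half the cores are busy on average.
print(normalize_load(2.0, 4))
```

A normalized value above 1.0 suggests that, on average, more tasks wanted to run than there were cores to run them.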

Processes

These metrics are found under:

hg_agent.hostname.loadavg.*

These are simple “snapshot” counts of processes at the moment of sampling. Note that the number running is typically capped at the number of cores.

  • processes_total: total number of processes on the system (count);

  • processes_running: number of processes running (count).

Memory

Activity

These metrics are found under:

hg_agent.hostname.memory.*

In the “memory activity” graph, we display some of the metrics most relevant to physical memory usage:

  • MemTotal: total usable RAM, i.e. physical RAM minus a few reserved bits and the kernel binary code (bytes);

  • MemAvailable: an estimate of how much memory is available for starting new applications; requires kernel 3.14 or later (bytes);

  • Active: memory used recently, usually not reclaimed unless absolutely necessary (bytes);

  • Cached: in-memory cache for files read from the disk, i.e. the pagecache (bytes).

And “swap activity” displays:

  • SwapTotal: the total amount of swap space configured (bytes);

  • SwapFree: the amount of swap space available for use (bytes).
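The names above match fields in /proc/meminfo, where most values are reported in kB. The sketch below parses text in that format into byte values and derives swap in use; the function name and the sample figures are hypothetical, and the kB-to-bytes conversion is an assumption about how the byte values are produced.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of values in bytes."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # Entries with a "kB" suffix are converted to bytes; entries
        # without a unit (e.g. HugePages_Total) are plain counts.
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


# Hypothetical sample; on a Linux host, read the text from /proc/meminfo.
sample = """MemTotal:       16384 kB
MemAvailable:    8192 kB
SwapTotal:       2048 kB
SwapFree:        1024 kB
HugePages_Total:    0
"""
meminfo = parse_meminfo(sample)
swap_used = meminfo["SwapTotal"] - meminfo["SwapFree"]
print(swap_used)
```

Subtracting SwapFree from SwapTotal gives the swap currently in use, which is often easier to read on a graph than the two raw series.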

Virtual memory

These metrics are found under:

hg_agent.hostname.vmstat.*

First, pages in and out:

  • pgpgin: pages brought in from disk (count);

  • pgpgout: pages written out to disk (count).

  • pswpin: pages brought in from swap space (count);

  • pswpout: pages swapped out into swap space (count).

Note that page faults will stimulate paging in, so you can expect these to correlate.

Writeback

These metrics are found under:

hg_agent.hostname.memory.*
  • Dirty: memory waiting to be written back to disk (bytes).

When you change disk-backed memory in the page cache, it’s not written to disk immediately, just marked as “dirty”. This graph lets you see how much is building up and being written back over time.

Disk

iostat

These metrics are found under:

hg_agent.hostname.iostat.*
  • iops: “I/O operations per second”, i.e. reads + writes (count);

  • write_byte_per_second: bytes written per second (bytes);

  • read_byte_per_second: bytes read per second (bytes);

  • util_percentage: how much of the time the disk is performing I/O operations (percentage).
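To show how rates like these relate to the cumulative counters in /proc/diskstats, here is a sketch that turns two samples into per-second values. The dictionary keys, function name, and sample numbers are hypothetical, and the agent's own calculation may differ.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors of 512 bytes


def iostat_rates(prev, curr, interval_seconds):
    """Derive per-second disk rates from two cumulative counter samples.

    Each sample maps: reads, writes (completed operations),
    sectors_read, sectors_written, and io_ms (milliseconds spent doing I/O).
    """
    delta = {key: curr[key] - prev[key] for key in curr}
    return {
        "iops": (delta["reads"] + delta["writes"]) / interval_seconds,
        "read_byte_per_second":
            delta["sectors_read"] * SECTOR_BYTES / interval_seconds,
        "write_byte_per_second":
            delta["sectors_written"] * SECTOR_BYTES / interval_seconds,
        # Share of the interval during which the disk was busy.
        "util_percentage":
            100.0 * delta["io_ms"] / (interval_seconds * 1000.0),
    }


# Two hypothetical samples of one disk, taken 10 seconds apart.
first = {"reads": 100, "writes": 200, "sectors_read": 1000,
         "sectors_written": 2000, "io_ms": 5000}
second = {"reads": 150, "writes": 300, "sectors_read": 1200,
          "sectors_written": 2400, "io_ms": 7500}
rates = iostat_rates(first, second, interval_seconds=10)
print(rates)
```

In this sample the disk spent 2.5 of the 10 seconds doing I/O, which is where a util_percentage of 25 comes from.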

Capacity

These metrics are found under:

hg_agent.hostname.diskspace.*

Like the iostat metrics, these metrics are per-disk.

  • byte_avail: available bytes, i.e. space available for use by non-privileged users (bytes).

Apart from this useful graphed value, there are also some more available to you:

  • byte_free: available bytes for the superuser (bytes);

  • byte_percentfree: byte_free as a percentage of the total (percentage);

  • byte_used: bytes used (bytes);

  • inodes_avail: available inodes for use by non-privileged users (count);

  • inodes_free: available inodes for the superuser (count);

  • inodes_percentfree: inodes_free as a percentage of the total (percentage);

  • inodes_used: inodes used (count).
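The difference between the "avail" and "free" values mirrors the f_bavail and f_bfree fields returned by statvfs. The sketch below derives the byte metrics from those fields. The function names are hypothetical, and treating byte_used as total minus free is an assumption, not a statement of the agent's exact formula.

```python
import os


def diskspace_metrics(frsize, blocks, bfree, bavail):
    """Compute byte-level capacity metrics from statvfs-style fields.

    frsize: fragment size in bytes; blocks: total blocks;
    bfree: blocks free for the superuser;
    bavail: blocks free for non-privileged users.
    """
    return {
        "byte_avail": bavail * frsize,
        "byte_free": bfree * frsize,
        "byte_used": (blocks - bfree) * frsize,
        "byte_percentfree": 100.0 * bfree / blocks if blocks else 0.0,
    }


def diskspace_for(path):
    """Read the fields for the filesystem containing `path` (Unix only)."""
    st = os.statvfs(path)
    return diskspace_metrics(st.f_frsize, st.f_blocks, st.f_bfree, st.f_bavail)


# Hypothetical filesystem: 1000 blocks of 4096 bytes, 250 free, 200 of
# which are usable by non-privileged users.
example = diskspace_metrics(frsize=4096, blocks=1000, bfree=250, bavail=200)
print(example)
```

The gap between byte_free and byte_avail is the space reserved for the superuser, which is why byte_avail is usually the more useful value to alert on.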

Network

Interfaces

These metrics are found under:

hg_agent.hostname.network.*

These metrics are per-interface. We graph the following:

  • tx_packets, rx_packets: packets transmitted, received (count);

  • tx_byte, rx_byte: bytes transmitted, received (bytes);

  • tx_drop, rx_drop: packets dropped by the driver on transmit, receive (count).
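The per-interface counters can be compared with /proc/net/dev, which has one line per interface with receive columns followed by transmit columns. A minimal parsing sketch, assuming the standard 16-column layout; the function name and sample data are hypothetical:

```python
def parse_net_dev(text):
    """Parse /proc/net/dev-style text into per-interface counters."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        iface, _, rest = line.partition(":")
        fields = [int(value) for value in rest.split()]
        if len(fields) < 16:
            continue  # unexpected layout; skip rather than guess
        # Columns 0-7 are receive counters, columns 8-15 are transmit.
        stats[iface.strip()] = {
            "rx_byte": fields[0],
            "rx_packets": fields[1],
            "rx_drop": fields[3],
            "tx_byte": fields[8],
            "tx_packets": fields[9],
            "tx_drop": fields[11],
        }
    return stats


# Hypothetical sample; on a Linux host, read the text from /proc/net/dev.
sample = """Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:  500000    4000    0    7    0     0          0         0   250000    3000    0    2    0     0       0          0
"""
interfaces = parse_net_dev(sample)
print(interfaces["eth0"])
```

These counters are cumulative since boot, so comparing two readings over a known interval is what gives a rate.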

Sockets

These metrics are found under:

hg_agent.hostname.sockets.*

They’re drawn from /proc/net/sockstat, which is under-documented.

  • tcp_inuse: TCP sockets currently in use (count);

  • udp_inuse: UDP sockets currently in use (count).

Others you can use in your own graphs or investigations:

  • udp_mem: the number of pages in use for UDP (count);

  • tcp_alloc: number of sockets allocated for TCP (count);

  • tcp_tw: sockets in TIME_WAIT, i.e. waiting after closing to handle packets still in the network (count).
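Since /proc/net/sockstat is thinly documented, it can help to see its shape. Each line is a protocol name followed by name and value pairs. The sketch below flattens that into names like the ones listed here. The key-naming scheme is an assumption chosen to match this page, and the sample values are hypothetical.

```python
def parse_sockstat(text):
    """Flatten /proc/net/sockstat-style text into a dict of counters."""
    metrics = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        # Tokens alternate between a counter name and its value.
        for name, value in zip(tokens[0::2], tokens[1::2]):
            if proto == "sockets":
                key = name  # e.g. "used"
            else:
                key = f"{proto.lower()}_{name}"  # e.g. "tcp_inuse"
            metrics[key] = int(value)
    return metrics


# Hypothetical sample; on a Linux host, read the text from /proc/net/sockstat.
sample = """sockets: used 290
TCP: inuse 10 orphan 1 tw 3 alloc 12 mem 4
UDP: inuse 5 mem 2
"""
sockets = parse_sockstat(sample)
print(sockets)
```

Real files usually contain further lines for other protocols, which this sketch flattens in the same way.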

As Diamond collectors rely heavily on /proc data, many of the notes below are from Linux kernel documentation, e.g. proc.txt.

If you find anything unclear or incorrect here, please let us know!

User & system CPU graphs

  • iowait: not really reliable; see the note in proc.txt.

Load average graph

Load average, roughly speaking, is the average number of tasks waiting with “something to do” over a period of time.

Since the interpretation of load average is affected by the number of cores a machine has, you might like to use the “normalized” versions (01_normalized, 05_normalized, 15_normalized) in your own graphs or investigations.

Memory and swap graphs

There are several other metrics available under memory.*. If you’re digging further, you can find out what they mean in the docs for /proc/meminfo.

vmstat graphs

These are metrics from /proc/vmstat and give some insight into the activity of the Linux virtual memory system. Unfortunately, the counters are a little underdocumented.

Note that because everything goes through the page cache, pgpgin and pgpgout are recorded for essentially all pages read from or written to disk, so if you’re doing a lot of IO they’ll be elevated.

Next, swap usage (pswpin and pswpout), which you generally want to keep low or nonexistent. See this article for more information.

Finally, page faults made by the virtual memory system to page memory into process address spaces:

  • pgfault: minor page faults (count);

  • pgmajfault: major page faults (count).

iostat graphs

These metrics are per-disk, and are gathered from /proc/diskstats.

There are many other iostat metrics exported per disk; you can browse your metric tree to see which, and compare with /proc/diskstats and the ‘diskusage’ diamond collector.

Network interface graphs

There are many other network metrics exported per interface; you can browse your metric tree to see which, and compare with /proc/net/dev, which is fairly self-explanatory, and the ‘network’ diamond collector.

Socket graphs

  • used: total number of sockets in kernel socket lists (count);

  • tcp_mem: the number of pages in use for TCP (count);

  • tcp_orphan: sockets not associated to file descriptors (count).