Account Diagnostics

Visualize traffic statistics within your account by navigating to the “Account Diagnostics” submenu from the “Metrics” tab.


In the graph above, the orange line is the live metric limit: the number of metrics you can update in a rolling 5-minute period. A metric name might look like my.server.cpu.load. In this example, the limit is set at 1,500,000, meaning that up to 1.5M metric names can be sent concurrently, that is, within the same 5-minute window.

The green line is the number of live metrics incoming for the account; on this graph it fluctuates between roughly 275K and 350K. When more metrics than the limit are sent at the same time, some metrics will be dropped. Note: Live metrics are also referred to as 'concurrent' or 'active' metrics.
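To make the idea of a rolling 5-minute count concrete, here is a minimal sketch of how live metrics could be tallied against a limit. It is an illustration only, not a description of how the service is implemented: the `LiveMetricCounter` class, its method names, and the choice to reject only new metric names once the limit is reached are all assumptions.

```python
from collections import OrderedDict

WINDOW_SECONDS = 5 * 60  # rolling 5-minute window


class LiveMetricCounter:
    """Hypothetical tracker of metric names updated within a rolling window."""

    def __init__(self, limit, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # metric name -> time of last update, ordered oldest update first
        self._last_seen = OrderedDict()

    def _expire(self, now):
        # Forget names whose most recent update has left the window.
        while self._last_seen:
            name, seen = next(iter(self._last_seen.items()))
            if now - seen < self.window:
                break
            self._last_seen.popitem(last=False)

    def update(self, name, now):
        """Return True if the update is accepted, False if it would be dropped.

        Assumes `now` never goes backwards between calls.
        """
        self._expire(now)
        if name not in self._last_seen and len(self._last_seen) >= self.limit:
            return False  # assumption: a new name over the limit is dropped
        self._last_seen[name] = now
        self._last_seen.move_to_end(name)
        return True

    def live_count(self, now):
        """Number of distinct metric names updated within the window."""
        self._expire(now)
        return len(self._last_seen)
```

With a limit of 1,500,000 and a live count between 275K and 350K, as in the example graph, every update in this sketch would be accepted.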

In the graph above, the dark blue line is the data point rate limit: the number of data points allowed per second. In this example it is set at 750,000, allowing the user to send up to 750K data points per second.

The other lines in the graph represent the number of data points per second hitting your account, by protocol.
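As a rough way to relate the two graphs, the data point rate can be estimated from the number of live metrics and how often each one is reported. The sketch below is plain arithmetic under the assumption that every metric is sent at the same fixed interval; the function names and the 10-second interval are hypothetical.

```python
def estimated_rate(live_metrics: int, interval_seconds: float) -> float:
    """Data points per second if every metric reports once per interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return live_metrics / interval_seconds


def remaining_capacity(rate: float, rate_limit: float) -> float:
    """Data points per second still available before the limit is reached."""
    return rate_limit - rate


# Example: 300,000 live metrics, each reported every 10 seconds (assumed),
# against the 750,000 data points per second limit from the graph.
rate = estimated_rate(300_000, 10)           # 30000.0 data points per second
spare = remaining_capacity(rate, 750_000)    # 720000.0 data points per second
```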

The graph above shows metrics that were recently created, deleted, or expired. This can be useful for tracking traffic spikes and for monitoring any configured expiry rules.

This card provides a quick overview of your current traffic and the icon in the status column provides information on your last received data. The green icon indicates that we have seen data arrive recently on that interface. A yellow or red icon indicates that no data has arrived for that protocol for at least 5 and 15 minutes respectively. A blue icon indicates that we have never seen any traffic on that interface.
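The icon rules above can be summarized as a small decision function. This is a sketch of the thresholds as described, not the service's own code; the `status_icon` name and the use of `None` for "never seen" are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def status_icon(last_seen: Optional[datetime], now: datetime) -> str:
    """Map the time since the last received data to an icon colour."""
    if last_seen is None:
        return "blue"    # no traffic has ever been seen on this interface
    idle = now - last_seen
    if idle >= timedelta(minutes=15):
        return "red"     # no data for at least 15 minutes
    if idle >= timedelta(minutes=5):
        return "yellow"  # no data for at least 5 minutes
    return "green"       # data arrived recently


now = datetime.now(timezone.utc)
print(status_icon(now - timedelta(minutes=7), now))  # yellow
```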

If a user sends a high volume of data points per second to a single metric, we apply per-metric rate limiting rules to protect our backend. These rules are defined differently from the live metric rate limiting rules and only target individual metrics with a very high rate of data points per second. You can read more about why these rules are important and how they work in this blog article.

If a user sends metrics that do not match the Graphite format, they will be reported as 'invalid' and cannot be ingested. In the above panel you can see the offending metrics, reason for reporting as invalid, protocol, IP sent from, and timestamp of attempted ingestion.
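To catch malformed lines before they are sent, a client-side check along these lines may help. It is a sketch that assumes the basic Graphite plaintext layout of `path value timestamp`; the `check_graphite_line` function and its reason strings are hypothetical, and the service's actual validation rules and reported reasons may differ.

```python
import math
from typing import Optional


def check_graphite_line(line: str) -> Optional[str]:
    """Return None if the line looks valid, otherwise a short reason string."""
    parts = line.strip().split()
    if len(parts) != 3:
        return "expected 3 fields: path value timestamp"
    path, value, timestamp = parts
    # A dotted path such as my.server.cpu.load must not contain empty segments.
    if any(not segment for segment in path.split(".")):
        return "empty path segment"
    try:
        number = float(value)
    except ValueError:
        return "value is not a number"
    if math.isnan(number) or math.isinf(number):
        return "value is not finite"
    try:
        float(timestamp)
    except ValueError:
        return "timestamp is not a number"
    return None


for candidate in ("my.server.cpu.load 0.75 1700000000", "my..cpu.load 1 1700000000"):
    print(candidate, "->", check_graphite_line(candidate) or "ok")
```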

To avoid heavy impact on our ingestion servers, the list is refreshed every 5 minutes, is limited to 100 metrics, and keeps invalid metric names for only 24 hours. We also include a related panel in your HG Traffic Dashboard, as well as an alert for Datapoints Dropped in every Hosted Graphite account. Note: we currently do not track invalid metrics for StatsD.

TL;DR: Limits exist as a preventive measure against accidents and malice.

It’s possible for a user to run a script that accidentally (or deliberately) updates millions of metrics a second. Sensible limits on what data we process ensure that one customer cannot affect the quality of service for others. Generally, we want customers to be able to send data at a high rate and we can monitor and increase any limits as necessary. Check out this article for more details on why these limits are put in place.
