TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Time series statistics are tricky

In one of my last projects at BigCo, I had to implement a bunch of metrics monitoring and alerting for our services. This was years ago now, but every so often I think back on it and how it drove me a bit nutty... Here's a short write-up of some of the problems I remember, from considering just one of its aspects: metrics on requests to a single API endpoint.

People seem to think statistics like "average requests per minute" or "p99 response times" are straightforward metrics that can be pulled with a simple query. But they can be quite complex, and the results can be very misleading depending on how the events and queries have been defined.
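To make that concrete, here's a tiny sketch of how "average requests per minute" can come out differently from the same data. The timestamps and the 5-minute window are made up for illustration; the only point is that a per-minute grouping silently skips minutes with no events, so the "average" depends on whether you divide by the minutes that had traffic or by all the minutes in the window.

```python
from collections import Counter

# Hypothetical request timestamps, in seconds from the start of a 5-minute window.
# Minutes 2 and 3 (seconds 120-239) have no requests at all.
timestamps = [1, 2, 3, 4, 65, 66, 250, 251, 252, 253]
window_minutes = 5

# Group events into minute buckets. Empty minutes simply do not appear,
# which is what a naive "group by minute" query would also give you.
per_minute = Counter(int(t // 60) for t in timestamps)
total_requests = sum(per_minute.values())

# Definition A: average over only the minutes that saw at least one request.
avg_busy_minutes_only = total_requests / len(per_minute)

# Definition B: average over every minute in the window, including empty ones.
avg_whole_window = total_requests / window_minutes

print(f"busy minutes only: {avg_busy_minutes_only:.2f} req/min")
print(f"whole window:      {avg_whole_window:.2f} req/min")
```

Both numbers are defensible answers to "average requests per minute", which is kind of the problem.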

So, starting with something simple: we just want a request counter for a single endpoint. How you define this counter changes how you interpret its data. One approach is to emit an event each time the endpoint is hit, logging every single request. Alternatively, you could maintain an asynchronous counter that emits its value at fixed intervals (every minute, say), incrementing only when new requests come in.
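Here's a minimal sketch of those two definitions side by side. The class and function names (`EventLogCounter`, `IntervalCounter`, `per_interval_deltas`) are made up for this example, and I'm assuming the interval counter emits a cumulative value that never resets, which is only one of several ways such a counter could behave. Flushing is driven by hand here instead of by a real timer so the example stays deterministic.

```python
import threading


class EventLogCounter:
    """Approach 1: record one event per request (hypothetical sketch)."""

    def __init__(self):
        self.events = []  # one timestamp per request

    def record(self, timestamp):
        self.events.append(timestamp)

    def count_between(self, start, end):
        # Requests in [start, end) are recovered by scanning the events.
        return sum(1 for t in self.events if start <= t < end)


class IntervalCounter:
    """Approach 2: count in memory, emit the running total on each flush."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self.emitted = []  # (flush_time, cumulative_count) pairs

    def record(self):
        with self._lock:
            self._count += 1

    def flush(self, now):
        # Assumption: the emitted value is cumulative and is never reset.
        with self._lock:
            self.emitted.append((now, self._count))


def per_interval_deltas(emitted):
    """Turn cumulative samples into requests-per-interval."""
    deltas, previous = [], 0
    for _, value in emitted:
        deltas.append(value - previous)
        previous = value
    return deltas


log = EventLogCounter()
interval = IntervalCounter()

# Requests at t=5 and t=20, flush at t=60, one request at t=70,
# then flushes at t=120 and t=180 with no further traffic.
for t in (5, 20):
    log.record(t)
    interval.record()
interval.flush(60)
log.record(70)
interval.record()
interval.flush(120)
interval.flush(180)

print(interval.emitted)
print(per_interval_deltas(interval.emitted))
print(log.count_between(0, 60), log.count_between(60, 120))
```

With the event log you get exact timestamps and can re-bucket however you like later. With the interval counter you only ever see the samples, so a query has to take differences between them, and a quiet minute shows up as a repeated value instead of as missing data.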
