Seeing Beyond Points: Distributions
When looking at a large number of measurements -- say, the time it took for the page to load for each user who has visited your site -- there is no way to make sense of them other than summarizing them.
The most commonly used tool is the arithmetic mean: the sum of every element in the set, divided by the number of elements. The arithmetic mean represents the magnitude each sample would have to assume if every sample had the same value.
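As a minimal sketch in Python, using a small, hypothetical list of page load times (the numbers are made up for illustration):

```python
# Hypothetical page load times, in milliseconds.
load_times_ms = [120, 95, 310, 87, 1450, 102, 98, 2300, 110, 105]

# Arithmetic mean: the sum of every element, divided by the number of elements.
mean_ms = sum(load_times_ms) / len(load_times_ms)
print(f"mean: {mean_ms:.1f} ms")
```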
It follows that, for sets of measurements that are more or less homogeneous, the aggregate error introduced by approximating every sample with the overall average is relatively small.
But reality is very often not homogeneous. To tackle that, distributions provide the analytical tools to summarize non-homogeneous sets of measurements with greater detail and fidelity.
Coming back to the example: the time it takes for a page to load for each visitor of a website. There are plenty of factors likely to affect page load times: how physically close the visitor is to the server, how fast and reliable the connection is, how powerful the device is, its form factor -- mobile or desktop -- and so on.
If we only looked at the average, we might assume that things are fine while a very significant share of users are experiencing load times greater than it.
This distribution can be visualized as a graph, with the time it took for the page to load on the x-axis and the number of samples for which that page load time was observed on the y-axis.
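As a rough sketch of that graph, here is one way to bucket the samples into a histogram -- matplotlib is an assumption here, and the data is the same hypothetical list as before:

```python
import matplotlib.pyplot as plt

# Hypothetical page load times, in milliseconds.
load_times_ms = [120, 95, 310, 87, 1450, 102, 98, 2300, 110, 105]

# x-axis: page load time; y-axis: how many samples fell into each bucket.
plt.hist(load_times_ms, bins=10)
plt.xlabel("page load time (ms)")
plt.ylabel("number of samples")
plt.show()
```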
The median of the distribution is the value “in the middle”: the one we would find halfway through if we sorted the samples. In this example, tracking the median would still mean that half of our users experience a load time greater than it.
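A small sketch of that “sort the samples and pick the middle one” idea, again on the hypothetical data:

```python
# Median "by hand": sort the samples and take the one in the middle.
load_times_ms = [120, 95, 310, 87, 1450, 102, 98, 2300, 110, 105]
samples = sorted(load_times_ms)
n = len(samples)
if n % 2 == 1:
    median_ms = samples[n // 2]
else:
    # With an even number of samples, average the two middle values.
    median_ms = (samples[n // 2 - 1] + samples[n // 2]) / 2
print(f"median: {median_ms} ms")
```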
A more demanding metric can be designed with percentiles. The median is equivalent to the 50th percentile (often also called “p50”).
A more rigorous approach would be to ensure that the 99th percentile of page load times is kept in check. The p99 is the value that separates the lower 99% of samples (those with lesser magnitude) from the upper 1% (those with greater magnitude).
Looking at p99 would mean that only 1% of users experience page load times greater than it. Depending on the level of rigor and the absolute number of samples, one might track the 90th percentile instead, or the 99.5th, the 99.9th, and so on.
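As a sketch, numpy's percentile function is one way to compute these; the lognormal samples below are an assumption, standing in for a typical “mostly fast, with a slow tail” set of load times:

```python
import numpy as np

# Hypothetical, skewed page load times: mostly fast, with a slow tail.
rng = np.random.default_rng(0)
load_times_ms = rng.lognormal(mean=5.0, sigma=0.6, size=10_000)

# p50 (the median), p90 and p99: the values below which 50%, 90% and 99%
# of the samples fall.
p50, p90, p99 = np.percentile(load_times_ms, [50, 90, 99])
print(f"p50: {p50:.0f} ms, p90: {p90:.0f} ms, p99: {p99:.0f} ms")
```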
(Looking after 0.1% of users might sound extreme -- or not, it really depends on your absolute numbers -- but remember that 0.1% of a year is still about 9 hours!)