In statistics, one generally wishes to find out about a population but cannot measure every individual in it, so a sample of individuals is used with the aim of generalising the findings from that sample to the overall population. This can only be done well if the sampling and research process is free from bias (see Bias). However, even without bias in the sample, there will be random variation.
For instance, if you wish to estimate the height of women living in a specific geographical area, you could measure a sample of women living in that area (for instance, by asking people on the street, or by telephoning people using the telephone area code for that area). Assuming there is no bias, you should get an indication of the height of women in that area. However, if you go back the next day or randomly select another batch of telephone numbers, you will get a different set of women and will likely get different height measurements. It is likely that you will obtain a similar set of values or a similar average height, but there will be variation associated with the measurements you obtain. The more people you survey, the more likely you are to obtain a good estimate of the true (average) value in your population. Likewise, if there is less variation within that population (for instance, not much difference in height), the estimate from your sample is likely to be closer to the ‘true’ value you are trying to estimate. If only a small number of people are surveyed, or only a small number of events (for instance, deaths or hospital admissions) occur, then any estimates from your sample are less likely to represent the overall population (see Small Numbers).
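The height example can be illustrated with a small simulation. The following is only a sketch under stated assumptions: the population of 10,000 heights, its mean of 162 cm and its standard deviation of 7 cm are invented for illustration, and the helper name sample_means is hypothetical. It draws many unbiased random samples and compares how much the sample averages vary for small and large samples.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, rng):
    """Return the mean of each of n_samples random samples of sample_size."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 heights in cm (illustrative values only).
population = [rng.gauss(162.0, 7.0) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Repeat the survey 500 times with 20 women, then 500 times with 500 women,
# and measure how much the sample averages differ from survey to survey.
spread_small = statistics.stdev(sample_means(population, 20, 500, rng))
spread_large = statistics.stdev(sample_means(population, 500, 500, rng))

print(f"population mean: {true_mean:.1f} cm")
print(f"spread of sample means, n=20:  {spread_small:.2f} cm")
print(f"spread of sample means, n=500: {spread_large:.2f} cm")
```

Every sample here is drawn without bias, yet the averages still differ from one survey to the next. The spread of the averages should come out noticeably smaller for the larger samples, which is the sense in which surveying more people gives a better estimate.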
It is possible to use the information contained within the sample of observations to measure the variability within the sample. It is often also possible to calculate a confidence interval, which gives an indication of the range of values within which the true value is likely to lie rather than a single value: for instance, a 95% confidence interval (see Confidence Intervals) can be reported alongside a single value such as the Average, which represents a ‘typical’ value.
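As a rough illustration of reporting a range alongside a single value, the sketch below computes an approximate 95% confidence interval for an average. It assumes the normal approximation (a multiplier of 1.96 on the standard error), which is one common choice rather than the only one and is less reliable for small samples. The function name mean_ci_95 and the simulated heights are hypothetical.

```python
import math
import random
import statistics

def mean_ci_95(values):
    """Approximate 95% confidence interval for the mean of values.

    Assumes the normal approximation: mean +/- 1.96 * standard error.
    Returns (lower limit, mean, upper limit).
    """
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    margin = 1.96 * std_error
    return mean - margin, mean, mean + margin

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 200 measured heights in cm (illustrative only).
heights = [rng.gauss(162.0, 7.0) for _ in range(200)]

low, mean, high = mean_ci_95(heights)
print(f"average height: {mean:.1f} cm (95% CI {low:.1f} to {high:.1f} cm)")
```

The interval narrows as the sample grows or as the measurements become less variable, because both reduce the standard error.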
In public health, information is often known about the entire population (for instance, the total number of deaths or the total number of hospital admissions over a specific period of time). However, there is generally still random variation associated with such a measure, so bias and random variation remain important to consider when examining the data. For instance, the number of deaths will be influenced by more ‘random’ events such as exposure to viruses and the development of disease, the uptake rates for vaccinations, how well the flu vaccine works against the flu strains currently circulating in the community, the chance of accidents and injuries, and external factors such as the weather (particularly hot or cold spells, which can affect mortality rates, or snow and ice, which can affect road traffic accidents and falls). Small numbers are often a problem in public health data and can result in misleading conclusions.
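The effect of small numbers on whole-population counts can be sketched with a simulation. This is an illustration under an explicit assumption: yearly death counts are modelled as Poisson-distributed around a fixed underlying average, which is a common simplification and not something stated above. The averages of 5 and 500 deaths per year and the helper name poisson_draw are hypothetical.

```python
import math
import random
import statistics

def poisson_draw(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def relative_variation(lam, n_years, rng):
    """Standard deviation of simulated yearly counts divided by their mean."""
    counts = [poisson_draw(lam, rng) for _ in range(n_years)]
    return statistics.stdev(counts) / statistics.mean(counts)

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical areas: one averaging 5 deaths a year, one averaging 500.
# The underlying risk never changes; only chance moves the yearly counts.
rel_small = relative_variation(5, 200, rng)
rel_large = relative_variation(500, 200, rng)

print(f"relative year-to-year variation, average 5 deaths:   {rel_small:.0%}")
print(f"relative year-to-year variation, average 500 deaths: {rel_large:.0%}")
```

Under this assumption the yearly count in the smaller area swings by a much larger proportion from year to year, even though nothing about the underlying risk has changed. An apparent rise or fall based on a handful of events can therefore be misleading.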
Also see: Bias, Causality, Confounding, Confidence Intervals, Effect Modification, Interaction and Small Numbers.