Small numbers can have a large impact on the reliability of statistics and can cause problems when examining trends over time. Small numbers can occur for several reasons, for example: (1) an event is rare, such as a disease with a low prevalence; (2) an event is relatively common, but the occurrence is being examined within a small geographically defined population, such as admissions or deaths at electoral ward level or for an even smaller geographical area; (3) an event is relatively common overall, but the occurrence is being examined in a specific age group, such as strokes among young people; (4) only a small number of people have been asked to participate in a survey. In all these cases, the number of events or occurrences can be small. Possible solutions include extending the range of medical conditions examined, extending the geographical area or population, surveying more people, or combining data from a number of different years. This is possible in some cases but not in others. The risk with small numbers is that results can be misinterpreted and conclusions drawn which are not substantiated by the data.
In general, when numbers are small there is more variability associated with any estimates made from them. If statistics such as a percentage or an average are presented and small numbers are involved, the results can be misleading. For instance, you could survey two people and conclude that half (50%) of people are over 2 meters tall because you happened to survey one person over that height and another under it.
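As a rough illustration of how variability grows as numbers shrink, the sketch below calculates the standard error of a percentage for different sample sizes. It assumes a simple random sample and uses the usual binomial formula; the function name standard_error and the sample sizes are chosen purely for illustration and do not come from any data in this profile.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from a simple random sample of n people."""
    return math.sqrt(p * (1 - p) / n)


# Illustrative sample sizes only (assumed values, not taken from real surveys).
for n in (2, 20, 200, 2000):
    se = standard_error(0.5, n)
    print(f"n = {n:>4}: standard error = {100 * se:.1f} percentage points")
```

With a true percentage of 50%, the standard error is around 35 percentage points for a sample of two people, but only around 1 percentage point for a sample of 2,000.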
In surveys where only a small number of people have participated, small numbers combined with survey bias could mean that the results are misleading and not representative of the overall population for which the researcher is attempting to produce estimates. Furthermore, if results and conclusions are based on small numbers, other factors such as bias, confounding, effect modification and interaction become more important.
In all cases where numbers are small, it is important to provide information on the numbers involved, to avoid presenting percentages only, and to present confidence intervals, which give a range of ‘likely’ values for a statistic (percentage, average, etc.) rather than a single value. If the confidence interval is wide, the estimate is imprecise and the degree of confidence in it is low.
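To show how the width of a confidence interval reflects the numbers involved, the sketch below calculates a 95% confidence interval for a percentage using the Wilson score method. This is only one of several commonly used methods and is an assumption here, as no particular method is specified above; the function name wilson_interval and the example counts are likewise illustrative.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96 assumed)."""
    p = events / n
    denominator = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


# The same observed percentage (50%) based on very different numbers of people.
for events, n in ((1, 2), (500, 1000)):
    lower, upper = wilson_interval(events, n)
    print(f"{events} out of {n}: 50% (95% CI {100 * lower:.0f}% to {100 * upper:.0f}%)")
```

Both examples give an estimate of 50%, but with one person out of two the interval runs from roughly 9% to 91%, whereas with 500 out of 1,000 it runs from roughly 47% to 53%.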
In some cases, more than one year of data is combined to reduce the impact of small numbers on data.
For some data contained within this profile, such as life expectancy estimates, admissions due to alcohol or drugs among young people, or information relating to road traffic accidents, data is combined for several years as it is not reliable to report numbers for a single year. For instance, one might present life expectancy estimates at the Hull level for a three year period, but report life expectancy at electoral ward level for a five year period. However, combining multiple years of data has its own problems: if a particular year has a particularly high (or low) number of events, it is a number of years before that single year is no longer part of the reporting period. This can occur for life expectancy if there is a high number of deaths in a particular year, for instance, if the flu vaccine was less effective than in a usual year. For instance, mortality was unusually high in 2020 due to the COVID-19 pandemic, and if data is reported over a three year period the year 2020 will be included in three periods (2018-20, 2019-21 and 2020-22) before it is excluded (when reporting on the period 2021-23). In the case of the pandemic, the mortality rate for 2021 could also be higher, so the impact on life expectancy figures might last even longer. Problems can also occur when examining hospital admissions, particularly if the number of admissions is counted (which is often the case) rather than the number of people admitted, as a single person can be admitted numerous times over a single year or period of years. This could well be the case if the person was admitted for behaviours and issues that might be repeated, such as self-harm, alcohol, drugs, or mental health reasons.
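The effect of a single unusual year on overlapping reporting periods can be sketched as follows. The annual death counts are entirely hypothetical and chosen only to make the pattern visible, and the function name rolling_periods is an illustrative assumption.

```python
def rolling_periods(counts, window=3):
    """Combine annual counts into overlapping periods of `window` consecutive years.

    Assumes `counts` covers consecutive years with none missing.
    """
    years = sorted(counts)
    periods = []
    for i in range(len(years) - window + 1):
        span = years[i:i + window]
        label = f"{span[0]}-{str(span[-1])[-2:]}"  # e.g. "2018-20"
        periods.append((label, span, sum(counts[year] for year in span)))
    return periods


# Hypothetical annual deaths: 100 in a usual year, 130 in 2020 (not real data).
deaths = {2017: 100, 2018: 100, 2019: 100, 2020: 130, 2021: 100, 2022: 100, 2023: 100}

for label, span, total in rolling_periods(deaths):
    note = " (includes 2020)" if 2020 in span else ""
    print(f"{label}: {total} deaths{note}")
```

In this sketch the periods 2018-20, 2019-21 and 2020-22 all have an elevated total because each includes 2020, and the total only returns to its usual level for 2021-23.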
Combined data for road traffic accidents can likewise be influenced by a single year with a particularly high number of casualties, although in this case the high figure could itself result from a single road traffic accident with more than one casualty (and the effect could be greater if a large vehicle such as a bus was involved).
The issues mentioned above are noted on a purely statistical basis, but there are other issues relating to small numbers such as confidentiality.
Also see: Bias, Causality, Confounding, Confidence Intervals, Effect Modification and Interaction.