It is often useful to compare a particular summary measure (for instance, a mean, a median, or a measure of risk such as a rate) among different groups. Since there is natural variation associated with virtually all measurements, and since we generally have only a sample rather than measurements of the entire population, it is necessary to distinguish between differences which are small enough to be explained by chance and differences which are unlikely to be explained by chance. Such a comparison can be undertaken using a statistical test, which takes chance variation into account.
When undertaking a statistical test, we assume that there is no difference in the summary measure among the groups and then calculate the probability of obtaining a difference at least as large as the one we observe in our sample (i.e. in the data we have). If the calculated probability, the so-called p-value, is small, then there is only a small chance of obtaining such a result under the assumption that there is no difference. Therefore, if the probability is small enough (generally, less than one in twenty, i.e. less than 0.05), we conclude that the original assumption is unlikely to be correct and that there really is a difference.
Since this is based on probabilities and assumptions, a small p-value does not necessarily mean that the original assumption of no difference between the groups is untrue. However, the smaller the p-value, the stronger the evidence against the original assumption. Similarly, a large p-value, which gives no evidence to reject the original assumption, does not mean that the assumption is actually true; it could be that there is simply insufficient evidence to show otherwise (for example, because the number of people, or the number of people with a particular event, is small). If a small p-value is obtained (p<0.05), the difference is deemed ‘statistically significant’.
However, this does not necessarily mean that the result is important clinically. Suppose that 50% of those living in one area report poor health, compared with 45% of those living in another area. If the number of people involved in the survey is sufficiently large, the difference between these areas can be statistically significant. From a medical point of view, however, the difference may be considered not very important, and the fact that both areas report high levels of poor health may matter more.
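The effect of survey size on the 50% versus 45% comparison can be illustrated with a short calculation. The following is only a minimal sketch: it assumes a pooled two-proportion z-test with a normal approximation, equal numbers surveyed in each area, and survey sizes (100, 1,000 and 10,000 per area) chosen purely for illustration. The function name is hypothetical and none of these choices come from the text above.

```python
import math


def two_proportion_p_value(p1, n1, p2, n2):
    """Two-sided p-value for a difference between two proportions.

    Uses a pooled z-test with a normal approximation (an assumption of
    this sketch, not a method prescribed by the text).
    """
    # Pooled proportion under the assumption of no difference
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under that assumption
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative survey sizes per area (hypothetical values)
for n in (100, 1000, 10000):
    p = two_proportion_p_value(0.50, n, 0.45, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>5} per area: p = {p:.3g} ({verdict} at 0.05)")
```

Under these assumptions the same five-percentage-point difference is not statistically significant with 100 people per area but is with 1,000 or more, although its clinical importance is unchanged.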
Even when data for an entire population are known, for example the total number of deaths within a specific geographical area over a specific period of time, there will still be random year-on-year variation in the number of deaths, so significance testing can still be undertaken. Random factors such as the weather, accidents and flu epidemics will influence the number of deaths.
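One way such a test on complete counts might look is sketched below. It assumes that the number of deaths in each year follows a Poisson distribution, that the population at risk is the same in both years, and that a normal approximation is adequate. The function name and the death counts are hypothetical and serve only as an illustration.

```python
import math


def compare_death_counts(deaths_a, deaths_b):
    """Two-sided p-value for a difference between two yearly death counts.

    Assumes each count is Poisson distributed with the same population at
    risk, so that under no difference in the underlying rate the observed
    difference has variance of roughly deaths_a + deaths_b.
    """
    z = (deaths_a - deaths_b) / math.sqrt(deaths_a + deaths_b)
    # Two-sided tail probability of a standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts for two consecutive years in the same area
print(compare_death_counts(520, 480))  # modest change between years
print(compare_death_counts(560, 480))  # larger change between years
```

With these illustrative counts, the change from 480 to 520 deaths is within what chance alone could plausibly produce at the 0.05 level, whereas the change from 480 to 560 is not.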