Which measures of central tendency to use
One should keep the extreme points and use more resistant measures. For example, use the sample median to estimate the population median. We will discuss methods using the median in a later lesson.
What happens to the mean and median if we add or multiply each observation in a data set by a constant?
What effect does this have on the mean and the median? Adding a constant to each value has the expected effect of shifting the mean and median by that constant. For example, if 5 were added to each of the 10 aptitude scores from the earlier example, the mean of the new data set would be the original mean plus 5, and the median would likewise increase by 5. Similarly, if each observed data value were multiplied by a constant, the new mean and median would change by a factor of this constant.
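The effect of adding or multiplying by a constant can be checked with a few lines of Python. This is a minimal sketch: the ten scores below are invented for illustration and are not the aptitude scores from the original example.

```python
from statistics import mean, median

# Hypothetical aptitude scores, invented for illustration only.
scores = [62, 68, 71, 75, 78, 80, 84, 88, 91, 95]

shifted = [x + 5 for x in scores]  # add the constant 5 to every score
scaled = [x * 2 for x in scores]   # multiply every score by the constant 2

# The mean and median of the shifted data move up by 5;
# the mean and median of the scaled data are doubled.
print("original:", mean(scores), median(scores))
print("plus 5:  ", mean(shifted), median(shifted))
print("times 2: ", mean(scaled), median(scaled))
```

The same pattern holds for any constant, which is why a shift or rescaling of the data can always be undone when reporting the mean or median.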
Returning to the 10 aptitude scores, if all of the original scores were doubled, then the new mean and new median would be double the original mean and median. As we will learn shortly, the effect is not the same on the variance!
Why would you want to know this? One reason, especially for those moving onward to more applied statistics, is data transformation. For many applied statistical methods, a required assumption is that the data are normal, or very nearly bell-shaped. When the data are not normal, statisticians will transform the data using a variety of techniques.
We just need to remember that the original data were transformed!
The shape of the data helps us to determine the most appropriate measure of central tendency. The three most important descriptions of shape are symmetric, left-skewed, and right-skewed. Skewness is a measure of the degree of asymmetry of the distribution. Salary distributions are almost always right-skewed, with a few people who make the most money.
To illustrate this, consider your favorite sports team or even the company for which you work. A few people earn far more than everyone else, and this produces a shape that is skewed to the right. Knowing this can be a useful aid in negotiating a higher salary. Suppose a company tells you that it is offering the average salary for someone with your particular skill set.
But is this average the mode, median, or mean? For the company, business is business. Since salaries tend to be skewed to the right, the offer will most likely reflect the mode or median, which typically fall below the mean in a right-skewed distribution. Once you have these averages, you can begin to negotiate toward the highest number.
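To see how far apart the three "averages" can be, here is a small sketch using Python's standard `statistics` module. The salary figures are hypothetical and chosen only to produce a right-skewed shape.

```python
from statistics import mean, median, mode

# Hypothetical annual salaries in thousands, invented for illustration.
# Most values are modest; two large ones create the right skew.
salaries = [40, 40, 40, 45, 50, 55, 60, 150, 300]

salary_mode = mode(salaries)      # most frequent value
salary_median = median(salaries)  # middle value of the ordered data
salary_mean = mean(salaries)      # arithmetic average, pulled up by the tail

print("mode:  ", salary_mode)
print("median:", salary_median)
print("mean:  ", round(salary_mean, 2))
```

In this invented data set the mode is the lowest of the three figures and the mean the highest, which is the ordering the negotiation advice above relies on.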
A consensus has not been reached among statisticians about whether the mean can be used with ordinal data, and you can often see a mean reported for Likert data in research. The mean is usually the best measure of central tendency to use when your data distribution is continuous and symmetrical, such as when your data is normally distributed.
However, it all depends on what you are trying to show from your data. The mode is the least used of the measures of central tendency, but it is the only one that can be used with nominal data. For this reason, the mode is the best measure of central tendency when dealing with nominal data. The median is usually preferred to other measures of central tendency when your data set is skewed. The mode can also be appropriate in these situations, but it is not as commonly used as the median.
When a data set contains outliers, the median is usually preferred because the value of the mean can be distorted by the outliers. However, it will depend on how influential the outliers are.
If they do not significantly distort the mean, using the mean as the measure of central tendency will usually be preferred. If the data set is perfectly normal, the mean, median and mode are equal to each other. In a data set that includes two unusually large salaries, however, the mean is skewed by those two values. Therefore, in this situation, we would like to have a better measure of central tendency.
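One way to judge how influential the outliers are is to compute the mean and median with and without them. The sketch below assumes a hypothetical set of salaries (in thousands) with two large values appended; the numbers are not taken from the original example.

```python
from statistics import mean, median

# Hypothetical salaries in thousands, invented for illustration.
typical = [15, 18, 16, 14, 17, 15, 14, 16]
with_outliers = typical + [90, 95]  # two unusually large salaries

# How far does each measure move once the outliers are included?
mean_shift = mean(with_outliers) - mean(typical)
median_shift = median(with_outliers) - median(typical)

print("mean shift:  ", mean_shift)
print("median shift:", median_shift)
```

With these invented figures the mean moves by many thousands while the median barely changes, so the median stays closer to a typical salary.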
As we will find out later, the median would be a better measure of central tendency in this situation. Another time when we usually prefer the median over the mean or mode is when our data is skewed. Consider the normal distribution, as this is the most frequently assessed in statistics: when the data is perfectly normal, the mean, median and mode are identical. Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed, the mean loses its ability to provide the best central location for the data because the skewed data is dragging it away from the typical value.
However, the median best retains this position and is not as strongly influenced by the skewed values. This is explained in more detail in the skewed distribution section later in this guide. The median is the middle score for a set of data that has been arranged in order of magnitude. The median is less affected by outliers and skewed data.
In order to calculate the median, suppose we have a set of 11 marks arranged in order of magnitude. Our median mark is the middle mark, in this case 56. It is the middle mark because there are 5 scores before it and 5 scores after it.
This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to take the middle two scores and average the result.
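The odd and even cases can be written out as a short function. This is a sketch rather than a reference implementation: the name `median_of` and the sample lists are hypothetical, and in practice Python's built-in `statistics.median` does the same job.

```python
def median_of(scores):
    """Return the median of a non-empty list of numbers."""
    if not scores:
        raise ValueError("median_of() needs at least one score")
    ordered = sorted(scores)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle score exists.
        return ordered[mid]
    # Even count: average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical data, invented for illustration.
odd_result = median_of([3, 9, 1, 7, 5])  # middle of 1, 3, 5, 7, 9
even_result = median_of([3, 9, 1, 7])    # average of 3 and 7
print(odd_result, even_result)
```

Sorting first matters: the "middle" score only has meaning once the data are in order of magnitude.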
With 10 scores, we take the 5th and 6th scores in the ordered data set and average them to get the median.
The mode is the most frequent score in our data set. It corresponds to the highest bar in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the most popular option.
Normally, the mode is used for categorical data where we wish to know which is the most common category. In a data set of transport types, for example, the mode is the most common form of transport; in the data set considered here, that is the bus.
However, one of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more values that share the highest frequency.
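Both situations, a single most common category and a tie for the highest frequency, can be illustrated with `statistics.multimode` (available in Python 3.8 and later), which returns every value that shares the top count. The transport responses and the numeric scores below are hypothetical.

```python
from statistics import multimode

# Hypothetical survey responses, invented for illustration.
transport = ["bus", "car", "bus", "train", "walk", "bus", "car"]
transport_modes = multimode(transport)  # one clear winner

# Hypothetical scores in which two values tie for the highest frequency.
tied_scores = [2, 3, 3, 5, 7, 7, 9]
tied_modes = multimode(tied_scores)  # both tied values are returned

print(transport_modes)
print(tied_modes)
```

Returning a list makes the ambiguity explicit: when more than one value comes back, the mode alone does not pin down a single central value.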
We are then stuck as to which mode best describes the central tendency of the data. This is particularly problematic when we have continuous data because we are more likely not to have any one value that is more frequent than the others. For example, consider measuring 30 people's weight to the nearest 0.