CHF 1500 Human Development


   

  Correlation: What is it?  
 
One of the things researchers are interested in is the relationship that variables have with each other. For instance, there is a relationship between the age of a chicken and the number of eggs that the chicken will lay in a week. If you were in the egg selling business, you would want to know what that relationship is. You would want to know whether older chickens lay more eggs or fewer eggs, and whether that egg laying pattern changes at a particular age. The relationship between the age of chickens and the egg laying behavior of chickens is what researchers refer to as a correlation. In other words, there is a correlation between "chicken age" and "egg production". This correlation is represented by a number called a correlation coefficient. The coefficient is calculated with a mathematical formula from the data collected in a study of the variables involved. We are interested in this number because it gives us a better idea of how particular variables are related and whether the relationship is strong enough to warrant further consideration.

Caution: The correlation only describes the relationship between the variables, that is, how the variables behave in relation to each other. You cannot use a correlation to infer cause and effect, even when the correlation is strong.

NOTE:
The correlation between variables in a study is represented by a statistical calculation (the correlation coefficient), expressed as a number ranging from -1 to +1.
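To make the idea of "a number calculated from a formula" concrete, here is a minimal sketch of one such calculation. It assumes the Pearson correlation coefficient, which is a common choice but is not named in this lesson, and the chicken data are hypothetical values invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of each pair's deviations from the two means.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations for each variable on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return co_deviation / sqrt(ss_x * ss_y)

# Hypothetical data: age of six chickens (years) and eggs laid in one week.
chicken_age = [1, 2, 3, 4, 5, 6]
eggs_per_week = [6, 6, 5, 4, 4, 2]

r = pearson_r(chicken_age, eggs_per_week)
print(f"r = {r:.2f}")  # prints "r = -0.95"
```

With these made-up numbers the result is close to -1, so in this hypothetical flock older chickens tend to lay fewer eggs.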

Interpreting the correlation
The sign of the correlation (- or +) tells us the direction of the relationship between the variables. In other words:
A negative correlation (- sign) tells us that as one variable X increases in value, the other variable Y tends to decrease in value, as in the negative correlation plot at left. Using our chicken example, it means that as chickens get older they tend to lay fewer eggs.
Inverse Relationship
A positive correlation (+ sign) means that as one variable X increases in value, the other variable Y also tends to increase in value, or as one variable decreases in value, the other variable also tends to decrease in value. Using our chicken example, it means that as chickens get older they tend to lay more eggs.
Direct Relationship


In other words, a positive correlation means that the variables have a direct relationship (changing in the same direction: as X increases in value, Y also tends to increase in value), and a negative correlation means that the variables have an inverse relationship (changing in opposite directions: as X increases in value, Y tends to decrease in value).

The number associated with the correlation (always a decimal number such as .95 or .40) tells us the strength of the correlation. Strong correlations are more useful in research than weak ones because they allow better prediction. Therefore, we always look at both parts of the correlation to get a better understanding of the relationship between the variables. Look at the sign: is it positive or negative? Then look at the value (or number) of the correlation: is it close to 1 or close to 0? Ignoring the sign, the closer the value is to 1, the stronger the correlation.
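Reading the two parts of a correlation can be sketched in a few lines of code. The function name `read_correlation` and the labels it returns are made up for this illustration. Strength is taken as the distance of the coefficient from 0, so the sign plays no part in it.

```python
def read_correlation(r):
    """Split a correlation coefficient into its direction and its strength."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r > 0:
        direction = "direct"    # X and Y tend to change in the same direction
    elif r < 0:
        direction = "inverse"   # X and Y tend to change in opposite directions
    else:
        direction = "none"
    strength = abs(r)           # closer to 1 means stronger, closer to 0 weaker
    return direction, strength

for r in (-0.95, 0.40):
    direction, strength = read_correlation(r)
    print(f"r = {r:+.2f}: {direction} relationship, strength {strength:.2f}")
```

Note that -.95 comes out as the stronger of the two correlations even though it is the negative one.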

Stronger correlations (.80, .90, .95) are represented by less variability between data points. That means that we are better able to predict the value of Y when given the value of X.

In our chicken example, that would be important, because if there is a strong negative correlation between "chicken age" (variable X) and "egg production" (variable Y) we would be able to predict when to make chicken soup.

Do you understand that point? If not, email the instructor.

The closer the dots (data points) are to the general tendency line (the line represents an average or tendency direction), the stronger the correlation. As the variability between data points increases, the correlation decreases in strength, as shown by the graph for correlation .40 below.

Notice how the data points are spread out from the general tendency line. This makes it very difficult to predict the value of Y when X is 3. Look at the scatter plot for .40 and notice that when X is 3 (horizontal values), Y could be anywhere between 1 and 5 (vertical values).
Look also at the correlation graph for -.95 and notice how closely the data points are arranged along the general tendency line. This indicates very little variability and much better predictability.
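The link between scatter and predictability can also be checked with numbers. The sketch below assumes the general tendency line is a least-squares line, which this lesson does not state, and it uses two hypothetical data sets rather than the points in the plots. It measures how far, on average, the points fall from their own tendency line.

```python
from math import sqrt

def tendency_line(xs, ys):
    """Return the slope and intercept of the least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def typical_miss(xs, ys):
    """Root-mean-square vertical distance of the points from their line."""
    slope, intercept = tendency_line(xs, ys)
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared) / len(squared))

xs = [1, 2, 3, 4, 5, 6]
tight = [6, 6, 5, 4, 4, 2]        # hypothetical: points hug the line
scattered = [2, 5, 1, 5, 3, 5]    # hypothetical: points are spread out

print(f"tight:     predictions miss by about {typical_miss(xs, tight):.2f}")
print(f"scattered: predictions miss by about {typical_miss(xs, scattered):.2f}")
```

For these made-up values, predictions from the tight data set miss by well under one unit of Y, while predictions from the scattered set miss by roughly one and a half units. That is the sense in which less variability means better predictability.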

Caution: The correlation only describes the relationship between the variables, that is, how the variables behave in relation to each other. You cannot use a correlation to infer cause and effect, even when the correlation is strong. In our chicken example, "chicken age" is correlated with "egg production" but this does not mean that "chicken age" causes "egg production."
In another example, there is a correlation between "power outage" and "birth rate 9 months later", but this does NOT mean that "power outage" causes "pregnancy." Get the point?


See additional information in the text regarding correlations, or email the instructor with your questions on this research issue.

See the sample scatter plots below for selected correlations. Can you tell which are strong and which are weak?

Try Correlation Exercise - Click here