Assumed knowledge

Francis Section 3.1 "Relations Between Metric
Variables"

Francis Section 3.2 "Relations Between
Categorical Variables"

Francis Section 4.3 "Recoding Variables"
Data files
General advice
The general recommended strategy for tackling these correlation
analyses is:
- Determine the level of measurement for each variable in the
  analysis
- Obtain univariate descriptive statistics and graphical displays
  for each variable, to:
  - check for misentered data
  - check the frequency / central tendency / distribution
- Recode as necessary
- Create a bivariate visual display (e.g., clustered bar graph,
  scatterplot)
- Create tables (e.g., crosstabs with separate tables for row and
  column %s) and relevant correlational statistics
- Interpret/conclude
Phi (φ) & Cramer's V
- qfsall.sav
- Phi and Cramer's V are used for analyzing the relationship between
  two nominal/categorical variables
- Phi is used when you have 2x2, 2x3 or 3x2 tables (whenever one
  variable has only two categories, Phi and Cramer's V have the same
  magnitude)
- Cramer's V is used when tables of 3x3 or larger are analysed
- These are nonparametric tests which do not rely much on
  assumptions about distribution. But you should make sure that the
  minimum expected frequency is at least 5 in each cell. You can
  get this via descriptives > crosstabs > cells > expected. If you
  don't have enough data in the cells, you should recode the data into
  fewer categories.
- Note that the sign (+ or -) of Phi doesn't mean much because there
  is no meaningful order to the way the variables are coded.
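
As a rough illustration of what SPSS is computing here, the following Python sketch derives the expected cell counts, the chi-square statistic and Cramer's V (which equals the unsigned Phi when one variable has two categories) from a table of counts. The counts are made up for illustration and are not taken from qfsall.sav; the function names are likewise just for this sketch.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )

def cramers_v(table):
    """Cramer's V; same magnitude as Phi when either variable has two categories."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi_square(table) / (n * k))

# Hypothetical counts (NOT from qfsall.sav):
# rows = non-smoker / smoker, columns = does not snore / snores.
table = [[60, 20],
         [30, 30]]

min_expected = min(min(row) for row in expected_counts(table))
print("minimum expected count:", round(min_expected, 2))  # should be at least 5
print("phi (unsigned):", round(cramers_v(table), 3))
```

If the minimum expected count falls below 5, the advice above applies: recode into fewer categories before interpreting the statistic.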

Is there an association between Gender and Belief
in God? (recode to remove misentered data)
(φ is small (.024, p = .94) and not significant; there is no evidence of
a relationship; use crosstabs and bar graph > clustered)

Is there an
association between snoring and smoking? (recode smoking from
continuous to dichotomous)
(φ is ~.24 and significant, p = .001; smokers are almost twice as
likely to snore as non-smokers, but be careful in interpretation:
this could be due to non-causal factors (e.g., age?); use crosstabs
and bar graph > clustered)

Is there an association between favourite season
and favourite sense? (recode to remove misentered data)
(Cramer's V is ~.23 and significant, p = .005; in other words, there
is a different profile of favourite senses depending on favourite
season, e.g., almost 50% of Summer and Spring people are Visual
people, whereas Winter people tend to prefer Taste and Smell; use
crosstabs and stacked area graph)

Is there an association between type of
household (urban/rural) and whether or not the household has chickens
(Yes/No)? [chickens.sav].
The file contains hypothetical data for two categorical
variables. Resid indicates whether households are in urban or rural
areas. Chickens indicates whether or not the household owns chickens.
(the answer to this is potential quiz question material, so no clues!)
Point Biserial Correlation

Point biserial correlation is for analyzing
the relationship between a dichotomous and a continuous variable

Point biserial correlation is computed in the same way as
the product-moment correlation, but
interpretation must be appropriate to the direction of coding for the
dichotomous scale.

If you interpret the significance of a
point biserial correlation, it is equivalent to doing a t-test of the
mean difference between the two groups (e.g., between males' and
females' ratings of their Australianness).
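
The equivalence with the t-test can be checked by hand. The sketch below (made-up data, not from qfsall.sav; the 0/1 coding is an arbitrary assumption) computes the point biserial correlation as an ordinary product-moment correlation, converts it to a t statistic with n - 2 degrees of freedom, and compares that with an equal-variances independent-samples t statistic computed directly.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group0, group1):
    """Independent-samples t (equal variances assumed), group1 minus group0."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = sum(group0) / n0, sum(group1) / n1
    ss0 = sum((v - m0) ** 2 for v in group0)
    ss1 = sum((v - m1) ** 2 for v in group1)
    pooled_var = (ss0 + ss1) / (n0 + n1 - 2)
    return (m1 - m0) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))

# Hypothetical data: gender coded 0/1 (coding chosen arbitrarily), rating continuous.
gender = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
rating = [7, 5, 8, 6, 9, 6, 4, 7, 5, 8]

r_pb = pearson_r(gender, rating)
n = len(gender)
t_from_r = r_pb * math.sqrt((n - 2) / (1 - r_pb ** 2))
t_direct = pooled_t(
    [y for g, y in zip(gender, rating) if g == 0],
    [y for g, y in zip(gender, rating) if g == 1],
)
# Reversing the coding flips the sign of r but not its size.
r_flipped = pearson_r([1 - g for g in gender], rating)

print(round(r_pb, 3), round(r_flipped, 3), round(t_from_r, 3), round(t_direct, 3))
```

The two t values agree, and swapping the 0/1 coding only changes the sign of r, which is why interpretation has to follow the direction of coding.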

What is the relationship between Gender
(dichotomous) and
Australianness (assume continuous)? [qfsall.sav]
(no relationship: the (point biserial) correlation is very small and
non-significant (technically it is slightly negative, i.e., males
in the sample perceive themselves as very slightly more Australian);
use correlation > bivariate > pearson and scatterplot
> chart options > sunflowers and line of best fit)

What is the relationship between Belief in God
(recode to dichotomous) and number of Countries visited?
(important to check the scatterplot on this one: there are
outliers which look like they are influencing the small,
non-significant correlation; use correlation > bivariate > pearson and
scatterplot > chart options > sunflowers and line of best fit)
Product-Moment Correlation

What is the relationship
between Australianness and Femininity/Masculinity? [qfsall.sav]
(the r here is .12, p = .100,
which is larger than the point biserial correlation for Gender and
Australianness, but is still very small and non-significant; use
correlation > bivariate > pearson and scatterplot > chart options >
sunflowers and line of best fit)
Correlation Explore & Correlation Guess
- These exercises help you to intuitively estimate a correlation based on a
  scatterplot
- Correlation Explore
  (explore 20 plots with .1 increments)
- Correlation Guess
  (guess 20 plots with .1 increments): try to get 25 out of 50
- Note: The following three exercises are desirable, but
  unfortunately they are Java applets which will not currently run due
  to the UC proxy host firewall. Try to access these from off-campus if
  you can. The problem has been reported, but there's no word on when
  it may be fixed.

Guessing Correlations
(4 plot exact match to correlations): try to average over 75%

Guess the Correlation
(single plot, guess exact correlation): try to get within .1

Spearman's rank correlation
Exploring the Effect of Outliers

regressp.exe (Continue “Explore the impact
of an outlier”)

Drag the white point to explore how an outlier can
inflate or deflate the correlation, hitting “Recalculate” to recompute
the correlation.

Where would you put the white dot to maximise the
correlation?
(as far to the ends of the line of best fit as possible)

Where would you put the white dot to minimise the
correlation?
(to shift the correlation towards zero, place the outlier as far as
possible to the ends of a line which would run perpendicular to the line
of best fit, crossing at the mean for X and the mean for Y)

Where would you put the white dot to not change
the correlation?
(on the mean for X and the mean for Y)
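
If regressp.exe is not available, the same three answers can be explored with a small Python sketch. The eight base points below are made up for illustration, and the positions chosen for the extra point (x = 30 on the line of best fit, 3 units along the perpendicular) are arbitrary assumptions rather than anything taken from the program.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def with_point(x, y, px, py):
    """Correlation after adding one extra point (the 'white dot')."""
    return pearson_r(x + [px], y + [py])

# Hypothetical base data with a strong positive correlation.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

r_base = pearson_r(x, y)
# Extra point at the two means: r is unchanged.
r_at_means = with_point(x, y, mean_x, mean_y)
# Extra point far out along the line of best fit: r is inflated.
r_on_line = with_point(x, y, 30, intercept + slope * 30)
# Extra point along the perpendicular through the means: r is deflated.
r_perpendicular = with_point(x, y, mean_x + 3, mean_y - 3 / slope)

print(round(r_base, 3), round(r_at_means, 3),
      round(r_on_line, 3), round(r_perpendicular, 3))
```

With these numbers the perpendicular point pulls r towards zero; moving it much further along the same perpendicular would eventually push r below zero.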
Correlations and Nonlinear Distributions
- xy.sav
- Draw scatterplots, compute the correlations (they
  are all r = .82) and explain the relationships between:
  - X1 Y1
    (r is appropriate: linear relationship)
  - X1 Y2
    (curvilinear: r is not appropriate)
  - X1 Y3
    (strong linear, with outlier; r = .82 is not appropriate)
  - X2 Y4
    (restricted range, with outlier; r = .82 is not appropriate)
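
The four patterns described above match the well-known Anscombe's quartet. As a sketch, assuming (this is not confirmed here) that xy.sav resembles that quartet, the Python code below uses Anscombe's published values as a stand-in and shows that all four pairs give r of about .82 even though only the first is a straightforward linear relationship. The variable names simply follow the labels used above.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Anscombe's quartet, used as a stand-in for xy.sav (an assumption).
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x2 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

pairs = {
    "X1 Y1 (linear)": (x1, y1),
    "X1 Y2 (curvilinear)": (x1, y2),
    "X1 Y3 (linear with outlier)": (x1, y3),
    "X2 Y4 (restricted range with outlier)": (x2, y4),
}
correlations = {label: pearson_r(x, y) for label, (x, y) in pairs.items()}

for label, r in correlations.items():
    print(f"{label}: r = {r:.2f}")
```

The identical coefficients are the reason for drawing the scatterplot before interpreting r.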
Outliers and Restricted Range

aggr.sav

This is a dataset collected by Bernd Heubeck
(Division of Psychology, ANU) comparing a sample of 89 children, aged
8-14, from Western Sydney with a sample of 89 children from the same
area who had been referred to a Child Psychiatric Clinic.
Aggressiveness ratings of each child were obtained independently from
both parents. Aggressiveness ratings can range from 0 (low) to 40
(high).

To what extent do the mothers’ and fathers’
Aggressive Behaviour ratings agree with one another?
