CROSSTABS
CROSSTABS
    /TABLES=VAR_LIST BY VAR_LIST [BY VAR_LIST]...
    /MISSING={TABLE,INCLUDE,REPORT}
    /FORMAT={TABLES,NOTABLES}
            {AVALUE,DVALUE}
    /CELLS={COUNT,ROW,COLUMN,TOTAL,EXPECTED,RESIDUAL,SRESIDUAL,
            ASRESIDUAL,ALL,NONE}
    /COUNT={ASIS,CASE,CELL}
           {ROUND,TRUNCATE}
    /STATISTICS={CHISQ,PHI,CC,LAMBDA,UC,BTAU,CTAU,RISK,GAMMA,D,
                 KAPPA,ETA,CORR,ALL,NONE}
    /BARCHART
(Integer mode.)
    /VARIABLES=VAR_LIST (LOW,HIGH)...
The CROSSTABS procedure displays crosstabulation tables requested by
the user. It can calculate several statistics for each cell in the
crosstabulation tables. In addition, a number of statistics can be
calculated for each table itself.
The TABLES subcommand is used to specify the tables to be reported.
Any number of dimensions is permitted, and any number of variables per
dimension is allowed. The TABLES subcommand may be repeated as many
times as needed. This is the only required subcommand in "general
mode".
Occasionally, one may want to invoke a special mode called "integer
mode". Normally, in general mode, PSPP automatically determines what
values occur in the data. In integer mode, the user specifies the range
of values that the data assumes. To invoke this mode, specify the
VARIABLES subcommand, giving a range of data values in parentheses for
each variable to be used on the TABLES subcommand. Data values inside
the range are truncated to the nearest integer, then assigned to that
value. If values occur outside this range, they are discarded. When it
is present, the VARIABLES subcommand must precede the TABLES
subcommand.
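The integer-mode rule described above can be illustrated with a short Python sketch. This is not PSPP's implementation; the function name integer_mode_value is hypothetical, and the sketch assumes the range test is applied to the raw value before truncation, as a literal reading of the text suggests. How PSPP treats fractional values just above HIGH is not stated here, so the example avoids that case.

```python
import math

def integer_mode_value(value, low, high):
    """Return the integer category for value, or None if it is discarded.

    Assumption: the (LOW,HIGH) test is applied to the raw value, and
    values inside the range are then truncated toward zero.
    """
    if low <= value <= high:
        return math.trunc(value)
    return None  # outside the declared range: the value is discarded

# Hypothetical data for a variable declared with the range (1,5).
values = [1.0, 2.7, 3.2, 6.0, 0.4]
kept = [integer_mode_value(v, 1, 5) for v in values]
print(kept)  # [1, 2, 3, None, None]
```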
In general mode, numeric and string variables may be specified on
TABLES. In integer mode, only numeric variables are allowed.
The MISSING subcommand determines the handling of user-missing
values. When set to TABLE, the default, missing values are dropped on
a table by table basis. When set to INCLUDE, user-missing values are
included in tables and statistics. When set to REPORT, which is
allowed only in integer mode, user-missing values are included in tables
but marked with a footnote and excluded from statistical calculations.
The FORMAT subcommand controls the characteristics of the
crosstabulation tables to be displayed. It has a number of possible
settings:
- TABLES, the default, causes crosstabulation tables to be output.
- NOTABLES, which is equivalent to CELLS=NONE, suppresses them.
- AVALUE, the default, causes values to be sorted in ascending order.
  DVALUE asserts a descending sort order.
The CELLS subcommand controls the contents of each cell in the
displayed crosstabulation table. The possible settings are:
- COUNT: Frequency count.
- ROW: Row percent.
- COLUMN: Column percent.
- TOTAL: Table percent.
- EXPECTED: Expected value.
- RESIDUAL: Residual.
- SRESIDUAL: Standardized residual.
- ASRESIDUAL: Adjusted standardized residual.
- ALL: All of the above.
- NONE: Suppress cells entirely.
/CELLS without any settings specified requests COUNT, ROW, COLUMN, and
TOTAL. If CELLS is not specified at all then only COUNT is selected.
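To make the cell settings concrete, the following Python sketch computes each quantity for a small two-way table using the usual textbook definitions. It is an illustration only, not PSPP's code: the function name cell_contents and the sample counts are hypothetical, and the sketch assumes every row and column total is nonzero.

```python
import math

def cell_contents(table):
    """Compute per-cell quantities for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(table):
        out_row = []
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            residual = count - expected
            out_row.append({
                "COUNT": count,
                "ROW": 100 * count / row_totals[i],
                "COLUMN": 100 * count / col_totals[j],
                "TOTAL": 100 * count / n,
                "EXPECTED": expected,
                "RESIDUAL": residual,
                "SRESIDUAL": residual / math.sqrt(expected),
                "ASRESIDUAL": residual / math.sqrt(
                    expected
                    * (1 - row_totals[i] / n)
                    * (1 - col_totals[j] / n)),
            })
        cells.append(out_row)
    return cells

# Hypothetical 2x2 table of counts.
cells = cell_contents([[10, 20], [30, 40]])
first = cells[0][0]
print(first["EXPECTED"], first["COLUMN"], first["TOTAL"])  # 12.0 25.0 10.0
```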
By default, crosstabulation and statistics use raw case weights,
without rounding. Use the /COUNT subcommand to perform rounding:
CASE rounds the weights of individual cases as they are read, CELL
rounds the weights of cells within each crosstabulation table after it
has been constructed, and ASIS explicitly specifies the default
non-rounding behavior. When rounding is requested, ROUND, the default,
rounds to the nearest integer and TRUNCATE rounds toward zero.
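The difference between CASE and CELL rounding shows up whenever several fractional weights land in the same cell. The Python sketch below illustrates this; it is not PSPP's implementation. The names round_weight and tabulate are hypothetical, weights are assumed to be nonnegative, and ties under ROUND are rounded up, which is an assumption because the text does not say how ties are handled.

```python
import math
from collections import defaultdict

def round_weight(w, how="ROUND"):
    # ROUND: nearest integer (ties rounded up here, an assumption).
    # TRUNCATE: round toward zero.
    return math.floor(w + 0.5) if how == "ROUND" else math.trunc(w)

def tabulate(cases, count="ASIS", how="ROUND"):
    """Accumulate weighted cell counts from (row_value, col_value, weight) cases."""
    cells = defaultdict(float)
    for r, c, w in cases:
        if count == "CASE":
            w = round_weight(w, how)  # round each case as it is read
        cells[(r, c)] += w
    if count == "CELL":
        # Round each cell once the whole table has been built.
        return {k: round_weight(v, how) for k, v in cells.items()}
    return dict(cells)

# Three hypothetical cases with weight 0.6 falling in the same cell.
cases = [("a", "x", 0.6), ("a", "x", 0.6), ("a", "x", 0.6)]
by_case = tabulate(cases, "CASE")[("a", "x")]
by_cell = tabulate(cases, "CELL")[("a", "x")]
print(by_case, by_cell)  # 3.0 2
```

With TRUNCATE instead of ROUND, the same cases give 0 under CASE and 1 under CELL, so the choice of rounding stage can change cell counts noticeably when weights are small.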
The STATISTICS subcommand selects statistics for computation:
- CHISQ: Pearson chi-square, likelihood ratio, Fisher's exact test,
  continuity correction, linear-by-linear association.
- PHI: Phi.
- CC: Contingency coefficient.
- LAMBDA: Lambda.
- UC: Uncertainty coefficient.
- BTAU: Tau-b.
- CTAU: Tau-c.
- RISK: Risk estimate.
- GAMMA: Gamma.
- D: Somers' D.
- KAPPA: Cohen's Kappa.
- ETA: Eta.
- CORR: Spearman correlation, Pearson's r.
- ALL: All of the above.
- NONE: No statistics.
Selected statistics are only calculated when appropriate for the statistic. Certain statistics require tables of a particular size, and some statistics are calculated only in integer mode.
/STATISTICS without any settings selects CHISQ. If the STATISTICS
subcommand is not given, no statistics are calculated.
The /BARCHART subcommand produces a clustered bar chart for the
first two variables on each table. If a table has more than two
variables, the counts for the third and subsequent levels are aggregated
and the chart is produced as if there were only two variables.
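The aggregation described for /BARCHART amounts to summing counts over every variable after the second. A minimal Python sketch of that idea follows; it is not PSPP's code, and the function name collapse_to_two_way and the sample counts are hypothetical.

```python
from collections import Counter

def collapse_to_two_way(counts):
    """Sum cell counts over all variables after the first two.

    counts maps tuples (v1, v2, v3, ...) to cell counts.
    """
    two_way = Counter()
    for key, n in counts.items():
        two_way[key[:2]] += n  # keep only the first two variables
    return dict(two_way)

# Hypothetical three-way table: (row value, column value, layer value).
three_way = {
    ("a", "x", "p"): 2,
    ("a", "x", "q"): 3,
    ("a", "y", "p"): 1,
    ("b", "x", "q"): 4,
}
two_way = collapse_to_two_way(three_way)
print(two_way)  # {('a', 'x'): 5, ('a', 'y'): 1, ('b', 'x'): 4}
```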
Currently the implementation of CROSSTABS has the following limitations:
- Significance of some symmetric and directional measures is not calculated.
- Asymptotic standard error is not calculated for Goodman and Kruskal's tau or symmetric Somers' d.
- Approximate T is not calculated for symmetric uncertainty coefficient.
Fixes for any of these deficiencies would be welcomed.
Example
A researcher wishes to know if, in an industry, a person's sex is
related to the person's occupation. To investigate this, she has
determined that the file personnel.sav contains a representative,
randomly selected sample of persons. The researcher's null hypothesis
is that a person's sex has no relation to a person's occupation. She
uses a chi-squared test of independence to investigate the hypothesis.
get file="personnel.sav".
crosstabs
    /tables= occupation by sex
    /cells = count expected
    /statistics=chisq.
The syntax above conducts a chi-squared test of independence. The
line /tables= occupation by sex indicates that occupation and sex
are the variables to be tabulated.
As shown in the output below, CROSSTABS generates a contingency
table containing the observed count and the expected count of each sex
and each occupation. The expected count is the count which would be
observed if the null hypothesis were true.
The significance of the Pearson Chi-Square value (0.272) is well above the conventional threshold of 0.05, so one cannot reject the null hypothesis. The researcher therefore has no evidence from this sample that a person's sex is related to the person's occupation. Failing to reject the null hypothesis does not prove that no relation exists.
Summary
┌────────────────┬───────────────────────────────┐
│                │             Cases             │
│                ├──────────┬─────────┬──────────┤
│                │   Valid  │ Missing │   Total  │
│                ├──┬───────┼─┬───────┼──┬───────┤
│                │ N│Percent│N│Percent│ N│Percent│
├────────────────┼──┼───────┼─┼───────┼──┼───────┤
│occupation × sex│54│  96.4%│2│   3.6%│56│ 100.0%│
└────────────────┴──┴───────┴─┴───────┴──┴───────┘
occupation × sex
┌──────────────────────────────────────┬───────────┬─────┐
│                                      │    sex    │     │
│                                      ├────┬──────┤     │
│                                      │Male│Female│Total│
├──────────────────────────────────────┼────┼──────┼─────┤
│occupation Artist             Count   │   2│     6│    8│
│                              Expected│4.89│  3.11│  .15│
│          ────────────────────────────┼────┼──────┼─────┤
│           Baker              Count   │   1│     1│    2│
│                              Expected│1.22│   .78│  .04│
│          ────────────────────────────┼────┼──────┼─────┤
│           Barrister          Count   │   0│     1│    1│
│                              Expected│ .61│   .39│  .02│
│          ────────────────────────────┼────┼──────┼─────┤
│           Carpenter          Count   │   3│     1│    4│
│                              Expected│2.44│  1.56│  .07│
│          ────────────────────────────┼────┼──────┼─────┤
│           Cleaner            Count   │   4│     0│    4│
│                              Expected│2.44│  1.56│  .07│
│          ────────────────────────────┼────┼──────┼─────┤
│           Cook               Count   │   3│     2│    5│
│                              Expected│3.06│  1.94│  .09│
│          ────────────────────────────┼────┼──────┼─────┤
│           Manager            Count   │   4│     4│    8│
│                              Expected│4.89│  3.11│  .15│
│          ────────────────────────────┼────┼──────┼─────┤
│           Mathematician      Count   │   3│     1│    4│
│                              Expected│2.44│  1.56│  .07│
│          ────────────────────────────┼────┼──────┼─────┤
│           Painter            Count   │   1│     1│    2│
│                              Expected│1.22│   .78│  .04│
│          ────────────────────────────┼────┼──────┼─────┤
│           Payload Specialist Count   │   1│     0│    1│
│                              Expected│ .61│   .39│  .02│
│          ────────────────────────────┼────┼──────┼─────┤
│           Plumber            Count   │   5│     0│    5│
│                              Expected│3.06│  1.94│  .09│
│          ────────────────────────────┼────┼──────┼─────┤
│           Scientist          Count   │   5│     2│    7│
│                              Expected│4.28│  2.72│  .13│
│          ────────────────────────────┼────┼──────┼─────┤
│           Scrientist         Count   │   0│     1│    1│
│                              Expected│ .61│   .39│  .02│
│          ────────────────────────────┼────┼──────┼─────┤
│           Tailor             Count   │   1│     1│    2│
│                              Expected│1.22│   .78│  .04│
├──────────────────────────────────────┼────┼──────┼─────┤
│Total                         Count   │  33│    21│   54│
│                              Expected│ .61│   .39│ 1.00│
└──────────────────────────────────────┴────┴──────┴─────┘
Chi─Square Tests
┌──────────────────┬─────┬──┬──────────────────────────┐
│                  │Value│df│Asymptotic Sig. (2─tailed)│
├──────────────────┼─────┼──┼──────────────────────────┤
│Pearson Chi─Square│15.59│13│                      .272│
│Likelihood Ratio  │19.66│13│                      .104│
│N of Valid Cases  │   54│  │                          │
└──────────────────┴─────┴──┴──────────────────────────┘
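The expected counts and the Pearson Chi-Square value in the tables above can be reproduced from the observed counts alone. The following Python sketch does so using the standard formulas (expected = row total × column total / N, and the sum of (observed − expected)² / expected over all cells). It is an independent check, not PSPP's implementation; the function name pearson_chi_square is hypothetical, and the counts are copied from the occupation × sex table, including the category spelled "Scrientist" as it appears in the data.

```python
# Observed (Male, Female) counts, copied from the occupation x sex table.
observed = {
    "Artist": (2, 6), "Baker": (1, 1), "Barrister": (0, 1),
    "Carpenter": (3, 1), "Cleaner": (4, 0), "Cook": (3, 2),
    "Manager": (4, 4), "Mathematician": (3, 1), "Painter": (1, 1),
    "Payload Specialist": (1, 0), "Plumber": (5, 0),
    "Scientist": (5, 2), "Scrientist": (0, 1), "Tailor": (1, 1),
}

def pearson_chi_square(rows):
    """Return (chi-square, degrees of freedom, expected counts) for a two-way table."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / N.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

chi2, df, expected = pearson_chi_square(list(observed.values()))
print(round(chi2, 2), df)         # 15.59 13
print(round(expected[0][0], 2))   # 4.89 (Artist, Male)
```

The significance value is not recomputed here because it requires the chi-square distribution function, which the Python standard library does not provide.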