
PRACTICAL SOCIAL INVESTIGATION:

ANALYSING CROSS-TABULATED DATA USING SPSS FOR WINDOWS (Versions 8 and 10)

This page discusses the analysis of cross-tabulated data, and also considers how variables can be recoded (for example, to reduce the number of categories of a variable by aggregation).

More sophisticated analyses of cross-tabulated data can be carried out with log-linear models (see mwayanal.html).

Analyses involving interval-level variables use different techniques (see intanal.html).

The analyses discussed below are examples from Chapter 9 of the book. Like all the other cross-tabular analyses described in Chapter 9, they can also be replicated by running a file of commands (which can be downloaded from these web pages) within a syntax window (see synwins.html).

The analyses were all carried out using SPSS for Windows, and can be duplicated using the data file BSA91.SAV, which can be downloaded from these web pages (see datasets.html).

GETTING TO KNOW THE DATA

Once one has loaded SPSS for Windows, and clicked on 'Cancel' to remove the first menu that appears, the first step is to read in the relevant data file, which is the SPSS system file called bsa91.sav. This is achieved by clicking on File (which is located in the menu bar towards the top of the screen), then on Open in the resulting menu (and then on Data... if you are using Version 10 of SPSS). Once bsa91.sav has been located and selected (by clicking on it), it can be read in by clicking on the Open box. The data then appear in the data window (Data Editor).

The next step is to have a look at the variables included in the dataset by clicking first on Utilities, and then on File Info in the menu that appears. Details of the variables appear in the output window (SPSS Viewer). Using either the Page Up and Page Down keys or the mouse and the arrows adjacent to this window you can scroll up and down the window until you find details of the two variables srsoccl (self-reported social class) and rrgclass (respondent's Registrar General's occupational class). [An alternative is to look at specific variables via Utilities and then Variables...].

At this stage the data analysis process really begins. Clicking on Statistics, then on Summarize, and then on Frequencies... (or on Analyze, Descriptive Statistics and then Frequencies... if using Version 10) brings up a box in which you can specify the variables whose frequencies you want to examine. This is achieved by using the mouse and arrows (or the Page Up and Page Down keys) to locate srsoccl ('Self-rated social class') in the alphabetical list of variables, then clicking on it to select it, and then clicking on the black triangle pointing to the other (empty) variable list to move it between the lists. The other variable, rrgclass ('Registrar General's Social Class R') can then be located and selected in the same way. Clicking on the OK box generates the frequencies for the two variables, which appear in the output window (SPSS Viewer) and bear a strong resemblance to Tables 9.1 and 9.3.
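As a rough illustration of what a frequencies table contains, the following Python sketch counts each category and computes both the percentage of all cases and the 'valid' percentage (which excludes missing values). It is not SPSS output: the function name frequencies and the codes in srsoccl_demo are made up for the example and are not taken from BSA91.SAV.

```python
from collections import Counter

def frequencies(values):
    """Return {category: (count, percent, valid_percent)}.

    None stands for a missing value; it gets a percent but no valid percent.
    """
    counts = Counter(values)
    total = len(values)
    valid_total = total - counts.get(None, 0)
    table = {}
    for category, count in counts.items():
        percent = 100.0 * count / total
        valid_percent = None if category is None else 100.0 * count / valid_total
        table[category] = (count, percent, valid_percent)
    return table

# Hypothetical category codes (NOT real survey data).
srsoccl_demo = [1, 2, 2, 3, 3, 3, 4, 5, None, None]
freq = frequencies(srsoccl_demo)

# Print the categories in order, with the missing category last.
for category in sorted(freq, key=lambda c: (c is None, c)):
    print(category, freq[category])
```

The distinction between the two percentages matters later on, since it is the valid percentages that are compared with the tables in Chapter 9 once unwanted categories have been recoded as missing.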

RECODING THE VARIABLES

The next stage is to restrict attention to the five categories of srsoccl ('Self-rated social class') which correspond to classes. This is achieved by creating a new version of the variable. Once one has moved to the Data Editor via clicking on Window, the first step is to click on Transform, and then on Recode, and then on Into Different Variables. In the box that appears, select srsoccl ('Self-rated social class') from the alphabetical list of variables by clicking on it and choose it as the input variable by clicking on the triangle to the right of the alphabetical list. Next, click on the small box immediately below Name: and type in a name, say src1, for the output variable or new version of srsoccl ('Self-rated social class'). The next step is to register src1 as the name by clicking on Change. Now click on Old and New Values....

We wish to retain categories 1 to 5 of srsoccl ('Self-rated social class'). Looking at the left hand side of the resulting box, click on the fourth little circle down (next to Range:). The black dot then moves to this circle. Type 1 in the adjacent small box, and then click on the small box on the other side of through. Type 5 into this box. Now, looking at the New Value section on the right hand side of the main box, click on the small circle next to Copy old value(s). Then click on Add to enter this part of the recoding procedure in the Old -> New: box. The remaining categories of srsoccl ('Self-rated social class') are not of interest to us, and can be treated as missing. Returning to the left hand side of the main box, click on the little circle next to All other values. Then, in the New Value section, click on the small circle next to System missing. The next steps are to click on Add, and then to click on Continue. Finally, clicking on OK creates the new version, src1, of the old variable, srsoccl ('Self-rated social class'). Now once again click on Statistics, then Summarize, then Frequencies... (or on Analyze, Descriptive Statistics and then Frequencies... if using Version 10). Remove srsoccl ('Self-rated social class') and rrgclass ('Registrar General's Social Class R') from the list of variables whose frequencies are required by clicking on them and then clicking once again on the little triangle. Then select src1 instead. Clicking on OK generates frequencies whose accompanying (valid) percentages are those shown in Table 9.2.
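The logic of this first recode (copy values 1 through 5, make everything else system missing) can be sketched in a few lines of Python. This is only an illustration of the rule, not of how SPSS implements it; the helper recode_keep_range and the sample codes are assumptions introduced for the example.

```python
def recode_keep_range(value, low, high):
    """Copy values in low..high unchanged; treat all other values as missing (None)."""
    if value is not None and low <= value <= high:
        return value
    return None

# Hypothetical srsoccl codes (NOT real survey data); None is an already-missing case.
srsoccl_demo = [1, 3, 5, 6, 8, None, 2]

# New variable src1: categories 1 to 5 retained, the rest set to missing.
src1 = [recode_keep_range(v, 1, 5) for v in srsoccl_demo]
print(src1)  # [1, 3, 5, None, None, None, 2]
```

Note that the original list is left untouched, which mirrors the choice of Into Different Variables rather than overwriting srsoccl.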

A similar sequence of procedures (one stage of which is illustrated in the screen dump that follows) is used to collapse the five class categories of srsoccl ('Self-rated social class') into two broader categories. First, having returned to the Data Editor by clicking on Window, one clicks on Transform, then on Recode, then on Into Different Variables. (Clicking on Reset clears the recoding box). Once again one selects srsoccl ('Self-rated social class') as the input variable, but this time round one types a different name for the output variable, say src2, into the small box below Name:, clicking on Change to register it. Once one has clicked on Old and New Values... it is time to specify the way in which the categories are to be aggregated. Categories 1 and 2 of srsoccl ('Self-rated social class') are to be combined to give the first of the new categories, and categories 3, 4 and 5 are to be combined to give the second new category. To achieve this, one clicks on the fourth little circle down (next to Range:), then one types 1 in the adjacent box, clicks on the small box on the other side of through, and types 2 into this box. Moving across to the New Value section, one clicks on the box next to Value: and types in 1. Clicking on Add enters this part of the aggregation procedure into the Old -> New: box. One then needs to enter old and new values (3 through 5 and 2) to create the second new category; this is the stage illustrated in the screen dump below. Once again, the remaining categories of srsoccl ('Self-rated social class') are of no interest, so one clicks on the little circle next to All other values, then clicks on the small circle next to System missing in the New Value section, and then clicks on Add. Clicking on Continue and then on OK creates the new version, src2, of srsoccl ('Self-rated social class').

Before the cross-tabulation of occupational class by self-rated class discussed in Chapter 9 can be reproduced, rrgclass ('Registrar General's Social Class R') needs to be recoded to restrict attention to the six Registrar General's Social Classes (i.e. categories 1 to 6). This is done in a very similar fashion to the recoding of srsoccl ('Self-rated social class') to give src1, the only real differences being the input and output variable names, rrgclass ('Registrar General's Social Class R') and rrg1, and the range of categories retained. (Similarly, rrgclass ('Registrar General's Social Class R') can be collapsed into a non-manual/manual dichotomy, rrg2, in a similar way, by assigning categories 1 to 3 to the new category 1, categories 4 to 6 to the new category 2, and treating all other values as missing).

[Screen dump: Recoding variables within SPSS for Windows (Version 8)]
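The aggregation rules described above for src2 (1 through 2 becomes 1, 3 through 5 becomes 2) and for rrg2 (1 through 3 becomes 1, 4 through 6 becomes 2) can be written down as a small table of ranges. The Python below is a minimal sketch of that idea; recode_ranges, SRC2_RULES and RRG2_RULES are hypothetical names, and the input codes are invented for the example.

```python
def recode_ranges(value, rules):
    """Map value to a new category using rules of the form ((low, high), new_value).

    Values matching no rule, and values already missing, become None (missing).
    """
    if value is None:
        return None
    for (low, high), new_value in rules:
        if low <= value <= high:
            return new_value
    return None

# Rules taken from the text: old range -> new category.
SRC2_RULES = [((1, 2), 1), ((3, 5), 2)]   # two broad self-rated class groups
RRG2_RULES = [((1, 3), 1), ((4, 6), 2)]   # non-manual / manual dichotomy

# Hypothetical codes (NOT real survey data).
srsoccl_demo = [1, 2, 3, 4, 5, 6, None]
rrgclass_demo = [1, 3, 4, 6, 7, None]

src2 = [recode_ranges(v, SRC2_RULES) for v in srsoccl_demo]
rrg2 = [recode_ranges(v, RRG2_RULES) for v in rrgclass_demo]
print(src2)  # [1, 1, 2, 2, 2, None, None]
print(rrg2)  # [1, 1, 2, 2, None, None]
```

Keeping the rules as data makes it easy to see at a glance that every old category is either assigned to a new one or deliberately left to fall through to missing.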

GENERATING THE CROSS-TABULATION

To obtain the required cross-tabulation, one clicks on Statistics, then on Summarize, then on Crosstabs... (or on Analyze, Descriptive Statistics and then Crosstabs... if using Version 10). This brings up a large box within which one defines the cross-tabulation. This is achieved by selecting rrg1 from the variable list by clicking on it and moving it across to the Row(s): box by clicking on the triangle pointing towards the Row(s): box, and then selecting src2 from the variable list and moving it across to the Column(s): box in a similar fashion. To obtain percentages which add up to 100% across the rows (i.e. row percentages), one clicks on Cells... and then clicks on the little square box next to Row, which puts a cross in it. Having thus requested row percentages, one clicks on Continue, and then on OK to generate the cross-tabulation.

The resulting table which appears in the output window (SPSS Viewer) can be seen to be a composite of Tables 9.4 and 9.5. Note that neither the rows nor the columns are labelled in any detail, since recoded variables do not have variable or value labels unless these are entered separately (using Data and Define Variable... within the Data Editor in Version 8, or Variable View within the Data Editor in Version 10).
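For readers who want to see what the cross-tabulation and its row percentages amount to arithmetically, here is a minimal Python sketch. It drops any case that is missing on either variable, counts the remaining pairs, and divides each cell by its row total. The function names and the tiny data lists are assumptions made for illustration, not part of SPSS or of the BSA91 data.

```python
from collections import Counter

def crosstab(rows, cols):
    """Cross-tabulate two variables, excluding cases missing (None) on either one.

    Returns (table, row_categories, col_categories); table[i][j] is a cell count.
    """
    counts = Counter(
        (r, c) for r, c in zip(rows, cols) if r is not None and c is not None
    )
    row_cats = sorted({r for r, _ in counts})
    col_cats = sorted({c for _, c in counts})
    table = [[counts[(r, c)] for c in col_cats] for r in row_cats]
    return table, row_cats, col_cats

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100.0 * n / sum(row) for n in row] for row in table]

# Hypothetical recoded values (NOT real survey data).
rrg1_demo = [1, 1, 2, 2, 2, 3, None, 3]
src2_demo = [1, 1, 1, 2, 2, 2, 1, None]

table, row_cats, col_cats = crosstab(rrg1_demo, src2_demo)
row_pct = row_percentages(table)
for cat, counts_row, pct_row in zip(row_cats, table, row_pct):
    print(cat, counts_row, [round(p, 1) for p in pct_row])
```

Each row of percentages sums to 100, which is what distinguishes row percentages from column or total percentages.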

TESTING THE OBSERVED RELATIONSHIP FOR (STATISTICAL) SIGNIFICANCE

Testing the relationship between occupational class and self-rated class for significance is simply a question of making a small addition to the above sequence of instructions. If one once again clicks on Statistics, then on Summarize, then on Crosstabs... (or on Analyze, Descriptive Statistics and then Crosstabs... if using Version 10), the crosstabs box can be seen to still have rrg1 and src2 specified as the row and column variables. Clicking on Statistics... where it appears within the crosstabs box, and then clicking on the little boxes next to Chi-square and next to Phi and Cramer's V generates relevant additional material. Once one has clicked on Continue to return to the crosstabs box, one can also add some extra chi-square related material to the cross-tabulation by clicking on Cells... and then clicking on the little boxes next to Expected (counts) and Unstandardized (residuals). Clicking on Continue and then on OK generates the revised version of the cross-tabulation.

Each cell of the revised version of the cross-tabulation contains the observed number of cases and the row percentage, as in Tables 9.4 and 9.5, and also the expected number of cases (Table 9.7) and the (unstandardized) residual, i.e. the difference between the observed and expected numbers (Table 9.8). Beneath the table are the Pearson chi-square statistic itself, together with its degrees of freedom and its significance level (P-value), which as noted in Chapter 9 shows the relationship in the table to be highly significant. Note that the chi-square statistic (156.8) is slightly different from the figure given in Chapter 9 (157.1); it is more accurate because computers are less likely to make rounding errors! The Likelihood Ratio version of chi-square, which is very similar, can be ignored for our purposes; the Linear-by-Linear Association chi-square statistic (mentioned in Chapter 9 as Mantel and Haenszel's version) takes account of the ordinality of the two variables. SPSS checks how many expected cell frequencies are less than 5; in this case the minimum expected frequency is greater than 19 so there is no problem of small expected frequencies. Cramer's V appears towards the bottom of the cross-tabulation output.
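The quantities reported here can be reproduced by hand for any table of counts. The sketch below, in Python, computes expected counts (row total times column total, divided by the grand total), unstandardized residuals, the Pearson chi-square statistic with its degrees of freedom, and Cramer's V. The 2 x 2 table is invented for the example and has nothing to do with the figures in Chapter 9; the function names are likewise hypothetical.

```python
import math

def expected_counts(table):
    """Expected count for each cell under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def residuals(table):
    """Unstandardized residuals: observed minus expected."""
    expected = expected_counts(table)
    return [[o - e for o, e in zip(orow, erow)] for orow, erow in zip(table, expected)]

def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom)."""
    expected = expected_counts(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for orow, erow in zip(table, expected)
        for o, e in zip(orow, erow)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (k - 1))), k = smaller of rows and columns."""
    chi2, _ = pearson_chi_square(table)
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical observed counts (NOT the table from Chapter 9).
observed = [[30, 10],
            [20, 40]]

expected = expected_counts(observed)   # [[20.0, 20.0], [30.0, 30.0]]
resid = residuals(observed)            # [[10.0, -10.0], [-10.0, 10.0]]
chi2, df = pearson_chi_square(observed)
v = cramers_v(observed)
print(expected, resid, round(chi2, 3), df, round(v, 3))
```

The significance level is left out because the Python standard library has no chi-square distribution; if SciPy happens to be available, scipy.stats.chi2.sf(chi2, df) would give the P-value.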

EXAMINING RESIDUALS

While an examination of the unstandardized residuals can give one an idea of the nature of the relationship in the cross-tabulation, an alternative would have been to have requested standardized residuals (which are in fact the square roots of the figures in Table 9.10). The standardized residual is a good way of assessing whether the observed figure for a particular cell differs significantly from what would have been expected; a standardized residual of greater than 2 (or less than -2) has an implicit P-value of less than 0.05 and can be taken as evidence of a significant excess or shortfall in the cell in question, though it is worth bearing in mind that about one cell in every twenty will have a standardized residual of greater than 2 (or less than -2) 'by chance'! Requesting standardized residuals is simply a question of clicking on the relevant little box, having clicked on Cells... within the crosstabs box.
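Assuming that 'standardized residual' here means the usual Pearson form, (observed - expected) / sqrt(expected), the calculation can be sketched in Python as follows. The table of counts is invented for the example, and the function names are hypothetical. The squares of these residuals are the cell contributions to chi-square, which is why they sum to the Pearson statistic.

```python
import math

def expected_counts(table):
    """Expected count for each cell under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def standardized_residuals(table):
    """Pearson residuals: (observed - expected) / sqrt(expected) for each cell."""
    expected = expected_counts(table)
    return [
        [(o - e) / math.sqrt(e) for o, e in zip(orow, erow)]
        for orow, erow in zip(table, expected)
    ]

# Hypothetical observed counts (NOT the table from Chapter 9).
observed = [[30, 10],
            [20, 40]]

std_resid = standardized_residuals(observed)

# Flag cells whose standardized residual is beyond +/-2.
flagged = [
    (i, j)
    for i, row in enumerate(std_resid)
    for j, value in enumerate(row)
    if abs(value) > 2
]
print([[round(x, 2) for x in row] for row in std_resid])
print(flagged)  # [(0, 0), (0, 1)]
```

In this made-up table only the two cells in the first row exceed the threshold, even though all four unstandardized residuals have the same absolute size (10), which shows why standardizing by the expected count is useful.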

EXAMINING SUB-TABLES

Examinations of sub-tables can be achieved by recoding the two variables so that the categories which one is not interested in are treated as missing. For example, one could compare Social Classes I and II by recoding rrgclass ('Registrar General's Social Class R') so that only the first two categories were retained, and then cross-tabulating the new variable by src2. However, on occasions it is not a part of a cross-tabulation which is of interest but a sub-sample of respondents. For example, only male respondents are included in Table 9.11. Attention can be restricted to a sub-sample of respondents from within the Data Editor by clicking on Data and then on Select Cases.... If one wants to restrict attention to men, then the next steps are to click on the small circle next to If condition is satisfied and then to click on If.... The selection criterion can then be entered into the empty box towards the top right of the Select Cases: If box. Men are category 1 of the variable rsex ('Respondent's sex'), so the appropriate criterion is rsex = 1. This can either be typed into the box or entered using the alphabetical list of variables and the on-screen key pad. Clicking on Continue and then on OK focuses attention on male respondents.

At this stage requesting a cross-tabulation of rrgclass ('Registrar General's Social Class R') by sexrole ('Husband earn money, wife's job family') will produce (something akin to) Table 9.11. (Note that the crosstabs box can be cleared of previously selected variables and options by clicking on Reset). Attention can be returned to the whole sample by moving to the Data Editor by clicking on Window, clicking again on Data and Select Cases..., clicking on the little circle next to All cases, and then clicking on OK.
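Selecting cases is, in effect, a filter applied before any table is built. The following Python sketch shows the idea with the criterion rsex = 1; the helper select_cases and the handful of records are assumptions made up for illustration (the variable names come from the text, the values do not come from the data file).

```python
def select_cases(cases, condition):
    """Return only the cases for which condition(case) is true."""
    return [case for case in cases if condition(case)]

# Hypothetical records (NOT real survey data); each dict is one respondent.
cases = [
    {"rsex": 1, "rrgclass": 2, "sexrole": 4},
    {"rsex": 2, "rrgclass": 3, "sexrole": 5},
    {"rsex": 1, "rrgclass": 5, "sexrole": 2},
    {"rsex": 2, "rrgclass": 1, "sexrole": 4},
]

# Men are category 1 of rsex, so the criterion is rsex == 1.
men = select_cases(cases, lambda case: case["rsex"] == 1)
print(len(men), len(cases))  # 2 4
```

One difference worth noting: in this sketch the full list of cases is never altered, whereas in SPSS the selection stays in force for all subsequent analyses until All cases is chosen again.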

ELABORATING CROSS-TABULATIONS

Elaborating a cross-tabulation by introducing a third variable, e.g. looking at the relationship between occupational class and self-rated class for each sex (Table 9.14), is quite straightforward. First one clicks on Statistics, then on Summarize, then on Crosstabs... (or on Analyze, Descriptive Statistics and then Crosstabs... if using Version 10). One then clears the crosstabs box by clicking on Reset. Then one selects rrg2 and src2 as the row and column variables. The elaboration of the cross-tabulation is achieved by clicking on rsex ('Respondent's sex') in the alphabetical list of variables to select it, and then clicking on the appropriate little triangle to move it to the 'Layer' box (see the screen dump that follows). Chi-square statistics and Cramer's V values can then be requested using the Statistics... sub-menu within the crosstabs box. Clicking on OK generates the three-way table (Table 9.14) in the output window (SPSS Viewer).

[Screen dump: Generating and elaborating cross-tabulations using SPSS for Windows (Version 8)]
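Adding a layer variable simply means producing one two-way table for each category of the third variable. A minimal Python sketch of that idea follows; layered_crosstab is a hypothetical helper and the values are invented, so the output bears no relation to Table 9.14.

```python
from collections import Counter, defaultdict

def layered_crosstab(rows, cols, layers):
    """Build one table of (row, column) counts per category of the layer variable.

    Cases missing (None) on any of the three variables are excluded.
    """
    tables = defaultdict(Counter)
    for r, c, layer in zip(rows, cols, layers):
        if r is None or c is None or layer is None:
            continue
        tables[layer][(r, c)] += 1
    return {layer: dict(counts) for layer, counts in tables.items()}

# Hypothetical recoded values (NOT real survey data).
rrg2_demo = [1, 1, 2, 2, 1, 2, None]
src2_demo = [1, 2, 2, 2, 1, 1, 1]
rsex_demo = [1, 1, 1, 2, 2, 2, 2]

layered = layered_crosstab(rrg2_demo, src2_demo, rsex_demo)
for sex in sorted(layered):
    print(sex, layered[sex])
```

Chi-square and Cramer's V would then be computed separately for each layer's table, which is how the strength of the relationship can be compared between men and women.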

REPRODUCING THE CROSS-TABULAR ANALYSES DESCRIBED IN CHAPTER 9

The above examples, together with the examples of cross-tabular analyses in Chapter 9 more generally, can be reproduced using the commands in the file catex.sps, which can be downloaded from these web pages and read into an SPSS syntax window. Syntax windows are described on the next page (synwins.html).