Posts Tagged ‘SPSS’

Cross-tabulation in Cohort and Case-Control Studies

Friday, September 3rd, 2010

by Annette Gerritsen, Ph.D.

Cross-tabulation in cohort studies

Assume you have just done a cohort study. How do you actually do the cross-tabulation to calculate the cumulative incidence in both groups?

It is best to always put the outcome variable (disease yes/no) in the columns and the exposure variable in the rows. In other words, put the dependent variable–the one that describes the problem under study–in the columns, and put the independent variable–the factor assumed to cause the problem–in the rows.

Let’s take as an example a cohort study designed to see whether there is a causal relationship between the use of a certain water source and the incidence of diarrhea among children under five in a village with different water sources. In this case, the variable diarrhea (yes/no) should be in the columns. The variable water source (suspected/other) should be in the rows.

SPSS will put the lowest value of the variable in the first column or row. So in order to get those with diarrhea in the first column, you should code ‘diarrhea’ as 1 and ‘no diarrhea’ as 2. The same is true for the exposure variable: code the ‘suspected water source’ as 1 and the ‘other water source’ as 2.

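You will then be able to calculate the cumulative incidence (risk of developing the disease) among those with the exposure: a / (a + b) and among those without the exposure: c / (c + d). Here a and b are the exposed children with and without the disease, and c and d are the unexposed children with and without the disease.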

In the case of the diarrhea study (Table 1), you could calculate the cumulative incidence of diarrhea among those exposed to the suspected water source, which would be (78 / 1,500 =) 5.2%.

You can also do this for those exposed to other water sources, which would be (50 / 1,000 =) 5.0%.


SPSS can give you these percentages immediately (in cells ‘a’ and ‘c’ respectively) when you ask to display row percentages in the Cells option (Table 2).
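If you want to check the row percentages by hand, the arithmetic is easy to reproduce outside SPSS. Here is a minimal Python sketch using the counts quoted above; the function and variable names are my own for illustration and are not anything SPSS produces.

```python
# Counts quoted in the diarrhea example above.
exposed_cases = 78       # cell a: suspected water source, diarrhea
exposed_total = 1500     # a + b: everyone using the suspected water source
unexposed_cases = 50     # cell c: other water source, diarrhea
unexposed_total = 1000   # c + d: everyone using another water source


def cumulative_incidence(cases: int, total: int) -> float:
    """Return the proportion of a group that developed the disease."""
    if total <= 0:
        raise ValueError("group total must be positive")
    return cases / total


ci_exposed = cumulative_incidence(exposed_cases, exposed_total)
ci_unexposed = cumulative_incidence(unexposed_cases, unexposed_total)

print(f"Suspected water source: {ci_exposed:.1%}")
print(f"Other water source: {ci_unexposed:.1%}")
```

The two printed values are the same row percentages SPSS reports in cells ‘a’ and ‘c’.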


Cross-tabulation in Case-Control Studies

When you have used a case-control design for the diarrhea study, the actual cross-tabulation is quite similar, only “presence of diarrhea yes/no” is now changed into “cases” and “controls”.

Code the cases as 1, and the controls as 2. Be aware that row percentages have no meaning in terms of occurrence of disease in case-control studies. This is because in case-control studies the researcher determines how many patients and how many controls are included.

The ratio between the number of patients and controls (e.g. 2 : 1 or 4 : 1) influences the row percentages. So in a case-control study, the cumulative incidence cannot be calculated.

After conducting a case-control study, you can ask to display column percentages. That gives you the proportion of those exposed to the suspected water source among the cases (in cell ‘a’) and among the controls (in cell ‘b’).

Table 3 gives the SPSS output, with column percentages requested, for the same diarrhea study assuming that it had a case-control design. Using the data provided, (78 / 128 =) 60.9% of the cases were exposed to the suspected water source, compared with (1,422 / 2,372 =) 59.9% of the controls.
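To see why row percentages depend on how many controls the researcher recruits while column percentages do not, here is a small Python sketch. The first table uses the counts quoted above (the 950 unexposed controls follow from 2,372 minus 1,422). The second table is hypothetical: it keeps the same cases but assumes exactly half as many controls with the same exposure split. The function names are my own.

```python
def column_percentages(a, b, c, d):
    """Proportion exposed among cases, a / (a + c), and among controls, b / (b + d)."""
    return a / (a + c), b / (b + d)


def row_percentages(a, b, c, d):
    """Proportion of cases among the exposed, a / (a + b), and the unexposed, c / (c + d)."""
    return a / (a + b), c / (c + d)


# a, c = exposed and unexposed cases; b, d = exposed and unexposed controls.
original = dict(a=78, b=1422, c=50, d=950)
# Hypothetical: same cases, half as many controls, same exposure split.
fewer_controls = dict(a=78, b=711, c=50, d=475)

for name, table in (("original", original), ("fewer controls", fewer_controls)):
    col_cases, col_controls = column_percentages(**table)
    row_exposed, row_unexposed = row_percentages(**table)
    print(name)
    print(f"  exposed among cases: {col_cases:.1%}, among controls: {col_controls:.1%}")
    print(f"  'cases' among exposed: {row_exposed:.1%}, among unexposed: {row_unexposed:.1%}")
```

The column percentages come out the same for both tables, while the row percentages move with the number of controls. That is why the row percentages cannot be read as a cumulative incidence in this design.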

Another article will be devoted to measures of association: How do you actually compare cumulative incidence rates in cohort studies? And what measure of association can be used in case-control studies?

About the Author: With expertise in epidemiology, biostatistics and quantitative research projects, Annette Gerritsen, Ph.D. provides services to her clients focussing on the methodological soundness of each phase of an epidemiological study to ensure getting valid answers to the proposed research questions. She is the founder of Epi Result.

Computing Cronbach’s Alpha in SPSS with Missing Data

Friday, July 16th, 2010

I recently received this question:

I have scale which I want to run Chronbach’s alpha on.  One response category for all items is ‘not applicable’. I want to run  Chronbach’s alpha requiring that at least 50% of the items must be answered for the scale to be defined.  Where this is the case then I want all missing values on that scale replaced by the average of the non-missing items on that scale. Is this reasonable? How would I do this in SPSS?

My Answer:

In RELIABILITY, the SPSS command for running a Cronbach’s alpha, the only options for Missing Data (more…)
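For readers who want to see what the rule in the question amounts to, here is a rough Python sketch. It illustrates the procedure the reader describes, not my recommended answer and not what RELIABILITY does. The responses are made up, `None` stands in for ‘not applicable’, and the function names are hypothetical.

```python
from statistics import mean, variance

MIN_ANSWERED = 0.5  # rule from the question: at least 50% of items answered


def impute_person_mean(rows, min_answered=MIN_ANSWERED):
    """Drop respondents below the threshold; fill the remaining gaps
    with the mean of that respondent's own non-missing items."""
    completed = []
    for row in rows:
        answered = [x for x in row if x is not None]
        if not row or len(answered) / len(row) < min_answered:
            continue  # the scale stays undefined for this respondent
        fill = mean(answered)
        completed.append([fill if x is None else x for x in row])
    return completed


def cronbach_alpha(rows):
    """alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses to a 4-item scale; None marks 'not applicable'.
responses = [
    [4, 5, 4, None],
    [2, 2, None, None],
    [3, None, None, None],  # only 25% answered, so this respondent is excluded
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

completed = impute_person_mean(responses)
print(len(completed), "respondents kept")
print(f"alpha = {cronbach_alpha(completed):.3f}")
```

The sketch only shows the mechanics. Whether filling in a respondent's own mean is a reasonable thing to do before computing alpha is a separate question.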

The 3 Stages of Mastering Statistical Analysis

Wednesday, October 14th, 2009

Like any applied skill, mastering statistical analysis requires:

1. building a body of knowledge

2. adeptness with the tools of the trade (aka software package)

3. practice applying the knowledge and using the tools in a realistic, meaningful context.

If you think of other high-level skills you’ve mastered in your life–teaching, survey design, programming, sailing, landscaping, anything–you’ll realize the same three requirements apply.

These three requirements need to be developed over time–over many years to attain mastery. And they need to be developed together. Having more background knowledge improves understanding of how the tools work, and helps the practice go better. Likewise, practice in a real context (not perfect textbook examples) makes the knowledge make more sense, and improves skills with the tools.

I don’t know if this is true of other applied skills, but from what I’ve seen over many years of working with researchers as they master statistical analysis, the journey seems to have 3 stages. Within each stage, developing all 3 requirements–knowledge, tools, and experience–to a level of mastery sets you up well for the next stage. (more…)

Quick-R: A guide for SPSS, SAS, and Stata Users

Thursday, August 20th, 2009

If you are an SPSS, SAS, or Stata user who finds yourself needing to use R (I mean, it’s free), I just found this great website: http://statmethods.net/index.html.

SPSS Inc. just bought by IBM

Wednesday, July 29th, 2009

Hot off the presses: IBM has just bought SPSS, Inc.  According to an article at Yahoo Finance, this is to bolster IBM’s Predictive Analytics capabilities for business.

It remains to be seen what the effect will be on social science researchers (the ones who used SPSS when it was Statistical Package for the Social Sciences).  But I can’t help but think it isn’t going to be positive.

It seems the name change to PASW was just the beginning of a slippery slope.

Join me in Finding Good Solutions to Missing Data

Monday, May 18th, 2009

About 10 years ago, when I first started consulting, I had a client, Linda, who had a lot of data missing from her data set for her master’s thesis.  She had a pretty big model–about 15 predictors.  And while no one variable was missing more than 5 or 10% of the data, in combination, listwise deletion was getting rid of more than half the cases.  She wasn’t getting any significant results because of the huge loss of power, and with that many dropped cases, it wasn’t clear that she still had a random sample that gave her unbiased results.
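It can seem surprising that 5 or 10% missing per variable adds up to losing more than half the cases, so here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that missingness is independent across the 15 predictors and occurs at the same rate in each; Linda's actual missing data pattern is not known. The function name is my own.

```python
def complete_case_fraction(missing_rate: float, n_variables: int) -> float:
    """Expected share of cases with no missing values on any variable,
    assuming each variable is missing independently at the same rate."""
    return (1 - missing_rate) ** n_variables


for rate in (0.05, 0.10):
    kept = complete_case_fraction(rate, 15)
    print(f"{rate:.0%} missing per variable: "
          f"{kept:.1%} of cases kept, {1 - kept:.1%} dropped by listwise deletion")
```

Under these assumptions, even the 5% rate leaves fewer than half of the cases complete.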

At that point, modern approaches to dealing with missing data did exist, but they were just beginning to become available in specialized software.  Neither Linda nor I had learned about them in statistics classes, because they just hadn’t hit the mainstream yet.  With a lot of research and a lot of learning (more…)

Statistical Computing at UCLA

Thursday, May 7th, 2009

If you need to learn how to do something in SPSS, SAS, or Stata or to use a host of specialized statistical packages, head over to the web page of the Statistical Computing office in UCLA’s Academic Technology Services. They offer an amazing selection of resources for learning how to analyze data in a number of general and specialized stats packages.

One of the best resources they have created is textbook examples in different stats software.  They have literally gone through numerous statistics textbooks and articles and rerun all the examples in different software packages.  That way, if you use a different package than the authors, the text becomes useful to you.

SPSS is now PASW

Thursday, April 30th, 2009

Yes, it’s true.

SPSS has changed its name to PASW.

It stands for Predictive Analytics Software.

The company has not changed its name–it’s still SPSS, Inc.  And the software itself has not changed.  Only the name of the software has changed.

You can get a full list of the new names at http://www.spss.com/software/product-name-guide/?tab=1.

The change is intended to reflect “SPSS Inc.’s leadership role in Predictive Analytics.”

If you’re not familiar with the term Predictive Analytics, it is a collection of methods, including statistics, operations research, data mining, and Monte Carlo techniques for financial and customer predictive forecasting. It’s business oriented, not science oriented.

I have to say, as someone who has used SPSS for 18 years to do statistics on scientific data, my gut reaction is that we statistics users are like the devoted friend who becomes taken for granted once the pretty, rich new girl moves into town.

But in reality, nothing has changed.  That’s how things were going anyway–it’s just spelled out now.  SPSS (I’m not ready to call it PASW yet) is still the same statistical software for those of us who use it for statistics.

When Unequal Sample Sizes Are and Are NOT a Problem in ANOVA

Monday, April 6th, 2009

In your statistics class, your professor made a big deal about unequal sample sizes in one-way Analysis of Variance (ANOVA) for two reasons.

1. Because she was making you calculate everything by hand.  Sums of squares require a different formula if sample sizes are unequal, but SPSS (and other statistical software) will automatically use the right formula.

2. Nice properties in ANOVA such as the Grand Mean being the intercept in an effect-coded regression model don’t hold when data are unbalanced.  Instead of the grand mean, you need to use a weighted mean.  That’s not a big deal if you’re aware of it. (more…)
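To make the second point concrete, here is a small Python sketch with made-up scores for three groups of unequal size. It only shows that the simple average of the group means and the sample-size-weighted average are different numbers when the data are unbalanced; it does not fit a regression model. All names and data are hypothetical.

```python
from statistics import mean

# Hypothetical scores for three groups of unequal size.
groups = {
    "A": [10, 12, 11, 13, 14, 12],
    "B": [20, 22],
    "C": [30, 31, 29],
}

group_means = {name: mean(values) for name, values in groups.items()}

# Simple (unweighted) average of the three group means.
unweighted_mean = mean(list(group_means.values()))

# Average of the group means weighted by group size.
n_total = sum(len(values) for values in groups.values())
weighted_mean = sum(len(values) * group_means[name]
                    for name, values in groups.items()) / n_total

# Mean of all observations pooled together.
overall_mean = mean([x for values in groups.values() for x in values])

print(f"unweighted mean of group means: {unweighted_mean:.3f}")
print(f"size-weighted mean of group means: {weighted_mean:.3f}")
print(f"mean of all observations: {overall_mean:.3f}")
```

With equal group sizes all three numbers would coincide. Here the weighted mean matches the mean of all observations, and the unweighted one does not.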

SPSS, SAS, R, Stata, JMP? Choosing a Statistical Software Package or Two.

Monday, March 16th, 2009

In addition to the five listed in this title, there are quite a few other options, so how do you choose which statistical software to use?

The default is to use whatever software they used in your statistics class–at least you know the basics.

And this might turn out pretty well, but chances are it will fail you at some point. Many times the stat package used in a class is chosen for its shallow learning curve, (more…)