Mediators, Moderators, and Suppressors: What IS the difference?

March 10th, 2010

One of the most common questions I get is about the difference between mediators and moderators, and how both differ from control variables.

I recently found a fabulous free video tutorial on the difference between mediators, moderators, and suppressor variables, by Jeremy Taylor at Vary Your Stats Consulting.   The witty example is about the different types of variables–talent, practice, etc.–that explain the relationship between having a guitar and making lots of $$.

R Workshops by Ghement Statistical Consulting

March 8th, 2010

Workshop Announcement:

Have you been wanting to learn R?  Have the Olympics made you want to go to Vancouver?  Ghement Statistical Consulting is offering two upcoming workshops on R.  Here is the announcement as sent to me:

1.  “An Introduction to the Statistical Software Package R” (April 15-16, 2010; 8:30am-4:30pm; BCIT; Vancouver)

2.  “Advanced Statistical Modeling Using the Statistical Software Package R” (May 20-21, 2010; 8:30am-4:30pm; BCIT; Vancouver)

The workshops take place at the downtown campus of BCIT at 555 Seymour Street, Vancouver, B.C.  (Workshops not affiliated with BCIT.)

The introductory R workshop is ideal for participants who are new to R or have started learning R on their own and would like to get a structured and efficient overview of R’s main capabilities. The advanced R workshop is suitable for participants who have some experience with R and would like to expand their skill set to include the ability to perform advanced statistical analyses in R.

Both workshops are limited to 20 participants and consist of a series of short lectures and demonstrations, followed by hands-on, interactive sessions for the participants.  Participants attending the introductory R workshop (April 15-16) will get a jumpstart in R programming and develop a solid foundation for using R to manipulate, summarize and visualize data and solve basic data analysis problems.  Participants attending the advanced R workshop (May 20-21) will become familiar with a variety of advanced statistical modeling techniques, know when and how to use them, learn their practical implementation using R and become adept at interpreting the results produced from the use of these techniques.

The attendance fee for each day of the introductory R workshop (April 15-16) is $195.00 per person plus GST. The attendance fee for each day of the advanced R workshop (May 20-21) is $195.00 per person plus GST. Participants can attend either day or both days of each workshop. For each workshop, the attendance fee includes a bound copy of the Workshop Notes, a CD-ROM containing all workshop examples and exercises, various workshop handouts, 30 days of free workshop-related technical support following the workshop, and morning and afternoon coffee, tea and snacks.  Individual participants attending both days of each workshop and groups of 3 or more participants from the same organization will benefit from discounts.

Participants should bring a laptop computer pre-installed with R for Windows.  Detailed instructions for downloading and installing R are available at www.ghement.ca/Rinstructions.html. Upon request, we can provide computers for use during the workshop for an additional cost of  $100 plus GST per person per day.

To get more information about each workshop and to register, please visit  www.ghement.ca.

Please note that the registration deadline for the introductory R workshop is April 6, 2010 and the registration deadline for the advanced R workshop is May 11, 2010.


Get Started with SPSS: Tutorial Videos

March 5th, 2010

If you’re just getting started using SPSS, here’s a nice series of SPSS video tutorials, created by Dr. Ian Walker at the University of Bath.

They cover many of the basics: histograms, two-sample t-tests, Mann-Whitney U tests, one-way ANOVA, regression, etc.

They’re nice because not only does he show you how to do them very clearly, but he goes over the output, so you can see what the numbers mean.



Another Great SPSS book: SPSS Programming and Data Management

March 3rd, 2010

Have you ever needed to do some major data management in SPSS and ended up with a syntax program that’s pages long?  This is the kind of work you couldn’t even do with the menus: you’d tear your hair out with frustration because it would take four weeks to create some new variables.

I hope you’ve gotten started using Syntax, which gives you a record not only of how you’ve recoded and created all those new variables, but also of exactly which options you chose in the data analysis you’ve done.

But once you get started, you start to realize that some things feel a little clunky.  You have to run the same descriptive analysis on 47 different variables.  And while cutting and pasting is a heck of a lot easier than doing that in the menus, you wonder if there isn’t a better way.

There is.

SPSS syntax actually has a number of ways to increase programming efficiency, including macros, do loops, and repeats.

I admit I haven’t used this stuff a lot, but I’m increasingly seeing just how useful it can be.  I’m much better trained in doing these kinds of things in SAS, so I admit I have been known to just import data into SAS to run manipulations.

But I just came across a great resource on doing sophisticated SPSS Syntax Programming, and it looks like some fabulous bedtime reading.  (Seriously).

And the best part is you can download it (or order it, if you’d like a copy to take to bed) from the author’s website, Raynald’s SPSS Tools, itself a great source of info on mastering SPSS.

So once you’ve gotten into the habit of hitting Paste instead of OK, gotten a bit used to SPSS syntax, and are ready to step your skills up a notch, this looks like a fabulous book.



A Few Resources on Zero-Inflated Poisson Models

February 15th, 2010

1. For a general overview of modeling count variables, you can get a free download of the audio recording of one of my first Craft of Statistical Analysis Webinars:

The Other Regression Models Part 2: Poisson and Negative Binomial for Count Outcomes

2. One of my favorite books on Categorical Data Analysis is:

Long, J. Scott. (1997).  Regression Models for Categorical and Limited Dependent Variables.  Sage Publications.

It’s moderately technical, but written with social science researchers in mind.  It’s so well written, it’s worth it.  It has a section specifically about Zero Inflated Poisson and Zero Inflated Negative Binomial regression models.

3. Slightly less technical, but most useful only if you use Stata is Regression Models for Categorical Dependent Variables Using Stata, by J. Scott Long and Jeremy Freese.

4. UCLA’s ATS Statistical Software Consulting Group has some nice examples of Zero-Inflated Poisson and other models in various software packages.



Zero-Inflated Poisson Models for Count Outcomes

February 12th, 2010

There are quite a few types of outcome variables that will never meet the ordinary linear model’s assumption of normally distributed residuals.  A non-normal outcome variable can have normally distributed residuals, but it does need to be continuous, unbounded, and measured on an interval or ratio scale.   Categorical outcome variables clearly don’t fit this requirement, so it’s easy to see that an ordinary linear model is not appropriate.  Neither do count variables.  It’s less obvious, because they are measured on a ratio scale, so it’s easier to think of them as continuous, or close to it.  But they’re neither continuous nor unbounded, and this really affects assumptions.

Continuous variables measure how much.  Count variables measure how many.  Count variables can’t be negative (0 is the lowest possible value), and they’re often skewed, sometimes so severely that 0 is by far the most common value.  And they’re discrete, not continuous.  All those jokes about the average family having 1.3 children have a ring of truth in this context.

Count variables often follow a Poisson distribution or one of its relatives.  The Poisson distribution assumes that each count is the result of the same Poisson process: a random process in which each counted event is independent and equally likely.  If this count variable is used as the outcome of a regression model, we can use Poisson regression to estimate how predictors affect the number of times the event occurred.

But the Poisson model has very strict assumptions.  One that is often violated is that the mean equals the variance.  When the variance is too large because there are many 0s as well as a few very high values, the negative binomial model is an extension that can handle the extra variance.
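A quick way to see whether this assumption is in trouble is to compare the sample mean with the sample variance, and the observed share of zeros with the share a Poisson with that mean would predict.  Here is a minimal sketch in Python.  The counts are made up for illustration and the variable names are my own assumptions, not taken from any real data set.

```python
import math

# Hypothetical counts: many zeros plus a few very high values.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 9, 14]

n = len(counts)
mean = sum(counts) / n
# Sample variance (divides by n - 1).
variance = sum((c - mean) ** 2 for c in counts) / (n - 1)

# Under a Poisson distribution the variance equals the mean,
# so this ratio should be close to 1.
dispersion_ratio = variance / mean

# Share of zeros actually observed vs. the share a Poisson
# with the same mean would predict, which is exp(-mean).
observed_zero_share = counts.count(0) / n
poisson_zero_share = math.exp(-mean)

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
print(f"variance / mean = {dispersion_ratio:.2f}")
print(f"zeros observed = {observed_zero_share:.2f}, "
      f"zeros expected under Poisson = {poisson_zero_share:.2f}")
```

A ratio well above 1 points to extra variance, and an observed zero share far above the Poisson prediction points to excess zeros.  This is only a rough descriptive check on the raw counts, not a formal test.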

But sometimes it’s just a matter of having more zeros than a Poisson would predict.  In this case, a better solution is often the Zero-Inflated Poisson (ZIP) model.  (And when extra variation occurs too, its close relative is the Zero-Inflated Negative Binomial model).

ZIP models assume that some zeros occurred by a Poisson process, but others were not even eligible to have the event occur.  So there are two processes at work—one that determines if the individual is even eligible for a non-zero response, and the other that determines the count of that response for eligible individuals.

The tricky part is either process can result in a 0 count.   Since you can’t tell which 0s were eligible for a non-zero count, you can’t tell which zeros were results of which process.  The ZIP model fits, simultaneously, two separate regression models.  One is a logistic or probit model that models the probability of being eligible for a non-zero count.  The other models the size of that count.
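To make the two processes concrete, here is a rough sketch of the probability function that a ZIP distribution implies.  It assumes an eligibility probability `p_eligible` and a Poisson mean `lam`; both names and the example values are hypothetical.  It describes only the distribution itself and is not a fitting routine.

```python
import math

def poisson_pmf(k, lam):
    """Probability of a count of k under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, p_eligible, lam):
    """Probability of a count of k under a zero-inflated Poisson.

    p_eligible: probability of being eligible for a non-zero count.
    lam:        Poisson mean among eligible individuals.
    """
    if k == 0:
        # A zero can come from either process: not eligible at all,
        # or eligible but the Poisson count happened to be 0.
        return (1 - p_eligible) + p_eligible * poisson_pmf(0, lam)
    # A non-zero count can only come from an eligible individual.
    return p_eligible * poisson_pmf(k, lam)

# Hypothetical values: 60% eligible, mean of 2 among the eligible.
p_eligible, lam = 0.6, 2.0
zero_prob_zip = zip_pmf(0, p_eligible, lam)
zero_prob_poisson = poisson_pmf(0, lam)
total_prob = sum(zip_pmf(k, p_eligible, lam) for k in range(50))

print(f"P(0) under ZIP = {zero_prob_zip:.3f}")
print(f"P(0) under plain Poisson = {zero_prob_poisson:.3f}")
```

In a fitted ZIP model, both `p_eligible` and `lam` would depend on predictors, through a logistic (or probit) model and a Poisson regression respectively.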

Both models can use the same predictor variables, but their coefficients are estimated separately.  So the predictors can have vastly different effects on the two processes.

But a ZIP model requires it be theoretically plausible that some individuals are ineligible for a count.  For example, consider a count of the number of disciplinary incidents in a day in a youth detention center.  True, there may be some youth who would never instigate an incident, but the unit of observation in this case is the center.  It is hard to imagine a situation in which a detention center would have no possibility of any incidents, even if they didn’t occur on some days.

Compare that to the number of alcoholic drinks consumed in a day, which could plausibly be fit with a ZIP model.  Some participants do drink alcohol, but will have consumed 0 that day, by chance.   But others just do not drink alcohol, so will never have a non-zero response.  The ZIP model can determine which predictors affect the probability of being an alcohol consumer and which predictors affect how many drinks the consumers consume.  They may not be the same predictors for the two models, or they could even have opposite effects on the two processes.



The Craft of Statistical Analysis Webinars: Program Updates

February 10th, 2010

The Craft of Statistical Analysis Webinars are one of our most popular programs.  On the last Wednesday of each month at 1pm eastern we’ve been meeting for a free statistics training webinar.  Each month is on a different statistical topic–some overviews, some clarifications of what confusing statistical concepts mean, some about the general approach and steps to take.

We are making a few changes in the structure of the program, so I just wanted to outline what’s changing and why, what’s staying the same, and what we’re still working on.

What’s staying the same:

1. The approach: The webinars will still be on the same types of topics, an hour long, and will focus on improving the craft of implementing statistical analysis.

2. The timing: We’re still meeting for one hour on Wednesdays at 1pm eastern (GMT-5).  I realize this is a bad time for those of you in Australia and Asia, but there just isn’t a good time for everyone.  I can only hope you consider it worth it to attend at 3am or get the video.

3. The price: Because of some of the changes we’re making, we’re able to keep the live webinars free.  I feel it’s really important to make the information accessible to everyone.  So bring your research group, some snacks, and enjoy.

4. Recordings: The Webinars will continue to be recorded and available for purchase after the webinar.  So if you miss one we already did, you can always get one.  There are still a bunch of them that are available for free download as well.

What’s changing:

1.  The frequency:  Instead of meeting every month, we’ll be meeting every 8 weeks or so.  I say “or so” so we can work around holidays, some of my other workshops, and the like.  But it’ll generally be every other month and I’ll announce it far enough ahead of time that you can plan ahead.

2. Handouts and recordings: Handouts and recordings of new webinars will no longer be available for free.  There’s a surprising amount of administrative wrangling that they create.  And in order to keep the webinars themselves free, we’re going to charge for the extras.   We’re still keeping the price very low–under $20–so even students can afford them.  So if you do want handouts (still available ahead of time) and the recording, there’s a nominal fee to cover our expenses.

I’ve thought long and hard about making these changes, but as the program has grown, it’s getting harder to keep things frequent, free, and really high quality.  Lower quality isn’t an option in my mind, and it’s important to me that the webinars stay free.

So enjoy, and I hope to see you at the next webinar….

Free Webinar: Understanding Mediation and Path Analysis

January 26th, 2010

The Next Craft of Statistical Analysis Webinar* is tomorrow: Understanding Mediation and Path Analysis

Path Analysis is a system of regression equations used to determine if a third variable (a mediator) is driving the relationship between an independent and dependent variable. It is one of the simplest forms of structural equation models (SEM), but you don’t need specialized SEM software to run it.

This webinar will give an overview of the concepts, terminology, and steps involved in detecting mediation using three regression equations.  We’ll cover the difference between mediators, control variables, and moderators.  They’re all different!
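As a rough illustration of what three regression equations can look like, here is a minimal sketch in plain Python.  It assumes the common setup of regressing the outcome on the predictor, the mediator on the predictor, and the outcome on both.  The data and the variable names (practice, skill, earnings) are made up for illustration, and the webinar itself may present the steps differently.

```python
def cross_products(u, v):
    """Sum of products of u and v after subtracting their means."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cross_products(x, y) / cross_products(x, x)

def two_predictor_slopes(x, m, y):
    """OLS slopes from regressing y on x and m (with an intercept)."""
    sxx, smm = cross_products(x, x), cross_products(m, m)
    sxm = cross_products(x, m)
    sxy, smy = cross_products(x, y), cross_products(m, y)
    det = sxx * smm - sxm ** 2
    slope_x = (sxy * smm - smy * sxm) / det
    slope_m = (smy * sxx - sxy * sxm) / det
    return slope_x, slope_m

# Hypothetical data: practice (X), skill (mediator M), earnings (Y).
practice = [1, 2, 3, 4, 5, 6, 7, 8]
skill = [2, 3, 5, 4, 6, 8, 7, 9]
earnings = [3, 4, 7, 6, 8, 11, 10, 13]

c = simple_slope(practice, earnings)        # Equation 1: Y on X (total effect)
a = simple_slope(practice, skill)           # Equation 2: M on X
c_prime, b = two_predictor_slopes(practice, skill, earnings)  # Equation 3: Y on X and M
indirect = a * b                            # effect carried through the mediator

print(f"total = {c:.3f}, direct = {c_prime:.3f}, indirect = {indirect:.3f}")
```

With ordinary least squares on the same sample, the total effect equals the direct effect plus the indirect effect.  Whether the indirect effect is statistically meaningful is a separate question that this sketch does not address.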

Date: Wednesday, January 27, 2010

Time: 1pm Eastern Time (12pm Central, 11am Mountain, 10am Pacific)

Where: Anywhere you have a fast internet connection

Length of Program: An Hour

Cost: Always FREE

Register at: http://www.analysisfactor.com/learning/webinar14.html

What’s a Craft of Statistical Analysis Webinar?  It’s a regular webinar series for researchers to help you hone the craft of statistical analysis.  Each webinar is about a single statistical topic that is often confusing, misunderstood, or not well known to researchers.  Check it out and pass the word along–they’re free!



Statistical Workshop Announcements: Complex Surveys, Hierarchical Models, Survival Analysis, Categorical Data Analysis, and Factor Analysis

January 22nd, 2010

The announcements have begun for statistical workshops this summer.  Here’s the first.*

2010 Summer Quantitative Method Series at Portland State University

June 11-12. Secondary Data and Complex Survey Design, Clyde Dent & Nathalie Huguet.
June 14-15. Hierarchical Linear Models and Their Applications, Jason T. Newsom.
June 16-17. Introduction to Survival Analysis with SPSS, Jong-Sung Kim.
June 18-19. Categorical Data Analysis for Social Science, Hyeyoung Woo.
June 21-22. Introduction to Factor Analysis and Structural Equation Modeling. Mo Wang.

More information and online registration:
http://www.upa.pdx.edu/IOA/newsom/SQMS/

This series consists of two-day courses on data analysis taught by nationally recognized methodological experts. Course descriptions and more information about instructors can be found at the website.
The goal of the Series is to provide additional statistical and methodological training for research professionals from either the private or public sector. Although course credit is not available, graduate students are welcome and offered a discounted fee.

Participants may enroll in courses separately or in combination.

Each course takes an applied perspective with special attention given to when and how to implement each technique. Statistical, mathematical, and conceptual foundations will be included with the objective of providing a solid introduction to each area. All courses will provide extensive software illustrations, and, unless otherwise specified, will provide computer lab time where participants have one-on-one assistance available when running computer examples. Some graduate-level coursework in statistics (social science departments or otherwise) and some experience with one or more statistical software packages are usually assumed.

Individual courses may require additional prerequisite knowledge as indicated, however.

All classes will be held at the Portland State University Campus located in beautiful downtown Portland, OR. The campus is within easy walking distance of many local restaurants and attractions such as the Portland Art Museum, the Portland Farmer’s Market, brewpubs, and wine bars.

Early registration deadline is June 1, 2010.

*Karen here: I’m happy to pass along announcements of any workshops that I think may be of interest to my readers.

Disclaimer: This is not an endorsement (I don’t know these people), and I don’t get any kickbacks. I’m just spreading the news.  My opinion is you can’t have too much statistics learning.



What Makes a Statistical Analysis Wrong?

January 21st, 2010

One of the most anxiety-laden questions I get from researchers is whether their analysis is “right.” I’m always slightly uncomfortable with that word because often there is no one right analysis.

It’s like finding Mr. or Ms. Right—most of the time, there is not just one Right. But there are many that are clearly Wrong.

Luckily, what makes an analysis right for your study is more easily defined than what makes a person right for you. It pretty much comes down to two things: whether the assumptions of the statistical method are being met and whether the analysis answers the research question.

Assumptions are very important. A test needs to reflect the scale of the variables, the study design, and issues in the data. A repeated measures study design requires a repeated measures analysis. A binary dependent variable requires a categorical analysis method.

But within those general categories, there are often many analyses that meet assumptions. A logistic regression or a chi-square test can both handle a binary dependent variable if there is only a single categorical predictor. But a logistic regression can also incorporate covariates, directly test interactions, and calculate predicted probabilities. A chi-square test can do none of these.
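Here is a small sketch of that comparison in Python, using a made-up 2x2 table (the counts and names are hypothetical).  It computes the chi-square statistic by hand, and it relies on the fact that a logistic regression with a single binary predictor reproduces the two group proportions exactly, so its coefficients can be written down without an iterative fit.

```python
import math

# Hypothetical 2x2 table.
# Rows: predictor group (1 = exposed, 0 = not exposed).
# Columns: outcome (yes, no).
table = {1: (30, 20), 0: (15, 35)}

def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1 - p))

chi2 = chi_square_statistic(table)

# Logistic regression with one binary predictor: the fitted
# probabilities are just the observed proportions in each group.
p_exposed = table[1][0] / sum(table[1])
p_unexposed = table[0][0] / sum(table[0])
intercept = logit(p_unexposed)
slope = logit(p_exposed) - logit(p_unexposed)
odds_ratio = math.exp(slope)

print(f"chi-square = {chi2:.3f}")
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, odds ratio = {odds_ratio:.3f}")
```

The chi-square statistic only says whether the two variables are associated.  The logistic coefficients also give the size and direction of the association, and the same model can be extended with covariates and interactions, which would then require an actual model-fitting routine.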

So you get different information from different tests. They answer different research questions.

An analysis that is correct from an assumptions point of view is totally useless if it doesn’t answer the research question. A data set can spawn an endless number of statistical tests (and you can spend an endless number of days running them) that don’t answer the research question. And the real bummer is it’s not always clear that the analyses aren’t relevant until you sit down to write up the research paper.

That’s why writing out the research questions in theoretical and operational terms is the first step of any statistical analysis. It’s absolutely fundamental. And I mean writing them in minute detail. Issues of mediation, interaction, subsetting, control variables, et cetera, should all be blatantly obvious in the research questions.

The part on writing results sections in Daryl Bem’s chapter “Writing the Empirical Journal Article” is an excellent resource for planning a data analysis. It contains the best examples I’ve ever seen on how to write testable research questions. Thinking about how to write results before solidifying the research questions ensures the analysis is able to answer the questions. Whether the answer is what you expected or not is a different issue.

So when you are concerned about getting an analysis “right,” clearly define the design, variables, and data, but most importantly, get explicitly clear about what you want to learn from this analysis. Once you’ve done this, it’s much easier to find the statistical method that answers the research questions and meets assumptions.

