Posts Tagged ‘Logistic Regression’

Interpreting Regression Coefficients in Models other than Ordinary Linear Regression

Tuesday, January 5th, 2010

Someone who registered for my upcoming Interpreting (Even Tricky) Regression Models workshop asked if the content applies to logistic regression as well.

The short answer: Yes

The long-winded detailed explanation of why this is true and the one caveat:

One of the greatest things about regression models is that they all have the same set up: (more…)

Chi-square test vs. Logistic Regression: Is a fancier test better?

Monday, November 9th, 2009

I recently received this email, which I thought was a great question, and one of wider interest…

Hello Karen,
I am an MPH student in biostatistics and I am curious about using regression for tests of associations in applied statistical analysis.  Why is using regression, or logistic regression “better” than doing bivariate analysis such as Chi-square?

I read a lot of studies in my graduate school studies, and it seems like half of the studies use Chi-Square to test for association between variables, and the other half, who just seem to be trying to be fancy, conduct some complicated regression-adjusted for-controlled by- model. But the end results seem to be the same. I have worked with some professionals that say simple is better, and that using Chi-Square is just fine, but I have worked with other professors that insist on building models. It also just seems so much more simple to do chi-square when you are doing primarily categorical analysis.

My professors don’t seem to be able to give me a simple justified answer, so I thought I’d ask you. I enjoy reading your site and plan to begin participating in your webinars.

Thank you!

My response:

Gee, thanks.  I look forward to seeing you on the webinars.

Per your question, there are a number of different reasons I’ve seen.

You’re right that there are many situations in which a sophisticated (and complicated) approach and a simple approach both work equally well, and all else being equal, simple is better.

Of course I can’t say why anyone uses any particular methodology in any particular study without seeing it, but I can guess at some reasons.

I’m sure there is a bias among researchers to go complicated because even when journals say they want simple, the fancy stuff is so shiny and pretty and gets accepted more, mainly because it communicates (on some level) that you understand sophisticated statistics and have checked out the control variables, so there’s no need for reviewers to object.  And whether or not any of this is actually true, I’m sure people worry about it.

Including controls truly is important in many relationships.  Simpson’s paradox, in which a relationship reverses itself without the proper controls, really does happen.
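To see how a reversal like this can happen, here is a small sketch in Python. The counts are invented purely for illustration (they are not from any study), and the group and stratum names are placeholders. Within each stratum the treated group does better, yet the pooled rates point the other way, because most treated cases are severe and most controls are mild.

```python
# Hypothetical counts, invented only to illustrate the reversal.
# Each entry is (successes, total) for one group within one stratum.
strata = {
    "mild cases":   {"treated": (90, 100),   "control": (800, 1000)},
    "severe cases": {"treated": (300, 1000), "control": (20, 100)},
}

def rate(successes, total):
    """Proportion of successes."""
    return successes / total

def pooled(group):
    """Add up successes and totals for one group across all strata."""
    successes = sum(strata[s][group][0] for s in strata)
    total = sum(strata[s][group][1] for s in strata)
    return successes, total

# Within each stratum, compare treated vs. control success rates.
for name, groups in strata.items():
    print(name, rate(*groups["treated"]), rate(*groups["control"]))

# Ignoring the stratum (no control variable) reverses the comparison.
print("pooled", rate(*pooled("treated")), rate(*pooled("control")))
```

The stratum plays the role of the control variable: leave it out and the comparison flips.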

Now you could argue that logistic regression isn’t the best tool.  If all the variables, predictors and outcomes, are categorical, a log-linear analysis is the best tool.  A log-linear analysis is an extension of Chi-square.

That said, I personally have never found loglinear models intuitive to use or interpret.  So, if given the choice, I will use logistic regression.  My personal philosophy is that if two tools are both reasonable, and one is so abstruse your audience won’t understand it, go with the easier one.

Which brings us back to chi-square.  Why not just use the simplest of all?

A Chi-square test is really a descriptive test, akin to a correlation.  It’s not a modeling technique, so there is no dependent variable.  So the question is, do you want to describe the strength of a relationship or do you want to model the determinants of and predict the likelihood of an outcome?

So even in a very simple, bivariate model, if you want to explicitly define a dependent variable, and make predictions, a logistic regression is appropriate.
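Here is a rough sketch of that difference in Python, using a made-up 2x2 table (the counts are hypothetical). The chi-square function returns a single statistic for the association. The logistic regression, which for one binary predictor has a closed-form maximum likelihood solution, names the outcome as the dependent variable and returns predicted probabilities and an odds ratio.

```python
import math

# Hypothetical 2x2 table of counts, invented for illustration.
# Rows: exposure (0 = unexposed, 1 = exposed); columns: outcome (0 = no, 1 = yes).
table = [[60, 40],
         [30, 70]]

def chi_square_2x2(t):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row = [sum(r) for r in t]
    col = [t[0][j] + t[1][j] for j in range(2)]
    n = sum(row)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (t[i][j] - expected) ** 2 / expected
    return stat

def logistic_from_2x2(t):
    """Closed-form ML estimates for logit P(outcome = 1) = b0 + b1 * exposure."""
    b0 = math.log(t[0][1] / t[0][0])        # log odds of the outcome when unexposed
    b1 = math.log(t[1][1] / t[1][0]) - b0   # log odds ratio, exposed vs. unexposed
    return b0, b1

def predict(b0, b1, exposure):
    """Predicted probability of the outcome for a given exposure value."""
    return 1 / (1 + math.exp(-(b0 + b1 * exposure)))

chi2 = chi_square_2x2(table)
b0, b1 = logistic_from_2x2(table)
print("chi-square:", chi2)
print("odds ratio:", math.exp(b1))
print("P(outcome | unexposed):", predict(b0, b1, 0))
print("P(outcome | exposed):", predict(b0, b1, 1))
```

Both approaches use the same four counts. The chi-square test treats the two variables symmetrically, while the regression is built around predicting one from the other.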

June Webinar: What Happened to R squared?: Assessing Model Fit for Logistic, Multilevel, and Other Models that use Maximum Likelihood

Tuesday, June 23rd, 2009

Have you ever been dismayed to discover that there is no R-squared for models that use Maximum Likelihood estimation–multilevel models and logistic regression, among others–and that your output instead shows a scattering of foreign-sounding statistics?

This webinar will give you an overview of the fit statistics that are available for these models: what they mean, how to calculate them, how to use them, and why R squared isn’t applicable. Statistics include:

  • -2 log likelihood
  • BIC
  • Pseudo R-squared
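As a rough preview of how these three are calculated for a binary outcome, here is a minimal Python sketch. The outcomes and fitted probabilities are hypothetical placeholders (they do not come from a real fitted model), the function names are my own, and McFadden's version is only one of several pseudo R-squared measures.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log likelihood of outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def fit_statistics(y, p, n_params):
    """-2 log likelihood, BIC, and McFadden's pseudo R-squared."""
    n = len(y)
    ll_model = log_likelihood(y, p)
    # Null model: intercept only, so every case gets the overall proportion.
    p_null = sum(y) / n
    ll_null = log_likelihood(y, [p_null] * n)
    return {
        "minus2ll": -2 * ll_model,
        "bic": -2 * ll_model + n_params * math.log(n),
        "mcfadden_r2": 1 - ll_model / ll_null,
    }

# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 0, 1, 0, 1, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

results = fit_statistics(y, p, n_params=2)  # intercept plus one slope
for name, value in results.items():
    print(name, round(value, 3))
```

Smaller values of -2 log likelihood and BIC indicate better fit. BIC adds a penalty that grows with the number of parameters and the sample size.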

Date: Wednesday, June 24, 2009

Time: 1pm Eastern Time (12pm Central, 11am Mountain, 10am Pacific) GMT - 4

Where: Anywhere you have a fast internet connection

Length of Program: An Hour

Cost: Always FREE

For More Information and to register: http://www.analysisfactor.com/learning/teletraining8.html

About The Craft of Statistical Analysis Training Webinars

The Analysis Factor’s monthly training webinars provide you with practical information to take the frustration out of statistical analysis. Each month’s training will cover topics with real issues that researchers face in data analysis. These events are always free and conducted by Karen Grace-Martin, a professional statistical consultant, who will be available to answer your questions.

See the schedule of upcoming webinars

Multiple Imputation of Categorical Variables

Monday, June 1st, 2009

Most Multiple Imputation methods assume multivariate normality, so a common question is how to impute missing values from categorical variables.

Paul Allison, one of my favorite authors of statistical information for researchers, did a study that showed that the most common method actually gives worse results than listwise deletion.  (Did I mention I’ve used it myself?) (more…)

Why Logistic Regression for Binary Response?

Tuesday, May 5th, 2009

Logistic regression models can seem pretty overwhelming to the uninitiated.  Why not use a regular regression model?  Just turn Y into an indicator variable–Y=1 for success and Y=0 for failure.

For some good reasons.

1. It doesn’t make sense to model Y as a linear function of the parameters because Y has only two values.  You just can’t make a line out of that (at least not one that fits the data well).

2. The predicted values can be any positive or negative number, not just 0 or 1.

3. The values of 0 and 1 are arbitrary. The important part is not to predict the numerical value of Y, but the probability that success or failure occurs, and the extent to which that probability depends on the predictor variables.
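The second point is easy to see with a few lines of Python. This sketch fits an ordinary least squares line to a made-up 0/1 outcome (the data are hypothetical) and shows that the fitted values at the ends of the x range fall below 0 and above 1.

```python
# Hypothetical data: a predictor x and a 0/1 outcome y, invented for illustration.
x = list(range(1, 11))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

def ols_fit(x, y):
    """Ordinary least squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# The fitted "probabilities" at the extremes are not valid probabilities.
print("fitted at x = 1:", fitted[0])
print("fitted at x = 10:", fitted[-1])
```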

So okay, you say.  Why not use a simple transformation of Y, like probability of success–the probability that Y=1?

Well, that doesn’t work so well either.

Why not?

1. The right hand side of the equation can be any number, but the left hand side can only range from 0 to 1.

2. It turns out the relationship is not linear, but rather follows an S-shaped (or sigmoidal) curve.

To obtain a linear relationship, we need to transform this response, Pr(success), too.

As luck would have it, there are a few functions that:

1. are not restricted to values between 0 and 1

2. will form a linear relationship with our parameters

These functions include:

Arcsine

Probit

Logit

All three of these work just as well, but (believe it or not) the Logit function is the easiest to interpret.
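Here is a small Python sketch of two of these, the logit and the probit, just to show what they do to a probability. The function names are my own. Both stretch the interval between 0 and 1 out over the whole number line, and the logit is simply the log of the odds, which is what makes it easier to interpret.

```python
import math
from statistics import NormalDist

def logit(p):
    """Log odds: maps a probability in (0, 1) to any real number."""
    return math.log(p / (1 - p))

def inv_logit(z):
    """Logistic function: maps any real number back to a probability."""
    return 1 / (1 + math.exp(-z))

def probit(p):
    """Inverse of the standard normal CDF."""
    return NormalDist().inv_cdf(p)

for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(p, logit(p), probit(p))
```

Probabilities below 0.5 map to negative values and probabilities above 0.5 to positive values under both functions, and `inv_logit` undoes `logit`.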

But as it turns out, you can’t just run the transformation then do a regular linear regression on the transformed data.  That would be way too easy, and it would also give inaccurate results.  Logistic Regression uses a different method for estimating the parameters, maximum likelihood, which gives better results–better meaning unbiased, with lower variances.
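For the curious, here is a minimal sketch of what that estimation can look like for one predictor. It finds the maximum likelihood estimates with Newton-Raphson iterations, which is one common way to do it and my choice for this illustration, not necessarily what your software does internally. The data are hypothetical, and the fixed number of iterations is an assumption that works for this small example.

```python
import math

# Hypothetical data, invented for illustration: the 0s and 1s overlap
# along x, so the maximum likelihood estimates are finite.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

def predicted(b0, b1, x):
    """Predicted probabilities from the logistic model."""
    return [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]

def fit_logistic(x, y, iterations=25):
    """Maximize the log likelihood of logit P(y = 1) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = predicted(b0, b1, x)
        # Gradient of the log likelihood (the score).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix inverse) times (gradient).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(x, y)
print("intercept:", b0)
print("slope:", b1)
print("odds ratio per unit of x:", math.exp(b1))
```

Notice that nothing here transforms Y itself. The 0s and 1s go into the likelihood as they are, and only the predicted probability is connected to the linear predictor through the logit.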

Understanding Probability, Odds, and Odds Ratios in Logistic Regression

Wednesday, March 25th, 2009

Free teleseminar today:

Understanding Probability, Odds, and Odds Ratios in Logistic Regression

Ever find that interpretations of odds ratios are a little, well, strange?

Odds ratios are the bane of many data analysts. Interpreting them can be like learning a whole new language. This teleseminar will go over an example to show how to interpret the odds ratios in binary logistic regression. You will learn:

  • how probability and odds both measure the same thing on different scales
  • the meaning of odds
  • how to interpret an odds ratio for continuous and categorical predictors in logistic regression
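If you want a head start, the arithmetic behind these three points fits in a few lines of Python. This is only a sketch with helper names of my own choosing and example probabilities picked for convenience.

```python
def odds_from_probability(p):
    """Odds = probability of the event divided by probability of no event."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)

def odds_ratio(p1, p0):
    """Ratio of the odds in one group to the odds in a comparison group."""
    return odds_from_probability(p1) / odds_from_probability(p0)

# Example probabilities chosen for convenience, not taken from any data.
print(odds_from_probability(0.5))   # even odds
print(odds_from_probability(0.8))   # about 4, i.e. "4 to 1"
print(probability_from_odds(4))     # back to about 0.8
print(odds_ratio(0.8, 0.5))         # about 4
```

Probability runs from 0 to 1 while odds run from 0 to infinity, but each one determines the other, which is why they are the same information on two scales.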

Date: Wednesday, March 25, 2009

Time: 1pm Eastern Time (12pm Central, 11am Mountain, 10am Pacific)

Register to get call in info at: http://www.analysisfactor.com/learning/teletraining5.html

Understanding Probability, Odds, and Odds Ratios in Logistic Regression

Friday, March 20th, 2009

I’m happy to announce The Analysis Factor’s next free Teleseminar:

*Understanding Probability, Odds, and Odds Ratios in Logistic Regression*

You will learn:

* how probability and odds both measure the same thing on different scales
* the meaning of odds
* how to interpret an odds ratio for continuous and categorical predictors in logistic regression

Find out more and register at:
http://www.analysisfactor.com/learning/teletraining5.html.

*Date*: Wednesday, March 25, 2009
*Time*: 1pm Eastern, 10am Pacific

>> How it Works <<

Each month, The Analysis Factor offers a free statistics teleseminar.

On what is essentially a giant conference call, I will talk for 30-40 minutes on a very specific applied statistical topic. I will stop frequently for questions, and at the end you can ask questions relevant to your research. Each call covers an issue many researchers get stuck on when practicing statistics.

You need to register for the call to get call-in instructions and directions for downloading handouts. Spots are limited, so register early.

The call will be recorded, so if you miss it, you can still listen in.

Listen over and over if it helps. Or listen again as the topic becomes relevant to your research. But you need to register to get access to the recording.

Whether you can make the call live or not, register now at:

http://www.analysisfactor.com/learning/teletraining5.html

When NOT to Center a Predictor Variable in Regression

Monday, February 9th, 2009

There are two reasons to center predictor variables in any type of regression analysis–linear, logistic, multilevel, etc.

1. To lessen the correlation between a multiplicative term (interaction or polynomial term) and its component variables (the ones that were multiplied).

2. To make interpretation of parameter estimates easier.
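The first reason is easy to check numerically. This Python sketch uses a made-up predictor (the values 1 through 10) and compares the correlation between the predictor and its square before and after centering. The correlation drops all the way to zero here only because these values are symmetric around their mean; with real data, expect a reduction rather than a complete removal.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictor values, invented for illustration.
x = list(range(1, 11))
x_squared = [xi ** 2 for xi in x]

# Center x at its mean, then square the centered values.
mean_x = sum(x) / len(x)
x_centered = [xi - mean_x for xi in x]
x_centered_squared = [xc ** 2 for xc in x_centered]

r_raw = pearson(x, x_squared)
r_centered = pearson(x_centered, x_centered_squared)
print("correlation of x with x squared:", r_raw)
print("correlation after centering:", r_centered)
```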

I was recently asked: when is centering NOT a good idea? (more…)

A call for regression analyses

Wednesday, January 14th, 2009

Want some free statistical consulting?  Direct feedback on how to interpret your linear or logistic regression output?

I’d like to try something new in a couple upcoming teleseminars, suggested by a participant in January’s teleseminar.   But I need your help.  If I don’t get enough response, we won’t be able to do it.

For each teleseminar, I need 2-3 researchers who would like help going over their Linear or Logistic Regression Output.  We will go over how to interpret coefficients, evaluate the model, and I will answer any other questions you have about the analysis, as time permits.  I will spend 10-15 minutes on each one in the teleseminar.

If you are interested, it’s best if you can be on the call.  I am tentatively scheduling Linear Regression on February 25th and Logistic Regression on May 27th.  Both are at 1pm Eastern.

I could just use a generic example, but I think it will be more engaging and informative to work with real research.  If you are working on a linear or logistic regression and are interested, here is your chance for some free consulting.  Please comment below or email me: karen at analysisfactor dot com.

Poisson Regression Analysis for Count Data

Wednesday, December 31st, 2008

There are many dependent variables that, no matter how many transformations you try, you cannot get to be normally distributed.  The most common culprits are count variables–variables that measure the count or rate of some event in a sample.  Some examples I’ve seen from a variety of disciplines are:

Number of eggs in a clutch that hatch
Number of domestic violence incidents in a month
Number of times juveniles needed to be restrained during their tenure at a correctional facility
Number of infected plants per transect

A common quality of these variables is that 0 is the mode–the most common value.  1 is the next most common, 2 the next, and so on.  In variables with low expected counts (number of cars in a household, number of degrees earned), (more…)