Is the mean always greater than the median in a right skewed distribution?

July 3rd, 2009

One of the basic tenets of statistics that every student learns in about the second week of intro stats is that in a skewed distribution, the mean is closer to the tail than the median is.

So in a right skewed distribution (the tail points right on the number line), the mean is higher than the median.

It’s a rule that makes sense, and I have to admit, I never questioned it.

But a great article in the Journal of Statistics Education shows that it really only holds in idealized, unimodal, continuous distributions: http://www.amstat.org/publications/jse/v13n2/vonhippel.html
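To make the exception concrete, here is a minimal sketch with made-up counts (they are not taken from the article). It builds a small discrete distribution with a long right tail and checks the mean against the median. Skew is measured here by the sign of the third central moment, which is one common convention and an assumption of this example.

```python
from statistics import mean, median

# Hypothetical counts for a discrete variable: most values are 1 or 2,
# with a thin tail stretching out to 5.
counts = {1: 45, 2: 48, 3: 4, 4: 2, 5: 1}
data = [value for value, count in counts.items() for _ in range(count)]

def third_central_moment(xs):
    """Average cubed deviation from the mean; positive means right skew."""
    m = mean(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs)

data_mean = mean(data)
data_median = median(data)
skew_direction = third_central_moment(data)

print("mean:", data_mean)
print("median:", data_median)
print("third central moment:", skew_direction)
```

In this made-up example the tail points right, yet the mean lands below the median, because the heavy block of 1s pulls the mean down more than the thin tail pulls it up.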

On Puzzles, Statistics, Algorithms, and Understanding

July 1st, 2009

My 8-year-old son got a Rubik’s cube in his Christmas stocking this year.

I had gotten one as a birthday present when I was about 10.  It was at the height of the craze and I was so excited.  I distinctly remember bursting into tears when I discovered that my little sister had sneaked in to play with it and messed it up the day I got it.  I knew I would mess it up to an unsolvable point soon myself, but I was still relishing the fun of creating patterns in the 9 squares, then getting it back to 6 sides of single-colored perfection.  (I loved patterns even then). Read the rest of this entry »

New version released of Amelia II: A Program for Missing Data

June 30th, 2009

A new version of Amelia II, a free package for multiple imputation, was released today.  Amelia II is available in two versions.  One is an R package, and the other, AmeliaView, is a GUI that does not require any knowledge of the R programming language.  They both use the same underlying algorithms, and both require R to be installed.

At the Amelia II website, you can download Amelia II (did I mention it’s free?!), download R, get the very useful User’s Guide, join the Amelia listserv, and get information about multiple imputation.

If you want to learn more about multiple imputation:

Beyond Median Splits: Meaningful Cut Points

June 26th, 2009

I’ve talked a bit about the arbitrary nature of median splits and all the information they just throw away.

But I have found that as a data analyst, it is incredibly freeing to be able to choose whether to make a variable continuous or categorical and to make the switch easily.  Essentially, this means you need to be Read the rest of this entry »

June Webinar: What Happened to R squared?: Assessing Model Fit for Logistic, Multilevel, and Other Models that use Maximum Likelihood

June 23rd, 2009

Have you ever been dismayed to discover that there is no R-squared for models that use Maximum Likelihood estimation–multilevel models and logistic regression, among others? And that instead there is a scattering of foreign-sounding statistics on your output?

This webinar will give you an overview of the fit statistics that are available for these models: what they mean, how to calculate them, how to use them, and why R squared isn’t applicable. Statistics include:

  • -2 log likelihood
  • BIC
  • Pseudo R-squared
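As a rough illustration of how the three statistics in this list relate to the log likelihood, here is a minimal sketch for a binary outcome. The outcomes and predicted probabilities are made up, standing in for the output of a fitted logistic regression. The function names are hypothetical, and McFadden's version is used for the pseudo R-squared, which is only one of several in circulation.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

def fit_statistics(y, p_model, n_params):
    """Return -2 log likelihood, BIC, and McFadden's pseudo R-squared."""
    n = len(y)
    ll_model = log_likelihood(y, p_model)
    # Null model: intercept only, so every case gets the overall proportion of 1s.
    p_null = sum(y) / n
    ll_null = log_likelihood(y, [p_null] * n)
    minus2ll = -2 * ll_model
    return {
        "minus2LL": minus2ll,
        "BIC": minus2ll + n_params * math.log(n),
        "pseudo_R2": 1 - ll_model / ll_null,
    }

# Hypothetical data: observed outcomes and predicted probabilities from
# an assumed model with 2 parameters (intercept plus one slope).
y = [0, 0, 0, 1, 0, 1, 1, 1]
p_model = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9]

stats = fit_statistics(y, p_model, n_params=2)
for name, value in stats.items():
    print(name, round(value, 3))
```

Smaller values of -2 log likelihood and BIC indicate better fit, and BIC adds a penalty for each extra parameter. The pseudo R-squared compares the model's log likelihood to that of the intercept-only model. It is not a proportion of variance explained.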

Date: Wednesday, June 24, 2009

Time: 1pm Eastern Time (12pm Central, 11am Mountain, 10am Pacific) GMT - 4

Where: Anywhere you have a fast internet connection

Length of Program: An Hour

Cost: Always FREE

For More Information and to register: http://www.analysisfactor.com/learning/teletraining8.html

About The Craft of Statistical Analysis Training Webinars

The Analysis Factor’s monthly training webinars provide you with practical information to take the frustration out of statistical analysis. Each month’s training will cover topics with real issues that researchers face in data analysis. These events are always free and conducted by Karen Grace-Martin, a professional statistical consultant, who will be available to answer your questions.

See the schedule of upcoming webinars

Position Announcement: Statistical Consultant at Cornell University

June 22nd, 2009

I don’t usually post job announcements here, but this one is special.  I am posting it for one of my colleagues–please don’t send responses to me.  This is the position I used to have at Cornell, and it’s a great job.  And fyi, although it says Division of Nutritional Sciences, you will consult across 5 colleges.  For application information go to:

https://cornellu.taleo.net/careersection/10164/jobdetail.ftl

Statistical Consultant-10807

Description

The Division of Nutritional Sciences at Cornell University is seeking a full-time Statistical Consultant.  Responsibilities include providing statistical consulting and research support for faculty, graduate and undergraduate students and research staff through the Cornell Statistical Consulting Unit (CSCU).
Specific responsibilities include providing general and advanced statistical and methodological consulting for clients; providing instructional workshops and other training on basic and advanced statistical techniques; writing documentation and expository instructional handouts on applied statistical topics; testing and adapting statistical software and writing statistical procedures; assisting clients in identifying appropriate software to run their statistical analysis and in integrating statistical and other analytical techniques into their research; and working on various contract projects handled by CSCU.

This position is a one-year full-time position that is renewable contingent upon funding availability.

Qualifications

Required:

Bachelor’s degree with 3-4 yrs experience with research or supporting research preferred or equivalent combination required. Sound knowledge of theoretical statistics as well as experience in their application. Demonstrated excellent organizational, interpersonal and communication (oral and written) skills. Enthusiasm for working with and the ability to communicate technical information in an understandable way to a large and diverse clientele. Experience using various statistical packages required. Interest in learning new statistical techniques a must. Sets and maintains high professional standards and takes personal responsibility for making things happen.


Preferred:

Master’s degree in statistics or biometry preferred.

No relocation assistance is provided for this position.

Cornell University, located in Ithaca, New York, is an inclusive, dynamic, and innovative Ivy League university and New York’s land-grant institution. Its staff, faculty, and students impart an uncommon sense of larger purpose and contribute creative ideas and best practices to further the university’s mission of teaching, research, and outreach.

Cornell University is an equal opportunity, affirmative action educator and employer.

Observed Values less than 5 in a Chi Square test–No biggie.

June 19th, 2009

I was recently asked this question about Chi-square tests.  This question comes up a lot, so I thought I’d share my answer.

I have to compare two sets of categorical data in a 2×4 table. I cannot run the chi-square test because most of the cells contain values less than five and a couple of them contain values of 0. Is there any other test that I could use that overcomes the limitations of chi-square?

And here is my answer: Read the rest of this entry »

A Fabulous Guide to Writing the Statistical Section of Grant Proposals

June 17th, 2009

Spending the summer writing a research grant proposal?  Stuck on how to write up the statistics section?

An excellent handbook that outlines how to prepare the statistical content for grant proposals is “Statistics Guide for Research Grant Applicants.” Sections include “Describing the Study Design”, “Sample Size Calculations”, and “Describing the Statistical Methods,” among others.

The navigation for the guide is not obvious–it is in the left margin menu, among other menus, toward the bottom. You have to scroll down from the top of the page to see it.

The authors, JM Bland, BK Butland, JL Peacock, J Poloniecki, F Reid, P Sedgwick, are statisticians at St. George’s Hospital Medical School, London.

3 Ad-hoc Missing Data Approaches that You Should Never Use

June 15th, 2009

The default approach to dealing with missing data in most statistical software packages is listwise deletion–dropping any case with data missing on any variable involved anywhere in the analysis.  It also goes under the names case deletion and complete case analysis.
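For readers who have not seen it spelled out, here is a minimal sketch of what listwise deletion does. The records and variable names are made up, and missing values are represented as None, which is an assumption of this example rather than how any particular package stores them.

```python
# Hypothetical survey records; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000, "score": 7.1},
    {"id": 2, "age": None, "income": 48000, "score": 6.4},
    {"id": 3, "age": 29, "income": None, "score": 8.0},
    {"id": 4, "age": 41, "income": 61000, "score": None},
    {"id": 5, "age": 38, "income": 57000, "score": 7.7},
]

def listwise_delete(records, variables):
    """Keep only the cases with no missing value on any analysis variable."""
    return [r for r in records
            if all(r[v] is not None for v in variables)]

complete = listwise_delete(rows, ["age", "income", "score"])
print("cases kept:", [r["id"] for r in complete])
print("cases dropped:", len(rows) - len(complete))
```

Each dropped case here is missing only one value out of three, yet all of its observed data go with it.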

Although this approach can be really painful (you worked hard to collect those data, only to drop them!), it does work well in some situations.  By works well, I mean it fits 3 criteria:

- gives unbiased parameter estimates

- gives accurate (or at least conservative) standard error estimates

- results in adequate power.

But not always.  So over the years, a number of ad hoc approaches have been proposed to stop the bloodletting of so much data.  Although each solved some problems of listwise deletion, they created others.  All three have been discredited in recent years and should NOT be used.  They are:

Pairwise Deletion: use the available data for each part of an analysis.  This has been shown to result in correlations beyond the -1 to 1 range and other fun statistical impossibilities.

Mean Imputation: substitute the mean of the observed values for all missing data.  There are so many problems, it’s difficult to list them all, but suffice it to say, this technique never meets the above 3 criteria.

Dummy Variable: create a dummy variable that indicates whether a data point is missing, then substitute any arbitrary value for the missing data in the original variable.  Use both variables in the analysis.  While it does reduce the loss of power, it usually leads to biased results.
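One of the problems with mean imputation is easy to see in a small sketch. The values below are made up, and the example shows only the effect on the variance, not the full list of problems.

```python
from statistics import mean, variance

# Hypothetical variable: 5 observed values and 3 missing ones.
observed = [4.0, 7.0, 5.0, 9.0, 6.0]
n_missing = 3

# Mean imputation: every missing value is replaced by the observed mean.
filled = observed + [mean(observed)] * n_missing

print("mean before/after:", mean(observed), mean(filled))
print("variance before/after:", variance(observed), variance(filled))
```

The mean is unchanged, but the variance shrinks, because the imputed values add cases without adding any spread. Standard errors computed from the filled-in data are therefore too small.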

There are a number of good techniques for dealing with missing data, some of which are not hard to use, and which are now available in all major stat software.  There is no reason to continue to use ad hoc techniques that create more problems than they solve.

5 Ways to Increase Power in a Study

June 12th, 2009

To increase power:

  1. Increase alpha
  2. Conduct a one-tailed test
  3. Increase the effect size
  4. Decrease random error
  5. Increase sample size
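Each of the five items above can be sketched as a change to one input of a power calculation. This is a minimal sketch under simplifying assumptions: a one-sample z-test with a known standard deviation and a true effect in the hypothesized direction. The function name and all the numbers are made up for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05, tails=2):
    """Approximate power of a one-sample z-test for a true mean shift `effect` > 0."""
    z = NormalDist()
    standard_error = sd / n ** 0.5
    shift = effect / standard_error           # true effect in standard-error units
    critical = z.inv_cdf(1 - alpha / tails)   # rejection cutoff
    power = 1 - z.cdf(critical - shift)       # rejections in the upper tail
    if tails == 2:
        power += z.cdf(-critical - shift)     # rare rejections in the wrong tail
    return power

baseline = z_test_power(effect=0.5, sd=2.0, n=30)
scenarios = {
    "1. increase alpha":      z_test_power(0.5, 2.0, 30, alpha=0.10),
    "2. one-tailed test":     z_test_power(0.5, 2.0, 30, tails=1),
    "3. larger effect size":  z_test_power(1.0, 2.0, 30),
    "4. less random error":   z_test_power(0.5, 1.0, 30),
    "5. larger sample size":  z_test_power(0.5, 2.0, 120),
}

print("baseline power:", round(baseline, 3))
for label, power in scenarios.items():
    print(label, round(power, 3))
```
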

Sounds so simple, right?  The reality is that although these 5 ways all work Read the rest of this entry »