Problem with multiple comparisons. Can someone explain how to deal with this in the situation described?

  • My question is simultaneously philosophical and practical. It's practical because it stems from an analysis I need to do. I have about 16 data points divided between 3 different groups. Each data point has very many attributes I can look into. Let's say the combinations of attributes are about 100 ... Any statistical test that I do on the attributes will never reach significance as soon as I account for multiple comparisons. However, if I were to just pick a few attributes without knowledge of the results, I'd likely come up with a few statistically significant findings. So, essentially, I'm just penalizing myself and the chances of finding anything by comparing more and more things.

    This seems like a paradox to me. Either corrections for multiple comparisons are far too conservative in my case (which is likely, given that many of the attributes are probably strongly correlated or provide redundant information), or there's a philosophical problem here, even though I intuitively understand the rationale behind correcting for multiple comparisons (you're just more likely to find differences purely due to chance).

    The problem is this: imagine I were to randomly pick an attribute (or a few attributes) to measure and focus on a priori. I'd likely find a statistically significant difference in some of the attributes between groups. There's a chance that this difference occurred just by pure chance (and that is my p value). This is already measured and computed. Now, let's say I start to measure many additional attributes. How does measuring additional attributes actually change the truth? In other words, just because I measure more and more stuff, how does this change the probability of the first measurement being just due to chance? If in one parallel universe a person did an experiment where she measured quantity A, and in another universe she measured quantity A as well as B, C, D, E, F, etc., then, all else being equal, the chances of incorrectly rejecting the null hypothesis for A would come out differently in the two universes, and yet the physical reality of A, its measurement, and the population the data points are drawn from are identical in both universes.

    So here's a way I'm proposing to look at it, and I'm wondering if there's ever justification for looking at things this way. Can one not account for multiple comparisons and instead say that we're asking about each attribute separately (i.e. does the grouping factor affect this particular attribute?), as opposed to saying we're interested in finding any difference whatsoever among any/all of the attributes? These are two different questions: (1) "are the groups different with regard to attribute A, and, separately, are the groups different with regard to attribute B, and C, ... etc.?" versus (2) "are the groups different with regard to any of the attributes?" They're not the same question, so why should they be addressed the same way?

    Alternatively, is there a way to do a post hoc analysis on the attributes that seem promising and ignore the others? If not, in what way can I actually highlight some of the differences between groups? I'm finding the whole thing a little puzzling, because intuitively it seems clear from the data that there are some real differences, not just due to chance.

  • Answer:

    I think using permutation statistics can help you get an intuitive handle on multiple comparison problems like this. Let's say that, hypothetically, brain scans are a poor tool for diagnosing our three clinical groups (A, B, and C), and therefore none of our data points are going to help us predict someone's diagnosis. If this were true, then any differences we observed in our current data set arose just by chance. In fact, we would expect to see differences of the same size if we assigned our brain data to the three groups completely at random.

    Using permutation statistics we can test this assumption empirically. First, assign each subject to a random group. Then run all your same tests again and look for the largest effect size (t-value) that you get for any measure in any brain region. Given your large number of variables and brain regions, this t-value is likely going to be quite big (and it occurred purely by chance!). So the real question is: is your original data so outstanding that it convinces you that these subjects weren't assigned to the groups at random? Are the results so shockingly different from all the possible random assignments (permutations) that we have to reject the null hypothesis?

    Again, let's test this empirically. Shuffle your original subjects again, and again record the largest t-value. Now do it 1000 more times (automated, of course, using a free program like R), getting a distribution of your maximum "by chance" t-values, which you can then plot as a histogram.

    Now comes the fun part: compare this histogram to the maximum t-values in your real dataset. How do they stack up? If all your real t-values are sitting right in the middle of the distribution, then there's nothing convincing in this dataset - either the technique doesn't work, or you don't have enough data to answer such a complex question (do any of these three groups differ on any of these measures in any of these brain regions?). BUT, if any of your real t-values are waaaay off in the tails (a 5% criterion works just like your regular alpha level), then you have convincing evidence that these differences are real. In fact, any tests beyond the 5% cutoff can be safely regarded as significant.

    This technique might not give you the answer you're looking for, but it will give you an appreciation for what ridiculously significant-looking results (t=6.5!?) you can sometimes see when performing extensive multiple comparisons. It's always hard finding a needle in a haystack, and this might give you an appreciation for why.
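A minimal sketch of this max-statistic permutation procedure, in Python with NumPy. It uses a one-way F statistic across the three groups in place of the t-values mentioned above, and the data, group sizes, and attribute count are placeholders you would swap for your real measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 16 subjects, 100 attributes, 3 groups.
n_subjects, n_attributes = 16, 100
data = rng.normal(size=(n_subjects, n_attributes))
groups = np.array([0] * 6 + [1] * 5 + [2] * 5)

def max_f_statistic(values, labels):
    """Largest one-way ANOVA F statistic across all attributes."""
    overall_mean = values.mean(axis=0)
    ss_between = np.zeros(values.shape[1])
    ss_within = np.zeros(values.shape[1])
    for g in np.unique(labels):
        grp = values[labels == g]
        ss_between += len(grp) * (grp.mean(axis=0) - overall_mean) ** 2
        ss_within += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    k = len(np.unique(labels))
    f = (ss_between / (k - 1)) / (ss_within / (len(labels) - k))
    return f.max()

observed_max = max_f_statistic(data, groups)

# Build the null distribution of the *maximum* statistic by shuffling labels.
n_perm = 1000
null_max = np.array([max_f_statistic(data, rng.permutation(groups))
                     for _ in range(n_perm)])

# Family-wise corrected p-value for the best attribute: how often does a
# random labelling beat the observed maximum? Any attribute whose F exceeds
# the 95th percentile of null_max is significant at a family-wise 5% level.
p_corrected = (1 + (null_max >= observed_max).sum()) / (1 + n_perm)
print("observed max F:", observed_max)
print("95th percentile of null max F:", np.percentile(null_max, 95))
print("corrected p-value for the best attribute:", p_corrected)
```

Because the null distribution is built from the maximum statistic over all attributes, comparing each real statistic against its 95th percentile controls the family-wise error rate without assuming the attributes are independent.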

Trevor Brothers at Quora

Other answers

(It has been years since I last worked on statistical hypothesis testing, so I am not sure I remember everything correctly.)

In statistical tests, we usually control $\alpha$, the Type I error rate, which is the probability of incorrectly rejecting the null hypothesis (assume it is a simple hypothesis). There is another kind of error, called the Type II error, which occurs when we fail to reject a false null hypothesis. The Type II error rate is usually more difficult to track than the Type I error rate, so we usually control only the Type I error rate (and hope that the Type II error rate is not too high).

Let's say we perform an experiment containing three tests using attributes A, B, C, and we reject the null hypothesis when at least one of the tests rejects it. We want to keep the overall probability of Type I error at $\bar{\alpha} = 0.05$. The problem is that, if we keep the probability of Type I error of each of the three tests at $\alpha_A = \alpha_B = \alpha_C = 0.05$, the overall probability of Type I error may be greater than 0.05. The more tests we perform, the more likely we are to incorrectly reject the null hypothesis in one of those tests. Therefore we sometimes reduce the alphas $\alpha_A, \alpha_B, \alpha_C$ for the individual tests so that the overall alpha $\bar{\alpha}$ is kept (approximately) at the desired value. See http://en.wikipedia.org/wiki/Multiple_comparisons#What_can_be_done for some correction methods.

Note that the alphas have different meanings. $\alpha_A, \alpha_B, \alpha_C$ are the probabilities of Type I error for the individual tests, while $\bar{\alpha}$ is the probability of Type I error of the whole experiment. If we keep the probability of Type I error of each individual test constant, adding the tests using attributes B and C won't affect the probability that the test using A alone gives a Type I error (that is, $\alpha_A$). But it will increase the probability that at least one test gives a Type I error (that is, $\bar{\alpha}$).

It is inaccurate to say that "chances of incorrectly rejecting the null hypothesis for A would come out differently in both universes". The Type I error rate is usually something we control (or try to control). If we decide to keep $\alpha_A$ at the same value in both universes, then they are the same: adding more tests won't change $\alpha_A$; it will only change $\bar{\alpha}$ (which makes perfect sense, as the experiment has changed and contains more tests). If we decide instead to keep $\bar{\alpha}$ at the same value (or use some correction to keep $\bar{\alpha}$ approximately at the desired value), then $\alpha_A$ may differ between the two universes, but only because we decided to perform the test at different values of $\alpha_A$.

I am unsure about other forms of tests which do not control the Type I error rate. Things may be a bit different if we consider the http://en.wikipedia.org/wiki/P-value instead; my answer may not be useful in that case.
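To make this arithmetic concrete, here is a minimal sketch in Python. It assumes the tests are independent (the Bonferroni bound itself does not need that assumption) and shows how the overall $\bar{\alpha}$ grows when every test is run at $\alpha = 0.05$, and how shrinking the per-test alphas keeps it near 0.05:

```python
# Family-wise Type I error rate for m independent tests, each at level alpha:
# P(at least one false rejection) = 1 - (1 - alpha)^m
alpha = 0.05
for m in (1, 3, 10, 100):
    fwer_uncorrected = 1 - (1 - alpha) ** m
    # Bonferroni: run each test at alpha / m so the family-wise rate
    # stays at or below alpha (this bound holds even without independence).
    fwer_bonferroni = 1 - (1 - alpha / m) ** m
    print(f"m={m:3d}  uncorrected FWER={fwer_uncorrected:.3f}  "
          f"Bonferroni FWER={fwer_bonferroni:.3f}")
```

With 100 tests the uncorrected family-wise rate is already above 0.99, while the Bonferroni-adjusted version stays just under 0.05.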

Ivan Li

In most naturally occurring situations, it will be better to use multiple comparisons on all attributes than to select a subset of attributes completely at random. At the very least, you will usually not do much worse by testing all hypotheses. An example in which it would be better to randomly select attributes is if, for some reason, the non-null attributes only ever have "small" p-values but never "extremely small" p-values. But even this strange possibility can be ruled out if, for example, your data points have a non-negligible amount of measurement error. On the other hand, if you do have prior knowledge about which attributes are likely to be important, *then* you have an important choice to make. In many cases you will be better off only testing those pre-selected attributes.

It is easy to get confused if you view frequentist statistical procedures in terms of "truth" or "evidence." For example, rolling a twenty-sided die provides a universal hypothesis test at p=0.05, but you should not interpret rolling a 1 as "evidence" that the sun has exploded (relevant xkcd: http://xkcd.com/1132/). Instead, you should view frequentist statistical procedures as completely specified algorithms and then judge them by their average performance on random instances of the data. As a result, the interpretation of the same data may vary depending on which algorithm you choose. (This is indeed counterintuitive, and it is one justification for using a more intuitive Bayesian approach.)

Here are three different reasonable procedures you could have used for your thought experiment, all potentially producing different interpretations of your data (see the sketch after this list):

  • Procedure 1: Choose a random attribute and do a classical hypothesis test at level $\alpha$. Then stop.

  • Procedure 2: Choose a random attribute and do a classical hypothesis test at level $\alpha$. If it is significant, choose another random attribute and test it at level $\alpha$. Continue in this way until you accept the null hypothesis.

  • Procedure 3: Test all the hypotheses, adjusting for multiple comparisons.
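A rough sketch of the three procedures as explicit algorithms, assuming each attribute's p-value has already been computed; the function names and example p-values are illustrative only, and Procedure 3 uses a simple Bonferroni adjustment as one possible correction:

```python
import random

def procedure_1(p_values, alpha=0.05):
    """Test one randomly chosen attribute at level alpha, then stop."""
    i = random.randrange(len(p_values))
    return [i] if p_values[i] <= alpha else []

def procedure_2(p_values, alpha=0.05):
    """Keep testing randomly chosen attributes at level alpha until the
    first non-significant result (no correction along the way)."""
    order = random.sample(range(len(p_values)), len(p_values))
    rejected = []
    for i in order:
        if p_values[i] <= alpha:
            rejected.append(i)
        else:
            break
    return rejected

def procedure_3(p_values, alpha=0.05):
    """Test every attribute, with a Bonferroni adjustment for the number
    of comparisons."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

# Illustrative usage with made-up p-values for five attributes.
p_values = [0.0004, 0.03, 0.20, 0.01, 0.65]
print(procedure_1(p_values), procedure_2(p_values), procedure_3(p_values))
```

Each function returns the indices of the attributes it rejects, which makes it easy to compare how differently the same p-values get interpreted under the three procedures.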

Charles Yang Zheng

Sixteen data points across three groups, pre and post experiment? You don't have enough data for anything. I would get the theory straight in your head and simplify it to the key elements, then control as many of the extraneous variables as possible in an experimental setting, and run the subjects one at a time until you have sufficient data. Use Bayesian stats if you can manage it, but to be honest, if you have to look at the stats to find a significant difference, it isn't there.

Donald McMiken
