Why leave out that one error bar?

  • A statistics / scientific convention question. I've noticed in scientific journals that when a set of data is presented with values normalized to one of the sample groups (that group's value being arbitrarily set to 1, 10, 100 or whatever, to simplify interpretation), the variability/error data for that one sample group is often left out. Is there a good statistical reason for that, or is it just a convention with no good reason? Here's an example: you have data on the height of trees according to their age (say, trees that are 5, 10 and 20 years old). You calculate the mean height and standard deviation for each age group. For whatever reason, you want to present the data with the mean values for all three groups normalized to the 5-year-old group, whose value is set to 1. My question is: why would people not show the standard deviation (adjusted for the normalization) for the 5-year-old group along with those for the other two groups?

  • Answer:

    I had a long explanation, but I couldn't explain it very well anyway, so here's a shorter one: to account for different outside conditions when an experiment is repeated at a different time, it's often useful to always normalize to an internal control measured the same day as the rest of that data set. So on April 11 you measure something and normalize to the April 11 control, and on May 15 you repeat the experiment and normalize to the May 15 control. That way you rule out external influences that differ between the two days. (Maybe the air conditioning was on in May but not yet in April.) Since both data sets are normalized to their internal control, each has a 100% control sample, and any other variation is really due to whatever you're measuring. This doesn't quite fit the tree example, but basically: each set was individually scaled to the normalized value, and the error given is the one AFTER normalization (so it's 0 for the sample that everything is normalized to).

shoos at Ask.Metafilter.com
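The per-day normalization described in that answer can be sketched in a few lines of Python. The numbers and the `normalize_to_control` helper are made up for illustration; the point is only that each day's measurements are divided by that same day's internal control, so the control always comes out as exactly 1 with no spread of its own.

```python
# Sketch of per-day normalization to an internal control (made-up numbers).

def normalize_to_control(measurements, control):
    """Divide every measurement by the same-day control value."""
    return [m / control for m in measurements]

april = {"control": 4.0, "samples": [4.0, 6.0, 8.0]}
may   = {"control": 5.0, "samples": [5.0, 7.0, 11.0]}

april_norm = normalize_to_control(april["samples"], april["control"])
may_norm   = normalize_to_control(may["samples"], may["control"])

print(april_norm)  # [1.0, 1.5, 2.0] -- control is exactly 1.0 by construction
print(may_norm)    # [1.0, 1.4, 2.2]
```

Because each control is divided by itself, its normalized value is 1.0 by definition on both days, which is exactly why no error bar appears on it.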

Other answers

Easternblot, I understand what you mean, but that's a somewhat different question. The normalization you're describing is normalization to a relatively constant standard (say, GAPDH signals on a northern blot :)). I'm talking about normalization to one of the experimental groups for the purpose of simplifying data interpretation, without showing the error/variation for the group to which the other groups are normalized. I don't know if you have access, but figures 1B, 1C and 2C in http://mcb.asm.org/cgi/content/full/26/1/362, published a few weeks ago, show examples of what I'm talking about. (But, strangely, figure 5C does show the error for the normalizing group.)

shoos

My guess is that when you do the normalization, you set the standard deviation for the normal group to zero (i.e. the result for that group is defined as having a value of exactly 1, 10, or 100). The error in measuring the normal group remains in the data, however, as it is propagated through to the other normalized values according to the rules for combining uncertainties under multiplication and division (http://www.rit.edu/~uphysics/uncertainties/Uncertaintiespart2.html#muldiv). Intuitively, this seems valid. For whatever my intuition is worth.... As for figure 5C in that paper you've linked to, I have no idea.

mr_roboto
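The division rule linked above can be sketched directly: for a quotient a/b, the relative uncertainties add in quadrature. The numbers below are made up (mean heights and standard deviations in the spirit of the tree example), and `divide_with_uncertainty` is a hypothetical helper name.

```python
from math import sqrt

def divide_with_uncertainty(a, ua, b, ub):
    """Return (a/b, uncertainty of a/b) via the standard division rule:
    relative uncertainties add in quadrature."""
    ratio = a / b
    u = abs(ratio) * sqrt((ua / a) ** 2 + (ub / b) ** 2)
    return ratio, u

# Normalize the 20-year-old group's mean (30 +/- 3) to the
# 5-year-old group's mean (10 +/- 1):
ratio, u = divide_with_uncertainty(30.0, 3.0, 10.0, 1.0)
print(ratio, u)  # 3.0, ~0.424 -- the control's error survives normalization
```

Note that the control's uncertainty (the 1.0 on the 5-year-old mean) shows up inside the square root, which is the sense in which the error "remains in the data" even though the control itself is plotted as exactly 1.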

shoos, that link needs a sign-in. It's an unusual way to do it, but a valid way would be to express all the uncertainty in the normalized data. For example, consider a data set a ± u_a, b ± u_b, c ± u_c. If one normalizes on, say, b, one could then plot a/b ± (a/b)·√(u_a² + u_b²), b (with no error bar), and c/b ± (c/b)·√(u_b² + u_c²). I can't imagine why someone would think that's a desirable way of doing things (unless b is an internal control), but it's mathematically correct. Is this what your authors are doing?

bonehead

d'oh! That middle term is of course b/b (with no error bar). ....and mr_roboto beats me to it anyway. JINX!

bonehead

I agree with bonehead; I think it's clearer to just divide everything by an exact number equal to the mean of the normal group, and leave all the error bars on.

mr_roboto
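That suggestion amounts to treating the control group's mean as an exact constant and rescaling every group (control included) by it, so every error bar survives, just rescaled. A minimal sketch, with made-up tree-height numbers:

```python
# Divide all groups by the control's mean, treated as an EXACT number,
# so every group (including the control) keeps a rescaled error bar.

groups = {          # group: (mean height, standard deviation) -- made up
    "5yr":  (10.0, 1.0),
    "10yr": (18.0, 2.0),
    "20yr": (30.0, 3.0),
}

norm_factor = groups["5yr"][0]   # exact constant: the control group's mean

normalized = {name: (m / norm_factor, sd / norm_factor)
              for name, (m, sd) in groups.items()}

print(normalized["5yr"])   # (1.0, 0.1): the control keeps its error bar
print(normalized["20yr"])  # (3.0, 0.3)
```

This is the approach behind the question's "standard deviation (adjusted for the normalization)": no propagation is needed because an exact constant carries no uncertainty of its own.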

Ok, here's an article (PDF) that should be accessible to anyone, in which they do the same thing in figures 4, 5 and 6: http://www.pubmedcentral.gov/picrender.fcgi?artid=16893&blobtype=pdf. All that is said about the error bars is that they represent standard deviations. Since I've never even heard of the method bonehead describes being used in biology research (the field I'm in), and haven't seen it suggested anywhere in the papers I've seen that do this sort of normalization, I'd doubt that that's what they are doing, although I may just be out of it.

shoos

(And I see I managed to get the math wrong anyway: the uncertainties inside the square roots should be relative uncertainties (u_a/a, etc.), not absolute ones.)

bonehead
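That correction matters numerically. A quick check with made-up values (a = 10 ± 1 normalized to b = 5 ± 0.5) shows how far off the absolute-uncertainty version is:

```python
from math import sqrt

# Comparing bonehead's original (absolute uncertainties inside the root)
# against the corrected version (relative uncertainties). Made-up numbers.
a, ua = 10.0, 1.0
b, ub = 5.0, 0.5

wrong = (a / b) * sqrt(ua ** 2 + ub ** 2)              # absolute: incorrect
right = (a / b) * sqrt((ua / a) ** 2 + (ub / b) ** 2)  # relative: correct

print(wrong)  # ~2.236 -- larger than the ratio itself, clearly off
print(right)  # ~0.283 -- about 14% relative uncertainty on the ratio 2.0
```

Both inputs have 10% relative uncertainty, so the ratio's relative uncertainty should be √(0.1² + 0.1²) ≈ 14%, which is what the corrected form gives.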

The "it's clearer" hypothesis seems to be carrying the day, but experimentally I think the general idea is this: when you normalize, you're not making any comparisons between your experimental data and the "internal control" group you divided by. Instead, you're comparing two different groups, each normalized on its own to an analogous baseline. So the idea is that you don't need the variance for the normalization group, since you'll never run statistics on it; it just muddies the water, and you can leave it out. Of course, the rest of us have to believe that there are good reasons to choose a particular normalization factor. I've certainly seen papers that made no sense because the normalization was inappropriate, but usually the pre-normalization data has to be shown before one can get away with it. (For example: you want to compare the growth rate of 20-year-old trees between North and South America. To control for variation in tree type and whatnot, you normalize by the growth rate of 5-year-old trees. In this scenario, you're not comparing anything to the population of 5-year-old trees, so its variance is meaningless.) Does that jibe?

metaculpa

shoos (http://ask.metafilter.com/mefi/31427#492747): "Since I've never even heard of the method bonehead describes being used in biology research (the field I'm in), and haven't seen it suggested anywhere in the papers I've seen that do this sort of normalization, I'd doubt that that's what they are doing, although I may just be out of it."

The method bonehead describes is run-of-the-mill error propagation (give or take a couple of typos); I wouldn't expect them to describe something so mundane.

metaculpa (http://ask.metafilter.com/mefi/31427#492753): "In this scenario, you're not comparing anything to the population of 5 year old trees, so its variance is meaningless."

Hold on, though: if you're measuring the growth rate of the 5-year-old trees, you need to propagate the error on that measurement through to the normalized growth rates for the 20-year-old trees, right? So the variance on that measurement does matter, in that it will increase the variance of your reported data.

mr_roboto
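That last point is easy to see by simulation. The sketch below (made-up distribution parameters, in the spirit of the tree example) draws many experiments and compares the spread of the normalized values when the control is measured perfectly versus noisily. This is an illustration, not anything from the linked papers:

```python
import random
from math import sqrt

# Simulation: even if you never compare anything to the control group,
# noise in the control measurement widens the spread of normalized values.

random.seed(0)
N = 100_000
TRUE_A, SD_A = 30.0, 3.0   # e.g. 20-year-old trees (made up)
TRUE_B, SD_B = 10.0, 1.0   # e.g. 5-year-old control (made up)

def spread_of_ratio(sd_b):
    """Population std. dev. of a/b over N simulated experiments."""
    ratios = [random.gauss(TRUE_A, SD_A) / random.gauss(TRUE_B, sd_b)
              for _ in range(N)]
    mean = sum(ratios) / N
    return sqrt(sum((r - mean) ** 2 for r in ratios) / N)

print(spread_of_ratio(0.0))   # control measured perfectly: spread ~0.30
print(spread_of_ratio(SD_B))  # noisy control: visibly wider spread
```

With a perfect control the spread is just SD_A scaled by the control mean (3/10 = 0.3); with a noisy control it grows toward the propagated value (~0.42 here), which is exactly the control variance showing up in the reported data.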
