statisticians, I need your help
-
I need statistics help. I badly need statistics help. It involves three groups, several tests that don't test exactly the same thing, and missing data. This is for analysis of data collected over a year ago in Brazil. It's the data I have and I'm stuck with it, so I hope that I can make valid analyses of it.

I've been studying the book Understanding Research in Second Language Learning by James Dean Brown for nearly two months now, along with other statistics texts. I've been combing the internet for information, and I've emailed tutors near where I live. But I still feel lost. Any advice, guidance or pointers will be welcomed.

I'm studying Brazilians' perception of English word-final /l/, in words like "pool," "pile," "goal." For my tests, there were three groups:
- 20 Advanced Brazilian students of English
- 20 Intermediate Brazilian students of English
- 9 Native speakers of English

Each group took the same 6 tests of word perception. The tests were to evaluate perception of the sound /l/ at the end of English words (pool, pile, pal, etc.) following several different vowels in one-syllable words.
- test 1: 56 4-choice questions; vowels ai, au, e, ei, i, o, u
- test 2: 32 4-choice questions; vowels ai, au, o, u
- test 3: 28 2-choice questions; vowels ai, au, e, ei, i, o, u
- test 4: 20 2-choice questions; vowels ai, au, o, u
- test 5: 16 2-choice questions; vowels ai, au, e, ei, i, o, u
- test 6: 20 2-choice questions; vowels a, ai, o, u

There are not equal numbers of examples for each vowel:
- a: 4
- ai: 30
- au: 26
- e: 16
- ei: 14
- i: 16
- o: 32
- u: 30

Missing data: in many cases, participants did not answer every item. I have dealt with this through pairwise deletion, as I don't feel I have enough participants to substitute means, and entirely removing anyone who didn't respond to a question would cut my sample in half! Is pairwise deletion the best way to deal with the missing data, given my small sample?

For each participant, I have an overall error percentage and an error percentage by vowel. With missing values, I can't use total error numbers, so I have to calculate percentages. These could be converted to scores or ranks if a test needed that. Is there anything wrong with comparing percentages? Should I weight the percentages based on number of responses?

Some tests of some participants had very high non-response rates, but most tests had very low non-response rates. There was no middle ground: either very high non-response, or very low. As an "objective" criterion, if the participant didn't respond to 25% of the items or more in a particular test, I removed that test from consideration. I could choose any percentage down to 7.5% and the same tests would be excluded. Is this a valid way to remove "bad" results?

I'm concerned that I don't have equal numbers of items for each vowel, but I can't be more specific. What, if anything, is wrong with having unequal numbers?

The overall error rates are roughly normally distributed. I haven't checked on the rates per vowel and I know I need to do that. I plan to compare error rates among the groups and have already run some ANOVA tests. I also have a lot of personal data about the participants (things like years of study, age, etc.) and I plan to test correlations.

Is there anything I'm obviously missing or not thinking about?
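To make the scoring rules above concrete, here is a minimal Python sketch of the 25% exclusion cutoff and of the weighted versus unweighted percentage question. The data layout (True = correct, False = error, None = no response), the function names and the example responses are all assumptions for illustration, not the actual study data.

```python
# Hypothetical layout: one list per test, one entry per item.
# True = correct, False = error, None = no response.

def non_response_rate(responses):
    """Fraction of items in a test that were left unanswered."""
    return sum(r is None for r in responses) / len(responses)

def error_percentage(responses):
    """Error percentage over answered items only (unanswered items skipped)."""
    answered = [r for r in responses if r is not None]
    if not answered:
        return None
    return 100.0 * sum(r is False for r in answered) / len(answered)

def participant_summary(tests, cutoff=0.25):
    """Drop tests with non-response >= cutoff, then score the rest two ways."""
    kept = [t for t in tests if non_response_rate(t) < cutoff]
    if not kept:
        return None, None
    # Unweighted: every kept test counts equally, whatever its length.
    per_test = [error_percentage(t) for t in kept]
    unweighted = sum(per_test) / len(per_test)
    # Weighted: pool all answered items, so longer tests count for more.
    answered = [r for t in kept for r in t if r is not None]
    weighted = 100.0 * sum(r is False for r in answered) / len(answered)
    return unweighted, weighted

# Made-up responses for one participant (three short tests).
example_tests = [
    [True, False, True, True],                           # all items answered
    [False, False, None, True, True, True, True, True],  # 1 of 8 unanswered
    [None, None, None, True],                            # 3 of 4 unanswered: excluded
]

unweighted, weighted = participant_summary(example_tests)
print(f"unweighted: {unweighted:.2f}%  weighted: {weighted:.2f}%")
```

The two figures differ whenever the kept tests have different numbers of answered items, which is the practical content of the weighting question.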
-
Answer:
Might you be a scientist/grad student? I've known of several doctoral candidates who have turned to the fine folks at http://www.minimaxconsulting.com/?gclid=CO7buLeTsJACFQJFgQodcku-Hw for assistance. The people at http://www.stat-help.com/ do it for free, and there are several sites like this that I use when I get lost (sorry, I can't provide more specific information; I'm not great with statistics either). Also, my adviser has told me several times to spend a lot of time with the statisticians at happy hour. Being their best friend and offering to put their names on your paper are often the only ways to convince them to help with complicated projects not of their own devising (though, of course, cash can be a good bribe). Best of luck!
dmo at Ask.Metafilter.Com
Other answers
Also, immense amounts of gratitude to anyone who sifts through all this!
dmo
These thoughts are more 'huh' than 'the guru speaks,' so... The missing data is data; you can correlate 'no response' with your demographics. If you've got lots of demographics, it's always fun to whack it with a big factor analysis or cluster analysis hammer to see what squirts out the other side. More specifically, you could treat the vowels as a continuum in vocal space. You needn't get deep into linguistics to do this; forward (extreme = 'oo') vs rear ('aah') placement could suffice. Tongue position is a second dimension. Check against diphthongs to see if vowel complexity is serving as noise.
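As a rough illustration of correlating 'no response' with a demographic variable, here is a small Python sketch that computes a Pearson correlation by hand. The variable names and all of the numbers are invented for the example. With only a few dozen participants, any such correlation should be read cautiously.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented values: one entry per participant.
years_of_study = [2, 4, 5, 7, 10]
non_response = [0.30, 0.10, 0.12, 0.05, 0.02]  # fraction of items skipped

r = pearson_r(years_of_study, non_response)
print(f"r = {r:.3f}")
```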
dragonsi55
I can't really make inroads into your overall stats problem, but I did find one questionable practice: "Some tests of some participants had very high non-response rates, but most tests had very low non-response rates. There was no middle ground - either very high non-response, or very low." Is it possible that non-response is really a response on those tests? If I don't know or didn't hear the /l/ sound, would I get a no-response score? That would be a serious problem in choosing to drop people on that basis. Also, why can't you collapse all of the tests into one test? If you give each subject a percent-correct score, it wouldn't matter if they didn't take every test. The most important thing is to be clear about what you want to know. If you go blindly running a factor analysis and (a) don't know what the results mean and/or (b) don't know what you want to know, you're highly likely to do something stupid. Stick with the light stuff like ANOVA; it sounds good enough based on my understanding of your question (i.e., do they make more errors?).
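For reference, the "light stuff" can be sketched in a few lines of Python: a one-way ANOVA F statistic on per-participant error percentages, one list per group. The error rates below are made up and the groups are deliberately unequal in size, as in the study. This is only the textbook between/within sums-of-squares calculation; a statistics package would also report the p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Invented overall error percentages, one value per participant.
advanced = [22.0, 30.5, 18.0, 27.5, 25.0]
intermediate = [35.0, 41.5, 29.0, 38.0, 44.0]
native = [4.0, 7.5, 2.0]

f_stat, df_b, df_w = one_way_anova_f([advanced, intermediate, native])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```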
parkerjackson
Can you share your raw data for us to play with? (You may want to clear that with your IRB or at least your advisor.)
laminarial
This sounds straightforward enough. ANOVAs are equivalent to a regression, except regressions are more flexible and easier to set up. You also have the whole familiar suite of regression analysis statistics to work with. It sounds like the basic model is

y = A + B_type*X_dummy_type + C_group*X_dummy_group

where y is the percent correctly answering that question. A is the baseline probability of getting a question right (the intercept, i.e. the value for the reference group and question type). The X's are dummy-coded {0,1} variables for each person and question, representing what group they're in and what kind of question it is.

Probably better is to go to a logistic regression where the outcome is y = {0,1} based on whether they got that question right. Then,

logit(P(y=1)) = A + B_type*X_dummy_type + C_group*X_dummy_group + D*demographic_var (+ E*test_type?)

You could add a term for each question if you don't think that they are equivalent in difficulty aside from the vowel, but then you're comparing coefficients across groups of questions. You can also stratify by language group, or check for interaction terms. It seems to me like "didn't answer" is the same as "didn't know" for the purpose of your study.
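Here is a minimal Python sketch of the logistic model with group dummies only, fitted by plain gradient ascent so it needs no external library. The counts are invented, native speakers are taken as the reference group, and the fitting routine is a teaching stand-in rather than what a statistics package does internally. Vowel-type or test-type dummies would be added as extra columns in the same way.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=5000):
    """Fit logit(P(y=1)) = w . x by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(p):
                grad[j] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Invented item-level data: (group, number correct, number wrong).
counts = [("native", 8, 2), ("advanced", 7, 3), ("intermediate", 5, 5)]

X, y = [], []
for group, n_correct, n_wrong in counts:
    # Columns: intercept, X_dummy_advanced, X_dummy_intermediate.
    row = [1.0, float(group == "advanced"), float(group == "intermediate")]
    X += [row] * (n_correct + n_wrong)
    y += [1] * n_correct + [0] * n_wrong

A, C_advanced, C_intermediate = fit_logistic(X, y)
print(f"A = {A:.3f}, C_advanced = {C_advanced:.3f}, C_intermediate = {C_intermediate:.3f}")
```

In this sketch A is the log-odds of a correct answer for native speakers, and each C coefficient is the difference in log-odds between that learner group and the native speakers. A negative coefficient means lower odds of answering correctly.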
a robot made out of meat