What statistical analysis method can I use to find the relationship between a disease (single variable) and environmental factors (multiple variables)?
-
I want to find the relationship between disease incidence in a region and a number of environmental factors such as temperature, elevation, etc. I have tables containing this data for a particular country. The data files are raster files of the country map, with each pixel holding a numeric value for the given parameter: for example, [126, 540, 359...] for disease cases or [23.34, 19.02...] for temperature.

I have read about multiple regression, Pearson's chi-square test, t-tests, etc., but I didn't find them suitable. Multiple regression, because from what I read it assumes a linear relationship between the variables. The chi-square test applies to categorical data. t-tests are for small datasets. I've also read about EOF (Empirical Orthogonal Function) analysis, but it requires time series data, and about CCA (Canonical Correlation Analysis), but it finds the relationship between two sets of multiple variables.

So, which statistical method can I use to find the relationship between these variables? I want a method that gives a relationship such as "the higher the elevation, the higher the disease incidence", without assuming anything beforehand like "there is a linear relationship between the variables, so I'll use a linear model". Also, is one of the methods I mentioned actually appropriate, and my understanding of it wrong? I don't necessarily need a one-to-many relationship; one-to-one methods would be fine.

On a side note, I'm working with GIS data and would use Python or R modules to find and plot the relationship. Thanks.
-
Answer:
[Disclaimer: I'm a "punk statistician", not a real one :-P]

You're saying that there may be an arbitrarily complex relationship between your variables and the target, and you want to model this relationship... that's generally a really difficult problem. Typically the way one goes about this is to assume the relationship follows some parametric model: e.g., linear, polynomial, exponential. Then you can try to fit the model to the data and see how good the fit is. In the simplest case you'd test for a linear relationship and look at the correlation or R² score, as you mentioned. As you pick more and more complex models, you need to start worrying about the bias-variance dilemma [i.e., your model might get a perfect fit, but it'll be fitting noise in the training data; see http://en.wikipedia.org/wiki/Bias%E2%80%93variance_dilemma].

The obvious question is: how do I pick a model to fit? No easy answer here, unfortunately. One way to do this is to just look at the data and get a feel for what the possible relationships may look like. Then pick the SIMPLEST model that you feel may capture the relationship (see again bias-variance). The ultimate test is to see how well your model can predict the target for new (or held-out) data points.

Another option is to fit a non-parametric model. These models are great if your ultimate goal is to make predictions for novel data points (e.g., what will the disease incidence be if temperature rises to 45 and elevation is X?). You could try random forest regression [see http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor] or radial basis functions [see http://en.wikipedia.org/wiki/Radial_basis_function]. The main disadvantage of these types of models is that while they may be able to make good predictions, they are not as interpretable. When relationships are really complex, though, this is your best bet.
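A minimal sketch of the held-out comparison, assuming the rasters have already been flattened so that each pixel is one row. The data below is synthetic and the variable names (temperature, elevation, incidence) and the shape of the relationship are made up purely for illustration; only the scikit-learn calls are real. It fits a linear model and a random forest on the same training pixels and scores both on pixels neither model has seen.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for flattened raster pixels (hypothetical, not real data).
rng = np.random.default_rng(0)
n_pixels = 2000
temperature = rng.uniform(10.0, 35.0, n_pixels)  # degrees C
elevation = rng.uniform(0.0, 3000.0, n_pixels)   # metres

# Assumed non-linear relationship: incidence peaks near 25 degrees
# and decays with elevation, plus Gaussian noise.
incidence = (
    500.0
    * np.exp(-((temperature - 25.0) ** 2) / 20.0)
    * np.exp(-elevation / 1500.0)
    + rng.normal(0.0, 10.0, n_pixels)
)

X = np.column_stack([temperature, elevation])
X_train, X_test, y_train, y_test = train_test_split(
    X, incidence, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Fit each model on the training pixels and score it on the held-out pixels.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: held-out R^2 = {scores[name]:.3f}")
```

One caveat for raster data: neighbouring pixels tend to be similar, so a purely random split can make held-out scores look better than they would on a genuinely new region. Holding out whole spatial blocks is a common alternative.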
Boris Babenko at Quora
Other answers
I'd say: plot it. It is the best way to understand how the data behaves. Then you can try some functions that seem to describe the plotted data best. After that, apply the least squares method to each function, and use the function with the lowest sum of squared residuals. And to be honest, there isn't really a true mathematical way to derive a function from a plot; after all, this is statistics, not mathematics.
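The "fit each candidate by least squares and compare" step could look roughly like the sketch below. The data is synthetic, and the three candidate families (linear, quadratic, logarithmic) are arbitrary picks for illustration; in practice you would choose them after looking at the plot. Each candidate is linear in its coefficients, so numpy's ordinary least squares solver is enough.

```python
import numpy as np

# Synthetic example (hypothetical): incidence grows roughly quadratically
# with temperature, plus noise.
rng = np.random.default_rng(1)
temperature = np.linspace(10.0, 35.0, 60)
incidence = 2.0 * (temperature - 10.0) ** 2 + rng.normal(0.0, 20.0, temperature.size)

# Candidate function families, expressed as design matrices.
candidates = {
    "linear": lambda x: np.column_stack([np.ones_like(x), x]),
    "quadratic": lambda x: np.column_stack([np.ones_like(x), x, x ** 2]),
    "logarithmic": lambda x: np.column_stack([np.ones_like(x), np.log(x)]),
}

# Fit each candidate by least squares and record its sum of squared residuals.
sse = {}
for name, design in candidates.items():
    A = design(temperature)
    coeffs, *_ = np.linalg.lstsq(A, incidence, rcond=None)
    residuals = incidence - A @ coeffs
    sse[name] = float(np.sum(residuals ** 2))
    print(f"{name}: sum of squared residuals = {sse[name]:.1f}")

best = min(sse, key=sse.get)
print("lowest squared error:", best)
```

Keep in mind that a function with more free coefficients can never do worse on the data it was fitted to, so comparing squared error on held-out points is a fairer way to choose between candidates of different complexity.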
Gerwin Dox