What are good ways to fit a linear regression when there is incomplete information about the output variable?
-
I need to do a linear regression y ~ x1 + x2 + x3 + x4, but y is not known. Instead of y we have f(y), which depends on y. For example, y is the probability (between 0 and 1) of a binomial distribution over {0, 1}, and instead of y we have the number of 0s and the number of 1s out of (number of 0s + number of 1s) experiments. How should I perform the linear regression to find the correct y? And how should I take into account the amount of information, given that for some values of (x1, x2, x3) we have n experiments, which give a high-confidence value of y, while for other values we have only a low-confidence value of y because of a small number of measurements?
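A minimal way to write this setup down, assuming the experiments at each feature vector are independent and share one success probability, and assuming a logistic link (the model recommended in the answer): let i index the distinct feature vectors, k_i be the number of 1s and n_i the total number of experiments observed at x_i. The coefficient names beta_0 ... beta_4 are introduced here for illustration.

$$k_i \sim \mathrm{Binomial}(n_i, y_i), \qquad y_i = \sigma(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$

$$\log L(\beta) = \sum_i \bigl[\, k_i \log y_i + (n_i - k_i) \log(1 - y_i) \,\bigr] + \text{const}$$

$$\mathcal{I}(\beta) = \sum_i n_i \, y_i (1 - y_i) \, \tilde{x}_i \tilde{x}_i^{\top}, \qquad \tilde{x}_i = (1, x_{i1}, x_{i2}, x_{i3}, x_{i4})^{\top}$$

Under these assumptions the n_i appear as counts in the likelihood and as weights in the Fisher information, so a feature vector with many experiments constrains the fit more than one with few, and approximate standard errors for the estimated coefficients come from the diagonal of the inverse Fisher information evaluated at the estimate.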
-
Answer:
I'm trying to understand your question, and as far as I can see you are faced with a typical binary classification problem. If so, you should probably use some form of logistic regression, or a large-margin method, to find your model. But let me rephrase your question point by point so that you can tell me if I have misunderstood:

1) You have a feature vector x = (x1, x2, x3, x4, ...) describing each experiment, and you observe whether an experiment was successful (outcome = 1) or not (outcome = 0). When there are multiple experiments with the same feature vector, you observe how many of them were successes and how many were failures. The two descriptions are really the same, assuming the experiments are exchangeable, that is, the temporal order of the experiments carries no information.

2) The probability of success (which you refer to as y) is a function of the feature vector x. This is the function we'd like to uncover by regression.

3) For some feature vectors you observe multiple experiments, but the number of experiments that share a particular feature vector varies.

If my understanding is correct, you are faced with a binary classification problem and you should look into logistic regression or support vector machines. The varying number of experiments needs no special treatment in logistic regression: every individual experiment contributes its own term to the likelihood, so a feature vector with many experiments automatically carries more weight than one with only a few. Most general-purpose statistics packages implement logistic regression. If you want a very flexible, non-linear regression model that also gives you nice confidence bands, I can recommend Gaussian process classification (there's MATLAB code here: http://www.gaussianprocess.org/gpml/code/matlab/doc/), which gives you an uncertainty estimate as well as a regressor.
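A minimal sketch of the logistic-regression route for aggregated counts, written in Python with NumPy (the language and library are assumptions; the answer names neither). It fits P(outcome = 1 | x) = sigmoid(b0 + b . x) by Newton's method (equivalently, iteratively reweighted least squares), taking the number of 1s and the number of experiments per feature vector directly, so feature vectors with more experiments get more weight without any manual weighting. The function `fit_binomial_logistic` and its parameters are hypothetical names for illustration.

```python
import numpy as np


def fit_binomial_logistic(X, successes, trials, n_iter=50, tol=1e-10):
    """Fit P(success | x) = sigmoid(b0 + x @ b) to aggregated counts.

    X:         (m, p) array, one row per distinct feature vector
    successes: (m,) number of outcomes equal to 1 at each feature vector
    trials:    (m,) total number of experiments at each feature vector
    Returns (coefficients, standard_errors), intercept first.
    """
    X = np.asarray(X, dtype=float)
    k = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(A.shape[1])

    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))   # current success probabilities
        w = n * p * (1.0 - p)                 # binomial weights: more trials, more weight
        grad = A.T @ (k - n * p)              # gradient of the binomial log-likelihood
        info = A.T @ (A * w[:, None])         # Fisher information X^T W X
        step = np.linalg.solve(info, grad)    # Newton / IRLS update
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Standard errors from the inverse Fisher information at the estimate.
    p = 1.0 / (1.0 + np.exp(-A @ beta))
    info = A.T @ (A * (n * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


# Example: one feature, two distinct feature vectors, the same success rates
# (1/4 and 3/4) observed with 4 experiments each and with 400 experiments each.
beta_small, se_small = fit_binomial_logistic([[0.0], [1.0]], [1, 3], [4, 4])
beta_large, se_large = fit_binomial_logistic([[0.0], [1.0]], [100, 300], [400, 400])
# Both fits give the same coefficients, logit(1/4) and logit(3/4) - logit(1/4),
# but the standard errors of the 400-experiment fit are 10 times smaller.
```

The example shows the point about confidence: identical proportions give identical estimates, while more experiments shrink the uncertainty. In practice an existing implementation is preferable, for example R's `glm(cbind(successes, failures) ~ x1 + x2 + x3 + x4, family = binomial)`. Note that if the successes and failures are perfectly separable in x, the maximum-likelihood estimate does not exist and the coefficients grow without bound; a regularized or Bayesian fit such as Gaussian process classification avoids this.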
Ferenc Huszár at Quora