
What are good ways to fit a linear regression when information about the output variable is incomplete?

  • I need to fit a linear regression y <- x1 + x2 + x3 + x4, but y itself is not known. Instead of y we have f(y), which depends on y. For example, y is the success probability (between 0 and 1) of a binomial distribution over the outcomes 0 and 1, and instead of y we observe (the number of 0s, the number of 1s) out of (the number of 0s + the number of 1s) experiments. How should I perform the regression to recover the correct y? And how should I take into account the amount of information available, given that for some x1, x2, x3 we have n experiments, which give a high-confidence value of y, while for other x1, x2, x3 the value of y has low confidence because of the small number of measurements?

  • Answer:

    I'm trying to understand your question, and as far as I can see you are faced with a typical binary classification problem. If so, you should probably use some form of logistic regression, or a large-margin method, to find your model. But let me rephrase your question point by point so that you can tell me if I misunderstood:

    1) You have some feature vector x = (x1, x2, x3, x4, ...) describing your experiment, and you observe whether an experiment was successful (outcome = 1) or not (outcome = 0). When there are multiple experiments with the same feature vector, you observe how many of them were successes and how many were failures. The two are really the same, assuming that the experiments are exchangeable, that is, there is no information in the temporal order of the experiments.

    2) The probability of success (which you refer to as y) is a function of the feature vector x. This is the function we'd like to uncover by regression.

    3) For some feature vectors you may observe multiple experiments, but the number of experiments that share a particular feature vector varies.

    If my understanding is correct, you are faced with a binary classification problem and you should look into logistic regression or support vector machines. Practically every general-purpose statistics package should have logistic regression implemented. Because every individual experiment contributes its own term to the likelihood, a feature vector with many experiments automatically carries more weight in the fit than one with only a few.

    If you want a very flexible, advanced, non-linear regression model that also gives you a nice confidence band on your regression, I can recommend Gaussian process classification (there's MATLAB code here: http://www.gaussianprocess.org/gpml/code/matlab/doc/), which will give you an uncertainty estimate as well as a regressor.
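
    To make the logistic-regression suggestion concrete, here is a minimal sketch in plain Python (not part of the original answer). It assumes the data have been aggregated into one row per distinct feature vector with a success count k_i out of n_i trials, and it maximises the binomial log-likelihood sum_i [k_i log p_i + (n_i - k_i) log(1 - p_i)], with p_i = sigmoid(w . x_i), by Newton's method. The function names and the toy numbers are made up for illustration; in practice a statistics package would do this for you.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_binomial_logistic(X, successes, trials, iterations=25):
    """Fit logistic regression to grouped (successes, trials) data.

    X holds one feature row per group (an intercept is added here).
    Returns the weight list [intercept, w1, w2, ...].
    """
    rows = [[1.0] + list(x) for x in X]
    d = len(rows[0])
    w = [0.0] * d
    for _ in range(iterations):
        grad = [0.0] * d                      # gradient of the log-likelihood
        hess = [[0.0] * d for _ in range(d)]  # negative Hessian, X^T W X
        for x, k, n in zip(rows, successes, trials):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for a in range(d):
                grad[a] += (k - n * p) * x[a]
                for b in range(d):
                    # n scales each group's contribution: more trials, more weight
                    hess[a][b] += n * p * (1.0 - p) * x[a] * x[b]
        step = solve(hess, grad)
        w = [wj + sj for wj, sj in zip(w, step)]
    return w


def predict(w, x):
    """Estimated success probability y for one feature row x."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical toy data: two features, very different numbers of trials.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
successes = [1, 40, 3, 150, 9]
trials = [5, 100, 4, 200, 10]

w = fit_binomial_logistic(X, successes, trials)
for x, k, n in zip(X, successes, trials):
    print(x, "observed", k / n, "fitted", round(predict(w, x), 3))
```

    Note how the trial count n_i multiplies each group's contribution to both the gradient and the curvature, so groups with many measurements pull the fit harder than groups with few. That is how the "amount of information" in the question is taken into account, without any hand-tuned weights.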

Ferenc Huszár at Quora
