Can a mathematically sound prediction interval have a negative lower bound?
-
I have used R to form a 95% prediction interval for the number of endemic species on an island. My lower bound is negative. Is that mathematically sound?

In the linear model used for the prediction interval, the variables are:
- Area: surface area of the island, hectares
- DistSC: distance from Santa Cruz, kilometres
- Elevation: elevation of the highest point, metres

and the model is coded as:

    selected.model <- lm(ES ~ Area + Elevation + DistSC + I(Elevation^2) + Elevation:DistSC + Area:Elevation)

Stepwise regression was performed to find this "best" model.

I'm not exactly sure how a prediction interval works; I just want to make sure it is OK. Obviously a negative number of species is impossible, but I know the interval takes into account the uncertainty of the mean as well as the scatter of the data.
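A minimal sketch of what R's `predict(..., interval = "prediction")` computes for an OLS fit, using simulated count data. The variables `x`, `y`, `new_pt` and the data-generating model are hypothetical stand-ins, not the island data from the question. The interval is fit ± t × sqrt(s² + SE(fit)²), i.e. the uncertainty of the fitted mean plus the residual scatter, and nothing in that formula prevents the lower bound from going below zero when the fitted count is small relative to s.

```r
# Hypothetical simulated count data: the true mean grows exponentially in x
set.seed(1)
x <- 1:40
y <- rpois(length(x), lambda = exp(0.5 + 0.08 * x))

# Ordinary least squares fit, as in the question's lm() call
ols <- lm(y ~ x)

# Built-in 95% prediction interval at a point with a small expected count
new_pt <- data.frame(x = 1)
pi_ols <- predict(ols, newdata = new_pt, interval = "prediction", level = 0.95)

# The same interval by hand:
# fitted value +/- t quantile * sqrt(residual variance + variance of the fitted mean)
pred   <- predict(ols, newdata = new_pt, se.fit = TRUE)
t_crit <- qt(0.975, df = pred$df)
half   <- t_crit * sqrt(pred$residual.scale^2 + pred$se.fit^2)
manual <- c(fit = unname(pred$fit),
            lwr = unname(pred$fit - half),
            upr = unname(pred$fit + half))

print(pi_ols)  # lwr is negative: the normal-error model does not know counts are >= 0
print(manual)  # matches pi_ols
```

The hand computation is only there to make the two sources of uncertainty explicit; in practice `predict()` with `interval = "prediction"` is all you need.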
-
Answer:
Mathematics is reality-agnostic, so your negative lower prediction bound can certainly be mathematically sound. I would argue, however, that it is a good indication that you are using the wrong mathematics, e.g., Ordinary Least Squares (which assumes normally distributed errors) with count data (for which a normal distribution makes no sense). I would suggest using Poisson regression or a similar method that is more suitable for count data.
user42835 at Cross Validated
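A sketch of the Poisson regression suggested in the answer above, on hypothetical simulated data (not the question's data). With the default log link, the fitted mean is exp(b0 + b1 × x) and therefore always positive. The `qpois()` step is an illustration added here, not part of the answer: it is a rough plug-in prediction interval for a new count that ignores the uncertainty in the estimated coefficients, so it will be somewhat too narrow.

```r
# Hypothetical simulated count data
set.seed(1)
x <- 1:40
y <- rpois(length(x), lambda = exp(0.5 + 0.08 * x))

# Poisson GLM with the default log link: log(E[y]) = b0 + b1 * x
pois <- glm(y ~ x, family = poisson)

new_pt <- data.frame(x = 1)
mu_hat <- predict(pois, newdata = new_pt, type = "response")  # exp(b0 + b1 * 1) > 0

# Plug-in 95% prediction interval for a new count: Poisson quantiles at the fitted mean.
# Coefficient uncertainty is ignored here.
pi_pois <- qpois(c(0.025, 0.975), lambda = mu_hat)

print(mu_hat)
print(pi_pois)  # both bounds are non-negative integers
```

For the question's model the same idea would mean swapping `lm()` for `glm(..., family = poisson)` with the same right-hand side; whether Poisson or negative binomial fits better depends on the overdispersion in the real data.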
Other answers
It suggests to me that you haven't used an analytic approach with an appropriate transformation of the outcome. With count data, for instance, popular generalized linear models (Poisson regression or negative binomial regression in particular) model the log of the expected count as a linear function of the predictors. Any predicted values from such a model are then exponentiated and, thus, positive.

Similarly, when you use the predict.glm function with se.fit set to TRUE (and the default type='link') for these models, you get standard errors on the log scale, from which you can form symmetric intervals for the mean count on that scale. Exponentiating those bounds gives intervals that are strictly positive, so they cannot include 0. You'll notice that the exponentiated predictions are the same as you would get from setting type='response' in the predict function. However, asking for both type='response' and se.fit=TRUE can be misleading: R then returns delta-method standard errors on the count scale, and fit ± 1.96 SE gives symmetric intervals, whereas the link transformation of the GLM means the intervals should be non-symmetric on the count scale.

There are additive count models, just as there are additive risk models for binary endpoints, but I think the results can be difficult to interpret and they behave untenably for predictions near the boundary of the support (0 for count data). As such, I'd be dubious not only about your negative predictions but about all other predictions from your model.
AdamO
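A sketch of the link-scale approach described in the answer above, again on hypothetical simulated data. It builds Wald intervals on the log scale from `predict(..., type = "link", se.fit = TRUE)`, exponentiates them, and compares them with the delta-method standard errors R returns for `type = "response"`. Note that these are confidence intervals for the mean count at each `x`, not prediction intervals for a single new island's count; `new_pt` and the simulated model are assumptions for illustration.

```r
# Hypothetical simulated count data
set.seed(1)
x <- 1:40
y <- rpois(length(x), lambda = exp(0.5 + 0.08 * x))
pois <- glm(y ~ x, family = poisson)
new_pt <- data.frame(x = c(1, 20, 40))

# Standard errors on the link (log) scale, where the normal approximation is applied
link <- predict(pois, newdata = new_pt, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
ci_log <- cbind(lwr = link$fit - z * link$se.fit,
                upr = link$fit + z * link$se.fit)

# Back-transform: the interval for the mean count is positive and non-symmetric
ci_count <- exp(ci_log)
mu_hat <- exp(link$fit)

# exp(link fit) is the same as the type = "response" prediction
resp <- predict(pois, newdata = new_pt, type = "response", se.fit = TRUE)
print(all.equal(unname(mu_hat), unname(resp$fit)))

# With type = "response", se.fit is a delta-method SE on the count scale
# (link SE times exp(eta) for the log link); fit +/- z * se.fit is symmetric
# and nothing forces its lower bound to stay above zero
naive <- cbind(lwr = resp$fit - z * resp$se.fit,
               upr = resp$fit + z * resp$se.fit)

print(cbind(mu_hat, ci_count))
print(naive)
```

Exponentiating the link-scale bounds respects the constraint that the mean count is positive, which the symmetric count-scale interval does not.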