What's The Weight Of Evidence?

What are the benefits and risks of using a weight-of-evidence transformation for categorical data in logistic regression models, compared with using a dummy variable for each factor level?

  • I've seen models built both ways: transforming a single attribute using weight of evidence, and creating n-1 dummy variables to represent the categorical data. Is one method inherently better than the other?

  • Answer:

    It really depends... When using weight of evidence (WOE), you only need to estimate one regression parameter for the variable. But this is only meaningful/useful if you have enough observations to get good WOE estimates for each category. Alternatively, when using an indicator (dummy) variable per category, you need to estimate n-1 regression parameters for n categories when the model has an intercept (n without one). This may introduce considerable additional variance into your model and make it unnecessarily complex.

Ricardo Monti at Quora
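To make the parameter-count comparison concrete, here is a minimal pure-Python sketch (not taken from either answer) that computes one WOE value per category and contrasts the result with n-1 dummy columns. Assumptions: WOE is defined here as ln(share of events in the category / share of non-events in the category); credit-scoring texts often use goods over bads instead, which only flips the sign. The `smoothing` constant is an illustrative guard against log(0) in sparse categories, exactly where the answer warns WOE estimates become unreliable. The function names are hypothetical.

```python
import math
from collections import Counter


def woe_table(categories, target, smoothing=0.5):
    """Return {level: WOE} with WOE = ln(%events / %non-events).

    `smoothing` is an illustrative additive constant (an assumption, not a
    standard value) that keeps the log finite when a level has no events
    or no non-events.
    """
    events = Counter(c for c, t in zip(categories, target) if t == 1)
    non_events = Counter(c for c, t in zip(categories, target) if t == 0)
    levels = sorted(set(categories))
    k = len(levels)
    total_events = sum(events.values())
    total_non = sum(non_events.values())
    table = {}
    for level in levels:
        # Counter returns 0 for levels that never appear with this outcome.
        pct_event = (events[level] + smoothing) / (total_events + smoothing * k)
        pct_non = (non_events[level] + smoothing) / (total_non + smoothing * k)
        table[level] = math.log(pct_event / pct_non)
    return table


def woe_encode(categories, table):
    """Replace each category by its WOE: one numeric column, one coefficient."""
    return [table[c] for c in categories]


def dummy_encode(categories):
    """n-1 indicator columns; the first (sorted) level is the reference."""
    levels = sorted(set(categories))
    columns = levels[1:]
    rows = [[1 if c == level else 0 for level in columns] for c in categories]
    return columns, rows


cats = ["A", "A", "B", "B", "B", "C", "C", "C", "C", "A"]
y = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]

table = woe_table(cats, y)
x_woe = woe_encode(cats, table)
cols, x_dummy = dummy_encode(cats)
print(len(table), "levels ->", 1, "WOE column vs", len(cols), "dummy columns")
# prints: 3 levels -> 1 WOE column vs 2 dummy columns
```

With three levels, the WOE route fits one coefficient where dummy coding fits two, and the gap grows with the number of levels. The cost is that each WOE value rests only on that level's counts, so rare levels get noisy values (the smoothing term partly masks this rather than fixing it).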


Other answers

I think WOE makes the most sense as a way to transform categorical features into numeric ones so they can be fed into an algorithm such as a neural network. It is traditionally used for credit scoring, but beyond that I can only think of a practical use in the case mentioned above. Binning numeric features (which you must do before computing WOE on them) loses some information, so it shouldn't help your model unless those features have many outliers, missing values (NA), etc.

Martin Bel
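The information loss from binning can be seen in a small pure-Python sketch. It assumes equal-frequency (quantile) bins, which is only one of several binning schemes used in practice, plus the same illustrative smoothing constant as a guard against log(0); `quantile_bins` and `woe_by_bin` are hypothetical helper names. Ten distinct raw values collapse into three WOE codes, so any variation within a bin is invisible to the model.

```python
import bisect
import math
from collections import Counter


def quantile_bins(values, n_bins):
    """Equal-frequency bin index per value; tied values share a bin."""
    s = sorted(values)
    # Interior cut points at ranks len/n_bins, 2*len/n_bins, ...
    cuts = [s[len(s) * k // n_bins] for k in range(1, n_bins)]
    # bisect_right counts the cut points <= v, which is the bin index.
    return [bisect.bisect_right(cuts, v) for v in values]


def woe_by_bin(bins, target, smoothing=0.5):
    """ln(%events / %non-events) per bin, with illustrative smoothing."""
    events = Counter(b for b, t in zip(bins, target) if t == 1)
    non_events = Counter(b for b, t in zip(bins, target) if t == 0)
    levels = sorted(set(bins))
    k = len(levels)
    total_events = sum(events.values())
    total_non = sum(non_events.values())
    return {
        b: math.log(
            ((events[b] + smoothing) / (total_events + smoothing * k))
            / ((non_events[b] + smoothing) / (total_non + smoothing * k))
        )
        for b in levels
    }


x = [1.2, 3.4, 2.2, 8.9, 5.5, 0.7, 4.1, 9.3, 6.6, 7.0]
y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 0]

bins = quantile_bins(x, n_bins=3)
woe = woe_by_bin(bins, y)
x_woe = [woe[b] for b in bins]
print(len(set(x)), "distinct raw values ->", len(set(x_woe)), "distinct WOE values")
# prints: 10 distinct raw values -> 3 distinct WOE values
```

The flip side, which the answer alludes to, is that an extreme value lands in the top bin and is coded exactly like its neighbours, so it can no longer pull the fitted coefficient on its own.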
