What is the best classification technique when I have 1500 categories to predict and 100,000 observations?
Random Forest in R seems to fail after 32 categories.
Answer:
Random decision forests are actually a good solution, since they can predict a distribution over categories and do not need to resort to one-vs-all techniques for multi-class classification. I agree that R won't do it, but other packages will.
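As one concrete example of such a package (our choice, not necessarily the one the answer had in mind), scikit-learn's `RandomForestClassifier` fits a single multiclass forest, and `predict_proba` returns a probability for every class. This is a minimal sketch: the synthetic data generator, the 200-class size (kept smaller than 1500 so it runs quickly), and the parameter values are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in data: 200 classes, 20 samples each, every class
# centred on its own random point in a 30-dimensional feature space.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 200, 20, 30
centers = rng.normal(scale=5.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), per_class)
X = centers[y] + rng.normal(size=(len(y), n_features))

# One multiclass forest: no one-vs-all wrapper is involved.
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)
clf.fit(X, y)

# predict_proba gives a full distribution over all classes for each row;
# its columns follow the order of clf.classes_.
proba = clf.predict_proba(X[:5])
top3 = clf.classes_[np.argsort(proba, axis=1)[:, ::-1][:, :3]]
print(proba.shape)  # (5, 200)
print(top3)         # three most probable classes per row
```

Because each row of `proba` is a distribution, the same model can return a ranked shortlist of classes instead of a single label, which is often more useful when there are hundreds of categories.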
Sean Owen at Quora
Other answers
Classification with many categories has actually been a fairly active research area in recent years. See http://jmlr.org/proceedings/papers/v28/bi13.pdf for a recent paper with a decent literature survey.
Justin Rising
I haven't really worked on anything like that, but perhaps I can offer some ideas. I would start with something simple as a baseline and work from there.
In theory, multinomial logistic regression seems like the right type of algorithm for this. There could be some computational issues with 1500 classes, but I don't have enough experience with this to tell for sure. An alternative would be to use a binary classifier in a one-vs-all configuration. This would work with logistic regression, linear SVM, kernel SVM, etc. With 1500 classes, you would have to compensate for the massive ratio of negative to positive samples for each class, either by resampling or by weighting classes to reflect the proportions.
Before setting to work on classification, I would plot a histogram of the labels to get a feel for their proportions. If they are very uneven, many learning algorithms might fit to the most common classes and end up rubbish at predicting the rare labels. The techniques I mentioned for one-vs-all configurations might help with this problem.
After exploring the basic methods, I would check whether there is any hierarchical structure in the labels. Some of them might be clusters of similar objects, so it could help to predict the cluster first and then have classifiers that distinguish between the classes within each cluster.
Also have a look at Google Scholar and the paper that @Justin-Rising suggested. And if you publish a paper on this, please let us know :-)
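A small plain-Python sketch of the first two checks: counting labels (the data behind the histogram) and turning those counts into class weights. The "balanced" formula n_samples / (n_classes × count) is the same heuristic scikit-learn uses for `class_weight="balanced"`; the one-vs-all positive weight n_neg / n_pos is a common alternative for a single binary problem. The function names are hypothetical.

```python
from collections import Counter


def label_histogram(labels):
    """Count observations per class, most common first (histogram data)."""
    return Counter(labels).most_common()


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count), so every class
    contributes the same total weight regardless of how rare it is."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def one_vs_all_positive_weight(labels, target):
    """For the one-vs-all problem of `target`, the weight on positive examples
    that makes positives and negatives carry equal total weight."""
    n_pos = sum(1 for y in labels if y == target)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos


labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(label_histogram(labels))                  # [('a', 6), ('b', 3), ('c', 1)]
print(balanced_class_weights(labels))
print(one_vs_all_positive_weight(labels, "c"))  # 9.0
```

With 1500 classes and 100,000 observations, an average class has about 67 positives against roughly 99,933 negatives in its one-vs-all problem, which is why the positive weight (or resampling) matters so much there.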
Michal Romaniuk
Depending on the nature of your problem, you might want to treat it as an information retrieval problem rather than a classification problem. For example, if your observations are textual and the categories are some sort of document clusters or classes, then something as simple as BM25 might work well enough. You might want to follow this step with additional, smaller rankers or classifiers. For non-textual observations, you might be able to use a simple Bayesian network as described here: http://arxiv.org/pdf/1304.1511.pdf
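A minimal sketch of the retrieval view, assuming textual observations: each category is indexed as one pseudo-document made by concatenating its training texts, and a new observation is used as a query whose top-ranked categories become candidates for the smaller rankers or classifiers mentioned above. The whitespace tokenizer, the class name, the toy data, and the defaults k1=1.2 and b=0.75 (the usual BM25 free parameters) are our assumptions; the IDF is the non-negative variant used by Lucene.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class BM25CategoryRanker:
    """Rank categories for a text with Okapi BM25, one pseudo-document per category."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1, self.b = k1, b

    def fit(self, texts, labels):
        tokens_by_cat = defaultdict(list)
        for text, label in zip(texts, labels):
            tokens_by_cat[label].extend(tokenize(text))
        self.categories = list(tokens_by_cat)
        self.tf = {c: Counter(toks) for c, toks in tokens_by_cat.items()}
        self.length = {c: len(toks) for c, toks in tokens_by_cat.items()}
        self.avg_len = sum(self.length.values()) / len(self.categories)
        # Document frequency: number of categories whose pseudo-document has the term.
        df = Counter(term for c in self.categories for term in self.tf[c])
        n = len(self.categories)
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}
        return self

    def score(self, text, category):
        tf = self.tf[category]
        norm = 1 - self.b + self.b * self.length[category] / self.avg_len
        s = 0.0
        for term in tokenize(text):
            f = tf.get(term, 0)
            if f:
                s += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * norm)
        return s

    def rank(self, text, top_k=5):
        return sorted(self.categories, key=lambda c: self.score(text, c), reverse=True)[:top_k]


texts = ["cheap flights to paris", "hotel deals in paris",
         "python list comprehension", "numpy array broadcasting",
         "football match results", "league table and match scores"]
labels = ["travel", "travel", "programming", "programming", "sport", "sport"]
ranker = BM25CategoryRanker().fit(texts, labels)
print(ranker.rank("paris hotel", top_k=1))  # ['travel']
```

An alternative setup is to index individual training observations and let the labels of the top-scoring documents vote; indexing one pseudo-document per category keeps the index at 1500 entries.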
Oleksii Kuchaiev
I would start by asking whether the observations are separable into 1500 categories, and whether they are adequate for any classifier to generalize well: 100,000 observations spread over 1500 categories is only about 67 per category on average. I would recommend reading about the error-correcting output code (ECOC) approach by Dietterich and Bakiri: Solving Multiclass Learning Problems via Error-Correcting Output Codes http://arxiv.org/pdf/cs/9501101.pdf
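The core ECOC idea: give each class a binary codeword, train one binary classifier per code bit, and predict the class whose codeword is nearest (in Hamming distance) to the vector of bit predictions. If every pair of codewords differs in at least d bits, up to floor((d - 1) / 2) wrong bit predictions can still be corrected. This is a minimal NumPy sketch, not the paper's exact construction: the random codebook with a minimum-distance check, the ridge least-squares binary learner, and the toy data are all assumptions for illustration. For 1500 classes you would need at least 11 bits (2^11 = 2048) just to make codewords distinct, plus extra bits for error correction.

```python
import numpy as np


def make_codebook(n_classes, n_bits, min_distance, rng, max_tries=1000):
    """Draw random +/-1 codewords (one row per class) until every pair of
    rows differs in at least `min_distance` bits."""
    for _ in range(max_tries):
        code = rng.choice([-1.0, 1.0], size=(n_classes, n_bits))
        # Every column must contain both signs, or its binary problem is trivial.
        if np.any(np.abs(code.sum(axis=0)) == n_classes):
            continue
        dist = (code[:, None, :] != code[None, :, :]).sum(axis=2)
        np.fill_diagonal(dist, n_bits)
        if dist.min() >= min_distance:
            return code
    raise RuntimeError("no codebook found; try more bits or a smaller min_distance")


def fit_ecoc(X, y, code, ridge=1e-3):
    """Fit one linear binary learner per code bit (hypothetical stand-in:
    ridge least squares on +/-1 targets; any binary classifier would do)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # add a bias column
    targets = code[y]                             # (n_samples, n_bits)
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ targets)     # (n_features + 1, n_bits)


def decode(bits, code):
    """Return, for each row of bit outputs, the class with the nearest codeword."""
    dist = (np.sign(bits)[:, None, :] != code[None, :, :]).sum(axis=2)
    return dist.argmin(axis=1)


def predict_ecoc(X, W, code):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return decode(Xb @ W, code)


rng = np.random.default_rng(0)
n_classes, n_bits = 8, 15
code = make_codebook(n_classes, n_bits, min_distance=5, rng=rng)
y = np.repeat(np.arange(n_classes), 50)
X = 5.0 * np.eye(n_classes)[y] + 0.3 * rng.normal(size=(len(y), n_classes))
W = fit_ecoc(X, y, code)
acc = (predict_ecoc(X, W, code) == y).mean()
```

With `min_distance=5`, any two flipped bits in a prediction still decode to the right class, which is the robustness the error-correcting code buys over a plain one-vs-all setup.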
Krishna Janakiraman
You can formulate the problem as a single joint optimization problem instead of using naive one-vs-all or one-vs-one schemes; it is worth looking into. There are many papers on this topic, and some real applications of it have been used in the ImageNet challenge. However, it comes with some memory bloat, since the whole model is kept in one matrix that is updated at each iteration.
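One common joint formulation is multinomial (softmax) logistic regression: all classes share one cross-entropy objective and one weight matrix W of shape (n_features + 1) × n_classes, and every gradient step updates the whole matrix. This is a minimal full-batch NumPy sketch under assumed toy data and hyperparameters; it is one example of a joint multiclass objective, not necessarily the specific method the answer refers to.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_softmax(X, y, n_classes, lr=0.1, epochs=300, l2=1e-4):
    """Multinomial logistic regression trained by full-batch gradient descent.
    All classes are optimized jointly through a single weight matrix W."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
    W = np.zeros((Xb.shape[1], n_classes))
    rows = np.arange(len(X))
    for _ in range(epochs):
        G = softmax(Xb @ W)       # predicted class probabilities, (n, K)
        G[rows, y] -= 1.0         # dLoss/dlogits = P - onehot(y), without a one-hot matrix
        W -= lr * (Xb.T @ G / len(X) + l2 * W)
    return W


def predict(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W).argmax(axis=1)


# Hypothetical toy data: 20 well-separated classes, 30 samples each.
rng = np.random.default_rng(0)
n_classes, per_class = 20, 30
y = np.repeat(np.arange(n_classes), per_class)
X = 3.0 * np.eye(n_classes)[y] + 0.5 * rng.normal(size=(len(y), n_classes))
W = fit_softmax(X, y, n_classes)
acc = (predict(X, W) == y).mean()
print(W.shape)  # (21, 20)
```

The memory cost grows with the number of classes: with a hypothetical 10,000 features and 1500 classes, W alone holds about 15 million float64 values (roughly 120 MB), and a full-batch probability matrix for 100,000 observations would be 100,000 × 1500 values (about 1.2 GB), which is why mini-batches are normally used at that scale.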
Eren Golge