Why does most computer vision work lie in feature extraction rather than in machine-learning algorithms?
-
Unlike work in natural language processing, most computer vision research seems to focus on how to obtain good features. Some of the most famous algorithms in computer vision, such as SIFT and HOG, are essentially just good feature representations. In NLP, by contrast, people have proposed many models and algorithms that belong more to the machine-learning side. Why does computer vision pay so much attention to feature extraction, simply reusing the results of machine-learning research (such as a linear SVM) as one component of the work?
-
Answer:
That is because the machine learning part is its own field. Machine learning is a very general, abstract technique that can take any kind of features (whether from images, natural language, metadata, or anything else) and detect patterns in them. It is not specific to computer vision, and research on machine learning for computer vision applications is classed as machine learning research, not computer vision research. A lot of computer vision does not use machine learning at all, for that matter. If anything, solutions that do not rely on machine learning are generally considered better, because they are more transparent and verifiable.
David Khoo at Quora
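To make the division of labour concrete, the feature-engineering-plus-generic-learner pipeline the question alludes to (HOG descriptors into a linear SVM) can be sketched as below. This is a minimal illustration assuming scikit-image and scikit-learn; the dataset is random placeholder data, not a real benchmark.

```python
# A minimal sketch of the classic "hand-crafted features + generic
# learner" pipeline: HOG descriptors feeding a linear SVM. Assumes
# scikit-image and scikit-learn; the data is random placeholder input.
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(images):
    # The vision-specific work: turn raw pixels into a descriptor.
    return np.array([
        hog(img, orientations=9, pixels_per_cell=(8, 8),
            cells_per_block=(2, 2))
        for img in images
    ])

# Placeholder data: 100 grayscale 64x64 "images" with binary labels.
rng = np.random.default_rng(0)
X_imgs = rng.random((100, 64, 64))
y = rng.integers(0, 2, 100)

# The generic machine-learning part: once the features are good,
# almost any off-the-shelf classifier will do.
clf = LinearSVC().fit(extract_hog(X_imgs), y)
print(clf.score(extract_hog(X_imgs), y))
```

The point of the sketch is that extract_hog is where the computer vision lives; swapping LinearSVC for any other classifier would leave the vision part untouched.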
Other answers
I am not a CV/NLP researcher, but I am a developer in those problem domains. Here is my view on why NLP is more into ML while CV is more into feature detection (I guess this is what you implied).

Machine learning is not endemic to any sub-field of artificial intelligence, whether CV or NLP. It belongs to AI as a whole, and every sub-field of AI needs it; some just need it more than others. I think what you are trying to say is that you have encountered more research in NLP contributing to ML than in CV. That is because NLP needs it more than CV does.

Computer vision is all about features, because features are what machines use to differentiate between images of different objects. Want to find lines, circles, or a continuous connected component? It all comes down to a pixel's relation to its neighbouring pixels, plus telling the program what the rules for circles and so on are (see the sketch after this answer). Now suppose you want to extract a noun from a sentence. What is a noun? For which language? What is the sentence structure of that language? What are the latest nouns added to the dictionary ("twerking"?)? As you can see, the learning curve is pretty steep. If you want to do anything with NLP, you have to teach first and teach well, and it is a never-ending (read: NP-hard) process. That is why NLP researchers are more focused on solving the ML problem first than CV researchers are, and it is easy to see why a CV researcher would rather develop methods that look for features and shapes in a more intelligent way. Still, CV needs ML too; it is just that CV experts traditionally have not done much ML research.

Think of it in human terms. Almost every person who is not visually impaired, literate or not, can recognize the leaves of a tree. But can she recognize the object from the word 'leaves' in a text if she is illiterate or does not speak English? Natural language inherently demands more learning in humans, and needless to say in computers.
Rifat Mahmud
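The "fixed pixel rule, no learning" idea from this answer can be made concrete. Below is a minimal connected-component labelling sketch, assuming SciPy; the binary array is toy data. The grouping rule is fixed in advance rather than learned from examples.

```python
# Connected-component labelling: a fixed rule over each pixel's
# neighbours, with no learning involved. Assumes SciPy; toy input.
import numpy as np
from scipy import ndimage

binary = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
])

# label() groups pixels that touch their 4-connected neighbours.
labeled, num_components = ndimage.label(binary)
print(num_components)  # 2 blobs in this toy image
print(labeled)
```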
I guess you are limiting the topic to recognition problems, but there are many others based on homography and vision geometry, such as 3D modelling from 2D images and camera technologies. If we consider machine vision, the extracted features matter more to the success of the system than the ML algorithm does; there are many works supporting this claim (which I am too lazy to cite). Thus the best representation comes first, and I guess that is why people are more interested in feature engineering.
Eren Golge
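As a concrete instance of the geometry-based work this answer mentions, here is a minimal homography-estimation sketch using OpenCV's SIFT and RANSAC. The file names are placeholders and this is one common recipe rather than the only one; note that no training data is involved, only detected features and geometric fitting.

```python
# A sketch of geometry-based vision: estimate the homography between
# two views from matched SIFT features. Assumes OpenCV with SIFT
# available (cv2.SIFT_create, OpenCV >= 4.4); file names are placeholders.
import cv2
import numpy as np

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Keep only confident matches via Lowe's ratio test.
matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

# RANSAC fits the 3x3 homography from geometry alone; nothing is trained.
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(H)
```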
A picture may be worth a thousand words, but it takes a million words to store a picture. (Please kindly leave a comment if you know the source.) The first "words" roughly refers to the meaning and descriptions a picture can convey; the second refers to machine words (which may be bytes, 32-bit integers, or 64-bit integers), even when image compression is applied.

The smallest unit of input for computer vision is an image; the smallest unit of input for natural language processing is a sentence. Just because of this more-than-thousandfold difference in the size of the smallest unit of input (see the back-of-the-envelope sketch after this answer), it has always been beneficial to "preprocess" images, extracting the relevant "features" from the input image before sending those signals into a machine-learning algorithm.

You may also have noticed that distributed machine-learning algorithms are a recent phenomenon. A decade ago, the processing power and the open-source frameworks for distributed machine learning were not really accessible to the average graduate student, except for the very lucky ones.

Training data for NLP are abundant: you can find and download a lot of text on the internet. Likewise, for certain computer vision tasks, such as object categorization, one can access millions of training images. However, there are other computer vision tasks (especially in industrial image processing, traditionally speaking) where an algorithm developer has to develop a reasonable algorithm from a single sample image provided by a customer. In fact, one such question was asked on Quora a few days ago. For these types of tasks, the customer would be angered if the algorithm developer suggested needing a thousand or even a million images in order to take a deep-learning, automatic image-feature-discovery approach.

In other words, the computer vision tasks that need deep learning and those that do not are really quite different in nature. To summarize:
- Is the computer vision task difficult enough that deep learning appears to be the only viable approach? Conversely, is it easy enough to be solved by a good human algorithm developer applying some intuitive algorithms?
- Is it easy to acquire a huge amount of training data for the task?
- Is it relatively straightforward to extract the necessary image features for the task? If such knowledge is in plain sight, it would not be profitable for an algorithm developer to take the harder route of deep learning and automatic image-feature discovery, unless the developer's goal is to advance machine learning rather than to reap immediate benefits.
Ryan Wong
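The size claim above checks out on back-of-the-envelope numbers. The sketch below assumes a typical 1920x1080 RGB image and a 20-word sentence; both figures are illustrative assumptions, not measurements taken from the answer.

```python
# Back-of-the-envelope sizes for the "smallest unit of input" in CV
# versus NLP. The 1920x1080 image and 20-word sentence are assumed,
# illustrative numbers, not figures from the answer.
image_bytes = 1920 * 1080 * 3       # one uncompressed 8-bit RGB frame
sentence_bytes = 20 * 6             # ~20 words of ~6 characters each

machine_words = image_bytes // 4    # 32-bit machine "words" for the image
print(f"image: {image_bytes:,} bytes (~{machine_words:,} 32-bit words)")
print(f"sentence: {sentence_bytes} bytes")
print(f"ratio: roughly {image_bytes // sentence_bytes:,}x")
```

One frame comes to about 1.5 million 32-bit words against roughly 120 bytes for a sentence, consistent with the "more than thousandfold" difference the answer describes.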