Apache Solr: Synonyms.txt - where can I download an example for English synonyms?
-
I'm using solr.SynonymFilterFactory as I'd like to find similar documents (dupe detection for new captions on http://www.caption.me/). I assume there is no 'built in' list of synonyms in Solr/Lucene, so I'll need to construct a synonyms.txt file or download one? I've Googled but can't find one anywhere. Would appreciate your help.

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
-
Answer:
That's a rather good question! You might have to look at OpenOffice for a thesaurus/synonyms file and convert it to the text format for Solr. But there is no such thing as the 'ultimate' synonyms file for your problem domain. A common strategy is to let your users vote and build the synonyms via a question-and-answer app.
Olivier Dobberkau at Quora
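
For anyone trying the OpenOffice route, here is a rough sketch of the conversion step in Python. It assumes the thesaurus is in the MyThes `.dat` layout: an encoding name on the first line, then a `word|N` header followed by N sense lines of the form `(pos)|syn1|syn2|...`. The function name `mythes_to_solr` and the sample entries are made up for illustration, so check the layout against the file you actually download. It writes one Solr synonym line per sense and skips terms marked as antonyms.

```python
import re

# Trailing annotation on a term, e.g. "motor vehicle (generic term)".
_ANNOTATION = re.compile(r"\s*\(([^)]*)\)\s*$")


def mythes_to_solr(lines):
    """Turn MyThes-style thesaurus lines into Solr synonyms.txt lines.

    Assumed input layout (verify against your file):
        line 1: encoding name
        then:   word|N  followed by N lines of  (pos)|syn1|syn2|...
    """
    it = iter(lines)
    next(it, None)  # skip the encoding line
    out = []
    for header in it:
        word, sep, count = header.strip().partition("|")
        if not sep:
            continue  # blank or unexpected line
        for _ in range(int(count)):
            fields = next(it).strip().split("|")
            group = [word]
            for term in fields[1:]:  # fields[0] is the part of speech
                m = _ANNOTATION.search(term)
                if m:
                    if "antonym" in m.group(1):
                        continue  # an antonym is not a synonym
                    term = term[:m.start()]
                term = term.strip()
                if term and term not in group:
                    group.append(term)
            if len(group) > 1:
                # Commas inside a term must be escaped in the Solr format.
                out.append(", ".join(t.replace(",", "\\,") for t in group))
    return out


if __name__ == "__main__":
    # Hypothetical sample data, not copied from a real thesaurus file.
    sample = [
        "ISO8859-1",
        "simple|1",
        "(adj)|elementary|uncomplicated|complex (antonym)",
        "car|2",
        "(noun)|auto|automobile|motor vehicle (generic term)",
        "(noun)|railcar|railway car",
    ]
    print("\n".join(mythes_to_solr(sample)))
```

Keeping one line per sense avoids lumping unrelated meanings of a word into a single group. A general-purpose list like this will still be noisy for duplicate detection, so you would probably want to prune it for your own domain.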
Other answers
You can download Roget's Thesaurus from Project Gutenberg, http://www.gutenberg.org/ebooks/10681. There is a Perl module, http://search.cpan.org/~rjbs/Parse-GutenbergRoget-0.021/lib/Parse/GutenbergRoget.pm, for parsing it before you use it.
Lakshmi Narasimhan Parthasarathy
You can use WordNet (https://wordnet.princeton.edu/). It is widely used in the natural language processing community.
Veeresh Beeram
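
If you go with WordNet, one possible approach is to take the Prolog distribution and group the entries of its `wn_s.pl` file by synset. The sketch below assumes each line looks like `s(synset_id,w_num,'word',ss_type,sense_number,tag_count).` with apostrophes inside a word doubled. The function name `wordnet_prolog_to_solr` and the sample lines are illustrative only (the ids are invented). Some Solr versions can also read this file directly through `format="wordnet"` on SynonymFilterFactory, so check the documentation for your release before converting by hand.

```python
import re

# One entry of wn_s.pl, e.g.  s(100001740,1,'entity',n,1,11).
_S_LINE = re.compile(r"^s\((\d+),\d+,'(.*)',[a-z],\d+,\d+\)\.$")


def wordnet_prolog_to_solr(lines):
    """Group wn_s.pl entries by synset id into Solr synonyms.txt lines."""
    groups = {}  # synset id -> list of words, in file order
    for line in lines:
        m = _S_LINE.match(line.strip())
        if not m:
            continue  # not an s(...) fact
        synset_id = m.group(1)
        word = m.group(2).replace("''", "'")  # undo Prolog quote doubling
        terms = groups.setdefault(synset_id, [])
        if word not in terms:
            terms.append(word)
    # A synset with a single word carries no synonym information.
    return [
        ", ".join(t.replace(",", "\\,") for t in terms)
        for terms in groups.values()
        if len(terms) > 1
    ]


if __name__ == "__main__":
    # Hypothetical sample lines with invented synset ids.
    sample = [
        "s(100000001,1,'car',n,1,0).",
        "s(100000001,2,'auto',n,1,0).",
        "s(100000001,3,'automobile',n,1,0).",
        "s(100000002,1,'entity',n,1,0).",
        "s(400000003,1,'ne''er',r,1,0).",
        "s(400000003,2,'never',r,1,0).",
    ]
    print("\n".join(wordnet_prolog_to_solr(sample)))
```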