Apache Solr: Synonyms.txt - where can I download an example for English synonyms?

  • I'm using solr.SynonymFilterFactory as I'd like to find similar documents (dupe detection for new captions on http://www.caption.me/). I assume there is no 'built in' list of synonyms in Solr/Lucene, so I'll need to construct a synonyms.txt file or download one? I've Googled but can't find one anywhere. Would appreciate your help.

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

  • Answer:

    That's a rather good question! You might look at OpenOffice for a thesaurus/synonyms file and convert it to the text format Solr expects. However, there is no such thing as an 'ultimate' synonyms file for your problem domain. A common strategy is to let your users vote and build up the synonyms through a question-and-answer app.
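
As a rough illustration of the conversion step, here is a minimal Python sketch. It assumes the thesaurus is in the MyThes-style layout used by OpenOffice .dat files (an encoding line first, then "word|count" headers each followed by that many "(pos)|syn1|syn2" lines); check your file before relying on it. The function name mythes_to_solr and the sample data are made up for the example. The output uses Solr's comma-separated synonyms.txt form, where all terms on a line are treated as equivalent.

```python
import re

# Trailing annotation such as " (similar term)" that some thesaurus files attach to an entry.
_ANNOTATION = re.compile(r"\s*\([^)]*\)\s*$")


def mythes_to_solr(lines):
    """Turn MyThes-style thesaurus lines into Solr synonyms.txt lines.

    Hypothetical helper: the input layout is an assumption, not something
    confirmed by the original answer.
    """
    rows = iter(lines)
    next(rows, None)  # skip the encoding declaration on the first line
    result = []
    for header in rows:
        header = header.strip()
        if not header:
            continue
        word, _sep, count = header.rpartition("|")
        terms = [word]
        for _ in range(int(count)):
            fields = next(rows).strip().split("|")
            for field in fields[1:]:  # fields[0] is the part-of-speech tag
                term = _ANNOTATION.sub("", field).strip()
                # Commas separate terms in Solr's format, so skip any term containing one.
                if term and "," not in term and term not in terms:
                    terms.append(term)
        if len(terms) > 1:
            result.append(", ".join(terms))
    return result


# Invented sample entries, only to show the shape of the data.
SAMPLE = [
    "UTF-8",
    "offer|2",
    "(verb)|proffer|tender (similar term)",
    "(noun)|bid|proffer",
    "caption|1",
    "(noun)|legend|subtitle",
]

if __name__ == "__main__":
    for line in mythes_to_solr(SAMPLE):
        print(line)
```

Each printed line could be written to the file named in the synonyms attribute of solr.SynonymFilterFactory. A general-purpose thesaurus tends to be noisy, so it is probably worth pruning the result for your own domain before indexing.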

Olivier Dobberkau at Quora

Other answers

You can download Roget's Thesaurus from Project Gutenberg: http://www.gutenberg.org/ebooks/10681. There is a Perl module, http://search.cpan.org/~rjbs/Parse-GutenbergRoget-0.021/lib/Parse/GutenbergRoget.pm, that you can use to parse it before you start using it.

Lakshmi Narasimhan Parthasarathy

You can use WordNet (https://wordnet.princeton.edu/). It is widely used in the natural language processing community.
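
To get from WordNet to a synonyms.txt file, one option is to group the words of each synset onto one line. The Python sketch below assumes the Prolog distribution of WordNet, whose wn_s.pl file holds facts of the form s(synset_id,w_num,'word',ss_type,sense_number,tag_count). The function name wordnet_prolog_to_solr is hypothetical and the synset IDs in the sample are invented for illustration.

```python
import re

# One s/6 fact per line; group 1 is the synset ID, group 2 the quoted word.
_S_FACT = re.compile(r"^s\((\d+),\d+,'(.*)',[a-z],\d+,\d+\)\.$")


def wordnet_prolog_to_solr(lines):
    """Group words by synset and emit one Solr synonym line per synset.

    Hypothetical helper: assumes the wn_s.pl fact layout described above.
    """
    synsets = {}  # synset ID -> words, in first-seen order
    for line in lines:
        match = _S_FACT.match(line.strip())
        if not match:
            continue  # ignore anything that is not an s/6 fact
        synset_id = match.group(1)
        word = match.group(2).replace("''", "'")  # Prolog doubles apostrophes
        words = synsets.setdefault(synset_id, [])
        # Commas separate terms in Solr's format, so skip any word containing one.
        if "," not in word and word not in words:
            words.append(word)
    # A synset with a single word carries no synonym information.
    return [", ".join(words) for words in synsets.values() if len(words) > 1]


# Invented sample facts; the IDs are not real WordNet synset IDs.
SAMPLE = [
    "s(102084071,1,'dog',n,1,42).",
    "s(102084071,2,'domestic dog',n,1,0).",
    "s(102084071,3,'Canis familiaris',n,1,0).",
    "s(100001740,1,'entity',n,1,11).",
    "s(201234567,1,'offer',v,1,5).",
    "s(201234567,2,'proffer',v,1,0).",
]

if __name__ == "__main__":
    for line in wordnet_prolog_to_solr(SAMPLE):
        print(line)
```

Solr's SynonymFilterFactory also documents a format="wordnet" option intended to read the Prolog file directly, which may make a conversion step unnecessary; check the documentation for your Solr version before choosing between the two.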

Veeresh Beeram
