What is the algorithm/data structure used by Lucene to compute the term frequency of documents?

  • It looks like Lucene uses a Map structure to compute the term frequency, but I would like a more detailed answer.

  • Answer:

    Terms and their frequencies are represented as vectors stored in the inverted index. A Term is the basic unit for searching and consists of a pair of string elements: <fieldname, text>. A term vector is a collection of terms.

    The inverted index maps terms to documents: for each term T, it stores the set of all documents containing that term. It is the analyzer's job to find the terms in each document and produce a token stream so that they can be mapped. Terms are stored in segments, in sorted order. The .frq file contains the ids of the documents that contain each term, along with the frequency of the term in each of those documents. Lucene stores the term data in inverted index format.

    TermDocs gives the TF of a given term in each document that contains the term. We can get the TermDocs from an IndexReader, using the term of interest. I hope the following code makes this easier to understand:

        List<String> termlist = new ArrayList<String>();
        IndexReader reader = IndexReader.open(indexFolder);
        TermEnum terms = reader.terms();   // enumerates every term in the index
        while (terms.next()) {
            Term term = terms.term();
            String termText = term.text();
            // Note: docFreq is the number of documents that contain the term,
            // not the per-document term frequency reported by TermDocs.
            int docFrequency = reader.docFreq(term);
            termlist.add(termText);
        }
        terms.close();
        reader.close();
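    To make the data structure itself more concrete, here is a minimal in-memory sketch of an inverted index that records per-document term frequencies. It is not Lucene's implementation: the class name TinyInvertedIndex, its methods, and the whitespace/punctuation tokenizer are all hypothetical and chosen only for illustration. A sorted map stands in for the sorted term dictionary, and each term's inner map plays the role of the (document id, frequency) entries described for the .frq file.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a toy inverted index, not Lucene's on-disk format.
final class TinyInvertedIndex {
    // term -> (docId -> number of occurrences of the term in that document).
    // TreeMap keeps the terms in sorted order.
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Stand-in for an analyzer: lowercases and splits on non-word characters.
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Document frequency: how many documents contain the term.
    public int docFreq(String term) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Term frequency: how often the term occurs in one document.
    public int termFreq(String term, int docId) {
        Map<Integer, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.getOrDefault(docId, 0);
    }

    // All indexed terms, in sorted order.
    public Iterable<String> terms() {
        return postings.keySet();
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(0, "Lucene indexes text; Lucene searches text");
        index.addDocument(1, "An inverted index maps terms to documents");
        System.out.println(index.docFreq("lucene"));     // prints 1
        System.out.println(index.termFreq("lucene", 0)); // prints 2
        System.out.println(index.termFreq("index", 1));  // prints 1
    }
}
```

    The sketch keeps document frequency and term frequency as separate lookups, which mirrors the difference between docFreq and the per-document counts that TermDocs reports.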

Dhwaj Raj at Quora
