Wildcard search using Lucene on a large file containing 100 million strings is taking too much time. I want the result in 1-2 seconds. Any help?

  • I have a 1.43 GB file containing 100 million strings (3 to 80 characters long), one per line. I am running a wildcard search over the file using Lucene, and right now I create one document for each string. I want the total count of matches for the search keyword (*searchkeyword*). Here is my code:

    LuceneDemo.java

        import java.io.FileNotFoundException;
        import java.io.IOException;

        import org.apache.lucene.index.CorruptIndexException;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.BooleanQuery;
        import org.apache.lucene.search.Hits;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.Query;
        import org.apache.lucene.search.Searcher;
        import org.apache.lucene.search.WildcardQuery;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class LuceneDemo {
            // a path to directory where Lucene will store index files
            private static String indexDirectory = "C:\\indextofile";
            // a path to directory which contains data files that need to be indexed
            private static String dataDirectory = "C:\\indexofilef";
            public static int count = 0;
            private Searcher indexSearcher;

            public static void main(String[] args) throws FileNotFoundException, IOException {
                LuceneDemo luceneDemo = new LuceneDemo();
                // create Lucene index
                luceneDemo.createLuceneIndex();
                // create IndexSearcher
                luceneDemo.createIndexSearcher();
                luceneDemo.termQueryExample();
            }

            private void createLuceneIndex() {
                Indexer indexer = new Indexer(indexDirectory, dataDirectory);
                // Create IndexWriter
                System.out.println("testing-4");
                indexer.createIndexWriter();
                try {
                    // Index data
                    indexer.indexData();
                } catch (FileNotFoundException e) {
                    throw new RuntimeException(e);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }

            private void createIndexSearcher() throws CorruptIndexException, IOException {
                /* Create instance of IndexSearcher */
                indexSearcher = new IndexSearcher(indexDirectory);
            }

            private void termQueryExample() throws CorruptIndexException, IOException {
                try {
                    Directory directory = FSDirectory.getDirectory(indexDirectory);
                    // IndexSearcher indexSearcher = new IndexSearcher(directory);
                    BooleanQuery.setMaxClauseCount(102400000);
                    Term term = new Term("reversecontent", "bubble*com");
                    Query query = new WildcardQuery(term);
                    Hits hits = indexSearcher.search(query);
                    System.out.println("######## Hits :" + hits.length());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }

    Indexer.java

        import java.io.BufferedReader;
        import java.io.File;
        import java.io.FileNotFoundException;
        import java.io.FileReader;
        import java.io.IOException;

        import org.apache.lucene.analysis.Analyzer;
        import org.apache.lucene.analysis.KeywordAnalyzer;
        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.Field;
        import org.apache.lucene.document.Field.Index;
        import org.apache.lucene.document.Field.Store;
        import org.apache.lucene.index.IndexDeletionPolicy;
        import org.apache.lucene.index.IndexWriter;
        import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
        import org.apache.lucene.store.Directory;
        import org.apache.lucene.store.FSDirectory;

        public class Indexer {
            private IndexWriter indexWriter;
            /* Location of directory where index files are stored */
            private String indexDirectory;
            /* Location of data directory */
            private String dataDirectory;
            public String FIELD_CONTENTS = "contents";

            public Indexer(String indexDirectory, String dataDirectory) {
                this.indexDirectory = indexDirectory;
                this.dataDirectory = dataDirectory;
            }

            /**
             * This method creates an instance of IndexWriter which is used
             * to add Documents and write indexes on the disc.
             */
            void createIndexWriter() {
                if (indexWriter == null) {
                    try {
                        // Create instance of Directory where index files will be stored
                        Directory fsDirectory = FSDirectory.getDirectory(indexDirectory);
                        /* Create instance of analyzer, which will be used to tokenize the input data */
                        Analyzer standardAnalyzer = new KeywordAnalyzer();
                        // Create a new index
                        boolean create = true;
                        // Create the instance of deletion policy
                        IndexDeletionPolicy deletionPolicy = new KeepOnlyLastCommitDeletionPolicy();
                        indexWriter = new IndexWriter(fsDirectory, standardAnalyzer, create,
                                deletionPolicy, IndexWriter.MaxFieldLength.UNLIMITED);
                    } catch (IOException ie) {
                        System.out.println("Error in creating IndexWriter");
                        throw new RuntimeException(ie);
                    }
                }
            }

            void indexData() throws FileNotFoundException, IOException {
                File[] files = getFilesToBeIndxed();
                for (File file : files) {
                    FileReader fr = new FileReader(file);
                    // To store the contents read via File Reader
                    BufferedReader br = new BufferedReader(fr);
                    // Read br and store a line in 'data'
                    String data;
                    System.out.println("start");
                    while ((data = br.readLine()) != null) {
                        String newdata = data + ".com";
                        Document doc = new Document();
                        // doc.add(new Field("content", newdata,
                        //         Store.NO, Index.NOT_ANALYZED));
                        // Add the reversed string as a field, then add the document to the index
                        doc.add(new Field("reversecontent", new StringBuffer(newdata).reverse().toString(),
                                Store.NO, Index.NOT_ANALYZED));
                        indexWriter.addDocument(doc);
                    }
                    System.out.println("end");
                }
                /* Requests an "optimize" operation on an index, priming the index for the fastest available search */
                indexWriter.optimize();
                System.out.println("optimization done");
                /* Commits all changes to the index and closes all associated files. */
                indexWriter.close();
            }

            private File[] getFilesToBeIndxed() {
                File dataDir = new File(dataDirectory);
                if (!dataDir.exists()) {
                    throw new RuntimeException(dataDirectory + " does not exist");
                }
                File[] files = dataDir.listFiles();
                return files;
            }
        }
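The reversed-field idea used in the code above can be illustrated without Lucene. The sketch below is not the asker's code and not Lucene's implementation; the class and method names are hypothetical. It assumes the terms are kept in a sorted array, reverses each one, and turns a leading-wildcard pattern such as `*bble.com` into a prefix lookup on the reversed terms, which a binary search can answer without scanning every term. Two caveats: a field holding reversed strings has to be queried with the reversed pattern as well, and a pattern with wildcards on both sides (`*searchkeyword*`) gains nothing from reversal alone.

```java
import java.util.Arrays;

public class ReversedPrefixCount {

    /** Reverses every value and sorts the result, like a sorted dictionary of reversed terms. */
    static String[] buildReversedSorted(String[] values) {
        String[] out = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = new StringBuilder(values[i]).reverse().toString();
        }
        Arrays.sort(out);
        return out;
    }

    /** Returns the index of the first element that is >= key (binary search). */
    static int lowerBound(String[] sorted, String key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Counts the original strings ending with the given suffix, i.e. matching "*suffix". */
    static int countEndingWith(String[] reversedSorted, String suffix) {
        // A suffix of the original string is a prefix of the reversed string.
        String prefix = new StringBuilder(suffix).reverse().toString();
        int i = lowerBound(reversedSorted, prefix);
        int count = 0;
        // All strings sharing the prefix sit next to each other in sorted order.
        while (i < reversedSorted.length && reversedSorted[i].startsWith(prefix)) {
            count++;
            i++;
        }
        return count;
    }

    public static void main(String[] args) {
        String[] values = {"bubble.com", "example.com", "bubble.org", "pebble.com"};
        String[] index = buildReversedSorted(values);
        System.out.println(countEndingWith(index, ".com"));     // matches "*.com"
        System.out.println(countEndingWith(index, "bble.com")); // matches "*bble.com"
    }
}
```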

  • Answer:

    a. I believe the problem is that you are using a KeywordAnalyzer. This means each of your 100 million strings gets its own index term, unless some of them are identical. Try switching to, say, a StandardAnalyzer. This will allow the index to be much more efficient.

    b. Try testing this on a small scale (say 10,000 strings) to see that you are getting proper results.

    c. I believe the Lucene users mailing list should give you better responses than Quora for a specific technical Lucene question.

    d. Other than that, read http://www.manning.com/hatcher3/. This is the best book I know about information retrieval, and it explains Lucene like no other resource.
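For the small-scale test suggested in (b), one option is a brute-force baseline that counts wildcard matches directly, so its result can be compared with the hit count reported by the index. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, `*` is treated as any run of characters and `?` as a single character, and the pattern must match the whole string. It is far too slow for 100 million strings and is only meant for checking correctness on a sample.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardBaseline {

    /** Converts a wildcard pattern ('*' = any run of characters, '?' = one character) to a regex. */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                // Quote every literal character so that '.' and friends are not regex operators.
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    /** Counts the lines that match the wildcard pattern in full. */
    static long countMatches(List<String> lines, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        long count = 0;
        for (String line : lines) {
            if (pattern.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("bubble.com", "bubblegum.com", "pebble.org");
        System.out.println(countMatches(sample, "bubble*com"));
        System.out.println(countMatches(sample, "*bble*"));
    }
}
```

If the index and the baseline disagree on the sample, the likely suspects are the analyzer and the way the query pattern is built, rather than the size of the data.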

Yuval Feinstein at Quora
