Skip to main content

Lucene, sample JAVA code to Search an indexed file folder


Please find below the Lucene sample JAVA code to search the files inside a folder. This code will search the indexed folder for a search query in an indexed field.

This java code is expecting the index path ( where the index files were created ) , field which need to be searched and the query need be searched as program arguments like  "java SearchFiles [-index dir] [-field f] [-query string]" .


import java.io.File;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SearchFiles {

public static void main(String[] args){
try{
String usage = "Usage: SearchFiles [-index dir] [-field f] [-query string] \n\n";
String index = "index";
String field = "contents";
String queryString = null;
int hitsPerPage = 100;
for (int i = 0; i < args.length; i++) {
if ("-index".equals(args[i])) {
index = args[i + 1];
i++;
} else if ("-field".equals(args[i])) {
field = args[i + 1];
i++;
} else if ("-query".equals(args[i])) {
queryString = args[i + 1];
i++;
}
}
new SearchFiles().searchFiles(index,field,queryString,hitsPerPage);
}catch (Exception e) {
e.printStackTrace();
}
}

public ArrayList<String> searchFiles(String index,String field,
String queryString,int hitsPerPage) throws Exception {

ArrayList<String> returnStringList = new ArrayList<String>();

IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(index)));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
QueryParser parser = new QueryParser(Version.LUCENE_31, field, analyzer);
Query query = parser.parse(queryString);
int numberOfPages = 5;

TopDocs results = searcher.search(query, numberOfPages * hitsPerPage);

ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = results.totalHits;
System.out.println(numTotalHits + " total matching documents");
returnStringList.add(numTotalHits + " total matching documents");

int start = 0;
int end = Math.min(numTotalHits, hitsPerPage);
for (int i = start; i < end; i++) {
Document doc = searcher.doc(hits[i].doc);
String path = doc.get("path");
if (path != null) {
System.out.println((i + 1) + ". " + path);
returnStringList.add((i + 1) + ". " + path);
String title = doc.get("title");
if (title != null) {
System.out.println("   Title: " + doc.get("title"));
returnStringList.add("   Title: " + doc.get("title"));
}
}
}
searcher.close();
return returnStringList;
}
}

Comments

Popular posts from this blog

How to convert your Blogger Blog to PDF ?

You can use a website called "blogbooker" @  http://www.blogbooker.com/blogger.php   to convert your Blogger Blog to a PDF . Please find the steps below : 1. Save your blog as an xml using Blogger Settings - Other - Export Blog option 2. Go to the website " http://www.blogbooker.com/blogger.php " and select this XML , give your blog address and select the options like date range, page size, font, ... 3. Click the  "Create Your BlogBook" button to view and save your blog as PDF

ATG User Profile schema ER diagram

Check out the Product Catalog  schema ER-Diagram @  http://tips4ufromsony.blogspot.in/2012/01/atg-product-catalog-schema-er-diagram.html Check out the O rder schema ER-Diagram @   http://tips4ufromsony.blogspot.in/2012/02/atg-order-schema-er-diagram.html If you would like to know the relationship between different User Profile schema tables, please find below screen shot of  Profile schema ER Diagrams.  

GC Log Analyzer from IBM

This blog is about the GC( garbage collection) log analyzer from IBM. If you have a GC log and you want to analyze the file,  this IBM tool will help you with some graphical analyzer and some recommendations. You can download it from the following URL : https://www.ibm.com/support/pages/ibm-pattern-modeling-and-analysis-tool-java-garbage-collector-pmat Please find below some screenshots and details that might help you. 1. Screenshot 1: 2. Screenshot 2 : 3. Screenshot 3 : 4. Screenshot 4:

Eclipse Color Theme

If you want to color your Eclipse, please follow the url : http://www.eclipsecolorthemes.org/ This provides you a plug-in with different color themes. Please find the Eclipse configuration screen shot below :

ATG License Files and Oracle Software Delivery Cloud

Oracle no longer generates license keys that are specific to your IP address(es). Oracle now provides generic license files that enable you to fully utilize all of the features for which you are licensed. Please find the ATG License files for different ATG versions @ http://www.oracle.com/us/support/licensecodes/atg/index. @ Oracle Software Delivery Cloud , you can find downloads for all licensable Oracle products –> https://edelivery.oracle.com/ Please find below a screen shot for ATG products download :