lucene-java-user mailing list archives

From "anjana m" <anjana.mpra...@gmail.com>
Subject Re: Getting Start to LUCENE 2.2.0 - How to Indexing a Directory of Your WebSite.
Date Wed, 02 Jan 2008 12:39:34 GMT
Hope this info helps.
I have just finished this task myself.
Run the indexer, then run the searcher.
Please read the whole mail; these are the steps that I followed
to index and search directories.




*Indexer.java*

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import java.io.File;
import java.io.IOException;
import java.io.FileReader;

public class Indexer {

    /* Creates a new index in indexDir from the .txt files under dataDir. */
    public static void index(File indexDir, File dataDir) throws IOException {
        if (!dataDir.exists() || !dataDir.isDirectory()) {
            throw new IOException(dataDir + " does not exist or is not a directory");
        }
        /* true = create a new index, overwriting any existing one */
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), true);
        try {
            indexDirectory(writer, dataDir);
        } finally {
            writer.close(); /* always release the index, even on failure */
        }
    }

    private static void indexDirectory(IndexWriter writer, File dir) throws IOException {
        File[] files = dir.listFiles();
        if (files == null) {
            /* listFiles() returns null when the directory cannot be read */
            throw new IOException("Cannot list files in " + dir);
        }
        System.out.println("TOTAL NUMBER OF FILES :" + files.length);
        for (int i = 0; i < files.length; i++) {
            File f = files[i];
            if (f.isDirectory()) {
                indexDirectory(writer, f);  // recurse
            } else if (f.getName().endsWith(".txt")) {
                indexFile(writer, f);
            }
        }
    }

    private static void indexFile(IndexWriter writer, File f) throws IOException {
        System.out.println("Indexing " + f.getName()); /* print the indexed file names */
        Document doc = new Document();
        /* Field.Text and Field.Keyword are factory methods from the Lucene 1.x
           API (the 1.3 jar listed in the prerequisites below); Lucene 2.x
           replaces them with Field constructors. */
        doc.add(Field.Text("contents", new FileReader(f)));
        doc.add(Field.Keyword("filename", f.getCanonicalPath()));
        writer.addDocument(doc);
    }

    public static void main(String[] args) throws Exception {
        /* Before running, create two directories: TESTDIR for the index and
           DataDir, which holds the .txt files to be indexed. */
        File indexDir = new File("TESTDIR"); /* directory where we store the Lucene index */
        File dataDir = new File("DataDir");  /* directory that contains the .txt files we want to index */
        index(indexDir, dataDir);
    }
}





*Searcher.java*

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.Directory;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import java.io.File;

public class Searcher {

    public static void main(String[] args) throws Exception {
        File indexDir = new File("TESTDIR"); /* must match the index directory used by Indexer */
        String q = "jayanth";                /* the term to search for */
        if (!indexDir.exists() || !indexDir.isDirectory()) {
            throw new Exception(indexDir + " does not exist or is not a directory.");
        }
        search(indexDir, q);
    }

    public static void search(File indexDir, String q) throws Exception {
        /* false = open the existing index rather than creating a new one */
        Directory fsDir = FSDirectory.getDirectory(indexDir, false);
        IndexSearcher is = new IndexSearcher(fsDir);
        try {
            /* parse the query against the "contents" field populated by Indexer */
            Query query = QueryParser.parse(q, "contents", new StandardAnalyzer());
            Hits hits = is.search(query);
            System.out.println("Found " + hits.length() + " document(s) that matched query '" + q + "':");
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                System.out.println(doc.get("filename"));
            }
        } finally {
            is.close(); /* release the index files */
        }
    }
}
---------------------------------------

*Prerequisites for searching with Lucene 1.3*



   1. Lucene 1.3.jar
   2. Index Directory to Hold the Indexes.
   3. Data Directory that holds the files to be indexed
   4. Mandatory Indexing of Document or Files
   5. Only Indexed Data is Searchable



Lucene 1.3 jar

The Lucene 1.3 jar is a library that packages all the classes and methods
needed to perform indexing and searching.

The jar file can be downloaded from the following site:

http://archive.apache.org/dist/jakarta/lucene/binaries/

The jar file that I used for the sample program to demonstrate Lucene
search is the Lucene 1.3 final jar.

The jar file must be available in the application's jar library, that is,
on its classpath.

The sample program will not work if the jar file is moved after it has been
added to the project. It will also fail to search if conflicting or
duplicate jar versions are on the classpath.
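
One way to check both conditions is to ask the class loader where it finds a Lucene class. The following is only a sketch that uses the JDK alone: the class name ClasspathCheck and its two helper methods are made up for illustration, and org.apache.lucene.index.IndexWriter is simply the class that Indexer.java imports.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

class ClasspathCheck {

    /** Returns true if the named class can be loaded from the classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClassLoader.getSystemClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Lists every classpath location that provides the named class. */
    static List<URL> locationsOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        List<URL> locations = new ArrayList<>();
        try {
            Enumeration<URL> urls = ClassLoader.getSystemClassLoader().getResources(resource);
            while (urls.hasMoreElements()) {
                locations.add(urls.nextElement());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return locations;
    }

    public static void main(String[] args) {
        String lucene = "org.apache.lucene.index.IndexWriter";
        System.out.println("Lucene on classpath: " + isOnClasspath(lucene));
        List<URL> locations = locationsOf(lucene);
        for (URL location : locations) {
            System.out.println("  found in " + location);
        }
        if (locations.size() > 1) {
            System.out.println("Warning: more than one jar provides Lucene classes.");
        }
    }
}
```

If the first line prints false, the jar is not on the classpath. If more than one location is listed, there are duplicate or conflicting jars to remove.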

The following steps add the Lucene 1.3 jar to a project in the NetBeans
IDE:

   1. Right-click your project and choose Properties
   2. In the Properties dialog, choose Add JAR/Folder
   3. Browse for the Lucene 1.3 jar
   4. Click OK

The Lucene 1.3 final jar will be added to your sample application.

To cross-check, click Libraries in the project explorer window and confirm
that the Lucene 1.3 final jar is visible in the folder structure.



Index Directory to Hold the Indexes

This directory holds the index of the data that is supplied. I created
this directory manually. It must be located inside the sample application
folder.

I used NetBeans for my sample program and created the directory under the
following path: C:\Documents and Settings\MyWorkSpace\TryLucene

TryLucene is the name of the sample project, which contains the Lucene 1.3
final jar.

Create a new folder and name it "IndexDir".



Data Directory that holds the files to be indexed

This directory holds the data files. The sample program only indexes, and
therefore only searches, .txt files.

Create a new folder manually in the following path and name the folder as
"DataDir"

C:\Documents and Settings\MyWorkSpace\TryLucene

Create some sample .txt files in the "DataDir" folder.
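
The folders and sample files can also be created from code instead of by hand. This is a minimal sketch that uses only the JDK. The class name SampleDataSetup, the file names and the file contents are made up for illustration, and the folders are created relative to the working directory rather than under the NetBeans project path above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SampleDataSetup {

    /** Creates baseDir/IndexDir and baseDir/DataDir, plus two small .txt files in DataDir. */
    static Path createSampleLayout(Path baseDir) throws IOException {
        Path indexDir = baseDir.resolve("IndexDir");
        Path dataDir = baseDir.resolve("DataDir");
        // createDirectories does nothing if the directory already exists
        Files.createDirectories(indexDir);
        Files.createDirectories(dataDir);
        writeText(dataDir.resolve("sample1.txt"), "Lucene indexes plain text files.");
        writeText(dataDir.resolve("sample2.txt"), "Only indexed data is searchable.");
        return dataDir;
    }

    /** Writes the given text to the file as UTF-8, replacing any existing content. */
    private static void writeText(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = createSampleLayout(Paths.get("."));
        System.out.println("Sample files written to " + dataDir.toAbsolutePath());
    }
}
```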



Indexing

The sample program Indexer.java indexes the .txt files.

Build the application in the NetBeans IDE.

Compile Indexer.java in NetBeans.

Run Indexer.java in NetBeans.

The output of Indexer.java displays:

   - the total number of entries in DataDir (and in each subdirectory it
   visits)
   - the names of the indexed files in DataDir

Indexer.java indexes all the .txt files in the DataDir folder.



Methods in Indexer.java

   - index(File indexDir, File dataDir) --- checks that dataDir is an
   existing directory (throwing an IOException otherwise), then creates a
   new IndexWriter. The IndexWriter constructor takes the index directory,
   a StandardAnalyzer, and true (create a new index).
   - indexDirectory(IndexWriter writer, File dir) --- calls itself
   recursively to index all the .txt files in the directory and its
   subdirectories.
   - indexFile(IndexWriter writer, File f) --- creates a Document object,
   initialises a FileReader for the file, and adds the document to the
   index.
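
The recursion in indexDirectory can be tried without Lucene. The sketch below uses only the JDK and collects the matching files instead of indexing them; the class name TxtFileCollector is hypothetical. It returns an empty list when File.listFiles() returns null, which happens if the path is not a directory or cannot be read.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class TxtFileCollector {

    /** Recursively collects every file under dir whose name ends with ".txt". */
    static List<File> collectTxtFiles(File dir) {
        List<File> result = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            // Not a directory, or it could not be read: nothing to collect.
            return result;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                result.addAll(collectTxtFiles(entry)); // recurse into subdirectories
            } else if (entry.getName().endsWith(".txt")) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Defaults to the "DataDir" folder used in this walkthrough.
        File dataDir = new File(args.length > 0 ? args[0] : "DataDir");
        for (File f : collectTxtFiles(dataDir)) {
            System.out.println(f.getPath());
        }
    }
}
```

Running it against DataDir before running the indexer shows which files the indexer should pick up.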



Searching



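Specify the directory that contains the index; in this walkthrough it is
"IndexDir" (the sample code above uses "TESTDIR").

Specify the string to search for.

Call search(indexDir, q), where q is the query string.

search(indexDir, q) opens the index, parses q against the "contents"
field, and prints the "filename" of every matching document.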
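
Why only indexed data is searchable can be illustrated with a toy inverted index in plain Java. This is not how Lucene is implemented and it uses no Lucene classes; the class ToyInvertedIndex and its methods are made up for this example. A search is only a lookup in the term table built at indexing time, so text that was never added cannot be found.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyInvertedIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits the text into lower-cased words and records docName under each word. */
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Returns the names of the documents containing the term (case-insensitive). */
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(hits);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("a.txt", "Hello Lucene");
        index.add("b.txt", "hello world");
        System.out.println(index.search("hello"));   // both documents contain "hello"
        System.out.println(index.search("missing")); // never indexed, so no hits
    }
}
```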


http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes

http://www.dbsight.net/?q=node/47

http://www.roseindia.net/software-tutorials/detail/6533

http://static.compassframework.org/docs/latest/jdbcdirectory.html


http://blog.scalingweb.com/2007/11/03/full-text-search-for-database-using-lucene-search-engine/


http://uborik.org/files/portfolio/info340_history_places_project_full.pdf
---------------------------------------------------------------------------------------------------------------------

On Jan 2, 2008 5:37 PM, Jesiel Trevisan < jesieltrevisan@gmail.com> wrote:

> Hi everyone,
>
> I'm trying to use Lucene 2.2.0 in my webpage. I would like to create a
> simple web search field/function on my site.
>
> I installed the Lucene example, but it only implements the search
> function, not the indexing of files.
>
> I would like to know how I can create a filter to create the Lucene index
> files.
>
> I need to index, for example, a directory of my website.
>
> I created an example to try to index a directory with a simple TXT file
> inside it:
>
> ...
>       String indexDir =
> "c:\\eclipse\\workspace\\JavaSource\\br\\com\\testeLucene"
> (* there is a test.txt file with some words inside it *)
>
>       String dataDir =
> "c:\\eclipse\\workspace\\JavaSource\\br\\com\\lucene\\indexedFiles"
> ...
>
>    public static int index(File indexDir, File dataDir) throws IOException
> {
>
>        if (!dataDir.exists() || !dataDir.isDirectory()) {
>            throw new IOException(dataDir + "It is NOT a Directory");
>        }
>
>        IndexWriter writer = new IndexWriter(indexDir, new
> StandardAnalyzer(), true);
>        writer.setUseCompoundFile (false);
>
>        indexDirectory(writer, dataDir);
>
>        int numIndexed = writer.docCount();
>        writer.optimize();
>        writer.close();
>
>        return numIndexed;
>    }
>
> All right, the Lucene index files were created, but, following the Lucene
> Search example, I could find the words that were inside the test.txt file.
>
> Does it work? Is it right?
>
> If someone can help me get started with Lucene 2.2.0 I will really
> appreciate it.
>
> Thanks so much.
>
