lucene-java-user mailing list archives

From Liaqat Ali <>
Subject Indexing Non-English text
Date Tue, 04 Dec 2007 10:53:51 GMT
I am facing a problem while indexing a small .txt file with Lucene. The 
file I want to index is in Urdu (a variant of Arabic and Persian). But 
the index I get is in Unicode form, not in the real form (the original 
Urdu text). The same program works well for a file in English. This is 
the code I use for indexing:

        FileReader file = new FileReader("urdoc.txt");
        BufferedReader buff = new BufferedReader(file);
        String line = buff.readLine();
        boolean eof = false;
        String indexDir = "D:\\index";
        Analyzer analyzer = new StandardAnalyzer();
        boolean createFlag = true;
        IndexWriter writer =
                new IndexWriter(indexDir, analyzer, createFlag);
        Document document = new Document();
        document.add(new Field("fieldname",line, Field.Store.YES,

Kindly guide me on what I should do. Do I have to change this code, or 
is there something else you would suggest?
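
One possible cause, offered as an assumption rather than a confirmed diagnosis: FileReader decodes bytes with the platform default charset, so a file saved as UTF-8 can be misread before Lucene ever sees the text. The sketch below uses only the JDK (Java 7 or later) and reads the first line with an explicit UTF-8 decoder. The class name Utf8LineReader and the method readFirstLine are hypothetical, and it assumes urdoc.txt is actually stored as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or of the original program.
public class Utf8LineReader {

    // Reads the first line of the file, decoding its bytes as UTF-8
    // instead of relying on the platform default charset.
    public static String readFirstLine(Path path) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path),
                                      StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is saved as UTF-8; defaults to the name used above.
        Path path = Paths.get(args.length > 0 ? args[0] : "urdoc.txt");
        String line = readFirstLine(path);
        System.out.println(line);
    }
}
```

The string returned by readFirstLine could then be passed to the Field constructor in place of the line read through FileReader. If the file is stored in a different encoding, the charset argument would need to match it.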

