lucene-java-user mailing list archives

From varun sharma <mechanism_...@yahoo.co.in>
Subject Lucene index corruption on HDFS
Date Tue, 15 Jul 2014 06:14:50 GMT
I am building my code with Lucene 4.7.1 and Hadoop 2.4.0. Here is what I am trying to do:
Create index
	1. Build the index in a RAMDirectory from data stored on HDFS.
	2. Once built, copy the index onto HDFS.
Search index
	1. Load the index stored on HDFS into a RAMDirectory.
	2. Perform a search on the in-memory index.
The error I am facing is:
Exception in thread "main" java.io.EOFException: read past EOF: RAMInputStream(name=segments_2)
	at org.apache.lucene.store.RAMInputStream.switchCurrentBuffer(RAMInputStream.java:94)
	at org.apache.lucene.store.RAMInputStream.readByte(RAMInputStream.java:67)
	at org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:41)
	at org.apache.lucene.store.DataInput.readInt(DataInput.java:84)
	at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:326)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:843)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:66)
	at hdfs.SearchFiles.main(SearchFiles.java:85)
I did some research and found that this may be due to index corruption.
Below is my code.
Saving the index to HDFS:
// Getting files present in memory into an array.
String fileList[] = rdir.listAll();
// Reading index files from memory and storing them to HDFS.
for (int i = 0; i < fileList.length; i++) {
    IndexInput indxfile = rdir.openInput(fileList[i].trim(), null);
    long len = indxfile.length();
    int len1 = (int) len;
    // Reading data from file into a byte array.
    byte[] bytarr = new byte[len1];
    indxfile.readBytes(bytarr, 0, len1);
    // Creating file in HDFS directory with name same as that of
    // index file
    Path src = new Path(indexPath + fileList[i].trim());
    dfs.createNewFile(src);
    // Writing data from byte array to the file in HDFS
    FSDataOutputStream fs = dfs.create(
        new Path(dfs.getWorkingDirectory() + indexPath + fileList[i].trim()), true);
    fs.write(bytarr);
    fs.flush();
    fs.close();
}
FileSystem.closeAll();
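One detail that may be worth double-checking in the save step: the empty file is created at `indexPath + fileList[i].trim()`, while the bytes are written to `dfs.getWorkingDirectory() + indexPath + fileList[i].trim()`. If those two strings resolve to different HDFS paths, the files read back later could be the empty placeholders. A stdlib-only sketch (hypothetical values, no Hadoop) of how the two concatenations can diverge:

```java
public class HdfsPathCheck {
    // Path built for dfs.createNewFile(...)
    static String createdAt(String indexPath, String name) {
        return indexPath + name;
    }

    // Path built for dfs.create(...), where the bytes actually go
    static String writtenTo(String workingDir, String indexPath, String name) {
        return workingDir + indexPath + name;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        String workingDir = "hdfs://namenode:9000/user/alice";
        String indexPath = "/index/";
        String name = "segments_2";
        System.out.println(createdAt(indexPath, name));             // /index/segments_2
        System.out.println(writtenTo(workingDir, indexPath, name)); // hdfs://namenode:9000/user/alice/index/segments_2
    }
}
```

Since `/index/segments_2` is absolute, HDFS would not resolve it against the working directory, so the two calls can touch different files.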
________________________________

Bringing the index from HDFS into a RAMDirectory and using it:
// Creating a RAMDirectory (memory) object, to be able to create index
// in memory.
RAMDirectory rdir = new RAMDirectory();
// Getting the list of index files present in the directory into an
// array.
FSDataInputStream filereader = null;
for (int i = 0; i < status.length; i++) {
    // Reading data from index files on HDFS directory into filereader
    // object.
    filereader = dfs.open(status[i].getPath());
    int size = filereader.available();
    // Reading data from file into a byte array.
    byte[] bytarr = new byte[size];
    filereader.read(bytarr, 0, size);
    // Creating file in RAM directory with name same as that of
    // index files present in HDFS directory.
    filenm = new String(status[i].getPath().toString());
    String sSplitValue = filenm.substring(57, filenm.length());
    System.out.println(sSplitValue);
    IndexOutput indxout = rdir.createOutput(sSplitValue, IOContext.DEFAULT);
    // Writing data from byte array to the file in RAM directory
    indxout.writeBytes(bytarr, bytarr.length);
    indxout.flush();
    indxout.close();
}
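If I read the snippet above correctly, one plausible cause of the EOF is the read-back loop itself: `InputStream.available()` is not guaranteed to report the full file length (especially on a network-backed stream like FSDataInputStream), and a single `read(bytarr, 0, size)` may return fewer bytes than requested, leaving the tail of a file such as segments_2 zero-filled. A stdlib-only sketch (no Hadoop or Lucene, hypothetical `readFully` helper) of looping until the buffer is actually full:

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {
    // Keeps calling read() until the whole buffer is filled, instead of
    // trusting a single read() to deliver everything in one call.
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new EOFException("stream ended after " + off + " bytes");
            }
            off += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        // Simulate a stream that hands back at most 100 bytes per read(),
        // as a network-backed stream is allowed to do.
        InputStream chunky = new ByteArrayInputStream(data) {
            @Override
            public int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 100));
            }
        };
        byte[] buf = new byte[data.length];
        readFully(chunky, buf);
        System.out.println(Arrays.equals(data, buf)); // prints: true
    }
}
```

The same idea applies to the snippet above: using the known file length (e.g. FileStatus.getLen()) instead of available(), and reading in a loop, should guarantee the bytes copied into the RAMDirectory match what is on HDFS.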