lucene-solr-user mailing list archives

From ecos
Subject IndexFormatTooNewException - MapReduceIndexerTool for PDF files
Date Tue, 02 May 2017 04:48:23 GMT
Hi, I'm getting the following error when trying to index PDF documents using
the MapReduceIndexerTool in Cloudera:


The cause of the error is:
org.apache.lucene.index.IndexFormatTooNewException: Format version is not
supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
be between 0 and 3).

From what I have read, this exception is thrown when Lucene detects an
index written in a format newer than the running Lucene version supports.
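
The numbers in the message ("4", "between 0 and 3") read as a simple range
check on the format version stored in the segments file. A minimal sketch of
that check follows, only to illustrate how the message is formed. It is not
Lucene code, and the function name check_format_version and its output
strings are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper illustrating the range check implied by the error
# message. Not Lucene code; the name and the messages are made up.
check_format_version() {
    actual=$1   # format version found in the index file
    min=$2      # oldest format version the reader accepts
    max=$3      # newest format version the reader accepts
    if [ "$actual" -gt "$max" ]; then
        echo "too new: $actual (needs to be between $min and $max)"
        return 1
    elif [ "$actual" -lt "$min" ]; then
        echo "too old: $actual (needs to be between $min and $max)"
        return 1
    fi
    echo "supported: $actual"
}

# Values taken from the error message: found 4, accepted range 0..3.
check_format_version 4 0 3 || true
```

With the values from the error, the sketch takes the "too new" branch, which
matches the situation where the index file was written by a newer Lucene
than the one reading it.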

My configuration is:
SOLR: 4.10.3
Cloudera: 5.8.0
Hadoop: 2.6.0

In order to index, I'm following the tutorial:

Using the following hadoop command:
hadoop jar /usr/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D mapreduce.job.maps=1 \
-D mapreduce.job.reduces=1 \
-D dfs.replication=1 \
--morphline-file /root/$COLLECTION/conf/pdf_morphlines.conf \
--output-dir hdfs://localhost:8020/user/$USER/outdir --verbose \
--solr-home-dir $HOME/$COLLECTION --shards 1 \

The morphlines file:

And the schema file:
schema.xml

Thank you.
