lucene-solr-user mailing list archives

From paul.mo...@dds.net
Subject Missing tokens
Date Wed, 18 Aug 2010 09:34:43 GMT

Hi, I'm having a problem with certain search terms not being found when I
run a query. I'm using SolrJ to index a PDF document, adding its text to the
'contents' field. If I tokenize the 'contents' value on the
SolrInputDocument doc object as below, I get 50k tokens.

StringTokenizer to = new StringTokenizer((String) doc.getFieldValue("contents"));
System.out.println("Tokens: " + to.countTokens());
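
For reference, the same count can be reproduced outside Solr with a standalone sketch like the one below. The class name TokenCount and the sample string are placeholders for illustration only, and the number reflects plain whitespace splitting, which is not necessarily what the analyzer configured for the 'contents' field produces.

```java
import java.util.StringTokenizer;

public class TokenCount {
    // Counts tokens using StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    // Hypothetical helper, not part of SolrJ.
    static int countWhitespaceTokens(String text) {
        if (text == null) {
            return 0;
        }
        return new StringTokenizer(text).countTokens();
    }

    public static void main(String[] args) {
        // Placeholder text standing in for the extracted PDF contents.
        String sample = "the quick  brown\tfox\njumps";
        System.out.println("Tokens: " + countWhitespaceTokens(sample));
    }
}
```

Consecutive delimiters are collapsed, so the double space in the sample does not produce an empty token.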

However, once the doc is indexed and I use Luke to analyse the index, it
has only 3300 tokens in that field. Where did the other 47k go?

I read in some other threads that increasing maxFieldLength in
solrconfig.xml can help, and my setting is below.

  <maxFieldLength>2147483647</maxFieldLength>

Any advice is appreciated,
Paul

