lucene-solr-user mailing list archives

From Mark Wilson <...@sanger.ac.uk>
Subject Re: Solr 3.6.1 Query large field
Date Fri, 01 Mar 2013 14:35:23 GMT
Hi Otis

Thanks for the info. I tried two different ways, and both seem to work okay.

First, I added
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100000"/>
to the <indexConfig> section in solrconfig.xml.

Second, I tried adding the same
<filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100000"/>
to the <fieldType><analyzer type="index"> section in schema.xml.

Both ways work ok.
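
For reference, the schema.xml variant could look roughly like the sketch below. The field type name "text_large" and the tokenizer and lowercase filter are assumptions chosen for illustration, not taken from the actual schema; only the LimitTokenCountFilterFactory line and the "content" field name come from this thread.

```xml
<!-- schema.xml: "text_large" is a hypothetical field type name used for illustration -->
<fieldType name="text_large" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stop indexing after this many tokens; raise the value for longer pages -->
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="100000"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Assumes the crawled page text is stored in a field named "content" -->
<field name="content" type="text_large" indexed="true" stored="true"/>
```

Index-time analysis changes only affect documents indexed afterwards, so the page would need to be re-posted to Solr before the later tokens become searchable.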

Cheers Mark

On 28/02/2013 08:05, "Otis Gospodnetic" <otis.gospodnetic@gmail.com> wrote:

> Mark,
> 
> Look at
> http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/collection1/conf/solrconfig.xml:
> 
>   <indexConfig>
>     <!-- maxFieldLength was removed in 4.0. To get similar behavior, include a
>          LimitTokenCountFilterFactory in your fieldType definition. E.g.
>      <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="10000"/>
>     -->
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> On Wed, Feb 27, 2013 at 11:08 AM, Mark Wilson <mw8@sanger.ac.uk> wrote:
> 
>> Hi
>> 
>> I am using Nutch to crawl a site, and post it in Solr 3.6.1. The page is
>> very large.
>> 
>> When I query the index using the Solr Admin query page, it only finds a
>> term if the term appears near the top of the page, probably within the
>> first 30% or so.
>> 
>> The page is about 79 KB and consists of 19,067 words.
>> 
>> Is there a setting somewhere that sets the maxFieldSize? Or maxTokenSize?
>> 
>> I set the content field to be displayed on the result page, and it displays
>> all the data correctly, including the tokens that I get no results for when
>> I search.
>> 
>> I can't split the page up, as it is auto-generated from a database.
>> 
>> Any help gratefully received.
>> 
>> Thanks Mark
>> 
>> 
>> 
>> --
>>  The Wellcome Trust Sanger Institute is operated by Genome Research
>>  Limited, a charity registered in England with number 1021457 and a
>>  company registered in England with number 2742969, whose registered
>>  office is 215 Euston Road, London, NW1 2BE.
>> 



