lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Filtering on results with more than N words.
Date Thu, 06 Jun 2013 14:07:11 GMT
Yeah, but part of the problem is that an input string is not converted to 
"words" until analysis, which doesn't happen until after Solr creates the 
Lucene Document and hands it off to Lucene. In other words (Ha! Ha!), there 
are no words on the Solr side of indexing. That said, you can always 
fake it by writing a JavaScript script for StatelessScriptUpdateProcessorFactory 
that simulates basic tokenization: convert punctuation to white space, 
trim and collapse excess white space, then split and count the 
results. Or, we could add a new update processor that did exactly that, 
CountWordsUpdateProcessorFactory, much like 
FieldLengthUpdateProcessorFactory. Maybe it could be an option on FLUPF: 
count="words/chars".
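
A rough sketch of what such a script might look like, not a tested 
implementation. The field names "title" and "title_len" are borrowed from 
Walter's suggestion below, countWords is a made-up helper, and treating 
everything outside ASCII letters and digits as punctuation is a simplifying 
assumption that will not match what the field's real analyzer produces 
(and would need changing for non-Latin text). The empty stubs are there 
on the assumption that the script processor expects the other update 
hooks to be defined as well.

```javascript
// Hypothetical helper: approximate a word count without running analysis.
function countWords(text) {
  if (text === null || text === undefined) {
    return 0;
  }
  var cleaned = String(text)
    .replace(/[^A-Za-z0-9\s]+/g, " ")  // punctuation -> white space (ASCII-only assumption)
    .replace(/\s+/g, " ")              // collapse runs of white space
    .replace(/^ | $/g, "");            // trim the ends
  return cleaned === "" ? 0 : cleaned.split(" ").length;
}

// Called by the script update processor for each added document.
function processAdd(cmd) {
  var doc = cmd.solrDoc;                    // SolrInputDocument
  var title = doc.getFieldValue("title");   // assumed single-valued source field
  doc.setField("title_len", countWords(title));
}

// Remaining hooks, left empty.
function processDelete(cmd) { }
function processMergeIndexes(cmd) { }
function processCommit(cmd) { }
function processRollback(cmd) { }
function finish() { }
```

With "title_len" indexed as an integer field, a filter query such as 
fq=title_len:[11 TO *] would then keep only documents whose title has 
more than 10 words by this approximate count.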

-- Jack Krupansky

-----Original Message----- 
From: Walter Underwood
Sent: Thursday, June 06, 2013 9:54 AM
To: solr-user@lucene.apache.org
Subject: Re: Filtering on results with more than N words.

Someone else asked about this recently. The best approach is to count the 
words at index time and add a field with the count, so "title" and 
"title_len" or something like that.

wunder

On Jun 6, 2013, at 4:20 AM, Jack Krupansky wrote:

> I don't recall seeing any such filter. Sounds like a good idea though. 
> Although, maybe it is another good idea that really isn't too necessary 
> for solving many real world problems.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Dotan Cohen
> Sent: Thursday, June 06, 2013 3:45 AM
> To: solr-user@lucene.apache.org
> Subject: Filtering on results with more than N words.
>
> Is there any way to restrict the search results to only those
> documents with more than N words / tokens in the searched field? I
> thought that this would be an easy one to Google for, but I cannot
> figure it out or find any references. There are many references to
> word size in characters, but not to field size in words.
>
> Thank you.
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
