lucene-solr-user mailing list archives

From Ali Nazemian <alinazem...@gmail.com>
Subject Re: Extending solr analysis in index time
Date Mon, 12 Jan 2015 06:01:10 GMT
Dear Jack,
Thank you very much.
Yes, I was thinking of using a function query for sorting, but I have two
problems with that approach: 1) function queries do the processing at query
time, which I don't want; 2) I also want the score stored in a field so that
I can retrieve it and show it to users.
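For comparison, the query-time route Jack suggests can be written as a single function expression, reused for `sort` and, as a function pseudo-field, in `fl` so the value is also returned with each document. The Java helper below is a minimal sketch, assuming the formula from the original message (w * tf_a + (w * tf_b)^2, with `pow` for the square); the class name, the summation over several important terms, and the field names `a`/`b` are illustrative assumptions. It only builds the parameter strings, so the computation itself still happens at query time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: builds a Solr function-query string for a weighted term-frequency score. */
final class DocumentScoreFunction {

    /**
     * For each important term t with weight w, emits
     * add(mul(w,tf(fieldA,'t')),pow(mul(w,tf(fieldB,'t')),2)),
     * then sums the per-term expressions with add(...).
     * Assumes terms are already in their analyzed form and contain no single quotes.
     */
    static String build(Map<String, Double> weights, String fieldA, String fieldB) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            String term = e.getKey();
            double w = e.getValue();
            String tfA = "tf(" + fieldA + ",'" + term + "')";
            String tfB = "tf(" + fieldB + ",'" + term + "')";
            parts.add("add(mul(" + w + "," + tfA + "),pow(mul(" + w + "," + tfB + "),2))");
        }
        if (parts.isEmpty()) {
            return "0"; // no important terms: constant score
        }
        if (parts.size() == 1) {
            return parts.get(0); // single term: no outer add(...) needed
        }
        return "add(" + String.join(",", parts) + ")";
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>(); // keeps term order stable
        weights.put("hello", 2.0);
        String fn = build(weights, "a", "b");
        // Same expression for sorting and for returning the value as a pseudo-field.
        System.out.println("sort=" + fn + " desc");
        System.out.println("fl=id,document_score:" + fn);
    }
}
```

`tf` matches indexed terms, so each term should be passed in its analyzed form (for example lowercased), and the strings must be URL-encoded when sent over HTTP. Terms containing a single quote are not escaped by this sketch.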

Dear Alexandre,
Here is some more explanation of the business need behind the question:
I am going to provide a field for each document; let's call it
"document_score". I am going to fill this field based on information that
can be extracted from the Lucene inverted index. Assume I have a list of
terms, called important terms, and I am going to extract the term frequency
of each of these terms in each document. I want to use these term
frequencies to calculate "document_score".
"document_score" must be stored, since I am going to retrieve this field
for each document. I also want to sort on "document_score" when the user
asks for it.
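The index-time alternative can be sketched the same way: compute the value before the document reaches Solr (on the client, or for example in a custom UpdateRequestProcessor) and send it as an ordinary stored, indexed field. The Java below is a minimal sketch under stated assumptions: the naive tokenizer (lowercase, split on anything that is not a letter or digit) stands in for the fields' real analysis chain, so its counts can differ from what Lucene indexes; the class and method names are hypothetical; and the formula is the one from the original message (w * tf_a + (w * tf_b)^2), summed over the important terms.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical client-side scorer; a stand-in for Solr's real analysis, not a replacement. */
final class DocumentScorer {

    /** Counts term frequencies with a naive analyzer: lowercase, split on non-letters/digits. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty token
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Sums w * tfA + (w * tfB)^2 over all important terms. */
    static double documentScore(String fieldA, String fieldB, Map<String, Double> weights) {
        Map<String, Integer> tfA = termFrequencies(fieldA);
        Map<String, Integer> tfB = termFrequencies(fieldB);
        double score = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            double w = e.getValue();
            int a = tfA.getOrDefault(e.getKey(), 0);
            int b = tfB.getOrDefault(e.getKey(), 0);
            score += w * a + Math.pow(w * b, 2);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("hello", 2.0);
        String a = "hello hello hello world";               // tf("hello") = 3 in field "a"
        String b = "Hello, hello! hello hello hello hello"; // tf("hello") = 6 in field "b"
        System.out.println(documentScore(a, b, weights));   // 2*3 + (2*6)^2 = 150.0
    }
}
```

With the example from the original message (weight 2.0, tf 3 in "a", tf 6 in "b") this gives 2*3 + (2*6)^2 = 150. Declaring "document_score" as a stored, indexed numeric field in the schema would then allow both returning it in `fl` and sorting on it without any per-query computation.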
I hope I have conveyed my point.
Best regards.


On Mon, Jan 12, 2015 at 12:53 AM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> Won't function queries do the job at query time? You can add or multiply
> the tf*idf score by a function of the term frequency of arbitrary terms,
> using the tf, mul, and add functions.
>
> See:
> https://cwiki.apache.org/confluence/display/solr/Function+Queries
>
> -- Jack Krupansky
>
> On Sun, Jan 11, 2015 at 10:55 AM, Ali Nazemian <alinazemian@gmail.com>
> wrote:
>
> > Dear Jack,
> > Hi,
> > I think you misunderstood my need. I don't want to change the default
> > scoring behavior of Lucene (tf-idf); I just want another field to use for
> > sorting in some specific queries (not for the whole search application).
> > I am, however, aware of Lucene payloads.
> > Thank you very much.
> >
> > On Sun, Jan 11, 2015 at 7:15 PM, Jack Krupansky <jack.krupansky@gmail.com>
> > wrote:
> >
> > > You would do that with a custom similarity (scoring) class. That's an
> > > expert feature. In fact a SUPER-expert feature.
> > >
> > > Start by completely familiarizing yourself with how TF*IDF similarity
> > > already works:
> > >
> > > http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
> > >
> > > And to use your custom similarity class in Solr:
> > >
> > > https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements#OtherSchemaElements-Similarity
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Sun, Jan 11, 2015 at 9:04 AM, Ali Nazemian <alinazemian@gmail.com>
> > > wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > I am going to add some analysis to Solr at index time. Here is what I
> > > > have in mind:
> > > > Suppose I have two different fields in the Solr schema, field "a" and
> > > > field "b". I want to use the inverted index so that some terms are
> > > > treated as important ones, and have Lucene calculate a value from these
> > > > terms' frequencies in each document. For example, let the word "hello"
> > > > be an important word with a weight of 2.0. Suppose the term frequency
> > > > of this word is 3 in field "a" and 6 in field "b" for document 1.
> > > > Therefore the score value would be 2*3 + (2*6)^2 = 6 + 144 = 150. I
> > > > want to calculate this score from these fields and put it in the index
> > > > for retrieval. My question is: how can I do this? I first considered
> > > > using the term component to calculate this value from outside and put
> > > > it back into the Solr index, but that does not seem efficient enough.
> > > >
> > > > Thank you very much.
> > > > Best regards.
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>



-- 
A.Nazemian
