lucene-solr-user mailing list archives

From Nitin Solanki <nitinml...@gmail.com>
Subject Re: Issue : Replacing ID with another will degrade performance in Solr?
Date Tue, 20 Jan 2015 14:18:56 GMT
Thanks, and sorry about the mix-up on Stack Overflow. You are saying that I
should use the "string" type. But I have used the solr.ShingleFilterFactory
filter to break a string into n-grams.
I want to build query correction like Google's "Did you mean".

i) I am storing n-grams in the gram field, and it is the only field I have
in Solr. The n-grams (1- to 5-grams) are generated from Wikipedia dump data.
ii) I use the suggester component to get suggestions for the searched
query/words. The suggester builds suggestions for a word from the indexed
documents, and the suggested words are sorted by frequency, as I want.
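A toy illustration of steps i) and ii), assuming whitespace tokenization and a tiny in-memory corpus instead of the Wikipedia dump indexed in Solr: count every 1- to 5-gram, then return the stored n-grams that start with a given prefix, most frequent first. The helpers `ngrams`, `count_ngrams` and `suggest` are hypothetical; they only mimic the frequency ordering described above and are not Solr's suggester API.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(lines, min_n=1, max_n=5):
    """Count every min_n- to max_n-gram in the given lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()  # simplification: whitespace tokenizing
        for n in range(min_n, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts


def suggest(counts, prefix, limit=5):
    """Return stored n-grams starting with prefix, most frequent first."""
    matches = [(g, c) for g, c in counts.items() if g.startswith(prefix)]
    # Sort by descending frequency, breaking ties alphabetically.
    matches.sort(key=lambda gc: (-gc[1], gc[0]))
    return [g for g, _ in matches[:limit]]


# Toy corpus standing in for the Wikipedia dump.
corpus = [
    "salman khan is an actor",
    "the age of salman khan",
    "salman rushdie is a writer",
]
counts = count_ngrams(corpus)
print(suggest(counts, "salman"))
```

A real suggester works from its own dictionary and lookup implementation, so this only shows the idea of ranking candidates by how often they occur in the source text.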

Right now, I have 600 MB of indexed data.
Example: when I apply my algorithm to the input query "what is ago of
salman khn", it corrects the query to "what is age of salman khan", but the
processing takes 10 seconds. This is because I call the Solr API many times
to get suggestions for each word (building the input query from unigrams up
to 5-grams to check). A single query needs approximately 1500 calls to Solr.
How can I reduce this, or make Solr return suggestions faster? The average
QTime for a single hit on Solr is 22 ms.
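A rough sketch of the n-gram enumeration described above, assuming each distinct n-gram of the query needs one request: list the contiguous 1- to 5-grams once, without repeats, and route every request through a cache so a repeated n-gram never reaches Solr twice. `lookup` is a hypothetical placeholder, not a Solr client call. For the six-word example query there are only 6 + 5 + 4 + 3 + 2 = 20 such n-grams, so if the algorithm makes around 1500 calls, most of them presumably come from other steps (for example, checking candidate corrections), which is where avoiding repeated lookups would matter.

```python
from functools import lru_cache


def query_ngrams(query, max_n=5):
    """All contiguous 1- to max_n-grams of the query, in order, without repeats."""
    tokens = query.split()
    seen = dict.fromkeys(
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    )
    return list(seen)


calls = 0  # how many lookups actually reach the (hypothetical) backend


@lru_cache(maxsize=None)
def lookup(ngram):
    """Hypothetical stand-in for one Solr suggester request."""
    global calls
    calls += 1
    return ngram  # a real implementation would return Solr's suggestions


query = "what is ago of salman khn"
grams = query_ngrams(query)
print(len(grams))
for g in grams + grams:  # asking twice costs nothing extra thanks to the cache
    lookup(g)
print(calls)
```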

I have attached schema.xml and solrconfig.xml. Please check them and give
me your suggestions.
I look forward to your reply.




On Tue, Jan 20, 2015 at 5:25 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> I already replied to you on Stack Overflow, but your response there and the
> schema.xml definition here contradict each other.
>
> You are using a tokenized textSpell field as the unique key. As I
> mentioned on Stack Overflow, that is a bad idea. Yes, it will hurt
> performance and also lead to duplicate documents. Switch to a "string" or
> int/long field and you should be fine, regardless of what it is named.
>
> On Tue, Jan 20, 2015 at 8:28 AM, Nitin Solanki <nitinmlvya@gmail.com>
> wrote:
>
> > Hi,
> > I am working on Solr 4.10.2 and have run into a *performance issue*.
> > I have indexed 600 MB of data on 4 shards with a single replica each.
> > I have defined 2 fields (ngram and frequency). I removed the ID field
> > and replaced it with the ngram field. As a result, search performance
> > has dropped and queries take *QTime = 134 ms*, which is too slow for my
> > task.
> >
> > *schema.xml (sample part)*:
> > *ngram field*:
> > <field name="ngram" type="textSpell" indexed="true" stored="true"
> >        required="true" multiValued="false"/>
> >
> > <fieldType name="textSpell" class="solr.TextField"
> >            positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> >             minShingleSize="2" outputUnigrams="true"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> >             minShingleSize="2" outputUnigrams="true"/>
> >   </analyzer>
> > </fieldType>
> >
> > I have posted the same problem on Stack Overflow
> > <http://stackoverflow.com/questions/27983291/replacing-id-with-another-will-degrade-performance-in-solr/27984428?noredirect=1#comment44431492_27984428>
> > but have not been able to get a working solution. Please help me.
> >
> > Thanks and Regards,
> >  Nitin Solanki.
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
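To see why a tokenized unique key is a problem, here is a rough Python emulation of the textSpell analyzer quoted above (minShingleSize="2", maxShingleSize="3", outputUnigrams="true"). It is a simplified sketch, not Lucene's implementation: whitespace splitting stands in for StandardTokenizerFactory, and token positions and offsets are ignored. A single ngram value such as "age of salman khan" becomes nine indexed terms, none of which is the whole value, whereas a single-valued "string" field indexes exactly one term per document.

```python
def shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Approximate the terms ShingleFilterFactory emits for one field value.

    Simplification: whitespace splitting stands in for StandardTokenizerFactory.
    """
    tokens = text.split()
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])  # the single token at this position
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                # Shingle of `size` tokens starting at position i.
                out.append(" ".join(tokens[i:i + size]))
    return out


terms = shingles("age of salman khan")
print(terms)
```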
