lucene-solr-user mailing list archives

From Wael Kader <w...@softech-lb.com>
Subject Re: Faceting Word Count
Date Mon, 06 Nov 2017 12:06:28 GMT
Hi,

I am using a custom field type; its definition is below.
I use it because I don't want stemming.


    <fieldType name="text_no_stem2" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="0"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
<!--ORIGINAL                generateNumberParts="1"-->
        <filter class="solr.WordDelimiterFilterFactory"
                protected="protwords.txt"
                generateWordParts="0"
                catenateWords="0"
                catenateNumbers="0"
                catenateAll="0"
                splitOnCaseChange="1"
                preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- ORIGINAL filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/-->
        <!-- Webel: switch off Porter-stemmer algorithm to enforce whole word match -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
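
For context, a field that uses this type would be declared roughly as sketched below. This is only an illustration: the field name `content` and the `indexed`/`stored` flags are assumptions, not taken from the actual schema; only the type name and the multivalued setting come from this thread.

```xml
<!-- Hypothetical field declaration; the name "content" and the
     indexed/stored flags are assumed for illustration only. -->
<field name="content"
       type="text_no_stem2"
       indexed="true"
       stored="true"
       multiValued="true"/>
```

Faceting on a field like this counts every indexed token, so the number of unique terms grows with the amount of text stored in it.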


Regards,
Wael

On Mon, Nov 6, 2017 at 10:29 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:

> Hi Wael,
> Can you provide your field definition and a sample query?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 6 Nov 2017, at 08:30, Wael Kader <wael@softech-lb.com> wrote:
> >
> > Hello,
> >
> > I have an index with around 100 million documents.
> > I have a multivalued field in which I store big chunks of text data. The
> > machine has around 20 GB of RAM and 4 CPUs.
> >
> > I was faceting on that field to build a word cloud, and it took around 1
> > second to return results when the index held 5-10 million documents.
> > Now I have more data and it's taking minutes to get the results (that is,
> > if it returns at all and Solr doesn't crash). What's the best way to make
> > it run? Or maybe it's not scalable on my current schema and design with
> > news articles.
> >
> > I am looking for the best solution for this. Maybe create another index
> > to split the data while inserting it, or maybe if I change some settings
> > in solrconfig.xml or add some RAM, it would perform better.
> >
> > --
> > Regards,
> > Wael
>
>


-- 
Regards,
Wael
