lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Number of clustering labels to show
Date Wed, 03 Jun 2015 06:36:08 GMT
Thank you so much for your explanation.

On 2 June 2015 at 17:31, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:

> The aim there is to make clustering lighter and more related to the query.
> The summary produced is the fragment of the document content that
> surrounds the query terms.
> This is arguably a way to improve the quality of the clusters, but it
> certainly makes the clustering operation lighter, as the content used to
> produce the clusters is much smaller than the full content.
>
> We can of course discuss whether the window of text surrounding the query
> matches really helps to cluster the documents more precisely.
> That is not an easy research topic, and the answer depends strongly on the
> use case.
> For this reason, a user should decide whether to go with the lighter
> summary approach or the more comprehensive full-content approach.
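To make the summary idea concrete, here is a naive Python sketch of cutting a fixed-size window of text around the first query-term match. It is an illustration under stated assumptions, not Solr's highlighter (which has its own fragmenter and can return several fragments); the function name naive_summary and its centring rule are hypothetical. It only shows why clustering on summaries is lighter: the input shrinks from the full content to roughly frag_size characters per document.

```python
import re
from typing import Sequence


def naive_summary(content: str, query_terms: Sequence[str], frag_size: int = 100) -> str:
    """Cut a window of at most frag_size characters around the first query-term match.

    Illustration only: Solr's highlighter uses its own fragmenter and may
    return several fragments. Falls back to the start of the text when no
    term matches. query_terms must be non-empty.
    """
    pattern = re.compile("|".join(re.escape(t) for t in query_terms), re.IGNORECASE)
    match = pattern.search(content)
    if match is None:
        return content[:frag_size]
    centre = (match.start() + match.end()) // 2
    start = max(0, centre - frag_size // 2)
    end = min(len(content), start + frag_size)
    start = max(0, end - frag_size)  # slide back if the window hit the end of the text
    return content[start:end]


doc = "lorem " * 100 + "solr clustering with carrot2 " + "ipsum " * 100
summary = naive_summary(doc, ["clustering"], frag_size=40)
# The clustering engine would see 40 characters instead of the whole document.
print(len(doc), len(summary), repr(summary))
```

Whether such a window helps or hurts cluster quality is exactly the open question raised above; the sketch only demonstrates the size reduction.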
>
> Cheers
>
> 2015-06-02 3:21 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
>
> > Thank you so much, Alessandro.
> >
> > But I do not see any difference in the quality of the clustering results
> > when I change hl.fragsize, even though I've set carrot.produceSummary to
> > true.
> >
> >
> > Regards,
> > Edwin
> >
> >
> > On 1 June 2015 at 17:31, Alessandro Benedetti <benedetti.alex85@gmail.com> wrote:
> >
> > > Just to clarify the initial mail: carrot.fragSize has nothing to do with
> > > the number of clusters produced.
> > >
> > > When you choose to work with the field summary (i.e. you work only on
> > > snippets of the original content, produced by highlighting the query
> > > terms in the content), carrot.fragSize specifies the size of these
> > > fragments.
> > >
> > > From the Carrot clustering documentation (Solr ClusteringComponent wiki):
> > >
> > > carrot.produceSummary
> > >
> > > When true, the carrot.snippet
> > > <https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet> field
> > > (if no snippet field, then the carrot.title
> > > <https://wiki.apache.org/solr/ClusteringComponent#carrot.title> field)
> > > will be highlighted and the highlighted text will be used for clustering.
> > > Highlighting is recommended when the snippet field contains a lot of
> > > content. Highlighting can also increase the quality of clustering because
> > > the clustered content will get an additional query-specific context.
> > >
> > > carrot.fragSize
> > >
> > > The frag size to use for highlighting. Meaningful only when
> > > carrot.produceSummary
> > > <https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary>
> > > is true. If not specified, the default highlighting fragsize
> > > (hl.fragsize) will be used. If that isn't specified, then 100.
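The fallback rule quoted above can be written as a small Python sketch. The function name effective_frag_size and the dictionary of request parameters are hypothetical, and it models only the documented precedence (carrot.fragSize, then hl.fragsize, then 100), not Solr's actual code; treating a missing carrot.produceSummary as false is also an assumption. Nothing in it touches the number of clusters, which is the point made at the top of this reply.

```python
from typing import Mapping, Optional

# Final fallback quoted in the documentation above.
DEFAULT_FRAGSIZE = 100


def effective_frag_size(params: Mapping[str, str]) -> Optional[int]:
    """Resolve the clustering fragment size from request parameters.

    Models only the documented precedence; it is not Solr's implementation.
    Returns None when carrot.produceSummary is not "true", because
    carrot.fragSize is meaningful only in that case.
    """
    # Assumption: a missing carrot.produceSummary is treated as false.
    if params.get("carrot.produceSummary", "false").lower() != "true":
        return None
    # carrot.fragSize wins, then the general highlighting hl.fragsize.
    for key in ("carrot.fragSize", "hl.fragsize"):
        if key in params:
            return int(params[key])
    return DEFAULT_FRAGSIZE


# Whatever value is resolved here, it only changes the snippet length,
# never the number of clusters.
print(effective_frag_size({"carrot.produceSummary": "true", "hl.fragsize": "200"}))
```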
> > >
> > >
> > > Cheers
> > >
> > > 2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>:
> > >
> > > > Thank you, Stanislaw, for the links. I will read them to better
> > > > understand how the algorithm works.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > > On 29 May 2015 at 17:22, Stanislaw Osinski <stanislaw.osinski@carrotsearch.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The number of clusters primarily depends on the parameters of the
> > > > > specific clustering algorithm. If you're using the default Lingo
> > > > > algorithm, the number of clusters is governed by the
> > > > > LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a
> > > > > look at the documentation
> > > > > (https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings)
> > > > > for some more details (the "Tweaking at Query-Time" section shows how
> > > > > to pass the specific parameters at request time). A complete overview
> > > > > of the Lingo clustering algorithm parameters is here:
> > > > > http://doc.carrot2.org/#section.component.lingo.
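As a sketch of the query-time tweak described above, the Python snippet below builds a request URL that passes LingoClusteringAlgorithm.desiredClusterCountBase alongside the clustering switches (clustering, clustering.results). The host, core name (collection1) and /clustering handler path are assumptions about a typical setup, and clustering_url is a hypothetical helper. According to the Carrot2 documentation, desiredClusterCountBase is a base factor: larger values yield more clusters, but the result is not an exact count.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint: adjust host, core name and handler path to your setup.
# The request handler must have the clustering component configured.
BASE_URL = "http://localhost:8983/solr/collection1/clustering"


def clustering_url(query: str, desired_cluster_count_base: int, rows: int = 100) -> str:
    """Build a clustering request that tweaks Lingo at query time (hypothetical helper)."""
    params = {
        "q": query,
        "rows": rows,  # documents in the result list, which is what gets clustered
        "clustering": "true",
        "clustering.results": "true",
        # Lingo attribute passed per request; it is a base factor, not an exact count.
        "LingoClusteringAlgorithm.desiredClusterCountBase": desired_cluster_count_base,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    url = clustering_url("solr clustering", 40)
    print(url)
    # Decode the query string again to check what would be sent.
    print(parse_qs(urlsplit(url).query))
```

Raising the value should generally yield more cluster labels, though the exact number still depends on the documents being clustered.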
> > > > >
> > > > > Stanislaw
> > > > >
> > > > > --
> > > > > Stanislaw Osinski, stanislaw.osinski@carrotsearch.com
> > > > > http://carrotsearch.com
> > > > >
> > > > > On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm trying to increase the number of cluster results shown during
> > > > > > the search. I tried setting carrot.fragSize=20, but only 15 cluster
> > > > > > labels are shown. Even when I set carrot.fragSize=5, 15 labels are
> > > > > > still shown.
> > > > > >
> > > > > > Is this the correct way to do this? I understand that setting it to
> > > > > > 20 does not necessarily mean 20 labels will be shown, as the setting
> > > > > > is for the maximum number. But when I set it to 5, shouldn't it
> > > > > > reduce the number of labels to 5?
> > > > > >
> > > > > > I'm using Solr 5.1.
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > > Edwin
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > --------------------------
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
