lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: facets - id and display value
Date Sat, 21 Aug 2010 21:14:31 GMT
Faceting harvests the terms of fields that are already indexed (so the
fields have to be indexed; they do not also have to be stored) and uses
Java object refs (pointers), without copying the facet values. You know
how log files have
multi-line exception stacks & the like? The multi-line exception
stacks after the real log line tend to be the same. I grabbed all of
the lines after each log line and made facets out of them. Worked
quite well for counting "this exception stack happened 42 times, this
other one 250 times". So huge string fields work as facets.
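
A rough offline sketch of that log experiment, in case it helps. It does not talk to Solr; it only shows how the continuation lines after each log line can be grouped into one big string and counted the way a facet would count them. The timestamp pattern and the sample lines are assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumption: a "real" log line starts with a date such as 2010-08-20.
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def stack_facet_values(lines):
    """Yield one string per log entry that has continuation lines after it."""
    stack = []
    for line in lines:
        if LOG_LINE.match(line):
            # A new log line closes the stack collected so far.
            if stack:
                yield "\n".join(stack)
            stack = []
        else:
            stack.append(line.rstrip())
    if stack:
        yield "\n".join(stack)


# Hypothetical sample data.
sample = [
    "2010-08-20 10:00:01 ERROR request failed",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:12)",
    "2010-08-20 10:00:05 INFO ok",
    "2010-08-20 10:00:09 ERROR request failed",
    "java.lang.NullPointerException",
    "\tat com.example.Foo.bar(Foo.java:12)",
]

# Each distinct multi-line stack plays the role of one facet value.
counts = Counter(stack_facet_values(sample))
for stack_text, count in counts.most_common():
    print(count, stack_text.splitlines()[0])
```

In a real setup each grouped string would go into an indexed string field and Solr would do the counting.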

I don't know if 'facet.prefix' on 50 characters is faster than 'q=' on
200 characters.

Sending a giant query is easy: use a POST instead of a GET.
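
A minimal sketch of the POST approach, using only the Python standard library. The URL, the field name category_facet and the oversized filter value are assumptions for illustration; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Solr with the select handler at this path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_post(params):
    """Put the query parameters in a form-encoded POST body, not the URL."""
    body = urlencode(params, doseq=True).encode("utf-8")
    return Request(
        SOLR_SELECT_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"
        },
        method="POST",
    )


# Hypothetical giant facet value that would be awkward in a GET URL.
giant_value = "x" * 5000
request = build_select_post({
    "q": "*:*",
    "fq": 'category_facet:"' + giant_value + '"',
    "facet": "true",
    "facet.field": "category_facet",
})

# urllib.request.urlopen(request) would actually send it.
print(request.get_method(), len(request.data))
```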

If searching on giant facet strings really is a problem, add a hash
code to each facet string. Then, add a separate matching field in each
document that only stores that hashcode. Now, instead of searching on
the giant facet, you pull the hashcode out of it and search the
separate field for that.
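
One way the hash idea could look, as a sketch under stated assumptions: the field names category_facet and category_hash are hypothetical, MD5 is just one convenient choice of hash, and the hash is appended as a suffix so that facet.sort=index still orders by the start of the string.

```python
import hashlib

SEP = "_"  # same separator as the categoryId_name_imageurl example


def facet_hash(facet_string):
    """Short fixed-length stand-in for the giant facet string."""
    return hashlib.md5(facet_string.encode("utf-8")).hexdigest()


def build_doc(doc_id, facet_string):
    """Document with the facet (hash appended) and a hash-only field."""
    hash_code = facet_hash(facet_string)
    return {
        "id": doc_id,
        "category_facet": facet_string + SEP + hash_code,  # hypothetical field
        "category_hash": hash_code,                        # hypothetical field
    }


def filter_query(facet_value):
    """Pull the hash off the end of a facet value and filter on it."""
    # The hex digest contains no underscore, so the last piece is the hash.
    hash_code = facet_value.rsplit(SEP, 1)[1]
    return "category_hash:" + hash_code


doc = build_doc("1", "42_Books_http://example.com/books.png")
print(filter_query(doc["category_facet"]))
```

The UI would strip the hash before display, and the fq stays 32 characters long however big the facet string gets.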


On Fri, Aug 20, 2010 at 9:56 PM, Jonathan Rochkind <rochkind@jhu.edu> wrote:
> "A common way is to make a facet string of categoryId-2_name_imageurl.
> Then in your UI display the categoryId part of the facet."
>
> I've been thinking about doing something like this for the same purposes. Will having
> an "extra long" facet string like that have any effect on faceting performance? How about
> facet sorting with facet.sort=index? In my case, the first part of the facet string would
> be a 'sortable' value that sorts how I want, not just an id.
>
> I use facet.sort=index, but my display labels don't actually sort the way I want, so
> I'm thinking of making a sort key that does, and storing "sortkey_label" in the actual facet
> value. But I worry this may have an effect on performance if the string gets really long.
> But I'm thinking/hoping it won't -- at least for faceting the length of the string shouldn't
> matter, I think, but I'm not sure about sorting. [Obviously you have to make sure not to
> accidentally store the same 'id' with differently serialized 'metadata', or you'd wind up
> with two facet values where you meant to have one].
>
> Is there any reason I couldn't use some non-printing control char as the separator, instead
> of the ASCII underscore used in that example?
>
> And then the other thing is, once I have these weird long facet strings with embedded
> 'metadata', if I actually want to 'fq' on one, I need to pass that whole weird string in the
> fq, clearly. How do people generally deal with this, using this technique? Just do it, pass
> the whole string? Use some sort of 'prefix' technique (I guess that would be the * wildcard
> in the fq)? Use two different solr fields, one for faceting with embedded metadata, and
> a different one with the same values without embedded metadata for actual 'fq' filtering?
>
> Thanks for any tips,
>
> Jonathan
>



-- 
Lance Norskog
goksron@gmail.com
