lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-7005) facet.heatmap for spatial heatmap faceting on RPT
Date Tue, 03 Feb 2015 17:39:34 GMT

    [ https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303651#comment-14303651
] 

Hoss Man commented on SOLR-7005:
--------------------------------

bq. I'm confused about FacetComponent.distributedProcess() line ~215 (removal of faceting
types when distribFieldFacetRefinements != null). Chris Hostetter Which faceting types should
be removed here; why is it just facet.field and facet.query; maybe the others should too?

I'm confused too. (admittedly i haven't looked at it very hard today)

I suspect this code is just really old, from back when only facet.field & facet.query
existed.  I suspect that at that point in time, the idea was:

1) remove *every* facet.field param, because we're about to loop over the ones we know still
need refinement and add them back
2) remove *any* facet.query, because they never need refinement

You'll note that a few lines down (~233) there is a similar block of code relating to facet.pivot
& facet.pivot.mincount -- apparently for the same reasons as #1 above.

bq. ...Which faceting types should be removed here; why is it just facet.field and facet.query;
maybe the others should too?

i suspect it's safe/efficient to remove all the facet params up front, and let the various
types of faceting re-add the params they need if/when they need refinement ... but i'm not certain
about that.
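
A minimal sketch of that "remove everything up front, re-add only what needs refinement" idea,
assuming request params can be modelled as a plain map. This is not the actual FacetComponent
code: the class name, the method names, and the list of facet types are hypothetical and only
illustrate the approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; it does not use or mirror any real Solr class. */
final class RefinementParamsSketch {

    // Assumption: these are the facet types whose params get stripped up front.
    static final List<String> FACET_TYPES = Arrays.asList(
            "facet.field", "facet.query", "facet.range", "facet.pivot", "facet.heatmap");

    /** True for "facet.<type>" itself and for sub-params such as "facet.heatmap.gridLevel". */
    static boolean isFacetTypeParam(String name) {
        for (String type : FACET_TYPES) {
            if (name.equals(type) || name.startsWith(type + ".")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Copies the original params without any facet-type params, then adds back
     * only the params that the individual facet types say they still need.
     */
    static Map<String, List<String>> buildRefinementParams(
            Map<String, List<String>> original,
            Map<String, List<String>> neededForRefinement) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : original.entrySet()) {
            if (!isFacetTypeParam(e.getKey())) {
                out.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        for (Map.Entry<String, List<String>> e : neededForRefinement.entrySet()) {
            out.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
        }
        return out;
    }
}
```

With this shape, a facet type that needs no refinement contributes nothing to the second map,
so its params never reach the shard refinement request.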

the thing to do is set up a simple cluster where the field terms are vastly different between two
shards (to force refinement) and then look at what distributed refinement requests are sent
to each shard when combining multiple types of faceting -- make sure that a facet.field +
facet.range + facet.query + facet.pivot + facet.heatmap request that requires refinement on the
facet.field doesn't unnecessarily re-request the same facet.range + facet.query + facet.pivot +
facet.heatmap if they don't also need refinement.
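
One way to automate that inspection, as a rough sketch: take the query string of a logged shard
refinement request and list the facet types it carries that were not expected to need
refinement. The class and method names are hypothetical, and the sketch assumes the param
names appear un-encoded in the logged query string.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for eyeballing shard refinement requests; not part of Solr. */
final class RefinementRequestCheck {

    // Assumption: the facet types being combined in the test request.
    static final List<String> FACET_TYPES = Arrays.asList(
            "facet.field", "facet.query", "facet.range", "facet.pivot", "facet.heatmap");

    /** Extracts the param names from a query string such as "a=1&b=2". */
    static Set<String> paramNames(String queryString) {
        Set<String> names = new HashSet<>();
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            names.add(eq < 0 ? pair : pair.substring(0, eq));
        }
        return names;
    }

    /** Facet types present in the request even though they were not expected to need refinement. */
    static List<String> unnecessaryTypes(String queryString, Set<String> typesNeedingRefinement) {
        Set<String> names = paramNames(queryString);
        List<String> extra = new ArrayList<>();
        for (String type : FACET_TYPES) {
            if (names.contains(type) && !typesNeedingRefinement.contains(type)) {
                extra.add(type);
            }
        }
        return extra;
    }
}
```

An empty result for every refinement request would suggest that only the facet.field
refinement is being re-sent; a non-empty one names the facet types to look at.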


> facet.heatmap for spatial heatmap faceting on RPT
> -------------------------------------------------
>
>                 Key: SOLR-7005
>                 URL: https://issues.apache.org/jira/browse/SOLR-7005
>             Project: Solr
>          Issue Type: New Feature
>          Components: spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.1
>
>         Attachments: SOLR-7005_heatmap.patch, SOLR-7005_heatmap.patch, SOLR-7005_heatmap.patch,
heatmap_512x256.png, heatmap_64x32.png
>
>
> This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell counter
in Lucene spatial LUCENE-6191.  This is a form of faceting, and as-such I think it should
live in the "facet" parameter namespace.  Here's what the parameters are:
> * facet=true
> * facet.heatmap=fieldname
> * facet.heatmap.bbox=\["-180 -90" TO "180 90"]
> * facet.heatmap.gridLevel=6
> * facet.heatmap.distErrPct=0.10
> Like other faceting features, the fieldName can have local-params to exclude filter queries
or specify an output key.
> The bbox is optional; you get the whole world or you can specify a box or actually any
shape that WKT supports (you get the bounding box of whatever you put).
> Ultimately, this feature needs to know the grid level, which together with the input
shape will yield a certain number of cells.  You can specify gridLevel exactly, or don't and
instead provide distErrPct which is computed like it is for the RPT field type as seen in
the schema.  0.10 yielded ~4k cells but it'll vary.  There's also a facet.heatmap.maxCells
safety net defaulting to 100k.  Exceed this and you get an error.
> The output is (JSON):
> {noformat}
> {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0,
0, 2, 1, ....],[1, 1, 3, 2, ...],...]}
> {noformat}
> counts is null if all would be 0.  Perhaps individual row arrays should likewise be null...
I welcome feedback.
> I'm toying with an output format option in which you can specify a base-64'ed grayscale
PNG.
> Obviously this should support sharded / distributed environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

