lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-7005) facet.heatmap for spatial heatmap faceting on RPT
Date Wed, 21 Jan 2015 17:17:35 GMT

     [ https://issues.apache.org/jira/browse/SOLR-7005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-7005:
-------------------------------
    Attachment: heatmap_64x32.png
                heatmap_512x256.png

There are some performance numbers on LUCENE-6191.

I experimented with generating a PNG to carry the data in a compressed manner, since this
data can get large.  I'm abusing the image to carry the counts at full precision, which
means 4 bytes per pixel.  Counts above ~16.7M (2^24) touch the high byte of the 4-byte int,
which is where the alpha channel is, and that progressively lightens the image.  _The image
is not at all optimized for pleasant human viewing_, except for the bit flip of the
high (alpha channel) byte; without it you would see nothing until the counts exceed that figure.
That said, it's crude but you can get a sense of it.  _If people have input on how to cheaply
and easily tweak the value to look nicer, I'm interested._  Since a client app may consume
this PNG for its compressed format and render it however it wants, there should
be a straightforward algorithm to derive the count from the ARGB (alpha, red, green, blue)
int.
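
As an illustration of what such a derivation might look like, here is a minimal Python sketch. It assumes that "the bit flip of the high byte" means XOR-ing the alpha byte with 0xFF, so that a count of 0 becomes a fully opaque pixel; the message does not spell out the exact transform, and the function names are hypothetical, not part of any patch.

```python
ALPHA_MASK = 0xFF000000  # the high (alpha) byte of a 32-bit ARGB pixel


def count_to_argb(count: int) -> int:
    """Pack a non-negative count into an ARGB int, flipping the alpha byte.

    Assumption: the flip is an XOR of the high byte with 0xFF.
    """
    if not 0 <= count <= 0xFFFFFFFF:
        raise ValueError("count must fit in 32 bits")
    return count ^ ALPHA_MASK


def argb_to_count(argb: int) -> int:
    """Recover the count from an ARGB int (inverse of count_to_argb).

    The mask also normalizes negative values, as a signed Java int would be.
    """
    return (argb & 0xFFFFFFFF) ^ ALPHA_MASK


def argb_channels(argb: int) -> tuple:
    """Split an ARGB int into (alpha, red, green, blue) bytes."""
    argb &= 0xFFFFFFFF
    return ((argb >> 24) & 0xFF, (argb >> 16) & 0xFF, (argb >> 8) & 0xFF, argb & 0xFF)
```

Under that assumption, a count below 2^24 leaves alpha at 255 (opaque), and each further multiple of 2^24 lowers alpha by one step, which matches the "progressively lighten" behavior described above.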

The attached PNG is 512x256 (131,072 cells, mind you!) of the 8.5M geonames data set.  On a
16-segment index with no search filters, it took 882ms to compute the underlying heatmap,
and 218ms to build the PNG and write it to disk.  The write-to-disk hack is temporary, to make
it easy to view the image by opening it from the file system.  Expect additional time
for consuming this image from Solr's javabin/XML/JSON + base64 wrapper (whichever you choose).

Now, a 512x256 image is so detailed that it arguably isn't a heatmap but another way of
rendering individual points.  A coarser image, say 64x32, would be truer to
the heatmap label, and obviously much faster to generate: about 100ms for the heatmap plus
only ~2ms to generate the PNG.

> facet.heatmap for spatial heatmap faceting on RPT
> -------------------------------------------------
>
>                 Key: SOLR-7005
>                 URL: https://issues.apache.org/jira/browse/SOLR-7005
>             Project: Solr
>          Issue Type: New Feature
>          Components: spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.1
>
>         Attachments: heatmap_512x256.png, heatmap_64x32.png
>
>
> This is a new feature that uses the new spatial Heatmap / 2D PrefixTree cell counter
in Lucene spatial (LUCENE-6191).  This is a form of faceting, and as such I think it should
live in the "facet" parameter namespace.  Here are the parameters:
> * facet=true
> * facet.heatmap=fieldname
> * facet.heatmap.bbox=["-180 -90" TO "180 90"]
> * facet.heatmap.gridLevel=6
> * facet.heatmap.distErrPct=0.10
> Like other faceting features, the field name can have local-params to exclude filter queries
or specify an output key.
> The bbox is optional; by default you get the whole world, or you can specify a box or any
other shape that WKT supports (the bounding box of whatever you provide is used).
> Ultimately, this feature needs to know the grid level, which together with the input
shape yields a certain number of cells.  You can specify gridLevel exactly, or omit it and
instead provide distErrPct, which is computed as it is for the RPT field type as seen in
the schema.  0.10 yielded ~4k cells but it'll vary.  There's also a facet.heatmap.maxCells
safety net defaulting to 100k.  Exceed this and you get an error.
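
To make the parameter list concrete, the following Python sketch assembles a query string from the parameters named above. The helper name, the field name `geo`, and the host and collection in the usage note are hypothetical; `q` and `rows` are ordinary Solr request parameters added here only so the request returns facets without documents.

```python
from urllib.parse import urlencode


def heatmap_params(field, bbox=None, grid_level=None, dist_err_pct=None, max_cells=None):
    """Build request parameters for a facet.heatmap query (illustrative only)."""
    params = {
        "q": "*:*",          # match everything; assumption for the example
        "rows": 0,           # facets only, no documents
        "facet": "true",
        "facet.heatmap": field,
    }
    # Optional parameters are sent only when the caller supplies them.
    if bbox is not None:
        params["facet.heatmap.bbox"] = bbox
    if grid_level is not None:
        params["facet.heatmap.gridLevel"] = grid_level
    if dist_err_pct is not None:
        params["facet.heatmap.distErrPct"] = dist_err_pct
    if max_cells is not None:
        params["facet.heatmap.maxCells"] = max_cells
    return params


# Whole-world request at grid level 6, URL-encoded for an HTTP GET.
query_string = urlencode(
    heatmap_params("geo", bbox='["-180 -90" TO "180 90"]', grid_level=6)
)
```

The resulting string could be appended to a select URL such as `http://localhost:8983/solr/geonames/select?` (a made-up endpoint). `urlencode` takes care of escaping the brackets, quotes, and spaces in the bbox value.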
> The output is (JSON):
> {noformat}
> {gridLevel=6,columns=64,rows=64,minX=-180.0,maxX=180.0,minY=-90.0,maxY=90.0,counts=[[0,
0, 2, 1, ....],[1, 1, 3, 2, ...],...]}
> {noformat}
> counts is null if all would be 0.  Perhaps individual row arrays should likewise be null...
I welcome feedback.
> I'm toying with an output format option in which you can specify a base-64'ed grayscale
PNG.
> Obviously this should support sharded / distributed environments.
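
For a client consuming the grid output described above, here is a rough Python sketch that maps each non-zero cell to its bounding box. It assumes the response has been parsed into a dict with the keys shown in the sample output, and that cells are evenly sized in degrees. The description does not say whether row 0 is the top (maxY) or bottom (minY) edge, so that is exposed as a `top_down` flag whose default is a guess; the function names are hypothetical.

```python
def cell_bounds(heatmap, row, col, top_down=True):
    """Return (min_x, min_y, max_x, max_y) for one grid cell.

    Assumption: `top_down` means row 0 touches maxY; the source does not
    state the row order.
    """
    width = (heatmap["maxX"] - heatmap["minX"]) / heatmap["columns"]
    height = (heatmap["maxY"] - heatmap["minY"]) / heatmap["rows"]
    x0 = heatmap["minX"] + col * width
    if top_down:
        y1 = heatmap["maxY"] - row * height
        y0 = y1 - height
    else:
        y0 = heatmap["minY"] + row * height
        y1 = y0 + height
    return (x0, y0, x0 + width, y1)


def nonzero_cells(heatmap, top_down=True):
    """List (bounds, count) pairs for every cell with a non-zero count."""
    counts = heatmap.get("counts")
    if counts is None:  # counts is null when every cell would be 0
        return []
    cells = []
    for r, row_counts in enumerate(counts):
        if row_counts is None:  # tolerate null rows, should that idea be adopted
            continue
        for c, n in enumerate(row_counts):
            if n:
                cells.append((cell_bounds(heatmap, r, c, top_down), n))
    return cells
```

Skipping zero cells keeps the client-side work proportional to the populated area, which matters for sparse data at fine grid levels.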



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

