lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-5779) Improve BBox AreaSimilarity algorithm to consider lines and points
Date Thu, 19 Jun 2014 16:44:24 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-5779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated LUCENE-5779:
---------------------------------

    Attachment: LUCENE-5779__Improved_bbox_AreaSimilarity_algorithm.patch

The attached patch is a subset of the LUCENE-5714 patch: it includes just the AreaSimilarity
class and the new test for BBoxStrategy, which covers this new similarity and shows example
scores.  Developing it surfaced a variety of dateline-related bugs in computing the
intersection width and height.

> Improve BBox AreaSimilarity algorithm to consider lines and points
> ------------------------------------------------------------------
>
>                 Key: LUCENE-5779
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5779
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spatial
>            Reporter: David Smiley
>         Attachments: LUCENE-5779__Improved_bbox_AreaSimilarity_algorithm.patch
>
>
> GeoPortal's area overlap algorithm didn't consider lines and points; they end up making
> the score 0.  I've thought about this for a bit and have come up with an alternative scoring
> algorithm (already coded, tested, and documented):
> New Javadocs:
> {code:java}
> /**
>  * The algorithm is implemented as envelope on envelope overlays rather than
>  * complex polygon on complex polygon overlays.
>  * <p/>
>  * <p/>
>  * Spatial relevance scoring algorithm:
>  * <DL>
>  *   <DT>queryArea</DT> <DD>the area of the input query envelope</DD>
>  *   <DT>targetArea</DT> <DD>the area of the target envelope (per Lucene
>  *   document)</DD>
>  *   <DT>intersectionArea</DT> <DD>the area of the intersection between
>  *   the query and target envelopes</DD>
>  *   <DT>queryTargetProportion</DT> <DD>A 0-1 factor that divides the
>  *   score proportion between query and target.
>  *   0.5 divides it evenly.</DD>
>  *
>  *   <DT>queryRatio</DT> <DD>intersectionArea / queryArea; (see note)</DD>
>  *   <DT>targetRatio</DT> <DD>intersectionArea / targetArea; (see note)</DD>
>  *   <DT>queryFactor</DT> <DD>queryRatio * queryTargetProportion;</DD>
>  *   <DT>targetFactor</DT> <DD>targetRatio * (1 - queryTargetProportion);</DD>
>  *   <DT>score</DT> <DD>queryFactor + targetFactor;</DD>
>  * </DL>
>  * Note: The actual computation of queryRatio and targetRatio is more complicated so
>  * that it considers points and lines. Lines have the ratio of overlap, and points are
>  * either 1.0 or 0.0 depending on whether they intersect or not.
>  * <p />
>  * Based on Geoportal's
>  * <a href="http://geoportal.svn.sourceforge.net/svnroot/geoportal/Geoportal/trunk/src/com/esri/gpt/catalog/lucene/SpatialRankingValueSource.java">
>  *   SpatialRankingValueSource</a> but modified. GeoPortal's algorithm will yield
>  * a score of 0 if either a line or point is compared, and it doesn't output a 0-1
>  * normalized score (it multiplies the factors).
>  *
>  * @lucene.experimental
>  */
> {code}
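
For readers without the patch at hand, the scoring described in the Javadocs above can be
sketched roughly as follows. This is not the code from the attachment: the class name
AreaSimilaritySketch, the Envelope type, and the ratio() helper are hypothetical names made up
for illustration, and the sketch assumes plain planar rectangles with no dateline wrapping
(the patch itself has to deal with that).

```java
/** Illustrative sketch only; names and structure are assumptions, not the patch's code. */
final class AreaSimilaritySketch {

    /** Hypothetical axis-aligned envelope. A zero width or height models a line or a point. */
    static final class Envelope {
        final double minX, maxX, minY, maxY;

        Envelope(double minX, double maxX, double minY, double maxY) {
            this.minX = minX;
            this.maxX = maxX;
            this.minY = minY;
            this.maxY = maxY;
        }

        double width()  { return maxX - minX; }
        double height() { return maxY - minY; }
    }

    /**
     * Fraction of a shape (w by h) covered by the intersection (iw by ih).
     * Falls back from an area ratio to a length ratio for lines, and to 1.0 for
     * a point; the caller has already ruled out the disjoint case.
     */
    static double ratio(double iw, double ih, double w, double h) {
        if (w > 0 && h > 0) {
            return (iw * ih) / (w * h); // true area: intersectionArea / area
        }
        if (w > 0) {
            return iw / w;              // horizontal line: ratio of overlapping length
        }
        if (h > 0) {
            return ih / h;              // vertical line: ratio of overlapping length
        }
        return 1.0;                     // point that intersects
    }

    /** score = queryRatio * queryTargetProportion + targetRatio * (1 - queryTargetProportion). */
    static double score(Envelope query, Envelope target, double queryTargetProportion) {
        // Planar intersection extents; dateline crossing is deliberately ignored here.
        double iw = Math.min(query.maxX, target.maxX) - Math.max(query.minX, target.minX);
        double ih = Math.min(query.maxY, target.maxY) - Math.max(query.minY, target.minY);
        if (iw < 0 || ih < 0) {
            return 0.0;                 // disjoint envelopes
        }
        double queryRatio = ratio(iw, ih, query.width(), query.height());
        double targetRatio = ratio(iw, ih, target.width(), target.height());
        double queryFactor = queryRatio * queryTargetProportion;
        double targetFactor = targetRatio * (1 - queryTargetProportion);
        return queryFactor + targetFactor;
    }
}
```

Under these assumptions, with queryTargetProportion at 0.5, a point lying inside a 10x10 query
box gets targetRatio 1.0 and queryRatio 0.0. Because the factors are added rather than
multiplied, the overall score is non-zero, which is the behaviour the issue is after.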



--
This message was sent by Atlassian JIRA
(v6.2#6252)


