From "Rob Audenaerde (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-5370) Sorting Facets on CategoryPath (Label)
Date Fri, 21 Mar 2014 11:41:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942986#comment-13942986 ]

Rob Audenaerde edited comment on LUCENE-5370 at 3/21/14 11:40 AM:
------------------------------------------------------------------

I currently use the code below:

{code}
	private FacetResult getTopValueChildren( int topN, final SortDir sortDir, String dim, String... path ) throws IOException
	{
		if ( topN <= 0 )
		{
			throw new IllegalArgumentException( "topN must be > 0 (got: " + topN + ")" );
		}

		DimConfig dimConfig = this.verifyDim( dim );
		FacetLabel cp = new FacetLabel( dim, path );
		int dimOrd = this.taxoReader.getOrdinal( cp );
		if ( dimOrd == -1 )
		{
			return null;
		}

		TopOrdAndLabelQueue q = new TopOrdAndLabelQueue( Math.min( this.taxoReader.getSize(), topN ) )
		{
			@Override
			protected boolean lessThan( OrdAndLabel a, OrdAndLabel b )
			{
				if ( sortDir == SortDir.DESC )
				{
					return super.lessThan( a, b );
				}
				else
				{
					return !super.lessThan( a, b );
				}
			}

		};

		int ord = this.children[dimOrd];
		int totValue = 0;
		int childCount = 0;

		TopOrdAndLabelQueue.OrdAndLabel reuse = null;
		while ( ord != TaxonomyReader.INVALID_ORDINAL )
		{
			if ( this.values[ord] > 0 )
			{
				totValue += this.values[ord];
				childCount++;
				if ( reuse == null )
				{
					reuse = new TopOrdAndLabelQueue.OrdAndLabel();
				}
				reuse.ord = ord;
				reuse.value = this.values[ord];
				reuse.label = this.taxoReader.getPath( ord ).components[cp.length];
				reuse = q.insertWithOverflow( reuse );
			}

			ord = this.siblings[ord];
		}

		if ( totValue == 0 )
		{
			return null;
		}

		if ( dimConfig.multiValued )
		{
			if ( dimConfig.requireDimCount )
			{
				totValue = this.values[dimOrd];
			}
			else
			{
				// Our sum'd value is not correct, in general:
				totValue = -1;
			}
		}
		else
		{
			// Our sum'd dim value is accurate, so we keep it
		}

		LabelAndValue[] labelValues = new LabelAndValue[q.size()];
		for ( int i = labelValues.length - 1; i >= 0; i-- )
		{
			TopOrdAndLabelQueue.OrdAndLabel ordAndValue = q.pop();
			labelValues[i] = new LabelAndValue( ordAndValue.label, ordAndValue.value );
		}

		return new FacetResult( dim, path, totValue, labelValues, childCount );
	}
{code}

I use the same approach as sorting on counts, except that I sort on the label instead. This
adds some cost, because every label has to be retrieved from the taxonomy reader.

So I ignore the counts when sorting, but I still return them, because the user is interested
in the counts of the sorted facet labels.
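
The idea of keeping only the top N children by label, while still carrying their counts, can be sketched in plain Java. This is a minimal illustration, not the Lucene code above: it uses java.util.PriorityQueue instead of Lucene's own queue, and the names LabelSortSketch, LabelCount and topByLabel are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LabelSortSketch {

    /** Hypothetical stand-in for a facet child: its label and its count. */
    static final class LabelCount {
        final String label;
        final int count;

        LabelCount(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    /** Returns at most topN children with a count above zero, ordered by label. */
    static List<LabelCount> topByLabel(List<LabelCount> children, int topN, boolean ascending) {
        if (topN <= 0) {
            throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
        }
        // Desired output order: A..Z when ascending, Z..A otherwise.
        Comparator<LabelCount> wanted = Comparator.comparing((LabelCount c) -> c.label);
        if (!ascending) {
            wanted = wanted.reversed();
        }
        // The heap is ordered the other way round, so its head is always the
        // entry that would come last in the output and can be evicted first.
        PriorityQueue<LabelCount> heap = new PriorityQueue<>(wanted.reversed());
        for (LabelCount child : children) {
            if (child.count <= 0) {
                continue; // children without hits are skipped, as in the code above
            }
            heap.add(child);
            if (heap.size() > topN) {
                heap.poll();
            }
        }
        // The counts play no role in the ordering; they are only carried along.
        List<LabelCount> result = new ArrayList<>(heap);
        result.sort(wanted);
        return result;
    }
}
```

For example, calling topByLabel with topN = 3 on children labelled pear, apple, fig (count 0), cherry and banana keeps apple, banana and cherry in ascending order, each with its original count.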

Btw, I'm currently experimenting with a similar approach for facet labels that are
effectively numbers (like currency). Because I do not know beforehand what will be in
the facets, I put the String representation in the FacetLabel and store the numeric value
in the float part of a FloatAssociationFacetField. Facets can then be sorted on the float
association value, which should be faster than retrieving labels from the reader.
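
To illustrate why a separate numeric value helps, here is a small plain-Java sketch. It does not touch the Lucene association API; NumericLabelSortSketch, NumericFacet and sortByValue are hypothetical names, and the float field only stands in for the value that would be stored as the association.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NumericLabelSortSketch {

    /** Hypothetical facet entry: display label, numeric sort value and count. */
    static final class NumericFacet {
        final String label;    // String representation shown to the user, e.g. "9.50"
        final float sortValue; // numeric value kept alongside the label
        final int count;

        NumericFacet(String label, float sortValue, int count) {
            this.label = label;
            this.sortValue = sortValue;
            this.count = count;
        }
    }

    /** Returns a copy of the facets ordered by numeric value, not by label text. */
    static List<NumericFacet> sortByValue(List<NumericFacet> facets, boolean ascending) {
        Comparator<NumericFacet> byValue = (a, b) -> Float.compare(a.sortValue, b.sortValue);
        List<NumericFacet> sorted = new ArrayList<>(facets);
        sorted.sort(ascending ? byValue : byValue.reversed());
        return sorted;
    }
}
```

Sorting the labels "10.00", "9.50" and "100.00" as strings would put "9.50" last, while sorting on the stored float gives 9.50, 10.00, 100.00 without comparing the label text.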


> Sorting Facets on CategoryPath (Label)
> --------------------------------------
>
>                 Key: LUCENE-5370
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5370
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>    Affects Versions: 4.6
>            Reporter: Rob Audenaerde
>              Labels: features
>
> Facets support sorting through {{FacetRequest.SortOrder}}, which is used in {{ResultSortUtils}}.
> For my application it would be very nice if the facets could also be sorted on their label.
> I think this could be accomplished by extending {{FacetRequest}} with an extra enum {{SortType}}
> and adding two extra {{Heap}} implementations to {{ResultSortUtils}} that compare the
> CategoryPath instead of the double value.
> What do you think of this idea? Or can the same behaviour already be accomplished in a
> different way?
> (Btw: I tried building this patch on the trunk of Lucene 5.0, but I couldn't get the Maven
> build to work. I will try again later on the 4.6 branch.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


