lucene-java-user mailing list archives

From scott cote <scottcc...@gmail.com>
Subject problem using faceting in 5.3
Date Tue, 03 Nov 2015 05:42:56 GMT
Hello All,

I have been given the enviable job of upgrading existing faceted taxonomy indexes from 3.6
to 5.3.

To make sure that I have everything in working order, I have written a little program to “smoke
test” the result.  Facets retrieved in version 3 should be retrievable in version 5, or our upgrade
has failed.

Unfortunately, I can’t seem to put together a quick program to validate my data once it
is upgraded to version 5.  Can someone tell me where I have gone off the rails?



In this email, I include:

1. The 3.6.2 validation code … (establishes what should be seen after the upgrade runs)
1.1. mvn dependencies
1.2. source code
1.3. output
2. The lucene upgrade shell script
3. The 5.3.1 validation code (that generates nulls and isn’t quite right)
3.1. mvn dependencies
3.2. source code
4. The URL for the compressed tar file of the index data stored in Dropbox.

Here are the key maven dependencies that I used for the 3.6 source:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-facet</artifactId>
    <version>3.6.2</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>3.6.0</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>3.6.0</version>
</dependency>


Here is the code to retrieve facet data from the version 3.6 index (which does work against
version 3.6 lucene):

public class FacetRunner {
    public static void main(final String[] args) throws Exception {
        File indexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene3/data/doc-index/lucene");
        Directory indexDir = new SimpleFSDirectory(indexDirFile);
        IndexReader indexReader = IndexReader.open(indexDir);
        Searcher searcher = new IndexSearcher(indexReader);

        File taxonomyIndexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene3/data/facets");
        Directory taxonomyIndexDir = new SimpleFSDirectory(taxonomyIndexDirFile);
        TaxonomyReader taxo = new DirectoryTaxonomyReader(taxonomyIndexDir);

        Term aTerm = new Term("$facets", "$fulltree$");//     new Term("text", "clarissa");
        Query q = new TermQuery(aTerm);
        TopScoreDocCollector tdc = TopScoreDocCollector.create(10,true);

        FacetSearchParams facetSearchParams = new FacetSearchParams();

                facetSearchParams.addFacetRequest(new CountFacetRequest(
                new CategoryPath("brs_recipient_domain"), 10));


        FacetsCollector facetsCollector = new FacetsCollector(facetSearchParams, indexReader, taxo);

        searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
        List<FacetResult> res = facetsCollector.getFacetResults();
        for (FacetResult facetResult:res) {
            System.out.println(facetResult.toString());
        }

    }
}

Output looks like:

Request: brs_recipient_domain nRes=10 nLbl=10
Num valid Descendants (up to specified depth): 486
	Facet Result Node with 10 sub result nodes.
	Name: brs_recipient_domain
	Value: 2896.0
	Residue: 1497.0

	Subresult #0
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/enron.com
		Value: 1979.0
		Residue: 0.0

	Subresult #1
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/aol.com
		Value: 124.0
		Residue: 0.0

	Subresult #2
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/bracepatt.com
		Value: 84.0
		Residue: 0.0

	Subresult #3
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/txu.com
		Value: 63.0
		Residue: 0.0

	Subresult #4
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/hotmail.com
		Value: 46.0
		Residue: 0.0

	Subresult #5
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/teneo-test.com
		Value: 42.0
		Residue: 0.0

	Subresult #6
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/yahoo.com
		Value: 41.0
		Residue: 0.0

	Subresult #7
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/dttus.com
		Value: 34.0
		Residue: 0.0

	Subresult #8
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/velaw.com
		Value: 30.0
		Residue: 0.0

	Subresult #9
		Facet Result Node with 0 sub result nodes.
		Name: brs_recipient_domain/netzero.net
		Value: 28.0
		Residue: 0.0


Process finished with exit code 0
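
The output above can serve as the baseline for the upgraded index. A minimal sketch, assuming the printed `toString()` layout shown above (a `Name:` line followed by a `Value:` line for each node), of how it could be parsed into a label-to-count map. The class name `FacetBaselineParser` and the `parse` helper are hypothetical and not part of Lucene:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: turns the printed 3.6 facet
// output into a map from category name to count.
public class FacetBaselineParser {

    // Pairs each "Name:" line with the next "Value:" line that follows it.
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> counts = new LinkedHashMap<>();
        String currentName = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("Name:")) {
                currentName = line.substring("Name:".length()).trim();
            } else if (line.startsWith("Value:") && currentName != null) {
                String value = line.substring("Value:".length()).trim();
                counts.put(currentName, Double.parseDouble(value));
                currentName = null; // wait for the next "Name:" line
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines copied from the 3.6.2 output in this email.
        List<String> sample = Arrays.asList(
                "\tName: brs_recipient_domain",
                "\tValue: 2896.0",
                "\tResidue: 1497.0",
                "\t\tName: brs_recipient_domain/enron.com",
                "\t\tValue: 1979.0",
                "\t\tResidue: 0.0");
        parse(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Lines such as `Residue:` or `Subresult #0` are simply skipped, so the whole console output can be fed in unchanged.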


To upgrade the indexes, I have written a shell script that runs the IndexUpgrader twice: first with the
4.10.4 core jar to bring the facet index and the document index to version 4, then with the 5.3.1
jars to bring both to version 5.


#!/bin/sh

export JARS_HOME=/users/scott/projects/prototypes/lucene-3-and-5/jars

echo "===>>>>>migrating lucene data from 3 to 4<<<<<========="
echo
export LUCENE_4_PATH=$JARS_HOME/lucene-core-4.10.4.jar

date "+DATE: %Y-%m-%d%nTIME: %H:%M:%S"
echo "upgrading facets taxonomy indices from 3 to 4 with command time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader facets"
time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader facets
echo
echo "upgrading document  indices from 3 to 4 with command time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene"
time java -cp $LUCENE_4_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene
echo
echo "===>>>>>migrating lucene data from 4 to 5<<<<<========="
echo
export LUCENE_5_PATH=$JARS_HOME/lucene-backward-codecs-5.3.1.jar:$JARS_HOME/lucene-core-5.3.1.jar

echo "upgrading facets taxonomy indices from 4 to 5 with command time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader facets"
time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader facets
echo
echo "upgrading document  indices from 4 to 5 with command time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene"
time java -cp $LUCENE_5_PATH org.apache.lucene.index.IndexUpgrader doc-index/lucene
echo 
echo "done upgrading from lucene 3 to lucene 5"
date "+DATE: %Y-%m-%d%nTIME: %H:%M:%S"

No errors occur when the script runs.

At this point, my index documents look like version 5 lucene.

Now I want to validate my indexes and pull similar (if not the same) data from the upgraded
indexes.
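
A minimal sketch of the comparison such a smoke test comes down to, assuming the facet counts from both versions have already been collected into plain maps keyed by the child label (for example `enron.com`). The class `FacetSmokeCheck` and the method `mismatches` are hypothetical names, and the `actual` map is only a stand-in for whatever the 5.3.1 program returns:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: compares baseline facet counts
// against the counts read from the upgraded index.
public class FacetSmokeCheck {

    // Returns one message per expected label that is missing or has a different count.
    static List<String> mismatches(Map<String, Integer> expected, Map<String, Integer> actual) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : expected.entrySet()) {
            Integer got = actual.get(entry.getKey());
            if (got == null) {
                problems.add("missing label: " + entry.getKey());
            } else if (!got.equals(entry.getValue())) {
                problems.add("count differs for " + entry.getKey()
                        + ": expected " + entry.getValue() + ", got " + got);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Baseline: the first two child counts from the 3.6.2 output in this email.
        Map<String, Integer> expected = new LinkedHashMap<>();
        expected.put("enron.com", 1979);
        expected.put("aol.com", 124);

        // Stand-in for the 5.3.1 result; left empty to model a run that returns nothing.
        Map<String, Integer> actual = new LinkedHashMap<>();

        List<String> problems = mismatches(expected, actual);
        problems.forEach(System.out::println);
        System.out.println(problems.isEmpty() ? "smoke test passed" : "smoke test FAILED");
    }
}
```

The check is one-directional on purpose: labels that appear only in the upgraded index are not reported, since the requirement is only that everything retrievable in version 3 is still retrievable in version 5.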



Here are the Maven dependencies for the 5.3.1 source:


<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-facet</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queries</artifactId>
    <version>5.3.1</version>
</dependency>

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>5.3.1</version>
</dependency>


Here is my 5.3.1 program. It returns nulls. What am I doing wrong?



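import java.io.File;
import java.nio.file.Path;

import org.apache.lucene.facet.FacetResult;
import org.apache.lucene.facet.Facets;
import org.apache.lucene.facet.FacetsCollector;
import org.apache.lucene.facet.FacetsConfig;
import org.apache.lucene.facet.LabelAndValue;
import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
import org.apache.lucene.facet.taxonomy.TaxonomyReader;
import org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.SimpleFSDirectory;

public class FacetRunner {
    public static void main(final String[] args) throws Exception {
        // Open the upgraded document index.
        File indexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene5/data/doc-index/lucene");
        Path indexDirFilePath = indexDirFile.toPath();
        Directory indexDir = new SimpleFSDirectory(indexDirFilePath);
        IndexReader indexReader = DirectoryReader.open(indexDir);
        IndexSearcher searcher = new IndexSearcher(indexReader);

        // Open the upgraded taxonomy index.
        File taxonomyIndexDirFile = new File("/Users/scott/projects/prototypes/lucene-3-and-5/lucene5/data/facets");
        Path taxonomyIndexDirFilePath = taxonomyIndexDirFile.toPath();
        Directory taxonomyIndexDir = new SimpleFSDirectory(taxonomyIndexDirFilePath);
        TaxonomyReader taxo = new DirectoryTaxonomyReader(taxonomyIndexDir);

        // Same query as in the 3.6 program.
        Term aTerm = new Term("$facets", "$fulltree$");
        Query q = new TermQuery(aTerm);

        FacetsCollector facetsCollector = new FacetsCollector();

        //searcher.search(q, MultiCollector.wrap(tdc, facetsCollector));
        //FacetsCollector.search(searcher, new MatchAllDocsQuery(),10,facetsCollector);
        FacetsCollector.search(searcher, q, 10, facetsCollector);

        FacetsConfig config = new FacetsConfig();
        //config.set
        Facets facets = new FastTaxonomyFacetCounts(taxo, config, facetsCollector);
        FacetResult result = facets.getTopChildren(10, "brs_recipient_domain");

        // getTopChildren returns null if the dimension was never seen, so guard before iterating.
        if (result == null) {
            System.out.println("no facet result for brs_recipient_domain");
            return;
        }

        for (LabelAndValue labelValue : result.labelValues) {
            System.out.println(String.format("%s (%s)", labelValue.label, labelValue.value));
        }
    }
}
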
Here is the URL to a gzipped tar that contains the index (not yet upgraded):
https://www.dropbox.com/s/qbr7ogwgekatrdf/faceted_lucene_data.tar.gz?dl=0

Thanks for your help.

Scott