lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Weird Facet and KeywordTokenizerFactory Issue
Date Tue, 06 Oct 2009 23:58:20 GMT

A few comments about the info you've provided...

when you cut/pasted the facet output, you excluded the field names.  based 
on the schema & solrconfig.xml snippets you posted later, i'm assuming 
they are usstate and keyword, but you have to be explicit so that people 
can help correlate the results you are getting with the schema you posted 
-- for example, you haven't posted anything that would verify that the 
usstate field actually uses your keywordText field type; for all we know 
it has a different field type by mistake (which would explain your 
problem). ... you have to post everything that would let us connect the 
dots from input to output in order to see where things might be going 
wrong.

A huge gap is in what your synonym files contain ... something weird in 
there could easily explain superfluous terms getting added to your data.

all that said: my best guess is that you have old data in your index from 
an older version of your schema when you had different analyzers 
configured.

if a term is showing up in the facet counts, you can search on it -- find 
the first doc that matches, verify that the term isn't actually in the 
data, and then reindex that one doc -- if it stops matching your search 
(and the facet count drops by one) then i'm right, just reindex 
everything.
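
a rough sketch of that first step, in case it helps: the helper below only 
builds the search URL for one suspicious facet term, so you can paste it 
into a browser and look at the first matching doc.  the base URL 
(http://localhost:8983/solr/select), the helper name, and the example term 
are assumptions for illustration; usstate is just my guess at your field 
name from above.

```python
from urllib.parse import urlencode


def facet_term_search_url(base_url, field, term, rows=1):
    """Build a Solr select URL that searches one field for one facet term.

    base_url, field and term are placeholders; substitute your own values.
    """
    # Escape backslashes and double quotes so the term can sit inside a
    # quoted field query such as usstate:"New York".
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": '%s:"%s"' % (field, escaped), "rows": rows}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical host and term, for illustration only.
    print(facet_term_search_url(
        "http://localhost:8983/solr/select", "usstate", "New York"))
```

once you have the first matching doc, reindex just that doc and rerun the 
same URL; if it no longer matches, the term came from stale index data.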

(this is where a timestamp field recording exactly when each doc was added 
to the index comes in handy, you can compare it with the file modification 
time on your schema.xml and be certain which docs were indexed prior to 
your changes)
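
a minimal sketch of that comparison, assuming each doc carries a stored 
date field named timestamp (a hypothetical name; use whatever your schema 
calls it) in the usual Solr UTC format, and that the docs have already 
been fetched into plain dicts.  the function names are mine, not anything 
Solr provides.

```python
from datetime import datetime, timezone


def parse_solr_date(value):
    """Parse a Solr-style UTC date such as 2009-10-06T23:58:20Z."""
    # Solr may or may not include fractional seconds, so try both layouts.
    for layout in ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%dT%H:%M:%S.%fZ"):
        try:
            return datetime.strptime(value, layout).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized date: %r" % value)


def docs_indexed_before(docs, schema_mtime):
    """Return ids of docs whose timestamp is older than the schema file.

    schema_mtime is seconds since the epoch, e.g. the value returned by
    os.path.getmtime() for your schema.xml.
    """
    cutoff = datetime.fromtimestamp(schema_mtime, tz=timezone.utc)
    return [d["id"] for d in docs if parse_solr_date(d["timestamp"]) < cutoff]
```

any ids this returns were indexed before the schema file last changed, so 
they are the ones to reindex first.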



-Hoss

