lucene-solr-user mailing list archives

From mtraynham <mtrayn...@digitalsmiths.com>
Subject [Contribution] Multiword Inline-Prefix Autocomplete Idea
Date Fri, 20 May 2011 14:46:38 GMT
At my company, I've been spending some time figuring out the best approach
for inline-prefix autocompletion.  Most of the support for autocompletion
is based solely on prefix matching, since it can jump to a certain term within
a field quickly and break out of the enumeration loop as soon as the prefix no
longer matches (really nice and freaking quick).
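
To illustrate the seek-and-break behavior, here is a minimal sketch that uses a
java.util.TreeSet as a stand-in for a sorted term dictionary. The class and
method names (SortedPrefixScan, termsWithPrefix) are made up for this example
and are not Lucene or Solr APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a sorted term dictionary; not a Lucene API.
class SortedPrefixScan {

    /** Collects every term that starts with prefix, stopping at the first miss. */
    static List<String> termsWithPrefix(TreeSet<String> terms, String prefix) {
        List<String> matches = new ArrayList<String>();
        // tailSet seeks directly to the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so nothing further can match
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<String>(Arrays.asList(
                "hewitt", "jennifer love hewitt", "jenny lewis", "love hewitt"));
        System.out.println(termsWithPrefix(terms, "jen"));
    }
}
```

Only the terms in the matching range are visited, which is why plain prefix
matching stays fast even on a large term list.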

Doing inline prefixing (matching a prefix that starts at any word in the value,
not just the first) means you lose this functionality and have to check each
word individually.
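
For contrast, a naive inline-prefix check might look like the sketch below. It
is only an illustration of the cost: every term has to be visited and tested at
each word boundary, with no early exit. The names NaiveInlineScan,
matchesInline and scan are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force scan; shown only to illustrate the cost.
class NaiveInlineScan {

    /** True if prefix matches the term starting at any word boundary. */
    static boolean matchesInline(String term, String prefix) {
        String lower = term.toLowerCase();
        int start = 0;
        while (true) {
            if (lower.startsWith(prefix, start)) {
                return true;
            }
            int space = lower.indexOf(' ', start);
            if (space < 0) {
                return false; // no more word boundaries to try
            }
            start = space + 1;
        }
    }

    /** Visits every term; there is no sorted range to seek into. */
    static List<String> scan(List<String> terms, String typed) {
        String prefix = typed.toLowerCase();
        List<String> matches = new ArrayList<String>();
        for (String term : terms) {
            if (matchesInline(term, prefix)) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```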

Some approaches I investigated:
- TermsComponent - test each term for inline prefixes
     - For large corpora, this is slow, really really slow...
- TST (ternary search tree) - currently only supports prefixing as well, but
each node could point back to its documents, and the matches could then be
combined by document intersection.
     - Probably the quickest solution, but rebuilding this tree every time a
commit happens could get really ugly on memory
- RAMDirectory - again a memory hog.

A quick and not so bad solution:
- New poly field type that splits a term value into prefix-able terms. 
- CopyField of dynamic type __s (string) to this fieldtype
- Ex. Jennifer Love Hewitt -> Jennifer Love Hewitt, Love Hewitt, Hewitt

How it looks indexed:
jennifer love hewitt<DELIM>Jennifer Love Hewitt
love hewitt<DELIM>Jennifer Love Hewitt
hewitt<DELIM>Jennifer Love Hewitt

As a user is typing name values, we prefix match on the term they typed and
then return whatever is after the delimiter.  I also lowercase the part before
the delimiter, so matching is case insensitive.
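
The indexing and lookup steps above can be sketched outside of Solr as follows.
This is a minimal illustration, assuming a TreeSet in place of the real term
index; InlinePrefixSketch, expand and suggest are hypothetical names, and the
delimiter is the same '\u00ff' used in the field type further down.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of the approach; a TreeSet stands in for the index.
class InlinePrefixSketch {
    static final char DELIM = '\u00ff';

    /** "Jennifer Love Hewitt" -> one lower-cased suffix term per word. */
    static List<String> expand(String name) {
        String[] words = name.toLowerCase().split(" ");
        List<String> terms = new ArrayList<String>();
        String suffix = "";
        for (int i = words.length - 1; i >= 0; i--) {
            suffix = suffix.isEmpty() ? words[i] : words[i] + " " + suffix;
            terms.add(suffix + DELIM + name); // original casing kept after DELIM
        }
        return terms;
    }

    /** Prefix match on the typed text, returning the text after the delimiter. */
    static Set<String> suggest(TreeSet<String> index, String typed) {
        String prefix = typed.toLowerCase();
        Set<String> results = new LinkedHashSet<String>();
        for (String term : index.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // past the matching range
            }
            results.add(term.substring(term.lastIndexOf(DELIM) + 1));
        }
        return results;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<String>();
        index.addAll(expand("Jennifer Love Hewitt"));
        index.addAll(expand("Courtney Love"));
        System.out.println(suggest(index, "Lov"));
    }
}
```

Because every suffix is its own term, an inline prefix such as "lov" becomes an
ordinary prefix lookup again, at the cost of one indexed term per word.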

Note: The only case that I'm not currently supporting is out-of-order
prefixing (e.g. user types Hewitt Jennifer).  This can still be
accomplished using this approach: you would index each poly term split
separately and maintain a map while your prefix algorithm is running.
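
One possible reading of that note, sketched under assumptions: each word is
indexed as its own term, each typed token is prefix matched separately, and a
map counts how many tokens matched each value so that only values matched by
every token are returned. OutOfOrderSketch, add and suggest are hypothetical
names, and a TreeSet again stands in for the term index.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of out-of-order matching; not part of the original field type.
class OutOfOrderSketch {
    static final char DELIM = '\u00ff';

    /** Indexes each word of the value as its own term. */
    static void add(TreeSet<String> index, String name) {
        for (String word : name.toLowerCase().split(" ")) {
            index.add(word + DELIM + name);
        }
    }

    /** Returns values for which every typed token prefixes some word. */
    static Set<String> suggest(TreeSet<String> index, String typed) {
        Map<String, Integer> hits = new LinkedHashMap<String, Integer>();
        int tokenCount = 0;
        for (String token : typed.toLowerCase().split(" ")) {
            if (token.isEmpty()) {
                continue; // ignore repeated spaces
            }
            tokenCount++;
            Set<String> seen = new HashSet<String>(); // count a value once per token
            for (String term : index.tailSet(token, true)) {
                if (!term.startsWith(token)) {
                    break;
                }
                String display = term.substring(term.lastIndexOf(DELIM) + 1);
                if (seen.add(display)) {
                    Integer n = hits.get(display);
                    hits.put(display, n == null ? 1 : n + 1);
                }
            }
        }
        Set<String> results = new LinkedHashSet<String>();
        for (Map.Entry<String, Integer> e : hits.entrySet()) {
            if (tokenCount > 0 && e.getValue() == tokenCount) {
                results.add(e.getKey());
            }
        }
        return results;
    }
}
```

Note that this simplified version does not require the typed tokens to match
different words, so typing the same token twice still matches.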

Thanks,
Matt

import java.util.Arrays;
import java.util.LinkedList;

import org.apache.lucene.document.Fieldable;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.StrField;

public class AutocompleteStrField extends StrField {

	// Separates the lower-cased, prefix-able part from the original value.
	private static final char DELIMITER = '\u00ff';

	@Override
	public boolean isPolyField(){
		return true;
	}

	/**
	 * Given a {@link org.apache.solr.schema.SchemaField}, create one or more
	 * {@link org.apache.lucene.document.Fieldable} instances
	 * @param field the {@link org.apache.solr.schema.SchemaField}
	 * @param externalVal The value to add to the field
	 * @param boost The boost to apply
	 * @return An array of {@link org.apache.lucene.document.Fieldable}
	 *
	 * @see #createField(SchemaField, String, float)
	 * @see #isPolyField()
	 */
	@Override
	public Fieldable[] createFields(SchemaField field, String externalVal, float boost) {
		// Lower-case for case-insensitive matching, then split into words.
		String[] st = externalVal.toLowerCase().split(" ");
		LinkedList<String> tokens = new LinkedList<String>(Arrays.asList(st));
		Fieldable[] f = new Fieldable[st.length];

		int count = 0;
		String value = "";
		// Build suffixes from the last word backwards: "hewitt", "love hewitt", ...
		while(!tokens.isEmpty()) {
			String token = tokens.pollLast();
			// Join without leaving a trailing space before the delimiter.
			value = value.isEmpty() ? token : token + " " + value;
			f[count] = createField(field, value + DELIMITER + externalVal, boost);
			count++;
		}
		return f;
	}

	/** Given an indexed term, return the human readable representation */
	@Override
	public String indexedToReadable(String indexedForm) {
		return indexedForm.substring(indexedForm.lastIndexOf(DELIMITER) + 1);
	}
}


--
View this message in context: http://lucene.472066.n3.nabble.com/Contribution-Multiword-Inline-Prefix-Autocomplete-Idea-tp2965854p2965854.html
Sent from the Solr - User mailing list archive at Nabble.com.