lucene-dev mailing list archives

From Erik Hatcher
Subject Re: Search agents
Date Wed, 04 Jan 2006 14:41:09 GMT

Have you considered the MemoryIndex for this sort of thing?  I've
thought that it would make for an elegant way to handle this sort of
"agent" or notification service: new documents get indexed normally,
but each one also goes into a MemoryIndex by itself and is matched
against many queries.
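
A minimal sketch of that idea in plain Java, with no Lucene dependency.
SingleDocIndex is a hypothetical toy stand-in for a one-document
in-memory index (it is not the MemoryIndex API), and stored queries are
modelled as simple predicates, which is an assumption made only for
illustration:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Toy one-document index: field name -> set of lower-cased terms. */
class SingleDocIndex {
    private final Map<String, Set<String>> termsByField = new HashMap<>();

    /** Splits the text on non-alphanumeric characters and records the terms. */
    void addField(String field, String text) {
        Set<String> terms = termsByField.computeIfAbsent(field, f -> new HashSet<>());
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
    }

    boolean contains(String field, String term) {
        return termsByField.getOrDefault(field, Collections.<String>emptySet()).contains(term);
    }
}

/** Holds many stored queries and runs all of them against one document. */
class NotificationService {
    // agent name -> stored query, in registration order
    private final Map<String, Predicate<SingleDocIndex>> agents = new LinkedHashMap<>();

    void register(String agent, Predicate<SingleDocIndex> query) {
        agents.put(agent, query);
    }

    /** Returns the names of all agents whose query matches the document. */
    List<String> matchingAgents(SingleDocIndex doc) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Predicate<SingleDocIndex>> agent : agents.entrySet()) {
            if (agent.getValue().test(doc)) {
                hits.add(agent.getKey());
            }
        }
        return hits;
    }
}
```

Because every stored query is evaluated against the document directly,
an AND of two terms in the same field is easy to express here.  The
cost is one query evaluation per agent for every incoming document.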

It would be great to see the code you've developed.  You are free to  
contribute it to the Lucene contrib codebase.  If it is a substantial
contribution, it needs further discussion and perhaps incubation before
it can be accepted.  There are no restrictions other than the Apache
Software License on the code in the contrib area.  The "sandbox" is a  
deprecated term for what we now call "contrib".


On Jan 4, 2006, at 9:03 AM, karl wettin wrote:

> Hello list,
> I wrote a search agent thingy for Lucene. It was built to handle  
> huge amounts of agents.
> Rather than one query per agent to find out if the new document is  
> interesting or not, agent trigger queries are stored in an index  
> that is queried with the tokens of a new document.
> Since it uses the index a bit backwards, the agent trigger queries
> are somewhat limited:
> At least one token in an OR or FUZZY OR per agent field must match
> the new document.
> Any NOT token in an agent must not match the new document.
> It is fairly easy to add more query types, but they are limited to
> single-token, non-wildcard types since the query is created from the
> new document's tokens.
> Agents are clustered by the fields they require, and each cluster
> is stored in its own index. When a new document is sent to the
> AgentManager it creates one query per possible cluster. I'm not
> sure this actually speeds things up, just a gut feeling.
> Example agents in pseudo trigger query language:
> Possible agent:
> AND (OR ("category","media"))
> AND (OR ("name", "hotel") OR ("name","rowanda"))
> AND (NOT("name", "paradise"))
> Impossible agent:
> AND (OR ("category","media"))
> AND (("name", "hotel") AND ("name","rowanda"))
> AND (NOT("name", "paradise"))
> In effect the agents can't trigger on AND queries of the same field.
> One could of course run a more complex query against the new
> document when an agent triggers, use some classifier or whatever, if
> speed is not a big deal. The agent triggers could then be built from
> the original query. I probably won't implement such a thing myself.
> Should I post the code to the sandbox when I've tested it? Are  
> there any restrictions to the code if I do that?
> -- 
> karl
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
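
For comparison, the "backwards" index described in the quoted message
could look roughly like the sketch below.  It is not karl's code: the
class names TriggerAgent and TriggerIndex are hypothetical, plain maps
stand in for the Lucene index, and fuzzy matching and the clustering by
required fields are left out.  Agents are posted under their
(field, token) pairs, and the tokens of a new document are used to look
them up:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** One agent: per field, tokens of which at least one must match, plus vetoing NOT tokens. */
class TriggerAgent {
    final String name;
    final Map<String, Set<String>> anyOf = new HashMap<>();   // field -> OR tokens
    final Map<String, Set<String>> noneOf = new HashMap<>();  // field -> NOT tokens

    TriggerAgent(String name) {
        this.name = name;
    }

    TriggerAgent or(String field, String token) {
        anyOf.computeIfAbsent(field, f -> new HashSet<>()).add(token);
        return this;
    }

    TriggerAgent not(String field, String token) {
        noneOf.computeIfAbsent(field, f -> new HashSet<>()).add(token);
        return this;
    }
}

/** Inverted index over agents, queried with the tokens of a new document. */
class TriggerIndex {
    // "field:token" -> agents; assumes field names contain no ':'
    private final Map<String, Set<TriggerAgent>> orPostings = new HashMap<>();
    private final Map<String, Set<TriggerAgent>> notPostings = new HashMap<>();

    private static String key(String field, String token) {
        return field + ":" + token;
    }

    void add(TriggerAgent agent) {
        post(orPostings, agent, agent.anyOf);
        post(notPostings, agent, agent.noneOf);
    }

    private static void post(Map<String, Set<TriggerAgent>> postings, TriggerAgent agent,
                             Map<String, Set<String>> clauses) {
        for (Map.Entry<String, Set<String>> clause : clauses.entrySet()) {
            for (String token : clause.getValue()) {
                postings.computeIfAbsent(key(clause.getKey(), token), k -> new HashSet<>()).add(agent);
            }
        }
    }

    /** docTokens maps field -> tokens of the new document; returns names of triggered agents. */
    Set<String> match(Map<String, Set<String>> docTokens) {
        Map<TriggerAgent, Set<String>> fieldsHit = new HashMap<>();
        Set<TriggerAgent> vetoed = new HashSet<>();
        for (Map.Entry<String, Set<String>> field : docTokens.entrySet()) {
            for (String token : field.getValue()) {
                String k = key(field.getKey(), token);
                for (TriggerAgent a : orPostings.getOrDefault(k, Collections.<TriggerAgent>emptySet())) {
                    fieldsHit.computeIfAbsent(a, x -> new HashSet<>()).add(field.getKey());
                }
                vetoed.addAll(notPostings.getOrDefault(k, Collections.<TriggerAgent>emptySet()));
            }
        }
        Set<String> fired = new TreeSet<>();
        for (Map.Entry<TriggerAgent, Set<String>> hit : fieldsHit.entrySet()) {
            TriggerAgent a = hit.getKey();
            // every field the agent names must have been hit, and no NOT token may match
            if (!vetoed.contains(a) && hit.getValue().containsAll(a.anyOf.keySet())) {
                fired.add(a.name);
            }
        }
        return fired;
    }
}
```

The lookup cost depends on the number of tokens in the document rather
than on the number of agents.  The same-field AND restriction from the
quoted message shows up here too: a posting hit only says that some
token of a field matched, not that all of them did.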
