xml-xindice-users mailing list archives

From C F <tacnab...@yahoo.com>
Subject Re: Should I Use Xindice for web-searchable XML Docs?
Date Wed, 05 Mar 2003 16:29:23 GMT

Those are some very interesting comments about Lucene & Xindice... I wasn't even aware
of Lucene's existence. I'm surprised that there hasn't been more emphasis placed on searchable
XML docs in the IT world in general. It seems to me that they have the potential to make our
text searches 100 times more potent. Well anyway, you've given me a lot to think about;
I'm going to dig in a bit deeper now :)  Thanks

"David J. Thomson" <dthomson@eecs.tufts.edu> wrote:

Hello,

As far as I can tell, you can do basic pattern matching with the XPath
support in Xindice. You can search with an XPath query something like:

//contact[@zipcode=""]
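
To make the XPath idea concrete, here is a minimal sketch that evaluates queries of this shape with the JDK's standard javax.xml.xpath API rather than through Xindice's own query service, so it can run on its own. The sample document, its zipcode values, and the class name ContactQuery are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContactQuery {
    // Hypothetical sample data; element and attribute names mirror the
    // //contact[@zipcode=...] example, the values are made up.
    static final String SAMPLE =
        "<contacts>"
        + "<contact zipcode=\"12345\"><name>Ada Smith</name></contact>"
        + "<contact zipcode=\"67890\"><name>Bob Jones</name></contact>"
        + "<contact zipcode=\"12345\"><name>Cy Smithers</name></contact>"
        + "</contacts>";

    // Evaluates an XPath expression that selects <contact> elements and
    // returns the text of each match's <name> child.
    static List<String> names(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Element contact = (Element) hits.item(i);
            out.add(contact.getElementsByTagName("name").item(0).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Exact match on an attribute value.
        System.out.println(names(SAMPLE, "//contact[@zipcode='67890']"));      // [Bob Jones]
        // Substring match on element content: XPath 1.0 has contains() and
        // starts-with(), but no regular expressions.
        System.out.println(names(SAMPLE, "//contact[contains(name, 'Smith')]")); // [Ada Smith, Cy Smithers]
    }
}
```

Note that contains(name, 'Smith') also matches "Smithers": the XPath 1.0 string functions do plain substring matching, not word or pattern matching.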

I've been thinking about implementing searching in my own application, and
I was wondering what people thought about Lucene and synergies with
Xindice:

http://jakarta.apache.org/lucene/docs/index.html

I'm just brainstorming, but couldn't you create some type of translator to
associate an XML document in Xindice with a Document in Lucene? When a
search returns certain documents, grab them from Xindice using a unique
message_id associated with every document. Xindice then provides the
transaction integrity, security, and fast lookup, while Lucene could be
used just for more advanced indexing. Some type of wrapper would probably
be required to keep the documents synchronized (this could become a
big headache; I'm not sure). The XPath support in Xindice could help to limit
the scope of queries into the Lucene index as negative clauses. In my
case, for example, I would run an XPath query for a particular type of
document (contact, message, transcript, sphere, etc.), and then use that
subset to query the index.
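
A rough sketch of the synchronizing wrapper described above, assuming a single shared message_id per document. To keep it self-contained, a HashMap stands in for the Xindice collection and a toy inverted index stands in for Lucene; the class SyncedStore and its methods are hypothetical and call neither library's real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: xmlById stands in for a Xindice collection and postings
// stands in for a Lucene index. Only the shared message_id ties them together.
class SyncedStore {
    private final Map<String, String> xmlById = new HashMap<>();        // message_id -> XML
    private final Map<String, Set<String>> postings = new HashMap<>();  // term -> message_ids
    private final Map<String, Set<String>> termsById = new HashMap<>(); // message_id -> terms, for unindexing

    // Crude "translator": strips tags and lowercases the remaining words.
    private static Set<String> terms(String xml) {
        Set<String> out = new HashSet<>();
        for (String t : xml.replaceAll("<[^>]*>", " ").toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Every write goes through the wrapper, so store and index change together.
    void put(String messageId, String xml) {
        delete(messageId); // drop stale index entries from an older version
        xmlById.put(messageId, xml);
        Set<String> ts = terms(xml);
        for (String t : ts) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(messageId);
        }
        termsById.put(messageId, ts);
    }

    void delete(String messageId) {
        xmlById.remove(messageId);
        Set<String> ts = termsById.remove(messageId);
        if (ts == null) return;
        for (String t : ts) {
            Set<String> ids = postings.get(t);
            ids.remove(messageId);
            if (ids.isEmpty()) postings.remove(t);
        }
    }

    // Index lookup restricted to `scope` (e.g. the IDs an XPath query returned
    // for one document type); matching documents are then fetched by ID.
    List<String> search(String term, Set<String> scope) {
        Set<String> ids = postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        List<String> results = new ArrayList<>();
        for (String id : new TreeSet<>(ids)) { // sorted only for deterministic output
            if (scope.contains(id)) results.add(xmlById.get(id));
        }
        return results;
    }

    public static void main(String[] args) {
        SyncedStore store = new SyncedStore();
        store.put("m1", "<contact><name>Ada Smith</name></contact>");
        store.put("m2", "<message><body>Call Smith back</body></message>");
        // Pretend an XPath query for contact documents returned only m1.
        Set<String> contacts = new HashSet<>(Arrays.asList("m1"));
        System.out.println(store.search("smith", contacts)); // [<contact><name>Ada Smith</name></contact>]
    }
}
```

Routing every put and delete through one object is what keeps the two sides from drifting apart; the "headache" in practice would be doing the same across two real systems that can fail independently.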

Another option for searching, which someone already mentioned, would be to
use some regex libraries to augment XPath. I recommend Jakarta ORO, though I
haven't thoroughly evaluated the alternatives, simply because ORO suits my
needs quite well. Still, you're better off using a search "library" if you
can, simply because of all the factors you'll want to consider, such as word
proximity, frequency of repetition, stop words, etc., which will give you the
"rank" you're talking about.
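
A minimal sketch of augmenting XPath with a regex pass, assuming the selected nodes are small enough to filter in memory. It uses the JDK's built-in java.util.regex rather than Jakarta ORO, so treat it as an illustration of the approach, not of ORO's API; the sample transcript and the class name RegexAugmentedXPath are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RegexAugmentedXPath {
    // Step 1: XPath narrows the document to candidate nodes.
    // Step 2: a regular expression (which XPath 1.0 cannot express) filters their text.
    static List<String> select(String xml, String xpathExpr, String regex) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(xpathExpr, doc, XPathConstants.NODESET);
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            String text = nodes.item(i).getTextContent();
            if (pattern.matcher(text).find()) matches.add(text);
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical transcript document.
        String xml = "<transcript>"
            + "<line>Call me at 555-0123</line>"
            + "<line>No number here</line>"
            + "</transcript>";
        // Lines containing something shaped like a phone number.
        System.out.println(select(xml, "//line", "\\d{3}-\\d{4}")); // [Call me at 555-0123]
    }
}
```

Letting XPath do the coarse selection first keeps the regex work limited to the nodes you actually care about.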

David



On Thu, 27 Feb 2003, Cédric Viaud wrote:

>
> Hi,
>
> In fact, with XPath, you can search for node values and search for a
> particular pattern in them. You are not limited to attribute values.
> But I don't think this kind of retrieval will be very fast.
>
> It also may not be very efficient, as you can't search for regular
> expressions in the content (I'm almost sure, but it should be
> checked).
>
> Regards,
>
> Cédric
> -----Original Message-----
> From: C F [mailto:tacnaboyz@yahoo.com]
> Sent: Wednesday, February 26, 2003 6:42 PM
> To: xindice-users@xml.apache.org
> Subject: RE: Should I Use Xindice for web-searchable XML Docs?
>
>
> Thanks Bob,
>
> Yes, that does help a lot as far as my performance concerns go.
> I'd also like some feedback on just the overall idea of what I want to
> do with Xindice. Has anyone used it as a kind of 'search engine' like
> I'm talking about? Is the XPath support in Xindice fairly robust?
> I'm not sure yet whether or not it's an absolute requirement, but it
> seems like it would be pretty difficult to *rank* search
> results... has anybody done anything like this with Xindice?
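
On the ranking question, a toy scorer can show how the factors David mentions (frequency of repetition, stop words, word proximity) might combine into a rank. This is a hypothetical sketch with made-up weights, stop-word list and sample text; it is not Lucene's scoring formula, and a real search library handles these factors far more thoroughly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy relevance scoring over plain text, only to illustrate the factors
// named in the thread; the weights are arbitrary.
class ToyRanker {
    // Hypothetical, deliberately tiny stop-word list.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    // Lowercase, split on non-word characters, drop stop words.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) out.add(t);
        }
        return out;
    }

    // Frequency: one point per occurrence of a query term.
    // Proximity: plus 1/d, where d is the smallest gap (in non-stop tokens)
    // between occurrences of two different query terms.
    static double score(String doc, String query) {
        Set<String> terms = new HashSet<>(tokens(query));
        List<String> words = tokens(doc);
        int frequency = 0;
        int lastPos = -1;
        String lastTerm = null;
        int bestGap = Integer.MAX_VALUE;
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            if (!terms.contains(w)) continue;
            frequency++;
            // Comparing consecutive query-term occurrences is enough to find
            // the closest pair of distinct terms.
            if (lastTerm != null && !lastTerm.equals(w)) bestGap = Math.min(bestGap, i - lastPos);
            lastPos = i;
            lastTerm = w;
        }
        double proximity = bestGap == Integer.MAX_VALUE ? 0.0 : 1.0 / bestGap;
        return frequency + proximity;
    }

    // Highest score first; the stable sort keeps input order for ties.
    static List<String> rank(List<String> docs, String query) {
        List<String> sorted = new ArrayList<>(docs);
        sorted.sort((x, y) -> Double.compare(score(y, query), score(x, query)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "Xindice stores XML documents",
            "Search the XML database with XPath",
            "XML search and XML search again");
        // Prints the three documents with scores 5.0, 3.0 and 1.0, in that order.
        for (String d : rank(docs, "xml search")) {
            System.out.println(score(d, "xml search") + "  " + d);
        }
    }
}
```

Dropping stop words before measuring proximity means "Search the XML" counts as adjacent terms, which is usually what you want for ranking.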




