xml-xindice-users mailing list archives

From David Vergnaud <dvergn...@yahoo.com>
Subject Re: Incorrect "No result for query" for XPath expression
Date Thu, 22 Oct 2009 10:14:55 GMT
Hi Natalia,

Thanks a lot for your reply and for making this clear to me. I thought that when a function
such as contains() receives a set of text nodes as its argument, the default behaviour was to
apply the processing to each node separately and return a set of booleans. Since that is
wrong, I see your point entirely. It appears that the online tool I used to check my claim
made the same incorrect assumption.

Again thanks and regards,

David



----- Original Message ----
From: Natalia Shilenkova <nshilenkova@gmail.com>
To: xindice-users@xml.apache.org
Sent: Thu, October 22, 2009 12:13:55 AM
Subject: Re: Incorrect "No result for query" for XPath expression

David,

The problem you're describing is not a bug; your XPath query is executed correctly.

Let's see what happens when the query
/martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]
is executed. First, XPath finds all nodes with the path
/martif/text/body/termEntry/langSet/ntig/termGrp/term
and selects their child text nodes. The result of this step is a node-set that includes the
<term> data for every language. Then XPath evaluates the function contains(), whose first
argument is that node-set. Per the XPath specification [1], contains() expects two arguments
of type string, not node-set, so the first argument is converted to a string as if by the
function string(). When applied to a node-set, string() returns the string value of the
_first_ node in document order.

Instead of checking the <term> data for every language, the query only checks whether the
<term> data contains the given string for the language that happens to come first in the
document. You can easily verify this by rearranging the order of the langSet tags in the
document. The query
/martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]
works for the same reason: the contains() function only gets one langSet.
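
To make the conversion rule concrete, here is a rough sketch in Python. It does not run
Xindice or a real XPath engine; the standard-library ElementTree module has no contains(), so
the helper below only imitates the "first node in document order" rule by hand. The trimmed
document, the plain lang attribute and the name contains_like_xpath1 are all made up for
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the termEntry in the original post.
DOC = """
<termEntry>
  <langSet lang="de-CH"><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>
  <langSet lang="fr-CH"><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>
</termEntry>
"""

def contains_like_xpath1(entry, path, needle):
    """Imitate contains(<path>/text(), needle): only the first text node is tested."""
    # findall() returns matches in document order; empty <term/> elements have no text node.
    texts = [t.text for t in entry.findall(path) if t.text is not None]
    first = texts[0] if texts else ""  # string() of an empty node-set is ""
    return needle in first

entry = ET.fromstring(DOC)
path = "langSet/ntig/termGrp/term"
print(contains_like_xpath1(entry, path, "bancaire"))  # False: the German term comes first
print(contains_like_xpath1(entry, path, "Bank"))      # True

# Move the French langSet to the front, as suggested above.
fr = entry.findall("langSet")[1]
entry.remove(fr)
entry.insert(0, fr)
print(contains_like_xpath1(entry, path, "bancaire"))  # True: French is now first
```

The only thing that changes between the first and the last call is the order of the langSet
elements, which is what makes the original query look as if it were language-dependent.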

If you want a query that checks all the text nodes to see whether any of them contains some
substring, you can try something like this:
/martif/text/body/termEntry[langSet/ntig/termGrp/term[contains(text(),'bancaire')]]
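
The difference is that the predicate now sits on each <term>, so contains() is evaluated once
per term rather than once for the whole node-set. A hand-written Python imitation of that
per-node check could look like the sketch below; again this is not a real XPath evaluation,
the sample data is trimmed from the original post, and entry_matches is a hypothetical helper
name. It assumes each <term> holds plain text only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample with three of the four languages.
DOC = """
<termEntry>
  <langSet lang="de-CH"><ntig><termGrp><term>Bankgarantie</term></termGrp></ntig></langSet>
  <langSet lang="en-GB"><ntig><termGrp><term>bank guarantee</term></termGrp></ntig></langSet>
  <langSet lang="fr-CH"><ntig><termGrp><term>garantie bancaire</term></termGrp></ntig></langSet>
</termEntry>
"""

def entry_matches(entry, needle):
    """Imitate termEntry[langSet/ntig/termGrp/term[contains(text(), needle)]]."""
    # The test runs separately for every <term>; one hit is enough to keep the entry.
    return any(needle in (term.text or "")
               for term in entry.findall("langSet/ntig/termGrp/term"))

entry = ET.fromstring(DOC)
print(entry_matches(entry, "bancaire"))   # True, although French is last in the document
print(entry_matches(entry, "guarantee"))  # True
print(entry_matches(entry, "banque"))     # False, no term contains it
```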

[1] http://www.w3.org/TR/1999/REC-xpath-19991116

Regards,
Natalia


On Oct 21, 2009, at 12:02 PM, David Vergnaud wrote:

> Hi,
> 
> I'm reporting on a problem which I'm pretty much convinced is a bug in the current 1-2.dev
> version of Xindice (1.2m1). I'm using Xindice running on its own (no Tomcat) as a daemon on
> a Linux box (Suse 11) with JDK 1.6.
> 
> Basically, I have a DB where I've stored terminology entries that contain information
> about various banking terms in 4 languages. I want to be able to conduct two types of searches,
> one where the term is searched for only one of the languages, and one where the search is
> carried out in all languages. For this, I use two versions of a somewhat complicated XPath
> expression: one where the language is specified (as attribute of one of the nodes, in a predicate)
> and one where it isn't. This is the only difference between the two expressions. Surprisingly,
> the one where the language is fixed does return results where the one without specification
> doesn't. Besides, I've tested the XPath expression on other systems, and seen that there really
> should be results.
> 
> The first impression is that when evaluating function arguments inside a predicate, only
> the first node of a node set is evaluated. In my case, that would be confirmed by the following
> fact: each entry contains first the German word, then either French or English. When doing
> an "unrefined" search (no language specification) with a German word, results are returned.
> When doing the same unrefined search with French or English, no results are returned.
> 
> Here's an example of an XPath we're using, first with the language refinement, then without:
> /martif/text/body/termEntry[contains(langSet[starts-with(@lang,'fr')]/ntig/termGrp/term/text(),'bancaire')]
> /martif/text/body/termEntry[contains(langSet/ntig/termGrp/term/text(),'bancaire')]
> 
> As you can see, the goal is to extract a termEntry element which contains the word "bancaire"
> under the specified path. In the first path, I set the langSet to have attribute lang start
> with "fr" (for French), in the second I don't. As I said before, the first expression yields
> a result and the second one doesn't.
> 
> I'm including an example DB entry which can be used to test this -- I assume it should
> be possible to observe this behaviour with only one entry in the DB as well. In order to use
> the xpath above with it, one would need to prefix all node names in the xpath expression with
> "tbx" (I only removed that for legibility).
> 
> Should this prove to be an error on my side, I'd be grateful to anyone who'd point it
> out. Otherwise, it might need to be taken onto the Xindice bug list.
> 
> Cheers,
> 
> David
> 
> <?xml version="1.0"?>
> <martif xmlns="http://www.lisa.org/tbx" type="TBX" xml:lang="de-CH">
>  <martifHeader>
>    <fileDesc>
>      <titleStmt>
>        <title>
>          Test-TerminologieDB        </title>
>      </titleStmt>
>      <publicationStmt>
>        <p>
>           Version 1.1        </p>
>      </publicationStmt>
>      <sourceDesc>
>        <p>
>           Version 1.1        </p>
>      </sourceDesc>
>    </fileDesc>
>  </martifHeader>
>  <text>
>    <body>
>      <termEntry>
>        <descrip type="classificationCode" />
>        <descrip type="subjectField">
>        </descrip>
>        <langSet xml:lang="de-CH">
>          <transacGrp>
>            <transac type="transactionType">
>              created            </transac>
>            <transacNote type="responsibility">
>              STEA            </transacNote>
>            <date>
>              2009-09-15T14:44:54.924+02:00            </date>
>          </transacGrp>
>          <descrip type="reliabilityCode">
>            1          </descrip>
>          <note />
>          <descripGrp>
>            <descrip type="definition">
>              Die Garantie ist eine selbstständige, vom Hauptschuldverhältnis unabhängige
>              Verpflichtung. Der Garant (die Bank) kann keinerlei Einwendungen und Einreden aus dem Grundgeschäft
>              erheben. Das heisst: Der Garant zahlt auf erste schriftliche Anforderung (Inanspruchnahme)
>              des Begünstigten, gegen Einreichung der im Garantietext vorgeschriebenen Bestätigung und
>              allenfalls vorgeschriebenen Dokumente.            </descrip>
>            <adminGrp>
>              <admin type="source">
>                CS Glossar              </admin>
>            </adminGrp>
>          </descripGrp>
>          <ntig>
>            <termGrp>
>              <term>
>                Bankgarantie              </term>
>              <termNote type="partOfSpeech" />
>              <termNote type="grammaticalGender" />
>              <termNote type="grammaticalNumber" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>              <termNote type="termType">
>                main              </termNote>
>              <termNote type="usageNote" />
>            </termGrp>
>            <adminGrp>
>              <admin type="source">
>                CS Glossar              </admin>
>              <note />
>            </adminGrp>
>            <descripGrp>
>              <descrip type="example" />
>              <adminGrp>
>                <admin type="source" />
>              </adminGrp>
>            </descripGrp>
>            <note />
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                abbr              </termNote>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                syn              </termNote>
>              <termNote type="grammaticalGender" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>              <note />
>            </adminGrp>
>            <descrip type="example" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>            <note />
>          </ntig>
>        </langSet>
>        <langSet xml:lang="en-GB">
>          <transacGrp>
>            <transac type="transactionType">
>              created            </transac>
>            <transacNote type="responsibility">
>              STEA            </transacNote>
>            <date>
>              2009-09-15T14:44:54.924+02:00            </date>
>          </transacGrp>
>          <descrip type="reliabilityCode">
>            1          </descrip>
>          <note />
>          <descripGrp>
>            <descrip type="definition" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </descripGrp>
>          <ntig>
>            <termGrp>
>              <term>
>                bank guarantee              </term>
>              <termNote type="partOfSpeech" />
>              <termNote type="grammaticalGender" />
>              <termNote type="grammaticalNumber" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>              <termNote type="termType">
>                main              </termNote>
>              <termNote type="usageNote" />
>            </termGrp>
>            <adminGrp>
>              <admin type="source">
>                CS Glossar              </admin>
>              <note />
>            </adminGrp>
>            <descripGrp>
>              <descrip type="example" />
>              <adminGrp>
>                <admin type="source" />
>              </adminGrp>
>            </descripGrp>
>            <note />
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                abbr              </termNote>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                syn              </termNote>
>              <termNote type="grammaticalGender" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>              <note />
>            </adminGrp>
>            <descrip type="example" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>            <note />
>          </ntig>
>        </langSet>
>        <langSet xml:lang="fr-CH">
>          <transacGrp>
>            <transac type="transactionType">
>              created            </transac>
>            <transacNote type="responsibility">
>              STEA            </transacNote>
>            <date>
>              2009-09-15T14:44:54.924+02:00            </date>
>          </transacGrp>
>          <descrip type="reliabilityCode">
>            1          </descrip>
>          <note />
>          <descripGrp>
>            <descrip type="definition" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </descripGrp>
>          <ntig>
>            <termGrp>
>              <term>
>                garantie bancaire              </term>
>              <termNote type="partOfSpeech" />
>              <termNote type="grammaticalGender" />
>              <termNote type="grammaticalNumber" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>              <termNote type="termType">
>                main              </termNote>
>              <termNote type="usageNote" />
>            </termGrp>
>            <adminGrp>
>              <admin type="source">
>                CS Glossar              </admin>
>              <note />
>            </adminGrp>
>            <descripGrp>
>              <descrip type="example" />
>              <adminGrp>
>                <admin type="source" />
>              </adminGrp>
>            </descripGrp>
>            <note />
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                abbr              </termNote>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                syn              </termNote>
>              <termNote type="grammaticalGender" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>              <note />
>            </adminGrp>
>            <descrip type="example" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>            <note />
>          </ntig>
>        </langSet>
>        <langSet xml:lang="it-CH">
>          <transacGrp>
>            <transac type="transactionType">
>              created            </transac>
>            <transacNote type="responsibility">
>              STEA            </transacNote>
>            <date>
>              2009-09-15T14:44:54.924+02:00            </date>
>          </transacGrp>
>          <descrip type="reliabilityCode">
>            1          </descrip>
>          <note />
>          <descripGrp>
>            <descrip type="definition" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </descripGrp>
>          <ntig>
>            <termGrp>
>              <term>
>                garanzia bancaria              </term>
>              <termNote type="partOfSpeech" />
>              <termNote type="grammaticalGender" />
>              <termNote type="grammaticalNumber" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>              <termNote type="termType">
>                main              </termNote>
>              <termNote type="usageNote" />
>            </termGrp>
>            <adminGrp>
>              <admin type="source">
>                CS Glossar              </admin>
>              <note />
>            </adminGrp>
>            <descripGrp>
>              <descrip type="example" />
>              <adminGrp>
>                <admin type="source" />
>              </adminGrp>
>            </descripGrp>
>            <note />
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                abbr              </termNote>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>          </ntig>
>          <ntig>
>            <termGrp>
>              <term />
>              <termNote type="termType">
>                syn              </termNote>
>              <termNote type="grammaticalGender" />
>              <termCompList type="lemma">
>                <termComp />
>              </termCompList>
>              <termCompList type="morphologicalElement">
>                <termComp />
>              </termCompList>
>            </termGrp>
>            <adminGrp>
>              <admin type="source" />
>              <note />
>            </adminGrp>
>            <descrip type="example" />
>            <adminGrp>
>              <admin type="source" />
>            </adminGrp>
>            <note />
>          </ntig>
>        </langSet>
>      </termEntry>
>    </body>
>  </text>
> </martif>
> 
