lucene-dev mailing list archives

From Steven Rowe
Subject Re: "Advanced" query language
Date Tue, 06 Dec 2005 22:22:13 GMT
Yonik wrote:
> For normal text data, with valid unicode characters that aren't legal
> XML, I'd rather have a simple escaping mechanism.  Something like
> backslash escaping that is easily understood.  Maybe something as
> simple as \00 for &#0; (backslash followed by two hex digits).

I agree with your goal of transparency, especially for the cases of 
human authorship.

However, I don't agree with the idea of an application-specific escape 
syntax.  What if someone wants to use the query metacharacter(s) ('\' in 
your example) literally?  The usual answer is to escape the 
metacharacters, e.g. "\\00" to encode literal "\00".  But *especially* 
for the human-authored cases, introduction of this complexity is less 
than ideal.
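
To make the cost concrete, here is a minimal sketch of what a decoder for such a backslash syntax might look like. The rules are an assumption for illustration only (backslash plus two hex digits for a code point, a doubled backslash for a literal one), and the name `unescape` is hypothetical, not anything in Lucene:

```python
import re

# Assumed rules (not from any Lucene parser):
#   \HH  -> the character with hex code point HH
#   \\   -> one literal backslash
_ESCAPE = re.compile(r"\\(\\|[0-9A-Fa-f]{2})")


def unescape(text: str) -> str:
    """Decode the hypothetical backslash escapes in a term string."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body == "\\":
            return "\\"  # doubled backslash collapses to one
        return chr(int(body, 16))  # two hex digits become a code point

    return _ESCAPE.sub(replace, text)
```

With these rules, the input `\00` decodes to U+0000, while an author who wants the four literal characters `\00` has to remember to write `\\00`.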

An alternative mechanism could be empty XML elements, e.g.:

<Term field="field"><UnicodeCharacter hex="00"/></Term>

Or, less verbosely, with a fixed set of element names (there are 29 
of these, right?: [#x00-#x08] | #x0B | #x0C | [#x0E-#x1F]):

   <Term field="field"><Char00/></Term>
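
As a rough sketch of how a consumer might turn either element form back into the term text, the following uses Python's standard `xml.etree.ElementTree`. The function name `term_text` and the error handling are assumptions for illustration; only the element names `UnicodeCharacter` and `CharHH` come from the examples above:

```python
import re
import xml.etree.ElementTree as ET

# Control characters that XML 1.0 does not allow in character data.
XML_ILLEGAL_CONTROLS = {*range(0x00, 0x09), 0x0B, 0x0C, *range(0x0E, 0x20)}

_CHAR_TAG = re.compile(r"Char([0-9A-Fa-f]{2})$")


def term_text(xml: str) -> str:
    """Rebuild a term's text from mixed content and placeholder elements."""
    term = ET.fromstring(xml)
    parts = [term.text or ""]  # text before the first child element
    for child in term:
        if child.tag == "UnicodeCharacter":
            code = int(child.get("hex"), 16)
        else:
            match = _CHAR_TAG.match(child.tag)
            if match is None:
                raise ValueError(f"unexpected element: {child.tag}")
            code = int(match.group(1), 16)
        if code not in XML_ILLEGAL_CONTROLS:
            raise ValueError(f"U+{code:04X} can be written directly in XML")
        parts.append(chr(code))
        parts.append(child.tail or "")  # text following this element
    return "".join(parts)
```

Ordinary text in the term, backslashes included, passes through untouched, so there is no second escaping layer for an author to learn.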

