lucene-general mailing list archives

From Gareth Harper <gareth.har...@mandp.com>
Subject RE: Can't find Japanese words ending with numbers
Date Wed, 17 Apr 2019 10:44:58 GMT
Could someone please take me off this mailing list?

-----Original Message-----
From: Antonio Facciorusso <A.Facciorusso@westpole.it> 
Sent: 17 April 2019 11:05
To: users@jackrabbit.apache.org; general@lucene.apache.org
Subject: Can't find Japanese words ending with numbers

Dear all,

I'm using Jackrabbit 2.16.1 and Lucene 3.6.2.

I have a node of type "mynodetype" with a property named "description" whose value is
"横浜第2センタ". I run full-text searches on it using "jcr:contains", in this form:

jcr:contains(., '<value>*')

The following query returns 0 results:
"//element(*,mynodetype)[(jcr:contains(., '横浜第2*'))]"

while all of the following work correctly and return at least one result:

"//element(*,mynodetype)[(jcr:contains(., '横浜第2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., '横浜第*'))]"
"//element(*,mynodetype)[(jcr:contains(., '2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., 'センタ*'))]"

I tried using both the default analyzer and the Japanese one (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html).
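
To narrow down which prefixes of the value fail, the queries listed above can be generated for every leading substring and then run one by one against the repository. The following is a minimal sketch of that idea, not part of the original setup: the class and method names are hypothetical, it only builds the query strings (running them through the JCR API is left out), and it assumes the value contains no single quote or other character that would need escaping inside jcr:contains.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixQueries {
    // Builds the XPath statement used in this message for one search prefix.
    // Assumption: the prefix needs no escaping (no quotes or query operators).
    static String prefixQuery(String nodeType, String prefix) {
        return "//element(*," + nodeType + ")[(jcr:contains(., '" + prefix + "*'))]";
    }

    // Returns one query per leading substring of value, shortest first.
    static List<String> prefixQueries(String nodeType, String value) {
        List<String> queries = new ArrayList<>();
        int end = 0;
        while (end < value.length()) {
            // Advance by whole code points so surrogate pairs are never split.
            end += Character.charCount(value.codePointAt(end));
            queries.add(prefixQuery(nodeType, value.substring(0, end)));
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String query : prefixQueries("mynodetype", "横浜第2センタ")) {
            System.out.println(query);
        }
    }
}
```

Running each generated statement with both analyzers and noting which ones return 0 results should show whether the failure is limited to the prefix ending in "2" or affects other boundaries as well.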

This is the content of my indexingConfiguration.xml file:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.1.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <index-rule nodeType="entity">
        <!-- simple properties -->
        <property isRegexp="true">.*:[^_]+</property>
        <!-- resources_data_xxx -->
        <property isRegexp="true">.*:resources_data_[^_]+</property>
        <!-- resources_xxx (with xxx != 'data') -->
        <property isRegexp="true">.*:resources_data[^_]+</property>
        <property isRegexp="true">.*:resources_(?!data)[^_]+</property>
        <!-- resourcesxyz_xxx -->
        <property isRegexp="true">.*:resources[^_]+_[^_]+</property>
        <!-- all other xxx_yyy (with xxx != resources) -->
        <property isRegexp="true">.*:(?!resources)[^_]+_[^_]+</property>
    </index-rule>
</configuration>
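
As a sanity check on the configuration above, the property patterns can be tried against sample property names outside the repository. This is only a rough sketch: it assumes each pattern is matched against the whole name in the form prefix:localName using java.util.regex, which may differ from how Jackrabbit actually applies these rules, and the sample names ("my:description" and so on) are hypothetical.

```java
import java.util.regex.Pattern;

class IndexRuleCheck {
    // Patterns copied from the <property isRegexp="true"> elements, in order.
    static final String[] RULES = {
        ".*:[^_]+",
        ".*:resources_data_[^_]+",
        ".*:resources_data[^_]+",
        ".*:resources_(?!data)[^_]+",
        ".*:resources[^_]+_[^_]+",
        ".*:(?!resources)[^_]+_[^_]+"
    };

    // Returns the index of the first pattern matching the whole name, or -1.
    static int firstMatchingRule(String propertyName) {
        for (int i = 0; i < RULES.length; i++) {
            if (Pattern.matches(RULES[i], propertyName)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical property names, used only for illustration.
        String[] samples = {"my:description", "my:resources_data_en", "description"};
        for (String name : samples) {
            System.out.println(name + " -> rule index " + firstMatchingRule(name));
        }
    }
}
```

Under these assumptions, every pattern requires a colon, so a name written without a namespace prefix matches none of them.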

Should I use a different configuration or analyzer, or is this a bug?

Thank you.

Best regards,
Antonio.

Antonio Facciorusso
WebRainbow(r) Software Analyst & Developer

P +39 051 8550 562
M +39 335 1219330
E A.Facciorusso@westpole.it
W https://westpole.webex.com/meet/A.Facciorusso
A Via Ettore Cristoni, 84 - 40033 Casalecchio di Reno
