lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Query fields with data of certain length
Date Thu, 01 Feb 2018 09:05:34 GMT
Hi Edwin,
Unfortunately, I was not able to find a regex that would work in your case.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
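
For reference, the two lengths discussed further down this thread (12 characters versus an expected 36) are consistent with counting Unicode characters on one hand and UTF-8 bytes on the other. The short Python sketch below illustrates that difference for the example string. It is not Solr code: the helper name exceeds_byte_limit is hypothetical, and UTF-8 is assumed as the encoding behind the expected byte count.

```python
# Illustrative sketch only, not part of Solr.
# Assumption: the "expected length" in the thread is the UTF-8 byte length.
subject = "预支款管理及账务处理办法"

char_count = len(subject)                  # number of Unicode code points
byte_count = len(subject.encode("utf-8"))  # number of bytes once UTF-8 encoded

print(char_count, byte_count)  # prints "12 36"


def exceeds_byte_limit(text: str, limit: int = 255) -> bool:
    """Hypothetical helper: True if the UTF-8 form of text is longer than limit bytes."""
    return len(text.encode("utf-8")) > limit
```

A check like this could be run on the source data before indexing, or on documents exported from the index, to flag values longer than 255 bytes regardless of language.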



> On 1 Feb 2018, at 05:42, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> 
> Hi,
> 
> Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法?
> 
> Regards,
> Edwin
> 
> 
> On 4 January 2018 at 18:04, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
> wrote:
> 
>> Hi Emir,
>> 
>> An example of the string in Chinese is 预支款管理及账务处理办法
>> 
>> The number of characters is 12, but the expected length should be 36.
>> 
>> Regards,
>> Edwin
>> 
>> 
>> On 4 January 2018 at 16:21, Emir Arnautović <emir.arnautovic@sematext.com>
>> wrote:
>> 
>>> Hi Edwin,
>>> I don’t have enough knowledge of East Asian languages to know what number
>>> is expected when you ask for string length. Maybe you can try some of the
>>> regex Unicode settings and see if you get what you need: try setting the
>>> unicode flag with (?U) or try using regex groups and ranges. If you provide
>>> an example string and the expected length, maybe we could provide you a regex.
>>> 
>>> Thanks,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 4 Jan 2018, at 04:37, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>>>> 
>>>> Hi Emir,
>>>> 
>>>> So this would likely be different from what the operating system counts, as
>>>> the operating system may consider each Chinese character as 3 to 4 bytes.
>>>> Which is probably why I could not find any record with subject:/.{255,}.*/
>>>> 
>>>> Are there other tools that we can use to query the length of data that is
>>>> already indexed and is not in standard English? (E.g.
>>>> Chinese, Japanese, etc.)
>>>> 
>>>> Regards,
>>>> Edwin
>>>> 
>>>> On 3 January 2018 at 23:51, Emir Arnautović <emir.arnautovic@sematext.com>
>>>> wrote:
>>>> 
>>>>> Hi Edwin,
>>>>> I do not know, but my guess would be that each character is counted as 1
>>>>> in regex regardless of how many bytes it takes in the encoding used.
>>>>> 
>>>>> Regards,
>>>>> Emir
>>>>> --
>>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 3 Jan 2018, at 16:43, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>>>>>> 
>>>>>> Thanks for the reply.
>>>>>> 
>>>>>> I am doing the search on existing data that has already been indexed, and
>>>>>> it is likely to be a one-time thing.
>>>>>> 
>>>>>> This  subject:/.{255,}.*/  works for English characters. However, there are
>>>>>> Chinese characters in some of the records. The length seems to be more than
>>>>>> 255, but they do not show up in the results.
>>>>>> 
>>>>>> Do you know how the length for Chinese characters and other languages is
>>>>>> being determined?
>>>>>> 
>>>>>> Regards,
>>>>>> Edwin
>>>>>> 
>>>>>> 
>>>>>> On 3 January 2018 at 23:01, Alexandre Rafalovitch <arafalov@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Do that during indexing as Emir suggested. Specifically, use an
>>>>>>> UpdateRequestProcessor chain, probably with the Clone and FieldLength
>>>>>>> processors:
>>>>>>> http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/FieldLengthUpdateProcessorFactory.html
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Alex.
>>>>>>> 
>>>>>>> On 31 December 2017 at 22:00, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I would like to check whether it is possible to query a field which has
>>>>>>>> data of more than a certain length.
>>>>>>>> 
>>>>>>>> For example, I want to query for records where the field subject has
>>>>>>>> more than 255 bytes. Is it possible?
>>>>>>>> 
>>>>>>>> I am currently using Solr 6.5.1.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Edwin
>>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>> 

