jackrabbit-oak-issues mailing list archives

From "Fabrizio Fortino (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (OAK-9123) Error: Document contains at least one immense term
Date Fri, 26 Jun 2020 16:50:00 GMT

     [ https://issues.apache.org/jira/browse/OAK-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabrizio Fortino resolved OAK-9123.
-----------------------------------
    Resolution: Fixed

Fixed with revision 1879243

> Error: Document contains at least one immense term
> --------------------------------------------------
>
>                 Key: OAK-9123
>                 URL: https://issues.apache.org/jira/browse/OAK-9123
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: elastic-search, indexing, search
>            Reporter: Fabrizio Fortino
>            Assignee: Fabrizio Fortino
>            Priority: Major
>
> {code:java}
> 11:35:09.400 [I/O dispatcher 1] ERROR o.a.j.o.p.i.e.i.ElasticIndexWriter - Bulk item with id /wikipedia/76/84/National Palace (Mexico) failed
> org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="text.keyword" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[123, 123, 73, 110, 102, 111, 98, 111, 120, 32, 104, 105, 115, 116, 111, 114, 105, 99, 32, 98, 117, 105, 108, 100, 105, 110, 103, 10, 124, 110]...', original message: bytes can be at most 32766 in length; got 33409]
> at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> at org.elasticsearch.action.bulk.BulkItemResponse.fromXContent(BulkItemResponse.java:138)
> at org.elasticsearch.action.bulk.BulkResponse.fromXContent(BulkResponse.java:196)
> at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1888)
> at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAsyncAndParseEntity$10(RestHighLevelClient.java:1676)
> at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1758)
> at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:590)
> at org.elasticsearch.client.RestClient$1.completed(RestClient.java:333)
> at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327)
> at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122)
> at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181)
> at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448)
> at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338)
> at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265)
> at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
> at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
> at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114)
> at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162)
> at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337)
> at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
> at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276)
> at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
> at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 33409]
> at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:496)
> at org.elasticsearch.ElasticsearchException.fromXContent(ElasticsearchException.java:407)
> at org.elasticsearch.ElasticsearchException.innerFromXContent(ElasticsearchException.java:437)
> ... 24 common frames omitted{code}
> This happens with huge keyword fields, since Lucene doesn't allow terms longer than 32766 bytes.
> See [https://discuss.elastic.co/t/error-document-contains-at-least-one-immense-term-in-field/66486]
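Note that the limit applies to the UTF-8 byte length of the term, not its character count, so multi-byte text can trip it well before 32766 characters. A minimal illustration of the distinction (the function name and sample values below are ours, not from Oak or Lucene):

```python
# Lucene rejects any single term whose UTF-8 encoding exceeds 32766 bytes,
# as reported in the error above. Character count alone is not enough to
# predict this, because UTF-8 encodes non-ASCII characters in multiple bytes.
MAX_TERM_BYTES = 32766

def exceeds_term_limit(value: str) -> bool:
    """Return True if the value's UTF-8 encoding is too long to index as a single term."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES

ascii_value = "a" * 32766       # 32766 chars -> 32766 bytes: still fits
multibyte_value = "é" * 20000   # 20000 chars -> 40000 bytes: rejected

print(exceeds_term_limit(ascii_value))      # False
print(exceeds_term_limit(multibyte_value))  # True
```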
> We have decided to always create keyword fields, to remove the need to specify properties like ordered or facet; this way every field can be sorted or used as a facet.
> In this specific case the keyword field isn't needed at all, but it would be hard to decide when to include it and when not. To solve this we are going to use `ignore_above=256`, so huge keyword values will be ignored.
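With `ignore_above` set, Elasticsearch simply skips indexing any keyword value longer than the threshold instead of failing the whole bulk item. The resulting mapping would look roughly like the sketch below (the field name `text` is taken from the error above; the exact mapping Oak generates may differ):

```json
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

Values longer than 256 characters are still stored in `_source` and searchable via the analyzed `text` field; they are only excluded from the `text.keyword` sub-field used for sorting and faceting.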



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
