manifoldcf-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: UTF-8 Format from Confluence to Solr
Date Mon, 12 Jun 2017 18:52:23 GMT
Hi Marisol,

You can create a ticket from here:
https://issues.apache.org/jira/projects/CONNECTORS

Kind Regards,
Furkan KAMACI


On Mon, 12 Jun 2017 at 18:25, Marisol Redondo <
marisol.redondo.garcia@gmail.com> wrote:

> How can I do that?
>
> On 1 June 2017 at 16:43, Antonio David Pérez Morales <
> adperezmorales@gmail.com> wrote:
>
>> Hi Marisol
>>
>> Would you mind creating a ticket and providing a patch?
>>
>> That way we can test it on our end and include it in the next Manifold
>> release.
>>
>> Thanks
>>
>> Regards
>>
>> 2017-06-01 16:28 GMT+02:00 Marisol Redondo <
>> marisol.redondo.garcia@gmail.com>:
>>
>>> I fixed the problem.
>>>
>>> The problem is that the Confluence connector reads the entity returned
>>> for the request with the default encoding ("ISO-8859-1") rather than UTF-8.
>>>
>>> To fix that, I changed the Confluence connector so that each time it
>>> reads the entity it uses EntityUtils.toString(entity, "UTF-8").
>>>
>>> Thanks
>>>
>>>
>>> On 31 May 2017 at 10:13, Marisol Redondo <
>>> marisol.redondo.garcia@gmail.com> wrote:
>>>
>>>> Hi.
>>>>
>>>> I'm having problems with the encoding when injecting in Solr 6 in
>>>> standalone mode from a Confluence wiki.
>>>>
>>>> I have Manifold 2.5 with Tomcat-8.
>>>>
>>>> The repository connector of the job takes the information from a
>>>> Confluence wiki and the output connector is Solr, using the Tika
>>>> transformation, a custom transformation and a Metadata adjuster.
>>>>
>>>> When the document is injected into Solr, the content of the document
>>>> has some characters that shouldn't be there because they are not in the
>>>> Confluence page, mainly a  character.
>>>>
>>>> I have checked Confluence and the Tomcat server where Manifold is
>>>> running. The HTTP request to Confluence has the Accept-Charset header set
>>>> to UTF-8, and the Solr server is accepting UTF-8.
>>>>
>>>> In the log, I have seen that when the information is retrieved from
>>>> Confluence the content is fine, and when it is sent to Solr it has the
>>>> extra character. I have tried without using any transformer and I get
>>>> the same log entry.
>>>>
>>>> Is this a bug or how can I resolve this?
>>>>
>>>> Thanks for your help
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
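
For reference, the symptom described in this thread can be reproduced with
the JDK alone. The sketch below is not the connector's code: the class name
EntityCharsetDemo and the decode helper are hypothetical stand-ins for reading
an HTTP entity body. It only illustrates why decoding UTF-8 bytes as
ISO-8859-1 produces stray characters, and why passing the charset explicitly
(as EntityUtils.toString(entity, "UTF-8") does in the fix above) avoids them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntityCharsetDemo {

    // Hypothetical stand-in for reading an entity body: decode the raw
    // bytes with whichever charset the caller chooses.
    static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample page text containing U+00E9 (e with acute accent) and
        // U+00A0 (no-break space), encoded as UTF-8 as it would be on the wire.
        byte[] body = "caf\u00e9\u00a0ok".getBytes(StandardCharsets.UTF_8);

        // Decoding with the wrong charset: every two-byte UTF-8 sequence
        // turns into two separate characters.
        String wrong = decode(body, StandardCharsets.ISO_8859_1);

        // Decoding with the charset the bytes were actually written in.
        String right = decode(body, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + wrong.length()); // 9
        System.out.println("UTF-8 length:      " + right.length()); // 7
        System.out.println("stray U+00C2 present: "
                + (wrong.indexOf('\u00c2') >= 0)); // true
    }
}
```

The no-break space is a common source of this effect in wiki content, since
its UTF-8 encoding begins with the byte 0xC2, which ISO-8859-1 maps to a
visible accented letter.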
