manifoldcf-user mailing list archives

From Marisol Redondo <marisol.redondo.gar...@gmail.com>
Subject UTF-8 Format from Confluence to Solr
Date Wed, 31 May 2017 09:13:32 GMT
Hi.

I'm having problems with the encoding when injecting content into Solr 6
(in standalone mode) from a Confluence wiki.

I have Manifold 2.5 with Tomcat-8.

The job's repository connector takes the information from a
Confluence wiki and the output connector is Solr, using the Tika
transformation, a custom transformation and a Metadata adjuster.

When the document is injected into Solr, its content contains some
characters that shouldn't be there because they are not in the Confluence
page, mainly a  character.

I have checked Confluence and the Tomcat server where Manifold is running;
the HTTP request to Confluence has the Accept-Charset header set to UTF-8,
and the Solr server is accepting UTF-8.

In the log, I have seen that when the information is retrieved from
Confluence, the content is fine, but when it is sent to Solr, it has the
character. I have also tried without using any transformer and I get the
same log entry.
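
To illustrate the kind of symptom described above, here is a minimal
sketch of one common way a stray character can appear between two UTF-8
systems: UTF-8 bytes being decoded as ISO-8859-1 somewhere along the way.
This is only an assumption for illustration; nothing in the log confirms
that this is the cause here. The sample text, the non-breaking space in
it, and the function names decode_as_latin1 and repair are all
hypothetical.

```python
def decode_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Reverse the mistake: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


# Hypothetical sample: a non-breaking space (U+00A0), which wiki editors
# often insert, is two bytes in UTF-8 (0xC2 0xA0).
original = "Solr\u00a06"
garbled = decode_as_latin1(original)

# Each of the two bytes becomes its own character, so the garbled text is
# one character longer than the original.
print(len(original), len(garbled))
print(repair(garbled) == original)
```

If the extra character seen in Solr matches the first byte of a two-byte
UTF-8 sequence like this, that would point to a charset mismatch in one
hop of the pipeline rather than to the page content itself.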

Is this a bug or how can I resolve this?

Thanks for your help
