kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay Kreps <jay.kr...@gmail.com>
Subject Re: encoding issue in kafka
Date Tue, 19 Jun 2012 05:14:48 GMT
It is definitely a problem in StringEncoder/StringDecoder. They are weirdly
packaged in with the Encoder.scala/Decoder.scala (maybe should be moved
out). I filed a JIRA for it and marked it with newbie label in case anyone
wants to take a shot:
https://issues.apache.org/jira/browse/KAFKA-367

-Jay

On Mon, Jun 18, 2012 at 6:32 PM, Chris Burroughs
<chris.burroughs@gmail.com>wrote:

> I thought there was another thread or ticket about this but I can't find
> it.  Did we ever get to the bottom of this?
>
> On 06/06/2012 05:45 PM, Roman Garcia wrote:
> > I guess having to set the platform encoding shouldn't be the fix, right?
> > If that's the case, then somewhere in the code the default encoding
> > (platform) is being used...that was the StringEncoder I understand.
> > Then again, I believe these were removed from trunk, right? (at least, I
> > can't seem to find em)
> > If they weren't removed (just moved outside), it should be considered to
> > start using the expected String constructor: String(bytes, charset)
> >
> > Regards,
> > Roman
> >
> > 2012/6/6 Patricio Echagüe <patricioe@gmail.com>
> >
> >> I ran into the same issue and setting -Dfile.encoding=UTF-8 in the
> startup
> >> script fixed it.
> >>
> >> On Wed, Jun 6, 2012 at 12:18 AM, 刘明敏 <diveintotomorrow@gmail.com>
> wrote:
> >>
> >>> sorry for the late reply, it is my fault
> >>>
> >>> after set -Dfile.encoding=UTF-8  when start up producer
> >>>
> >>> problem solved.
> >>>
> >>> On Sat, Jun 2, 2012 at 6:07 PM, 刘明敏 <diveintotomorrow@gmail.com>
> wrote:
> >>>
> >>>> we encountered an encoding issue when dealing with Chinese character
> >>>>
> >>>> the producer send characters in right encode(UTF-8),while after the
> >>>> consumer get it ,it all turns into question marks:????
> >>>>
> >>>> when start up producer,kafka broker server and consumer, we tried
> >>>> specified -Dfile.encoding=UTF-8,but it doesn't work
> >>>>
> >>>>
> >>>> In producer,we use StringEncoder,below is the snippet of producer:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>   val props = new Properties();
> >>>>
> >>>>
> >>>>
> >>>>   ...
> >>>>
> >>>>   props.put("serializer.class", "kafka.serializer.StringEncoder");
> >>>>
> >>>>
> >>>>   props.put("compression.codec", "1") //gzip
> >>>>
> >>>>
> >>>>
> >>>>   val producerConfig = new ProducerConfig(props);
> >>>>
> >>>>
> >>>>   val producer = new Producer[String, String](producerConfig);
> >>>>
> >>>>
> >>>>     val data = new ProducerData[String, String](topic, partitionKey,
> >>> List("string_to_send_to_borker"));
> >>>>
> >>>>
> >>>>
> >>>>   producer.send(data);
> >>>>
> >>>>
> >>>>
> >>>> and consumer:
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>     val topicMessageStreams =
> >>> consumerConnector.createMessageStreams(Predef.Map(topic -> consumers),
> >> new
> >>> StringDecoder)
> >>>>
> >>>>
> >>>>
> >>>>     for ((topic, streamList) <- topicMessageStreams) {
> >>>>
> >>>>
> >>>>       for (stream <- streamList) {
> >>>>
> >>>>
> >>>>         val processor = new StreamProcessor(stream)
> >>>>
> >>>>
> >>>>
> >>>>         new Thread(processor).start();
> >>>>
> >>>>
> >>>>       }
> >>>>
> >>>>
> >>>>     }
> >>>>
> >>>>
> >>>>
> >>>> and the StreamProcessor just iterate each streams
> >>>>
> >>>>
> >>>>   val message = iterator.next.message//chinese characters in message
> >>> turns into ?????
> >>>>
> >>>>
> >>>>
> >>>> Anyone any help?
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards
> >>>>
> >>>> ----------------------
> >>>> 刘明敏 | mmLiu
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Best Regards
> >>>
> >>> ----------------------
> >>> 刘明敏 | mmLiu
> >>>
> >>
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message