kafka-users mailing list archives

From Koushik Chitta <kchi...@microsoft.com.INVALID>
Subject RE: Kafka consumer to unzip stream of .gz files and read
Date Mon, 21 May 2018 22:59:23 GMT
You should read the message value as a byte array rather than as a string.
Another approach: while producing, you can set the Kafka compression = GZIP to have similar
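
To illustrate the byte-array point, here is a minimal sketch, assuming Python 3 and assuming each message value holds one complete gzip stream. The helper name gunzip_message_value and the sample payload are made up for illustration; in a real kafka-python consumer the input would be message.value, which is returned as raw bytes when no value_deserializer is configured.

```python
import gzip
import io


def gunzip_message_value(value: bytes) -> str:
    """Decompress one gzip payload held as raw bytes and decode it as UTF-8."""
    # A gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952).
    if value[:2] != b"\x1f\x8b":
        raise ValueError("payload does not start with the gzip magic bytes")
    # Wrap the bytes in BytesIO (not a text StringIO) so nothing is re-encoded.
    with gzip.GzipFile(fileobj=io.BytesIO(value)) as gz:
        return gz.read().decode("utf-8")


# Hypothetical stand-in for message.value from a consumer loop.
sample_value = gzip.compress(b'{"symbol": "ABC", "price": 1.5}')
print(gunzip_message_value(sample_value))
```

This only helps if the whole .gz file arrives as a single message. As far as I know, the console producer reads its input line by line, so a binary file piped into it can be cut into several messages at every newline byte, and none of those pieces is a valid gzip file on its own.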

-----Original Message-----
From: mayur shah <mayurshah3112@gmail.com> 
Sent: Monday, May 21, 2018 1:50 AM
To: users@kafka.apache.org; dev@kafka.apache.org
Subject: Kafka consumer to unzip stream of .gz files and read

Hi Team,

I am facing an issue with a Kafka consumer written in Python and hope you can help us resolve it.

Kafka consumer to unzip stream of .gz files and read <https://stackoverflow.com/questions/50232186/kafka-consumer-to-unzip-stream-of-gz-files-and-read>

The Kafka producer is sending .gz files, but I am not able to decompress and read them at the
consumer end. I get the error "IOError: Not a gzipped file".

Producer -

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Airport < ~/Downloads/stocks.json.gz

Consumer -

import sys
import gzip
import StringIO
from kafka import KafkaConsumer

consumer = KafkaConsumer(KAFKA_TOPIC, bootstrap_servers=KAFKA_BROKERS)

try:
    for message in consumer:
        f = StringIO.StringIO(message.value)
        gzip_f = gzip.GzipFile(fileobj=f)
        unzipped_content = gzip_f.read()
        content = unzipped_content.decode('utf8')
        print (content)
except KeyboardInterrupt:
    sys.exit()

Error at consumer -

Traceback (most recent call last):
  File "consumer.py", line 18, in <module>
    unzipped_content = gzip_f.read()
  File "/usr/lib64/python2.6/gzip.py", line 212, in read
  File "/usr/lib64/python2.6/gzip.py", line 255, in _read
  File "/usr/lib64/python2.6/gzip.py", line 156, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file
