drill-user mailing list archives

From Carles Tarsà <carles.ta...@reviewpro.com>
Subject reading different content from sequence files
Date Thu, 24 Nov 2016 15:45:11 GMT
Hi,

I've been trying Drill because it looks very promising, but I've run
into some issues that I couldn't solve. I'm wondering whether I'm not
configuring something properly or whether there's a bug.

The first issue is that when I try to read a sequence file, the content
that I get is different from the content of the original file.

$ hadoop fs -text /user/ctarsa/esborram2.seq
16/11/24 16:27:37 WARN util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes 
where applicable
key0       value0
key1       value1
key10      value10
key        {"review":"{"author":"àéïöç"}"}
key 
{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"

When I try to read it back from Drill:

0: jdbc:drill:zk=local>  select (convert_from(binary_key,'UTF8')), 
(convert_from(binary_value,'UTF8')) from 
dfs.`hdfs:/user/ctarsa/esborram2.seq`;
+--------+--------+
| EXPR$0 | EXPR$1 |
+--------+--------+
| key0 | value0 |
| key1 | value1 |
| key10 | value10 |
| key | ${"review":"{"author":"àéïöç"}"} |
| key | 
��{"review":{"scrapedDate":1475060474000,"productReviewId":"1009214395780445","dataProviderId":643,"productInfoId":45782422,"approxPublishedDate":1465164000000,"firstScrapedDate":1475060474000,"externalId":"1009214395780445"

|
+--------+--------+
5 rows selected (0.308 seconds)

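Notice that there are some extra characters (marked in red in the
original message): the "$" before the fourth value and the unreadable
characters before the fifth. Also notice that in the first rows the |
separators don't seem to be aligned.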
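
One possible reading of the stray "$", offered only as a hypothesis: it could be a length prefix that is part of the stored value. Hadoop's Text type serializes a string as a variable-length integer holding the byte count, followed by the UTF-8 bytes. The sketch below assumes (this is not confirmed anywhere in this message) that the file was written with Text values and that Drill returns the raw serialized bytes. The helper name utf8_len is made up for illustration. It checks whether the byte length of the fourth value matches the character code of "$".

```python
# Hypothesis check: is the stray "$" a serialized length prefix?
# Assumption (not confirmed in this message): the file was written with
# org.apache.hadoop.io.Text values and Drill hands back the raw bytes.

def utf8_len(s: str) -> int:
    """Return the number of bytes in the UTF-8 encoding of s."""
    return len(s.encode("utf-8"))

# Fourth value from the listing above; accented letters written as escapes.
value = '{"review":"{"author":"\u00e0\u00e9\u00ef\u00f6\u00e7"}"}'

n = utf8_len(value)
print(n, hex(n), repr(chr(n)))   # byte length, same in hex, and as a character

# Byte lengths of the short values in the first rows.
print(utf8_len("value0"), utf8_len("value10"))
```

This prints 36 0x24 '$' and then 6 7. Under the same assumption, the short values would carry the non-printing bytes 0x06 and 0x07 in front of them, which might explain the misaligned | separators. The last value is longer than 127 bytes, so its prefix would take more than one byte and would not be valid UTF-8 on its own, which is at least consistent with the two unreadable characters.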

I've tried it on a Mac with the latest Drill (1.8.0) and Hadoop
2.6.0-cdh5.4.4, and also on a Linux box. I've also tried different
compression settings for the sequence file (no compression, LZO, LZO
Block, LZO Record), with no success.

Can you please help?

Thanks,

Carles

