drill-user mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: WARC files
Date Tue, 17 Jan 2017 18:41:29 GMT
I believe what you need is a format plugin.

Once you manage to read a file and populate Drill's internal data
structures (value vectors), the format of the file no longer matters.
From there on you can use any SQL operators (filter, join, etc.)
or UDFs.
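
To give a rough idea of what the "read a file" step involves for WARC, here
is a minimal sketch in Python, not Drill code. It assumes an uncompressed
WARC file whose records follow the usual layout: a version line, named
header fields, a blank line, then Content-Length bytes of content. The
function name read_warc_records and the row layout are hypothetical; a real
format plugin would be written in Java against Drill's reader interfaces and
would write these fields into value vectors instead of dicts.

```python
import io


def read_warc_records(stream):
    """Yield one dict per WARC record from an uncompressed binary stream.

    Simplifying assumptions: no gzip handling and no support for
    folded (multi-line) header values.
    """
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank lines separate one record from the next
        version = line.strip().decode("utf-8")
        if not version.startswith("WARC/"):
            raise ValueError("expected a WARC version line, got %r" % version)

        # Named fields run until the first empty line.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The content block is exactly Content-Length bytes long.
        length = int(headers.get("Content-Length", "0"))
        body = stream.read(length)
        yield {"version": version, "headers": headers, "body": body}


# Tiny in-memory example record (made up for illustration).
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

rows = list(read_warc_records(io.BytesIO(sample)))
```

Each yielded dict corresponds to one row; once rows like these are in value
vectors, the file format is out of the picture as described above.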

To my knowledge, there is no format plugin available for Drill to read WARC
files. However, if Hive supports reading WARC files, you can use Drill to
query them through the Hive plugin for better query runtimes.

- Rahul

On Mon, Jan 16, 2017 at 7:05 PM, Bob Rudis <bob@rud.is> wrote:

> Hey folks,
>
> Does anyone know if there have been UDFs made to enable working with
> WARC files in Drill?
>
> WARC: http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml
>
> thx,
>
> -Bob
>
