metron-dev mailing list archives

From Otto Fowler <ottobackwa...@gmail.com>
Subject Re: HDFS Compression
Date Tue, 11 Oct 2016 17:20:01 GMT
And also support the extensibility offered by STELLAR and enrichments, such
that adding new fields using either will not mean having to write
supporting Java code, etc.

Or, from a higher level: the flexibility for configuration-based enrichment
and modification of the data through ingest should not be lost for storage
requirements.

On October 11, 2016 at 13:13:43, Carolyn Duby (cduby@hortonworks.com) wrote:

The format should be compatible/optimal with Spark and Zeppelin. Perhaps
also with other interactive BI tools like Tableau.

Thanks
Carolyn




On 10/11/16, 1:06 PM, "Nick Allen" <nick@nickallen.org> wrote:

>Right. The original idea is to do batch analytics. Kind of difficult to
>work with data sitting in an ES index. But if we get a better understanding
>of the type of batch analytics, it might get us closer to the target.
>
>On Tue, Oct 11, 2016 at 1:03 PM, Zeolla@GMail.com <zeolla@gmail.com> wrote:
>
>> I'm somewhat ignorant here, never having used the MaaS stuff yet, but isn't
>> that the dataset that the models would run against? I understand there
>> could be additional use cases, I just wanted to be clear.
>>
>> Jon
>>
>> On Tue, Oct 11, 2016 at 1:01 PM Nick Allen <nick@nickallen.org> wrote:
>>
>> > I don't think we put much thought into how exactly the data should be
>> > landed in HDFS and for what use cases. It just has not been a priority.
>> >
>> > That being said, this might be a good time to gather everyone's thoughts on
>> > how they would use that kind of data and for what purposes.
>> >
>> >
>> >
>> > On Tue, Oct 11, 2016 at 12:11 PM, Owen O'Malley <omalley@apache.org>
>> > wrote:
>> >
>> > > Be careful of using compressed JSON, since it isn't splittable. JSON is
>> > > also very slow for reading.
>> > >
>> > > .. Owen
>> > >
>> > > On Tue, Oct 11, 2016 at 4:31 AM, Casey Stella <cestella@gmail.com>
>> > wrote:
>> > >
>> > > > I'd also tack on to this that the configuration for the hdfs writer should
>> > > > be moved to zookeeper rather than done in flux, IMO
>> > > > On Tue, Oct 11, 2016 at 07:20 Otto Fowler <ottobackwards@gmail.com> wrote:
>> > > >
>> > > > > The storage format and retrieval from that format should be configurable,
>> > > > > that is a ‘boundary’ for Metron so to speak.
>> > > > >
>> > > > > On October 10, 2016 at 16:15:12, Zeolla@GMail.com (zeolla@gmail.com)
>> > > > > wrote:
>> > > > >
>> > > > > Is there a specific reason why the JSON files stored in HDFS are not
>> > > > > compressed? I looked for some related JIRAs and mail conversations but
>> > > > > couldn't find this already mentioned. I'm wondering if there was a good
>> > > > > enough argument to keep things uncompressed, or if the subject just
>> > > > > hadn't been broached yet.
>> > > > >
>> > > > > Jon
>> > > > > --
>> > > > >
>> > > > > Jon
>> > > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Nick Allen <nick@nickallen.org>
>> >
>> --
>>
>> Jon
>>
>
>
>
>--
>Nick Allen <nick@nickallen.org>
