spark-user mailing list archives

From Sebastian Piu <sebastian....@gmail.com>
Subject Re: "Schemaless" Spark
Date Fri, 19 Aug 2016 22:05:11 GMT
You can do operations without a schema just fine. Obviously, the more you
know about your data, the more tools you will have. It is hard to say more
without context on what you are trying to achieve.
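
As a rough illustration of operating without a schema, here is a minimal
sketch in Python. It treats each JSON line as a plain dict and applies
functions that make no assumption about which fields are present. The
function names and sample records are hypothetical, and the Spark wiring is
only indicated in a comment (it assumes an existing SparkContext named sc
and an input path).

```python
import json

def parse_record(line):
    """Parse one JSON line into a plain dict; no schema is declared."""
    return json.loads(line)

def uppercase_strings(record):
    """Upper-case every top-level string value, whatever fields exist."""
    return {k: (v.upper() if isinstance(v, str) else v)
            for k, v in record.items()}

# Hypothetical sample input: the two records do not share the same fields.
lines = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "city": "izmir", "tags": ["a", "b"]}',
]

records = [uppercase_strings(parse_record(line)) for line in lines]
output = [json.dumps(r, sort_keys=True) for r in records]

# With PySpark, the same per-record functions could be applied to an RDD
# of text lines, for example:
#   sc.textFile(path).map(parse_record).map(uppercase_strings).map(json.dumps)
```

The trade-off is the one described above: without a schema you keep this
kind of per-record flexibility, but you give up the tools that depend on
knowing the fields in advance.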

On Fri, 19 Aug 2016, 22:55 Efe Selcuk, <efeman92@gmail.com> wrote:

> Hi Spark community,
>
> This is a bit of a high-level question as frankly I'm not well versed in
> Spark or related tech.
>
> We have a system in place that reads columnar data in through CSV and
> represents the data in relational tables as it operates. It's essentially
> schema-based ETL. This restricts our input data so we either have to
> restrict what the data looks like coming in, or we have to transform and
> map it to some relational representation before we work on it.
>
> One of our goals with the Spark application we're building is to make our
> input and operations more generic. So we can accept data in, say, JSON
> format, operate on it without a schema, and output that way as well.
>
> My question is on whether Spark supports this view and what facilities it
> provides. Unless I've been interpreting things incorrectly, the various
> data formats that Spark operates on still assume specified fields. I don't
> know what this approach would look like in terms of data types, operations,
> etc.
>
> I realize that this is lacking in detail but I imagine this may be more of
> a conversation than just an answer to a question.
>
> Efe
>
