spark-user mailing list archives

From Efe Selcuk <efema...@gmail.com>
Subject "Schemaless" Spark
Date Fri, 19 Aug 2016 21:54:23 GMT
Hi Spark community,

This is a bit of a high-level question, as frankly I'm not well versed in
Spark or related tech.

We have a system in place that reads columnar data in through CSV and
represents the data in relational tables as it operates. It's essentially
schema-based ETL. This constrains our input: we either have to restrict
what the data looks like coming in, or transform and map it to some
relational representation before we work on it.

One of our goals with the Spark application we're building is to make our
input and operations more generic, so that we can accept data in, say, JSON
format, operate on it without a schema, and output it that way as well.

My question is whether Spark supports this kind of workflow and what
facilities it provides. Unless I've been interpreting things incorrectly,
the various data formats that Spark operates on still assume specified
fields. I don't
know what this approach would look like in terms of data types, operations,
etc.

I realize that this is lacking in detail, but I imagine this may be more of
a conversation than just an answer to a question.

Efe
