Hi Spark community,

This is a bit of a high-level question, as frankly I'm not well versed in Spark or related tech.

We have a system in place that reads columnar data from CSV files and represents it in relational tables as it operates. It's essentially schema-based ETL. This restricts our input: we either have to constrain what the incoming data looks like, or we have to transform and map it to some relational representation before we work on it.

One of our goals with the Spark application we're building is to make our input and operations more generic, so that we can accept data in, say, JSON format, operate on it without a schema, and output it the same way.
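
To make "operate on it without a schema" a bit more concrete, here is a rough sketch in plain Python (deliberately not Spark, since I don't know what the Spark equivalent would be). The records, the field names, and the helper functions are all made up for illustration; the only point is that fields vary per record and nothing is declared up front.

```python
import json

# Hypothetical input: one JSON object per line, with no schema declared
# and with fields that differ from record to record.
raw_lines = [
    '{"id": 1, "name": "a", "amount": 10}',
    '{"id": 2, "amount": 5, "tags": ["x", "y"]}',
    '{"id": 3, "name": "c"}',
]


def parse(lines):
    """Parse each line as a standalone JSON object (a plain dict)."""
    return [json.loads(line) for line in lines]


def total(records, field):
    """Sum a numeric field, skipping records that do not have it."""
    return sum(r[field] for r in records if field in r)


def to_json_lines(records):
    """Write records back out as JSON, again without a fixed schema."""
    return [json.dumps(r, sort_keys=True) for r in records]


records = parse(raw_lines)
print(total(records, "amount"))
for line in to_json_lines(records):
    print(line)
```

What I'm unsure about is whether this style of per-record, field-if-present processing maps onto anything Spark offers.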

My question is whether Spark supports this approach and what facilities it provides for it. Unless I've been interpreting things incorrectly, the various data formats that Spark operates on still assume specified fields. I don't know what a schemaless approach would look like in terms of data types, operations, etc.

I realize that this is lacking in detail, but I imagine this may be more of a conversation than just an answer to a question.

Efe