spark-user mailing list archives

From Jason Nerothin <>
Subject Re: reporting use case
Date Thu, 04 Apr 2019 19:05:36 GMT
Hi Prasad,

Could you create an Oracle-side view that captures only the relevant
records, and then use the Spark JDBC connector to load the view into Spark?
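A minimal PySpark sketch of that suggestion: the view name `REPORT_SCOPE_V`, the connection URL, credentials, and the partitioning column are all assumptions, not anything from this thread. The partitioning options matter here because they let Spark fetch the view with many parallel JDBC queries instead of one slow serial read.

```python
# Sketch: read a hypothetical pre-filtered Oracle view into Spark over JDBC.
# URL, credentials, view name, and partition column are placeholders.
jdbc_options = {
    "url": "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1",  # hypothetical host/service
    "dbtable": "REPORT_SCOPE_V",   # Oracle-side view holding only the relevant records
    "user": "report_user",
    "password": "***",
    "driver": "oracle.jdbc.OracleDriver",
    # Parallelize the read: Spark issues one range query per partition.
    "partitionColumn": "ID",       # assumed numeric key column on the view
    "lowerBound": "1",
    "upperBound": "1500000000",
    "numPartitions": "32",
}

def load_report_view(spark):
    """Load the pre-filtered view as a DataFrame (requires a live Oracle DB)."""
    return spark.read.format("jdbc").options(**jdbc_options).load()
```

With `numPartitions=32`, the 15-50 million row fetch is split across 32 concurrent connections, which is usually where most of the wall-clock time in a job like this is recovered.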

On Thu, Apr 4, 2019 at 1:48 PM Prasad Bhalerao <> wrote:

> Hi,
> I am exploring Spark for my reporting application.
> My use case is as follows...
> I have 4-5 Oracle tables which contain more than 1.5 billion rows. These
> tables are updated very frequently every day. I don't have the option to
> change database technology, so this data is going to remain in Oracle.
> To generate one report, on average 15-50 million rows have to be fetched
> from the Oracle tables. These rows contain some BLOB columns. Most of the
> time is spent fetching these rows from the database over the network. The
> data processing is not that complex. Currently these reports take around
> 3-8 hours to complete. I am trying to speed up this report generation
> process.
> Can I use Spark as a caching layer in this case, to avoid fetching data
> from Oracle over the network every time? I am thinking of submitting a
> Spark job for each report request and using Spark SQL to fetch the data,
> then processing it and writing it to a file. I am trying to exploit data
> locality in this case.
> Whenever data is updated in the Oracle tables, can I refresh the data in
> Spark storage? I can get an update feed using messaging technology.
> Can someone from the community help me with this?
> Suggestions are welcome.
> Thanks,
> Prasad
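Prasad's cache-and-refresh idea can be sketched in a few lines: load the data once, keep it cached, and reload when a message from the Oracle change feed arrives. Everything here (the class, the loader function, the wiring to a message consumer) is a hypothetical illustration, not an API from Spark or from this thread.

```python
# Sketch: Spark as a caching layer with refresh-on-update.
# refresh() would be called by the consumer of the Oracle change feed.

class CachedReportData:
    def __init__(self, spark, loader):
        self.spark = spark
        self.loader = loader  # function(spark) -> DataFrame, e.g. a JDBC read
        self.df = None

    def get(self):
        """Return the cached DataFrame, loading it on first use."""
        if self.df is None:
            self.df = self.loader(self.spark).cache()  # pin rows in executor memory
        return self.df

    def refresh(self):
        """Call when an update message arrives: drop the stale cache and reload."""
        if self.df is not None:
            self.df.unpersist()
        self.df = self.loader(self.spark).cache()
```

One caveat worth noting for this design: `cache()` is lazy, so the reload cost is paid on the first report that touches the refreshed data unless an action (e.g. `count()`) is triggered eagerly inside `refresh()`.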

