spark-user mailing list archives

From Prasad Bhalerao <prasadbhalerao1...@gmail.com>
Subject Re: reporting use case
Date Thu, 04 Apr 2019 19:23:29 GMT
Hi,

I can create a view on these tables, but I am going to need almost every
column from them, and I have run into issues with Oracle views over such
large tables when joins are involved: somehow Oracle would choose a poor
execution plan.

Can you please explain how creating a view would help in this scenario,
and whether I am thinking in the right direction?

I have two challenges:
1) First, loading 2-4 TB of data into Spark very quickly.
2) Then, keeping this data updated in Spark whenever DB updates occur.
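For the first challenge, a minimal sketch of how the initial load could be parallelized with Spark's JDBC source. Everything concrete here is an assumption: the connection URL, schema/table name, credentials, the numeric key column `ID`, and the bounds are hypothetical placeholders, not details from this thread. The partitioning options split the pull into many concurrent range scans against Oracle instead of a single connection:

```python
# Sketch only: connection details, table/column names, and bounds are
# hypothetical placeholders, not taken from the thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-initial-load").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
      .option("dbtable", "REPORT_SCHEMA.BIG_TABLE")
      .option("user", "report_user")
      .option("password", "***")
      .option("driver", "oracle.jdbc.OracleDriver")
      # Split the read into parallel range scans on a numeric key so the
      # fetch is not bottlenecked on a single JDBC connection.
      .option("partitionColumn", "ID")
      .option("lowerBound", "1")
      .option("upperBound", "1500000000")
      .option("numPartitions", "200")
      # Rows pulled per network round trip; the JDBC default is small.
      .option("fetchsize", "10000")
      .load())

# Persist once so subsequent report queries avoid going back to Oracle.
df.write.mode("overwrite").format("parquet").save("/data/cache/big_table")
```

Persisting to Parquet (or a table format that supports upserts) would keep report queries off the Oracle network path; the update feed mentioned below could then be applied as incremental writes against that cache rather than re-reading the source tables.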

Thanks,
Prasad

On Fri, Apr 5, 2019 at 12:35 AM Jason Nerothin <jasonnerothin@gmail.com>
wrote:

> Hi Prasad,
>
> Could you create an Oracle-side view that captures only the relevant
> records and then use the Spark JDBC connector to load the view into Spark?
>
> On Thu, Apr 4, 2019 at 1:48 PM Prasad Bhalerao <
> prasadbhalerao1983@gmail.com> wrote:
>
>> Hi,
>>
>> I am exploring Spark for my reporting application.
>> My use case is as follows...
>> I have 4-5 Oracle tables which contain more than 1.5 billion rows. These
>> tables are updated very frequently every day. I don't have the option to
>> change the database technology, so this data is going to remain in Oracle.
>> To generate one report, on average 15-50 million rows have to be fetched
>> from the Oracle tables. These rows contain some BLOB columns, and most of
>> the time is spent fetching that many rows from the DB over the network.
>> The data processing itself is not that complex. Currently these reports
>> take around 3-8 hours to complete, and I am trying to speed up the report
>> generation process.
>>
>> Can I use Spark as a caching layer in this case, to avoid fetching data
>> from Oracle over the network every time? I am thinking of submitting a
>> Spark job for each report request and using Spark SQL to fetch the data,
>> then processing it and writing it to a file. I am trying to exploit a kind
>> of data locality here.
>>
>> Whenever data is updated in the Oracle tables, can I refresh the data in
>> Spark storage? I can get an update feed using messaging technology.
>>
>> Can someone from the community help me with this?
>> Suggestions are welcome.
>>
>>
>> Thanks,
>> Prasad
>>
>>
>
>
> --
> Thanks,
> Jason
>
