spark-user mailing list archives

From Affan Syed <as...@an10.io>
Subject Re: [External Sender] Having access to spark results
Date Thu, 25 Oct 2018 09:33:23 GMT
Femi,
We have a solution that needs to run both on-prem and in the cloud.

I am not sure how that affects anything. What we want is to run an analytical
query on a large dataset (ours sits in Cassandra), so batch in that sense, but
on-demand, and then have the entire result (not just the first x rows)
available for a web application to access.

Web applications work over a REST API, so while the query can be submitted
through something like Livy or the Thrift server, the concern is how we get
the final result back in a useful form.

I could think of two ways of doing that.

A global temp table would work, but that falls under the first option, and it
seems a bit involved. My question was whether someone has already solved this
problem and run through all the steps.
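
To make the steps of the first option concrete, here is a minimal sketch of
how I picture it: the job writes its full result to a table, records
completion in a jobs table, and the web application pages through the result
once the job is done. Everything in it is an assumption for illustration:
sqlite3 stands in for the traditional DB, a plain Python list stands in for
the Spark query output, and the names submit_query, job_status and fetch_page
are hypothetical, not part of Spark or Livy.

```python
import sqlite3
import uuid

# Assumption: sqlite3 plays the "traditional DB". In a real setup the Spark
# job itself would write the result table (for example through a JDBC sink).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT, result_table TEXT)"
)


def submit_query(rows):
    """Register a job, store its complete result, then mark it DONE.

    `rows` stands in for the Spark query output: a list of (key, value)
    tuples with unique integer keys, so the sketch runs without a cluster.
    """
    job_id = uuid.uuid4().hex
    table = f"result_{job_id}"  # generated name, never taken from user input
    db.execute("INSERT INTO jobs VALUES (?, 'RUNNING', ?)", (job_id, table))
    db.execute(f"CREATE TABLE {table} (k INTEGER PRIMARY KEY, v TEXT)")
    db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    db.execute("UPDATE jobs SET status = 'DONE' WHERE job_id = ?", (job_id,))
    db.commit()
    return job_id


def job_status(job_id):
    """What a REST endpoint would return while the client polls."""
    row = db.execute(
        "SELECT status FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    return row[0] if row else "UNKNOWN"


def fetch_page(job_id, after_key=None, page_size=100):
    """Keyset pagination: rows with k > after_key, plus the next cursor."""
    job = db.execute(
        "SELECT status, result_table FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if job is None or job[0] != "DONE":
        return [], None  # unknown job, or result not complete yet
    table = job[1]
    if after_key is None:
        sql, args = f"SELECT k, v FROM {table} ORDER BY k LIMIT ?", (page_size,)
    else:
        sql = f"SELECT k, v FROM {table} WHERE k > ? ORDER BY k LIMIT ?"
        args = (after_key, page_size)
    page = db.execute(sql, args).fetchall()
    next_key = page[-1][0] if page else None
    return page, next_key


# Usage: submit, check completion, then read the whole result page by page.
job = submit_query([(i, f"row-{i}") for i in range(250)])
cursor, total = None, 0
while True:
    page, cursor = fetch_page(job, cursor, page_size=100)
    if not page:
        break
    total += len(page)
```

The keyset cursor (k > last key seen) is used instead of OFFSET so that each
page costs about the same no matter how deep into the result the client is.
The part this sketch skips is the one that seems involved in practice:
detecting that the Spark job has finished and updating the status from there.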


- Affan

On Thu, Oct 25, 2018 at 12:39 PM Femi Anthony <
olufemi.anthony@capitalone.com> wrote:

> What sort of environment are you running Spark on: in the cloud or on
> premise? Is it a real-time or batch-oriented application?
> Please provide more details.
> Femi
>
> On Thu, Oct 25, 2018 at 3:29 AM Affan Syed <asyed@an10.io> wrote:
>
>> Spark users,
>> We would really like some input on how the results of a Spark query can
>> be made accessible to a web application. Given that Spark is widely used
>> in the industry, I would have thought there would be lots of
>> answers/tutorials about this, but I didn't find anything.
>>
>> Here are a few options that come to mind:
>>
>> 1) Spark results are saved in another DB (perhaps a traditional one) and
>> a request for a query returns the new table name for access through a
>> paginated query. That seems doable, although a bit convoluted as we need to
>> handle the completion of the query.
>>
>> 2) Spark results are pumped into a messaging queue, to which a
>> socket-server-like connection is made.
>>
>> What confuses me is that other connectors to Spark, like those for
>> Tableau, which use something like JDBC, should have all the data (not the
>> top 500 rows that we can typically get via Livy or other REST interfaces to
>> Spark). How do those connectors get all the data through a single connection?
>>
>>
>> Can someone with expertise help bring some clarity?
>>
>> Thank you.
>>
>> Affan
>>
>
