spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Need help in SparkSQL
Date Thu, 23 Jul 2015 12:00:22 GMT
Another typical solution is to build a search index using Elasticsearch and use it
as a secondary index for HBase.
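A minimal sketch of the secondary-index idea, assuming an in-memory dict stands in for the HBase table and a second dict mapping (field, value) to row keys stands in for the Elasticsearch index. The row keys, field names and the `and_query` helper are hypothetical. The index resolves the AND by intersecting sets of row keys, so the table itself only serves point lookups by key instead of being scanned.

```python
from collections import defaultdict

# Hypothetical stand-in for an HBase table: row key -> column values.
table = {
    "u1#p10": {"user": "u1", "area": "A", "type": "1BHK"},
    "u2#p11": {"user": "u2", "area": "A", "type": "2BHK"},
    "u3#p12": {"user": "u3", "area": "B", "type": "1BHK"},
    "u4#p13": {"user": "u4", "area": "A", "type": "1BHK"},
}

# Stand-in for the external search index: (field, value) -> set of row keys.
index = defaultdict(set)
for row_key, cols in table.items():
    for field, value in cols.items():
        index[(field, value)].add(row_key)

def and_query(**criteria):
    """Rows matching every field=value pair.

    The AND is resolved in the index by intersecting row-key sets; the table
    is then read only with point lookups on the surviving keys."""
    if not criteria:
        return []
    keys = set.intersection(*(index.get((f, v), set()) for f, v in criteria.items()))
    return [table[k] for k in sorted(keys)]

print([r["user"] for r in and_query(area="A", type="1BHK")])  # ['u1', 'u4']
```

In a real setup the table and the index would have to be updated together on every write, which is the main cost of keeping the index outside HBase.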
On 23 Jul 2015 15:50, "Jörn Franke" <jornfranke@gmail.com> wrote:

> I do not think you can put all your queries into the row key without
> duplicating the data for each query. However, this would be more of a last
> resort.
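A rough sketch of why one row-key layout tends to serve one query pattern, assuming HBase-style lexicographically sorted row keys (simulated with a sorted Python list and `bisect`) and a hypothetical key format `area|type|reversed-timestamp|user`. The "visits in the last hour in a given area" query becomes one contiguous range scan; answering a per-user query just as cheaply needs a second copy of the same events keyed by user first, which is the data duplication described above. All names, timestamps and sample rows are illustrative only.

```python
import bisect

MAX_TS = 10**13  # assumption: millisecond timestamps satisfy 1 < ts < MAX_TS

def row_key(area, flat_type, ts_ms, user):
    # Reversed timestamp (zero-padded to 13 digits) makes newer visits sort first.
    return f"{area}|{flat_type}|{MAX_TS - ts_ms:013d}|{user}"

def range_scan(sorted_keys, start, stop):
    """Keys in [start, stop), like a scan with a start row and exclusive stop row."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

def visits_since(sorted_keys, area, flat_type, since_ms):
    prefix = f"{area}|{flat_type}|"
    # ts >= since_ms  <=>  reversed ts <= MAX_TS - since_ms, so the exclusive
    # stop row is the next reversed value after that bound.
    stop = f"{prefix}{MAX_TS - since_ms + 1:013d}"
    return range_scan(sorted_keys, prefix, stop)

def prefix_scan(sorted_keys, prefix):
    # Stop row = prefix with its last character incremented.
    return range_scan(sorted_keys, prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1))

now = 1_437_652_800_000  # hypothetical "current" time in ms
events = [  # (area, flat type, visit time, user), hypothetical sample data
    ("A", "1BHK", now - 10 * 60_000, "u1"),
    ("A", "1BHK", now - 50 * 60_000, "u2"),
    ("A", "1BHK", now - 120 * 60_000, "u3"),
    ("A", "2BHK", now - 5 * 60_000, "u4"),
    ("B", "1BHK", now - 5 * 60_000, "u5"),
]

# Layout 1: area first, serves "visits to <type> flats in <area> since T".
by_area = sorted(row_key(*e) for e in events)
recent = visits_since(by_area, "A", "1BHK", now - 60 * 60_000)
recent_users = [k.rsplit("|", 1)[1] for k in recent]
print(recent_users)  # ['u1', 'u2'] (newest first)

# Layout 2: the same events duplicated under a user-first key,
# so "everything user u3 did" is also a single range scan.
by_user = sorted(f"{u}|{MAX_TS - ts:013d}|{area}|{t}" for area, t, ts, u in events)
u3_rows = prefix_scan(by_user, "u3|")
print(u3_rows)
```

Each additional query shape that must be fast would add another layout like `by_user`, which is why this is usually a last resort rather than a general design.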
>
> Have you checked out Phoenix for HBase? This might suit your needs. It
> makes things much simpler, because it provides SQL on top of HBase.
>
> Nevertheless, Hive could also be a viable alternative, depending on how
> often you run queries, etc.
>
> On Thu, 23 Jul 2015 at 7:14, Jeetendra Gangele <gangele397@gmail.com>
> wrote:
>
>> The queries will be something like:
>>
>> 1. How many users visited a 1 BHK flat in the last hour in a given
>> area?
>> 2. How many visitors were there for flats in a given area?
>> 3. List all users who bought a given property in the last 30 days.
>>
>> Further, the queries may get more complex, involving multiple parameters.
>>
>> The problem with HBase is designing a row key that retrieves this data
>> efficiently.
>>
>> Since I have multiple fields to query on, HBase may not be a good choice?
>>
>> I don't want to iterate over the result set that HBase returns to compute
>> the result, because this would kill performance, wouldn't it?
>>
>> On 23 July 2015 at 01:02, Jörn Franke <jornfranke@gmail.com> wrote:
>>
>>> Can you provide an example of an AND query? If you only do lookups, you
>>> should try HBase/Phoenix; otherwise you can try ORC with its storage index
>>> and/or compression, but this depends on what your queries look like.
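As a rough illustration of how a storage index helps AND queries, here is a toy model of the idea behind ORC's lightweight min/max indexes: each chunk of rows records per-column min/max, and a reader skips any chunk whose range cannot satisfy a predicate. This is plain Python, not the ORC reader API; the `Chunk` type, chunk size, column names and data are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    rows: list   # rows as dicts
    stats: dict  # column -> (min, max) over this chunk's rows

def build_chunks(rows, chunk_size):
    """Split rows into fixed-size chunks and record per-column min/max."""
    chunks = []
    for i in range(0, len(rows), chunk_size):
        part = rows[i:i + chunk_size]
        stats = {c: (min(r[c] for r in part), max(r[c] for r in part)) for c in part[0]}
        chunks.append(Chunk(part, stats))
    return chunks

def and_range_query(chunks, **ranges):
    """AND of inclusive range predicates, column -> (lo, hi).

    Returns (matching rows, number of chunks actually read)."""
    matches, chunks_read = [], 0
    for ch in chunks:
        # Skip the whole chunk if any predicate cannot overlap its [min, max].
        if any(ch.stats[c][1] < lo or ch.stats[c][0] > hi
               for c, (lo, hi) in ranges.items()):
            continue
        chunks_read += 1
        matches.extend(r for r in ch.rows
                       if all(lo <= r[c] <= hi for c, (lo, hi) in ranges.items()))
    return matches, chunks_read

# Hypothetical data written in time order, so each chunk covers a narrow,
# non-overlapping "ts" range.
rows = [{"ts": t, "price": 100 + (t % 3) * 10} for t in range(12)]
chunks = build_chunks(rows, chunk_size=4)
hits, chunks_read = and_range_query(chunks, ts=(8, 11), price=(110, 120))
print([r["ts"] for r in hits], chunks_read)  # [8, 10, 11] 1
```

In this model, skipping only pays off on columns the data is sorted or clustered by: "price" spans the full range in every chunk, so it prunes nothing, while "ts" lets two of the three chunks be skipped.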
>>>
>>> On Wed, 22 Jul 2015 at 14:48, Jeetendra Gangele <gangele397@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have data in MongoDB (a few TBs) which I want to migrate to HDFS to run
>>>> complex query analysis on it, e.g. AND queries involving multiple
>>>> fields.
>>>>
>>>> So my question is: in which format should I store the data in HDFS so
>>>> that processing will be fast for this kind of query?
>>>>
>>>>
>>>> Regards
>>>> Jeetendra
>>>>
>>>>
>>
>>
>>
>
