phoenix-user mailing list archives

From Yiannis Gkoufas <johngou...@gmail.com>
Subject Re: Question about IndexTool
Date Wed, 16 Sep 2015 08:32:55 GMT
Hi Gabriel,

thanks a lot for the reply. I noticed myself afterwards that it does a
rollback on every upsert and then extracts the KeyValues.
Basically, I am trying to replicate the same job in Spark, and I cannot
work out where the existing IndexTool source code guarantees that the
row keys written to the HFiles are in the correct order.
I have been getting the error "Added a key not lexically larger than
previous key".
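
To make the ordering requirement concrete, here is a minimal sketch of the check I believe the HFile writer is performing: keys are compared byte by byte as unsigned values, and each appended key must not sort before the previous one. The class and method names (RowKeyOrder, compareUnsigned, isSorted, sortedCopy) are hypothetical and use plain Java only, not the HBase API. The sketch looks at row keys alone; a full cell comparison would also take the column family, qualifier and timestamp into account. In MapReduce the shuffle sorts the map output by key before the reducer sees it, which a Spark job does not do unless asked, so my assumption is that the sort has to be added explicitly on the Spark side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowKeyOrder {

    // Compare two byte arrays left to right, treating each byte as unsigned.
    // If one array is a prefix of the other, the shorter one sorts first.
    public static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True if no key sorts before the key that precedes it.
    // Equal neighbours are allowed, since several cells can share a row key.
    public static boolean isSorted(List<byte[]> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (compareUnsigned(keys.get(i - 1), keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Return a new list with the keys in unsigned lexicographic order.
    public static List<byte[]> sortedCopy(List<byte[]> keys) {
        List<byte[]> out = new ArrayList<>(keys);
        out.sort(RowKeyOrder::compareUnsigned);
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
            new byte[] {(byte) 0x80, 0x01},
            new byte[] {0x7f},
            new byte[] {0x00, 0x02});
        System.out.println(isSorted(keys));             // false
        System.out.println(isSorted(sortedCopy(keys))); // true
    }
}
```

Note that 0x80 sorts after 0x7f here, whereas a signed byte comparison would put it first; that difference alone could produce the error above if the Spark job sorts with a signed comparator.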

Thanks a lot!


On 15 September 2015 at 19:46, Gabriel Reid <gabriel.reid@gmail.com> wrote:

> The upsert statements in the MR jobs are used to convert data into the
> appropriate encoding for writing to an HFile -- the data doesn't actually
> get pushed to Phoenix from within the MR job. Instead, the created
> KeyValues are extracted from the "output" of the upsert statement, and the
> statement is rolled-back within the MR job. The extracted KeyValues are
> then written to the HFile.
>
> - Gabriel
>
> On Tue, Sep 15, 2015 at 2:12 PM Yiannis Gkoufas <johngouf85@gmail.com>
> wrote:
>
>> Hi there,
>>
>> I was going through the code related to index creation via MapReduce job
>> (IndexTool) and I have some questions.
>> If I am not mistaken, for a global secondary index Phoenix creates a new
>> HBase table which has the appropriate key (the column value of the original
>> table you want to index) and loads the column values you have in your
>> INCLUDE statement.
>> In the PhoenixIndexImportMapper I can see that an Upsert statement is
>> executed, but also HFiles are written.
>> My question is the following: why is the Upsert statement needed if the
>> table containing the secondary index will be populated from the HFiles
>> written?
>>
>> Thanks a lot
>>
>
