spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Loading data into Hbase table throws NoClassDefFoundError: org/apache/htrace/Trace error
Date Mon, 03 Oct 2016 16:56:31 GMT
That sounds interesting; I would love to learn more about it.

Mich: looks good. Lastly, I would suggest you think about whether you really
need multiple column families.
On 4 Oct 2016 02:57, "Benjamin Kim" <bbuild11@gmail.com> wrote:

> Lately, I’ve been experimenting with Kudu. It has been a much better
> experience than with HBase. Using it is much simpler, even from spark-shell.
>
> spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0
>
> It’s like going back to rudimentary DB systems where tables have just a
> primary key and the columns. Additional benefits include a home-grown Spark
> package, fast upserts and table scans for analytics, newly introduced
> time-series support, and (my favorite) simpler configuration and
> administration. It has just reached version 1.0.0, so I’m waiting for
> 1.0.1+ to let some bugs shake out before I propose it as our HBase
> replacement. All my performance tests against HBase have been stellar,
> especially given its simplicity.
>
> Just a thought…
>
> Cheers,
> Ben
>
>
> On Oct 3, 2016, at 8:40 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
> Hi,
>
> I decided to create a composite key *ticker-date* from the CSV file.
>
> I just did some manipulation on the CSV file:
>
> sed -i 1d tsco.csv
> while IFS="," read -r a b c d e f; do
>   echo "TSCO-$a,TESCO PLC,TSCO,$a,$b,$c,$d,$e,$f"
> done < tsco.csv > temp && mv -f temp tsco.csv
>
> This takes the CSV file, sets the shell's field separator IFS to ",",
> drops the header, reads every field of every line (a, b, c, ...), creates
> the composite key TSCO-$a and adds the stock name and ticker to each line.
> The whole process can be automated and parameterised.
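The pipeline above can be parameterised along these lines. This is a minimal sketch, assuming the input still has its single Google Finance header line and that no field contains a comma; add_composite_key is a hypothetical name, not something from the thread. Using awk with explicit -F and OFS keeps the separator local instead of changing the shell's IFS, and the output goes to a new file rather than rewriting the input.

```shell
# Hypothetical helper (not from the thread): add_composite_key TICKER NAME FILE
# Skips line 1 (assumed to be the Google Finance header) and prefixes every
# data line with TICKER-<date>, the stock name and the ticker.
add_composite_key() {
  ticker=$1
  name=$2
  in_csv=$3
  awk -F',' -v OFS=',' -v t="$ticker" -v n="$name" \
    'NR > 1 { print t "-" $1, n, t, $0 }' "$in_csv"
}

# Usage: write to a new file instead of editing the input in place.
# add_composite_key TSCO "TESCO PLC" tsco.csv > tsco_keyed.csv
```

Each output line then has the shape TSCO-1-Apr-08,TESCO PLC,TSCO,1-Apr-08,380.00,..., which lines up with the importtsv.columns list used in the next command.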
>
> Once the CSV file is in HDFS, I run the following command:
>
> $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>   -Dimporttsv.separator=',' \
>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_info:stock,stock_info:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
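ImportTsv assigns fields to the importtsv.columns entries by position, so before submitting the job it can be worth confirming that every line has exactly as many fields as there are entries. A hypothetical pre-flight check along those lines (check_field_count is illustrative, not an HBase tool), assuming a comma separator and no commas inside values:

```shell
# Hypothetical pre-flight check (not an HBase tool): compare each CSV line's
# field count with the number of entries in an importtsv.columns string.
# Assumes a comma separator and no commas inside field values.
check_field_count() {
  csv=$1
  columns=$2
  # Number of comma-separated entries, HBASE_ROW_KEY included.
  expected=$(printf '%s\n' "$columns" | awk -F',' '{ print NF }')
  awk -F',' -v want="$expected" '
    NF != want + 0 {
      printf "line %d: %d fields, expected %d\n", NR, NF, want
      bad = 1
    }
    END { exit (bad ? 1 : 0) }' "$csv"
}
```

With the column list above it expects 9 fields per line; it prints the offending lines and exits non-zero otherwise.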
>
> The HBase table is created as follows:
>
> create 'tsco','stock_info','stock_daily'
>
> and this is the data (2 rows, each with 2 column families and 8 attributes):
>
> hbase(main):132:0> scan 'tsco', LIMIT => 2
> ROW                  COLUMN+CELL
>  TSCO-1-Apr-08       column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-08
>  TSCO-1-Apr-08       column=stock_daily:close, timestamp=1475507091676, value=405.25
>  TSCO-1-Apr-08       column=stock_daily:high, timestamp=1475507091676, value=406.75
>  TSCO-1-Apr-08       column=stock_daily:low, timestamp=1475507091676, value=379.25
>  TSCO-1-Apr-08       column=stock_daily:open, timestamp=1475507091676, value=380.00
>  TSCO-1-Apr-08       column=stock_daily:volume, timestamp=1475507091676, value=49664486
>  TSCO-1-Apr-08       column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
>  TSCO-1-Apr-08       column=stock_info:ticker, timestamp=1475507091676, value=TSCO
>
>  TSCO-1-Apr-09       column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-09
>  TSCO-1-Apr-09       column=stock_daily:close, timestamp=1475507091676, value=333.30
>  TSCO-1-Apr-09       column=stock_daily:high, timestamp=1475507091676, value=334.60
>  TSCO-1-Apr-09       column=stock_daily:low, timestamp=1475507091676, value=326.50
>  TSCO-1-Apr-09       column=stock_daily:open, timestamp=1475507091676, value=331.10
>  TSCO-1-Apr-09       column=stock_daily:volume, timestamp=1475507091676, value=24877341
>  TSCO-1-Apr-09       column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
>  TSCO-1-Apr-09       column=stock_info:ticker, timestamp=1475507091676, value=TSCO
>
> Any suggestions?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 3 October 2016 at 14:42, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> Or maybe combine ticker and date, so the new row key would be
>> TSCO-1-Apr-08.
>>
>> This would be added as the row key. Would both Date and ticker stay as
>> they are, as column family attributes?
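One point worth checking with a key like TSCO-1-Apr-08: HBase orders row keys lexicographically by their bytes, so d-Mon-yy dates do not sort chronologically (TSCO-1-Apr-09 sorts before TSCO-1-Aug-08, for instance, and the LIMIT => 2 scan above returns two rows a year apart). A minimal sketch of converting the date part to YYYY-MM-DD, which does sort chronologically; to_iso_date is a hypothetical helper, and it assumes every two-digit year belongs to the 2000s, which holds for the 2006 to 2016 data in this thread.

```shell
# Hypothetical helper: convert a Google Finance style date (d-Mon-yy) to
# YYYY-MM-DD so that row keys sort chronologically byte by byte.
# ASSUMPTION: all two-digit years are in the 2000s.
to_iso_date() {
  printf '%s\n' "$1" | awk -F'-' '
    BEGIN {
      split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
      for (i = 1; i <= 12; i++) month[names[i]] = i
    }
    {
      if (!($2 in month)) { print "unknown month: " $2 > "/dev/stderr"; exit 1 }
      printf "20%02d-%02d-%02d\n", $3, month[$2], $1
    }'
}

# Usage: build the composite row key from the converted date.
# key="TSCO-$(to_iso_date 1-Apr-08)"   # TSCO-2008-04-01
```

With keys of this form, the rows of one ticker come back from a scan in date order, and a date range maps to a contiguous key range.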
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> On 3 October 2016 at 14:32, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>> wrote:
>>
>>> With ticker+date I can create something like the following for the row key:
>>>
>>> TSCO_1-Apr-08
>>>
>>>
>>> or TSCO1-Apr-08
>>>
>>> if I understood you correctly
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> On 3 October 2016 at 13:13, ayan guha <guha.ayan@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> Looks like you are saving to new.csv but still loading tsco.csv? It's
>>>> definitely the header.
>>>>
>>>> Suggestion: ticker+date as the row key has the following benefits:
>>>>
>>>> 1. Using ticker+date as the row key will let you hold multiple tickers
>>>> in this single HBase table (think composite primary key).
>>>> 2. Using the date by itself as the row key will lead to hotspots (look up
>>>> hotspotting due to monotonically increasing row keys). To distribute the
>>>> load, salting is commonly suggested; the ticker can act as a natural
>>>> salt in this case.
>>>> 3. You may also want to hash the row key value to make it a little more
>>>> flexible (think surrogate key).
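Points 2 and 3 can be illustrated with a small sketch of a salted composite key. It rests on stated assumptions: salted_key is a hypothetical helper, the bucket count is arbitrary, and cksum merely stands in for whatever stable hash you prefer. Because only the ticker is hashed, all rows of one ticker land in the same bucket and stay in date order there, while different tickers spread across buckets (which only helps if the table's regions are pre-split on those prefixes). Hashing the whole key instead would also spread one ticker's rows, at the price of needing one scan per bucket to read a date range.

```shell
# Hypothetical helper: salted_key TICKER DATE BUCKETS
# Derives a bucket number from a checksum of the ticker and prefixes it to
# the composite key. All rows of one ticker share a bucket (and stay in date
# order inside it); different tickers spread across BUCKETS key prefixes.
# cksum is used only because it is POSIX; any stable hash would do.
salted_key() {
  ticker=$1
  date=$2
  buckets=$3
  bucket=$(printf '%s' "$ticker" | cksum | awk -v n="$buckets" '{ print $1 % n }')
  printf '%s-%s-%s\n' "$bucket" "$ticker" "$date"
}

# Usage (date already in YYYY-MM-DD form so keys sort chronologically):
# salted_key TSCO 2008-04-01 4
```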
>>>>
>>>>
>>>>
>>>> On Mon, Oct 3, 2016 at 10:17 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Hi Ayan,
>>>>>
>>>>> Sounds like the row key has to be unique, much like a primary key in an
>>>>> RDBMS.
>>>>>
>>>>> This is what I download as a CSV file for a stock from Google Finance:
>>>>>
>>>>> Date       Open   High    Low    Close   Volume
>>>>> 27-Sep-16  177.4  177.75  172.5  177.75  24117196
>>>>>
>>>>>
>>>>> So I add the stock name and ticker myself to the end of each row via a
>>>>> shell script, and get rid of the header:
>>>>>
>>>>> sed -i 1d tsco.csv; awk '{print $0 ",TESCO PLC,TSCO"}' tsco.csv > new.csv
>>>>>
>>>>> The new table has two column families, stock_daily and stock_info, and
>>>>> the row key is the date (one row per date).
>>>>>
>>>>> This creates a new csv file with two additional columns appended to
>>>>> the end of each line
>>>>>
>>>>> Then I run the following command
>>>>>
>>>>> $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>>>>>   -Dimporttsv.separator=',' \
>>>>>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume,stock_info:stock,stock_info:ticker" \
>>>>>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
>>>>>
>>>>> This is what the HBase table holds for a given day:
>>>>>
>>>>> hbase(main):090:0> scan 'tsco', LIMIT => 10
>>>>> ROW                  COLUMN+CELL
>>>>>  1-Apr-08            column=stock_daily:close, timestamp=1475492248665, value=405.25
>>>>>  1-Apr-08            column=stock_daily:high, timestamp=1475492248665, value=406.75
>>>>>  1-Apr-08            column=stock_daily:low, timestamp=1475492248665, value=379.25
>>>>>  1-Apr-08            column=stock_daily:open, timestamp=1475492248665, value=380.00
>>>>>  1-Apr-08            column=stock_daily:volume, timestamp=1475492248665, value=49664486
>>>>>  1-Apr-08            column=stock_info:stock, timestamp=1475492248665, value=TESCO PLC
>>>>>  1-Apr-08            column=stock_info:ticker, timestamp=1475492248665, value=TSCO
>>>>>
>>>>>
>>>>> But I also have this at the bottom
>>>>>
>>>>>  Date                column=stock_daily:close, timestamp=1475491189158, value=Close
>>>>>  Date                column=stock_daily:high, timestamp=1475491189158, value=High
>>>>>  Date                column=stock_daily:low, timestamp=1475491189158, value=Low
>>>>>  Date                column=stock_daily:open, timestamp=1475491189158, value=Open
>>>>>  Date                column=stock_daily:volume, timestamp=1475491189158, value=Volume
>>>>>  Date                column=stock_info:stock, timestamp=1475491189158, value=TESCO PLC
>>>>>  Date                column=stock_info:ticker, timestamp=1475491189158, value=TSCO
>>>>>
>>>>> Looks like the CSV header row?
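The stray Date row is the CSV header being loaded as data. One way to make header removal safe to repeat is a guard that drops line 1 only when it actually is the header. A hypothetical sketch (strip_header is illustrative), assuming the header's first field is Date, possibly preceded by spaces as in the listing above; unlike a second run of sed -i 1d, running it twice never deletes a data row.

```shell
# Hypothetical guard: print FILE without its first line only when that line
# is the Google Finance header (first field "Date", possibly indented).
strip_header() {
  awk -F',' '
    NR == 1 {
      first = $1
      sub(/^[ \t]+/, "", first)   # tolerate leading spaces before Date
      if (first == "Date") next
    }
    { print }' "$1"
}

# Usage: strip_header tsco.csv > tsco_noheader.csv
```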
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>> On 3 October 2016 at 11:24, ayan guha <guha.ayan@gmail.com> wrote:
>>>>>
>>>>>> I am not well versed in ImportTsv, but you can create a CSV file
>>>>>> using a simple Spark program that makes ticker+tradedate the first
>>>>>> column. I remember doing a similar manipulation in Pig to create the
>>>>>> row key format.
>>>>>> On 3 Oct 2016 20:40, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Ayan,
>>>>>>>
>>>>>>> How do you specify ticker+tradedate as the row key in the command below?
>>>>>>>
>>>>>>> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>>>>>>>   -Dimporttsv.separator=',' \
>>>>>>>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:ticker,stock_daily:tradedate,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
>>>>>>>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
>>>>>>>
>>>>>>> I always thought that HBase would take the first column as the row key,
>>>>>>> so it takes the stock name as the row key, which is Tesco PLC for every
>>>>>>> row!
>>>>>>>
>>>>>>> Does the row key need to be unique?
>>>>>>>
>>>>>>> cheers
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 3 October 2016 at 10:30, ayan guha <guha.ayan@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Mich
>>>>>>>>
>>>>>>>> It has more to do with HBase than Spark.
>>>>>>>>
>>>>>>>> The row key can be anything, yes, but essentially what you are doing
>>>>>>>> is inserting and then repeatedly updating the Tesco PLC row. Given
>>>>>>>> your schema, ticker+tradedate seems to be a good row key.
>>>>>>>> On 3 Oct 2016 18:25, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks again.
>>>>>>>>>
>>>>>>>>> I added that jar file to the classpath and that part worked.
>>>>>>>>>
>>>>>>>>> I was using spark-shell, so I have to use spark-submit for it to be
>>>>>>>>> able to interact with the MapReduce job.
>>>>>>>>>
>>>>>>>>> BTW, when I use the command-line utility ImportTsv to load a file
>>>>>>>>> into HBase with the following table format:
>>>>>>>>>
>>>>>>>>> describe 'marketDataHbase'
>>>>>>>>> Table marketDataHbase is ENABLED
>>>>>>>>> marketDataHbase
>>>>>>>>> COLUMN FAMILIES DESCRIPTION
>>>>>>>>> {NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1',
>>>>>>>>> IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',
>>>>>>>>> DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE',
>>>>>>>>> MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536',
>>>>>>>>> REPLICATION_SCOPE => '0'}
>>>>>>>>> 1 row(s) in 0.0930 seconds
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
>>>>>>>>>   -Dimporttsv.separator=',' \
>>>>>>>>>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:ticker,stock_daily:tradedate,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
>>>>>>>>>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
>>>>>>>>>
>>>>>>>>> There are 1200 rows in the CSV file, *but it only loads the
>>>>>>>>> first row!*
>>>>>>>>>
>>>>>>>>> scan 'tsco'
>>>>>>>>> ROW                  COLUMN+CELL
>>>>>>>>>  Tesco PLC           column=stock_daily:close, timestamp=1475447365118, value=325.25
>>>>>>>>>  Tesco PLC           column=stock_daily:high, timestamp=1475447365118, value=332.00
>>>>>>>>>  Tesco PLC           column=stock_daily:low, timestamp=1475447365118, value=324.00
>>>>>>>>>  Tesco PLC           column=stock_daily:open, timestamp=1475447365118, value=331.75
>>>>>>>>>  Tesco PLC           column=stock_daily:ticker, timestamp=1475447365118, value=TSCO
>>>>>>>>>  Tesco PLC           column=stock_daily:tradedate, timestamp=1475447365118, value= 3-Jan-06
>>>>>>>>>  Tesco PLC           column=stock_daily:volume, timestamp=1475447365118, value=46935045
>>>>>>>>> 1 row(s) in 0.0390 seconds
>>>>>>>>>
>>>>>>>>> Is this because the HBASE_ROW_KEY (Tesco PLC) is the same for
>>>>>>>>> all rows? I thought that the row key could be anything.
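That is the behaviour to expect: HBase keeps a single row per row key, so every CSV line with the same key writes to the same cells, the lines overwrite one another, and the scan shows one row carrying the values of just one line (here tradedate 3-Jan-06). A hypothetical pre-load check (duplicate_keys is illustrative), assuming the row key is the first comma-separated field:

```shell
# Hypothetical pre-load check: print every row key (first comma-separated
# field) that appears more than once. HBase keeps one row per key, so such
# lines overwrite each other instead of becoming separate rows.
duplicate_keys() {
  cut -d',' -f1 "$1" | LC_ALL=C sort | uniq -d
}

# Usage: a non-empty result means the key needs more parts, e.g. ticker+date.
# duplicate_keys tsco.csv
```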
>>>>>>>>>
>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 3 October 2016 at 07:44, Benjamin Kim <bbuild11@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> We installed Apache Spark 1.6.0 alongside CDH 5.4.8 because
>>>>>>>>>> Cloudera only had Spark 1.3.0 at the time, and we wanted to use
>>>>>>>>>> Spark 1.6.0’s features. We borrowed the /etc/spark/conf/spark-env.sh
>>>>>>>>>> file that Cloudera generated, because it was customized to add jars
>>>>>>>>>> first from the paths listed in /etc/spark/conf/classpath.txt. So we
>>>>>>>>>> added the path of the htrace jar to /etc/spark/conf/classpath.txt.
>>>>>>>>>> Then it worked: we could read from and write to HBase.
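The classpath.txt mechanism comes from Cloudera's generated spark-env.sh, so a plain Apache Spark 2 install will not read that file. spark-shell and spark-submit do, however, accept a jar directly through --jars and --driver-class-path. A hypothetical sketch that locates an htrace-core jar and prints such a command; the search directory (for example $HBASE_HOME/lib) and the htrace-core-*.jar name pattern are assumptions based on the jar named elsewhere in this thread, so check which jar your HBase version actually ships.

```shell
# Hypothetical helpers: find an htrace-core jar under a directory such as
# $HBASE_HOME/lib (assumed location) and print a spark-shell command that
# passes it with --jars (driver and executors) and --driver-class-path.
find_htrace_jar() {
  # ASSUMPTION: the jar follows the htrace-core-<version>.jar naming seen in
  # this thread (htrace-core-3.2.0-incubating.jar); pick the first match.
  find "$1" -type f -name 'htrace-core-*.jar' 2>/dev/null | LC_ALL=C sort | head -n 1
}

spark_shell_cmd() {
  jar=$(find_htrace_jar "$1")
  if [ -z "$jar" ]; then
    echo "no htrace-core jar found under $1" >&2
    return 1
  fi
  printf 'spark-shell --jars %s --driver-class-path %s\n' "$jar" "$jar"
}

# Usage: spark_shell_cmd "$HBASE_HOME/lib"
```

The command is printed rather than executed so it can be inspected first; the same two options are accepted by spark-submit.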
>>>>>>>>>>
>>>>>>>>>> On Oct 2, 2016, at 12:52 AM, Mich Talebzadeh <
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Thanks Ben
>>>>>>>>>>
>>>>>>>>>> The thing is, I am using Spark 2 and no stack from CDH!
>>>>>>>>>>
>>>>>>>>>> Is this approach to reading/writing to HBase specific to Cloudera?
>>>>>>>>>>
>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1 October 2016 at 23:39, Benjamin Kim <bbuild11@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Mich,
>>>>>>>>>>>
>>>>>>>>>>> I know up until CDH 5.4 we had to add the HTrace jar to the
>>>>>>>>>>> classpath to make it work using the command below. But after
>>>>>>>>>>> upgrading to CDH 5.7, it became unnecessary.
>>>>>>>>>>>
>>>>>>>>>>> echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Ben
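Running the echo ... >> command above a second time appends the same path twice. A hypothetical idempotent variant (add_to_classpath_file is illustrative, not a Cloudera tool) that appends the jar path only if it is not already listed, and refuses a path whose jar does not exist:

```shell
# Hypothetical helper: append a jar path to a classpath file (such as the
# /etc/spark/conf/classpath.txt read by CDH's spark-env.sh) only if the exact
# path is not already listed, so the step can be re-run safely.
add_to_classpath_file() {
  jar=$1
  cp_file=$2
  if [ ! -f "$jar" ]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  # -x: whole-line match, -F: fixed string, -q: quiet
  if ! grep -qxF -- "$jar" "$cp_file" 2>/dev/null; then
    printf '%s\n' "$jar" >> "$cp_file"
  fi
}
```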
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Oct 1, 2016, at 3:22 PM, Mich Talebzadeh <
>>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Trying a bulk load using HFiles in Spark, as in the example below:
>>>>>>>>>>>
>>>>>>>>>>> import org.apache.spark._
>>>>>>>>>>> import org.apache.spark.rdd.NewHadoopRDD
>>>>>>>>>>> import org.apache.hadoop.hbase.{HBaseConfiguration,
>>>>>>>>>>> HTableDescriptor}
>>>>>>>>>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>>>>>>>>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>>>>>>>>>> import org.apache.hadoop.fs.Path;
>>>>>>>>>>> import org.apache.hadoop.hbase.HColumnDescriptor
>>>>>>>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>>>>>>>> import org.apache.hadoop.hbase.client.Put;
>>>>>>>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>>>>>>>> import org.apache.hadoop.hbase.mapred.TableOutputFormat
>>>>>>>>>>> import org.apache.hadoop.mapred.JobConf
>>>>>>>>>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
>>>>>>>>>>> import org.apache.hadoop.mapreduce.Job
>>>>>>>>>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
>>>>>>>>>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
>>>>>>>>>>> import org.apache.hadoop.hbase.KeyValue
>>>>>>>>>>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
>>>>>>>>>>> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
>>>>>>>>>>>
>>>>>>>>>>> So far no issues.
>>>>>>>>>>>
>>>>>>>>>>> Then I do
>>>>>>>>>>>
>>>>>>>>>>> val conf = HBaseConfiguration.create()
>>>>>>>>>>> conf: org.apache.hadoop.conf.Configuration = Configuration:
>>>>>>>>>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>>>>>>>>>> yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
>>>>>>>>>>> val tableName = "testTable"
>>>>>>>>>>> tableName: String = testTable
>>>>>>>>>>>
>>>>>>>>>>> ...
