gora-user mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Nutch + Gora + Hbase client ( BigTable )
Date Tue, 07 Nov 2017 00:38:01 GMT
Hi, Akshar.

I usually use Eclipse. If you use it, it is quite simple, since the project
is mavenized.
First, clone the repository https://github.com/apache/gora.git
After that, if you decide to stick to version 0.8, check out that tag:

    git checkout tags/apache-gora-0.8

(but you can also work on master)

In Eclipse:

    File > Import... > Maven > Existing Maven Projects

Then browse to the project folder and confirm the selection (OK, Accept, or
whatever the button is called). Select all projects and click Finish. Eclipse
will import the main Maven project and its subprojects.
You can close all of them except gora-core and gora-hbase.

Modify gora-hbase to your needs:

* I am assuming the bigtable-hbase library is a substitute for hbase-client
(as a proxy)?

  <root>/pom.xml declares the versions of all dependencies, so you can add
the bigtable-hbase-1.x-hadoop one there (as in my first answer).
  <root>/gora-hbase/pom.xml declares which dependencies the module uses, but
without versions. Here I guess you have to replace hbase-client with the
Bigtable one.

* Modify HBaseTableConnection

After finishing your modifications, you can install gora-hbase in the local
Maven repository, so Nutch will pick up your version:

1- Run on the console, from the Gora root folder: mvn -DskipTests -pl gora-hbase -am install
2- Delete the Ivy cache (I don't remember exactly which folder to delete; I
always forget it, since I only use Ivy with Nutch).
3- Compile Nutch again so it picks up your compiled Gora.
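
The three steps above can be sketched as a small shell script. This is only
a sketch under assumptions that are not confirmed in this thread: GORA_DIR
and NUTCH_DIR are hypothetical locations of the two source trees, the Ivy
cache is assumed to be at Ivy's default location (~/.ivy2/cache, with Gora
under org.apache.gora), and Nutch is assumed to be rebuilt with "ant runtime".
By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch of the rebuild steps. Every path below is an assumption; adjust to your setup.
set -e

GORA_DIR="${GORA_DIR:-$HOME/src/gora}"      # hypothetical: where the Gora clone lives
NUTCH_DIR="${NUTCH_DIR:-$HOME/src/nutch}"   # hypothetical: where the Nutch source lives
# Assumed: Ivy's default cache location and its per-organisation folder layout.
IVY_GORA_CACHE="${IVY_GORA_CACHE:-$HOME/.ivy2/cache/org.apache.gora}"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = only print the commands

run_in() {
  # Print the command; run it inside the given directory only when DRY_RUN is 0.
  dir="$1"
  shift
  echo "+ (cd $dir && $*)"
  if [ "$DRY_RUN" = "0" ]; then
    (cd "$dir" && "$@")
  fi
}

rebuild() {
  # 1- Install gora-hbase and the modules it depends on into the local Maven
  #    repository, skipping the tests.
  run_in "$GORA_DIR" mvn -DskipTests -pl gora-hbase -am install
  # 2- Remove the cached Gora artifacts so that Ivy resolves them again.
  run_in "$NUTCH_DIR" rm -rf "$IVY_GORA_CACHE"
  # 3- Rebuild Nutch (assumed build target: runtime).
  run_in "$NUTCH_DIR" ant runtime
}

rebuild
```

Run it once with the default DRY_RUN=1 and check the printed paths before
letting it delete anything.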

I am writing this from memory, so if some step is wrong, let me know. And
if you have any questions, here we are.
About the mvn execution: as you can see, I am skipping the tests, since they
bring up a standalone HBase instance and test against it, so they will not
work for you.

Thank you for giving it a try.



2017-11-06 21:26 GMT-01:00 SJC Multimedia <sjcmultimedia@gmail.com>:

> Thanks for the suggestion. Very interested in trying it out. Can you
> please suggest the steps needed to build Gora from source so that I can
> modify HBaseTableConnection?
> I already have dependency for bigtable and hbase-common 1.2.3 in my ivy
> file.
> Thanks
> Akshar
> On Tue, Oct 31, 2017 at 12:27 PM, Alfonso Nishikawa <
> alfonso.nishikawa@gmail.com> wrote:
>> Hi, Akshar.
>> Most probably you are the first one to do what you are trying. I have never
>> used Google Cloud Platform, but in case there is no answer to your
>> question, my only suggestion would be to clone the repository [1], try with
>> the bigtable dependency:
>>       <dependency>
>>         <groupId>com.google.cloud.bigtable</groupId>
>>         <artifactId>bigtable-hbase-1.x-hadoop</artifactId>
>>         <version>1.0.0-pre3</version>
>>       </dependency>
>> and add some "catch" at HBaseTableConnection class [2] to see what is
>> happening there.
>> I know this is not a solution, but I am at your disposal for any question
>> about this approach (when I know the answer, of course).
>> [1] https://github.com/apache/gora/tree/apache-gora-0.8
>> [2] https://github.com/apache/gora/blob/apache-gora-0.8/gora-hbase/src/main/java/org/apache/gora/hbase/store/HBaseTableConnection.java#L115
>> Regards,
>> Alfonso Nishikawa
>> 2017-10-30 17:08 GMT-01:00 SJC Multimedia <sjcmultimedia@gmail.com>:
>>> Hi
>>> I am trying out Google BigTable as a Nutch backend, for which there is no
>>> official documentation saying that it's supported. However, I don't see any
>>> reason why it would not be possible, so I am giving it a shot.
>>> I have upgraded Gora to version 0.8 with Nutch 2.3.1, and the JDK to 1.8.
>>> Currently, while using the *bigtable-hbase-1.x-hadoop-1.0.0-pre3.jar* version,
>>> the call to Bigtable fails while performing flushCommits as part of the inject
>>> operation. I do see the table getting created on the BigTable side, but the
>>> table is empty.
>>> The exception by itself is not enough to give us an answer.  The
>>> UnsupportedOperationException is a bit strange.  I'm not sure where
>>> that's coming from.  Here
>>> <https://cloud.google.com/bigtable/docs/hbase-batch-exceptions>'s a
>>> guide on getting more information from a RetriesExhaustedWithDetailsException,
>>> since neither Gora nor BigtableBufferedMutator are under our control.
>>> This seems like a client-side thing, so this is likely some strange
>>> interaction between BigTable library and Gora.
>>> *Any suggestion on how exactly to figure out what is the issue here?*
>>> Here is grpc session info:
>>> 2017-10-27 17:37:51,462 INFO  grpc.BigtableSession - Bigtable options:
>>> BigtableOptions{dataHost=bigtable.googleapis.com,
>>> tableAdminHost=bigtableadmin.googleapis.com,
>>> instanceAdminHost=bigtableadmin.googleapis.com, projectId=xxxxxx-dev,
>>> instanceId=big-table-nutch-test,
>>> userAgent=hbase-1.2.0-cdh5.13.0, credentialType=DefaultCredentials,
>>> port=443, dataChannelCount=20, retryOptions=RetryOptions{retriesEnabled=true,
>>> allowRetriesWithoutTimestamp=false, statusToRetryOn=[INTERNAL,
>>> initialBackoffMillis=5, maxElapsedBackoffMillis=60000,
>>> backoffMultiplier=2.0, streamingBufferSize=60,
>>> readPartialRowTimeoutMillis=60000, maxScanTimeoutRetries=3},
>>> bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true,
>>> bulkMaxKeyCount=25, bulkMaxRequestSize=1048576, autoflushMs=0,
>>> maxInflightRpcs=1000, maxMemory=93218406, enableBulkMutationThrottling=false,
>>> bulkMutationRpcTargetMs=100}, callOptionsConfig=CallOptionsConfig{useTimeout=false,
>>> shortRpcTimeoutMs=60000, longRpcTimeoutMs=600000},
>>> usePlaintextNegotiation=false}.
>>> Getting following error:
>>> 2017-10-27 17:37:51,660 ERROR store.HBaseStore - Failed 1 action:
>>> UnsupportedOperationException: 1 time, servers with issues:
>>> bigtable.googleapis.com,
>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>> Failed 1 action: UnsupportedOperationException: 1 time, servers with
>>> issues: bigtable.googleapis.com,
>>> at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:271)
>>> at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:198)
>>> at org.apache.gora.hbase.store.HBaseTableConnection.flushCommits(HBaseTableConnection.java:115)
>>> at org.apache.gora.hbase.store.HBaseTableConnection.close(HBaseTableConnection.java:127)
>>> at org.apache.gora.hbase.store.HBaseStore.close(HBaseStore.java:819)
>>> at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:56)
>>> at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:647)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>>> at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
>>> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>> at java.lang.Thread.run(Thread.java:748)
>>> Thanks
>>> Akshar
