spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: OOM for HiveFromSpark example
Date Thu, 26 Mar 2015 08:37:48 GMT
Only when you run it in local mode ^^
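
To expand on that: LOAD DATA LOCAL INPATH resolves the path on whichever
machine executes the statement, so on a cluster the file must exist on that
machine too. A minimal sketch of the HDFS-based alternative (the /tmp target
path here is only illustrative):

  # copy the sample file from the gateway box into HDFS
  hdfs dfs -put examples/src/main/resources/kv1.txt /tmp/kv1.txt

  # then load it without the LOCAL keyword; the path now resolves
  # against HDFS and is visible from every node
  ./bin/spark-sql -e "LOAD DATA INPATH '/tmp/kv1.txt' INTO TABLE src_spark"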

Thanks
Best Regards

On Thu, Mar 26, 2015 at 2:06 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> I don't think that's correct. LOAD DATA LOCAL should pick up the input from
> a local directory.
>
> On Thu, Mar 26, 2015 at 1:59 PM, Akhil Das <akhil@sigmoidanalytics.com>
> wrote:
>
>> Not sure, but you can create that path on all the workers and put the file
>> there.
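>>
>> Something along these lines from the gateway machine would do it (the
>> worker hostnames here are hypothetical):
>>
>>   for host in worker1 worker2 worker3; do
>>     ssh $host mkdir -p /home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources
>>     scp examples/src/main/resources/kv1.txt $host:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/
>>   done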
>>
>> Thanks
>> Best Regards
>>
>> On Thu, Mar 26, 2015 at 1:56 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>> wrote:
>>
>>> The Hive command:
>>>
>>> LOAD DATA LOCAL INPATH
>>> '/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt'
>>> INTO TABLE src_spark
>>>
>>> 1. LOCAL INPATH: if I push the file to HDFS, how will that work?
>>>
>>> 2. I can't use sc.addFile, because I want to run Hive (Spark SQL) queries.
>>>
>>> On Thu, Mar 26, 2015 at 1:41 PM, Akhil Das <akhil@sigmoidanalytics.com>
>>> wrote:
>>>
>>>> Now it's clear that the workers don't have the file kv1.txt on their
>>>> local filesystem. You can try putting it in HDFS and using the URI to
>>>> that file, or try adding the file with sc.addFile.
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
>>>> On Thu, Mar 26, 2015 at 1:38 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>> wrote:
>>>>
>>>>> Does not work
>>>>>
>>>>> 15/03/26 01:07:05 INFO HiveMetaStore.audit: ugi=dvasthimal ip=unknown-ip-addr cmd=get_table : db=default tbl=src_spark
>>>>> 15/03/26 01:07:06 ERROR ql.Driver: FAILED: SemanticException Line 1:23 Invalid path ''/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt'': No files matching path file:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt
>>>>> org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23 Invalid path ''/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt'': No files matching path file:/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/examples/src/main/resources/kv1.txt
>>>>> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.applyConstraints(LoadSemanticAnalyzer.java:142)
>>>>> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.analyzeInternal(LoadSemanticAnalyzer.java:233)
>>>>> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>>>>> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
>>>>>
>>>>> Does the input file need to be passed to the executors via --jars?
>>>>>
>>>>> On Thu, Mar 26, 2015 at 12:15 PM, Akhil Das <
>>>>> akhil@sigmoidanalytics.com> wrote:
>>>>>
>>>>>> Try to give the complete path to the file kv1.txt.
>>>>>> On 26 Mar 2015 11:48, "ÐΞ€ρ@Ҝ (๏̯͡๏)" <deepujain@gmail.com> wrote:
>>>>>>
>>>>>>> I am now seeing this error.
>>>>>>>
>>>>>>> 15/03/25 19:44:03 ERROR yarn.ApplicationMaster: User class threw exception: FAILED: SemanticException Line 1:23 Invalid path ''examples/src/main/resources/kv1.txt'': No files matching path file:/hadoop/10/scratch/local/usercache/dvasthimal/appcache/application_1426715280024_89893/container_1426715280024_89893_01_000002/examples/src/main/resources/kv1.txt
>>>>>>>
>>>>>>> org.apache.spark.sql.execution.QueryExecutionException: FAILED: SemanticException Line 1:23 Invalid path ''examples/src/main/resources/kv1.txt'': No files matching path file:/hadoop/10/scratch/local/usercache/dvasthimal/appcache/application_1426715280024_89893/container_1426715280024_89893_01_000002/examples/src/main/resources/kv1.txt
>>>>>>> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:312)
>>>>>>> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:280)
>>>>>>>
>>>>>>> -sh-4.1$ pwd
>>>>>>>
>>>>>>> /home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4
>>>>>>>
>>>>>>> -sh-4.1$ ls examples/src/main/resources/kv1.txt
>>>>>>>
>>>>>>> examples/src/main/resources/kv1.txt
>>>>>>>
>>>>>>> -sh-4.1$
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Mar 26, 2015 at 8:08 AM, Zhan Zhang <zzhang@hortonworks.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>  You can do it in $SPARK_HOME/conf/spark-defaults.conf:
>>>>>>>>
>>>>>>>>  spark.driver.extraJavaOptions -XX:MaxPermSize=512m
>>>>>>>>
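>>>>>>>>  The same option can also be passed per job at submit time. A sketch,
>>>>>>>>  assuming you are launching the stock HiveFromSpark example from the
>>>>>>>>  1.3.0 binary distribution (the examples jar path may differ in your
>>>>>>>>  build):
>>>>>>>>
>>>>>>>>  ./bin/spark-submit --master yarn-cluster \
>>>>>>>>    --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=512m" \
>>>>>>>>    --class org.apache.spark.examples.sql.hive.HiveFromSpark \
>>>>>>>>    lib/spark-examples-1.3.0-hadoop2.4.0.jar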
>>>>>>>>  Thanks.
>>>>>>>>
>>>>>>>>  Zhan Zhang
>>>>>>>>
>>>>>>>>
>>>>>>>>  On Mar 25, 2015, at 7:25 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>  Where and how do I pass this or other JVM arguments?
>>>>>>>> -XX:MaxPermSize=512m
>>>>>>>>
>>>>>>>> On Wed, Mar 25, 2015 at 11:36 PM, Zhan Zhang <
>>>>>>>> zzhang@hortonworks.com> wrote:
>>>>>>>>
>>>>>>>>> I solved this by increasing the PermGen memory size in the driver:
>>>>>>>>>
>>>>>>>>>  -XX:MaxPermSize=512m
>>>>>>>>>
>>>>>>>>>  Thanks.
>>>>>>>>>
>>>>>>>>>  Zhan Zhang
>>>>>>>>>
>>>>>>>>> On Mar 25, 2015, at 10:54 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> I am facing the same issue and have posted a new thread. Please respond.
>>>>>>>>>
>>>>>>>>> On Wed, Jan 14, 2015 at 4:38 AM, Zhan Zhang <
>>>>>>>>> zzhang@hortonworks.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Folks,
>>>>>>>>>>
>>>>>>>>>> I am trying to run a hive context in yarn-cluster mode, but met some
>>>>>>>>>> errors. Does anybody know what causes the issue?
>>>>>>>>>>
>>>>>>>>>> I used the following command to build the distribution:
>>>>>>>>>>
>>>>>>>>>> ./make-distribution.sh -Phive -Phive-thriftserver -Pyarn -Phadoop-2.4
>>>>>>>>>>
>>>>>>>>>> 15/01/13 17:59:42 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
>>>>>>>>>> 15/01/13 17:59:42 INFO storage.BlockManagerMasterActor: Registering block manager cn122-10.l42scl.hortonworks.com:56157 with 1589.8 MB RAM, BlockManagerId(2, cn122-10.l42scl.hortonworks.com, 56157)
>>>>>>>>>> 15/01/13 17:59:43 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
>>>>>>>>>> 15/01/13 17:59:43 INFO parse.ParseDriver: Parse Completed
>>>>>>>>>> 15/01/13 17:59:44 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
>>>>>>>>>> 15/01/13 17:59:44 INFO metastore.ObjectStore: ObjectStore, initialize called
>>>>>>>>>> 15/01/13 17:59:44 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
>>>>>>>>>> 15/01/13 17:59:44 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
>>>>>>>>>> 15/01/13 17:59:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>>>>>>>> 15/01/13 17:59:44 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
>>>>>>>>>> 15/01/13 17:59:52 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
>>>>>>>>>> 15/01/13 17:59:52 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
>>>>>>>>>> 15/01/13 17:59:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>>>>>>>> 15/01/13 17:59:53 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>>>>>>>> 15/01/13 17:59:59 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
>>>>>>>>>> 15/01/13 17:59:59 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
>>>>>>>>>> 15/01/13 18:00:00 INFO metastore.ObjectStore: Initialized ObjectStore
>>>>>>>>>> 15/01/13 18:00:00 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
>>>>>>>>>> 15/01/13 18:00:01 INFO metastore.HiveMetaStore: Added admin role in metastore
>>>>>>>>>> 15/01/13 18:00:01 INFO metastore.HiveMetaStore: Added public role in metastore
>>>>>>>>>> 15/01/13 18:00:01 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
>>>>>>>>>> 15/01/13 18:00:01 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
>>>>>>>>>> 15/01/13 18:00:02 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:02 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:02 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
>>>>>>>>>> 15/01/13 18:00:02 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
>>>>>>>>>> 15/01/13 18:00:03 INFO parse.ParseDriver: Parse Completed
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: </PERFLOG method=parse start=1421190003030 end=1421190003031 duration=1 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
>>>>>>>>>> 15/01/13 18:00:03 INFO parse.SemanticAnalyzer: Creating table src position=27
>>>>>>>>>> 15/01/13 18:00:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=src
>>>>>>>>>> 15/01/13 18:00:03 INFO HiveMetaStore.audit: ugi=zzhang ip=unknown-ip-addr cmd=get_table : db=default tbl=src
>>>>>>>>>> 15/01/13 18:00:03 INFO metastore.HiveMetaStore: 0: get_database: default
>>>>>>>>>> 15/01/13 18:00:03 INFO HiveMetaStore.audit: ugi=zzhang ip=unknown-ip-addr cmd=get_database: default
>>>>>>>>>> 15/01/13 18:00:03 INFO ql.Driver: Semantic Analysis Completed
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1421190003031 end=1421190003406 duration=375 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: </PERFLOG method=compile start=1421190002998 end=1421190003416 duration=418 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO ql.Driver: Starting command: CREATE TABLE IF NOT EXISTS src (key INT, value STRING)
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1421190002995 end=1421190003421 duration=426 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO log.PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> 15/01/13 18:00:03 INFO exec.DDLTask: Default to LazySimpleSerDe for table src
>>>>>>>>>> 15/01/13 18:00:05 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1421190003416 end=1421190005498 duration=2082 from=org.apache.hadoop.hive.ql.Driver>
>>>>>>>>>> Exception in thread "Driver"
>>>>>>>>>> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Driver"
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>  --
>>>>>>>>>  Deepak
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>>  Deepak
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Deepak
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Deepak
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Deepak
>>>
>>>
>>
>
>
> --
> Deepak
>
>
