spark-dev mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: JavaRDD.collect()
Date Sat, 25 Jan 2014 02:25:58 GMT
RDD.first() doesn't have to scan the whole partition. It fetches only the
first item and returns it.
RDD.collect() has to scan the whole partition, collect all of it, and send
all of it back to the driver (serialization + deserialization costs, etc.)
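The cost difference can be sketched without Spark at all. Below is a plain-Java illustration (no Spark dependency; the class and its 1000-element "partition" are hypothetical): the partition is a lazy iterator, and a counter shows that a first()-style read pulls one element while a collect()-style read drains everything before anything can be returned.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: why first() is cheaper than collect().
// The "partition" is a lazy iterator; `reads` counts how much of it is scanned.
public class FirstVsCollect {
    static int reads = 0; // items actually pulled from the partition

    // A hypothetical 1000-element partition, produced lazily.
    static Iterator<String> partition() {
        return new Iterator<String>() {
            int i = 0;
            public boolean hasNext() { return i < 1000; }
            public String next() { reads++; return "line-" + i++; }
        };
    }

    // first()-style read: pull a single element and stop.
    static String first() {
        return partition().next();
    }

    // collect()-style read: drain the whole partition into a list.
    static List<String> collect() {
        List<String> out = new ArrayList<>();
        Iterator<String> it = partition();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        reads = 0;
        String head = first();
        System.out.println("first() scanned " + reads + " item(s): " + head);

        reads = 0;
        List<String> all = collect();
        System.out.println("collect() scanned " + reads + " item(s), returned " + all.size());
    }
}
```

In real Spark the gap is wider still, since collect() also pays the serialization, network, and deserialization costs TD mentions.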

TD


On Fri, Jan 24, 2014 at 5:55 PM, Chen Jin <karen.cj@gmail.com> wrote:

> Hi All,
>
> I have some metadata saved as a single partition on HDFS (a few
> hundred bytes) and when I want to get the content of the data:
>
> JavaRDD<String> blob = sc.textFile();
> List<String> lines = blob.collect();
>
> However, collect() takes at least 3 seconds, while first() takes only
> about 0.1 second.
>
> Could you advise on the best practice for reading small files with
> Spark?
>
> -chen
>
>
> On Fri, Jan 24, 2014 at 3:23 PM, Kapil Malik <kmalik@adobe.com> wrote:
> > Hi Andrew,
> >
> >
> >
> > Here's the exception I get while trying to build an OSGi bundle using the maven SCR plugin:
> >
> > [ERROR] Failed to execute goal org.apache.felix:maven-scr-plugin:1.9.0:scr (generate-scr-scrdescriptor) on project repo-spark: Execution generate-scr-scrdescriptor of goal org.apache.felix:maven-scr-plugin:1.9.0:scr failed: Invalid signature file digest for Manifest main attributes -> [Help 1]
> >
> > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.felix:maven-scr-plugin:1.9.0:scr (generate-scr-scrdescriptor) on project repo-spark: Execution generate-scr-scrdescriptor of goal org.apache.felix:maven-scr-plugin:1.9.0:scr failed: Invalid signature file digest for Manifest main attributes
> >
> >   at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:225)
> >
> > ...
> >
> > Caused by: org.apache.maven.plugin.PluginExecutionException: Execution generate-scr-scrdescriptor of goal org.apache.felix:maven-scr-plugin:1.9.0:scr failed: Invalid signature file digest for Manifest main attributes
> >
> >   at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:110)
> >
> >   at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
> >
> >   ... 19 more
> >
> > Caused by: java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
> >
> >   at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:240)
> >
> > ...
> >
> >
> > Also, from Eclipse, if I build a simple main program, I can create an executable JAR in 3 ways:
> >
> > a. Extract required libraries into the generated JAR (individual classes inside my JAR)
> >
> > On running the main program on this JAR:
> >
> > Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.remote.log-received-messages'
> >
> >         at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:126)
> >
> >         at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:146)
> >
> >
> > b. Package required libraries into the generated JAR (all JARs inside my JAR)
> >
> > On running the main program on this JAR:
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> >
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >
> >         at java.lang.reflect.Method.invoke(Method.java:616)
> >
> >         at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)
> >
> > Caused by: java.lang.Exception: Could not find resource path for Web UI: org/apache/spark/ui/static
> >
> >         at org.apache.spark.ui.JettyUtils$.createStaticHandler(JettyUtils.scala:89)
> >
> >         at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:40)
> >
> >         at org.apache.spark.SparkContext.<init>(SparkContext.scala:122)
> >
> >         at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:67)
> >
> > c. Copy required libraries into a sub-folder next to the generated JAR
> >
> > This works well! But the problem is that it is not portable to a Java server.
> >
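[Editor's note: both exceptions above have commonly reported causes when building fat JARs with maven-shade-plugin. The SecurityException typically comes from signature files (META-INF/*.SF, *.DSA, *.RSA) copied out of signed dependency JARs, whose digests no longer match the merged archive; the missing 'akka.remote.*' key typically comes from the reference.conf files of Akka modules overwriting one another instead of being concatenated. A frequently used configuration is sketched below; the plugin version is illustrative, and whether it resolves this particular build is an assumption, not confirmed by the thread.]

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version> <!-- illustrative version -->
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <filters>
          <!-- Drop signature files from signed dependencies; stale digests
               cause "Invalid signature file digest for Manifest main attributes" -->
          <filter>
            <artifact>*:*</artifact>
            <excludes>
              <exclude>META-INF/*.SF</exclude>
              <exclude>META-INF/*.DSA</exclude>
              <exclude>META-INF/*.RSA</exclude>
            </excludes>
          </filter>
        </filters>
        <transformers>
          <!-- Concatenate the reference.conf files of Akka/Typesafe Config
               modules instead of letting one overwrite another -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>reference.conf</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```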
> >
> >
> > Regards,
> >
> >
> >
> > Kapil Malik | kmalik@adobe.com
> >
> >
> >
> > -----Original Message-----
> > From: Andrew Ash [mailto:andrew@andrewash.com]
> > Sent: 25 January 2014 04:08
> > To: user@spark.incubator.apache.org
> > Cc: dev@spark.incubator.apache.org
> > Subject: Re: Running spark driver inside a servlet
> >
> >
> >
> > Can you paste the exception you're seeing?
> >
> >
> >
> > Sent from my mobile phone
> >
> > On Jan 24, 2014 2:36 PM, "Kapil Malik" <kmalik@adobe.com> wrote:
> >
> >> Hi all,
> >>
> >> Is it possible to create a Spark Context (i.e. the driver program)
> >> from a servlet deployed on some application server?
> >>
> >> I am able to run the Spark Java driver successfully via maven / standalone
> >> (after specifying the classpath), but when I bundle the Spark libraries in
> >> a JAR along with my servlet (using the maven shade plugin), it gives me a
> >> security exception. Any suggestions?
> >>
> >> Thanks and regards,
> >>
> >> Kapil Malik
> >>
>
