spark-user mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Can't seem to link "external/twitter" classes from my own app
Date Thu, 05 Jun 2014 07:46:48 GMT
Great - well we do hope we hear from you, since the user list is for
interesting success stories and anecdotes, as well as blog posts etc too :)


On Thu, Jun 5, 2014 at 9:40 AM, Jeremy Lee <unorthodox.engineers@gmail.com>
wrote:

> Oh. Yes of course. *facepalm*
>
> I'm sure I typed that at first, but at some point my fingers decided to
> grammar-check me. Stupid fingers. I wonder what "sbt assemble" does? (apart
> from error) It certainly takes a while to do it.
>
> Thanks for the maven offer, but I'm not scheduled to learn that until
> after Scala, streaming, graphx, mllib, HDFS, sbt, Python, and yarn. I'll
> probably need to know it for yarn, but I'm really hoping to put it off
> until then. (fortunately I already knew about linux, AWS, eclipse, git,
> java, distributed programming and ssh keyfiles, or I would have been in
> real trouble)
>
> Ha! OK, that worked for the Kafka project... fails on the other old 0.9
> Twitter project, but who cares... now for mine....
>
> HAHA! YES!! Oh thank you! I have the equivalent of "hello world" that uses
> one external library! Now the compiler and I can have a _proper_
> conversation.
>
> Hopefully you won't be hearing from me for a while.
>
>
>
> On Thu, Jun 5, 2014 at 3:06 PM, Nick Pentreath <nick.pentreath@gmail.com>
> wrote:
>
>> The "magic incantation" is "sbt assembly" (not "assemble").
>>
>> Actually I find maven with their assembly plugins to be very easy (mvn
>> package). I can send a Pom.xml for a skeleton project if you need
>> —
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>>
>> On Thu, Jun 5, 2014 at 6:59 AM, Jeremy Lee <
>> unorthodox.engineers@gmail.com> wrote:
>>
>>> Hmm.. That's not working so well for me. First, I needed to add a
>>> "project/plugin.sbt" file with the contents:
>>>
>>> addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.4")
>>>
>>> Before 'sbt/sbt assemble' worked at all. And I'm not sure about that
>>> version number, but "0.9.1" isn't working much better and "11.4" is the
>>> latest one recommended by the sbt project site. Where did you get your
>>> version from?
>>>
>>> Second, even when I do get it to build a .jar, spark-submit is still
>>> telling me the external.twitter library is missing.
>>>
>>> I tried using your github project as-is, but it also complained about
>>> the missing plugin.. I'm trying it with various versions now to see if I
>>> can get that working, even though I don't know anything about kafka. Hmm,
>>> and no. Here's what I get:
>>>
>>>  [info] Set current project to Simple Project (in build
>>> file:/home/ubuntu/spark-1.0.0/SparkKafka/)
>>> [error] Not a valid command: assemble
>>> [error] Not a valid project ID: assemble
>>> [error] Expected ':' (if selecting a configuration)
>>> [error] Not a valid key: assemble (similar: assembly, assemblyJarName,
>>> assemblyDirectory)
>>> [error] assemble
>>> [error]
>>>
>>> I also found this project which seemed to be exactly what I was after:
>>>  https://github.com/prabeesh/SparkTwitterAnalysis
>>>
>>> ...but it was for Spark 0.9, and though I updated all the version
>>> references to "1.0.0", that one doesn't work either. I can't even get it to
>>> build.
>>>
>>> *sigh*
>>>
>>> Is it going to be easier to just copy the external/ source code into my
>>> own project? Because I will... especially if creating "Uberjars" takes this
>>> long every... single... time...
>>>
>>>
>>>
>>> On Thu, Jun 5, 2014 at 8:52 AM, Jeremy Lee <
>>> unorthodox.engineers@gmail.com> wrote:
>>>
>>>> Thanks Patrick!
>>>>
>>>> Uberjars. Cool. I'd actually heard of them. And thanks for the link to
>>>> the example! I shall work through that today.
>>>>
>>>> I'm still learning sbt and its many options... the last new framework
>>>> I learned was node.js, and I think I've been rather spoiled by "npm".
>>>>
>>>> At least it's not maven. Please, oh please don't make me learn maven
>>>> too. (The only people who seem to like it have Software Stockholm Syndrome:
>>>> "I know maven kidnapped me and beat me up, but if you spend long enough
>>>> with it, you eventually start to sympathize and see its point of view".)
>>>>
>>>>
>>>> On Thu, Jun 5, 2014 at 3:39 AM, Patrick Wendell <pwendell@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Jeremy,
>>>>>
>>>>> The issue is that you are using one of the external libraries and
>>>>> these aren't actually packaged with Spark on the cluster, so you need
>>>>> to create an uber jar that includes them.
>>>>>
>>>>> You can look at the example here (I recently did this for a kafka
>>>>> project and the idea is the same):
>>>>>
>>>>> https://github.com/pwendell/kafka-spark-example
>>>>>
>>>>> You'll want to make an uber jar that includes these packages (run sbt
>>>>> assembly) and then submit that jar to spark-submit. Also, I'd try
>>>>> running it locally first (if you aren't already) just to make the
>>>>> debugging simpler.
>>>>>
>>>>> - Patrick
>>>>>
>>>>>
>>>>> On Wed, Jun 4, 2014 at 6:16 AM, Sean Owen <sowen@cloudera.com> wrote:
>>>>> > Ah sorry, this may be the thing I learned for the day. The issue is
>>>>> > that classes from that particular artifact are missing though. Worth
>>>>> > interrogating the resulting .jar file with "jar tf" to see if it made
>>>>> > it in?
>>>>> >
>>>>> > On Wed, Jun 4, 2014 at 2:12 PM, Nick Pentreath <nick.pentreath@gmail.com> wrote:
>>>>> >> @Sean, the %% syntax in SBT should automatically add the Scala major version
>>>>> >> qualifier (_2.10, _2.11 etc) for you, so that does appear to be correct
>>>>> >> syntax for the build.
>>>>> >>
>>>>> >> I seemed to run into this issue with some missing Jackson deps, and solved
>>>>> >> it by including the jar explicitly on the driver class path:
>>>>> >>
>>>>> >> bin/spark-submit --driver-class-path
>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar --class "SimpleApp"
>>>>> >> SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>> >>
>>>>> >> Seems redundant to me since I thought that the JAR as argument is copied to
>>>>> >> driver and made available. But this solved it for me so perhaps give it a
>>>>> >> try?
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> On Wed, Jun 4, 2014 at 3:01 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>>> >>>
>>>>> >>> Those aren't the names of the artifacts:
>>>>> >>>
>>>>> >>> http://search.maven.org/#search%7Cga%7C1%7Ca%3A%22spark-streaming-twitter_2.10%22
>>>>> >>>
>>>>> >>> The name is "spark-streaming-twitter_2.10"
>>>>> >>>
>>>>> >>> On Wed, Jun 4, 2014 at 1:49 PM, Jeremy Lee
>>>>> >>> <unorthodox.engineers@gmail.com> wrote:
>>>>> >>> > Man, this has been hard going. Six days, and I finally got a "Hello
>>>>> >>> > World" App working that I wrote myself.
>>>>> >>> >
>>>>> >>> > Now I'm trying to make a minimal streaming app based on the twitter
>>>>> >>> > examples, (running standalone right now while learning) and when running
>>>>> >>> > it like this:
>>>>> >>> >
>>>>> >>> > bin/spark-submit --class "SimpleApp"
>>>>> >>> > SimpleApp/target/scala-2.10/simple-project_2.10-1.0.jar
>>>>> >>> >
>>>>> >>> > I'm getting this error:
>>>>> >>> >
>>>>> >>> > Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> >>> > org/apache/spark/streaming/twitter/TwitterUtils$
>>>>> >>> >
>>>>> >>> > Which I'm guessing is because I haven't put in a dependency to
>>>>> >>> > "external/twitter" in the .sbt, but _how_? I can't find any docs on it.
>>>>> >>> > Here's my build file so far:
>>>>> >>> >
>>>>> >>> > simple.sbt
>>>>> >>> > ------------------------------------------
>>>>> >>> > name := "Simple Project"
>>>>> >>> >
>>>>> >>> > version := "1.0"
>>>>> >>> >
>>>>> >>> > scalaVersion := "2.10.4"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
>>>>> >>> >
>>>>> >>> > resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
>>>>> >>> > ------------------------------------------
>>>>> >>> >
>>>>> >>> > I've tried a few obvious things like adding:
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external" % "1.0.0"
>>>>> >>> >
>>>>> >>> > libraryDependencies += "org.apache.spark" %% "spark-external-twitter" % "1.0.0"
>>>>> >>> >
>>>>> >>> > because, well, that would match the naming scheme implied so far, but it
>>>>> >>> > errors.
>>>>> >>> >
>>>>> >>> >
>>>>> >>> > Also, I just realized I don't completely understand if:
>>>>> >>> > (a) the "spark-submit" command _sends_ the .jar to all the workers, or
>>>>> >>> > (b) the "spark-submit" command sends a _job_ to the workers, which are
>>>>> >>> > supposed to already have the jar file installed (or in hdfs), or
>>>>> >>> > (c) the Context is supposed to list the jars to be distributed. (is that
>>>>> >>> > deprecated?)
>>>>> >>> >
>>>>> >>> > One part of the documentation says:
>>>>> >>> >
>>>>> >>> >  "Once you have an assembled jar you can call the bin/spark-submit
>>>>> >>> > script as shown here while passing your jar."
>>>>> >>> >
>>>>> >>> > but another says:
>>>>> >>> >
>>>>> >>> > "application-jar: Path to a bundled jar including your application and
>>>>> >>> > all dependencies. The URL must be globally visible inside of your
>>>>> >>> > cluster, for instance, an hdfs:// path or a file:// path that is
>>>>> >>> > present on all nodes."
>>>>> >>> >
>>>>> >>> > I suppose both could be correct if you take a certain point of view.
>>>>> >>> >
>>>>> >>> > --
>>>>> >>> > Jeremy Lee  BCompSci(Hons)
>>>>> >>> >   The Unorthodox Engineers
>>>>> >>
>>>>> >>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jeremy Lee  BCompSci(Hons)
>>>>   The Unorthodox Engineers
>>>>
>>>
>>>
>>>
>>> --
>>> Jeremy Lee  BCompSci(Hons)
>>>   The Unorthodox Engineers
>>>
>>
>>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers
>
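
For anyone landing on this thread with the same NoClassDefFoundError, here
is a minimal sketch of a build definition consistent with the advice quoted
above. It is not taken from any of the posters' projects. It assumes the
sbt-assembly plugin is already added through the project/plugin.sbt line
quoted earlier (how the plugin is enabled inside the build differs between
sbt-assembly releases and is not shown here). Marking spark-core and
spark-streaming as "provided" is an addition the original build file did not
have; it assumes the Spark installation that runs spark-submit already
supplies those classes. The Akka resolver is left out on the assumption that
the default resolvers can find all four artifacts.

```scala
// build.sbt: a sketch only. Names and versions are copied from the thread.

name := "Simple Project"

version := "1.0"

scalaVersion := "2.10.4"

// Assumption: Spark itself is supplied by the installation that runs
// spark-submit, so "provided" keeps these two out of the uber jar.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.0.0" % "provided"

// The external Twitter module is not packaged with Spark on the cluster,
// so it has to be bundled. %% appends the Scala version suffix, giving the
// artifact name spark-streaming-twitter_2.10 mentioned in the thread.
libraryDependencies += "org.apache.spark" %% "spark-streaming-twitter" % "1.0.0"

// Kept from the original build file.
libraryDependencies += "org.twitter4j" % "twitter4j-stream" % "3.0.3"
```

With something like this in place, "sbt assembly" (not "sbt package") should
produce the uber jar under target/scala-2.10/; the exact file name depends on
the plugin's defaults. That jar is the one to hand to bin/spark-submit, and
"jar tf" on it is a quick way to confirm that TwitterUtils made it in.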
