mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Runtime Interner Exception
Date Thu, 11 Jun 2015 18:34:01 GMT
Mihai,

good morning,

with 0.10, codenamed Samsara, you have two choices:
(1) the embedded API -- embed it in your Java/Scala application; or
(2) algebra/Scala scripting with spark-shell [1] -- a more R-like experience.

With the embedded API, you will have to write your own application, set up
the proper Mahout dependencies and imports, and take care of context
(session) creation [2,3] -- this probably takes more time to get going, but
it is how I use it. Also, the Scala IDEA plugin is more useful there to
guide you through the syntax, whereas in the shell you'd have to do some
additional handwaving to make IDEA as useful. Up to you.

Use the head of the 0.10.x branch + Spark 1.2.x. (h2o is broken there, but
I assume you don't care about h2o.)
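For the embedded route, a minimal build might look like the sketch below. The
exact artifact names and version numbers are assumptions based on the 0.10.x /
Spark 1.2.x pairing mentioned above; check what the branch actually publishes.

```scala
// build.sbt sketch -- versions and artifact coordinates are assumptions
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Samsara's Scala DSL (in-core and distributed algebra)
  "org.apache.mahout" %% "mahout-math-scala" % "0.10.1",
  // the Spark bindings (DrmLike backed by Spark RDDs)
  "org.apache.mahout" %% "mahout-spark"      % "0.10.1",
  // Spark itself, matching the 1.2.x line
  "org.apache.spark"  %% "spark-core"        % "1.2.2" % "provided"
)
```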

After you are done with this boilerplate nonsense, you are ready to code:

(1) Since you are currently trying to use a DRM, I assume you have it
persisted somewhere. Mahout's DRM persistence format is compatible
throughout -- it is also the native persistence format for the DrmLike type
in Samsara. So first we load it [4].
(2) Then we ask dssvd to compute what we need, with the parameters we need
[5].
(3) Then we save any required product back to DFS, wherever we need it [6].
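The three steps above can be sketched in Samsara Scala roughly as follows.
This is a sketch, not a definitive recipe: the paths are placeholders, the
k/p/q values are arbitrary, and the identifiers follow my reading of the
0.10.x scalabindings API (see [2]-[6]).

```
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._
import org.apache.mahout.math.decompositions._
import org.apache.mahout.sparkbindings._

// (0) create the distributed context (session) -- see [2]
implicit val ctx = mahoutSparkContext(masterUrl = "local[*]",
                                      appName = "dssvd-example")

// (1) load the persisted DRM from DFS -- see [4]
val drmA = drmDfsRead("/path/to/input-drm")

// (2) distributed stochastic SVD: k singular values,
//     oversampling p, q power iterations -- see [5]
val (drmU, drmV, s) = dssvd(drmA, k = 50, p = 15, q = 1)

// (3) write the factors we need back to DFS -- see [6]
drmU.dfsWrite("/path/to/output-U")
drmV.dfsWrite("/path/to/output-V")
```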

Please do not let the number of references discourage you; I am just trying
to be exhaustively helpful. The bottom-line experience should be no more
complicated than (and in some aspects may even exceed) the R experience.

[1] http://mahout.apache.org/users/sparkbindings/play-with-shell.html
[2] create context:
http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfd
[3] distributed imports:
http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfd
[4] loading from dfs
http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pfe -- this is a
bit dated; the name has changed to drmDfsRead
[5] dssvd invocation
http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pf17
[6] saving back to dfs
http://apache.github.io/mahout/doc/ScalaSparkBindings.html#pff -- this has
also changed, I think, to `dfsWrite` to be consistent with conventions.


On Wed, Jun 10, 2015 at 11:46 PM, Mihai Dascalu <mihai.dascalu@cs.pub.ro>
wrote:

> Ok, you convinced me :) But can you please help me with an example or some
> documentation? I only found fragments of code (and only Scala, not Java
> Spark).
>
> How should I create the input matrix, invoke dssvd, as well as configure
> processors/ memory?
>
> Also, are there some specific dependencies of versions? Should I wait for
> the next release?
>
>
> Thanks a lot and have a great day!
> Mihai
>
> > On Jun 10, 2015, at 23:57, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >
> > Hadoop has its own Guava. This is some dependency clash at runtime, for
> > sure. Other than that, no idea. MR is being phased out. Why don't you
> > try the Spark version in the upcoming 0.10.2?
> > On Jun 10, 2015 12:58 PM, "Mihai Dascalu" <mihai.dascalu@cs.pub.ro>
> wrote:
> >
> >> Hi!
> >>
> >> After upgrading to Mahout 0.10.1, I have a runtime exception in the
> >> following Hadoop code in which I create the input matrix for performing
> >> SSVD:
> >>
> >> // prepare output matrix
> >> 81:  final Configuration conf = new Configuration();
> >>
> >> 83:  SequenceFile.Writer writer = SequenceFile.createWriter(conf,
> >>          Writer.file(new Path(path + "/" + outputFileName)),
> >>          Writer.keyClass(IntWritable.class),
> >>          Writer.valueClass(VectorWritable.class));
> >>
> >> while in the console we have:
> >> ...
> >> [Loaded org.apache.hadoop.util.StringInterner from
> >> file:/Users/mihaidascalu/Dropbox%20(Personal)/Workspace/Eclipse/ReaderBenchDev/lib/Mahout/mahout-mr-0.10.1-job.jar]
> >> ...
> >> java.lang.VerifyError: (class: com/google/common/collect/Interners,
> >> method: newWeakInterner, signature: ()Lcom/google/common/collect/Interner;)
> >> Incompatible argument to function
> >>     at org.apache.hadoop.util.StringInterner.<clinit>(StringInterner.java:48)
> >>     at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2293)
> >>     at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2185)
> >>     at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2102)
> >>     at org.apache.hadoop.conf.Configuration.get(Configuration.java:851)
> >>     at org.apache.hadoop.io.SequenceFile.getDefaultCompressionType(SequenceFile.java:234)
> >>     at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:264)
> >>     at services.semanticModels.LSA.CreateInputMatrix.parseCorpus(CreateInputMatrix.java:83)
> >>     at services.semanticModels.LSA.CreateInputMatrix.main(CreateInputMatrix.java:197)
> >>
> >> Any suggestions? I tried adding guava-14.0.1.jar as a dependency, but
> >> it did not fix it.
> >>
> >>
> >> Thanks and have a great day!
> >> Mihai
>
>
