mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Upgrade to Spark 1.1.0?
Date Thu, 23 Oct 2014 17:00:51 GMT
Off the list I’ve heard of problems using the Maven artifacts for Spark even
when you are not building Spark yourself. There have been reports of
mismatched serialization class UIDs generated when building Mahout. If you
encounter those, try the build method in the PR and report the problem to the
Spark folks.
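The UID mismatch behind these reports comes from Java's default serialVersionUID: when a Serializable class doesn't declare one, the JVM derives it from the class's compiled structure, so two different builds of the same class can disagree and deserialization fails with InvalidClassException. One way to see the UID a given build computes is the JDK's serialver tool; the spark-core jar path in the comment below is only an example:

```shell
# Print the serialVersionUID computed for a class. java.lang.String declares
# a fixed UID, so it makes a stable demo. To compare your locally built vs.
# Maven-downloaded Spark bits, point -classpath at each jar in turn, e.g.:
#   serialver -classpath spark-core_2.10-1.1.0.jar org.apache.spark.rdd.RDD
# (example path; adjust to your setup)
if command -v serialver >/dev/null 2>&1; then
  serialver java.lang.String
fi
```

If the two jars print different UIDs for the same class, they were compiled from different bits and cannot exchange serialized objects.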

On Oct 21, 2014, at 3:48 PM, Pat Ferrel <pat@occamsmachete.com> wrote:

Right.

Something else has come up, so I haven’t tried the shell tutorial yet. If
anyone else wants to try it, you can build Mahout from this PR:
https://github.com/apache/mahout/pull/61

On Oct 21, 2014, at 3:28 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

Hm, no, they don't push different binary releases to Maven. I assume they
only push the default one.

On Tue, Oct 21, 2014 at 3:26 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> PS, I remember a discussion about packaging binary Spark distributions. So
> there are in fact a number of different Spark artifact releases. However, I
> am not sure whether they push them to mvn repositories. (If they did, they
> might use different Maven classifiers for those.) If that's the case, then
> one plausible strategy here is to recommend rebuilding Mahout with a
> dependency on the classifier corresponding to the actual Spark binary
> release used.
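If Spark did publish per-Hadoop binaries under Maven classifiers, the corresponding Mahout POM change would look roughly like this. The classifier value is purely hypothetical; as noted earlier in the thread, Spark appears to push only the default artifact:

```xml
<!-- Hypothetical sketch of the classifier idea; Spark does not actually
     publish per-Hadoop classifiers to Maven, so this is illustrative only. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>1.1.0</version>
  <classifier>hadoop1</classifier>
</dependency>
```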
> 
> On Tue, Oct 21, 2014 at 2:21 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> 
>> If you are using the mahout shell or command-line drivers (which I
>> don't), it would seem the correct thing to do is for the mahout script
>> simply to take Spark dependencies from the installed $SPARK_HOME rather
>> than from Mahout's assembly. In fact, that would be consistent with what
>> other projects do in similar situations. It should also probably make
>> things compatible between minor releases of Spark.
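The $SPARK_HOME idea above could be sketched as a small helper in the mahout launcher script. The jar locations here are assumptions, since the layout differs between Spark binary distributions (jars under lib/) and source builds (under assembly/target/):

```shell
# Build a launcher classpath from an installed Spark rather than from
# Mahout's own assembly. The glob locations below are examples; adjust
# them for the Spark layout you actually have installed.
spark_classpath() {
  cp=""
  for j in "$1"/lib/*.jar "$1"/assembly/target/scala-*/*.jar; do
    # unmatched globs stay literal, so skip anything that doesn't exist
    [ -e "$j" ] && cp="$cp:$j"
  done
  printf '%s' "${cp#:}"
}

# hypothetical usage in a launcher:
#   java -cp "$MAHOUT_JARS:$(spark_classpath "$SPARK_HOME")" ...
```

This way the client-side Spark classes are always the ones actually installed on the cluster's version, instead of whatever Maven resolved at Mahout build time.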
>> 
>> But I think you are right in the sense that the problem is that Spark
>> jars are not uniquely identified by Maven artifact id and version, unlike
>> most other products. (E.g., if we see mahout-math-0.9.jar we expect there
>> to be one and only one released artifact in existence -- but one's local
>> build may create incompatible variations.)
>> 
>> On Tue, Oct 21, 2014 at 1:51 PM, Pat Ferrel <pat@occamsmachete.com>
>> wrote:
>> 
>>> The problem is not in building Spark, it is in building Mahout using
>>> the correct Spark jars. If you are using CDH and Hadoop 2, the correct
>>> jars are in the repos.
>>> 
>>> For the rest of us, though the process below seems like an error-prone
>>> hack to me, it does work on Linux and BSD/Mac. It should really be
>>> addressed by Spark, IMO.
>>> 
>>> BTW, the cache is laid out differently on Linux, but I don’t think you
>>> need to delete it anyway.
>>> 
>>> On Oct 21, 2014, at 12:27 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> wrote:
>>> 
>>> FWIW, I never built Spark using Maven; I always use sbt assembly.
>>> 
>>> On Tue, Oct 21, 2014 at 11:55 AM, Pat Ferrel <pat@occamsmachete.com>
>>> wrote:
>>> 
>>>> Ok, the mystery is solved.
>>>> 
>>>> The safe sequence from my limited testing is:
>>>> 1) delete ~/.m2/repository/org/spark and mahout
>>>> 2) build Spark for your version of Hadoop, *but do not use "mvn
>>>> package ..."*; use "mvn install ...". This puts a copy of the exact
>>>> bits you need into the local Maven cache for building Mahout against.
>>>> In my case, using Hadoop 1.2.1, it was "mvn -Dhadoop.version=1.2.1
>>>> -DskipTests clean install". If you run tests on Spark, some failures
>>>> can safely be ignored according to the Spark guys, so check before
>>>> giving up.
>>>> 3) build Mahout with "mvn clean install"
>>>> 
>>>> This will create Mahout from exactly the same bits you will run on
>>>> your cluster. It got rid of a missing anon function for me. The problem
>>>> occurs when you use a different version of Spark on your cluster than
>>>> you used to build Mahout, and this is rather hidden by Maven. Maven
>>>> downloads from repos any dependency that is not in the local .m2
>>>> cache, so you have to make sure your version of Spark is there so
>>>> Maven won't download one that is incompatible. Unless you really know
>>>> what you are doing, I'd build both Spark and Mahout for now.
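The three-step sequence above can be scripted. This is a sketch: the Hadoop version comes from the example in the thread, and the cache paths assume the standard org/apache groupId layout. DRY_RUN defaults to on here so the script only prints what it would do:

```shell
#!/bin/sh
# Sketch of the delete-cache / install-Spark / install-Mahout sequence.
# DRY_RUN=1 (the default here) just prints each command; set DRY_RUN=0
# to execute for real. Paths and the Hadoop version are examples.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

# 1) clear cached Spark/Mahout artifacts so stale jars can't be picked up
run rm -rf "$HOME/.m2/repository/org/apache/spark" \
           "$HOME/.m2/repository/org/apache/mahout"
# 2) "install" (not "package") puts the exact Spark bits into ~/.m2
run mvn -Dhadoop.version=1.2.1 -DskipTests clean install
# 3) build Mahout against those cached bits
run mvn clean install
```

Step 2 must be run from the Spark source tree and step 3 from the Mahout tree; the script above elides the `cd`s for brevity.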
>>>> 
>>>> BTW, I will check in the Spark 1.1.0 version of Mahout once I do some
>>>> more testing.
>>>> 
>>>> On Oct 21, 2014, at 10:26 AM, Pat Ferrel <pat@occamsmachete.com>
>>> wrote:
>>>> 
>>>> Sorry to hear. I bet you’ll find a way.
>>>> 
>>>> The Spark Jira trail leads to two suggestions:
>>>> 1) use spark-submit to execute code with your own entry point (other
>>>> than spark-shell). One theory points to not loading all needed Spark
>>>> classes from calling code (Mahout in our case). I can hand-check the
>>>> jars for the anon function I am missing.
>>>> 2) there may be different class names in the running code (created by
>>>> building Spark locally) and the version referenced in the Mahout POM.
>>>> If this turns out to be true, it means we can't rely on building Spark
>>>> locally. Is there a Maven target that puts the artifacts of the Spark
>>>> build in the .m2/repository local cache? That would be an easy way to
>>>> test this theory.
>>>> 
>>>> Either of these could cause missing classes.
>>>> 
>>>> 
>>>> On Oct 21, 2014, at 9:52 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> wrote:
>>>> 
>>>> No, I haven't used it with anything but 1.0.1 and 0.9.x.
>>>> 
>>>> On a side note, I have just changed employers. It is one of those big
>>>> guys that make it very difficult to do any contributions, so I am not
>>>> sure how much of anything I will be able to share/contribute.
>>>> 
>>>> On Tue, Oct 21, 2014 at 9:43 AM, Pat Ferrel <pat@occamsmachete.com>
>>> wrote:
>>>> 
>>>>> But unless you have the time to devote to errors, avoid it. I've
>>>>> built everything from scratch using 1.0.2 and 1.1.0 and am getting
>>>>> these and missing-class errors. The 1.x branch seems to have some
>>>>> kind of peculiar build-order dependencies. The errors sometimes don't
>>>>> show up until runtime, passing all build tests.
>>>>> 
>>>>> Dmitriy, have you successfully used any Spark version other than
>>>>> 1.0.1 on a cluster? If so, do you recall the exact order and from
>>>>> what sources you built?
>>>>> 
>>>>> 
>>>>> On Oct 21, 2014, at 9:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>> wrote:
>>>>> 
>>>>> You can't use a Spark client of one version and have the backend of
>>>>> another. You can try to change the Spark dependency in the Mahout
>>>>> POMs to match your backend (or vice versa, you can change your
>>>>> backend to match what's on the client).
>>>>> 
>>>>> On Tue, Oct 21, 2014 at 7:12 AM, Mahesh Balija
>>>>> <balijamahesh.mca@gmail.com> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> Here are the errors I get when I run in pseudo-distributed mode:
>>>>>> 
>>>>>> Spark 1.0.2 and Mahout latest code (Clone)
>>>>>> 
>>>>>> When I run the command on this page,
>>>>>> https://mahout.apache.org/users/sparkbindings/play-with-shell.html
>>>>>> 
>>>>>> val drmX = drmData(::, 0 until 4)
>>>>>> 
>>>>>> java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class
>>>>>> incompatible: stream classdesc serialVersionUID = 385418487991259089,
>>>>>> local class serialVersionUID = -6766554341038829528
>>>>>>   at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
>>>>>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>>>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>>>>   at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>>>>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>>>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
>>>>>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>>>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>>>>   at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>>>>>>   at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>>>>>>   at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
>>>>>>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
>>>>>>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>>>>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>>>>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>>>>>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>   at java.lang.Thread.run(Thread.java:701)
>>>>>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 1 (task 0.0:1)
>>>>>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 2 (task 0.0:0)
>>>>>> 14/10/21 19:35:37 WARN TaskSetManager: Lost TID 3 (task 0.0:1)
>>>>>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 4 (task 0.0:0)
>>>>>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 5 (task 0.0:1)
>>>>>> 14/10/21 19:35:38 WARN TaskSetManager: Lost TID 6 (task 0.0:0)
>>>>>> org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>>> Task 0.0:0 failed 4 times, most recent failure: Exception failure in
>>>>>> TID 6 on host mahesh-VirtualBox.local: java.io.InvalidClassException:
>>>>>> org.apache.spark.rdd.RDD; local class incompatible: stream classdesc
>>>>>> serialVersionUID = 385418487991259089, local class serialVersionUID =
>>>>>> -6766554341038829528
>>>>>>   java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:592)
>>>>>>   java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>>>>   java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>>>>   java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1621)
>>>>>>   java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1516)
>>>>>>   java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1770)
>>>>>>   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>>>>   java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>   org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>>>>   org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
>>>>>>   org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
>>>>>>   java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1836)
>>>>>>   java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1795)
>>>>>>   java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
>>>>>>   java.io.ObjectInputStream.readObject(ObjectInputStream.java:369)
>>>>>>   org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
>>>>>>   org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
>>>>>>   org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:165)
>>>>>>   java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>>>>>>   java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>   java.lang.Thread.run(Thread.java:701)
>>>>>> Driver stacktrace:
>>>>>>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>>>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>>   at scala.Option.foreach(Option.scala:236)
>>>>>>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>>>   at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>>>   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>>   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>>   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>>   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>>   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>> 
>>>>>> Best,
>>>>>> Mahesh Balija.
>>>>>> 
>>>>>> On Tue, Oct 21, 2014 at 2:38 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> On Mon, Oct 20, 2014 at 1:51 PM, Pat Ferrel <pat@occamsmachete.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>>> Is anyone else nervous about ignoring this issue, or about relying
>>>>>>>> on non-build (hand-run) test-driven transitive dependency checking?
>>>>>>>> I hope someone else will chime in.
>>>>>>>> 
>>>>>>>> As to running unit tests on a TEST_MASTER, I'll look into it. Can
>>>>>>>> we set up the build machine to do this? I'd feel better about
>>>>>>>> eyeballing deps if we could have a TEST_MASTER automatically run
>>>>>>>> during builds at Apache. Maybe the regular unit tests are OK for
>>>>>>>> building locally ourselves.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Oct 20, 2014, at 12:23 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> On Mon, Oct 20, 2014 at 11:44 AM, Pat Ferrel <pat@occamsmachete.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> Maybe a more fundamental issue is that we don't know for sure
>>>>>>>>>> whether we have missing classes or not. The job.jar at least
>>>>>>>>>> used the pom dependencies to guarantee every needed class was
>>>>>>>>>> present. So the job.jar seems to solve the problem but may ship
>>>>>>>>>> some unnecessary duplicate code, right?
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> No, as I wrote, Spark doesn't work with the job.jar format.
>>>>>>>>> Neither, as it turns out, does more recent Hadoop MR, btw.
>>>>>>>> 
>>>>>>>> Not speaking literally of the format. Spark understands jars, and
>>>>>>>> Maven can build one from transitive dependencies.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Yes, this is A LOT of duplicate code (it will normally take
>>>>>>>>> MINUTES to start up tasks, just on copy time for all of it). This
>>>>>>>>> is absolutely not the way to go with this.
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Lack of a guarantee to load seems like a bigger problem than
>>>>>>>> startup time. Clearly we can't just ignore this.
>>>>>>>> 
>>>>>>> 
>>>>>>> Nope. Given the highly iterative nature and dynamic task allocation
>>>>>>> in this environment, one is looking at effects similar to MapReduce.
>>>>>>> This is not the only reason why I never go to MR anymore, but it's
>>>>>>> one of the main ones.
>>>>>>> 
>>>>>>> How about an experiment: why don't you create an assembly that
>>>>>>> copies ALL transitive dependencies into one folder, and then try to
>>>>>>> broadcast it from a single point (the front end) to, well... let's
>>>>>>> start with 20 machines. (Of course we ideally want to get into the
>>>>>>> 10^3 .. 10^4 range -- but why bother if we can't do it for 20.)
>>>>>>> 
>>>>>>> Or, heck, let's try to simply parallel-copy it 20 times between two
>>>>>>> machines that are not collocated on the same subnet.
>>>>>>> 
>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> There may be any number of bugs waiting for the time we try
>>>>>>>>>> running on a node machine that doesn't have some class in its
>>>>>>>>>> classpath.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> No. Assuming any given method is tested on all its execution
>>>>>>>>> paths, there will be no bugs. Bugs of that sort will only appear
>>>>>>>>> if the user is using algebra directly and calls something that is
>>>>>>>>> not on the path, from the closure. In which case our answer to
>>>>>>>>> this is the same as for the solver methodology developers -- use
>>>>>>>>> a customized SparkConf while creating the context to include the
>>>>>>>>> stuff you really want.
>>>>>>>>> 
>>>>>>>>> Also, another right answer to this is that we probably should
>>>>>>>>> reasonably provide the toolset here. For example, all the stats
>>>>>>>>> stuff found in R base and R stat packages, so the user is not
>>>>>>>>> compelled to go non-native.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Huh? This is not true. The one I ran into was found by calling
>>>>>>>> something in math from something in math-scala. It led outside,
>>>>>>>> and you can encounter such things even in algebra. In fact, you
>>>>>>>> have no idea whether these problems exist except for the fact that
>>>>>>>> you have used it a lot personally.
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> You ran it with your own code that never existed before.
>>>>>>> 
>>>>>>> But there's a difference between released Mahout code (which is
>>>>>>> what you are working on) and user code. Released code must run thru
>>>>>>> remote tests, as you suggested, and thus guarantee there are no
>>>>>>> such problems with post-release code.
>>>>>>> 
>>>>>>> For users, we can only provide a way for them to load the stuff
>>>>>>> they decide to use. We don't have a priori knowledge of what they
>>>>>>> will use. It is the same thing that Spark does, and the same thing
>>>>>>> that MR does, isn't it?
>>>>>>> 
>>>>>>> Of course Mahout should rigorously drop the stuff it doesn't load
>>>>>>> from the Scala scope. No argument about that. In fact, that's what
>>>>>>> I suggested as the #1 solution. But there's nothing much to do here
>>>>>>> but to go dependency-cleansing for the math and spark code. Part of
>>>>>>> the reason there's so much is that newer modules still bring in
>>>>>>> everything from mrLegacy.
>>>>>>> 
>>>>>>> You are right in saying it is hard to guess what other dependencies
>>>>>>> in the util/legacy code are actually used. But that's not a
>>>>>>> justification for the brute-force "copy them all" approach, which
>>>>>>> virtually guarantees ruining one of the foremost legacy issues this
>>>>>>> work was intended to address.
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
> 


