spark-dev mailing list archives

From DB Tsai <dbt...@stanford.edu>
Subject Re: Calling external classes added by sc.addJar needs to be through reflection
Date Mon, 19 May 2014 00:03:30 GMT
The jars are included in my driver, and I can successfully use them in the
driver. I'm working on a patch, and it's almost working. Will submit a PR
soon.


Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Sun, May 18, 2014 at 11:58 AM, Patrick Wendell <pwendell@gmail.com> wrote:

> @db - it's possible that you aren't including the jar in the classpath
> of your driver program (I think this is what mridul was suggesting).
> It would be helpful to see the stack trace of the CNFE.
>
> - Patrick
>
> On Sun, May 18, 2014 at 11:54 AM, Patrick Wendell <pwendell@gmail.com>
> wrote:
> > @xiangrui - we don't expect these to be present on the system
> > classpath, because they get dynamically added by Spark (e.g. your
> > application can call sc.addJar well after the JVMs have started).
> >
> > @db - I'm pretty surprised to see that behavior. It's definitely not
> > intended that users need reflection to instantiate their classes -
> > something odd is going on in your case. If you could create an
> > isolated example and post it to the JIRA, that would be great.
> >
> > On Sun, May 18, 2014 at 9:58 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >> I created a JIRA: https://issues.apache.org/jira/browse/SPARK-1870
> >>
> >> DB, could you add more info to that JIRA? Thanks!
> >>
> >> -Xiangrui
> >>
> >> On Sun, May 18, 2014 at 9:46 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >>> Btw, I tried
> >>>
> >>> rdd.map { i =>
> >>>   System.getProperty("java.class.path")
> >>> }.collect()
> >>>
> >>> but didn't see the jars added via "--jars" on the executor classpath.
> >>>
> >>> -Xiangrui
> >>>
> >>> On Sat, May 17, 2014 at 11:26 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> >>>> I can reproduce the error with Spark 1.0-RC and YARN (CDH-5). The
> >>>> reflection approach mentioned by DB didn't work either. I checked the
> >>>> distributed cache on a worker node and found the jar there. It is also
> >>>> in the Environment tab of the WebUI. The workaround is making an
> >>>> assembly jar.
> >>>>
> >>>> DB, could you create a JIRA and describe what you have found so far?
> >>>> Thanks!
> >>>>
> >>>> Best,
> >>>> Xiangrui
> >>>>
> >>>> On Sat, May 17, 2014 at 1:29 AM, Mridul Muralidharan <mridul@gmail.com> wrote:
> >>>>> Can you try moving your mapPartitions to another class/object which is
> >>>>> referenced only after sc.addJar?
> >>>>>
> >>>>> I suspect the CNFE is thrown while loading the class containing
> >>>>> mapPartitions, before addJar is executed.
> >>>>>
> >>>>> In general, though, dynamic loading of classes means you use reflection
> >>>>> to instantiate them, since the expectation is that you don't know which
> >>>>> implementation provides the interface. If you know it statically, a
> >>>>> priori, you bundle it in your classpath.
> >>>>>
> >>>>> Regards
> >>>>> Mridul
> >>>>> On 17-May-2014 7:28 am, "DB Tsai" <dbtsai@stanford.edu> wrote:
> >>>>>
> >>>>>> Finally found a way out of the ClassLoader maze! It took me some time to
> >>>>>> understand how it works; I think it's worth documenting in a separate
> >>>>>> thread.
> >>>>>>
> >>>>>> We're trying to add an external utility.jar, which contains
> >>>>>> CSVRecordParser, and we added the jar to the executors through the
> >>>>>> sc.addJar API.
> >>>>>>
> >>>>>> If the instance of CSVRecordParser is created without reflection, it
> >>>>>> raises a *ClassNotFoundException*.
> >>>>>>
> >>>>>> data.mapPartitions(lines => {
> >>>>>>     // Direct instantiation: this is the call that raises the exception
> >>>>>>     val csvParser = new CSVRecordParser(delimiter.charAt(0))
> >>>>>>     lines.foreach(line => {
> >>>>>>       val lineElems = csvParser.parseLine(line)
> >>>>>>     })
> >>>>>>     ...
> >>>>>>     ...
> >>>>>> })
> >>>>>>
> >>>>>>
> >>>>>> If the instance of CSVRecordParser is created through reflection, it
> >>>>>> works.
> >>>>>>
> >>>>>> data.mapPartitions(lines => {
> >>>>>>     // Resolve the class by name through the thread's context class loader
> >>>>>>     val loader = Thread.currentThread.getContextClassLoader
> >>>>>>     val CSVRecordParser =
> >>>>>>         loader.loadClass("com.alpine.hadoop.ext.CSVRecordParser")
> >>>>>>
> >>>>>>     val csvParser = CSVRecordParser.getConstructor(Character.TYPE)
> >>>>>>         .newInstance(delimiter.charAt(0).asInstanceOf[Character])
> >>>>>>
> >>>>>>     val parseLine = CSVRecordParser
> >>>>>>         .getDeclaredMethod("parseLine", classOf[String])
> >>>>>>
> >>>>>>     lines.foreach(line => {
> >>>>>>        val lineElems = parseLine.invoke(csvParser, line)
> >>>>>>            .asInstanceOf[Array[String]]
> >>>>>>     })
> >>>>>>     ...
> >>>>>>     ...
> >>>>>> })
> >>>>>>
> >>>>>>
> >>>>>> This is identical to this question,
> >>>>>>
> >>>>>>
> http://stackoverflow.com/questions/7452411/thread-currentthread-setcontextclassloader-without-using-reflection
> >>>>>>
> >>>>>> It's not intuitive for users to load external classes through reflection,
> >>>>>> but the couple of available alternatives are very tricky: 1) messing with
> >>>>>> the system class loader by calling its addURL method through reflection,
> >>>>>> or 2) forking another JVM to add the jars to the classpath before the
> >>>>>> bootstrap loader.
> >>>>>>
> >>>>>> Any thoughts on fixing it properly?
> >>>>>>
> >>>>>> @Xiangrui,
> >>>>>> netlib-java jniloader is loaded from netlib-java through reflection, so
> >>>>>> this problem will not be seen.
> >>>>>>
> >>>>>> Sincerely,
> >>>>>>
> >>>>>> DB Tsai
> >>>>>> -------------------------------------------------------
> >>>>>> My Blog: https://www.dbtsai.com
> >>>>>> LinkedIn: https://www.linkedin.com/in/dbtsai
> >>>>>>
>
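
The reflection workaround DB describes at the bottom of the thread can be wrapped in a small helper. This is only a sketch of that pattern, not the fix tracked in SPARK-1870. The names ContextLoaderSketch, loadViaContext and newInstance are hypothetical and do not come from Spark or from the thread.

```scala
object ContextLoaderSketch {
  /** Resolve a class by name through the current thread's context class loader.
    * Falls back to the loader of this object when no context loader is set. */
  def loadViaContext(className: String): Class[_] = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(className, true, loader)
  }

  /** Instantiate className reflectively with the given constructor signature.
    * The cast to T is unchecked; the caller must pick a type the class really has. */
  def newInstance[T](className: String,
                     argTypes: Seq[Class[_]],
                     args: Seq[AnyRef]): T =
    loadViaContext(className)
      .getConstructor(argTypes: _*)
      .newInstance(args: _*)
      .asInstanceOf[T]
}
```

With such a helper, the reflective call in the quoted example would look roughly like ContextLoaderSketch.newInstance[AnyRef]("com.alpine.hadoop.ext.CSVRecordParser", Seq(Character.TYPE), Seq(Character.valueOf(delimiter.charAt(0)))). Method calls on the result would still have to go through reflection, as in DB's code.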

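Xiangrui's check of java.class.path did not show the added jars. That property is fixed when the JVM starts, so URLs handed to a class loader created later would not be expected to appear in it. A possible complementary diagnostic is to walk the class loader chain instead. The sketch below assumes the loaders of interest are URLClassLoader instances; LoaderChainSketch and describeChain are hypothetical names.

```scala
import java.net.URLClassLoader

object LoaderChainSketch {
  /** Walk from `start` up through its parents, describing each loader.
    * URLClassLoader instances are listed together with the URLs they serve. */
  def describeChain(start: ClassLoader): List[String] =
    Iterator.iterate(start)(_.getParent)
      .takeWhile(_ != null)
      .map {
        case u: URLClassLoader =>
          s"${u.getClass.getName}: ${u.getURLs.mkString(", ")}"
        case other =>
          other.getClass.getName
      }
      .toList
}
```

Inside a task, one could call LoaderChainSketch.describeChain(Thread.currentThread.getContextClassLoader) and collect the result to see which loader, if any, exposes the added jar.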