spark-user mailing list archives

From Diana Carroll <dcarr...@cloudera.com>
Subject logging in pyspark
Date Tue, 06 May 2014 19:31:50 GMT
What should I do if I want to log something as part of a task?

This is what I tried.  To set up a logger, I followed the advice here:
http://py4j.sourceforge.net/faq.html#how-to-turn-logging-on-off

import logging

logger = logging.getLogger("py4j")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler())

This works fine when I call it from my driver (i.e., the pyspark shell):
logger.info("this works fine")

But I want to try logging within a distributed task so I did this:

def logTestMap(a):
    logger.info("test")
    return a

myrdd.map(logTestMap).count()

and got:
PicklingError: Can't pickle 'lock' object
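For what it's worth, the same failure seems to reproduce outside Spark just by
pickling a handler directly (a minimal sketch; plain pickle raises TypeError
here, which I assume Spark's serializer surfaces as PicklingError):

import logging
import pickle

handler = logging.StreamHandler()
try:
    pickle.dumps(handler)  # handlers hold a threading lock internally
except TypeError as e:
    print(e)  # e.g. "can't pickle lock objects"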

So Spark is trying to serialize my function and can't, because of a lock object
used in the logger, presumably for thread safety.  But then... how would I do
it?  Or is this just a really bad idea?
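
One workaround that occurs to me is to construct the logger inside the
function, so the closure never captures a handler (an untested sketch; the
if-check is there to avoid stacking a new handler on every call, and I'd
expect the output to land in the executor logs rather than the driver
console):

import logging

def logTestMap(a):
    # Build the logger inside the task so nothing unpicklable is
    # captured in the closure that gets shipped to the executors.
    logger = logging.getLogger("py4j")
    if not logger.handlers:  # don't add a handler on every call
        logger.setLevel(logging.INFO)
        logger.addHandler(logging.StreamHandler())
    logger.info("test")
    return a

myrdd.map(logTestMap).count()

Would that be a sane approach, or is there a better pattern?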

Thanks
Diana
