spark-dev mailing list archives

From Michael Armbrust
Subject Re: When to expect UTF8String?
Date Fri, 12 Jun 2015 04:05:34 GMT
Through the DataFrame API, users should never see UTF8String.

Expression (and any class in the catalyst package) is considered internal
and so uses the internal representation of various types.  Which type we
use here is not stable across releases.

Is there a reason you aren't defining a UDF instead?
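
For readers following the thread, here is a minimal sketch of what the UDF route might look like from Java on Spark 1.4. The class name `UdfSketch`, the UDF name `normalize`, and the `name` column are hypothetical and chosen only for illustration. The point is that a registered UDF receives an ordinary java.lang.String, so the internal UTF8String type should never reach user code.

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class UdfSketch {
    // Plain Java logic (hypothetical example); it only deals with java.lang.String.
    static String normalize(String s) {
        return s == null ? null : s.trim().toLowerCase();
    }

    // Registers the function as a SQL UDF and applies it to a DataFrame
    // that is assumed to have a string column called "name".
    static DataFrame withNormalizedName(SQLContext sqlContext, DataFrame df) {
        sqlContext.udf().register("normalize", new UDF1<String, String>() {
            @Override
            public String call(String s) {
                return normalize(s);
            }
        }, DataTypes.StringType);

        // The UDF is resolved by name inside the SQL expression string.
        return df.selectExpr("name", "normalize(name) AS name_normalized");
    }
}
```

Because the conversion between internal and external types happens at the UDF boundary, the same code should keep working even if the internal string representation changes in a later release.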

On Thu, Jun 11, 2015 at 8:08 PM, zsampson wrote:

> I'm hoping for some clarity about when to expect String vs UTF8String when
> using the Java DataFrames API.
> In upgrading to Spark 1.4, I'm dealing with a lot of errors where what was
> once a String is now a UTF8String. The comments in the file and the related
> commit message indicate that maybe it should be internal to SparkSQL's
> implementation.
> However, when I add a column containing a custom subclass of Expression,
> the row passed to the eval method contains instances of UTF8String. Ditto for
> AggregateFunction.update. Is this expected? If so, when should I generally
> know to deal with UTF8String objects?