I think it works in SparkPipeline -- I have hacks in place to fake a
TIOContext inside of Spark when it's needed, but it's possible we need to
implement more methods to get it to work with all of the ReadableData
impls.
On Mon, May 9, 2016 at 9:26 AM, David Ortiz <dpo5003@gmail.com> wrote:
> Thanks. That works. I also found a workaround: serializing all the
> Avro records into JSON in the map function that reads the data in, then
> deserializing back into Avro in my processing function down the line.
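>
> For reference, the round trip is just Avro's JSON encoder/decoder. A
> rough sketch, with MyRecord standing in for my generated record class:
>
>     import java.io.ByteArrayOutputStream;
>     import java.io.IOException;
>     import org.apache.avro.Schema;
>     import org.apache.avro.io.DecoderFactory;
>     import org.apache.avro.io.EncoderFactory;
>     import org.apache.avro.io.JsonDecoder;
>     import org.apache.avro.io.JsonEncoder;
>     import org.apache.avro.specific.SpecificDatumReader;
>     import org.apache.avro.specific.SpecificDatumWriter;
>
>     // Avro record -> JSON string (in the map function that reads in)
>     static String toJson(MyRecord rec) throws IOException {
>       Schema schema = MyRecord.getClassSchema();
>       ByteArrayOutputStream out = new ByteArrayOutputStream();
>       JsonEncoder enc = EncoderFactory.get().jsonEncoder(schema, out);
>       new SpecificDatumWriter<MyRecord>(schema).write(rec, enc);
>       enc.flush();
>       return out.toString("UTF-8");
>     }
>
>     // JSON string -> Avro record (in the downstream processing fn)
>     static MyRecord fromJson(String json) throws IOException {
>       Schema schema = MyRecord.getClassSchema();
>       JsonDecoder dec = DecoderFactory.get().jsonDecoder(schema, json);
>       return new SpecificDatumReader<MyRecord>(schema).read(null, dec);
>     }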
>
> Does ReadableData have issues running on a SparkPipeline? Just curious
> since it takes the org.apache.hadoop.mapreduce.TaskInputOutputContext in
> its read method.
>
> On Fri, May 6, 2016 at 4:56 PM Josh Wills <josh.wills@gmail.com> wrote:
>
>> Try using the ReadableData version of the PTable -- it's a serializable
>> object, and you can read the data from it into whatever you want in the
>> initialize method of the DoFn you pass it to.
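>>
>> Something like this sketch (untested; MyRecord and the Input/Output
>> type names are placeholders for your own types):
>>
>>     import java.io.IOException;
>>     import java.util.HashMap;
>>     import java.util.Map;
>>     import org.apache.crunch.CrunchRuntimeException;
>>     import org.apache.crunch.DoFn;
>>     import org.apache.crunch.Emitter;
>>     import org.apache.crunch.Pair;
>>     import org.apache.crunch.ReadableData;
>>     import org.apache.hadoop.conf.Configuration;
>>
>>     // A serializable, readable view of the table's contents.
>>     ReadableData<Pair<String, MyRecord>> readable = table.asReadable(true);
>>     input.parallelDo(new LookupFn(readable), outputType);
>>
>>     static class LookupFn extends DoFn<Input, Output> {
>>       private final ReadableData<Pair<String, MyRecord>> readable;
>>       private transient Map<String, MyRecord> lookup;
>>
>>       LookupFn(ReadableData<Pair<String, MyRecord>> readable) {
>>         this.readable = readable; // ReadableData is Serializable
>>       }
>>
>>       @Override
>>       public void configure(Configuration conf) {
>>         readable.configure(conf); // register the side data with the job
>>       }
>>
>>       @Override
>>       public void initialize() {
>>         lookup = new HashMap<String, MyRecord>();
>>         try {
>>           for (Pair<String, MyRecord> p : readable.read(getContext())) {
>>             lookup.put(p.first(), p.second());
>>           }
>>         } catch (IOException e) {
>>           throw new CrunchRuntimeException(e);
>>         }
>>       }
>>
>>       @Override
>>       public void process(Input in, Emitter<Output> emitter) {
>>         // consult the lookup map here
>>       }
>>     }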
>>
>> On Fri, May 6, 2016 at 1:03 PM David Ortiz <dpo5003@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> In an attempt to make my code a little easier to follow, I am trying
>>> to materialize a PTable to a map and then pass it into another DoFn.
>>> Unfortunately, since the value is an Avro record, I get a
>>> NotSerializableException when I try to use it.
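>>>
>>> Concretely, the shape of what I'm doing looks roughly like this
>>> (MyRecord is the Avro-generated class, MyFn is my DoFn):
>>>
>>>     Map<String, MyRecord> lookup = table.materializeToMap();
>>>     // Submitting the pipeline then fails, since Avro records don't
>>>     // implement java.io.Serializable:
>>>     records.parallelDo(new MyFn(lookup), outType);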
>>>
>>> I tried to get around this by converting the record into a ByteBuffer
>>> with the Avro utils, but lo and behold, that's also not Serializable.
>>> Since I don't see a convenient way to wrap a byte array with Crunch, has
>>> anyone had any luck with other approaches to getting a Crunch-compatible
>>> serialized Avro object?
>>>
>>> Thanks,
>>> David Ortiz
>>>
>>