spark-user mailing list archives

From "Pushkar.Gujar" <>
Subject Re: question regarding pyspark
Date Sat, 22 Apr 2017 00:32:22 GMT
 Hi Afshin,

If you need to associate the header information from the second file with
the first one, you can do that by specifying a custom schema. Below is an
example from the spark-csv package. As you can guess, you will have to do
some pre-processing to create customSchema by first reading the second file.

val customSchema = StructType(Array(
    StructField("year", IntegerType, true),
    StructField("make", StringType, true),
    StructField("model", StringType, true),
    StructField("comment", StringType, true),
    StructField("blank", StringType, true)))
val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true") // Use first line of all files as header
    .schema(customSchema)
    .load("cars.csv")
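
Since the question is about PySpark, here is a rough sketch of the same idea in Python: read the column names from the separate header file, build a schema from them, and pass it to the CSV reader. It assumes the header file is a small local file with a single comma-separated line, that an existing SparkSession is passed in as `spark`, and that every column can be read as a string first and cast later. The function names and file paths are made up for illustration.

```python
import csv


def read_column_names(header_path):
    """Return the column names found in the first row of the header file."""
    with open(header_path, newline="") as f:
        return [name.strip() for name in next(csv.reader(f))]


def load_csv_with_schema(spark, data_path, header_path):
    """Load a headerless CSV, naming its columns from a separate header file.

    Assumption: all columns are read as nullable strings; cast them afterwards.
    """
    # Imported lazily so read_column_names works without PySpark installed.
    from pyspark.sql.types import StructType, StructField, StringType

    names = read_column_names(header_path)
    schema = StructType([StructField(name, StringType(), True) for name in names])
    # header=False because the data file itself has no header row.
    return spark.read.csv(data_path, schema=schema, header=False)
```

With a running session this could be called as `load_csv_with_schema(spark, "data.csv", "header.csv")`, where both paths are placeholders. If types do not matter up front, reading with `header=False` and renaming with `df.toDF(*names)` may be a simpler alternative.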

Thank you,
*Pushkar Gujar*

On Fri, Apr 21, 2017 at 7:37 PM, Afshin, Bardia <> wrote:

> I’m ingesting a CSV with hundreds of columns, and the original CSV file
> itself doesn’t have any header. I do have a separate file that contains just
> the headers. Is there a way to give the Spark API this information when
> loading the CSV file, or do I have to do some preprocessing before doing so?
> Thanks,
> Bardia Afshin
