spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wenchen Fan <cloud0...@gmail.com>
Subject Re: I would like to add JDBCDialect to support Vertica database
Date Wed, 11 Dec 2019 15:47:47 GMT
Can we make the JDBCDialect a public API that users can plugin? It looks
like an end-less job to make sure Spark JDBC source supports all databases.

On Wed, Dec 11, 2019 at 11:41 PM Xiao Li <lixiao@databricks.com> wrote:

> You can follow how we test the other JDBC dialects. All JDBC dialects
> require the docker integration tests.
> https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc
>
>
> On Wed, Dec 11, 2019 at 7:33 AM Bryan Herger <bryan.herger@microfocus.com>
> wrote:
>
>> Hi, to answer both questions raised:
>>
>>
>>
>> Though Vertica is derived from Postgres, Vertica does not recognize type
>> names TEXT, NVARCHAR, BYTEA, ARRAY, and also handles DATETIME differently
>> enough to cause issues.  The major changes are to use type names and date
>> format supported by Vertica.
>>
>>
>>
>> For testing, I have a SQL script plus Scala and PySpark scripts, but
>> these require a Vertica database to connect, so automated testing on a
>> build server wouldn’t work.  It’s possible to include my test scripts and
>> directions to run manually, but not sure where in the repo that would go.
>> If automated testing is required, I can ask our engineers whether there
>> exists something like a mockito that could be included.
>>
>>
>>
>> Thanks, Bryan H
>>
>>
>>
>> *From:* Xiao Li [mailto:lixiao@databricks.com]
>> *Sent:* Wednesday, December 11, 2019 10:13 AM
>> *To:* Sean Owen <srowen@gmail.com>
>> *Cc:* Bryan Herger <bryan.herger@microfocus.com>; dev@spark.apache.org
>> *Subject:* Re: I would like to add JDBCDialect to support Vertica
>> database
>>
>>
>>
>> How can the dev community test it?
>>
>>
>>
>> Xiao
>>
>>
>>
>> On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <srowen@gmail.com> wrote:
>>
>> It's probably OK, IMHO. The overhead of another dialect is small. Are
>> there differences that require a new dialect? I assume so and might
>> just be useful to summarize them if you open a PR.
>>
>> On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
>> <bryan.herger@microfocus.com> wrote:
>> >
>> > Hi, I am a Vertica support engineer, and we have open support requests
>> around NULL values and SQL type conversion with DataFrame read/write over
>> JDBC when connecting to a Vertica database.  The stack traces point to
>> issues with the generic JDBCDialect in Spark-SQL.
>> >
>> > I saw that other vendors (Teradata, DB2...) have contributed a
>> JDBCDialect class to address JDBC compatibility, so I wrote up a dialect
>> for Vertica.
>> >
>> > The changeset is on my fork of apache/spark at
>> https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>> >
>> > I have tested this against Vertica 9.3 and found that this changeset
>> addresses both issues reported to us (issue with NULL values - setNull() -
>> for valid java.sql.Types, and String to VARCHAR conversion)
>> >
>> > Is the an acceptable change?  If so, how should I go about submitting a
>> pull request?
>> >
>> > Thanks, Bryan Herger
>> > Vertica Solution Engineer
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>> --
>>
>> [image: Databricks Summit - Watch the talks]
>> <https://databricks.com/sparkaisummit/north-america>
>>
>
>
> --
> [image: Databricks Summit - Watch the talks]
> <https://databricks.com/sparkaisummit/north-america>
>

Mime
View raw message