spark-dev mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: I would like to add JDBCDialect to support Vertica database
Date Thu, 12 Dec 2019 00:54:07 GMT
I'm not sure either.
Can't you use Spark Packages for your scenario?
https://spark-packages.org/


On Thu, Dec 12, 2019 at 9:46 AM Hyukjin Kwon <gurwls223@gmail.com> wrote:

> I am not so sure about it either. I think it is enough to expose JDBCDialect
> as an API (which it seems it already is).
> Adding it brings some overhead to dev (e.g., testing and reviewing PRs
> related to another third party).
> Without a strong reason to include it, such a third-party integration would
> be better off as a third-party library.
>
> On Thu, Dec 12, 2019 at 12:58 AM Bryan Herger <bryan.herger@microfocus.com>
> wrote:
>
>> It kind of already is.  I was able to build the VerticaDialect as a sort
>> of plugin as follows:
>>
>>
>>
>> 1. Check out the apache/spark tree.
>> 2. Copy in VerticaDialect.scala.
>> 3. Build with “mvn -DskipTests compile”.
>> 4. Package the compiled class plus companion object into a JAR.
>> 5. Copy the JAR to the jars folder in the Spark binary installation
>>    (optional; you can probably set the path with an extra --jars argument
>>    instead).
>>
>>
>>
>> Then run the following test in spark-shell after creating Vertica table
>> and sample data:
>>
>>
>>
>>
>> import org.apache.spark.sql.jdbc.{JdbcDialects, VerticaDialect}
>>
>> // Register the custom dialect before opening any Vertica connection.
>> JdbcDialects.registerDialect(VerticaDialect)
>>
>> // Read the test table over JDBC (trailing dots keep the chain REPL-safe).
>> val jdbcDF = spark.read.format("jdbc").
>>   option("url", "jdbc:vertica://hpbox:5433/docker").
>>   option("dbtable", "test_alltypes").
>>   option("user", "dbadmin").
>>   option("password", "Vertica1!").
>>   load()
>>
>> jdbcDF.show()
>>
>> // Append the same rows back to the table to exercise the write path.
>> jdbcDF.write.mode("append").format("jdbc").
>>   option("url", "jdbc:vertica://hpbox:5433/docker").
>>   option("dbtable", "test_alltypes").
>>   option("user", "dbadmin").
>>   option("password", "Vertica1!").
>>   save()
>>
>> // Remove the dialect again once the test is done.
>> JdbcDialects.unregisterDialect(VerticaDialect)
>>
>>
>>
>> If it would be preferable to write documentation describing the above, I
>> can do that instead.  The hard part is checking out the matching
>> apache/spark tree then copying to the Spark cluster – I can install master
>> branch and latest binary and apply patches since I have root on all my test
>> boxes, but customers may not be able to.  Still, this provides another
>> route to support new JDBC dialects.
>>
>>
>>
>> BryanH
>>
>>
>>
>> *From:* Wenchen Fan [mailto:cloud0fan@gmail.com]
>> *Sent:* Wednesday, December 11, 2019 10:48 AM
>> *To:* Xiao Li <lixiao@databricks.com>
>> *Cc:* Bryan Herger <bryan.herger@microfocus.com>; Sean Owen <
>> srowen@gmail.com>; dev@spark.apache.org
>> *Subject:* Re: I would like to add JDBCDialect to support Vertica
>> database
>>
>>
>>
>> Can we make the JDBCDialect a public API that users can plug in? It looks
>> like an endless job to make sure the Spark JDBC source supports all databases.
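
For context on the plug-in question: JdbcDialect and JdbcDialects.registerDialect are already public in org.apache.spark.sql.jdbc, as a later reply in this thread points out. A minimal spark-shell sketch of that path follows. The ExampleDbDialect object and the "jdbc:exampledb:" URL prefix are made up for illustration and do not come from this thread; the sketch only shows how a registered dialect is selected by URL.

```scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect: claims every URL with the made-up "jdbc:exampledb:"
// prefix and inherits the default behaviour for everything else.
object ExampleDbDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:exampledb:")
}

// Registration is global to the JVM, so pair it with unregisterDialect.
JdbcDialects.registerDialect(ExampleDbDialect)

// JdbcDialects.get looks up the dialect whose canHandle accepts the URL.
val resolved: JdbcDialect = JdbcDialects.get("jdbc:exampledb://host:1234/db")
val picked: Boolean = resolved eq ExampleDbDialect

JdbcDialects.unregisterDialect(ExampleDbDialect)
```

Because the lookup is driven purely by the JDBC URL, a dialect shipped in a separate JAR can be registered at session start without changes to the Spark source tree.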
>>
>>
>>
>> On Wed, Dec 11, 2019 at 11:41 PM Xiao Li <lixiao@databricks.com> wrote:
>>
>> You can follow how we test the other JDBC dialects. All JDBC dialects
>> require the docker integration tests.
>> https://github.com/apache/spark/tree/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc
>>
>>
>>
>>
>>
>> On Wed, Dec 11, 2019 at 7:33 AM Bryan Herger <bryan.herger@microfocus.com>
>> wrote:
>>
>> Hi, to answer both questions raised:
>>
>>
>>
>> Though Vertica is derived from Postgres, Vertica does not recognize type
>> names TEXT, NVARCHAR, BYTEA, ARRAY, and also handles DATETIME differently
>> enough to cause issues.  The major changes are to use type names and date
>> format supported by Vertica.
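
Type-name differences like the ones described above are what a dialect's getJDBCType override is for. The following is a minimal sketch, not the changeset linked in this thread: the object name ExampleVerticaDialect, the VARCHAR/VARBINARY lengths, and the choice of which Spark types to remap are all assumptions for illustration.

```scala
import java.sql.Types
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcType}
import org.apache.spark.sql.types.{BinaryType, DataType, StringType}

// Hypothetical dialect; the real VerticaDialect may differ.
object ExampleVerticaDialect extends JdbcDialect {

  // Claim only Vertica JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:vertica:")

  // Map Spark types to column type names for CREATE TABLE / setNull.
  // The lengths below are assumed values, not taken from the thread.
  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case StringType => Some(JdbcType("VARCHAR(65000)", Types.VARCHAR))
    case BinaryType => Some(JdbcType("VARBINARY(65000)", Types.VARBINARY))
    case _          => None // fall back to Spark's default mapping
  }
}
```

Returning None for unhandled types lets Spark fall back to its common JDBC type mapping, so the dialect only needs to list the types where the database disagrees with the defaults.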
>>
>>
>>
>> For testing, I have a SQL script plus Scala and PySpark scripts, but
>> these require a Vertica database to connect, so automated testing on a
>> build server wouldn’t work.  It’s possible to include my test scripts and
>> directions to run manually, but not sure where in the repo that would go.
>> If automated testing is required, I can ask our engineers whether there
>> exists something like a mockito that could be included.
>>
>>
>>
>> Thanks, Bryan H
>>
>>
>>
>> *From:* Xiao Li [mailto:lixiao@databricks.com]
>> *Sent:* Wednesday, December 11, 2019 10:13 AM
>> *To:* Sean Owen <srowen@gmail.com>
>> *Cc:* Bryan Herger <bryan.herger@microfocus.com>; dev@spark.apache.org
>> *Subject:* Re: I would like to add JDBCDialect to support Vertica
>> database
>>
>>
>>
>> How can the dev community test it?
>>
>>
>>
>> Xiao
>>
>>
>>
>> On Wed, Dec 11, 2019 at 6:52 AM Sean Owen <srowen@gmail.com> wrote:
>>
>> It's probably OK, IMHO. The overhead of another dialect is small. Are
>> there differences that require a new dialect? I assume so, and it might
>> be useful to summarize them if you open a PR.
>>
>> On Tue, Dec 10, 2019 at 7:14 AM Bryan Herger
>> <bryan.herger@microfocus.com> wrote:
>> >
>> > Hi, I am a Vertica support engineer, and we have open support requests
>> > around NULL values and SQL type conversion with DataFrame read/write over
>> > JDBC when connecting to a Vertica database.  The stack traces point to
>> > issues with the generic JDBCDialect in Spark-SQL.
>> >
>> > I saw that other vendors (Teradata, DB2...) have contributed a
>> > JDBCDialect class to address JDBC compatibility, so I wrote up a dialect
>> > for Vertica.
>> >
>> > The changeset is on my fork of apache/spark at
>> > https://github.com/bryanherger/spark/commit/84d3014e4ead18146147cf299e8996c5c56b377d
>> >
>> > I have tested this against Vertica 9.3 and found that this changeset
>> > addresses both issues reported to us (issue with NULL values - setNull() -
>> > for valid java.sql.Types, and String to VARCHAR conversion)
>> >
>> > Is this an acceptable change?  If so, how should I go about submitting a
>> > pull request?
>> >
>> > Thanks, Bryan Herger
>> > Vertica Solution Engineer
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >
>>

-- 
---
Takeshi Yamamuro
