spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Documenting the various DataFrame/SQL join types
Date Tue, 08 May 2018 15:42:41 GMT
Would be great to document. Probably best with examples.

On Tue, May 8, 2018 at 6:13 AM Nicholas Chammas <nicholas.chammas@gmail.com>
wrote:

> The documentation for DataFrame.join()
> <https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join>
> lists all the join types we support:
>
>    - inner
>    - cross
>    - outer
>    - full
>    - full_outer
>    - left
>    - left_outer
>    - right
>    - right_outer
>    - left_semi
>    - left_anti
>
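The basic semantics behind several of the names in that list (inner, left/left_outer, full/full_outer/outer) can be sketched in plain Python. This is an illustrative sketch of the join semantics only, not how Spark implements DataFrame.join; the sample data is made up:

```python
# A plain-Python sketch of inner vs. outer join semantics on a shared key
# (illustrative only; Spark's DataFrame.join does this distributed and at scale).
left = {1: "a", 2: "b"}    # key -> left-side value
right = {2: "x", 3: "y"}   # key -> right-side value

# inner: only keys present on BOTH sides.
inner = {k: (left[k], right[k]) for k in left.keys() & right.keys()}

# full_outer ("outer" and "full" are synonyms): every key from EITHER side,
# with None filling in where one side has no match.
full_outer = {k: (left.get(k), right.get(k)) for k in left.keys() | right.keys()}

# left_outer ("left"): every left key, paired with the right value or None.
left_outer = {k: (left[k], right.get(k)) for k in left}

print(inner)       # {2: ('b', 'x')}
print(left_outer)  # {1: ('a', None), 2: ('b', 'x')}
```

right and right_outer mirror left_outer with the sides swapped, and cross pairs every left row with every right row regardless of keys.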
> Some of these join types are also listed on the SQL Programming Guide
> <http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#supported-hive-features>
> .
>
> Is it obvious to everyone what all these different join types are? For
> example, I had never heard of a LEFT ANTI join until stumbling on it in the
> PySpark docs. It’s quite handy! But I had to experiment with it a bit just
> to understand what it does.
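For readers in the same position, the left_semi/left_anti semantics can be sketched in plain Python. Again this is an illustrative sketch with made-up data, not Spark's implementation:

```python
# A plain-Python sketch of left_semi vs. left_anti join semantics
# (illustrative only; Spark returns only the left side's columns for both).
left = [(1, "a"), (2, "b"), (3, "c")]  # (key, value) rows on the left
right_keys = {2, 4}                    # join keys present on the right side

# left_semi: keep left rows whose key HAS a match on the right --
# like an inner join, but without pulling in any right-side columns.
left_semi = [row for row in left if row[0] in right_keys]

# left_anti: keep left rows whose key has NO match on the right --
# the complement of left_semi.
left_anti = [row for row in left if row[0] not in right_keys]

print(left_semi)  # [(2, 'b')]
print(left_anti)  # [(1, 'a'), (3, 'c')]
```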
>
> I think it would be a good service to our users if we either documented
> these join types ourselves clearly, or provided a link to an external
> resource that documented them sufficiently. I’m happy to file a JIRA about
> this and do the work myself. It would be great if the documentation could
> be expressed as a series of simple doc tests, but brief prose describing
> how each join works would still be valuable.
>
> Does this seem worthwhile to folks here? And does anyone want to offer
> guidance on how best to provide this kind of documentation so that it’s
> easy to find by users, regardless of the language they’re using?
>
> Nick
>
