spark-dev mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Documenting the various DataFrame/SQL join types
Date Wed, 09 May 2018 02:53:22 GMT
OK great, I’m happy to take this on.

Does it make sense to approach this by adding an example for each join type
here
<https://github.com/apache/spark/blob/master/examples/src/main/python/sql/basic.py>
(and perhaps also in the matching areas for Scala, Java, and R), and then
referencing the examples from the SQL Programming Guide
<https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md>
using include_example tags?

e.g.:

<div data-lang="python"  markdown="1">
{% include_example write_sorting_and_bucketing python/sql/datasource.py %}
</div>

And would this let me implement simple tests for the examples? It’s not
clear to me whether the comment blocks in that example file are used for
testing somehow.
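
As a rough illustration of the kind of simple check I have in mind for the
less familiar join types, here is a plain-Python sketch of what I would
expect left_semi and left_anti to return. This is not Spark code: the helper
functions, the sample rows, and the "id" key are all made up for
illustration, and the sketch ignores null handling. The real examples would
call DataFrame.join() with the matching join type instead.

```python
# Plain-Python sketch of join semantics; this is NOT Spark code.
# Rows are dicts and `key` names the join column (all names are hypothetical).


def left_semi(left, right, key):
    """Keep left rows whose key appears in right; only left columns survive."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def left_anti(left, right, key):
    """Keep left rows whose key does NOT appear in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
orders = [
    {"id": 1, "item": "book"},
    {"id": 1, "item": "pen"},
    {"id": 3, "item": "lamp"},
]

# People with at least one order; Alice appears once despite two matching orders.
assert left_semi(people, orders, "id") == [people[0], people[2]]

# People with no orders at all.
assert left_anti(people, orders, "id") == [people[1]]
```

The point of the two assertions is that neither join type duplicates left
rows or pulls in columns from the right side, which is what distinguishes
them from inner and left outer joins.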

Just looking for some high level guidance.

Nick

On Tue, May 8, 2018 at 11:42 AM Reynold Xin <rxin@databricks.com> wrote:

> Would be great to document. Probably best with examples.
>
> On Tue, May 8, 2018 at 6:13 AM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> The documentation for DataFrame.join()
>> <https://spark.apache.org/docs/2.3.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.join>
>> lists all the join types we support:
>>
>>    - inner
>>    - cross
>>    - outer
>>    - full
>>    - full_outer
>>    - left
>>    - left_outer
>>    - right
>>    - right_outer
>>    - left_semi
>>    - left_anti
>>
>> Some of these join types are also listed in the SQL Programming Guide
>> <http://spark.apache.org/docs/2.3.0/sql-programming-guide.html#supported-hive-features>.
>>
>> Is it obvious to everyone what all these different join types are? For
>> example, I had never heard of a LEFT ANTI join until stumbling on it in the
>> PySpark docs. It’s quite handy! But I had to experiment with it a bit just
>> to understand what it does.
>>
>> I think it would be a good service to our users if we either documented
>> these join types ourselves clearly, or provided a link to an external
>> resource that documented them sufficiently. I’m happy to file a JIRA about
>> this and do the work itself. It would be great if the documentation could
>> be expressed as a series of simple doc tests, but brief prose describing
>> how each join works would still be valuable.
>>
>> Does this seem worthwhile to folks here? And does anyone want to offer
>> guidance on how best to provide this kind of documentation so that it’s
>> easy to find by users, regardless of the language they’re using?
>>
>> Nick
>>
>
