[SPARK-38841][SQL] Enable Bloom filter Joins by default #36102
andylam-db wants to merge 7 commits into
Can one of the admins verify this patch?
  .version("3.3.0")
  .booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
Hmm, is this change for master or 3.3? It seems this is a new feature added in 3.3, so isn't it safer to keep it false by default? We usually choose to be conservative rather than aggressive when enabling new features.
It's for master only, and will not be merged to 3.3.
This way, new PRs to master will be less likely to break this feature.
@sunchao Maybe you can enable the vectorized Parquet reader by default in the master branch only too, so it won't be silently broken.
Thanks, that's a good idea. Let me switch it.
cc @wangyum
Could we use a new ticket ID? Otherwise users might get confused: same ticket ID, but different behavior in 3.3 and 3.4.
@andylam-db has updated this PR with a new ticket ID.
Merged to master.
What changes were proposed in this pull request?
Enable Bloom Filter Joins by default.
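Since the default flips from false to true, users who want the previous behavior can presumably override the flag per session. The snippet below is a minimal sketch of that, not part of this patch: the diff above does not show the config key, so the key `spark.sql.optimizer.runtime.bloomFilter.enabled` is an assumption, and the app name and local master are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object DisableBloomFilterJoin {
  def main(args: Array[String]): Unit = {
    // Placeholder local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bloom-filter-join-flag")
      .getOrCreate()

    // Assumed key name: the diff in this PR only shows the default changing,
    // not the key itself.
    val key = "spark.sql.optimizer.runtime.bloomFilter.enabled"

    // Opt out of the new default for this session.
    spark.conf.set(key, "false")
    println(s"$key = ${spark.conf.get(key)}")

    spark.stop()
  }
}
```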
Why are the changes needed?
Improve the performance of some joins by pre-filtering one side of a join using a Bloom filter. See: https://issues.apache.org/jira/browse/SPARK-32268 for performance benchmarks.
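To illustrate the pre-filtering idea in isolation, here is a toy sketch in plain Scala. It is not Spark's implementation: `ToyBloomFilter`, `BloomFilterJoinSketch`, and the sizing parameters are hypothetical names and values chosen for the example. It builds a Bloom filter from the keys of the small side, drops rows of the large side whose key is definitely absent, and then runs an exact lookup so that false positives never reach the result.

```scala
import scala.util.hashing.MurmurHash3

// Toy Bloom filter over Long join keys (hypothetical, for illustration).
final class ToyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new java.util.BitSet(numBits)

  // The i-th probe position comes from a hash seeded with i.
  private def position(key: Long, i: Int): Int =
    Math.floorMod(MurmurHash3.stringHash(key.toString, i), numBits)

  def put(key: Long): Unit =
    (0 until numHashes).foreach(i => bits.set(position(key, i)))

  // May return true for keys never inserted (false positive),
  // but never returns false for an inserted key.
  def mightContain(key: Long): Boolean =
    (0 until numHashes).forall(i => bits.get(position(key, i)))
}

object BloomFilterJoinSketch {
  // Inner join on the key; assumes keys on the small side are unique.
  def join(
      small: Seq[(Long, String)],
      large: Seq[(Long, String)]): Seq[(Long, String, String)] = {
    val filter = new ToyBloomFilter(numBits = 1024, numHashes = 3)
    small.foreach { case (k, _) => filter.put(k) }

    // Pre-filter: cheap membership test before the actual join.
    val candidates = large.filter { case (k, _) => filter.mightContain(k) }

    // Exact lookup removes any false positives that passed the filter.
    val lookup = small.toMap
    candidates.collect {
      case (k, v) if lookup.contains(k) => (k, lookup(k), v)
    }
  }
}
```

Because a Bloom filter has no false negatives, the pre-filter can only discard rows that would not have matched anyway, so the join result is unchanged and only the amount of data reaching the join shrinks.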
How was this patch tested?
N/A