Main user-level tuning techniques:
- https://www.jianshu.com/p/048aa1cac43c
Resource tuning:
- https://www.jianshu.com/p/c853997ea1f6
Parameter tuning:
- https://blog.csdn.net/yuanbingze/article/details/97368552
Data locality:
- https://blog.csdn.net/zy_zhengyang/article/details/78714346
Speculative execution:
- https://blog.csdn.net/wangpei1949/article/details/88927332
Official tuning tips:
- https://spark.apache.org/docs/3.0.0-preview/sql-performance-tuning.html
- https://spark.apache.org/docs/3.0.0-preview/tuning.html
Databricks video:
- https://databricks.com/session/scalable-monitoring-using-prometheus-with-apache-spark-clusters
Books on tuning:
- https://github.com/vaquarkhan/vaquarkhan/blob/master/high-performance-spark.pdf
Spark memory architecture and management:
- https://www.jianshu.com/p/02fca6460c37
- https://www.jianshu.com/p/395fc098eedf
Spark performance analysis, REST API:
- https://spark.apache.org/docs/latest/monitoring.html#rest-api
Spark performance analysis, monitoring tools:
- https://github.com/netdata/netdata/issues/4853
Spark basics:
- https://zhuanlan.zhihu.com/p/76518708
- https://www.jianshu.com/p/330ec1347423 (cluster architecture)
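
Several of the topics linked above (resource tuning, parameter tuning, data locality, speculative execution) come down to Spark configuration properties. The sketch below is only an illustration of how such settings could be collected and turned into `spark-submit --conf` flags. The property names are real Spark settings, but every value is a placeholder rather than a recommendation, and the helper `to_submit_flags` and the script name `app.py` are hypothetical.

```python
def to_submit_flags(conf):
    """Render a {property: value} dict as spark-submit --conf arguments."""
    flags = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


# Placeholder values only; tune them against your own workload.
tuning_conf = {
    # resource tuning
    "spark.executor.memory": "4g",
    "spark.executor.cores": "4",
    # parameter tuning
    "spark.sql.shuffle.partitions": "200",
    # data locality: how long to wait for a data-local slot
    "spark.locality.wait": "3s",
    # speculative execution: re-launch slow tasks
    "spark.speculation": "true",
}

if __name__ == "__main__":
    # app.py is a hypothetical application script.
    print(" ".join(["spark-submit"] + to_submit_flags(tuning_conf) + ["app.py"]))
```

Keeping the settings in one dictionary makes it easy to diff two tuning attempts against each other.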
# Deep Dive into Spark SQL with Advanced Performance Tuning
Reference: `https://databricks.com/session/scalable-monitoring-using-prometheus-with-apache-spark-clusters`
---
### Topics covered in this video
- API selection
- optimizing the metadata catalog
- cache manager
- whole-stage code generation
- data sources (e.g. the vectorized Parquet reader)
- partitioning and bucketing to avoid shuffles (http://dbricks.co/2oG6ZBL)
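
To make the last item concrete, here is a plain-Python illustration of why bucketing can avoid a shuffle. It is a toy model, not Spark's API: the bucket count, the modulo hash, and the function names are all assumptions made for the example. The idea is that if both tables were written with the same bucketing on the join key, rows with equal keys already sit in the same bucket number, so each pair of buckets can be joined locally.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, must match on both tables


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Toy hash for integer keys; Spark uses its own hash function.
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    """Group (key, value) rows by bucket once, at write time."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left, right, num_buckets=NUM_BUCKETS):
    """Join bucket i of left with bucket i of right; no data moves across buckets."""
    out = []
    for b in range(num_buckets):
        index = defaultdict(list)
        for key, value in left.get(b, []):
            index[key].append(value)
        for key, other in right.get(b, []):
            for value in index.get(key, []):
                out.append((key, value, other))
    return out


if __name__ == "__main__":
    orders = write_bucketed([(1, "a"), (2, "b"), (5, "c")])
    users = write_bucketed([(1, "x"), (5, "y"), (7, "z")])
    print(sorted(bucketed_join(orders, users)))
```

The cost of grouping by key is paid once when the tables are written, instead of on every join.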
---
### Databricks official optimization plan
- Catalyst optimization phase (https://databricks.com/glossary/catalyst-optimizer, https://databricks.com/session/a-deep-dive-into-the-catalyst-optimizer, https://databricks.com/session/a-deep-dive-into-the-catalyst-optimizer-hands-on-lab?utm_campaign=Spark%20Summit%20EU%202016&utm_content=34985851&utm_medium=social&utm_source=twitter)
  - rules
  - strategies (e.g. using hints)
  - etc.
- Tungsten execution phase (https://databricks.com/glossary/tungsten)
  - Memory management and binary processing
  - Cache-aware computation
  - Code generation
  - No virtual function dispatches
  - Intermediate data in memory vs CPU registers
  - Loop unrolling and SIMD
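
The notion of a Catalyst "rule" in the list above can be pictured with a toy rule-based optimizer. This is a minimal sketch and not Catalyst's actual code: the node classes `Lit`, `Col`, `Add` and both rules are hypothetical, written only to show the pattern of rewriting an expression tree with small rules until nothing changes any more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def constant_fold(expr):
    """Rule: Add(Lit a, Lit b) -> Lit(a + b), applied bottom-up."""
    if isinstance(expr, Add):
        left, right = constant_fold(expr.left), constant_fold(expr.right)
        if isinstance(left, Lit) and isinstance(right, Lit):
            return Lit(left.value + right.value)
        return Add(left, right)
    return expr


def drop_add_zero(expr):
    """Rule: x + 0 -> x and 0 + x -> x, applied bottom-up."""
    if isinstance(expr, Add):
        left, right = drop_add_zero(expr.left), drop_add_zero(expr.right)
        if right == Lit(0):
            return left
        if left == Lit(0):
            return right
        return Add(left, right)
    return expr


def optimize(expr, rules=(constant_fold, drop_add_zero), max_iterations=100):
    """Apply all rules repeatedly until the tree stops changing."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = rule(new_expr)
        if new_expr == expr:
            break
        expr = new_expr
    return expr


if __name__ == "__main__":
    print(optimize(Add(Col("x"), Add(Lit(2), Lit(-2)))))
```

Note how the second rule only fires because the first one produced a literal zero; this interplay is why the rules are run in a loop rather than once.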
---
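
The code generation items in the Tungsten list can be illustrated with a toy comparison in plain Python. This is only an analogy under stated assumptions: the tuple-based expression format and the names `interpret`, `to_source`, and `compile_expr` are invented for the example, and Spark generates JVM code rather than Python. The interpreter re-inspects the tree and makes one call per node for every row, while the generated function is a single flat expression with no per-node dispatch.

```python
# Expression format (hypothetical):
#   ("col", name) | ("lit", value) | ("add", a, b) | ("mul", a, b)


def interpret(expr, row):
    """Tree-walking evaluation: one dispatch per node, per row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "lit":
        return expr[1]
    if kind == "add":
        return interpret(expr[1], row) + interpret(expr[2], row)
    if kind == "mul":
        return interpret(expr[1], row) * interpret(expr[2], row)
    raise ValueError(f"unknown node: {kind}")


def to_source(expr):
    """Turn the tree into the source text of one fused expression."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "lit":
        return repr(expr[1])
    if kind == "add":
        return f"({to_source(expr[1])} + {to_source(expr[2])})"
    if kind == "mul":
        return f"({to_source(expr[1])} * {to_source(expr[2])})"
    raise ValueError(f"unknown node: {kind}")


def compile_expr(expr):
    """Generate a function once, then reuse it for every row."""
    source = f"def evaluate(row):\n    return {to_source(expr)}\n"
    namespace = {}
    exec(source, namespace)
    return namespace["evaluate"]


if __name__ == "__main__":
    expr = ("add", ("mul", ("col", "price"), ("col", "qty")), ("lit", 1))
    evaluate = compile_expr(expr)
    row = {"price": 3, "qty": 4}
    print(to_source(expr))
    print(interpret(expr, row) == evaluate(row))
```

Both paths compute the same result; the difference is that the tree is walked once at compile time instead of once per row.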