My Awesome Data Engineer
1. Databases
- Bytebase - Database DevOps - https://www.bytebase.com/
- DuckDB - SQL OLAP database management system - https://duckdb.org/
- Harlequin.sh - A drop-in replacement for the DuckDB CLI. - https://harlequin.sh/
- MotherDuck - Serverless data analytics with DuckDB - https://motherduck.com/
2. Python
- RustPython (Python interpreter written in Rust) > GoPython > PythonPy
3. Computation
- SparkMeasure - Measure and monitor Spark jobs - https://github.com/LucaCanali/sparkMeasure
4. Kubernetes
- EKS Distro - An open source Kubernetes distribution based on EKS.
- Karpenter - Just-in-time Nodes for Any Kubernetes Cluster - https://karpenter.sh/
- KubeCost - Monitor and Reduce Kubernetes Spend - https://www.kubecost.
- KCP - Kubernetes-like control plane - https://github.com/kcp-dev/kcp
5. Visualisation
- Evidence - BI tool + Markdown - https://evidence.dev/blog
6. Tool
- Querybook - Pinterest’s open source Big Data IDE with Notebook interface - https://www.querybook.org/
7. LLM
- AnythingLLM - Local UI for setting up RAG on your machine to work with LLMs - https://useanything.com/
July 8, 2024 ∙ data-engineer