The Apache Software Foundation, the all-volunteer developers, stewards, and incubators of more than 320 active open source projects and initiatives, today announced that Apache DataFusion™ is now a Top-Level Project. DataFusion is a fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.
DataFusion aims to be the query engine of choice for new, fast, data-centric systems such as databases, dataframe libraries, machine learning systems, and streaming applications by leveraging the unique features of Apache Arrow and Rust. By using DataFusion, projects can focus on developing their own specific features and avoid reimplementing standard ones such as an expression representation, standard optimizations, parallelized streaming execution plans, and file format support.
DataFusion can be used without modification as an embedded SQL engine, or it can be customized and used as a foundation for building new systems. It is used in systems focused on analytic (high-throughput), streaming, and transactional (low-latency) workloads, including:
- Specialized analytical database systems such as Apache HoraeDB
- New query language engines such as prql-query and accelerators such as VegaFusion
- Research platforms for new database systems, such as opt-d
- Streaming data platforms such as Synnada
- SQL support for another library, such as dask-sql
- Tools for reading / sorting / transcoding files such as qv
- Apache Spark runtime replacements such as Comet and Blaze
“Apache DataFusion has grown tremendously since its inception. What started as a modest project to provide a simple and efficient query engine has evolved into a robust, high-performance system that powers data-centric applications worldwide. This growth is a testament to the Apache Way,” said Andy Grove, Apache DataFusion PMC Member and original creator of DataFusion. “Becoming a Top-Level Project is a significant milestone, and I am excited to see how the project will continue to innovate and shape the future of data processing.”
SOURCE: GlobeNewswire