Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
There are 8,304 public repositories matching this topic, including:
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated Jun 2, 2024 - C++
DataPulse is a platform for developers to build, schedule and monitor data pipelines.
Updated Jun 2, 2024 - JavaScript
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Updated Jun 2, 2024 - Go
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Jun 2, 2024 - Python
DoEKS is a tool to build, deploy, and scale data and ML platforms on Amazon EKS.
Updated Jun 2, 2024 - HCL
This project implements an end-to-end tech stack for a data platform that can be used in production.
Updated Jun 2, 2024 - Python
An open-source AI and model-as-a-service platform.
Updated Jun 2, 2024 - TypeScript
This construct builds the components you need to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook from a cluster template to check the output of the EMR Serverless application.
Updated Jun 2, 2024 - TypeScript
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Updated Jun 1, 2024 - Python
Splittable Gzip codec for Hadoop
Updated Jun 1, 2024 - Java
REST API for Apache Spark on K8s or YARN
Updated Jun 1, 2024 - Java
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia