apis

package module

Version: v0.1.6 (latest) Published: Sep 10, 2025 License: Apache-2.0 Imports: 1 Imported by: 0

Repository

github.com/BaizeAI/dataset

README

Introduction

Dataset is a Kubernetes-native tool designed to simplify data management and sharing across AI/ML workflows. It uses Persistent Volume Claims (PVCs) to preload datasets and models from public sources such as Hugging Face or S3 into local Kubernetes clusters. Individual workloads then no longer need custom data loaders, and the same data can be shared across namespaces.

With Dataset, teams can manage and access data in multi-tenant environments while staying compatible with any Kubernetes CSI driver. It is intended for organizations that want to streamline AI/ML workflows without adding operational complexity.

Key Features

Preloaded Datasets: Load data from external sources into PVCs for immediate use in training and inference tasks.
Cross-Namespace Data Sharing: Securely share data across namespaces, which a PVC alone cannot do because it is bound to a single namespace.
Kubernetes-Native Design: Fully compatible with any Kubernetes CSI driver, avoiding reliance on external technologies like FUSE.
Cascading Deletion: Optional feature to automatically delete dependent datasets when source datasets are removed, ensuring data consistency.
Operational Simplicity: Designed for easy deployment and maintenance, with minimal overhead.

Benefits

Streamlined Workflows: Eliminates repetitive data-loading logic, allowing teams to focus on core AI/ML development.
Enhanced Collaboration: Enables secure, efficient data sharing in multi-tenant Kubernetes environments.
Data Consistency: Automatic cleanup of dependent resources prevents orphaned references and maintains data integrity.
Scalable and Reliable: Works seamlessly with Kubernetes-native resources, ensuring compatibility and stability.

Configuration

The Dataset controller supports configurable options through a YAML configuration file:

Cascading Deletion

When enabled, cascading deletion automatically removes the datasets that reference a source dataset when that source dataset is deleted:

# Enable cascading deletion (default: false)
enable_cascading_deletion: true
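
As a rough illustration of how such a file might be read, the Go sketch below parses flat `key: value` lines and picks out the documented `enable_cascading_deletion` key. The `Config` struct and `ParseConfig` function are hypothetical names made up for this example; the controller's real loader is not shown in this README and most likely uses a full YAML library.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Config mirrors the single documented option. The struct and its field
// name are hypothetical; only the YAML key comes from the README.
type Config struct {
	EnableCascadingDeletion bool
}

// ParseConfig reads flat "key: value" lines, skipping blank lines and
// "#" comments. It is deliberately not a general YAML parser.
func ParseConfig(text string) (Config, error) {
	var cfg Config // zero value: cascading deletion off, the documented default
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return cfg, fmt.Errorf("malformed line %q", line)
		}
		if strings.TrimSpace(key) == "enable_cascading_deletion" {
			b, err := strconv.ParseBool(strings.TrimSpace(value))
			if err != nil {
				return cfg, fmt.Errorf("enable_cascading_deletion: %w", err)
			}
			cfg.EnableCascadingDeletion = b
		}
	}
	return cfg, sc.Err()
}

func main() {
	cfg, err := ParseConfig("# Enable cascading deletion (default: false)\nenable_cascading_deletion: true\n")
	if err != nil {
		panic(err)
	}
	fmt.Println("cascading deletion enabled:", cfg.EnableCascadingDeletion)
}
```

Leaving the key out keeps the zero value, which matches the documented default of `false`.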

Important: This feature should be used with caution as it will automatically delete datasets that reference the source dataset. Consider the impact on dependent workloads before enabling this feature.
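
To make the impact concrete, the following Go sketch models which datasets would be removed when a source dataset is deleted. The `Dataset` type, its `Key` and `Source` fields, and the `ToDelete` function are hypothetical simplifications for illustration, not the controller's actual types or logic, and the sketch only follows direct references.

```go
package main

import (
	"fmt"
	"sort"
)

// Dataset is a hypothetical, simplified stand-in for a dataset object.
// Source holds the "namespace/name" key of the dataset it references,
// or "" when the dataset holds its own data.
type Dataset struct {
	Key    string
	Source string
}

// ToDelete returns the keys that go away when `deleted` is removed.
// With cascade disabled only the dataset itself is returned; with it
// enabled, every dataset that directly references it is included too.
func ToDelete(all []Dataset, deleted string, cascade bool) []string {
	out := []string{deleted}
	if !cascade {
		return out
	}
	for _, d := range all {
		if d.Source == deleted {
			out = append(out, d.Key)
		}
	}
	sort.Strings(out[1:]) // stable order for the dependents
	return out
}

func main() {
	all := []Dataset{
		{Key: "public/llama"},
		{Key: "team-b/llama", Source: "public/llama"},
		{Key: "team-a/llama", Source: "public/llama"},
		{Key: "team-a/other"},
	}
	fmt.Println("cascade off:", ToDelete(all, "public/llama", false))
	fmt.Println("cascade on: ", ToDelete(all, "public/llama", true))
}
```

With the flag enabled, both team datasets disappear along with the source, so any workload mounting them loses its data.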

Documentation

There is no documentation for this package.

Source Files

main.go

Directories

Path	Synopsis
api
client
client/fake	This package has the automatically generated fake clientset.
client/scheme	This package contains the scheme of the automatically generated clientset.
client/typed/dataset/v1alpha1	This package has the automatically generated typed clients.
client/typed/dataset/v1alpha1/fake	Package fake has the automatically generated clients.
dataset/v1alpha1	Package v1alpha1 contains API Schema definitions for the dataset v1alpha1 API group +kubebuilder:object:generate=true +groupName=dataset.baizeai.io
cmd
data-loader command
config
internal
cmd/dataloader
controller/dataset
pkg/constants
pkg/datasources
pkg/datasources/conda
pkg/datasources/huggingface
pkg/datasources/huggingface/fake	Code generated by counterfeiter.
pkg/datasources/modelscope
pkg/datasources/modelscope/fake	Code generated by counterfeiter.
pkg/datasources/pip
pkg
clients
kubeutils
log
utils