bamboost
A Python data framework for managing scientific simulation data. All your simulation data is organized, indexed, and easily accessible.
Highlights
- Zero setup. Use a collection; add simulations with input parameters; store your data; retrieve it.
- Fits any existing workflow. You don't need to set up everything around bamboost. You can use it your way.
- Immediate access to your experimental design. All simulations, with their input parameters and metadata.

|   | name | created_at | description | tags | status | submitted | E | L | N | amplitude | contrast | eps | kappa | maxiter | nu | r_soft | asperity_width |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a7428463ab | 2025-07-17 13:31:22.507583 | contact area | [] | initialized | False | 1 | 1 | 127 | 0.05 | 0.01 | 0.03 | 100 | 20 | 0.3 | 0.4 | 0.25 |
| 1 | fc5add3cb1 | 2025-07-17 12:19:23.840253 | contact area | [] | finished | False | 1 | 1 | 127 | 0.05 | 0.01 | 0.03 | 100 | 20 | 0.3 | 0.4 | 0.25 |
| 2 | a3249ecfcf | 2025-07-17 12:01:20.553065 | contact area | [] | finished | False | 1 | 1 | 127 | 0.05 | 0.01 | 0.03 | 100 | 20 | 0.3 | 0.4 | 0.50 |
| 3 | 68689cd6aa | 2025-07-17 11:16:02.597160 | contact area | [] | finished | False | 1 | 1 | 127 | 0.05 | 0.01 | 0.03 | 100 | 20 | 0.3 | 0.4 | 1.00 |
| 4 | 99c4a73da7 | 2025-07-17 10:44:28.160353 | line search | [] | finished | False | 1 | 1 | 200 | 0.05 | 0.01 | 0.03 | 200 | 20 | 0.3 | 0.4 | NaN |
| 5 | 0776a10b6f | 2025-06-19 11:44:26.134032 | line search | [] | finished | False | 1 | 1 | 200 | 0.05 | 0.01 | 0.03 | 200 | 20 | 0.3 | 0.4 | NaN |

- Everything indexed. All your collections, simulations, parameters, and links are findable. Accessible through their unique ID, from anywhere.

|   | UID | Path | Aliases | Tags |
|---|---|---|---|---|
| 0 | 315628DE80 | /home/florez/work/code/bamboost-docs/content/data/getting-started | | |
| 1 | C8C4A62EB9 | /home/florez/work/code/bamboost-docs/content/data/example | | |
| 2 | C818107BA4 | /home/florez/work/code/bamboost-docs/content/docs/simulation/data | docs | |
| 3 | 70393345C1 | /home/florez/work/projects/25-fft-jax/data-sine | | |

- Straightforward API to write and read. Automatic HDF5 file handling. Store your data in a structured way, and retrieve it with ease.
Introduction
bamboost is a Python framework for managing scientific simulation data.
It provides an organized model for storing, indexing, and retrieving
simulation results, making it easier to work with large-scale computational
studies.
At its core, it is a filesystem storage model: each simulation gets its own
directory, and simulations are bundled in collections.
On top of that, it offers the benefits of a database.
bamboost knows two entities: Collection and Simulation. Self-similar simulations are bundled in collections.
Principles
- Independence: Any dataset must be complete and understandable on its own. You can copy or extract any of your data and distribute it without external dependencies.
- Path redundancy: Data must be referenceable without knowledge of its path. This serves several purposes: you can share your data easily (for example as supplementary material for papers), and renaming directories, moving files, switching computers, etc. will not break your data references.
This leads to the following requirements:
- Simulation parameters must be stored locally, inside the simulation directory. Crucially, not exclusively in a global database of any kind.
- Collections must have unique identifiers that are independent of their paths.
- Simulations must have unique identifiers that are independent of their paths.
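The path-independence requirement can be illustrated with a minimal sketch. It is not bamboost's implementation: the helper names and the way the ID is generated are assumptions made for illustration; only the `.bamboost-collection-<ID>` file name convention is taken from this page. The ID lives inside the directory, so it survives a rename or move.

```python
import tempfile
import uuid
from pathlib import Path

# File name convention for the identifier file, as used elsewhere on this page.
PREFIX = ".bamboost-collection-"


def create_collection(path: Path) -> str:
    """Create a collection directory and mark it with an identifier file (hypothetical helper)."""
    path.mkdir(parents=True)
    uid = uuid.uuid4().hex[:10].upper()  # assumption: any unique string would do
    (path / f"{PREFIX}{uid}").touch()
    return uid


def read_collection_id(path: Path) -> str:
    """Recover the collection ID from the identifier file inside the directory."""
    (marker,) = path.glob(f"{PREFIX}*")  # expects exactly one identifier file
    return marker.name[len(PREFIX):]


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "test_data"
    uid = create_collection(original)

    # Rename the collection directory; the identifier file moves with it.
    moved = original.rename(Path(tmp) / "renamed_data")
    uid_after_move = read_collection_id(moved)

print(uid == uid_after_move)
```

Because the identifier travels with the directory, any reference that uses the ID instead of the path stays valid after the move.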
Concept
We organize simulations in collections within structured directories. Let's consider the following directory:

    test_data
    ├── .bamboost-collection-ABCD1234
    ├── simulation_1
    │   └── data.h5
    └── simulation_2
        └── data.h5

This is a valid bamboost collection at the path ./test_data. It contains an
identifier file that defines the unique ID of the collection. In this case, it is
ABCD1234.
The collection contains two entries: simulation_1 and simulation_2.
As you can see, each simulation owns a directory inside a collection.
The directory name serves as both the name and the ID of the simulation.
The unique identifier of a single simulation is the combination of the ID of the collection it belongs to and the simulation ID.
That means the full identifier of simulation_1 is ABCD1234:simulation_1.
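As a small illustration of this naming scheme, the following sketch composes and splits a full identifier. The function names are hypothetical and not part of the bamboost API; only the `collection_id:simulation_id` format comes from the text above.

```python
def full_id(collection_id: str, simulation_id: str) -> str:
    """Combine a collection ID and a simulation ID into one full identifier."""
    return f"{collection_id}:{simulation_id}"


def split_full_id(identifier: str) -> tuple[str, str]:
    """Split a full identifier at the first colon into its two parts."""
    collection_id, separator, simulation_id = identifier.partition(":")
    if not separator or not collection_id or not simulation_id:
        raise ValueError(f"not a full identifier: {identifier!r}")
    return collection_id, simulation_id


print(full_id("ABCD1234", "simulation_1"))      # ABCD1234:simulation_1
print(split_full_id("ABCD1234:simulation_1"))   # ('ABCD1234', 'simulation_1')
```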
Each simulation contains a central HDF5 file named data.h5. This file is used to store the parameters, as well as generated data.
The simulation API of bamboost provides extensive functionality to store and retrieve data from this file. However, users are not limited to this file, or to Python in general.
Simulations are directories rather than single HDF5 files so that you can place any file that belongs to a simulation in its directory. This can be output from third-party software (think LAMMPS), additional input files such as images, and also scripts to reproduce the generated data.
Indexing
Given the model described above, querying data is feasible but expensive.
If a user requires all simulations in collection ABCD1234 where param1 = 73 (just an example parameter), the following must happen:
- Search the file system for the identifier file .bamboost-collection-ABCD1234
- Iterate through all subdirectories of the collection and gather each simulation's parameters from data.h5
- Filter the simulations with param1 = 73
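The three steps above can be sketched as an uncached query. This is a rough illustration, not bamboost code: the function names are hypothetical, and to keep the sketch free of third-party dependencies the parameters are read from a stand-in `parameters.json` file, whereas bamboost stores them in data.h5.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

PREFIX = ".bamboost-collection-"  # identifier file name prefix


def read_parameters(sim_dir: Path) -> dict[str, Any]:
    """Stand-in reader (assumption): bamboost keeps parameters in data.h5, not JSON."""
    return json.loads((sim_dir / "parameters.json").read_text())


def find_collection(root: Path, uid: str) -> Path:
    """Step 1: search the file system below `root` for the identifier file."""
    for marker in root.rglob(f"{PREFIX}{uid}"):
        return marker.parent
    raise FileNotFoundError(f"no collection with ID {uid} below {root}")


def query(root: Path, uid: str, **conditions: Any) -> list[str]:
    """Return the full identifiers of all simulations matching `conditions`."""
    collection = find_collection(root, uid)
    matches = []
    # Step 2: visit every simulation directory and load its parameters.
    for sim_dir in sorted(p for p in collection.iterdir() if p.is_dir()):
        params = read_parameters(sim_dir)
        # Step 3: keep the simulation only if every condition holds.
        if all(params.get(key) == value for key, value in conditions.items()):
            matches.append(f"{uid}:{sim_dir.name}")
    return matches


with tempfile.TemporaryDirectory() as tmp:
    collection = Path(tmp) / "test_data"
    collection.mkdir()
    (collection / f"{PREFIX}ABCD1234").touch()
    for name, value in [("simulation_1", 73), ("simulation_2", 42)]:
        (collection / name).mkdir()
        (collection / name / "parameters.json").write_text(json.dumps({"param1": value}))

    result = query(Path(tmp), "ABCD1234", param1=73)

print(result)  # ['ABCD1234:simulation_1']
```

Every call walks the directory tree and opens one file per simulation, which is why this approach becomes slow for large collections.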
To improve the experience, we cache collections, simulations, and their parameters in a global SQLite database.
Crucially, the database is important, but it is only a convenience feature! Corruption, deletion, or absence of the database has no consequences for the integrity of the data.
In fact, the cache is rebuilt automatically when needed. After all, it is just a cache.
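The idea of a disposable cache can be sketched with the standard `sqlite3` module. The table layout, the function names, and the in-memory `FILESYSTEM` dictionary (standing in for the authoritative data on disk) are assumptions for illustration and do not reflect bamboost's actual schema.

```python
import sqlite3
import tempfile
from pathlib import Path

# Stand-in for the authoritative data on disk: full simulation ID -> parameters.
FILESYSTEM = {
    "ABCD1234:simulation_1": {"param1": 73},
    "ABCD1234:simulation_2": {"param1": 42},
}


def rebuild(conn: sqlite3.Connection) -> None:
    """Recreate the cache table from the authoritative data."""
    conn.execute("DROP TABLE IF EXISTS parameters")
    conn.execute("CREATE TABLE parameters (simulation TEXT, key TEXT, value)")
    conn.executemany(
        "INSERT INTO parameters VALUES (?, ?, ?)",
        [(sim, key, value) for sim, params in FILESYSTEM.items() for key, value in params.items()],
    )
    conn.commit()


def open_cache(db_path: Path) -> sqlite3.Connection:
    """Open the cache and rebuild it if the table is missing."""
    conn = sqlite3.connect(db_path)
    has_table = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'parameters'"
    ).fetchone()
    if has_table is None:
        rebuild(conn)
    return conn


def query(conn: sqlite3.Connection, key: str, value: object) -> list[str]:
    """Answer a parameter query from the cache alone."""
    rows = conn.execute(
        "SELECT simulation FROM parameters WHERE key = ? AND value = ? ORDER BY simulation",
        (key, value),
    )
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    db_path = Path(tmp) / "cache.sqlite"

    conn = open_cache(db_path)
    first = query(conn, "param1", 73)
    conn.close()

    db_path.unlink()  # simulate losing the cache entirely

    conn = open_cache(db_path)  # the cache is rebuilt on demand
    second = query(conn, "param1", 73)
    conn.close()

print(first == second)
```

Deleting the database changes nothing about the result, because everything in it can be derived again from the source data.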
Features
The core functionality of bamboost can be split into two parts:
- Structured file-based data model with a database-like experience
- A Python HDF5 interface to easily store and retrieve simulation data
Additionally, bamboost offers the following:
- Manage simulation workflow, i.e. the creation and submission of simulations on HPC clusters
- MPI-parallel writes for large-scale simulations
- A command-line interface for index and collection management
- A Terminal User Interface (TUI) to browse your data
- Extensibility via the plugin system, e.g. solver-specific writers
Installation
We recommend using uv for Python projects. Run the following to add the dependency to your project:

    uv add bamboost

To install bamboost with pip:

    pip install bamboost

To install the latest development version from GitHub:

    pip install git+https://github.com/smec-ethz/bamboost.git

To install the latest version of the bamboost TUI:

    pip install git+https://github.com/zrlf/bamboost-tui

Dependencies
| Dependency | Purpose |
|---|---|
| Python 3.10+ | Yes. |
| HDF5 | Data storage format for simulation results and parameters. |
| MPI (optional) | Enables parallel I/O capabilities for large-scale simulations. |
| SQLite | Local database for caching collection and simulation metadata. |
Next Steps
- Read the Getting Started guide.
- Learn about Managing Collections.
- Explore Simulation Handling.
- Understand the HDF5 data model: File Handler, Objects, and Series.
- Use the CLI for index management from the terminal.
- Set up Configuration for your project.