bamboost

A Python data framework for managing scientific simulation data. All your simulation data is organized, indexed, and easily accessible.

Highlights

Zero setup. Use a collection; add simulations with input parameters; store your data; retrieve it.
Fits any existing workflow. You don't need to set up everything around bamboost. You can use it your way.

Immediate access to your experimental design. All simulations, with their input parameters and metadata.

	name	created_at	description	tags	status	submitted	E	L	N	amplitude	contrast	eps	kappa	maxiter	nu	r_soft	asperity_width
0	a7428463ab	2025-07-17 13:31:22.507583	contact area	[]	initialized	False	1	1	127	0.05	0.01	0.03	100	20	0.3	0.4	0.25
1	fc5add3cb1	2025-07-17 12:19:23.840253	contact area	[]	finished	False	1	1	127	0.05	0.01	0.03	100	20	0.3	0.4	0.25
2	a3249ecfcf	2025-07-17 12:01:20.553065	contact area	[]	finished	False	1	1	127	0.05	0.01	0.03	100	20	0.3	0.4	0.50
3	68689cd6aa	2025-07-17 11:16:02.597160	contact area	[]	finished	False	1	1	127	0.05	0.01	0.03	100	20	0.3	0.4	1.00
4	99c4a73da7	2025-07-17 10:44:28.160353	line search	[]	finished	False	1	1	200	0.05	0.01	0.03	200	20	0.3	0.4	NaN
5	0776a10b6f	2025-06-19 11:44:26.134032	line search	[]	finished	False	1	1	200	0.05	0.01	0.03	200	20	0.3	0.4	NaN

Everything indexed. All your collections, simulations, parameters, and links are findable and accessible through their unique IDs, from anywhere.

   UID         Path                                                               Aliases  Tags
0  315628DE80  /home/florez/work/code/bamboost-docs/content/data/getting-started               
1  C8C4A62EB9  /home/florez/work/code/bamboost-docs/content/data/example                       
2  C818107BA4  /home/florez/work/code/bamboost-docs/content/docs/simulation/data  docs         
3  70393345C1  /home/florez/work/projects/25-fft-jax/data-sine

Straightforward API to write and read. Automatic HDF5 file handling. Store your data in a structured way, and retrieve it with ease.

bamboost is a Python framework for managing scientific simulation data. It provides an organized model for storing, indexing, and retrieving simulation results, making it easier to work with large-scale computational studies. At its core, it is a filesystem storage model: each simulation lives in its own directory, and simulations are bundled in collections. On top of that, it offers the benefits of a database.

bamboost knows two entities: Collection and Simulation. Self-similar simulations are bundled in collections.

Principles

Independence: Any dataset must be complete and understandable on its own. You can copy or extract any of your data and distribute it without external dependencies.
Path redundancy: Data must be referenceable without knowledge of its path. This serves several purposes: you can share your data easily (e.g. as supplementary material for papers), and renaming directories, moving files, switching computers, etc. will not break your data references.

This leads to the following requirements:

Simulation parameters must be stored locally, inside the simulation directory, and crucially not exclusively in a global database of any kind.
Collections must have unique identifiers that are independent of their paths.
Simulations must have unique identifiers that are independent of their paths.
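As a rough illustration of the second requirement, the sketch below attaches an ID to a directory through an identifier file and shows that the ID survives a move. The file name prefix follows the identifier file shown in the Concept section; the helper names `mark_collection` and `read_collection_uid` and the way the ID is generated are assumptions made for this example, not bamboost's API.

```python
import secrets
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Prefix taken from the identifier file name shown in this document.
PREFIX = ".bamboost-collection-"


def mark_collection(directory: Path) -> str:
    """Create an empty identifier file in `directory` (hypothetical helper)."""
    directory.mkdir(parents=True, exist_ok=True)
    # 10 random hex characters; the generation scheme is an assumption.
    uid = secrets.token_hex(5).upper()
    (directory / f"{PREFIX}{uid}").touch()
    return uid


def read_collection_uid(directory: Path) -> Optional[str]:
    """Return the ID encoded in the identifier file name, or None if absent."""
    for entry in directory.iterdir():
        if entry.is_file() and entry.name.startswith(PREFIX):
            return entry.name[len(PREFIX):]
    return None


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "test_data"
    uid = mark_collection(original)

    # Move and rename the collection directory.
    moved = Path(tmp) / "renamed" / "elsewhere"
    moved.parent.mkdir()
    shutil.move(str(original), str(moved))

    # The ID is read from the directory itself, so the new path does not matter.
    uid_after_move = read_collection_uid(moved)
```

Because the ID lives inside the directory, any reference that uses the ID instead of the path stays valid after the move.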

Concept

We organize simulations in collections within structured directories. Let's consider the following directory:

test_data/
├── simulation_1/
│   ├── data.h5
│   ├── data.xdmf
│   ├── additional_file_1.txt
│   └── additional_file_2.csv
├── simulation_2/
│   ├── data.h5
│   └── additional_file_3.txt
└── .bamboost-collection-ABCD1234

This is a valid bamboost collection at the path ./test_data. It contains an identifier file that defines the unique ID of the collection, in this case ABCD1234.

The collection contains two entries: simulation_1 and simulation_2. As you can see, each simulation owns a directory inside a collection. The directory name serves as both the simulation's name and its ID. The unique identifier of a single simulation is the combination of the ID of the collection it belongs to and the simulation ID. That is, the full identifier of simulation_1 is ABCD1234:simulation_1.
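The composition of the full identifier can be sketched in a few lines. The class `SimulationRef` and the function `parse_full_id` are hypothetical names chosen for this example and are not part of bamboost; only the `collection:simulation` format comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationRef:
    """A (collection ID, simulation name) pair; hypothetical helper type."""

    collection_uid: str
    simulation_name: str

    def __str__(self) -> str:
        # Full identifier in the 'collection:simulation' form used in this document.
        return f"{self.collection_uid}:{self.simulation_name}"


def parse_full_id(full_id: str) -> SimulationRef:
    """Split '<collection>:<simulation>' at the first colon."""
    collection_uid, sep, simulation_name = full_id.partition(":")
    if not sep or not collection_uid or not simulation_name:
        raise ValueError(f"expected '<collection>:<simulation>', got {full_id!r}")
    return SimulationRef(collection_uid, simulation_name)


ref = parse_full_id("ABCD1234:simulation_1")
```

Here `ref.collection_uid` identifies the collection wherever it lives on disk, and `ref.simulation_name` is the directory name inside it.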

Each simulation contains a central HDF5 file named data.h5. This file stores the parameters as well as the generated data. The simulation API of bamboost provides extensive functionality to store and retrieve data from this file. However, users are not limited to this file, or to using Python in general. Simulations are directories instead of a single HDF5 file so that you can put any file that belongs to a simulation into its path. This can be output from third-party software (think LAMMPS), additional input files such as images, or scripts to reproduce the generated data.

Indexing

Given the model described above, querying data is feasible but expensive. If a user requires all simulations in collection ABCD1234 where param1 = 73 (just an example parameter), the following must happen:

Search the file system for the identifier file .bamboost-collection-ABCD1234
Iterate through all subdirectories of the collection and gather each simulation's parameters from data.h5
Keep only the simulations with param1 = 73
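The three steps can be sketched as follows. This is not bamboost's implementation: the functions `find_collection` and `query` are hypothetical, and to keep the example free of extra dependencies the parameters are read from a stand-in `parameters.json` file instead of data.h5.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional

PREFIX = ".bamboost-collection-"  # identifier file prefix as shown in this document


def find_collection(root: Path, uid: str) -> Optional[Path]:
    """Step 1: search the file system below `root` for the identifier file."""
    for marker in root.rglob(f"{PREFIX}{uid}"):
        return marker.parent
    return None


def query(
    root: Path,
    uid: str,
    load_parameters: Callable[[Path], Dict[str, Any]],
    **conditions: Any,
) -> List[str]:
    """Return the names of all simulations whose parameters match `conditions`."""
    collection = find_collection(root, uid)
    if collection is None:
        return []
    matches = []
    for sim_dir in sorted(p for p in collection.iterdir() if p.is_dir()):
        params = load_parameters(sim_dir)  # Step 2: one file read per simulation
        if all(params.get(k) == v for k, v in conditions.items()):  # Step 3
            matches.append(sim_dir.name)
    return matches


def load_json_parameters(sim_dir: Path) -> Dict[str, Any]:
    """Stand-in loader: reads parameters.json instead of data.h5 (assumption)."""
    return json.loads((sim_dir / "parameters.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    collection = Path(tmp) / "test_data"
    for name, value in [("simulation_1", 73), ("simulation_2", 5)]:
        (collection / name).mkdir(parents=True)
        (collection / name / "parameters.json").write_text(json.dumps({"param1": value}))
    (collection / f"{PREFIX}ABCD1234").touch()

    result = query(Path(tmp), "ABCD1234", load_json_parameters, param1=73)
```

Every call walks the directory tree and opens one file per simulation, which is why this approach becomes expensive for large collections.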

To improve the experience, we cache collections, simulations, and their parameters in a global SQLite database. This cache is important, but it is only a convenience feature: corruption, deletion, or absence of the database has no consequences for the integrity of the data. In any of these cases, the cache is rebuilt automatically. It is, after all, just a cache.
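A minimal sketch of the caching idea, using Python's built-in sqlite3 module. The table layout, the function names, and the `scanned` dictionary (a stand-in for the result of reading the simulation directories) are assumptions for illustration; bamboost's actual schema may differ.

```python
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for a file system scan: {(collection ID, simulation name): parameters}
Scan = Dict[Tuple[str, str], Dict[str, float]]


def rebuild_cache(conn: sqlite3.Connection, scanned: Scan) -> None:
    """Drop and repopulate the cache; the files remain the source of truth."""
    conn.execute("DROP TABLE IF EXISTS parameters")
    conn.execute(
        "CREATE TABLE parameters (collection TEXT, simulation TEXT, key TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO parameters VALUES (?, ?, ?, ?)",
        [
            (coll, sim, key, value)
            for (coll, sim), params in scanned.items()
            for key, value in params.items()
        ],
    )
    conn.commit()


def find_simulations(
    conn: sqlite3.Connection, collection: str, key: str, value: float
) -> List[str]:
    """Answer the query from the cache without touching any simulation file."""
    rows = conn.execute(
        "SELECT simulation FROM parameters "
        "WHERE collection = ? AND key = ? AND value = ? ORDER BY simulation",
        (collection, key, value),
    )
    return [row[0] for row in rows]


scanned: Scan = {
    ("ABCD1234", "simulation_1"): {"param1": 73},
    ("ABCD1234", "simulation_2"): {"param1": 5},
}

conn = sqlite3.connect(":memory:")  # empty database: the "cache is absent" case
rebuild_cache(conn, scanned)
hits = find_simulations(conn, "ABCD1234", "param1", 73)

# Simulate a lost cache, then rebuild it from the same scan.
conn.execute("DROP TABLE parameters")
rebuild_cache(conn, scanned)
hits_after_rebuild = find_simulations(conn, "ABCD1234", "param1", 73)
conn.close()
```

Since the cache is derived entirely from the scanned data, dropping it loses nothing: the rebuilt cache answers the query the same way.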

Features

The core functionality of bamboost can be split into two parts:

Structured file-based data model with a database-like experience
A Python HDF5 interface to easily store and retrieve simulation data

Additionally, bamboost offers the following:

Simulation workflow management, i.e. the creation and submission of simulations on HPC clusters
MPI-parallel writes for large-scale simulations
A command-line interface for index and collection management
A Terminal User Interface (TUI) to browse your data
Extensibility via the plugin system, e.g. solver-specific writers

Installation

We recommend using uv for Python projects. Run uv add bamboost to add the dependency to your project.

To install bamboost with pip instead:

pip install bamboost

To install the latest development version from GitHub:

pip install git+https://github.com/smec-ethz/bamboost.git

To install the latest version of the bamboost TUI:

pip install git+https://github.com/zrlf/bamboost-tui

Dependencies

Dependency	Purpose
Python 3.10+	Required to run bamboost.
HDF5	Data storage format for simulation results and parameters.
MPI (optional)	Enables parallel I/O capabilities for large-scale simulations.
SQLite	Local database for caching collection and simulation metadata.

Next Steps

Read the Getting Started guide.
Learn about Managing Collections.
Explore Simulation Handling.
Understand the HDF5 data model: File Handler, Objects, and Series.
Use the CLI for index management from the terminal.
Set up Configuration for your project.