Bamboost

Getting Started

Collections

A collection is the central unit in bamboost. A Collection object lets you create, access, and query its entries, each of which is referred to as a Simulation from here on.

Collections are created implicitly if they don’t exist yet. The first argument to Collection is a path. Let’s create our first collection at ../data/getting-started. This creates the directory and assigns a unique ID to the new collection.

# Creating a new Collection
from bamboost import Collection

coll = Collection("../data/getting-started")
coll

BAMBOOST / 315628DE80

Once a collection has been created, it is often easier and safer to reference it by its uid.

Although you can create and access collections with their path, it is good practice to explicitly create them using the command line interface and then use their ID in your code.

bamboost-cli new ./data-foo
bamboost-cli list

coll = Collection(uid="315628DE80")
coll.df
   name              created_at                  description  tags  status       submitted  links.baseline             Re      dt     mesh_cells  nu
0  mesh_reference    2026-04-11 12:48:07.021634               []    initialized  False      NaN                        NaN     NaN    NaN         NaN
1  kelvin-helmholtz  2026-04-11 12:48:07.006774               []    initialized  False      315628DE80:mesh_reference  1000.0  0.005  256.0       0.001

If you are working in an IPython session (e.g. a Jupyter notebook), use Collection.fromUID, which offers autocompletion for all of your existing (cached) collections.

coll = Collection.fromUID["315628DE80 - ost-docs/content/docs/data-foo"]

Simulations

Now that you have a collection, you can create simulations inside it. A simulation not only stores the output of your experiment, but simultaneously acts as its input file (assuming you run numerical experiments).

This means that the same entity is used in multiple steps of your workflow: experimental design, execution, and postprocessing/analysis.

Experimental design

Creation

Suppose you intend to run an experiment with a specific set of input parameters, input files, or other inputs. You create the simulation with all the instructions it needs. Bundling all of this in a single place ensures reproducibility. A simulation most likely includes:

  • A dictionary of parameters
  • A script that produces the result for this simulation
  • A set of instructions on how to run the script

To create a new simulation, use add:

sim = coll.add(
    name="my-simulation",
    parameters={
        "param1": 73,
        "bar": [2, 3, 4, 5],
    },
)

Relevant files

Then, copy relevant files (or entire directories) into the simulation directory.

Alternatively, add has a files argument for copying a list of files or directories directly when the simulation is created.

sim.copy_files(["path/to/script.py", "img1.png", "path/to/some/directory"])

Run script

As a next step we can create a run script for the simulation. This is an auto-generated bash script with the purpose of providing a single access point to produce the results for this simulation.

create_run_script takes up to 3 arguments:

  • commands: an iterable of bash commands to run in sequence.
  • euler: a boolean flag. If set to true, a Slurm submission script is written instead of a plain bash script.
  • sbatch_kwargs: a dictionary of Slurm job arguments.

sim.create_run_script(
    commands=["source .venv/bin/activate", "python3 script.py"], euler=False
)

This creates the following file:

run.sh
#!/bin/bash

export SIMULATION_DIR=/absolute/path/to/collection/data/getting-started/my-simulation
export SIMULATION_ID=315628DE80:my-simulation

source .venv/bin/activate
python3 script.py

Notice that the script exports two variables: SIMULATION_DIR and SIMULATION_ID. Use them in your executable script (script.py) to infer which simulation the script should operate on.
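
As a rough sketch of how script.py might pick up these two variables, the snippet below reads them with the standard library only. The helper name read_simulation_env is hypothetical and not part of bamboost, and the split on ":" assumes the "collection-uid:simulation-name" layout shown in the run script above.

```python
import os
from pathlib import Path


def read_simulation_env(environ=os.environ):
    """Return (directory, collection uid, simulation name) from the run-script variables.

    Hypothetical helper for illustration; not part of bamboost.
    """
    sim_dir = Path(environ["SIMULATION_DIR"])
    # Assumes the ID looks like "315628DE80:my-simulation".
    coll_uid, _, sim_name = environ["SIMULATION_ID"].partition(":")
    return sim_dir, coll_uid, sim_name


# Example with the values from the run script above, passed as a plain dict
# so the snippet also works outside of a run script.
example_env = {
    "SIMULATION_DIR": "/absolute/path/to/collection/data/getting-started/my-simulation",
    "SIMULATION_ID": "315628DE80:my-simulation",
}
sim_dir, coll_uid, sim_name = read_simulation_env(example_env)
print(coll_uid, sim_name)
```

Inside a real run, calling read_simulation_env() without arguments reads from the process environment, and a missing variable raises a KeyError instead of silently picking the wrong simulation.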

Execution

After creation, a simulation is started by running its run script. You can run the script manually, start it from within Python, or use the CLI (not implemented yet).
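
Running the script manually can itself be done from Python. The following is a minimal sketch using only the standard library; run_script is a hypothetical helper, not a bamboost function, and executing the script from inside the simulation directory is an assumption made so that relative paths such as script.py resolve. It only runs the script and does not cover any bookkeeping bamboost may perform when its own submission methods are used.

```python
import subprocess
from pathlib import Path


def run_script(sim_dir, script_name="run.sh"):
    """Run a simulation's run script with bash from inside its directory.

    Hypothetical helper for illustration; not part of bamboost.
    """
    return subprocess.run(
        ["bash", script_name],
        cwd=Path(sim_dir),      # assumption: relative paths are resolved against the simulation directory
        check=True,             # raise CalledProcessError on a non-zero exit code
        capture_output=True,    # collect stdout/stderr instead of printing them
        text=True,              # decode output as str
    )
```

A call such as run_script("/absolute/path/to/collection/data/getting-started/my-simulation") returns a CompletedProcess whose stdout and stderr attributes hold the script output.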

If you use the bamboost methods to submit your job and you previously created a Slurm submission script instead of a plain bash script, the job is automatically submitted to the cluster.

You can also submit the simulation like this directly after you create it.

Submit the simulation from Python:

sim.submit_simulation()

Submit the simulation using the CLI:

bamboost-cli Collection-ID Simulation-Name

Postprocessing

After your job has finished, bamboost provides rich tools for reading and analysing results.

bamboost is optimized for interactive use in Jupyter notebooks. It offers autocompletion for most of its objects, such as collections and simulations, as well as for objects and data that are part of your simulations.

Collection

Initialize the collection as seen above, preferably using its ID. coll.df returns a pandas DataFrame of the collection.

coll = Collection.fromUID["315628DE80"]
coll.df
   name              created_at                  description  tags  status       submitted  links.baseline             Re      dt     mesh_cells  nu     bar           param1
0  my-simulation     2026-04-11 12:48:13.635137               []    initialized  False      NaN                        NaN     NaN    NaN         NaN    [2, 3, 4, 5]  73.0
1  mesh_reference    2026-04-11 12:48:07.021634               []    initialized  False      NaN                        NaN     NaN    NaN         NaN    NaN           NaN
2  kelvin-helmholtz  2026-04-11 12:48:07.006774               []    initialized  False      315628DE80:mesh_reference  1000.0  0.005  256.0       0.001  NaN           NaN

Filter simulations based on parameters using the built-in filtering interface:

filtered = coll.filter(
    coll.k["param1"] > 50,
    coll.k["bar"] <= 4,
)
filtered.df

See the Collection guide for full filtering syntax.
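
Since coll.df is a pandas DataFrame, ordinary pandas boolean indexing is another option for ad hoc selection. The sketch below does not use bamboost at all: the frame is a hand-built stand-in for coll.df, reduced to a few columns of the table above, and the chosen conditions are only an example.

```python
import pandas as pd

# Stand-in for coll.df (assumption: same column names as in the table above).
df = pd.DataFrame(
    {
        "name": ["my-simulation", "mesh_reference", "kelvin-helmholtz"],
        "status": ["initialized", "initialized", "initialized"],
        "Re": [float("nan"), float("nan"), 1000.0],
        "param1": [73.0, float("nan"), float("nan")],
    }
)

# Comparisons against NaN evaluate to False, so rows that lack a
# parameter drop out of the selection.
selected = df[(df["param1"] > 50) & (df["status"] == "initialized")]
print(selected["name"].tolist())
```

Unlike coll.filter, this yields a plain DataFrame rather than a filtered collection, so it suits quick inspection more than further work with Simulation objects.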

Simulation

To get a Simulation object, use brackets (it will autocomplete in notebooks).

sim = coll["my-simulation"]
sim
my-simulation
initialized
Not submitted
time stamp: 2026-04-11 12:48:13.635137
Files
⭗ my-simulation
├── run.sh
└── data.h5

0 directories, 2 files
Parameters
Parameter Value
bar [2 3 4 5]
param1 73

Key properties of a Simulation object:

Property         Description
sim.parameters   Dictionary of input parameters
sim.metadata     Metadata (description, status, created_at, …)
sim.status       Current lifecycle status
sim.files        File picker for files in the simulation directory
sim.data         Default time-series at /data in data.h5
sim.root         Root HDF5 group (/) in data.h5
sim.links        Links to other simulations

Reading results

The sim.data property returns the default Series:

sim.data.values          # array of time values
sim.data.get_field_names()   # list of stored fields

# Read a field across all steps
pressure = sim.data["pressure"][:]    # shape: (n_steps, *field_shape)

# Read global/scalar data as a DataFrame
sim.data.globals.df
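
To make the (n_steps, *field_shape) layout concrete, here is a small NumPy sketch. The array is a made-up stand-in for the result of sim.data["pressure"][:], and the sizes are arbitrary; it only illustrates how the leading axis indexes steps.

```python
import numpy as np

n_steps, field_shape = 5, (4, 3)  # made-up sizes for illustration

# Stand-in for the array returned by sim.data["pressure"][:].
pressure = np.arange(n_steps * 4 * 3, dtype=float).reshape(n_steps, *field_shape)

last_step = pressure[-1]                     # field at the final step, shape (4, 3)
mean_per_step = pressure.mean(axis=(1, 2))   # one scalar per step, shape (5,)

print(last_step.shape, mean_per_step.shape)
```

Indexing the first axis selects a step, while reductions over the remaining axes give one value per step, which pairs naturally with the time values in sim.data.values.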

See Writing data and Reading data for complete guides.

