Collections

A collection is the central unit in bamboost. A Collection object lets you create, access, and query its entries. Each entry is referred to as a Simulation from here on.

Collections are created implicitly if they don’t exist yet. The first argument to Collection is a path. Let’s create our first collection at ../data/getting-started. This creates the directory and assigns a unique ID to the new collection.

# Creating a new Collection
from bamboost import Collection

coll = Collection("../data/getting-started")
coll

BAMBOOST / 315628DE80

Database	/home/runner/work/bamboost-docs/bamboost-docs/content/docs/../data/getting-started
UID	315628DE80
Size	2

Once a collection exists, it is often easier and safer to reference it by its UID.

Although you can create and access collections with their path, it is good practice to explicitly create them using the command line interface and then use their ID in your code.

bamboost-cli new ./data-foo
bamboost-cli list

coll = Collection(uid="315628DE80")
coll.df

	name	created_at	description	tags	status	submitted	links.baseline	Re	dt	mesh_cells	nu
0	mesh_reference	2026-04-11 12:48:07.021634		[]	initialized	False	NaN	NaN	NaN	NaN	NaN
1	kelvin-helmholtz	2026-04-11 12:48:07.006774		[]	initialized	False	315628DE80:mesh_reference	1000.0	0.005	256.0	0.001

If you are working in an IPython session (e.g. a Jupyter notebook), use Collection.fromUID, which gives you autocompletion for all of your existing (cached) collections.

coll = Collection.fromUID["315628DE80 - ost-docs/content/docs/data-foo"]

Simulations

Now that you have a collection, you can create simulations inside it. A simulation not only stores the output of your experiment, but simultaneously acts as its input file (assuming you run numerical experiments).

This means that the same entity is used in multiple steps of your workflow: experimental design, execution, and postprocessing/analysis.

Experimental design

Creation

You intend to run a certain experiment with a specific set of input parameters, input files, or anything else it requires. So we create the simulation with all the instructions it needs. Bundling all of this in a single place ensures reproducibility. This most likely includes:

A dictionary of parameters
A script that produces the result for this simulation
A set of instructions on how to run the script

To create a new simulation, use add:

sim = coll.add(
    name="my-simulation",
    parameters={
        "param1": 73,
        "bar": [2, 3, 4, 5],
    },
)
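For a parameter study, many simulations are typically created in a loop. The following is a minimal sketch of how the names and parameter dictionaries for such a loop might be generated. The helper parameter_grid and its naming scheme are hypothetical and not part of bamboost. Only the add call shown above is taken from this guide, and it is left as a comment so the sketch runs without a collection.

```python
from itertools import product


def parameter_grid(**axes):
    """Yield (name, parameters) pairs for every combination of the given values.

    Hypothetical helper for illustration; not part of bamboost.
    """
    keys = sorted(axes)  # fixed key order gives reproducible names
    for values in product(*(axes[key] for key in keys)):
        parameters = dict(zip(keys, values))
        name = "_".join(f"{key}-{value}" for key, value in parameters.items())
        yield name, parameters


for name, parameters in parameter_grid(param1=[10, 73], dt=[0.005, 0.01]):
    print(name, parameters)
    # With a collection at hand, each combination could be registered using
    # the call shown above:
    # coll.add(name=name, parameters=parameters)
```

Deriving the name from the parameters keeps each simulation identifiable at a glance, but any unique name would work equally well.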

Relevant files

Then, copy relevant files (or entire directories) into the simulation directory.

Note: add also accepts a files argument that copies a list of files or directories directly.

sim.copy_files(["path/to/script.py", "img1.png", "path/to/some/directory"])

Run script

As a next step we can create a run script for the simulation. This is an auto-generated bash script with the purpose of providing a single access point to produce the results for this simulation.

create_run_script takes up to 3 arguments:

commands: an iterable of bash commands to run in sequence.
euler: a boolean flag. If True, a Slurm submission script is written instead of a plain bash script.
sbatch_kwargs: a dictionary of Slurm job arguments.

sim.create_run_script(
    commands=["source .venv/bin/activate", "python3 script.py"], euler=False
)

This creates the following file:

run.sh

#!/bin/bash

export SIMULATION_DIR=/absolute/path/to/collection/data/getting-started/my-simulation
export SIMULATION_ID=315628DE80:my-simulation

source .venv/bin/activate
python3 script.py

Notice that the script exports two variables: SIMULATION_DIR and SIMULATION_ID. Use these in your executable script script.py to infer which simulation the script should operate on.
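As a minimal sketch of how script.py might pick up these variables, the helper below reads both of them and splits SIMULATION_ID at the first colon. The name simulation_from_env is hypothetical and not part of bamboost, and the UID:name layout is an assumption based on the run.sh listing above.

```python
import os


def simulation_from_env(environ=None):
    """Return (collection_uid, simulation_name, simulation_dir).

    Hypothetical helper for illustration; not part of bamboost.
    Assumes SIMULATION_ID has the form "<collection uid>:<simulation name>".
    """
    environ = os.environ if environ is None else environ
    sim_id = environ["SIMULATION_ID"]
    sim_dir = environ["SIMULATION_DIR"]
    collection_uid, _, simulation_name = sim_id.partition(":")
    if not collection_uid or not simulation_name:
        raise ValueError(f"unexpected SIMULATION_ID format: {sim_id!r}")
    return collection_uid, simulation_name, sim_dir


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "SIMULATION_ID": "315628DE80:my-simulation",
    "SIMULATION_DIR": "/absolute/path/to/my-simulation",
}
uid, name, directory = simulation_from_env(example_env)
print(uid, name, directory)
```

Inside script.py, the UID and name could then be passed to the calls shown in this guide, for example Collection(uid=uid)[name].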

Execution

After creation, a simulation is started by running its run script. You could run it manually, but you can also start it from within Python or using the CLI (not implemented yet).

If you use the bamboost methods to submit your job, and you have previously created a Slurm submission script instead of a plain bash script, your job is automatically submitted on the cluster.

You can also submit the simulation like this directly after you create it.

Submit simulation from Python

sim.submit_simulation()

Submit simulation using CLI

bamboost-cli Collection-ID Simulation-Name
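Running the run script manually, as mentioned at the start of this section, can also be done from Python with the standard library. The sketch below is not bamboost's submission logic. The helper name run_script is hypothetical, and executing the script from inside the simulation directory is an assumption based on the relative paths in the run.sh listing.

```python
import subprocess
from pathlib import Path


def run_script(simulation_dir, script_name="run.sh"):
    """Run a simulation's bash script and return the completed process.

    Hypothetical helper for illustration; not part of bamboost.
    """
    script = Path(simulation_dir) / script_name
    if not script.is_file():
        raise FileNotFoundError(script)
    return subprocess.run(
        ["bash", script.name],
        cwd=script.parent,    # assumption: the script expects to run in its own directory
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,
    )
```

The returned object exposes stdout, stderr, and returncode, which is convenient for logging a local test run before moving to a cluster.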

Postprocessing

After your job has finished, bamboost provides rich tools for reading and analysing results.

bamboost is optimized for interactive use in Jupyter notebooks. It offers autocompletion for most of its objects, such as collections and simulations, as well as for the objects and data that are part of your simulations.

Collection

Initialize the collection as seen above, preferably using its ID. coll.df returns a pandas DataFrame of the collection.

coll = Collection.fromUID["315628DE80"]
coll.df

	name	created_at	tags	status	submitted	links.baseline	Re	dt	mesh_cells	nu	bar	param1
0	my-simulation	2026-04-11 12:48:13.635137	[]	initialized	False	NaN	NaN	NaN	NaN	NaN	[2, 3, 4, 5]	73.0
1	mesh_reference	2026-04-11 12:48:07.021634	[]	initialized	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	kelvin-helmholtz	2026-04-11 12:48:07.006774	[]	initialized	False	315628DE80:mesh_reference	1000.0	0.005	256.0	0.001	NaN	NaN

Filter simulations based on parameters using the built-in filtering interface:

filtered = coll.filter(
    coll.k["param1"] > 50,
    coll.k["bar"] <= 4,
)
filtered.df

See the Collection guide for full filtering syntax.

Simulation

To get a Simulation object, use brackets (it will autocomplete in notebooks).

sim = coll["my-simulation"]
sim

my-simulation

initialized

Not submitted

time stamp: 2026-04-11 12:48:13.635137

Files

⭗ my-simulation
├── run.sh
└── data.h5

0 directories, 2 files

Parameters

Parameter	Value
bar	[2 3 4 5]
param1	73

Key properties of a Simulation object:

Property	Description
`sim.parameters`	Dictionary of input parameters
`sim.metadata`	Metadata (description, status, created_at, …)
`sim.status`	Current lifecycle status
`sim.files`	File picker for files in the simulation directory
`sim.data`	Default time-series at `/data` in `data.h5`
`sim.root`	Root HDF5 group (`/`) in `data.h5`
`sim.links`	Links to other simulations

Reading results

The sim.data property returns the default Series:

sim.data.values          # array of time values
sim.data.get_field_names()   # list of stored fields

# Read a field across all steps
pressure = sim.data["pressure"][:]    # shape: (n_steps, *field_shape)

# Read global/scalar data as a DataFrame
sim.data.globals.df

See Writing data and Reading data for complete guides.

Next steps

Collection guide — filtering, parameter studies, and more
Simulation setup — run scripts and HPC submission
Writing data — storing results in data.h5
Reading data — post-processing your results
CLI reference — command-line tools for index management
Configuration — search paths, logging, and other options
