bamboost.index.base
Indexing of bamboost collections and their simulations/parameters. SQLAlchemy is used to interact with the SQLite database.
The index is generated on the fly or can be created explicitly by scanning the
search_paths for collections. It is stored as a SQLite database that holds the
path of each collection (identified by a unique UID), as well as the metadata and
parameters of all simulations.
The Index class provides the public API for interacting with the
index. It works under parallel execution, but the class is designed to run every
database (SQL) operation on the root process only. Methods that return a value use
bcast to broadcast the result to all processes.
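The root-only pattern described above can be sketched as follows. This is a minimal illustration, not bamboost's implementation: `FakeComm` and `run_on_root` are hypothetical names, and `FakeComm` is a single-process stand-in whose `bcast(obj, root=0)` only mimics the shape of an MPI-style broadcast.

```python
from typing import Any, Callable


class FakeComm:
    """Hypothetical single-process stand-in for an MPI communicator."""

    def __init__(self, rank: int = 0) -> None:
        self.rank = rank

    def bcast(self, obj: Any, root: int = 0) -> Any:
        # A real communicator would send `obj` from `root` to every rank.
        return obj


def run_on_root(comm: FakeComm, operation: Callable[[], Any]) -> Any:
    """Run `operation` on rank 0 only, then broadcast its result."""
    result = None
    if comm.rank == 0:
        result = operation()  # e.g. a SQL query against the index
    return comm.bcast(result, root=0)


calls = []


def query() -> int:
    calls.append("query")  # record that the database was touched
    return 42


value = run_on_root(FakeComm(rank=0), query)
```

With a real communicator, non-root ranks would skip `operation` and still receive the value from the broadcast.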
Database schema:
- collections: Contains information about the collections, namely UIDs and corresponding paths.
- simulations: Contains information about the simulations, including names, statuses, and links to the corresponding parameters.
- parameters: Contains the parameters associated with the simulations.
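To make the relationship between the three tables concrete, here is a rough sketch using the standard-library `sqlite3` module. Only the table names come from the description above; every column name and the sample data are assumptions chosen for illustration and may not match the actual schema.

```python
import sqlite3

# Hypothetical columns; only the three table names come from the documentation.
SCHEMA = """
CREATE TABLE collections (
    uid  TEXT PRIMARY KEY,
    path TEXT NOT NULL
);
CREATE TABLE simulations (
    id             INTEGER PRIMARY KEY,
    collection_uid TEXT NOT NULL REFERENCES collections(uid),
    name           TEXT NOT NULL,
    status         TEXT,
    UNIQUE (collection_uid, name)
);
CREATE TABLE parameters (
    id            INTEGER PRIMARY KEY,
    simulation_id INTEGER NOT NULL REFERENCES simulations(id),
    param_name    TEXT NOT NULL,
    param_value   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO collections VALUES (?, ?)", ("ABC123", "/data/demo"))
conn.execute(
    "INSERT INTO simulations (collection_uid, name, status) VALUES (?, ?, ?)",
    ("ABC123", "sim_1", "finished"),
)
conn.execute(
    "INSERT INTO parameters (simulation_id, param_name, param_value) VALUES (?, ?, ?)",
    (1, "mesh_size", "0.1"),
)

# Walk from a parameter back to its simulation and collection.
rows = conn.execute(
    """
    SELECT c.path, s.name, p.param_name, p.param_value
    FROM parameters AS p
    JOIN simulations AS s ON s.id = p.simulation_id
    JOIN collections AS c ON c.uid = s.collection_uid
    """
).fetchall()
```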
Attributes
- log = BAMBOOST_LOGGER.getChild('Database')
- P = ParamSpec('P')
- T = TypeVar('T')
Functions
(func) -> Callable[Concatenate[Index, P], T]
Decorator to add a session to the function signature.
Arguments:
- func: typing.Callable[typing_extensions.Concatenate[Index, bamboost.index.base.P], bamboost.index.base.T]
  The function to decorate.
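One plausible reading of a session-adding decorator is sketched below. It assumes the decorator opens a session, hands it to the wrapped method as a keyword argument, and commits or rolls back afterwards. The names `with_session` and `MiniIndex` are hypothetical, and a plain `sqlite3` cursor stands in for a SQLAlchemy session.

```python
import functools
import sqlite3
from typing import Any, Callable


def with_session(func: Callable[..., Any]) -> Callable[..., Any]:
    """Open a cursor, pass it to `func` as `session`, then commit or roll back."""

    @functools.wraps(func)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        cursor = self._conn.cursor()
        try:
            result = func(self, *args, session=cursor, **kwargs)
            self._conn.commit()
            return result
        except Exception:
            self._conn.rollback()
            raise
        finally:
            cursor.close()

    return wrapper


class MiniIndex:
    """Hypothetical stand-in for Index, backed by an in-memory database."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE collections (uid TEXT, path TEXT)")

    @with_session
    def add(self, uid: str, path: str, *, session: sqlite3.Cursor) -> None:
        session.execute("INSERT INTO collections VALUES (?, ?)", (uid, path))

    @with_session
    def count(self, *, session: sqlite3.Cursor) -> int:
        return session.execute("SELECT COUNT(*) FROM collections").fetchone()[0]


index = MiniIndex()
index.add("ABC123", "/data/demo")  # callers never pass `session` themselves
n_collections = index.count()
```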
Classes
LazyDefaultIndex
(self) -> None
Attributes:
- _instance = None
(self, instance) -> None
Arguments:
- instance: None
(self, instance, value) -> None
Arguments:
- instance: None
- value: Index
(self, instance, owner) -> Index
Arguments:
- instance: None
- owner: type[Index]
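The three signatures above resemble Python's descriptor protocol (`__delete__`, `__set__`, `__get__`), although the method names are not shown here, so that mapping is an assumption. Under that assumption, a lazily constructed default instance could look like this sketch, where `LazyDefault` and `Service` are hypothetical names:

```python
from typing import Any, Optional


class LazyDefault:
    """Descriptor that builds the owner's default instance on first access."""

    def __init__(self) -> None:
        self._instance: Optional[Any] = None

    def __get__(self, instance: Any, owner: type) -> Any:
        if self._instance is None:
            self._instance = owner()  # constructed only when first requested
        return self._instance

    def __set__(self, instance: Any, value: Any) -> None:
        self._instance = value

    def __delete__(self, instance: Any) -> None:
        self._instance = None


class Service:
    """Hypothetical expensive-to-build class standing in for Index."""

    constructed = 0
    default = LazyDefault()

    def __init__(self) -> None:
        Service.constructed += 1


before = Service.constructed  # nothing is built when the class is defined
first = Service.default       # triggers construction
second = Service.default      # reuses the cached instance
```

The benefit of this design is that importing the module does not open a database; the default object only comes into existence when someone asks for it.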
Index
(self, sql_file=None, comm=None, *, search_paths=None) -> None
API for indexing BAMBOOST collections and simulations.
Arguments:
- search_paths: typing.Iterable[str | pathlib.Path] | None = None
  Paths to scan for collections.
Attributes:
- _comm = Communicator()
- _engine: sqlalchemy.Engine
- _sm: typing.Callable[..., sqlalchemy.orm.Session]
- _s: sqlalchemy.orm.Session
A default index instance. Uses the default SQLite database file and search paths from the configuration.
- _file = sql_file or config.index.databaseFile
  The path to the SQLite database file.
- _isolated = config.index.isolated
  Whether project based indexing is used.
- _url = f'sqlite:///{self._file}'
  The URL to the SQLite database file.
- all_collections: list[bamboost.index.store.CollectionRecord]
  Return all collections in the index.
- all_simulations: list[bamboost.index.store.SimulationRecord]
  Return all simulations in the index.
- all_parameters: list[bamboost.index.store.ParameterRecord]
  Return all parameters in the index.
- all_links: list[bamboost.index.store.LinkRecord]
  Return all simulation links in the index.
Usage
Create an instance of the Index class and use its methods to interact
with the index.
$ from bamboost.index import Index
$ index = Index()
Scan for collections in known paths:
$ index.scan_for_collections()
Resolve the path of a collection:
$ index.resolve_path()
Get a simulation from its collection and simulation name:
$ index.get_simulation(, )
(self) -> Generator[Session, None, None]
Context manager for a SQL transaction.
If no transaction is active, a new transaction is started. If a transaction is active, the current session is used.
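The reuse behaviour can be illustrated with a small re-entrant context manager. This is a sketch under assumptions: `MiniIndex`, `transaction`, and `_active` are hypothetical names, and a `sqlite3` cursor stands in for a SQLAlchemy session.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator, Optional


class MiniIndex:
    """Hypothetical stand-in for Index with a re-entrant transaction."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE t (x INTEGER)")
        self._conn.commit()
        self._active: Optional[sqlite3.Cursor] = None

    @contextmanager
    def transaction(self) -> Iterator[sqlite3.Cursor]:
        if self._active is not None:
            # Nested call: reuse the session of the enclosing transaction.
            yield self._active
            return
        self._active = self._conn.cursor()
        try:
            yield self._active
            self._conn.commit()      # only the outermost block commits
        except Exception:
            self._conn.rollback()
            raise
        finally:
            self._active.close()
            self._active = None


index = MiniIndex()
with index.transaction() as outer:
    outer.execute("INSERT INTO t VALUES (1)")
    with index.transaction() as inner:
        same_session = inner is outer
        inner.execute("INSERT INTO t VALUES (2)")

with index.transaction() as s:
    row_count = s.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

Because nested blocks share the outer session, helper methods can open a transaction unconditionally without producing a second commit.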
Usage
>>> with index.sql_transaction() as s:
...     s.execute(...)
(self, *, search_paths=None) -> list[tuple[str, Path]]
Scan known paths for collections and update the index.
Iterates through the search paths and searches for files matching the identifier file structure. If a collection is found, it is added to the cache.
Arguments:
- search_paths: List[pathlib.Path] = None
  Paths to scan for collections. Defaults to config.index.searchPaths.
(self) -> None
Check the integrity of the cache.
This method checks if the paths stored in the cache are valid. If a path is not valid, it is removed from the cache.
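A minimal sketch of such a check is shown below. It assumes that "valid" means the path is an existing directory, which the text above does not specify, and it uses a plain dict in place of the database. `prune_invalid` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def prune_invalid(cache: Dict[str, Path]) -> List[str]:
    """Remove entries whose path is not an existing directory; return removed UIDs."""
    stale = [uid for uid, path in cache.items() if not path.is_dir()]
    for uid in stale:
        del cache[uid]
    return stale


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "collection_a"
    existing.mkdir()
    cache = {"AAA": existing, "BBB": Path(tmp) / "missing"}
    removed = prune_invalid(cache)
    remaining = sorted(cache)
```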
(self, uid, *, search_paths=None, return_uid=False)
Resolve and return the path of a collection from its UID. Raises a FileNotFoundError if the collection is not found in the search paths.
Arguments:
- uid: str
  UID of the collection
- search_paths = None
  Paths to search for the collection
- return_uid = False
  If True, return a tuple of (path, uid). The uid is the resolved UID of the collection, which can be different from the input uid if the input uid is an alias.
(self, path) -> CollectionUID
Resolve the UID of a collection from a path.
Arguments:
- path: StrPath
  Path of the collection
Returns
- CollectionUID
  UID of the collection at the path.
(self, uid, path=None, *, force_all=False, heal_links=True) -> None
Sync the table with the file system.
Iterates through the simulations in the collection and updates the metadata and parameters if the HDF5 file has been modified.
Arguments:
- uid: str
  UID of the collection
- path: Optional = None
  Path of the collection
- force_all: bool = False
  If True, sync all simulations regardless of modification time.
- heal_links: bool = True
  If True, attempt to heal stale links in the collection after sync.
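The modification-time check behind the sync, and the effect of force_all, can be sketched roughly as follows. The function `sync` and the `synced_at` bookkeeping dict are hypothetical; the actual method reads HDF5 files and writes to the database, which is omitted here.

```python
import os
import tempfile
from pathlib import Path
from typing import Dict, List


def sync(files: List[Path], synced_at: Dict[str, float], force_all: bool = False) -> List[str]:
    """Return names of files that were (re)read; update `synced_at` in place."""
    refreshed = []
    for file in files:
        mtime = file.stat().st_mtime
        if force_all or mtime > synced_at.get(file.name, float("-inf")):
            # A real implementation would read metadata and parameters here.
            synced_at[file.name] = mtime
            refreshed.append(file.name)
    return refreshed


with tempfile.TemporaryDirectory() as tmp:
    a, b = Path(tmp) / "a.h5", Path(tmp) / "b.h5"
    a.write_bytes(b"")
    b.write_bytes(b"")
    os.utime(a, (1000, 1000))  # fix both modification times for a deterministic demo
    os.utime(b, (1000, 1000))
    seen: Dict[str, float] = {}
    first = sync([a, b], seen)                   # nothing synced yet: both are read
    os.utime(b, (2000, 2000))                    # simulate a later modification of b
    second = sync([a, b], seen)                  # only b is newer than its last sync
    third = sync([a, b], seen, force_all=True)   # modification times are ignored
```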
(self, uid) -> store.CollectionRecord | None
Return a collection from the index.
Arguments:
- uid: str
  UID of the collection
(self, uid) -> store.SimulationRecord | None
Return a simulation from the index.
Arguments:
- uid: str | SimulationUID
(self, uids) -> list[store.SimulationRecord]
Return multiple simulations from the index.
Arguments:
- uids: typing.Iterable[str | SimulationUID]
  Iterable of simulation UIDs
(self, uid) -> list[store.LinkRecord]
Return all simulation links for a specific collection.
Arguments:
- uid: str
  UID of the collection
(self, uid) -> dict[str, dict[str, SimulationUID]]
Return a mapping of simulation names to their links for a specific collection.
Arguments:
- uid: str
  UID of the collection
(self, collection_record, include_links=True) -> store.CollectionRecord
Insert parameters of linked simulations into the simulations of a collection record.
Modifies the collection record in place and returns it.
Arguments:
- collection_record: bamboost.index.store.CollectionRecord
  The collection record to modify
- include_links: typing.Iterable[str] | typing.Literal[True] = True
  If provided, only include parameters of linked simulations with link names in this iterable. If True, include parameters of all linked simulations.
(self, uid) -> list[tuple[SimulationUID, str]]
Return all simulations that link to the given simulation.
Arguments:
- uid: str | SimulationUID
  UID of the target simulation.
(self, uid) -> None
Drop a collection from the cache.
Arguments:
- uid: str
  UID of the collection
(self, collection_uid, simulation_name) -> None
Drop a simulation from the cache.
Arguments:
- collection_uid: str
  UID of the collection
- simulation_name: str
  Name of the simulation
(self, uid, path, metadata=None) -> None
Cache a collection in the index.
Arguments:
- uid: str
  UID of the collection
- path: pathlib.Path
  Path of the collection
- metadata: typing.Mapping[str, typing.Any] | None = None
(self, collection_uid, simulation_name, parameters=None, metadata=None, links=None, *, collection_path=None) -> None
Cache a simulation from a collection.
Arguments:
- collection_uid: str
  UID of the collection
- simulation_name: str
  Name of the simulation
- parameters: typing.Mapping[typing.Any, typing.Any] | None = None
- metadata: typing.Mapping[typing.Any, typing.Any] | None = None
- links: typing.Mapping[typing.Any, typing.Any] | None = None
- collection_path: Optional = None
  Path of the collection
(self, collection_uid, simulation_name, data) -> None
Update the metadata of a simulation by passing it as a dict.
Arguments:
- collection_uid: str
- simulation_name: str
- data: typing.Mapping
  Dictionary with new data
(self, collection_uid, simulation_name, links, *, raise_on_invalid_target=config.index.strictLinksWhenSyncing, scan_and_sync=True) -> None
Update the links of a simulation.
Arguments:
- collection_uid: str
  UID of the collection
- simulation_name: str
  Name of the simulation
- links: typing.Mapping[str, typing.Any]
  Dictionary with new links
- raise_on_invalid_target: bool = config.index.strictLinksWhenSyncing
  If True, raise an error if any of the target simulations do not exist in the index. If False, log a warning and skip the invalid links.
- scan_and_sync: bool = True
  If True and a target simulation is not found in the index, scan for collections and sync the target collection before trying again. This can help locate targets.
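The strict and lenient handling of invalid targets might be sketched as below. Everything here is illustrative: `resolve_links` is a hypothetical helper, the `"ABC123:sim_1"` identifier format is an assumption, a set stands in for the index lookup, and `KeyError` is a placeholder since the actual exception type is not stated.

```python
import logging
from typing import Dict, Mapping, Set

log = logging.getLogger("links-sketch")


def resolve_links(
    links: Mapping[str, str],
    known: Set[str],
    raise_on_invalid_target: bool = False,
) -> Dict[str, str]:
    """Keep links whose target is known; raise or warn for the others."""
    valid: Dict[str, str] = {}
    for name, target in links.items():
        if target in known:
            valid[name] = target
        elif raise_on_invalid_target:
            raise KeyError(f"Link {name!r} points to unknown simulation {target!r}")
        else:
            log.warning("Skipping link %r: target %r not in index", name, target)
    return valid


known = {"ABC123:sim_1"}  # hypothetical identifier format
links = {"mesh": "ABC123:sim_1", "restart": "ABC123:missing"}

lenient = resolve_links(links, known)  # invalid link is skipped with a warning
try:
    resolve_links(links, known, raise_on_invalid_target=True)
    strict_raised = False
except KeyError:
    strict_raised = True
```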
(self, collection_uid, simulation_name, parameters) -> None
Update the parameters of a simulation by passing it as a dict.
Arguments:
- collection_uid: str
- simulation_name: str
- parameters: typing.Mapping[str, typing.Any]
  Dictionary with new parameters
(self, collection_uid=None) -> None
Heal stale links in the database by resolving target_uids in JSON to target_ids.
Arguments:
- collection_uid: Optional = None
  UID of the collection to heal. If None, heal all collections.
(self, url) -> None
Arguments:
- url: str
(self, uid) -> tuple[Path | None, str | None]
Fetch the path and UID of a collection given its UID or an alias.
Arguments:
- uid: str
(self, uid) -> Path | None
Arguments:
- uid: str
(self, alias) -> str | None
Arguments:
- alias: str
(self) -> list[store.CollectionRecord]
(self, collection_uid, simulation_name) -> store.SimulationRecord | None
Arguments:
- collection_uid: CollectionUID | str
- simulation_name: str