Wildcards and dependency dispatcher
The wildcards that are used in the pipeline are:
dataset - dataset ID
sample - sample ID
run - SRR ID
species - currently only mm and hs are supported
There are usually several runs in a sample, and several samples in a dataset. A dataset can contain several species, but a sample typically comes from a single species.
These relationships are stored in the output file of get_all_meta (see how we obtain meta information).
Because these relationships are many-to-one, in many cases we have to use input functions for Snakemake rules.
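As a minimal sketch of why an input function is needed: a rule that processes one sample depends on the files of all its runs, and the number of runs is known only from the metadata. The mapping SAMPLE_TO_RUNS, the IDs in it, the function name, and the path pattern below are hypothetical and only illustrate the idea; in the pipeline this information comes from the get_all_meta output.

```python
from types import SimpleNamespace

# Hypothetical metadata: which runs belong to which sample.
SAMPLE_TO_RUNS = {
    "GSM0000001": ["SRR0000001", "SRR0000002"],
    "GSM0000002": ["SRR0000003"],
}


def get_run_files(wildcards):
    """Input function: expand one sample into the files of all its runs."""
    runs = SAMPLE_TO_RUNS[wildcards.sample]
    return [
        f"out/{wildcards.dataset}/{wildcards.sample}/{run}.bam"
        for run in runs
    ]


# Snakemake passes an object with attribute access to the wildcards;
# SimpleNamespace stands in for it here so the sketch runs on its own.
wc = SimpleNamespace(dataset="GSE000001", sample="GSM0000001")
print(get_run_files(wc))
```

In a Snakefile such a function is passed to the rule without calling it, for example `input: get_run_files`, and Snakemake calls it with the matched wildcards.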
Dependency dispatcher
To make input functions easier to write, we isolated most of the frequently used functionality in two files,
DependencyDispatcher.py and Classes.py.
In Classes.py we define Python classes (Run, Sample, Dataset) and implement deserialization
of FFQ results (which are in JSON) into these classes.
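The following sketch shows the general shape of such classes and of loading JSON into them. It is not the code from Classes.py: the field names, the from_json method, and the simplified JSON layout are assumptions made for illustration, and real FFQ output is structured differently and carries many more fields.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Run:
    accession: str  # SRR ID


@dataclass
class Sample:
    accession: str
    species: str  # "mm" or "hs"
    runs: List[Run] = field(default_factory=list)


@dataclass
class Dataset:
    accession: str
    samples: List[Sample] = field(default_factory=list)

    @classmethod
    def from_json(cls, text):
        """Build a Dataset (with nested Samples and Runs) from JSON text."""
        raw = json.loads(text)
        samples = [
            Sample(
                accession=s["accession"],
                species=s["species"],
                runs=[Run(accession=r) for r in s["runs"]],
            )
            for s in raw["samples"]
        ]
        return cls(accession=raw["accession"], samples=samples)


# Hypothetical, heavily simplified metadata record.
EXAMPLE = """
{"accession": "GSE000001",
 "samples": [
   {"accession": "GSM0000001", "species": "mm",
    "runs": ["SRR0000001", "SRR0000002"]},
   {"accession": "GSM0000002", "species": "hs",
    "runs": ["SRR0000003"]}
 ]}
"""

dataset = Dataset.from_json(EXAMPLE)
print(len(dataset.samples), dataset.samples[0].runs[0].accession)
```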
In DependencyDispatcher.py we implement the functions that we use as input functions
or call from within input functions. Methods of a DependencyDispatcher instance take wildcards
as input and return the requested properties of the dataset or the sample.
Example
DependencyDispatcher(config).get_all_datasets() # returns list of dataset IDs
DependencyDispatcher(config).get_sample_names(wildcards) # will search for `dataset` wildcard, and return sample names
DependencyDispatcher(config).get_species(wildcards) # will search for `sample` wildcard, and return species
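To show how methods like these can resolve wildcards against metadata, here is a stripped-down stand-in. The class ToyDispatcher, the "meta" config key, and the nested dictionary layout are assumptions for illustration only; the real DependencyDispatcher reads the get_all_meta output and may work differently.

```python
from types import SimpleNamespace


class ToyDispatcher:
    """Hypothetical stand-in for DependencyDispatcher."""

    def __init__(self, config):
        # Assumed layout: {dataset ID: {sample ID: species code}}
        self.meta = config["meta"]

    def get_all_datasets(self):
        return list(self.meta)

    def get_sample_names(self, wildcards):
        # Uses the `dataset` wildcard.
        return list(self.meta[wildcards.dataset])

    def get_species(self, wildcards):
        # Uses the `sample` wildcard and searches every dataset for it.
        for samples in self.meta.values():
            if wildcards.sample in samples:
                return samples[wildcards.sample]
        raise KeyError(f"unknown sample: {wildcards.sample}")


config = {"meta": {"GSE000001": {"GSM0000001": "mm", "GSM0000002": "hs"}}}
dispatcher = ToyDispatcher(config)

print(dispatcher.get_all_datasets())
print(dispatcher.get_sample_names(SimpleNamespace(dataset="GSE000001")))
print(dispatcher.get_species(SimpleNamespace(sample="GSM0000002")))
```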