Prepare merge jobs by scanning directories for ROOT files.
Public Member Functions

| Function | Description |
|---|---|
| __init__(self, parent_dir, output_prefix="merge_jobs", max_files_per_job=20, file_pattern="*.root", run_pattern="hps_*") | Initialize the merge job preparation. |
| scan_directories(self) | Scan parent directory for run directories and ROOT files. |
| create_batches(self, run_files) | Create batches of files for merge jobs. |
| write_input_file_lists(self, batches, single_file=False) | Write input file lists for each batch. |
| write_batch_metadata(self, batches, output_file=None) | Write batch metadata to a JSON file. |
| generate_iteration_vars(self, batches, output_file=None) | Generate iteration variables JSON for hps-mc-job-template. |
| run(self, write_vars=True, write_metadata=True, separate_lists=True) | Run the full preparation workflow. |
Public Attributes
parent_dir
output_prefix
max_files_per_job
file_pattern
run_pattern
Prepare merge jobs by scanning directories for ROOT files.
Definition at line 17 of file prepare_merge_jobs.py.
__init__(self, parent_dir, output_prefix="merge_jobs", max_files_per_job=20, file_pattern="*.root", run_pattern="hps_*")
Initialize the merge job preparation.
| Parameter | Description |
|---|---|
| parent_dir | Parent directory containing run subdirectories |
| output_prefix | Prefix for output file lists and job configs |
| max_files_per_job | Maximum number of ROOT files per merge job |
| file_pattern | Glob pattern for files to merge (default: *.root) |
| run_pattern | Glob pattern for run directories (default: hps_*) |
Definition at line 20 of file prepare_merge_jobs.py.
create_batches(self, run_files)
Create batches of files for merge jobs.
If a run has more than max_files_per_job files, it will be split into multiple batches.
| Parameter | Description |
|---|---|
| run_files | Dictionary mapping run names to lists of file paths |
Definition at line 69 of file prepare_merge_jobs.py.
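The splitting rule above can be sketched as fixed-size chunking of each run's file list. This is a minimal illustration, not the actual implementation: the batch dictionary keys ("batch_index", "run", "part", "files") are hypothetical, since the real batch layout is not documented here, and processing runs in sorted order is an assumption.

```python
def create_batches(run_files, max_files_per_job=20):
    """Split each run's file list into chunks of at most max_files_per_job files.

    Sketch only: the keys "batch_index", "run", "part" and "files" are
    hypothetical; the real method's batch dictionaries may differ.
    """
    if max_files_per_job < 1:
        # A non-positive chunk size would make range() below meaningless.
        raise ValueError("max_files_per_job must be at least 1")
    batches = []
    for run in sorted(run_files):  # assumed: runs processed in name order
        files = run_files[run]
        # Step through the file list in strides of max_files_per_job.
        for part, start in enumerate(range(0, len(files), max_files_per_job)):
            batches.append({
                "batch_index": len(batches),  # global index across all runs
                "run": run,
                "part": part,                 # chunk number within this run
                "files": files[start:start + max_files_per_job],
            })
    return batches
```

With the default limit of 20, a run holding 45 files yields three batches of 20, 20 and 5 files, while a run with 20 or fewer files stays in a single batch.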
generate_iteration_vars(self, batches, output_file=None)
Generate iteration variables JSON for hps-mc-job-template.
Since the template system creates Cartesian products of iteration variables, we create a single "batch_index" variable that can be used to index into the batch metadata.
Note: For merge jobs, it is often simpler not to use iteration variables at all, and to use the -r (repeat) option with file path parsing in the templates instead.
| Parameter | Description |
|---|---|
| batches | List of batch dictionaries |
| output_file | Path to output file (default: {output_prefix}_vars.json) |
Definition at line 155 of file prepare_merge_jobs.py.
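The single-variable approach described above can be sketched as writing one list of batch indices, so the Cartesian product over iteration variables produces exactly one job per batch. The JSON layout {"batch_index": [0, 1, ...]} is an assumption about what hps-mc-job-template accepts, and returning the path is a choice made for this sketch only.

```python
import json


def generate_iteration_vars(batches, output_prefix="merge_jobs", output_file=None):
    """Write a single "batch_index" iteration variable to a JSON file.

    Sketch only: the {"batch_index": [...]} layout is assumed, not confirmed.
    """
    if output_file is None:
        # Mirrors the documented default name {output_prefix}_vars.json.
        output_file = f"{output_prefix}_vars.json"
    # One variable means the Cartesian product has len(batches) entries.
    iter_vars = {"batch_index": list(range(len(batches)))}
    with open(output_file, "w") as f:
        json.dump(iter_vars, f, indent=2)
    return output_file
```

Each job can then use its batch_index value to look up its file list in the batch metadata.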
run(self, write_vars=True, write_metadata=True, separate_lists=True)
Run the full preparation workflow.
| Parameter | Description |
|---|---|
| write_vars | If True, write the iteration variables JSON file |
| write_metadata | If True, write the batch metadata JSON file |
| separate_lists | If True, write a separate input file list for each batch |
Definition at line 186 of file prepare_merge_jobs.py.
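The order of the workflow steps is not documented. One plausible orchestration can be sketched as below, assuming run() calls the other public methods in the order they are listed and forwards separate_lists as single_file=not separate_lists. To stay self-contained, the sketch takes the preparation object as a parameter named prep (a hypothetical stand-in for self).

```python
def run(prep, write_vars=True, write_metadata=True, separate_lists=True):
    """Chain the preparation steps; prep stands in for self in this sketch.

    Assumed: step order and the separate_lists -> single_file mapping.
    """
    run_files = prep.scan_directories()        # {run name: [file paths]}
    batches = prep.create_batches(run_files)   # split large runs
    # separate_lists=True is assumed to mean one list per batch.
    prep.write_input_file_lists(batches, single_file=not separate_lists)
    if write_metadata:
        prep.write_batch_metadata(batches)
    if write_vars:
        prep.generate_iteration_vars(batches)
    return batches
```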
scan_directories(self)
Scan parent directory for run directories and ROOT files.
Definition at line 39 of file prepare_merge_jobs.py.
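Because create_batches() takes a dictionary mapping run names to lists of file paths, the scan can be sketched as producing that mapping with pathlib globbing. This is a minimal sketch under stated assumptions: matching is non-recursive, file lists are sorted, and runs with no matching files are skipped; the real method may behave differently on each point.

```python
from pathlib import Path


def scan_directories(parent_dir, run_pattern="hps_*", file_pattern="*.root"):
    """Map each matching run directory name to its sorted matching files.

    Sketch only: non-recursive matching and skipping empty runs are assumptions.
    """
    parent = Path(parent_dir)
    run_files = {}
    # Only immediate subdirectories whose names match run_pattern count as runs.
    for run_dir in sorted(p for p in parent.glob(run_pattern) if p.is_dir()):
        # Collect matching regular files directly inside the run directory.
        files = sorted(str(f) for f in run_dir.glob(file_pattern) if f.is_file())
        if files:  # assumed: runs without matching files are left out
            run_files[run_dir.name] = files
    return run_files
```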
write_batch_metadata(self, batches, output_file=None)
Write batch metadata to a JSON file.
The metadata records how files were grouped into batches, which is useful for generating appropriate output file names.
| Parameter | Description |
|---|---|
| batches | List of batch dictionaries |
| output_file | Path to output file (default: {output_prefix}_batches.json) |
Definition at line 135 of file prepare_merge_jobs.py.
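A minimal sketch of the metadata step, assuming the batches are JSON-serializable dictionaries. The output_name() helper and its "<run>_merged_<part>.root" scheme are hypothetical, included only to show how the recorded grouping can drive output file names; the batch keys "run" and "part" are likewise assumed.

```python
import json


def write_batch_metadata(batches, output_prefix="merge_jobs", output_file=None):
    """Dump the batch list to JSON so later steps can derive output names.

    Sketch only: assumes every batch is a JSON-serializable dict.
    """
    if output_file is None:
        # Mirrors the documented default name {output_prefix}_batches.json.
        output_file = f"{output_prefix}_batches.json"
    with open(output_file, "w") as f:
        json.dump(batches, f, indent=2)
    return output_file


def output_name(batch):
    """Hypothetical naming scheme: <run>_merged_<part>.root."""
    return f"{batch['run']}_merged_{batch['part']:03d}.root"
```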
write_input_file_lists(self, batches, single_file=False)
Write input file lists for each batch.
Creates either a single combined file list or a separate list for each batch.
| Parameter | Description |
|---|---|
| batches | List of batch dictionaries |
| single_file | If True, write all files to one list; if False, write one list per batch |
Definition at line 100 of file prepare_merge_jobs.py.
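The two output modes above can be sketched as writing one path per line, either into one combined list or into one list per batch. The file names "<prefix>_input_files.txt" and "<prefix>_batch<NNN>.txt", and the "batch_index" and "files" keys, are hypothetical; the real method's naming is not documented here.

```python
from pathlib import Path


def write_input_file_lists(batches, output_prefix="merge_jobs", single_file=False):
    """Write one path per line, to a combined list or one list per batch.

    Sketch only: output file names and batch keys are hypothetical.
    """
    written = []
    if single_file:
        path = Path(f"{output_prefix}_input_files.txt")
        with path.open("w") as f:
            for batch in batches:
                f.writelines(p + "\n" for p in batch["files"])
        written.append(str(path))
    else:
        for batch in batches:
            # Zero-padded index keeps the lists in order when sorted by name.
            path = Path(f"{output_prefix}_batch{batch['batch_index']:03d}.txt")
            path.write_text("".join(p + "\n" for p in batch["files"]))
            written.append(str(path))
    return written
```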
file_pattern
Definition at line 33 of file prepare_merge_jobs.py.
max_files_per_job
Definition at line 32 of file prepare_merge_jobs.py.
output_prefix
Definition at line 31 of file prepare_merge_jobs.py.
parent_dir
Definition at line 30 of file prepare_merge_jobs.py.
run_pattern
Definition at line 34 of file prepare_merge_jobs.py.