HPS-MC
 
MergeJobPreparation Class Reference

Prepare merge jobs by scanning directories for ROOT files.

Public Member Functions

 __init__ (self, parent_dir, output_prefix="merge_jobs", max_files_per_job=20, file_pattern="*.root", run_pattern="hps_*")
 Initialize the merge job preparation.
 
 scan_directories (self)
 Scan parent directory for run directories and ROOT files.
 
 create_batches (self, run_files)
 Create batches of files for merge jobs.
 
 write_input_file_lists (self, batches, single_file=False)
 Write input file lists for each batch.
 
 write_batch_metadata (self, batches, output_file=None)
 Write batch metadata to a JSON file.
 
 generate_iteration_vars (self, batches, output_file=None)
 Generate iteration variables JSON for hps-mc-job-template.
 
 run (self, write_vars=True, write_metadata=True, separate_lists=True)
 Run the full preparation workflow.
 

Public Attributes

 parent_dir
 
 output_prefix
 
 max_files_per_job
 
 file_pattern
 
 run_pattern
 

Detailed Description

Prepare merge jobs by scanning directories for ROOT files.

Definition at line 17 of file prepare_merge_jobs.py.

Constructor & Destructor Documentation

◆ __init__()

__init__ (self, parent_dir, output_prefix="merge_jobs", max_files_per_job=20, file_pattern="*.root", run_pattern="hps_*")

Initialize the merge job preparation.

Parameters
parent_dir         Parent directory containing run subdirectories
output_prefix      Prefix for output file lists and job configs
max_files_per_job  Maximum number of ROOT files per merge job
file_pattern       Glob pattern for files to merge (default: *.root)
run_pattern        Glob pattern for run directories (default: hps_*)

Definition at line 20 of file prepare_merge_jobs.py.
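
The two pattern defaults can be tried out in isolation. The sketch below only illustrates shell-style glob matching with the standard library; whether the class itself uses fnmatch, glob, or pathlib is not stated here, and the directory and file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Defaults copied from the constructor signature above.
RUN_PATTERN = "hps_*"
FILE_PATTERN = "*.root"


def matches(name, pattern):
    """Return True if a bare file or directory name matches a glob pattern."""
    return fnmatchcase(name, pattern)


# Hypothetical names, for illustration only.
run_dirs = [d for d in ["hps_014185", "hps_014186", "logs"] if matches(d, RUN_PATTERN)]
root_files = [f for f in ["out_1.root", "out_1.root.tmp", "job.log"] if matches(f, FILE_PATTERN)]
```

Narrowing file_pattern (for example to a hypothetical "*_recon.root") restricts which files in each run directory are picked up for merging.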

Member Function Documentation

◆ create_batches()

create_batches (self, run_files)

Create batches of files for merge jobs.

If a run has more than max_files_per_job files, it will be split into multiple batches.

Parameters
run_files  Dictionary mapping run names to lists of file paths
Returns
List of batch dictionaries with metadata

Definition at line 69 of file prepare_merge_jobs.py.
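
The splitting rule can be sketched as a standalone function. This is a minimal illustration, not the code in prepare_merge_jobs.py: the function name and the batch dictionary keys ("run", "batch_num", "files") are assumptions, since the actual metadata fields are not listed here.

```python
def create_batches_sketch(run_files, max_files_per_job=20):
    """Split each run's file list into chunks of at most max_files_per_job."""
    batches = []
    for run_name in sorted(run_files):
        files = run_files[run_name]
        # Step through the list in strides of max_files_per_job.
        for batch_num, start in enumerate(range(0, len(files), max_files_per_job)):
            batches.append({
                "run": run_name,          # hypothetical key
                "batch_num": batch_num,   # hypothetical key
                "files": files[start:start + max_files_per_job],
            })
    return batches


# A hypothetical run with 45 files yields batches of 20, 20 and 5 files.
example = create_batches_sketch({"hps_000001": [f"f{i}.root" for i in range(45)]})
sizes = [len(b["files"]) for b in example]
```

A run with max_files_per_job files or fewer produces exactly one batch under this scheme.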

◆ generate_iteration_vars()

generate_iteration_vars (self, batches, output_file=None)

Generate iteration variables JSON for hps-mc-job-template.

Since the template system creates Cartesian products of iteration variables, we create a single "batch_index" variable that can be used to index into the batch metadata.

Note: For merge jobs, it's often simpler to NOT use iteration variables and instead use the -r (repeat) option with file path parsing in templates.

Parameters
batches      List of batch dictionaries
output_file  Path to output file (default: {output_prefix}_vars.json)
Returns
Path to the written file

Definition at line 155 of file prepare_merge_jobs.py.
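
The reason for a single index variable can be seen by expanding a Cartesian product by hand. The sketch below assumes iteration variables are a mapping from variable name to a list of values; that layout, the expand helper, and the example variable names other than batch_index are illustrative assumptions, not the template system's documented format.

```python
import itertools


def expand(iteration_vars):
    """Expand name -> values into one dict per Cartesian-product combination."""
    names = list(iteration_vars)
    value_lists = [iteration_vars[name] for name in names]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


# Two independent variables multiply: 2 runs x 3 batch numbers = 6 jobs,
# even if some (run, batch) pairs do not exist.
two_vars = expand({"run": ["hps_a", "hps_b"], "batch_num": [0, 1, 2]})

# A single index variable yields exactly one job per batch.
one_var = expand({"batch_index": [0, 1, 2]})
```

With one variable, each job looks up its run name and file list from the batch metadata by index, so no spurious combinations are generated.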

◆ run()

run (self, write_vars=True, write_metadata=True, separate_lists=True)

Run the full preparation workflow.

Parameters
write_vars      Write iteration variables JSON file
write_metadata  Write batch metadata JSON file
separate_lists  Write separate input file list per batch
Returns
Dictionary with paths to generated files and batch info

Definition at line 186 of file prepare_merge_jobs.py.

◆ scan_directories()

scan_directories (self)

Scan parent directory for run directories and ROOT files.

Returns
Dictionary mapping run names to lists of ROOT file paths

Definition at line 39 of file prepare_merge_jobs.py.
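
A scan of this kind can be sketched with pathlib. The version below is an assumption-laden illustration: it looks only one level deep, sorts results, and skips run directories with no matching files, none of which is confirmed for the real method.

```python
from pathlib import Path


def scan_directories_sketch(parent_dir, run_pattern="hps_*", file_pattern="*.root"):
    """Map each matching run directory name to its sorted list of file paths."""
    run_files = {}
    for run_dir in sorted(Path(parent_dir).glob(run_pattern)):
        if not run_dir.is_dir():
            continue  # ignore plain files that happen to match run_pattern
        files = sorted(str(p) for p in run_dir.glob(file_pattern))
        if files:  # assumption: runs without matching files are omitted
            run_files[run_dir.name] = files
    return run_files
```

The returned dictionary has the shape that create_batches expects as its run_files argument.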

◆ write_batch_metadata()

write_batch_metadata (self, batches, output_file=None)

Write batch metadata to a JSON file.

This provides information about how files were grouped into batches, useful for generating appropriate output file names.

Parameters
batches      List of batch dictionaries
output_file  Path to output file (default: {output_prefix}_batches.json)
Returns
Path to the written file

Definition at line 135 of file prepare_merge_jobs.py.
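
The default file name and the return value can be sketched as follows. Only the {output_prefix}_batches.json default comes from the parameter description above; the JSON layout (a "num_batches" count plus the "batches" list) is a hypothetical choice for illustration.

```python
import json


def write_batch_metadata_sketch(batches, output_prefix="merge_jobs", output_file=None):
    """Write batch metadata as JSON and return the path that was written."""
    if output_file is None:
        output_file = f"{output_prefix}_batches.json"
    # Hypothetical layout; the real file may carry different fields.
    payload = {"num_batches": len(batches), "batches": batches}
    with open(output_file, "w") as f:
        json.dump(payload, f, indent=2)
    return output_file
```

A job template could then read this file and pick the entry for its batch index to build an output file name.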

◆ write_input_file_lists()

write_input_file_lists (self, batches, single_file=False)

Write input file lists for each batch.

Creates either a single file list or separate files per batch.

Parameters
batches      List of batch dictionaries
single_file  If True, write all files to one list; if False, write one list per batch
Returns
List of file paths written, or a single file path if single_file=True

Definition at line 100 of file prepare_merge_jobs.py.
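
The two output modes and their differing return types can be sketched as below. The output file names and the "files" key of each batch dictionary are hypothetical; only the single_file switch and the list-versus-path return value come from the description above.

```python
def write_input_file_lists_sketch(batches, output_prefix="merge_jobs", single_file=False):
    """Write one path per line; return one path (single_file) or a list of paths."""
    if single_file:
        path = f"{output_prefix}_input_files.txt"  # hypothetical name
        with open(path, "w") as f:
            for batch in batches:
                for name in batch["files"]:
                    f.write(name + "\n")
        return path

    paths = []
    for index, batch in enumerate(batches):
        path = f"{output_prefix}_batch{index:03d}_files.txt"  # hypothetical name
        with open(path, "w") as f:
            for name in batch["files"]:
                f.write(name + "\n")
        paths.append(path)
    return paths
```

Because the return type depends on single_file, callers that accept both modes need to handle a string as well as a list.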

Member Data Documentation

◆ file_pattern

file_pattern

Definition at line 33 of file prepare_merge_jobs.py.

◆ max_files_per_job

max_files_per_job

Definition at line 32 of file prepare_merge_jobs.py.

◆ output_prefix

output_prefix

Definition at line 31 of file prepare_merge_jobs.py.

◆ parent_dir

parent_dir

Definition at line 30 of file prepare_merge_jobs.py.

◆ run_pattern

run_pattern

Definition at line 34 of file prepare_merge_jobs.py.


The documentation for this class was generated from the following file: