rfwtools.example_validator.ExampleValidator
- class rfwtools.example_validator.ExampleValidator(mya_deployment='ops')[source]
Bases: object
This class provides functionality for checking that an individual example meets the criteria for validity.
Some checks are very basic, e.g., do we have all of the necessary data from the fault event. Others are a bit more nuanced, e.g., was the cavity in the proper RF mode. See validate_data and other validation methods for details.
Note
This class loads capture file data at set_example, but defers any exceptions from that process until validate_data() is called.
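The deferred-exception behavior described in the note can be illustrated with a minimal, self-contained sketch. The class `DeferredLoadValidator`, its `load_exception` attribute, and the `loader` callable are hypothetical stand-ins and not part of rfwtools; whether the real class re-raises the stored exception unchanged or wraps it is an assumption here.

```python
class DeferredLoadValidator:
    """Hypothetical stand-in illustrating the deferred-exception pattern."""

    def __init__(self):
        self.data = None
        self.load_exception = None

    def set_example(self, loader):
        # Attempt the load immediately, but remember any failure
        # instead of letting it propagate to the caller.
        try:
            self.data = loader()
            self.load_exception = None
        except Exception as exc:
            self.data = None
            self.load_exception = exc

    def validate_data(self):
        # Surface the stored failure only when validation is requested.
        # (Assumption: the stored exception is re-raised as-is.)
        if self.load_exception is not None:
            raise self.load_exception
        return self.data


def failing_loader():
    # Simulates a capture-file load that cannot find its data.
    raise FileNotFoundError("capture files missing")


validator = DeferredLoadValidator()
validator.set_example(failing_loader)  # does not raise here
```

Calling `validator.validate_data()` afterwards raises the stored `FileNotFoundError`, so callers only need error handling around the validation step.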
Methods
__init__([mya_deployment]) – Create an instance for validating Example.
set_example(example) – Set internal information about the example to validate.
validate_capture_file_counts() – This method checks that we have exactly one capture file per cavity/IOC.
validate_capture_file_waveforms() – Checks that all of the required waveforms are present exactly one time across all capture files.
validate_cavity_modes([mode, offset, ...]) – Checks that each cavity was in the appropriate control mode or is bypassed.
validate_data([deployment]) – Check that the event directory and its data are of the expected format.
validate_waveform_times([max_start, ...]) – Verify the Time column of all capture files are identical and have a valid range and sample interval.
This method ensures that the model does not make predictions on certain C100 zones, namely 0L04.
Attributes
event_datetime – The datetime of the fault.
event_zone – The zone where the fault occurred
event_df – The DataFrame of waveform signals
event_capture_filenames – The raw capture file content (typically produced by the harvester daemon)
event_df_exception – The exception generated by example_load() in set_example() or None
mya_deployment – The mya deployment that should be used when running this test
- event_capture_filenames
The raw capture file content (typically produced by the harvester daemon)
- Type:
(dict of str: str)
- event_datetime
The datetime of the fault.
- Type:
(datetime)
- event_df
The DataFrame of waveform signals
- Type:
(pd.DataFrame)
- event_df_exception
The exception generated by example_load() in set_example() or None
- Type:
(Exception)
- event_zone
The zone where the fault occurred
- Type:
(str)
- mya_deployment
The mya deployment that should be used when running this test
- Type:
(str)
- set_example(example)[source]
Set internal information about the example to validate.
- Parameters:
example (Example) – The example that is to be validated.
- Return type:
None
- validate_capture_file_counts()[source]
This method checks that we have exactly one capture file per cavity/IOC.
The harvester grouping logic, coupled with unreliable IOC behavior, seems to produce fault event directories where an IOC either has multiple capture files or has none. We want to make sure we have exactly eight capture files, one per IOC. An exception is raised if something is amiss.
- Raises:
ValueError – if either missing or “duplicate” capture files are found.
- Return type:
None
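As a rough illustration of the "exactly one capture file per cavity" rule, the sketch below counts capture files per cavity and reports anything missing or duplicated. It is not the rfwtools implementation: the function name `check_capture_file_counts` and the use of plain cavity labels `"1"` through `"8"` (rather than parsing real capture file names) are assumptions made for the example.

```python
from collections import Counter

# Assumption: cavities in a zone are labeled "1" through "8".
EXPECTED_CAVITIES = tuple(str(i) for i in range(1, 9))


def check_capture_file_counts(cavity_ids, expected=EXPECTED_CAVITIES):
    """Raise ValueError unless each expected cavity appears exactly once.

    cavity_ids holds one entry per capture file found, naming the cavity
    that the file belongs to.
    """
    counts = Counter(cavity_ids)
    missing = [c for c in expected if counts[c] == 0]
    duplicated = [c for c in expected if counts[c] > 1]
    if missing or duplicated:
        raise ValueError(
            f"Capture file problem: missing={missing}, duplicated={duplicated}"
        )


# One file per cavity passes silently.
check_capture_file_counts(["1", "2", "3", "4", "5", "6", "7", "8"])
```

Collecting both the missing and the duplicated cavities before raising keeps the error message useful when an event directory has more than one problem.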
- validate_capture_file_waveforms()[source]
Checks that all of the required waveforms are present exactly one time across all capture files.
If event_df is None, then the capture files themselves are loaded. If event_df is not None, then the files are checked directly.
- Raises:
ValueError – if any required waveform is repeated or missing
- Return type:
None
- validate_cavity_modes(mode=(4,), offset=-1.0, deployment=None, max_workers=6)[source]
Checks that each cavity was in the appropriate control mode or is bypassed.
A request is made to the internal CEBAF myaweb myquery HTTP service at the specified offset from the event timestamp. Currently the proper mode is GDR (I/Q).
According to the RF low-level software developer (lahti), the proper PV for C100 IOCs is R<Linac><Zone><Cavity>CNTL2MODE which is a float treated like a bit word. At the time of writing, the most common modes are:
2 == SEL
4 == GDR (I/Q)
A single cavity may be bypassed by operations to alleviate performance problems. In this situation, the rest of the zone is working normally and is considered to produce valid data for modeling purposes. Only the control modes of the non-bypassed cavities will be considered for invalidating the data.
- Parameters:
mode (Union[Tuple[int,...],int]) – A list of mode numbers associated with acceptable control modes.
offset (float) – The number of seconds before the fault event the mode setting should be checked.
deployment (Optional[str]) – The MYA archiver deployment used for querying historical PV values.
max_workers (int) – The size of the thread pool used to make web-based MYA requests in parallel.
- Raises:
ValueError – if any cavity mode does not match the value specified by the mode parameter.
- Return type:
None
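The mode check can be sketched without any archiver access. The example below builds a PV name from the R<Linac><Zone><Cavity>CNTL2MODE template and compares already-retrieved mode values against the accepted modes, skipping bypassed cavities. The helper names, the way bypassed cavities are supplied, and the choice to compare the integer value of the float directly are all assumptions for illustration; the actual method queries myquery and may decode the bit word differently.

```python
def mode_pv_name(linac, zone, cavity):
    """Fill in the R<Linac><Zone><Cavity>CNTL2MODE template."""
    return f"R{linac}{zone}{cavity}CNTL2MODE"


def check_cavity_modes(observed, allowed=(4,), bypassed=()):
    """Raise ValueError if any non-bypassed cavity is in a disallowed mode.

    observed maps a cavity label to the archived mode value (a float).
    allowed may be a single int or a tuple of ints, mirroring the
    `mode` parameter described above.
    """
    if isinstance(allowed, int):
        allowed = (allowed,)
    bad = {}
    for cavity, value in observed.items():
        if cavity in bypassed:
            # Bypassed cavities do not invalidate the example.
            continue
        # Assumption: the float is compared by its integer value.
        if int(value) not in allowed:
            bad[cavity] = value
    if bad:
        raise ValueError(f"Cavities in an unexpected control mode: {bad}")


# Cavity "3" is in SEL (2) but bypassed, so the zone still passes.
modes = {"1": 4.0, "2": 4.0, "3": 2.0, "4": 4.0}
check_cavity_modes(modes, allowed=(4,), bypassed=("3",))
```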
- validate_data(deployment=None)[source]
Check that the event directory and its data are of the expected format.
This method inspects the event directory and raises an exception if a problem is found. The following aspects of the event directory and waveform data are validated.
Data can be found on disk
All eight cavities are represented by exactly one capture file
All of the required waveforms are represented exactly once
All of the capture files use the same timespan and have constant sampling intervals
All of the cavities are in the appropriate control mode (GDR I/Q => 4) or bypassed
- Parameters:
deployment (Optional[str]) – Which MYA deployment should be used when checking archiver data.
- Raises:
ValueError – If a problem is found with the data.
- Return type:
None
- validate_waveform_times(max_start=-100.0, min_end=100.0, step_size=0.2, delta_max=0.02)[source]
Verify the Time column of all capture files are identical and have a valid range and sample interval.
Note: The default 0.02 delta_max is chosen because the actual time step ranges from 0.18… to 0.21… when a time step of 0.2 is specified.
- Parameters:
max_start (float) – The latest acceptable start time for the waveforms
min_end (float) – The earliest acceptable end time for the waveforms
step_size (float) – The expected step_size of each waveform in milliseconds
delta_max (float) – The maximum difference between the observed time steps and step_size in milliseconds.
- Raises:
ValueError – if either Time columns mismatch or Time columns are beyond expected thresholds
- Return type:
None
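The time checks described above can be approximated in plain Python. This sketch assumes each capture file's Time column has already been read into a list of floats in milliseconds; the function name `check_time_columns` and the list-based input are assumptions, and the real method presumably operates on the loaded DataFrame instead.

```python
def check_time_columns(columns, max_start=-100.0, min_end=100.0,
                       step_size=0.2, delta_max=0.02):
    """Raise ValueError if the Time columns differ or fall outside limits.

    columns is a list of Time columns, one list of floats per capture file.
    """
    if not columns or len(columns[0]) < 2:
        raise ValueError("Need at least one Time column with two samples")

    reference = columns[0]
    # All capture files must share an identical Time column.
    for other in columns[1:]:
        if other != reference:
            raise ValueError("Time columns mismatch between capture files")

    # The waveform must start early enough and end late enough.
    if reference[0] > max_start:
        raise ValueError(f"Start time {reference[0]} is later than {max_start}")
    if reference[-1] < min_end:
        raise ValueError(f"End time {reference[-1]} is earlier than {min_end}")

    # Every observed step must be within delta_max of the expected step.
    for earlier, later in zip(reference, reference[1:]):
        if abs((later - earlier) - step_size) > delta_max:
            raise ValueError(f"Irregular time step between {earlier} and {later}")


# A column running from -110 ms to +110 ms in 0.2 ms steps passes.
good = [-110.0 + 0.2 * i for i in range(1101)]
check_time_columns([good, list(good)])
```

With the default `delta_max` of 0.02, observed steps anywhere between 0.18 and 0.22 are accepted, which covers the 0.18 to 0.21 range mentioned in the note.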