rfwtools.example_validator.WindowedExampleValidator

class rfwtools.example_validator.WindowedExampleValidator(mya_deployment='ops')[source]

Bases: ExampleValidator

Checks that WindowedExamples meet our validation criteria. Similar to ExampleValidator.

The major difference from the parent class is that the checks on waveform Time stamps are updated to treat them as relative to the start of the window.

__init__(mya_deployment='ops')[source]

Construct an instance.

Parameters:: mya_deployment (str) – Which mya deployment should be used to look up historical EPICS information

Methods

`__init__`([mya_deployment])	Construct an instance.
`set_example`(example)	Set internal information about the example to validate.
`validate_capture_file_counts`()	This method checks that we have exactly one capture file per cavity/IOC.
`validate_capture_file_waveforms`()	Checks that all of the required waveforms are present exactly one time across all capture files.
`validate_cavity_modes`([mode, offset, ...])	Checks that each cavity was in the appropriate control mode or is bypassed.
`validate_data`([deployment])	Check that the event directory and its data are of the expected format.
`validate_waveform_times`([max_start, ...])	Verify that the Time columns of all capture files are identical and have a valid range and sample interval.
`validate_zones`()	This method ensures that the model does not make predictions on certain C100 zones, namely 0L04.

Attributes

`window_start`	The start of the Example's time window
`window_end`	The end of the Example's time window

event_capture_filenames

The raw capture file content (typically produced by the harvester daemon)

Type:: (dict of str: str)

event_datetime

The datetime of the fault.

Type:: (datetime)

event_df

The DataFrame of waveform signals

Type:: (pd.DataFrame)

event_df_exception

The exception generated by example_load() in set_example() or None

Type:: (Exception)

event_zone

The zone where the fault occurred

Type:: (str)

mya_deployment

The mya deployment that should be used when running this test

Type:: (str)

set_example(example)[source]

Set internal information about the example to validate.

Parameters:: example (WindowedExample) – The example that is to be validated.
Return type:: None

validate_capture_file_counts()

This method checks that we have exactly one capture file per cavity/IOC.

The harvester grouping logic, coupled with unreliable IOC behavior, seems to produce fault event directories where an IOC either has multiple capture files or is missing one. We want to make sure we have exactly eight capture files, one per IOC. Raises an exception if something is amiss.

Raises:: ValueError – if either missing or “duplicate” capture files are found.
Return type:: None
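The counting check described above can be illustrated with a minimal, standalone sketch. This is not the rfwtools implementation. The helper name check_capture_file_counts is hypothetical, and the sketch assumes that the first four characters of a capture file name identify the cavity (for example "R1M1"); the real naming scheme may differ.

```python
from collections import Counter


def check_capture_file_counts(filenames, expected_cavities=8):
    """Raise ValueError unless every cavity has exactly one capture file.

    Hypothetical helper for illustration only.
    """
    # Assumption: the first four characters of the file name identify the
    # cavity (e.g. "R1M1"). Adjust the key if the real names differ.
    counts = Counter(name[:4] for name in filenames)

    duplicates = sorted(cav for cav, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"Duplicate capture files found for: {duplicates}")

    if len(counts) != expected_cavities:
        raise ValueError(
            f"Expected {expected_cavities} capture files, found {len(counts)}"
        )
```

Reporting duplicates separately from missing files keeps the error message specific about which of the two failure modes occurred.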

validate_capture_file_waveforms()

Checks that all of the required waveforms are present exactly one time across all capture files.

If event_df is None, then the capture files themselves are loaded. If event_df is not None, then the files are checked directly.

Raises:: ValueError – if any required waveform is repeated or missing
Return type:: None

validate_cavity_modes(mode=(4,), offset=-1.0, deployment=None, max_workers=6)

Checks that each cavity was in the appropriate control mode or is bypassed.

A request is made to the internal CEBAF myaweb myquery HTTP service at the specified offset from the event timestamp. Currently the proper mode is GDR (I/Q).

According to the RF low-level software developer (lahti), the proper PV for C100 IOCs is R<Linac><Zone><Cavity>CNTL2MODE which is a float treated like a bit word. At the time of writing, the most common modes are:

2 == SEL
4 == GDR (I/Q)

A single cavity may be bypassed by operations to alleviate performance problems. In that situation, the rest of the zone is working normally and is considered to produce valid data for modeling purposes. Only the control modes of the non-bypassed cavities will be considered for invalidating the data.

Parameters:

mode (Union[Tuple[int, ...], int]) – One or more mode numbers associated with acceptable control modes.
offset (float) – The number of seconds before the fault event the mode setting should be checked.
deployment (Optional[str]) – The MYA archiver deployment used for querying historical PV values
max_workers (int) – The size of the thread pool used to make the web-based MYA requests in parallel.

Raises:

ValueError – if any cavity mode does not match the value specified by the mode parameter.

Return type:

None
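The PV naming pattern and the mode comparison described above can be sketched without any network access. This is an illustration under stated assumptions, not the library's code: cavity_mode_pv and check_cavity_modes are hypothetical helpers, the PV values are supplied as a plain dict instead of being fetched from myquery, the set of bypassed cavities is passed in by the caller, and the float value is compared by simple membership after conversion to int.

```python
def cavity_mode_pv(linac, zone, cavity):
    """Build a PV name following the R<Linac><Zone><Cavity>CNTL2MODE pattern."""
    return f"R{linac}{zone}{cavity}CNTL2MODE"


def check_cavity_modes(pv_values, mode=(4,), bypassed=()):
    """Raise ValueError if a non-bypassed cavity is not in an accepted mode.

    pv_values maps PV name -> archived float value (hypothetical input).
    bypassed is a collection of PV names to skip (hypothetical input).
    """
    # Accept either a single int or a tuple of ints, as in the signature above.
    accepted = (mode,) if isinstance(mode, int) else tuple(mode)

    for pv, value in pv_values.items():
        if pv in bypassed:
            continue  # bypassed cavities do not invalidate the data
        # Assumption: the archived float holds a whole number such as 4.0.
        if int(value) not in accepted:
            raise ValueError(f"{pv} had mode {value}, expected one of {accepted}")
```

With the default mode=(4,), a cavity reporting 2.0 (SEL) fails the check unless its PV is listed in bypassed.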

validate_data(deployment=None)[source]

Check that the event directory and its data are of the expected format.

This method inspects the event directory and raises an exception if a problem is found. The following aspects of the event directory and waveform data are validated.

Data can be found on disk
All eight cavities are represented by exactly one capture file
All of the required waveforms are represented exactly once
All of the capture files use the same timespan, have constant sampling intervals, and are within tolerances for their stated windows.
All of the cavities are in the appropriate control mode (GDR I/Q => 4) or bypassed

Parameters:: deployment (Optional[str]) – Which MYA deployment should be used when checking archiver data.
Raises:: ValueError – If a problem is found with the data.
Return type:: None
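Because validate_data reports problems by raising ValueError, callers will often want to convert that into a pass/fail result. The following is a minimal sketch of such a wrapper. The helper name try_validate is hypothetical and not part of rfwtools; it only relies on the set_example and validate_data methods documented on this page.

```python
def try_validate(validator, example, deployment=None):
    """Return (True, None) if the example validates, else (False, reason).

    Hypothetical convenience wrapper; `validator` is any object exposing
    set_example(example) and validate_data(deployment=...).
    """
    validator.set_example(example)
    try:
        validator.validate_data(deployment=deployment)
    except ValueError as exc:
        # The documented failure signal is ValueError; other exceptions
        # are deliberately left to propagate.
        return False, str(exc)
    return True, None


# Usage sketch (assumes rfwtools is installed and `example` is an existing
# WindowedExample):
#   from rfwtools.example_validator import WindowedExampleValidator
#   validator = WindowedExampleValidator(mya_deployment='ops')
#   ok, reason = try_validate(validator, example)
```

Catching only ValueError keeps unexpected errors, such as a failed archiver request, visible instead of silently marking the example as invalid.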

validate_waveform_times(max_start=-100.0, min_end=100.0, step_size=0.2, delta_max=0.02)

Verify that the Time columns of all capture files are identical and have a valid range and sample interval.

Note: The default 0.02 delta_max is chosen because the actual time step ranges from 0.18… to 0.21… when a time step of 0.2 is specified.

Parameters:

max_start (float) – The latest acceptable start time for the waveforms
min_end (float) – The earliest acceptable end time for the waveforms
step_size (float) – The expected step_size of each waveform in milliseconds
delta_max (float) – The maximum difference between the observed time steps and step_size in milliseconds.

Raises:

ValueError – if the Time columns mismatch or are beyond expected thresholds

Return type:

None
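The three conditions described above (identical Time columns, valid range, and a sample interval within tolerance) can be sketched in plain Python. This is an illustration, not the rfwtools implementation: check_waveform_times is a hypothetical helper that takes the Time columns as lists of floats in milliseconds and reuses the parameter names and defaults from the signature above.

```python
def check_waveform_times(time_columns, max_start=-100.0, min_end=100.0,
                         step_size=0.2, delta_max=0.02):
    """Raise ValueError if the Time columns mismatch or are out of tolerance.

    time_columns is a list of Time columns, one list of floats per capture
    file (hypothetical input format).
    """
    if not time_columns or not time_columns[0]:
        raise ValueError("No Time data supplied")

    first = time_columns[0]
    # Every capture file must share exactly the same Time column.
    if any(col != first for col in time_columns[1:]):
        raise ValueError("Time columns differ between capture files")

    # The waveform must start early enough and end late enough.
    if first[0] > max_start:
        raise ValueError(f"Start time {first[0]} is later than {max_start}")
    if first[-1] < min_end:
        raise ValueError(f"End time {first[-1]} is earlier than {min_end}")

    # Each observed step must be within delta_max of the expected step_size.
    for prev, curr in zip(first, first[1:]):
        if abs((curr - prev) - step_size) > delta_max:
            raise ValueError(f"Time step {curr - prev} is outside tolerance")
```

With the defaults, a step of 0.21 passes because it differs from 0.2 by less than 0.02, while a dropped sample (a step of 0.4) raises ValueError.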

validate_zones()

This method ensures that the model does not make predictions on certain C100 zones, namely 0L04.

Raises:: ValueError – if the zone name is 0L04.
Return type:: None

window_end: The end of the Example’s time window

window_start: The start of the Example’s time window