rfwtools.example.Example

class rfwtools.example.Example(zone, dt, cavity_label, fault_label, cavity_conf, fault_conf, label_source, data_dir=None)[source]

Bases: IExample

A class representing a (SME) labeled fault event. Manages data access and can download missing data.

Harvester fault data typically occupies ~10 MB of memory/disk space per event, and collections of these events often number in the thousands. Holding all of this event data in memory is typically not an option. Additionally, the data may be found in multiple files organized in a directory tree, saved as a single tar.gz file, or downloaded from web services. This class provides methods for easing access to this data and quickly loading/unloading it from memory.

If the data is not found in the specified location, then it will attempt to download the data and create the appropriate directory structure. This functionality typically requires access to JLab’s internal networks (e.g., VPN).

Expected data structure is <data_dir>/<zone>/<date>/<timestamp>/<capture files>. Alternatively, event data may be compressed at the <timestamp> directory level, i.e. <timestamp>.tar.gz.
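The layout above can be sketched as a small path builder. This is an illustrative sketch only, not the library's implementation: the function name expected_event_path and the sample date and timestamp strings are hypothetical, and the exact formatting rfwtools uses for the <date> and <timestamp> components is not stated here.

```python
import os


def expected_event_path(data_dir, zone, date, timestamp, compressed=False):
    """Join the documented path components into an event location.

    The date and timestamp arguments are pre-formatted strings because
    their exact format is an assumption left to the caller.
    """
    path = os.path.join(data_dir, zone, date, timestamp)
    # Compressed events replace the <timestamp> directory with <timestamp>.tar.gz
    return path + ".tar.gz" if compressed else path


# Hypothetical component values, for illustration only
event_dir = expected_event_path("data", "1L21", "2020_01_08", "101502.5")
event_tgz = expected_event_path("data", "1L21", "2020_01_08", "101502.5", compressed=True)
```

In the class itself, get_event_path(compressed=...) plays this role.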

Attributes:

zone: A string identifying the zone in CED format (e.g., 1L21)

dt: datetime object matching the local time of the fault event

cavity_label: A string label specifying the cavity that caused the fault (typically “0”, “1”, …, “8”)

fault_label: A string label specifying the type of fault that occurred (ExampleSet has a list of “known” labels)

cavity_conf: A floating point number in [0, 1] representing the probability/confidence placed in the cavity label

fault_conf: A floating point number in [0, 1] representing the probability/confidence placed in the fault label

label_source: The source of the labels. Typically, either label files or the output of a model

data_dir: A string defining the filesystem path under which data can be found. If None, Config().data_dir is used.

__init__(zone, dt, cavity_label, fault_label, cavity_conf, fault_conf, label_source, data_dir=None)[source]

Construct an instance of the Example class.

Methods

__init__(zone, dt, cavity_label, ...[, data_dir])

Construct an instance of the Example class.

capture_files_on_disk([compressed])

Checks if capture files are currently saved to disk.

convert_waveform_column_names(columns)

Turns waveform PV names (e.g., R1M1WFSGMES) into more uniform names based on cavity and waveform (e.g., 1_GMES)

get_capture_file_list()

Creates a list of capture file names.

get_event_path([compressed])

Generates the expected location of the event waveform data (uncompressed by default).

get_example_type()

Get this Example's ExampleType.

has_matching_labels(example)

Check if the supplied example has the same cavity and fault type label.

is_capture_file(filename)

Validates if filename appears to be a valid capture file.

load_data([verbose])

Top-level method for loading data associated with Example instance.

parse_capture_file(file)

Parses an individual capture file into a Pandas DataFrame object.

parse_event_dir(event_path[, compressed])

Parses the capture files in the BaseModel's event_dir and sets event_df to the appropriate pandas DataFrame.

plot_waveforms([signals, downsample])

Plot the waveform data associated with this example.

remove_event_df_from_disk()

Deletes the 'cached' event waveform data for this event from disk.

save_event_df_to_disk(event_df)

Saves the event waveform DataFrame to disk.

to_string()

This provides a more descriptive string than __str__.

unload_data([verbose])

Top-level method for deleting the Example's data (event_df) from memory.

Attributes

capture_file_regex

A regex for matching capture file filenames

cavity_label

Expert/model provided cavity label

fault_label

Expert/model provided fault label

cavity_conf

Cavity label confidence

fault_conf

Fault label confidence

label_source

Source of labels (which model, file, etc.)

event_datetime

When did the event occur

event_zone

Which zone had the event

e_type

The type of example this is.

capture_file_regex = re.compile('R.*harv\\..*\\.txt')

A regex for matching capture file filenames

Type:

(re.Pattern)
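As a quick illustration of what this pattern accepts, the sketch below applies it to two file names. The file names are hypothetical, and the use of re.match (anchored at the start of the string) is an assumption; is_capture_file may apply the pattern differently.

```python
import re

# Same pattern as the documented class attribute
capture_file_regex = re.compile(r'R.*harv\..*\.txt')


def looks_like_capture_file(filename):
    """Return True if the filename matches the pattern from its first character."""
    # Assumption: re.match is used; the library's own check may differ.
    return capture_file_regex.match(filename) is not None


# Hypothetical file names, for illustration only
matches = looks_like_capture_file("R1M1WFSharv.2020_01_08_101502.5.txt")
rejected = looks_like_capture_file("notes.txt")
```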

capture_files_on_disk(compressed=False)[source]

Checks if capture files are currently saved to disk.

Parameters:

compressed (bool) – Are we checking for compressed file (True), or uncompressed (False, default)?

Return type:

bool

Returns:

True if the compressed file or regular directory was found.

cavity_conf

Cavity label confidence

Type:

(float)

cavity_label

Expert/model provided cavity label

Type:

(str)

static convert_waveform_column_names(columns)[source]

Turns waveform PV names (e.g., R1M1WFSGMES) into more uniform names based on cavity and waveform (e.g., 1_GMES)

Parameters:

columns (List[str]) – List of waveform columns from a single zone, i.e., a list of event waveform names to convert.

Return type:

List[str]

Returns:

The updated/standardized column names sans zone identifier.
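The renaming can be sketched with a regular expression. This is not the library's code: the function name standardize_columns is hypothetical, and the sketch assumes every PV name consists of "R", two zone characters, one cavity digit, the literal "WFS", and the waveform name, which is inferred only from the examples on this page. Passing "Time" through unchanged is also an assumption.

```python
import re

# Assumed layout: "R", two zone characters, one cavity digit, "WFS", waveform name
_PV_PATTERN = re.compile(r"^R..(\d)WFS(\w+)$")


def standardize_columns(columns):
    """Map names such as R1M1WFSGMES to 1_GMES, leaving "Time" unchanged."""
    converted = []
    for name in columns:
        if name == "Time":
            converted.append(name)
            continue
        match = _PV_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected waveform column name: {name}")
        cavity, waveform = match.groups()
        converted.append(f"{cavity}_{waveform}")
    return converted


new_names = standardize_columns(["Time", "R1M1WFSGMES", "R123WFSGMES"])
```

Dropping the zone characters is what lets analysis code treat waveforms from different zones uniformly.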

data_dir

The directory where the waveform data can be found. None if Config is to be referenced.

Type:

(str)

e_type

The type of example this is.

Type:

(ExampleType)

event_datetime

When did the event occur

Type:

(datetime.datetime)

event_df

The DataFrame for holding the actual waveform data

Type:

(pd.DataFrame)

event_zone

Which zone had the event

Type:

(str)

fault_conf

Fault label confidence

Type:

(float)

fault_label

Expert/model provided fault label

Type:

(str)

get_capture_file_list()[source]

Creates a list of capture file names. Typically, this has eight file names.

This replaced a method that read in file contents and returned a dictionary of names to content. The only internal use case was getting the list of file names, so it was replaced with the simpler method.

Return type:

List[str]

Returns:

A list of capture file names for the Example.

get_event_path(compressed=False)[source]

Generates the expected location of the event waveform data (uncompressed by default).

Parameters:

compressed (bool) – Should the returned path be for a compressed (tar.gz) event?

Return type:

str

Returns:

The expected path to the uncompressed directory of waveform data, or to the compressed file if compressed is True.

get_example_type()

Get this Example’s ExampleType.

Return type:

ExampleType

Returns:

The Enum corresponding to the class type

has_matching_labels(example)[source]

Check if the supplied example has the same cavity and fault type label.

Parameters:

example (Example) – An Example object to compare labels against

Return type:

bool

Returns:

True if both cavity and fault labels match. False otherwise.

static is_capture_file(filename)[source]

Validates if filename appears to be a valid capture file.

Parameters:

filename (str) – The name of the file that is to be validated

Returns:

True if the filename appears to be a valid capture file. Otherwise False.

Return type:

bool

label_source

Source of labels (which model, file, etc.)

Type:

(str)

load_data(verbose=False)[source]

Top-level method for loading data associated with Example instance.

Some early waveforms were saved with the Time column essentially inverted. This method checks for and fixes that problem by means of a simple criterion: if Time[0] > -1000, then the Time array is flipped.

Parameters:

verbose (bool) – Should extra information be printed to STDOUT

Return type:

None
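The Time check described for load_data can be sketched on a plain list. This is a rough illustration rather than the library's code: the function name fix_inverted_time and the sample values are hypothetical, and the sketch assumes that "flipped" means reversing the order of the values, which this page does not spell out.

```python
def fix_inverted_time(time_values, threshold=-1000):
    """Return the time values in corrected order.

    Applies the documented check: a first value above the threshold marks
    the column as inverted.
    """
    if time_values and time_values[0] > threshold:
        # Assumption: "flipped" means reversing the order of the values
        return list(reversed(time_values))
    return list(time_values)


inverted = [100.0, -500.0, -1500.0]  # hypothetical sample values
normal = [-1500.0, -500.0, 100.0]
```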

static parse_capture_file(file)[source]

Parses an individual capture file into a Pandas DataFrame object.

Reads all data in as float64 because a column containing only integers (e.g., all zeroes) would otherwise default to an integer dtype.

Parameters:

file (file) – A file-like object, or a string giving the filename

Return type:

DataFrame

Returns:

A pandas DataFrame containing the data from the specified capture file
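The dtype concern can be seen in a small pandas sketch. The file contents below are hypothetical and the tab delimiter is an assumption; the actual capture file format and the arguments parse_capture_file passes to pandas are not given on this page.

```python
import io

import pandas as pd

# Hypothetical capture-file contents; the tab delimiter is an assumption
raw = "Time\tR1M1WFSGMES\tR1M1WFSCRFP\n-1.0\t0\t1.5\n0.0\t0\t2.5\n"

# Without an explicit dtype, the all-zero column is inferred as an integer type
inferred = pd.read_csv(io.StringIO(raw), sep="\t")

# Forcing float64 keeps every column on one numeric type
forced = pd.read_csv(io.StringIO(raw), sep="\t", dtype="float64")
```

Comparing inferred.dtypes with forced.dtypes shows the difference for the all-zero column.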

static parse_event_dir(event_path, compressed=False)[source]

Parses the capture files in the BaseModel’s event_dir and sets event_df to the appropriate pandas DataFrame.

The waveform names are converted from <EPICS_NAME><Waveform> (e.g., R123WFSGMES), to <Cavity_Number>_<Waveform> (e.g., 3_GMES). This allows analysis code to more easily handle waveforms from different zones.

Parameters:
  • event_path (str) – The path to the event directory or compressed tar.gz file

  • compressed (bool) – Is the data a compressed tar.gz file or a regular directory

Raises:

ValueError – if a column name is discovered with an unexpected format

Return type:

None

plot_waveforms(signals=None, downsample=32)[source]

Plot the waveform data associated with this example. Optionally down sample the signals.

Parameters:
  • signals (Optional[List[str]]) – A list of signal names to plot, e.g. ‘1_GMES’. If None, then GMES, DETA2, GASK, CRFP, and PMES will be plotted for all cavities

  • downsample (int) – The down sampling factor, i.e., keep every <downsample>-th point. By default keep every 32nd point

Return type:

None
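The two parameters can be illustrated without any plotting. This is a minimal sketch under stated assumptions: the helper names downsample and default_signals are hypothetical, down sampling is assumed to start from the first point, and eight cavities (1 through 8) are assumed when building the default signal names.

```python
def downsample(values, factor=32):
    """Keep every factor-th point, starting with the first."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return values[::factor]


def default_signals(cavities=range(1, 9)):
    """Build the default signal names for each cavity (eight cavities assumed)."""
    waveforms = ["GMES", "DETA2", "GASK", "CRFP", "PMES"]
    return [f"{cav}_{wf}" for cav in cavities for wf in waveforms]


kept = downsample(list(range(100)))  # keeps indices 0, 32, 64 and 96
signals = default_signals()
```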

remove_event_df_from_disk()[source]

Deletes the ‘cached’ event waveform data for this event from disk. Both compressed and uncompressed data.

Return type:

None

save_event_df_to_disk(event_df)[source]

Saves the event waveform DataFrame to disk. Can provide faster access to ‘raw’ data later.

If capture files already exist, it won’t try to overwrite them. Does nothing if event_path is None. Note that every capture file will end up with the same timestamp as self.event_datetime.

Parameters:

event_df (DataFrame) – The DataFrame for which we should create a fault event directory of capture files.

Return type:

None

to_string()[source]

This provides a more descriptive string than __str__.

Return type:

str

Returns:

A string representation of the example including zone, time, label info, and label source.

unload_data(verbose=False)[source]

Top-level method for deleting the Example's data (event_df) from memory.

Parameters:

verbose (bool) – Should extra information be printed to STDOUT

Return type:

None