Welcome to rfwtools’s documentation!
This package aims to provide standardized, easy access to the RF fault waveform data collected by the C100 harvester.
Github Page: https://github.com/JeffersonLab/rfwtools
Usage Examples:
Here are a couple of different workflows supported by the package.
Initial Setup
Start by saving this data as my-sample-labels.txt in the Config().label_dir directory (defaults to ./data/labels/). Note: these fields must be tab-separated, but the documentation system converts tabs to spaces, so restore the tabs if you copy this example.
zone cavity cav# fault time
1L25 4 44 Microphonics 2020/03/10 01:08:41
2L24 5 77 Controls Fault 2020/03/10 01:42:03
1L25 5 45 Microphonics 2020/03/10 02:50:07
2L26 8 96 E_Quench 2020/03/10 02:58:13
1L25 5 45 Microphonics 2020/03/10 04:55:21
1L22 4 20 Quench_3ms 2020/03/10 05:06:13
1L25 5 45 Microphonics 2020/03/10 07:35:32
2L22 0 57 Multi Cav turn off 2020/03/10 07:59:49
2L23 0 65 Multi Cav turn off 2020/03/10 07:59:56
2L24 0 73 Multi Cav turn off 2020/03/10 08:00:03
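Since the rendered documentation replaces tabs with spaces, one way to guarantee correct delimiters is to write the label file from Python. This is a minimal sketch using a few of the rows above; the path follows the default label_dir mentioned earlier:

```python
import os

# A few rows copied from the sample above: zone, cavity, cav#, fault, time
rows = [
    ("1L25", "4", "44", "Microphonics", "2020/03/10 01:08:41"),
    ("2L24", "5", "77", "Controls Fault", "2020/03/10 01:42:03"),
    ("2L26", "8", "96", "E_Quench", "2020/03/10 02:58:13"),
]

os.makedirs("data/labels", exist_ok=True)
with open("data/labels/my-sample-labels.txt", "w") as f:
    # Tab-separated header and rows, matching the format shown above
    f.write("zone\tcavity\tcav#\tfault\ttime\n")
    for row in rows:
        f.write("\t".join(row) + "\n")
```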
You may also need to create a configuration file or update the configuration in code. Place a file called rfwtools.cfg in your current directory with the following information (other options are available):
data_dir: /path/to/parent_dir/of/zone_dirs
label_dir: /path/to/dir/containing/label_files
output_dir: /path/to/where/save/files/live
Alternatively, do this in code:
from rfwtools.config import Config
Config().data_dir = "/path/to/parent_dir/of/zone_dirs"
Config().label_dir = "/path/to/dir/containing/label_files"
Config().output_dir = "/path/to/where/save/files/live"
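If you prefer to generate the configuration file rather than edit it by hand, a plain file write is enough. The paths here are the same placeholders as above; substitute your own:

```python
# Write an rfwtools.cfg in the current directory with the key: value lines shown above
cfg_lines = [
    "data_dir: /path/to/parent_dir/of/zone_dirs",
    "label_dir: /path/to/dir/containing/label_files",
    "output_dir: /path/to/where/save/files/live",
]
with open("rfwtools.cfg", "w") as f:
    f.write("\n".join(cfg_lines) + "\n")
```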
Workflow Using DataSet and “regular” Examples
Here we use the tiny my-sample-labels.txt file set up above. Much of the workflow goes through the DataSet object. Since the Example class is the default for a DataSet, you don't need to specify much.
from rfwtools.data_set import DataSet
from rfwtools.extractor.autoregressive import autoregressive_extractor
# Create a DataSet. For demo purposes, make a small label file and run through it; processing
# all of our data can take hours or days.
ds = DataSet(label_files=['my-sample-labels.txt'])
# This will process the label files you have and create an ExampleSet under ds.example_set
ds.produce_example_set()
# Save a CSV of the examples.
ds.save_example_set_csv("my_example_set.csv")
# Show data from label sources, color by fault_label
ds.example_set.display_frequency_barplot(x='label_source', color_by="fault_label")
# Show heatmaps for 1L22-1L26
ds.example_set.display_zone_label_heatmap(zones=['1L22', '1L23', '1L24', '1L25', '1L26'])
# Generate autoregressive features for this data set. This can take a while - e.g. a few seconds per example.
ds.produce_feature_set(autoregressive_extractor)
# Save the feature_set to a CSV
ds.save_feature_set_csv("my_feature_set.csv")
# Do dimensionality reduction
ds.feature_set.do_pca_reduction(n_components=10)
# Plot out some different aspects
# Color by fault, marker style by cavity
ds.feature_set.display_2d_scatterplot(hue="fault_label", style="cavity_label")
# Color by zone, marker style by cavity, only microphonics faults
ds.feature_set.display_2d_scatterplot(hue="zone", style="cavity_label", query="fault_label == 'Microphonics'")
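Under the hood, do_pca_reduction performs a standard principal component analysis of the feature columns. Conceptually it is similar to this numpy-only sketch (an illustration of the technique, not the library's implementation; the data here is random stand-in for real features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))          # 50 examples, 8 features

# Center the data, then use the SVD to get principal components
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

n_components = 2
scores = Xc @ Vt[:n_components].T     # projected coordinates, suitable for a 2D scatterplot
print(scores.shape)                   # (50, 2)
```

Because singular values are sorted in decreasing order, the first projected column captures the most variance, which is why the 2D scatterplots above use the first two components.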
Workflow Using DataSet and WindowedExamples
This is very similar to the above, except that now we are using WindowedExamples and everything that comes with it. Please note that this is one of two ways to get “windowed” data. The other is to use the rfwtools.extractor.window_extractor method (see rfwtools.extractor.windowing for details).
from rfwtools.data_set import DataSet
from rfwtools.extractor.autoregressive import autoregressive_extractor
from rfwtools.example import ExampleType
from rfwtools.example_validator import WindowedExampleValidator
# This tells the DataSet that you will want to work with WindowedExamples
e_type = ExampleType.WINDOWED_EXAMPLE
# These parameters will be passed to the Example objects upon construction, i.e., all examples will have the same
# window. Here we assume 0.2 ms sample steps and want 100 ms windows, so 100 * (1/0.2) = 500 samples.
e_kw = {"start": -1536, "n_samples": 500}
# The WindowedExample class works slightly differently, so it needs a different validator. This makes sure that
# each example has all of the characteristics we want (sample step size, number of capture files, etc.).
ev = WindowedExampleValidator()
# Create a DataSet. For demo purposes, make a small label file and run through it; processing
# all of our data can take hours or days.
ds = DataSet(label_files=['my-sample-labels.txt'], e_type=e_type, example_validator=ev, example_kwargs=e_kw)
# This will process the label files you have and create an ExampleSet under ds.example_set
ds.produce_example_set()
# From here on it's the same
...
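The n_samples arithmetic from the comments above can be captured in a small helper. This is a sketch, not part of rfwtools; the 0.2 ms sample step is the assumption stated in the example:

```python
def window_samples(window_ms: float, step_ms: float = 0.2) -> int:
    """Number of waveform samples needed to cover a window of window_ms milliseconds."""
    return round(window_ms / step_ms)

print(window_samples(100))   # 500, matching n_samples above
print(window_samples(50))    # 250
```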
Workflow Without Using a DataSet
There may be times when using a DataSet is cumbersome. A DataSet is really useful for generating an ExampleSet and/or FeatureSet, but there is no need to use one if you already have saved files ready to load.
Here we load a file and add a day of week to the ExampleSet:
from rfwtools.example_set import ExampleSet
es = ExampleSet()
es.load_csv("my_example_set.csv")
df = es.get_example_df()
df['my_feature'] = df.dtime.dt.day_name()
es.update_example_set(df)
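The dt.day_name() call above is plain pandas; on a synthetic dtime column it behaves like this:

```python
import pandas as pd

# Timestamps in the same format as the label file above
df = pd.DataFrame({"dtime": pd.to_datetime(["2020/03/10 01:08:41", "2020/03/14 12:00:00"])})
df["my_feature"] = df.dtime.dt.day_name()
print(df["my_feature"].tolist())   # ['Tuesday', 'Saturday']
```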
Here we determine bypassed cavity information for a FeatureSet:
from rfwtools.feature_set import FeatureSet
from rfwtools.example import Example
import pandas as pd
# This method determines if a cavity was producing gradient above a threshold. If not, it is considered bypassed.
def bypassed_cavity_extractor(example: Example, threshold: float = 0.5) -> pd.DataFrame:
    example.load_data()
    df = example.event_df
    example.unload_data()

    out = pd.DataFrame(
        {'has_bypassed': [False], 'num_bypassed': [0], 'c1_bypassed': [False], 'c2_bypassed': [False],
         'c3_bypassed': [False], 'c4_bypassed': [False], 'c5_bypassed': [False], 'c6_bypassed': [False],
         'c7_bypassed': [False], 'c8_bypassed': [False]})

    for cav in range(1, 9):
        if df[f"{cav}_GMES"].max() < threshold:
            out.has_bypassed = True
            out.num_bypassed += 1
            out[f"c{cav}_bypassed"] = True

    return out
# Load up the FeatureSet
fs = FeatureSet()
fs.load_csv("my_feature_set.csv")
# Add the bypassed column data to the DataFrame
df = fs.get_example_df()
bypassed_df = pd.concat(df['example'].apply(bypassed_cavity_extractor).values, ignore_index=True)
df = pd.concat([df, bypassed_df], axis=1)
# Update the FeatureSet
new_cols = ['has_bypassed', 'num_bypassed', 'c1_bypassed', 'c2_bypassed', 'c3_bypassed', 'c4_bypassed',
'c5_bypassed', 'c6_bypassed', 'c7_bypassed', 'c8_bypassed']
m_cols = fs.metadata_columns + new_cols
fs.update_example_set(df, metadata_columns=m_cols)
fs.do_pca_reduction()
fs.display_2d_scatterplot(style='zone', hue='num_bypassed')
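To sanity-check the bypass-detection loop without loading real waveform data, you can run the same logic against a synthetic event_df. The column names follow the {cav}_GMES pattern used above; all values here are made up:

```python
import pandas as pd

# Fake gradient data for cavities 1-8; cavity 3 sits near zero, i.e., bypassed
fake_event_df = pd.DataFrame({f"{cav}_GMES": [8.0, 8.1, 7.9] for cav in range(1, 9)})
fake_event_df["3_GMES"] = [0.0, 0.01, 0.02]

threshold = 0.5
bypassed = [cav for cav in range(1, 9) if fake_event_df[f"{cav}_GMES"].max() < threshold]
print(bypassed)   # [3]
```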