.. rfwtools documentation master file, created by
   sphinx-quickstart on Mon Feb 22 16:23:19 2021.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to rfwtools's documentation!
====================================

This package aims to provide a standardized and easy way to work with the C100's harvester RF fault waveform data.

GitHub Page: https://github.com/JeffersonLab/rfwtools

Contents:

.. autosummary::
   :toctree: _autosummary
   :template: custom-module-template.rst
   :recursive:

   rfwtools

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`

Usage Examples:
-----------------

Here are a few different workflows supported by the package.

Initial Setup
~~~~~~~~~~~~~~~~~~~

Start by saving this data in my-sample-labels.txt in the Config().label_dir directory (defaults to ./data/labels/).
**These fields must be tab separated. The documentation system insists on converting the tabs to spaces, so replace
the runs of spaces between fields with single tabs if you try this example on your own.** ::

    zone    cavity  cav#    fault                 time
    1L25    4       44      Microphonics          2020/03/10 01:08:41
    2L24    5       77      Controls Fault        2020/03/10 01:42:03
    1L25    5       45      Microphonics          2020/03/10 02:50:07
    2L26    8       96      E_Quench              2020/03/10 02:58:13
    1L25    5       45      Microphonics          2020/03/10 04:55:21
    1L22    4       20      Quench_3ms            2020/03/10 05:06:13
    1L25    5       45      Microphonics          2020/03/10 07:35:32
    2L22    0       57      Multi Cav turn off    2020/03/10 07:59:49
    2L23    0       65      Multi Cav turn off    2020/03/10 07:59:56
    2L24    0       73      Multi Cav turn off    2020/03/10 08:00:03

You may also need to create a configuration file or update the configuration in code. Place a file called
rfwtools.cfg in your current directory with the following information. Other options are available. ::

    data_dir: /path/to/parent_dir/of/zone_dirs
    label_dir: /path/to/dir/containing/label_files
    output_dir: /path/to/where/save/files/live

Alternatively, do this in code: ::

    from rfwtools.config import Config

    Config().data_dir = "/path/to/parent_dir/of/zone_dirs"
    Config().label_dir = "/path/to/dir/containing/label_files"
    Config().output_dir = "/path/to/where/save/files/live"

Workflow Using DataSet and "regular" Examples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we use the tiny my-sample-labels.txt file set up above. Much of the workflow goes through the DataSet object.
Since the Example class is the default for the DataSet, you don't need to specify much. ::

    from rfwtools.data_set import DataSet
    from rfwtools.extractor.autoregressive import autoregressive_extractor

    # Create a DataSet. For demo purposes, I would make a small label file and run through it. Processing all of
    # our data can take hours/days.
    ds = DataSet(label_files=['my-sample-labels.txt'])

    # This will process the label files you have and create an ExampleSet under ds.example_set
    ds.produce_example_set()

    # Save a CSV of the examples.
    ds.save_example_set_csv("my_example_set.csv")

    # Show data from label sources, color by fault_label
    ds.example_set.display_frequency_barplot(x='label_source', color_by="fault_label")

    # Show heatmaps for 1L22-1L26
    ds.example_set.display_zone_label_heatmap(zones=['1L22', '1L23', '1L24', '1L25', '1L26'])

    # Generate autoregressive features for this data set. This can take a while - e.g. a few seconds per example.
    ds.produce_feature_set(autoregressive_extractor)

    # Save the feature_set to a CSV
    ds.save_feature_set_csv("my_feature_set.csv")

    # Do dimensionality reduction
    ds.feature_set.do_pca_reduction(n_components=10)

    # Plot out some different aspects
    # Color by fault, marker style by cavity
    ds.feature_set.display_2d_scatterplot(hue="fault_label", style="cavity_label")

    # Color by zone, marker style by cavity, only microphonics faults
    ds.feature_set.display_2d_scatterplot(hue="zone", style="cavity_label", query="fault_label == 'Microphonics'")

Workflow Using DataSet and WindowedExamples
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is very similar to the above, except that now we are using WindowedExamples and everything that comes with them.
Please note that this is one of two ways to get "windowed" data. The other is to use the
rfwtools.extractor.window_extractor method (see rfwtools.extractor.windowing for details). ::

    from rfwtools.data_set import DataSet
    from rfwtools.extractor.autoregressive import autoregressive_extractor
    from rfwtools.example import ExampleType
    from rfwtools.example_validator import WindowedExampleValidator

    # This tells the DataSet that you will want to work with WindowedExamples
    e_type = ExampleType.WINDOWED_EXAMPLE

    # These parameters will be passed to the Example objects upon construction, i.e., all examples will have the
    # same window. Here we assume 0.2ms sample steps, and we want windows of 100ms, so 100*(1/0.2) = 500.
    e_kw = {"start": -1536, "n_samples": 500}

    # The WindowedExample class works slightly differently, so it needs a different validator. This makes sure that
    # each example has all of the characteristics we want (sample step size, number of capture files, etc.).
    ev = WindowedExampleValidator()

    # Create a DataSet. For demo purposes, I would make a small label file and run through it. Processing all of
    # our data can take hours/days.
    ds = DataSet(label_files=['my-sample-labels.txt'], e_type=e_type, example_validator=ev, example_kwargs=e_kw)

    # This will process the label files you have and create an ExampleSet under ds.example_set
    ds.produce_example_set()

    # From here on it's the same ...

Workflow Without Using a DataSet:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There may be times when using a DataSet is cumbersome. A DataSet is really useful for generating an ExampleSet and/or
FeatureSet, but there is no need to use one if you already have saved files ready to load.

Here we load a file and add the day of the week to the ExampleSet: ::

    from rfwtools.example_set import ExampleSet

    es = ExampleSet()
    es.load_csv("my_example_set.csv")
    df = es.get_example_df()
    df['my_feature'] = df.dtime.dt.day_name()
    es.update_example_set(df)

Here we determine bypassed cavity information for a FeatureSet: ::

    from rfwtools.feature_set import FeatureSet
    from rfwtools.example import Example
    import pandas as pd

    # This method determines if a cavity was producing gradient above a threshold. If not, it is considered bypassed.
    def bypassed_cavity_extractor(example: Example, threshold: float = 0.5) -> pd.DataFrame:
        example.load_data()
        df = example.event_df
        example.unload_data()

        # Start with a single row that assumes no cavity is bypassed
        out = pd.DataFrame(
            {'has_bypassed': [False],
             'num_bypassed': [0],
             'c1_bypassed': [False],
             'c2_bypassed': [False],
             'c3_bypassed': [False],
             'c4_bypassed': [False],
             'c5_bypassed': [False],
             'c6_bypassed': [False],
             'c7_bypassed': [False],
             'c8_bypassed': [False]
             })

        # Flag each cavity whose maximum GMES value never reaches the threshold
        for cav in range(1, 9):
            if df[f"{cav}_GMES"].max() < threshold:
                out['has_bypassed'] = True
                out['num_bypassed'] += 1
                out[f"c{cav}_bypassed"] = True

        return out

    # Load up the FeatureSet
    fs = FeatureSet()
    fs.load_csv("my_feature_set.csv")

    # Add the bypassed column data to the DataFrame
    df = fs.get_example_df()
    bypassed_df = pd.concat(df['example'].apply(bypassed_cavity_extractor).tolist(), ignore_index=True)
    df = pd.concat([df, bypassed_df], axis=1)

    # Update the FeatureSet
    new_cols = ['has_bypassed', 'num_bypassed', 'c1_bypassed', 'c2_bypassed', 'c3_bypassed', 'c4_bypassed',
                'c5_bypassed', 'c6_bypassed', 'c7_bypassed', 'c8_bypassed']
    m_cols = fs.metadata_columns + new_cols
    fs.update_example_set(df, metadata_columns=m_cols)

    fs.do_pca_reduction()
    fs.display_2d_scatterplot(style='zone', hue='num_bypassed')
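
Generating the Tab-Separated Label File
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Initial Setup section notes that the label file must be tab separated, which is hard to get right when copying from rendered documentation. Here is a minimal sketch that writes the same ten sample rows with real tab characters, using only the Python standard library. The ``write_label_file`` helper and the ``HEADER`` and ``ROWS`` names are made up for this example and are not part of rfwtools. The column layout is simply copied from the sample in Initial Setup.

```python
import csv
from pathlib import Path

# Column names and rows copied from the sample in "Initial Setup".
# HEADER, ROWS, and write_label_file are hypothetical names, not rfwtools API.
HEADER = ["zone", "cavity", "cav#", "fault", "time"]
ROWS = [
    ["1L25", "4", "44", "Microphonics", "2020/03/10 01:08:41"],
    ["2L24", "5", "77", "Controls Fault", "2020/03/10 01:42:03"],
    ["1L25", "5", "45", "Microphonics", "2020/03/10 02:50:07"],
    ["2L26", "8", "96", "E_Quench", "2020/03/10 02:58:13"],
    ["1L25", "5", "45", "Microphonics", "2020/03/10 04:55:21"],
    ["1L22", "4", "20", "Quench_3ms", "2020/03/10 05:06:13"],
    ["1L25", "5", "45", "Microphonics", "2020/03/10 07:35:32"],
    ["2L22", "0", "57", "Multi Cav turn off", "2020/03/10 07:59:49"],
    ["2L23", "0", "65", "Multi Cav turn off", "2020/03/10 07:59:56"],
    ["2L24", "0", "73", "Multi Cav turn off", "2020/03/10 08:00:03"],
]


def write_label_file(path, header=HEADER, rows=ROWS):
    """Write the header and rows as a tab-separated file and return its Path."""
    path = Path(path)
    # Create the label directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        # A tab delimiter keeps multi-word faults such as "Controls Fault" in one field.
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

For example, ``write_label_file("data/labels/my-sample-labels.txt")`` would place the file in the default label directory mentioned in Initial Setup (./data/labels/), assuming you have not changed Config().label_dir.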