Tutorial
Before following this tutorial, please have a look at the instructions for supported file types and formatting. This tutorial assumes that you are working in the root directory where you cloned the repository, i.e., the directory in which git created the AIMCEG folder.
Importing the event generator
from AIMCEG.ThreeParticlesEvent import EventGenerator
Building the generator
generator = EventGenerator()
This will build the generator that we have trained and provided with the toolkit. It is trained on Proton, Pi_plus, X events. Now you can call the generate method to generate events.
generator.generate(100000)
- It will generate 100,000 events and save them in the current directory as a CSV file named AI_Events.csv.
- If you are using a Jupyter notebook, it will also plot their distributions as histograms and compare them with the data in the provided datafile.
- If you want to save to a different directory, pass the path argument to the generate function.
- Similarly, you can ask the generate function to save in a different file format (currently supported formats are csv, json, and root), as shown below:
generator.generate(100000, path="path_to_directory", fileType='root')
These generated events will represent the distribution that we trained on.
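If you are working outside a Jupyter notebook, you can inspect the generated file yourself. Here is a minimal sketch using pandas and matplotlib (assumed to be installed); the column selection is illustrative, so check the header of AI_Events.csv for the actual feature names:
import pandas as pd
import matplotlib.pyplot as plt

events = pd.read_csv("AI_Events.csv")
print(events.columns)              # list the generated feature columns
events.iloc[:, 0].hist(bins=100)   # histogram of the first feature (illustrative choice)
plt.show()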
Training the generator
Let’s start from scratch and build a generator to train on a different dataset.
generator = EventGenerator(filePath="eventData.csv", ignoreColumns=['M2pi', 'FIELD15'], computeParticle="pim", mass={"p":0.93827, "pip":0.1395, "pim": 0.1395})
In order to train your own generator, you need to provide the following arguments:
- filePath: str, required
  Path to the data file.
- ignoreColumns: list of str, optional (default None)
  Column names from the data file to be ignored.
- ignoreParticles: list of str, optional
  Names of the particles from the datafile that should be ignored. For instance, your datafile may contain features of many particles, and you may need to ignore the features of those additional particles. The generator will ignore all the features of the particles in this list.
- computeParticle: str, optional (default: last particle in the list of particles)
  In order to follow the law of conservation of momentum, the generator always generates one particle fewer than the actual number of particles in the event, then computes the features of that remaining particle from the others. Use this argument to specify the name of the particle that is excluded from generator training.
- headNode: str, optional (default: first tree in the root file)
  Name of the head node of the tree containing the data, if a root file is used as input.
- load_weights: bool, optional (default True)
  If set to True, initializes the model with pre-trained weights, enabling transfer learning. Set to False to initialize the model with random weights.
- load_weights_from: str, optional
  Path to the directory containing pre-trained models to load. By default, unless load_weights is set to False, the trained generator we provide will be loaded.
- mass: dict, required
  Masses of the particles, required in order to compute the energy values. The keys of the dictionary should match the names of the particles in the datafile.
- noiseDistribution: str, optional
  Type of noise fed to the generator. By default it is drawn from a normal distribution, but you can also use "uniform".
- lr_scheduler: tf.keras.optimizers.schedules.LearningRateSchedule object, optional
  Learning rate scheduler. If not provided, the training will use the Adam optimizer with a fixed learning rate. Learning rate schedulers provided by TF/Keras can be used; one example is shown below.
scheduler = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=1e-4, decay_steps=10000, decay_rate=0.9)
generator = EventGenerator(filePath="eventData.csv", ignoreColumns=['M2pi', 'FIELD15'], computeParticle="pim", mass={"p":0.93827, "pip":0.1395, "pim": 0.1395}, lr_scheduler=scheduler)
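Similarly, if you prefer to start from randomly initialized weights rather than the provided pre-trained ones, you can disable weight loading (all arguments as documented above):
generator = EventGenerator(filePath="eventData.csv", ignoreColumns=['M2pi', 'FIELD15'], computeParticle="pim", mass={"p":0.93827, "pip":0.1395, "pim": 0.1395}, load_weights=False)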
After you have successfully built the generator, it will print information about the data for you to validate. An example output is shown below:
Training on:
Particles: ['p', 'pip']
Particles' features: ['x', 'y', 'z']
Other features: ['gamma']
You can inspect the data it parsed with
print(generator.data)
print(generator.trainingFeaturesList)
to validate it.
Now, it’s time to start the training!
generator.train()
This will start the training with default parameters, but you can pass the following arguments to change them (an example call follows the list):
- epochs: int, optional (default 100,000)
  Number of training iterations.
- batch_size: int, optional (default 512)
  Number of samples to include in each training batch.
- sample_interval: int, optional (default 1000)
  Period, in number of iterations, after which plots are generated to visualize the training.
- path: str, optional (default: current directory)
  Path to the directory where the trained models will be saved.
- verbose: int, optional (default 1)
  If set to 0, no intermediate training steps will be printed.
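For example, a shorter run that saves the trained models into a custom directory could look like this (all arguments as documented above; the directory name is just an example):
generator.train(epochs=50000, batch_size=256, sample_interval=2000, path="trained_models", verbose=1)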
Once the training is finished, the models will be saved in the current directory (or the directory you specified) in two folders named generator and discriminator.
You can generate new events by calling the generate function as above. Now they will represent the distribution of the data you trained on.
(Beta) Uncertainty Quantification
You can quantify the uncertainties in the trained generator by calling the function below:
generator.quantifyUncertainty()
It will show uncertainty plots for the first three features in your trainingFeaturesList. This is a beta function, still under development.
Lambda Layer
The generator uses physics-informed machine learning at some level by utilizing a lambda layer. It always learns to generate the features of all the particles but one; that particle's features are computed from the features of the other particles so that the law of conservation of momentum is not violated. Which particle is computed by the lambda layer can be set by passing the computeParticle argument while building the generator.
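As a conceptual illustration (not the toolkit's exact implementation), momentum conservation fixes the remaining particle's three-momentum once the others are generated, and its energy then follows from the mass-shell relation. All numerical values below are illustrative:
import numpy as np

# Three-momenta of the generated particles (illustrative values)
generated_p = np.array([[0.10, -0.20, 1.50],    # p
                        [-0.30, 0.10, 0.80]])   # pip
total_p = np.array([0.0, 0.0, 3.0])  # total event three-momentum (e.g. beam + target)
m_pim = 0.1395                       # mass of the computed particle

# Conservation of momentum fixes the remaining particle's momentum
computed_p = total_p - generated_p.sum(axis=0)
# Its energy follows from E^2 = |p|^2 + m^2
computed_E = np.sqrt(np.dot(computed_p, computed_p) + m_pim**2)
print(computed_p, computed_E)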
(Beta) Custom Lambda Layer
You may want to use your own version of the lambda layer to do some postprocessing on the generated features before they are fed to the discriminator during training. To implement your own version, create a generator class that inherits from our EventGenerator class and override the function named generatorLambdaLayer, as shown below:
class MyGenerator(EventGenerator):
    """Event generator with a custom lambda layer."""

    def __init__(self, filePath, ignoreColumns=[], computeParticle="", mass={}):
        # Static elements passed to the lambda layer; must be set
        # before the superclass constructor is called.
        self.lambdaParams = [4, 5, 4, 6]
        # Optional learning rate scheduler
        self.lr_scheduler = tf.keras.optimizers.schedules.ExponentialDecay(initial_learning_rate=1e-4,
                                                                           decay_steps=10000,
                                                                           decay_rate=0.9)
        super().__init__(filePath, ignoreColumns=ignoreColumns, computeParticle=computeParticle, mass=mass)

    def generatorLambdaLayer(self, x, params):
        # <your lambda layer definition>
        ...
You can pass static elements (for instance a mean and a standard deviation) to your lambda layer if needed by setting the self.lambdaParams variable, as shown above. Make sure to initialize this variable before calling the constructor of the super class, because it is used there when building the model. You can access this variable in the lambda layer definition under the name params, as shown above.
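For instance, a hypothetical override could interpret params as a per-feature mean and standard deviation and denormalize the generated features before they reach the discriminator (this interpretation of params is purely illustrative):
class MyGenerator(EventGenerator):
    def generatorLambdaLayer(self, x, params):
        # Illustrative only: treat params as [mean, std] and
        # rescale the generated features accordingly.
        mean, std = params
        return x * std + mean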