CSV Data

We provide the relevant parts of the *.edm4eic.root data converted to CSV format.

The CSV files are located in the csv folder of the campaign directory. See DATA ACCESS
File names start with the same base name as the source edm4eic.root file, so related files are easy to match. E.g. k_lambda_5x41_5000evt_001.*
We also provide zipped versions (.csv.zip). Pandas can read such files out of the box
The CSV and .csv.zip files are accessed in the same way. See the DATA ACCESS page
The table name is embedded in the extension before .csv, e.g. *.mcdis.csv, *.mcpart_lambda.csv
Column names are listed in the first line of each file (standard for CSV)
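
As a minimal sketch of the point about zipped files, the snippet below writes a tiny table to disk, zips it, and reads both versions with pandas. The file name example_001.mcdis.csv and its columns are hypothetical stand-ins for a real campaign file; the only behavior relied on is that pd.read_csv infers zip compression from the .zip extension when the archive holds a single file.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical miniature table standing in for a real *.mcdis.csv file
csv_text = "evt,q2,xbj\n0,1.5,0.01\n1,2.7,0.02\n"

tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "example_001.mcdis.csv")
zip_path = csv_path + ".zip"

with open(csv_path, "w") as f:
    f.write(csv_text)

# Build a .csv.zip archive that holds exactly one CSV file
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(csv_path, arcname=os.path.basename(csv_path))

# The same call works for both; compression is inferred from the extension
df_plain = pd.read_csv(csv_path)
df_zipped = pd.read_csv(zip_path)

print(df_plain.columns.tolist())
print(df_plain.equals(df_zipped))
```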

Example file names:

bash

# Original file
k_lambda_5x41_5000evt_001.edm4eic.root

# Related CSV-s
k_lambda_5x41_5000evt_001.mcdis.csv
k_lambda_5x41_5000evt_001.mcpart_lambda.csv

# Zi

All scripts that perform the EDM4HEP to CSV conversion are located in the csv_convert directory.
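
Because the table name is part of the file name, related files can be grouped programmatically. The helper names below (split_csv_name, source_root_name) are hypothetical and not part of csv_convert; this is only a sketch of the naming convention described above, assuming names of the form base.table.csv with an optional .zip suffix.

```python
from pathlib import Path


def split_csv_name(file_name):
    """Return (base_name, table_name) for names like base.table.csv[.zip]."""
    name = Path(file_name).name
    if name.endswith(".zip"):
        name = name[: -len(".zip")]
    if not name.endswith(".csv"):
        raise ValueError(f"Not a CSV file name: {file_name}")
    stem = name[: -len(".csv")]
    # The table name is the last dot-separated part before .csv
    base, sep, table = stem.rpartition(".")
    if not sep:
        raise ValueError(f"No table name found in: {file_name}")
    return base, table


def source_root_name(file_name):
    """Name of the edm4eic.root file a CSV file was converted from."""
    base, _ = split_csv_name(file_name)
    return base + ".edm4eic.root"


print(split_csv_name("k_lambda_5x41_5000evt_001.mcpart_lambda.csv"))
print(source_root_name("k_lambda_5x41_5000evt_001.mcdis.csv.zip"))
```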

Table definitions

To analyze the data, we can work with multiple CSV files that contain related information. The files are linked relationally. The first column of a CSV table is always a primary key (e.g. event number), or the first columns together form a composite key (e.g. event number + particle index). For example, all tables belonging to k_lambda_5x41_5000evt_001.* refer to the same events.

These CSV files are essentially database tables, and understanding this relationship helps us organize and analyze data more effectively.

With Python and pandas it is easy to combine them into joined tables, e.g. MC vs. reconstructed events.
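
A minimal sketch of such a join is shown here. The three DataFrames are hypothetical miniatures with only a few of the documented columns; in practice they would come from pd.read_csv on files that share the same base name.

```python
import pandas as pd

# Hypothetical miniature tables; real ones would be loaded with pd.read_csv(...)
mc_dis = pd.DataFrame({"evt": [0, 1, 2], "q2": [1.5, 2.7, 4.1], "xbj": [0.01, 0.02, 0.03]})
reco_dis = pd.DataFrame({"evt": [0, 1, 2], "electron_q2": [1.4, 2.9, 4.0], "jb_q2": [1.7, 2.2, 4.6]})
mcpart_lambda = pd.DataFrame({"evt": [0, 2], "lam_pz": [38.2, 35.9], "lam_nd": [2, 0]})

# Inner join on the primary key: one row per event present in both tables
mc_vs_reco = mc_dis.merge(reco_dis, on="evt", how="inner")

# Left join keeps every event, with NaN where the right table has no row
with_lambda = mc_vs_reco.merge(mcpart_lambda, on="evt", how="left")

print(with_lambda)
```

The choice between how="inner" and how="left" decides whether events missing from one table are dropped or kept with null values.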

mc_dis

Files: *.mc_dis.csv
Conversion script: csv_convert/csv_mc_dis.cxx

True event-level values that come from the event generator. evt is the event ID within the file; the rest of the names correspond to table: mc-variables

Columns:

evt
alphas
mx2
nu
p_rt
pdrest
pperps
pperpz
q2
s_e
s_q
tempvar
tprime
tspectator
twopdotk
twopdotq
w
x_d
xbj
y_d
yplus

reco_dis

Files: *.reco_dis.csv
Conversion script: csv_convert/csv_reco_dis.cxx

Reconstructed (and true MC) event kinematic parameters including the reconstructed scattered electron information, beam particles, Lambda particles, and various t-value calculations.

EICrecon provides several algorithms for calculating the DIS kinematics. We save them all to CSV. E.g. jb_q2 is Q² obtained with the Jacquet-Blondel method and electron_q2 is Q² obtained with the scattered electron method.

DIS Kinematics Columns

Prefixes - EDM4EIC Collection name:

da - "InclusiveKinematicsDA"
esigma - "InclusiveKinematicsESigma"
electron - "InclusiveKinematicsElectron"
jb - "InclusiveKinematicsJB"
ml - "InclusiveKinematicsML"
sigma - "InclusiveKinematicsSigma"
mc - True MC values from event parameters

For each kinematic method (da, esigma, electron, jb, ml, sigma, mc), variables are saved like:

{}_x - Bjorken x
{}_q2 - Q² [GeV²]
{}_y - Inelasticity y
{}_nu - Energy transfer ν [GeV]
{}_w - Invariant mass W [GeV]
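
Since every method uses the same column suffixes, the methods can be compared with the MC truth in a loop. The sketch below computes relative Q² residuals per method; the function name relative_q2_residuals and the numbers in the sample table are hypothetical, and only the prefix_q2 naming pattern is taken from the list above.

```python
import pandas as pd

# Reconstruction method prefixes as listed above (mc is the reference)
METHODS = ["da", "esigma", "electron", "jb", "ml", "sigma"]


def relative_q2_residuals(reco_dis):
    """Return (reco - mc) / mc for Q2, one column per available method."""
    out = pd.DataFrame({"evt": reco_dis["evt"]})
    for method in METHODS:
        col = f"{method}_q2"
        if col in reco_dis.columns:  # skip methods absent from this table
            out[f"{method}_q2_res"] = (reco_dis[col] - reco_dis["mc_q2"]) / reco_dis["mc_q2"]
    return out


# Hypothetical values, only to show the shape of the result
reco_dis = pd.DataFrame({
    "evt": [0, 1],
    "mc_q2": [2.0, 4.0],
    "electron_q2": [2.2, 3.8],
    "jb_q2": [1.0, 5.0],
})
res = relative_q2_residuals(reco_dis)
print(res)
```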

T-value Columns

The script calculates several t-values (momentum transfer squared) using different beam configurations:

mc_true_t - True t-value from MC event parameters (dis_tspectator)
mc_lam_tb_t - t calculated using MC Lambda and true beam proton
mc_lam_exp_t - t calculated using MC Lambda and experimental beam proton
ff_lam_tb_t - t calculated using far-forward reconstructed Lambda and true beam
ff_lam_exp_t - t calculated using far-forward reconstructed Lambda and experimental beam

Important Physics Note:

True beam uses the actual MC beam proton momentum from the simulation
Experimental beam approximates what we would know in a real experiment:
- Detects the beam mode (41, 100, 130, or 275 GeV) from the true momentum
- Applies crossing angles: 25 mrad horizontal, 100 μrad vertical
- This mimics experimental conditions where we don't know the exact beam momentum
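
The steps above can be sketched roughly as follows. This is not the code from the conversion script: the function name experimental_beam, the treatment of the nominal value as the momentum magnitude, the axis assignment, and the signs of the angles are all assumptions made for illustration. Only the list of beam modes and the two angle magnitudes come from the text.

```python
import math

NOMINAL_BEAM_MODES = (41.0, 100.0, 130.0, 275.0)  # GeV, from the list above
PROTON_MASS = 0.938272  # GeV/c^2


def experimental_beam(true_px, true_py, true_pz, angle_x=25e-3, angle_y=100e-6):
    """Return (E, px, py, pz) of an approximate beam proton.

    Assumption: the nominal value is used as |p|, and the angle signs
    follow a convention chosen here, not necessarily the detector's.
    """
    true_p = math.sqrt(true_px**2 + true_py**2 + true_pz**2)
    # Pick the nominal beam mode closest to the true momentum
    p = min(NOMINAL_BEAM_MODES, key=lambda nominal: abs(nominal - true_p))
    # Rotate the nominal beam by the horizontal and vertical crossing angles
    py = p * math.sin(angle_y)
    px = p * math.cos(angle_y) * math.sin(angle_x)
    pz = p * math.cos(angle_y) * math.cos(angle_x)
    energy = math.sqrt(p**2 + PROTON_MASS**2)
    return energy, px, py, pz


energy, px, py, pz = experimental_beam(-2.49, 0.01, 99.7)
print(energy, px, py, pz)
```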

Scattered Electron Columns

For the reconstructed scattered electron (from the Electron method):

elec_id - Particle index in ReconstructedParticles collection
elec_energy - Total energy [GeV]
elec_px - Momentum x-component [GeV/c]
elec_py - Momentum y-component [GeV/c]
elec_pz - Momentum z-component [GeV/c]
elec_ref_x - Reference point x-coordinate
elec_ref_y - Reference point y-coordinate
elec_ref_z - Reference point z-coordinate
elec_pid_goodness - Particle ID quality metric
elec_type - Reconstruction type flag
elec_n_clusters - Number of associated clusters
elec_n_tracks - Number of associated tracks
elec_n_particles - Number of daughter particles
elec_n_particle_ids - Number of particle ID objects

MC Scattered Electron Momentum

mc_elec_px - MC truth scattered electron px [GeV/c]
mc_elec_py - MC truth scattered electron py [GeV/c]
mc_elec_pz - MC truth scattered electron pz [GeV/c]

Lambda Momentum Columns

MC truth Lambda:

mc_lam_px - MC Lambda px [GeV/c]
mc_lam_py - MC Lambda py [GeV/c]
mc_lam_pz - MC Lambda pz [GeV/c]

Far-forward reconstructed Lambda:

ff_lam_px - Far-forward Lambda px [GeV/c]
ff_lam_py - Far-forward Lambda py [GeV/c]
ff_lam_pz - Far-forward Lambda pz [GeV/c]

Beam Particle Momentum Columns

MC beam proton:

mc_beam_prot_px - Beam proton px [GeV/c]
mc_beam_prot_py - Beam proton py [GeV/c]
mc_beam_prot_pz - Beam proton pz [GeV/c]

MC beam electron:

mc_beam_elec_px - Beam electron px [GeV/c]
mc_beam_elec_py - Beam electron py [GeV/c]
mc_beam_elec_pz - Beam electron pz [GeV/c]

evt is the first column = event number.

So the complete column list is:

evt,
da_x,da_q2,da_y,da_nu,da_w,
esigma_x,esigma_q2,esigma_y,esigma_nu,esigma_w,
electron_x,electron_q2,electron_y,electron_nu,electron_w,
jb_x,jb_q2,jb_y,jb_nu,jb_w,
ml_x,ml_q2,ml_y,ml_nu,ml_w,
sigma_x,sigma_q2,sigma_y,sigma_nu,sigma_w,
mc_x,mc_q2,mc_y,mc_nu,mc_w,
mc_true_t,mc_lam_tb_t,mc_lam_exp_t,ff_lam_tb_t,ff_lam_exp_t,
elec_id,elec_energy,elec_px,elec_py,elec_pz,elec_ref_x,elec_ref_y,elec_ref_z,elec_pid_goodness,elec_type,elec_n_clusters,elec_n_tracks,elec_n_particles,elec_n_particle_ids,
mc_elec_px,mc_elec_py,mc_elec_pz,
mc_lam_px,mc_lam_py,mc_lam_pz,
ff_lam_px,ff_lam_py,ff_lam_pz,
mc_beam_prot_px,mc_beam_prot_py,mc_beam_prot_pz,
mc_beam_elec_px,mc_beam_elec_py,mc_beam_elec_pz

Notes:

The electron particle information is only available when the Electron method successfully reconstructs the scattered electron
If particles are not found/reconstructed, their columns will contain null values
T-values are calculated as t = (p1 - p2)² using 4-vectors
The experimental beam approximation is crucial for understanding systematic uncertainties in real experiments
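
As an illustration of the t = (p1 - p2)² note, here is a minimal sketch that builds 4-vectors from the momentum columns and evaluates t with the (+, -, -, -) metric. The helper names and the momenta are hypothetical; the masses are the usual proton and Lambda values in GeV/c², and energies are derived from momentum and mass because the tables store momenta only.

```python
import math

PROTON_MASS = 0.938272  # GeV/c^2
LAMBDA_MASS = 1.115683  # GeV/c^2


def four_vector(px, py, pz, mass):
    """Build (E, px, py, pz) from a 3-momentum and a mass."""
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def t_value(p1, p2):
    """t = (p1 - p2)^2 with metric signature (+, -, -, -)."""
    de, dx, dy, dz = (a - b for a, b in zip(p1, p2))
    return de * de - dx * dx - dy * dy - dz * dz


# Hypothetical momenta in GeV/c, e.g. mc_beam_prot_p* and mc_lam_p*
beam = four_vector(0.0, 0.0, 100.0, PROTON_MASS)
lam = four_vector(0.1, 0.0, 60.0, LAMBDA_MASS)

t = t_value(beam, lam)
print(t)
```

Swapping the true beam 4-vector for an approximated one in the same function gives the difference between the tb and exp variants of a column.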

mcpart_lambda

Files: *.mcpart_lambda.csv
Conversion script: csv_convert/csv_mcpart_lambda.cxx

Full MC particle information for the Lambda decay chain, taken from the MCParticles EDM4EIC table. MCParticles has relations such as daughters and parents. These relations are flattened along the Lambda decay chain. The columns, which cover the possible Lambda decays, are grouped by particle:

Prefixes (each has the same parameters after)

lam - Λ
prot - p (if pπ- decay or nulls)
pimin - π- (if pπ- decay or nulls)
neut - n (if nπ0 decay)
pizero - π0 (if nπ0 decay)
gamone - γ one from π0 decay (if π0 decays)
gamtwo - γ two from π0 decay (if π0 decays)

For each particle prefix, the following columns are saved:

{0}_id - id - particle index in MCParticles table
{0}_pdg - pdg - particle PDG
{0}_gen - gen - Generator Status (1 stable... probably)
{0}_sim - sim - Simulation Status (by Geant4)
{0}_px - px - Momentum
{0}_py - py
{0}_pz - pz
{0}_vx - vx - Origin vertex information
{0}_vy - vy
{0}_vz - vz
{0}_epx - epx - End Point (decay, or out of detector)
{0}_epy - epy
{0}_epz - epz
{0}_time - time - Time of origin
{0}_nd - nd - Number of daughters

So in the end the columns are:

yaml

evt,
lam_id,lam_pdg,lam_gen,lam_sim,lam_px,lam_py,lam_pz,lam_vx,lam_vy,lam_vz,lam_epx,lam_epy,lam_epz,lam_time,lam_nd,
prot_id,prot_pdg,prot_gen,prot_sim,prot_px,prot_py,prot_pz,prot_vx,prot_vy,prot_vz,prot_epx,prot_epy,prot_epz,prot_time,prot_nd,
pimin_id,pimin_pdg,pimin_gen,pimin_sim,pimin_px,pimin_py,pimin_pz,pimin_vx,pimin_vy,pimin_vz,pimin_epx,pimin_epy,pimin_epz,pimin_time,pimin_nd,
neut_id,neut_pdg,neut_gen,neut_sim,neut_px,neut_py,neut_pz,neut_vx,neut_vy,neut_vz,neut_epx,neut_epy,neut_epz,neut_time,neut_nd,
pizero_id,pizero_pdg,pizero_gen,pizero_sim,pizero_px,pizero_py,pizero_pz,pizero_vx,pizero_vy,pizero_vz,pizero_epx,pizero_epy,pizero_epz,pizero_time,pizero_nd,
gamone_id,gamone_pdg,gamone_gen,gamone_sim,gamone_px,gamone_py,gamone_pz,gamone_vx,gamone_vy,gamone_vz,gamone_epx,gamone_epy,gamone_epz,gamone_time,gamone_nd,
gamtwo_id,gamtwo_pdg,gamtwo_gen,gamtwo_sim,gamtwo_px,gamtwo_py,gamtwo_pz,gamtwo_vx,gamtwo_vy,gamtwo_vz,gamtwo_epx,gamtwo_epy,gamtwo_epz,gamtwo_time,gamtwo_nd

Notes:

Particles may not decay. E.g. a Lambda may simply leave the designated detector volume; in this case lam_nd (number of daughters) will be 0 and the columns of the decay products will be null
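
Because absent decay products show up as nulls, the decay channel of each event can be derived from which prefixes are filled. The sketch below assumes nulls are written as empty fields (which pandas reads as NaN) and uses a hypothetical three-row table with only the columns needed; the function decay_channel and its labels are illustrative.

```python
import io

import pandas as pd

# Hypothetical rows; empty fields stand for null values
csv_text = """evt,lam_id,lam_nd,prot_id,pimin_id,neut_id,pizero_id
0,5,2,7,8,,
1,4,2,,,6,7
2,5,0,,,,
"""
df = pd.read_csv(io.StringIO(csv_text))


def decay_channel(row):
    """Label a row by the daughters that are present."""
    if row["lam_nd"] == 0:
        return "not decayed"
    if pd.notna(row["prot_id"]) and pd.notna(row["pimin_id"]):
        return "p pi-"
    if pd.notna(row["neut_id"]) and pd.notna(row["pizero_id"]):
        return "n pi0"
    return "other"


df["channel"] = df.apply(decay_channel, axis=1)
print(df[["evt", "channel"]])
```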

reco_ff_lambdas

Files: *.reco_ff_lambdas_ngamgam.csv
Conversion script: csv_convert/csv_reco_ff_lambda.cxx

Reconstructed Lambda particles and their decay products from the far-forward Zero Degree Calorimeter (ZDC), specifically for the decay channel Λ → n + π⁰ → n + γ + γ. This table uses the ReconstructedFarForwardZDCLambdas collection from EDM4EIC and flattens the decay hierarchy similar to mcpart_lambda.

The columns are grouped by particles in the decay chain:

Prefixes (each has the same parameters after):

lam - Λ (Lambda baryon)
neut - Neutron from Λ decay
gam1 - First γ from π⁰ decay
gam2 - Second γ from π⁰ decay

For each particle prefix, the following columns are saved:

{0}_id - id - particle index in ReconstructedParticles collection
{0}_pdg - pdg - particle PDG code
{0}_charge - charge - electric charge
{0}_energy - energy - total energy [GeV]
{0}_mass - mass - invariant mass [GeV/c²]
{0}_px - px - momentum x-component [GeV/c]
{0}_py - py - momentum y-component [GeV/c]
{0}_pz - pz - momentum z-component [GeV/c]
{0}_ref_x - ref_x - reference point x-coordinate
{0}_ref_y - ref_y - reference point y-coordinate
{0}_ref_z - ref_z - reference point z-coordinate
{0}_pid_goodness - pid_goodness - particle ID quality metric
{0}_type - type - reconstruction type flag
{0}_n_clusters - n_clusters - number of associated clusters
{0}_n_tracks - n_tracks - number of associated tracks
{0}_n_particles - n_particles - number of daughter particles
{0}_n_particle_ids - n_particle_ids - number of particle ID objects
{0}_cov_xx - cov_xx - covariance matrix element
{0}_cov_xy - cov_xy - covariance matrix element
{0}_cov_xz - cov_xz - covariance matrix element
{0}_cov_yy - cov_yy - covariance matrix element
{0}_cov_yz - cov_yz - covariance matrix element
{0}_cov_zz - cov_zz - covariance matrix element
{0}_cov_xt - cov_xt - covariance matrix element
{0}_cov_yt - cov_yt - covariance matrix element
{0}_cov_zt - cov_zt - covariance matrix element
{0}_cov_tt - cov_tt - covariance matrix element

The complete column list is:

yaml

event,
lam_id,lam_pdg,lam_charge,lam_energy,lam_mass,lam_px,lam_py,lam_pz,lam_ref_x,lam_ref_y,lam_ref_z,lam_pid_goodness,lam_type,lam_n_clusters,lam_n_tracks,lam_n_particles,lam_n_particle_ids,lam_cov_xx,lam_cov_xy,lam_cov_xz,lam_cov_yy,lam_cov_yz,lam_cov_zz,lam_cov_xt,lam_cov_yt,lam_cov_zt,lam_cov_tt,
neut_id,neut_pdg,neut_charge,neut_energy,neut_mass,neut_px,neut_py,neut_pz,neut_ref_x,neut_ref_y,neut_ref_z,neut_pid_goodness,neut_type,neut_n_clusters,neut_n_tracks,neut_n_particles,neut_n_particle_ids,neut_cov_xx,neut_cov_xy,neut_cov_xz,neut_cov_yy,neut_cov_yz,neut_cov_zz,neut_cov_xt,neut_cov_yt,neut_cov_zt,neut_cov_tt,
gam1_id,gam1_pdg,gam1_charge,gam1_energy,gam1_mass,gam1_px,gam1_py,gam1_pz,gam1_ref_x,gam1_ref_y,gam1_ref_z,gam1_pid_goodness,gam1_type,gam1_n_clusters,gam1_n_tracks,gam1_n_particles,gam1_n_particle_ids,gam1_cov_xx,gam1_cov_xy,gam1_cov_xz,gam1_cov_yy,gam1_cov_yz,gam1_cov_zz,gam1_cov_xt,gam1_cov_yt,gam1_cov_zt,gam1_cov_tt,
gam2_id,gam2_pdg,gam2_charge,gam2_energy,gam2_mass,gam2_px,gam2_py,gam2_pz,gam2_ref_x,gam2_ref_y,gam2_ref_z,gam2_pid_goodness,gam2_type,gam2_n_clusters,gam2_n_tracks,gam2_n_particles,gam2_n_particle_ids,gam2_cov_xx,gam2_cov_xy,gam2_cov_xz,gam2_cov_yy,gam2_cov_yz,gam2_cov_zz,gam2_cov_xt,gam2_cov_yt,gam2_cov_zt,gam2_cov_tt
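
With energy and momentum available for every daughter, a common cross-check is the invariant mass of the n + γ + γ system. The following sketch sums the daughter 4-vectors per row; the function name daughters_invariant_mass and the sample numbers are hypothetical, and rows with null daughters simply yield NaN.

```python
import numpy as np
import pandas as pd


def daughters_invariant_mass(df, prefixes=("neut", "gam1", "gam2")):
    """Invariant mass [GeV/c^2] of the summed daughter 4-vectors, per row."""
    energy = sum(df[f"{p}_energy"] for p in prefixes)
    px = sum(df[f"{p}_px"] for p in prefixes)
    py = sum(df[f"{p}_py"] for p in prefixes)
    pz = sum(df[f"{p}_pz"] for p in prefixes)
    m2 = energy**2 - px**2 - py**2 - pz**2
    # Clip tiny negative values caused by rounding before taking the root
    return np.sqrt(m2.clip(lower=0))


# Hypothetical single event: neutron at rest plus two back-to-back photons
df = pd.DataFrame({
    "neut_energy": [0.939565], "neut_px": [0.0], "neut_py": [0.0], "neut_pz": [0.0],
    "gam1_energy": [0.1], "gam1_px": [0.0], "gam1_py": [0.0], "gam1_pz": [0.1],
    "gam2_energy": [0.1], "gam2_px": [0.0], "gam2_py": [0.0], "gam2_pz": [-0.1],
})
mass = daughters_invariant_mass(df)
print(mass)
```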

Notes:

The ZDC Lambda reconstruction considers only the Λ → n + π⁰ → n + γ + γ decay channel
If a particle is not reconstructed or missing, its columns will contain null values
The n_particles field for the Lambda indicates the number of reconstructed daughter particles
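
Before any analysis it is usually convenient to keep only the events where the whole chain was reconstructed. A minimal sketch, assuming nulls appear as empty fields and using a hypothetical table with only the id columns:

```python
import io

import pandas as pd

# Hypothetical rows; event 1 has no reconstructed Lambda
csv_text = """event,lam_id,lam_n_particles,neut_id,gam1_id,gam2_id
0,0,3,1,2,3
1,,,,,
2,0,3,1,2,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Columns with nulls are read as float, so ids appear as e.g. 0.0
required = ["lam_id", "neut_id", "gam1_id", "gam2_id"]
complete = df.dropna(subset=required)

fraction = len(complete) / len(df)
print(f"{len(complete)} of {len(df)} events fully reconstructed ({fraction:.0%})")
```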

Combine Multiple Files

When we have multiple CSV files from different runs or datasets, each file starts its event numbering from 0:

File 1: evt = [0, 1, 2, 3, 4, ...]
File 2: evt = [0, 1, 2, 3, 4, ...]  ← ID Collision!
File 3: evt = [0, 1, 2, 3, 4, ...]  ← ID Collision!

Problem: Event 0 from File 1 is completely different from Event 0 from File 2, but they have the same ID if read in pandas directly!

Use a function like this one to read multiple files into one DataFrame:

python

import pandas as pd
import glob

def concat_csvs_with_unique_events(files):
    """Load and concatenate CSV files with globally unique event IDs"""
    dfs = []
    offset = 0
    
    for file in files:
        df = pd.read_csv(file)
        df['evt'] = df['evt'] + offset  # Make IDs globally unique
        offset = df['evt'].max() + 1    # Set offset for next file
        dfs.append(df)
    
    return pd.concat(dfs, ignore_index=True)

# Load both tables with unique event IDs
# (patterns follow the <base>.<table>.csv naming described above)
lambda_df = concat_csvs_with_unique_events(sorted(glob.glob("*.mcpart_lambda.csv")))
dis_df = concat_csvs_with_unique_events(sorted(glob.glob("*.mc_dis.csv")))

Result: Now we have globally unique event IDs:

File 1: evt = [0, 1, 2, 3, 4]
File 2: evt = [5, 6, 7, 8, 9]     ← No collision!  
File 3: evt = [10, 11, 12, 13, 14] ← No collision!
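
The offset approach gives matching IDs across tables only if each table's file ends at the same event number. If a table lacks rows for the last events of a file, its offsets drift relative to the other tables. An alternative sketch that avoids this is to keep evt as is and add a file index, then join on the composite key. The column name file_idx and the function concat_with_file_index are hypothetical, and the per-file tables are miniature stand-ins for frames read from sorted file lists.

```python
import pandas as pd


def concat_with_file_index(frames):
    """Concatenate per-file DataFrames, tagging each with its list position."""
    tagged = [df.assign(file_idx=i) for i, df in enumerate(frames)]
    return pd.concat(tagged, ignore_index=True)


# Hypothetical per-file tables; the Lambda table misses some events
dis_files = [
    pd.DataFrame({"evt": [0, 1, 2], "q2": [1.0, 2.0, 3.0]}),
    pd.DataFrame({"evt": [0, 1, 2], "q2": [4.0, 5.0, 6.0]}),
]
lam_files = [
    pd.DataFrame({"evt": [0, 1], "lam_pz": [40.0, 39.0]}),
    pd.DataFrame({"evt": [0, 2], "lam_pz": [38.0, 37.0]}),
]

dis_df = concat_with_file_index(dis_files)
lam_df = concat_with_file_index(lam_files)

# (file_idx, evt) is a composite key that stays unique across files
joined = dis_df.merge(lam_df, on=["file_idx", "evt"], how="inner")
print(joined)
```

This relies on both file lists being sorted the same way, so that equal file_idx values refer to the same source file.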
