rfwtools.dim_reduction.pca.do_pca_reduction

rfwtools.dim_reduction.pca.do_pca_reduction(feature_df, metadata_cols, n_components=3, standardize=True, **kwargs)

Performs PCA on a subset of the columns of feature_df and retains the specified example metadata in the results.

Parameters:
  • feature_df (DataFrame) – DataFrame containing example information and feature data.

  • metadata_cols (List[str]) – The column names of feature_df that contain the metadata of the events (labels, etc.). All columns not listed in metadata_cols are used in the PCA.

  • n_components (int) – The number of principal components to return.

  • standardize (bool) – Whether to standardize the features, i.e. (x - mean) / stddev, before PCA is applied.

  • kwargs – Additional keyword arguments (parameter names and values) passed to sklearn.decomposition.PCA.
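
The standardize formula, (x - mean) / stddev, can be illustrated on a single feature column with the standard library alone. This is a minimal sketch, not the rfwtools code: the helper name standardize_column is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the documentation does not say which is applied.

```python
from statistics import mean, pstdev


def standardize_column(values):
    """Return (x - mean) / stddev for each value (hypothetical helper).

    Assumption: stddev is the population standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no variance; map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


scaled = standardize_column([2.0, 4.0, 6.0])
print(scaled)
```

After standardization the column has mean 0 and unit standard deviation, so features measured on different scales contribute comparably to the principal components.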

Return type:

Union[Tuple[Union[DataFrame, Series], None], Tuple[Union[DataFrame, Series], PCA]]

Returns:

A tuple of a DataFrame and the PCA model object after fit_transform has been called. The DataFrame contains the PCA output (pc1, pc2, …, pcN) and the specified metadata_cols. If n_components > len(feature_df), the pc columns contain no data. If no PCA object can be fit, None is returned in place of the model.
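
The behaviour described above can be approximated with pandas and scikit-learn directly. The following is a sketch under stated assumptions, not the rfwtools implementation: pca_reduction_sketch is a hypothetical stand-in, the metadata-first column order is a guess, and the fallback (empty pc columns plus None when scikit-learn refuses to fit) is one plausible reading of the return description.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_reduction_sketch(feature_df, metadata_cols, n_components=3,
                         standardize=True, **kwargs):
    """Hypothetical stand-in for do_pca_reduction; not the rfwtools source."""
    # Every column not named in metadata_cols is treated as a feature.
    feature_cols = [c for c in feature_df.columns if c not in metadata_cols]
    x = feature_df[feature_cols].to_numpy(dtype=float)

    if standardize:
        # (x - mean) / stddev per column, using the population stddev.
        x = StandardScaler().fit_transform(x)

    pc_names = [f"pc{i + 1}" for i in range(n_components)]
    try:
        pca = PCA(n_components=n_components, **kwargs)
        pcs = pd.DataFrame(pca.fit_transform(x), columns=pc_names,
                           index=feature_df.index)
    except ValueError:
        # Assumed fallback: scikit-learn raises ValueError when
        # n_components exceeds min(n_samples, n_features), so return
        # empty (NaN) pc columns and no model.
        pca = None
        pcs = pd.DataFrame(columns=pc_names, index=feature_df.index,
                           dtype=float)

    # Assumed layout: metadata columns first, then pc1..pcN.
    return pd.concat([feature_df[metadata_cols], pcs], axis=1), pca


df = pd.DataFrame({
    "label": ["a", "b", "a", "b", "a"],
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "f3": [0.5, 0.1, 0.9, 0.3, 0.7],
})
reduced, model = pca_reduction_sketch(df, metadata_cols=["label"], n_components=2)
print(list(reduced.columns))  # ['label', 'pc1', 'pc2']
```

Because the second element of the tuple may be None, callers should check it before reading attributes such as explained_variance_ratio_ from the model.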