API Reference

The shapicant module implements a feature selection algorithm based on SHAP and target permutation.
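The idea of deriving empirical p-values from target permutation can be illustrated with a minimal, standard-library-only sketch. It is not shapicant's implementation: the importance score below is an absolute sample covariance standing in for SHAP values, and the add-one correction in the p-value is an assumption. The names `importance` and `empirical_p_values` are hypothetical.

```python
import random
from statistics import mean


def importance(xs, ys):
    """Stand-in importance score (NOT SHAP): absolute sample covariance."""
    mx, my = mean(xs), mean(ys)
    return abs(sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs))


def empirical_p_values(features, target, n_iter=100, random_state=None):
    """Compare each feature's observed score with scores on shuffled targets."""
    rng = random.Random(random_state)
    observed = {name: importance(col, target) for name, col in features.items()}
    exceed = {name: 0 for name in features}
    for _ in range(n_iter):
        shuffled = list(target)
        rng.shuffle(shuffled)  # permuting the target breaks any real association
        for name, col in features.items():
            if importance(col, shuffled) >= observed[name]:
                exceed[name] += 1
    # Assumed add-one correction: keeps every p-value strictly positive.
    return {name: (exceed[name] + 1) / (n_iter + 1) for name in features}


# Toy data: one informative feature and one pure-noise feature.
data_rng = random.Random(0)
signal = [data_rng.gauss(0, 1) for _ in range(200)]
noise = [data_rng.gauss(0, 1) for _ in range(200)]
target = [2.0 * s + data_rng.gauss(0, 0.1) for s in signal]

p_values = empirical_p_values(
    {"signal": signal, "noise": noise}, target, n_iter=100, random_state=42
)
```

A feature whose observed score is rarely matched under permutation gets a small p-value, which is what the selectors below threshold with `alpha`.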

class shapicant.BaseSelector(estimator: object, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Abstract base class for all selectors in shapicant.

Parameters
  • estimator – A supervised learning estimator with a ‘fit’ method.

  • explainer_type – A SHAP explainer type.

  • n_iter – The number of iterations to perform.

  • verbose – Controls verbosity of output.

  • random_state – An int seed or a numpy RandomState instance that controls the random number generator.

p_values_

Series containing the empirical p-values of the features.

Type

Series

abstract fit(*args, **kwargs)

Abstract ‘fit’ method.

abstract fit_transform(*args, **kwargs)

Abstract ‘fit_transform’ method.

get_features(alpha: float = 0.05) → List[object]

Get a list of the features selected.

Parameters

alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The list of features with a p-value <= alpha.
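The selection rule stated above amounts to a simple threshold on the p-values. A minimal sketch, using a plain dict in place of the `p_values_` Series; the function name `select_features` and the feature names are hypothetical:

```python
from typing import Dict, List


def select_features(p_values: Dict[str, float], alpha: float = 0.05) -> List[str]:
    """Keep the features whose empirical p-value is at most alpha."""
    return [name for name, p in p_values.items() if p <= alpha]


p_values = {"age": 0.01, "income": 0.05, "noise": 0.40}
selected = select_features(p_values)  # ["age", "income"]
```

Note that the comparison is inclusive, so a feature with a p-value exactly equal to alpha is kept.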

abstract transform(*args, **kwargs)

Abstract ‘transform’ method.

class shapicant.PandasSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Selector that operates on pandas DataFrames.

Parameters
  • estimator – A supervised learning estimator with a ‘fit’ method.

  • explainer_type – A SHAP explainer type.

  • n_iter – The number of iterations to perform.

  • verbose – Controls verbosity of output.

  • random_state – An int seed or a numpy RandomState instance that controls the random number generator.

fit(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._pandas_selector.PandasSelector

Fit the Pandas selector with the provided estimator.

Parameters
  • X – The training input samples.

  • y – The target values.

  • X_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.
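The three optional dictionaries each target a different call. The sketch below shows one plausible routing, inferred only from the parameter descriptions above and not taken from shapicant's source. `ToyEstimator`, `ToyExplainer`, `fit_and_explain`, and every keyword name passed in are hypothetical stand-ins.

```python
class ToyEstimator:
    """Hypothetical estimator that records the keyword arguments given to fit."""

    def fit(self, X, y, **fit_kwargs):
        self.fit_kwargs = fit_kwargs
        return self


class ToyExplainer:
    """Hypothetical explainer type that records init and shap_values kwargs."""

    def __init__(self, model, **init_kwargs):
        self.model = model
        self.init_kwargs = init_kwargs

    def shap_values(self, X, **call_kwargs):
        self.call_kwargs = call_kwargs
        return [[0.0 for _ in row] for row in X]  # dummy attributions


def fit_and_explain(estimator, explainer_type, X, y, estimator_params=None,
                    explainer_type_params=None, explainer_params=None):
    # estimator_params -> the estimator's fit method
    estimator.fit(X, y, **(estimator_params or {}))
    # explainer_type_params -> the explainer's init
    explainer = explainer_type(estimator, **(explainer_type_params or {}))
    # explainer_params -> the explainer's shap_values method
    values = explainer.shap_values(X, **(explainer_params or {}))
    return explainer, values


X, y = [[1.0, 2.0], [3.0, 4.0]], [0, 1]
explainer, values = fit_and_explain(
    ToyEstimator(), ToyExplainer, X, y,
    estimator_params={"sample_weight": [1.0, 2.0]},
    explainer_type_params={"feature_perturbation": "interventional"},
    explainer_params={"check_additivity": False},
)
```

Keeping the dictionaries separate lets each call receive only the keywords it understands.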

fit_transform(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pandas.core.frame.DataFrame

Fit the Pandas selector and reduce data to the selected features.

Parameters
  • X – The training input samples.

  • y – The target values.

  • X_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features.

transform(X: pandas.core.frame.DataFrame, alpha: float = 0.05) → pandas.core.frame.DataFrame

Reduce data to the selected features.

Parameters
  • X – The input samples.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features.
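Putting the PandasSelector methods together, a typical workflow might look like the sketch below. It uses only the signatures documented above, but the choice of `RandomForestRegressor` with `shap.TreeExplainer`, the hyperparameters, and the wrapper function `select_with_pandas_selector` are assumptions for illustration. It has not been checked against a specific shapicant, shap, or scikit-learn version.

```python
def select_with_pandas_selector(X_train, y_train, X_val, alpha=0.05):
    """Fit a PandasSelector and return (selected features, p-values, reduced X)."""
    # Imports live inside the function so the sketch can be defined without
    # the optional dependencies (shapicant, shap, scikit-learn) installed.
    import shap
    from shapicant import PandasSelector
    from sklearn.ensemble import RandomForestRegressor

    selector = PandasSelector(
        RandomForestRegressor(n_estimators=50, random_state=0),  # assumed estimator
        shap.TreeExplainer,  # the explainer type itself, not an instance
        n_iter=50,
        random_state=42,
    )
    # SHAP values are computed with X_val passed as the validation samples.
    selector.fit(X_train, y_train, X_validation=X_val)
    X_train_selected = selector.transform(X_train, alpha=alpha)
    return selector.get_features(alpha=alpha), selector.p_values_, X_train_selected
```

Because `transform` and `get_features` both take `alpha`, the threshold can be changed after fitting without repeating the permutation iterations.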

class shapicant.SparkSelector(estimator: pyspark.ml.wrapper.JavaEstimator, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Selector that operates on Spark DataFrames with a Spark ML estimator.

Parameters
  • estimator – A supervised learning estimator with a ‘fit’ method.

  • explainer_type – A SHAP explainer type.

  • n_iter – The number of iterations to perform.

  • verbose – Controls verbosity of output.

  • random_state – An int seed or a numpy RandomState instance that controls the random number generator.

fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True) → shapicant._spark_selector.SparkSelector

Fit the Spark selector with the provided estimator.

Parameters
  • sdf – The training input samples.

  • label_col – The target column name.

  • sdf_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.

  • broadcast – Whether to broadcast the target column when joining.

fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Fit the Spark selector and reduce data to the selected features.

Parameters
  • sdf – The training input samples.

  • label_col – The target column name.

  • sdf_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.

  • broadcast – Whether to broadcast the target column when joining.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features and target.

transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Reduce data to the selected features.

Parameters
  • sdf – The input samples.

  • label_col – The target column name.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features and target.

class shapicant.SparkUdfSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Selector that operates on Spark DataFrames through UDFs, using a scikit-learn estimator or a callable.

Parameters
  • estimator – A supervised learning estimator with a ‘fit’ method.

  • explainer_type – A SHAP explainer type.

  • n_iter – The number of iterations to perform.

  • verbose – Controls verbosity of output.

  • random_state – An int seed or a numpy RandomState instance that controls the random number generator.

fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._spark_udf_selector.SparkUdfSelector

Fit the Spark UDF selector with the provided estimator.

Parameters
  • sdf – The training input samples.

  • label_col – The target column name.

  • sdf_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.

fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Fit the Spark UDF selector and reduce data to the selected features.

Parameters
  • sdf – The training input samples.

  • label_col – The target column name.

  • sdf_validation – The validation input samples.

  • estimator_params – Additional parameters for the underlying estimator’s fit method.

  • explainer_type_params – Additional parameters for the explainer’s init.

  • explainer_params – Additional parameters for the explainer’s shap_values method.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features and target.

transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Reduce data to the selected features.

Parameters
  • sdf – The input samples.

  • label_col – The target column name.

  • alpha – Significance level applied to the empirical p-values; features with a p-value <= alpha are selected.

Returns

The input DataFrame reduced to the selected features and target.