API Reference

The shapicant module implements a feature selection algorithm based on SHAP and target permutation.

class shapicant.BaseSelector(estimator: object, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Abstract base class for all selectors in shapicant.

Parameters

estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of target-permutation iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator used.
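
The forms accepted by random_state can be illustrated with a small sketch. The helper name resolve_random_state is hypothetical and is not part of shapicant; it only shows one way an int seed, an existing RandomState, or None could be normalized into a generator, and why a fixed seed makes target permutations reproducible.

```python
from typing import Optional, Union

import numpy as np


def resolve_random_state(
    random_state: Optional[Union[int, np.random.RandomState]] = None,
) -> np.random.RandomState:
    """Hypothetical helper: normalize the accepted random_state forms."""
    if isinstance(random_state, np.random.RandomState):
        # An existing generator is returned as-is, so its state is shared with the caller.
        return random_state
    # An int seeds a fresh generator; None lets NumPy pick an unpredictable seed.
    return np.random.RandomState(random_state)


y = np.array([0, 1, 0, 1, 1, 0])

# Two generators built from the same seed shuffle the target identically.
perm_a = resolve_random_state(42).permutation(y)
perm_b = resolve_random_state(42).permutation(y)
```

Passing the same integer on every run is the simplest way to obtain repeatable results; passing a shared RandomState instance instead advances that instance's state on each use.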

p_values_

Series containing the empirical p-values of the features.

Type: Series
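
As a rough illustration of what an empirical p-value is, the sketch below uses the common add-one estimator from permutation testing. This is an assumption made for illustration: the exact statistic and formula used by shapicant are not stated in this reference, and the function name empirical_p_value is hypothetical.

```python
import numpy as np


def empirical_p_value(true_stat: float, null_stats: np.ndarray) -> float:
    """Share of null statistics at least as large as the observed one (add-one corrected)."""
    null_stats = np.asarray(null_stats, dtype=float)
    # Count the permuted-target statistics that match or exceed the observed statistic.
    n_extreme = int(np.sum(null_stats >= true_stat))
    # The +1 terms keep the estimate strictly positive.
    return (n_extreme + 1) / (null_stats.size + 1)


# Made-up importance scores obtained under four target permutations.
null_stats = np.array([0.1, 0.2, 0.3, 0.4])

p_strong = empirical_p_value(0.9, null_stats)   # observed score above every null score
p_weak = empirical_p_value(0.25, null_stats)    # observed score inside the null range
```

With this estimator the smallest attainable p-value is 1 / (n_iter + 1), so a larger n_iter allows finer resolution at small alpha.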

abstract fit(*args, **kwargs): Abstract ‘fit’ method.

abstract fit_transform(*args, **kwargs): Abstract ‘fit_transform’ method.

get_features(alpha: float = 0.05) → List[object]

Get a list of the features selected.

Parameters: alpha – Significance level used to threshold the empirical p-values.
Returns: The list of features with a p-value <= alpha.
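
The selection rule described above (keep features with a p-value <= alpha) can be sketched on a plain pandas Series. The function select_features and the p-values are made up for illustration; this is not the library's implementation.

```python
from typing import List

import pandas as pd


def select_features(p_values: pd.Series, alpha: float = 0.05) -> List[object]:
    """Return the index labels whose p-value is <= alpha."""
    # Boolean indexing keeps only the rows that pass the threshold.
    return p_values[p_values <= alpha].index.tolist()


# Illustrative p-values indexed by feature name.
p_values = pd.Series({"age": 0.01, "income": 0.05, "noise": 0.40})

selected = select_features(p_values, alpha=0.05)
```

Under this rule a p-value exactly equal to alpha is kept, so "income" is selected alongside "age".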

abstract transform(*args, **kwargs): Abstract ‘transform’ method.

class shapicant.PandasSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Selector that operates on pandas DataFrames.

Parameters

estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of target-permutation iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator used.

fit(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._pandas_selector.PandasSelector

Fit the Pandas selector with the provided estimator.

Parameters

X – The training input samples.
y – The target values.
X_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
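
The three optional dictionaries each target a different call. The sketch below uses stand-in classes (ToyEstimator, ToyExplainer) and a hypothetical helper (fit_and_explain) to show the routing described above; it does not reproduce shapicant's internals, and the example keys are only illustrative.

```python
from typing import Any, Dict, List, Optional, Tuple, Type


class ToyEstimator:
    """Stand-in estimator that records the keyword arguments passed to fit."""

    def fit(self, X: List[List[float]], y: List[int], **fit_kwargs: Any) -> "ToyEstimator":
        self.fit_kwargs = fit_kwargs
        return self


class ToyExplainer:
    """Stand-in explainer that records init and shap_values keyword arguments."""

    def __init__(self, model: ToyEstimator, **init_kwargs: Any) -> None:
        self.model = model
        self.init_kwargs = init_kwargs

    def shap_values(self, X: List[List[float]], **call_kwargs: Any) -> List[List[float]]:
        self.call_kwargs = call_kwargs
        # Placeholder attributions: one zero per feature value.
        return [[0.0 for _ in row] for row in X]


def fit_and_explain(
    estimator: ToyEstimator,
    explainer_type: Type[ToyExplainer],
    X: List[List[float]],
    y: List[int],
    estimator_params: Optional[Dict[str, object]] = None,
    explainer_type_params: Optional[Dict[str, object]] = None,
    explainer_params: Optional[Dict[str, object]] = None,
) -> Tuple[ToyExplainer, List[List[float]]]:
    # estimator_params go to the estimator's fit method.
    model = estimator.fit(X, y, **(estimator_params or {}))
    # explainer_type_params go to the explainer's constructor.
    explainer = explainer_type(model, **(explainer_type_params or {}))
    # explainer_params go to the explainer's shap_values method.
    values = explainer.shap_values(X, **(explainer_params or {}))
    return explainer, values


X = [[1.0, 2.0], [3.0, 4.0]]
y = [0, 1]
explainer, values = fit_and_explain(
    ToyEstimator(),
    ToyExplainer,
    X,
    y,
    estimator_params={"sample_weight": [1.0, 2.0]},
    explainer_type_params={"feature_perturbation": "interventional"},
    explainer_params={"check_additivity": False},
)
```

Keeping the three dictionaries separate avoids name clashes between the estimator and the explainer, since each set of keyword arguments reaches exactly one call.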

fit_transform(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pandas.core.frame.DataFrame

Fit the Pandas selector and reduce data to the selected features.

Parameters

X – The training input samples.
y – The target values.
X_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features.

transform(X: pandas.core.frame.DataFrame, alpha: float = 0.05) → pandas.core.frame.DataFrame

Reduce data to the selected features.

Parameters

X – The input samples.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features.
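
To show how alpha affects the width of the reduced frame, here is a minimal sketch on made-up data. The helper reduce_frame is hypothetical and assumes the same "p-value <= alpha" rule documented for get_features; it is not the library's code.

```python
import pandas as pd


def reduce_frame(X: pd.DataFrame, p_values: pd.Series, alpha: float = 0.05) -> pd.DataFrame:
    """Keep only the columns of X whose p-value is <= alpha, preserving column order."""
    keep = [col for col in X.columns if p_values[col] <= alpha]
    return X[keep]


# Illustrative data and p-values.
X = pd.DataFrame({"age": [31, 45], "income": [52.0, 61.5], "noise": [0.3, 0.7]})
p_values = pd.Series({"age": 0.01, "income": 0.04, "noise": 0.40})

X_strict = reduce_frame(X, p_values, alpha=0.02)   # stricter threshold, fewer columns
X_default = reduce_frame(X, p_values)              # default alpha of 0.05
```

Only columns are dropped; the number of rows is unchanged in both results.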

class shapicant.SparkSelector(estimator: pyspark.ml.wrapper.JavaEstimator, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Selector that operates on Spark DataFrames with a Spark ML estimator.

Parameters

estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of target-permutation iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator used.

fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True) → shapicant._spark_selector.SparkSelector

Fit the Spark selector with the provided estimator.

Parameters

sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
broadcast – Whether to broadcast the target column when joining.

fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Fit the Spark selector and reduce data to the selected features.

Parameters

sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
broadcast – Whether to broadcast the target column when joining.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features and target.

transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Reduce data to the selected features.

Parameters

sdf – The input samples.
label_col – The target column name.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features and target.
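
Unlike the pandas variant, the Spark result keeps the target column next to the selected features. The sketch below only builds the list of column names; columns_to_keep is a hypothetical helper, and placing the label last is an assumption, since the column order of the returned DataFrame is not stated here.

```python
from typing import List, Sequence


def columns_to_keep(selected: Sequence[str], label_col: str = "label") -> List[str]:
    """Selected feature names followed by the target column (ordering is an assumption)."""
    cols = list(selected)
    # Avoid listing the target twice if it already appears among the names.
    if label_col not in cols:
        cols.append(label_col)
    return cols


cols = columns_to_keep(["age", "income"], label_col="label")

# With a real Spark DataFrame `sdf`, the same reduction could be written as:
#     sdf.select(*cols)
```

Because the target column is retained, the reduced DataFrame can be fed straight back into a training step that expects the same label_col.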

class shapicant.SparkUdfSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)

Spark UDF selector: operates on Spark DataFrames with a scikit-learn estimator or a callable.

Parameters

estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of target-permutation iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator used.

fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._spark_udf_selector.SparkUdfSelector

Fit the Spark UDF selector with the provided estimator.

Parameters

sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.

fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Fit the Spark UDF selector and reduce data to the selected features.

Parameters

sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features and target.

transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame

Reduce data to the selected features.

Parameters

sdf – The input samples.
label_col – The target column name.
alpha – Significance level used to threshold the empirical p-values.

Returns

The input DataFrame reduced to the selected features and target.