API Reference
The shapicant module implements a feature selection algorithm based on SHAP and target permutation.
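The target-permutation idea mentioned above can be sketched without any dependencies. This is only an illustration under stated assumptions: the importance function is a stand-in based on absolute covariance (not a SHAP value), and the helper names importance and permutation_null are hypothetical, not part of shapicant.

```python
import random


def importance(feature, target):
    """Stand-in importance score: absolute covariance (NOT a SHAP value)."""
    n = len(target)
    mean_f = sum(feature) / n
    mean_t = sum(target) / n
    cov = sum((f - mean_f) * (t - mean_t) for f, t in zip(feature, target)) / n
    return abs(cov)


def permutation_null(feature, target, n_iter=100, seed=0):
    """Score the feature against n_iter shuffled copies of the target."""
    rng = random.Random(seed)  # seeded so the permutations are reproducible
    null_scores = []
    for _ in range(n_iter):
        shuffled = list(target)
        rng.shuffle(shuffled)  # breaks any real feature/target relationship
        null_scores.append(importance(feature, shuffled))
    return null_scores


# Toy data: the target depends linearly on the feature.
feature = [float(i) for i in range(20)]
target = [2.0 * x + 1.0 for x in feature]

observed = importance(feature, target)
null_scores = permutation_null(feature, target, n_iter=100, seed=0)
```

A feature whose observed score stands out against the scores obtained on shuffled targets is a candidate for selection. The n_iter and seed arguments play the same role here as n_iter and random_state in the selectors documented below.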
- class shapicant.BaseSelector(estimator: object, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)
Abstract base class for all selectors in shapicant.
- Parameters
estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator.
- p_values_
Series containing the empirical p-values of the features.
- Type
Series
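As a rough illustration of what an empirical p-value is, the sketch below compares an observed importance score with scores obtained under permuted targets. The "+1" correction is a common estimator for permutation tests; that shapicant computes p_values_ exactly this way is an assumption, and empirical_p_value is a hypothetical helper name.

```python
def empirical_p_value(observed, null_scores):
    """Fraction of null scores at least as large as the observed score.

    The +1 in numerator and denominator counts the observed score itself,
    so the result is never exactly zero.
    """
    exceed = sum(1 for score in null_scores if score >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Observed score larger than every null score: smallest attainable p-value.
p_strong = empirical_p_value(5.0, [1.0, 2.0, 3.0, 4.0])

# Observed score smaller than every null score: p-value of 1.
p_weak = empirical_p_value(0.0, [1.0, 2.0, 3.0])
```

With this estimator the smallest attainable p-value is 1 / (n_iter + 1), so n_iter bounds how small a significance level can meaningfully be used.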
- class shapicant.PandasSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)
Class for the Pandas selector in shapicant.
- Parameters
estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator.
- fit(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._pandas_selector.PandasSelector
Fit the Pandas selector with the provided estimator.
- Parameters
X – The training input samples.
y – The target values.
X_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
- fit_transform(X: pandas.core.frame.DataFrame, y: Union[numpy.array, pandas.core.series.Series, pandas.core.frame.DataFrame], X_validation: Optional[pandas.core.frame.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pandas.core.frame.DataFrame
Fit the Pandas selector and reduce data to the selected features.
- Parameters
X – The training input samples.
y – The target values.
X_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
alpha – Significance level: features whose empirical p-value is below this threshold are selected.
- Returns
The input DataFrame reduced to the selected features.
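The role of alpha in the reduction step can be sketched with plain Python containers. The strict comparison (p < alpha) and the helper name select_features are assumptions made for illustration, and the p-values are made-up toy numbers, not output from shapicant.

```python
def select_features(p_values, alpha=0.05):
    """Return the names of features whose p-value is below alpha."""
    return [name for name, p in p_values.items() if p < alpha]


# Toy p-values keyed by feature name (illustrative numbers only).
p_values = {"age": 0.01, "income": 0.049, "noise": 0.20}

default_selection = select_features(p_values)             # alpha = 0.05
strict_selection = select_features(p_values, alpha=0.02)  # tighter threshold
```

With p-values held in a pandas Series, as p_values_ is documented to be, the analogous idiom would be boolean indexing such as p_values[p_values < alpha].index.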
- class shapicant.SparkSelector(estimator: pyspark.ml.wrapper.JavaEstimator, explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)
Class for the Spark selector in shapicant.
- Parameters
estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator.
- fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True) → shapicant._spark_selector.SparkSelector
Fit the Spark selector with the provided estimator.
- Parameters
sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
broadcast – Whether to broadcast the target column when joining.
- fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, broadcast: bool = True, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame
Fit the Spark selector and reduce data to the selected features.
- Parameters
sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
broadcast – Whether to broadcast the target column when joining.
alpha – Significance level: features whose empirical p-value is below this threshold are selected.
- Returns
The input DataFrame reduced to the selected features and target.
- transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame
Reduce data to the selected features.
- Parameters
sdf – The input samples.
label_col – The target column name.
alpha – Significance level: features whose empirical p-value is below this threshold are selected.
- Returns
The input DataFrame reduced to the selected features and target.
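Unlike the Pandas variant, the Spark methods return the selected features together with the target column. A minimal sketch of that column bookkeeping, using plain lists instead of a Spark DataFrame, might look as follows. The helper name columns_to_keep, the strict comparison, and the toy p-values are all assumptions for illustration.

```python
def columns_to_keep(columns, p_values, label_col="label", alpha=0.05):
    """Keep selected feature columns plus the label, in original order."""
    selected = {name for name, p in p_values.items() if p < alpha}
    return [c for c in columns if c in selected or c == label_col]


# Toy column list and p-values (illustrative numbers only).
columns = ["age", "income", "noise", "label"]
p_values = {"age": 0.01, "income": 0.03, "noise": 0.60}

kept = columns_to_keep(columns, p_values)
```

On a real Spark DataFrame, the resulting list could then be passed to sdf.select(*kept).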
- class shapicant.SparkUdfSelector(estimator: Union[sklearn.base.BaseEstimator, Callable], explainer_type: Type[shap.explainers._explainer.Explainer], n_iter: int = 100, verbose: Union[int, bool] = 1, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None)
Class for the Spark UDF selector in shapicant.
- Parameters
estimator – A supervised learning estimator with a ‘fit’ method.
explainer_type – A SHAP explainer type.
n_iter – The number of iterations to perform.
verbose – Controls verbosity of output.
random_state – Seed (int) or RandomState instance that controls the random number generator.
- fit(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None) → shapicant._spark_udf_selector.SparkUdfSelector
Fit the Spark UDF selector with the provided estimator.
- Parameters
sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
- fit_transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', sdf_validation: Optional[pyspark.sql.dataframe.DataFrame] = None, estimator_params: Optional[Dict[str, object]] = None, explainer_type_params: Optional[Dict[str, object]] = None, explainer_params: Optional[Dict[str, object]] = None, alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame
Fit the Spark UDF selector and reduce data to the selected features.
- Parameters
sdf – The training input samples.
label_col – The target column name.
sdf_validation – The validation input samples.
estimator_params – Additional parameters for the underlying estimator’s fit method.
explainer_type_params – Additional parameters for the explainer’s init.
explainer_params – Additional parameters for the explainer’s shap_values method.
alpha – Significance level: features whose empirical p-value is below this threshold are selected.
- Returns
The input DataFrame reduced to the selected features and target.
- transform(sdf: pyspark.sql.dataframe.DataFrame, label_col: str = 'label', alpha: float = 0.05) → pyspark.sql.dataframe.DataFrame
Reduce data to the selected features.
- Parameters
sdf – The input samples.
label_col – The target column name.
alpha – Significance level: features whose empirical p-value is below this threshold are selected.
- Returns
The input DataFrame reduced to the selected features and target.