src.data.featurizer

class src.data.featurizer.EcfpFeaturizer(radius: int = 2, n_bits: int = 2048, count: bool = False)

Extended Connectivity Fingerprint (Morgan/ECFP) featurizer.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate ECFP fingerprints for given SMILES.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str

class src.data.featurizer.FeaturizerBase

Base class for molecular featurizers. Abstract interface for converting SMILES strings to numerical feature representations.

abstract property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

abstract featurize(smiles_list: List[str]) → ndarray

Convert SMILES strings to numerical feature arrays.

Parameters:

smiles_list (List[str]) – List of SMILES strings representing molecules

Returns:

2D array where each row corresponds to a molecule’s feature vector

Return type:

np.ndarray

get_cache_key()

Generate a cache key from the featurizer name and a 5-character parameter hash.

Builds the identifier by MD5-hashing the parameter values and appending the first five characters of the hash to the featurizer name.

Returns:

Cache key in format ‘{name}_{hash[:5]}’

Return type:

str
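The key format described above can be sketched as follows. This is a minimal illustration and not the module's actual implementation: `make_cache_key` is a hypothetical helper, and serializing the parameters with `repr()` before hashing is an assumption, since the real serialization is not documented here.

```python
import hashlib
from typing import Hashable, List


def make_cache_key(name: str, params: List[Hashable]) -> str:
    """Build a '{name}_{hash[:5]}' style key (hypothetical helper)."""
    # Assumption: parameter values are serialized with repr() before hashing.
    serialized = repr(list(params)).encode("utf-8")
    # MD5 is used here only as a fingerprint, not for security.
    digest = hashlib.md5(serialized).hexdigest()
    # Keep the first five hex characters, as in the documented key format.
    return f"{name}_{digest[:5]}"


# Illustrative parameter lists, e.g. radius, n_bits, count for an ECFP setup.
key_default = make_cache_key("ecfp", [2, 2048, False])
key_counts = make_cache_key("ecfp", [2, 2048, True])
print(key_default, key_counts)
```

Because only five hexadecimal characters are kept, there are about one million possible suffixes per featurizer name, so two different configurations can in principle share a key.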

abstract get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

abstract property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str
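To illustrate how the abstract members fit together, here is a minimal sketch of a custom featurizer. `FeaturizerSketch` is a hypothetical stand-in that mirrors the documented abstract members so the example runs on its own; in a real project the subclass would presumably derive from `FeaturizerBase`. `AtomSymbolCountFeaturizer`, its `symbols` parameter, and the returned names are invented for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Hashable, List

import numpy as np


class FeaturizerSketch(ABC):
    """Hypothetical stand-in mirroring the documented abstract members."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def feature_name(self) -> str: ...

    @abstractmethod
    def featurize(self, smiles_list: List[str]) -> np.ndarray: ...

    @abstractmethod
    def get_hashable_params_values(self) -> List[Hashable]: ...


class AtomSymbolCountFeaturizer(FeaturizerSketch):
    """Toy featurizer: counts a few characters in each SMILES string.

    This is plain string counting, not chemistry (for example, the "C"
    in "Cl" would be counted as carbon).
    """

    def __init__(self, symbols: str = "CNO"):
        self.symbols = symbols

    @property
    def name(self) -> str:
        return "symbol_count"

    @property
    def feature_name(self) -> str:
        return "symbol_count_features"

    def featurize(self, smiles_list: List[str]) -> np.ndarray:
        rows = [[s.count(sym) for sym in self.symbols] for s in smiles_list]
        # One row per molecule, one column per counted symbol.
        return np.array(rows, dtype=float).reshape(len(smiles_list), len(self.symbols))

    def get_hashable_params_values(self) -> List[Hashable]:
        # Everything that changes the output belongs in this list.
        return [self.symbols]


features = AtomSymbolCountFeaturizer().featurize(["CCO", "c1ccccc1N"])
print(features.shape)  # (2, 3)
```

The point of the sketch is the contract: `featurize` returns a 2D array with one row per input string, and `get_hashable_params_values` lists every parameter that affects the output, so that differently configured instances are distinguished when caching.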

class src.data.featurizer.KlekotaRothFeaturizer(keys_path: str)

Klekota-Roth fingerprint featurizer.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate Klekota-Roth fingerprints for given SMILES.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str

class src.data.featurizer.MaccsFeaturizer(*args, **kwargs)

MACCS keys fingerprint featurizer.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate MACCS keys fingerprints for given SMILES.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str

class src.data.featurizer.Map4Featurizer(size: int = 2048, radius: int = 2, include_duplicated_shingles: bool = False)

MAP4 fingerprint featurizer.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate MAP4 fingerprints for given SMILES.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str

class src.data.featurizer.PropertyEcfpFeaturizer(radius: int = 2, n_bits: int = 2048, count: bool = False)

Combined property descriptor and ECFP fingerprint featurizer.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate combined ECFP and property features.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str

class src.data.featurizer.PropertyFeaturizer(scaler=StandardScaler())

RDKit molecular property descriptor featurizer with normalization.

property feature_name: str

Column name for storing features.

Returns:

Name used for feature column in DataFrames

Return type:

str

featurize(smiles_list: List[str]) → ndarray

Generate normalized molecular descriptors.

get_hashable_params_values() → List[Hashable]

Return parameters for hashing/caching purposes.

Returns:

List of parameter values that uniquely identify this featurizer configuration

Return type:

List[Hashable]

property name: str

Human-readable featurizer name.

Returns:

Descriptive name for this featurizer

Return type:

str