src.data.featurizer
- class src.data.featurizer.EcfpFeaturizer(radius: int = 2, n_bits: int = 2048, count: bool = False)
Extended Connectivity Fingerprint (Morgan/ECFP) featurizer.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate ECFP fingerprints for given SMILES.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
- class src.data.featurizer.FeaturizerBase
Abstract base class for molecular featurizers. Defines the interface for converting SMILES strings to numerical feature representations.
- abstract property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- abstract featurize(smiles_list: List[str]) → ndarray
Convert SMILES strings to numerical feature arrays.
- Parameters:
smiles_list (List[str]) – List of SMILES strings representing molecules
- Returns:
2D array where each row corresponds to a molecule’s feature vector
- Return type:
np.ndarray
- get_cache_key()
Generate a cache key from the featurizer name and a 5-character hash of its parameters.
Builds the identifier by MD5-hashing the parameter values and appending the first five characters of the hash to the featurizer name.
- Returns:
Cache key in format ‘{name}_{hash[:5]}’
- Return type:
str
- abstract get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- abstract property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
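The relationship between name, get_hashable_params_values() and get_cache_key() can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the class names SketchFeaturizerBase and ToyEcfp are hypothetical, the abstract feature_name and featurize members are omitted to keep the example dependency-free, and serializing the parameter list with repr() before hashing is an assumption, since the documentation only states that the values are MD5-hashed.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import Hashable, List


class SketchFeaturizerBase(ABC):
    """Hypothetical stand-in for FeaturizerBase (illustration only)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Human-readable featurizer name."""

    @abstractmethod
    def get_hashable_params_values(self) -> List[Hashable]:
        """Parameter values that identify this configuration."""

    def get_cache_key(self) -> str:
        # Assumption: the parameter list is serialized with repr() before hashing.
        payload = repr(self.get_hashable_params_values()).encode("utf-8")
        digest = hashlib.md5(payload).hexdigest()
        # Documented format: '{name}_{hash[:5]}'
        return f"{self.name}_{digest[:5]}"


class ToyEcfp(SketchFeaturizerBase):
    """Hypothetical subclass mirroring the EcfpFeaturizer constructor arguments."""

    def __init__(self, radius: int = 2, n_bits: int = 2048, count: bool = False):
        self.radius = radius
        self.n_bits = n_bits
        self.count = count

    @property
    def name(self) -> str:
        return "ecfp"  # assumed name, chosen for the example

    def get_hashable_params_values(self) -> List[Hashable]:
        return [self.radius, self.n_bits, self.count]


if __name__ == "__main__":
    print(ToyEcfp().get_cache_key())
```

Under these assumptions, two featurizers built with the same parameters produce the same key, so cached features can be reused across runs. A five-character suffix keeps keys short at the cost of a small chance of collisions between different configurations.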
- class src.data.featurizer.KlekotaRothFeaturizer(keys_path: str)
Klekota-Roth fingerprint featurizer.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate Klekota-Roth fingerprints for given SMILES.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
- class src.data.featurizer.MaccsFeaturizer(*args, **kwargs)
MACCS keys fingerprint featurizer.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate MACCS keys fingerprints for given SMILES.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
- class src.data.featurizer.Map4Featurizer(size: int = 2048, radius: int = 2, include_duplicated_shingles: bool = False)
MAP4 fingerprint featurizer.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate MAP4 fingerprints for given SMILES.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
- class src.data.featurizer.PropertyEcfpFeaturizer(radius: int = 2, n_bits: int = 2048, count: bool = False)
Combined property descriptor and ECFP fingerprint featurizer.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate combined ECFP and property features.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
- class src.data.featurizer.PropertyFeaturizer(scaler=StandardScaler())
RDKit molecular property descriptor featurizer with normalization.
- property feature_name: str
Column name for storing features.
- Returns:
Name used for feature column in DataFrames
- Return type:
str
- featurize(smiles_list: List[str]) → ndarray
Generate normalized molecular descriptors.
- get_hashable_params_values() → List[Hashable]
Return parameters for hashing/caching purposes.
- Returns:
List of parameter values that uniquely identify this featurizer configuration
- Return type:
List[Hashable]
- property name: str
Human-readable featurizer name.
- Returns:
Descriptive name for this featurizer
- Return type:
str
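The normalization that PropertyFeaturizer applies is not described beyond its default scaler argument. As a rough, dependency-free sketch, the function below reproduces the per-column z-score scaling that scikit-learn's StandardScaler performs with its default settings (subtract the column mean, divide by the population standard deviation, and leave constant columns at zero). The function name standardize_columns and the descriptor values are hypothetical, and which molecules the project fits the scaler on is not stated in this documentation.

```python
from statistics import fmean, pstdev
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Z-score each column of a row-major descriptor matrix."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation (ddof=0); a zero scale is replaced by 1.0
    # so that constant columns map to 0.0 instead of dividing by zero.
    scales = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / scale for value, mean, scale in zip(row, means, scales)]
        for row in rows
    ]


if __name__ == "__main__":
    # Two molecules, two made-up descriptor columns (the second is constant).
    descriptors = [[1.0, 10.0], [3.0, 10.0]]
    print(standardize_columns(descriptors))
```

Scaling of this kind keeps descriptors with large numeric ranges from dominating those with small ranges when the features are passed to a downstream model.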