Data exploration and visualization
==================================

#### Introductory notes

ADMET-XSpec supports PCA projection visualizations in 2 or 3 dimensions, as well as t-SNE and UMAP visualizations in 2 dimensions.

#### Goals

After reading this section, you should understand:

1. How the `ReducerBase` and `VisualizerBase` classes work
2. How to choose the right `processing_plan` for data exploration and visualization

The `VisualizerBase` interface, currently implemented only by `ProjectionVisualizer`, takes a dictionary of pandas DataFrames whose string keys are the datasets' `friendly_names` and whose values are the datasets themselves. `VisualizerBase` exposes a public `get_visualization` method and requires each implementing class to handle the conversion to NumPy form (as expected by matplotlib) inside `_get_visualizable_form`.

The public `get_visualization` method expects data that has already been "reduced", that is, passed through an implementation of the `ReducerBase` interface. Each `ReducerBase` is composed with a `VisualizerBase` implementation; for the current PCA, t-SNE, and UMAP visualizations, this is always `ProjectionVisualizer`. `ReducerBase` exposes the public `get_reduced_df` method, which maps the feature columns of a preprocessed dataset (non-null, canonicalized, normalized, and featurized) into the reducer's lower-dimensional output.

**`ProcessingPipeline` relies on this pairing of `get_reduced_df` and `get_visualization`** whenever `do_visualize_datasets` or `do_visualize_train_test` is enabled in a processing plan.

An example of a generated t-SNE visualization:

```{eval-rst}
.. image:: tsne.png
```
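
The interplay between `ReducerBase` and `VisualizerBase` can be illustrated with a minimal, self-contained sketch. It is not ADMET-XSpec's implementation: only the method names `get_reduced_df`, `get_visualization`, and `_get_visualizable_form` and the class names `ReducerBase`, `VisualizerBase`, and `ProjectionVisualizer` come from the description above. The constructor signature, the `n_components` parameter, and the `PCAReducer` class are hypothetical. PCA is computed directly with NumPy's SVD, and the matplotlib plotting step is omitted, so `get_visualization` here only returns the NumPy arrays that would be plotted.

```python
from abc import ABC, abstractmethod
from typing import Dict

import numpy as np
import pandas as pd


class VisualizerBase(ABC):
    """Hypothetical sketch of the visualizer interface."""

    def get_visualization(self, reduced: Dict[str, pd.DataFrame]) -> Dict[str, np.ndarray]:
        # Convert each reduced dataset (keyed by friendly name) to NumPy form.
        # A real implementation would pass these arrays on to matplotlib.
        return {name: self._get_visualizable_form(df) for name, df in reduced.items()}

    @abstractmethod
    def _get_visualizable_form(self, df: pd.DataFrame) -> np.ndarray:
        """Subclasses decide how a reduced DataFrame becomes a NumPy array."""


class ProjectionVisualizer(VisualizerBase):
    def _get_visualizable_form(self, df: pd.DataFrame) -> np.ndarray:
        # Each row is one point; each column is one projected axis.
        return df.to_numpy(dtype=float)


class ReducerBase(ABC):
    """Hypothetical sketch: a reducer holds the visualizer that renders its output."""

    def __init__(self, visualizer: VisualizerBase, n_components: int = 2):
        self.visualizer = visualizer
        self.n_components = n_components

    @abstractmethod
    def get_reduced_df(self, df: pd.DataFrame) -> pd.DataFrame:
        """Map the feature columns of df into n_components dimensions."""


class PCAReducer(ReducerBase):
    """Hypothetical PCA reducer computed with NumPy's SVD."""

    def get_reduced_df(self, df: pd.DataFrame) -> pd.DataFrame:
        x = df.to_numpy(dtype=float)
        centered = x - x.mean(axis=0)
        # Rows of vt are the principal axes, ordered by decreasing singular value.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        projected = centered @ vt[: self.n_components].T
        columns = [f"PC{i + 1}" for i in range(self.n_components)]
        return pd.DataFrame(projected, index=df.index, columns=columns)


# Usage: two toy "preprocessed" datasets with 5 numeric feature columns each.
rng = np.random.default_rng(0)
datasets = {
    "train": pd.DataFrame(rng.normal(size=(20, 5))),
    "test": pd.DataFrame(rng.normal(loc=1.0, size=(10, 5))),
}
# Concatenate so all datasets share one projection; dict keys become index level 0.
combined = pd.concat(datasets)
reducer = PCAReducer(ProjectionVisualizer(), n_components=2)
reduced = reducer.get_reduced_df(combined)
# Split the reduced frame back into one DataFrame per friendly name.
per_dataset = {name: reduced.loc[name] for name in datasets}
arrays = reducer.visualizer.get_visualization(per_dataset)
print({name: arr.shape for name, arr in arrays.items()})
```

Reducing the concatenated datasets, as in this sketch, places every dataset in the same projected coordinate system. Reducing each dataset separately would give each one its own axes, which makes a side-by-side comparison (for example, of train and test sets) misleading.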