This functionality is available to customers who have purchased the Data Distiller add on. For more information, contact your Adobe representative.
To meet your feature engineering needs use the SQL transformer extension to simplify and automate your data preprocessing. Use this extension to build features and enjoy seamless experimentation with different feature engineering techniques, including associating them with models. Designed for distributed computing, you can perform feature engineering on large datasets in a parallel and scalable manner, significantly reducing the time required for data preprocessing with the Data Distiller feature engineering SQL extension.
The feature engineering capabilities cover three main areas: Feature Extraction, Feature Transformation, and Feature Selection. Each area includes specific functions designed to extract, convert, focus and improve your data preprocessing.
Extract relevant information from your data, especially text data, and convert it into a numerical format that the supported models can consume or transform and derive datasets. Use the following functions to perform feature extraction:
In addition to extracting features, use the following general transformers to prepare your features for advanced statistical models and derived datasets. Apply scaling, normalization, or encoding to ensure that your features are on the same scale and have a similar distribution.
Below is a list of tools for processing a wide range of data types to enhance your data preprocessing workflow.
true
or false
.Apply these techniques to effectively process and scale numerical data for improved model performance.
Use these transformers to convert and encode categorical data into formats suitable for machine learning models.
Next, focus on selecting a subset of the most important features from the original set. This process helps reduce the dimensionality of your data, making it easier for your models to process and improving overall model performance.
When you define your model, use the OPTIONS
clause to specify the algorithm and its parameters. Begin by setting the type
parameter to indicate the algorithm you are using, such as K-Means
. Then, define the relevant parameters in the OPTIONS
clause as key-value pairs to fine-tune your model. If you choose not to customize certain parameters, the system applies default settings. Refer to the relevant documentation to understand each parameter’s function and default values.
After learning the feature engineering techniques outlined in this document, progress onto the Models document. It guides you through the process of creating, training, and managing trusted models using the features you’ve engineered. Once your models are built, proceed to the Implement advanced statistical models document.. This document serves as an overview, linking to in-depth guides for different modeling techniques, including clustering, classification, and regression. By following these documents, you learn how to configure and implement various trusted models within your SQL workflows and optimize your models for advanced data analysis.