This functionality is available to customers who have purchased the Data Distiller add-on. For more information, contact your Adobe representative.
Statistical modeling is used to make predictions, detect patterns, and generate insights from data. It can be applied in a distributed fashion to large, high-dimensional datasets with complex structures. Use the Data Distiller SQL extension to apply statistical models and to simplify and automate data preprocessing, so that raw data in large datasets is transformed in a timely, parallel, and scalable manner.
This series of documents provides a comprehensive guide on using the Data Distiller SQL extension to perform traditional feature engineering and machine learning operations on Trusted Flow. These documents are designed to help you effectively implement and leverage SQL-based feature engineering, SQL-based model creation, and algorithmic processing. The documentation guides you through the critical aspects necessary to seamlessly integrate advanced statistical modeling into your regular SQL data workflows.
Data Distiller equips you with the tools necessary to transform raw data into meaningful features, build and train statistical models, and use these models for predictive analysis. The documentation is organized to help you understand and apply these capabilities step by step:
Feature engineering: Learn how to preprocess your data by extracting, transforming, and selecting the most relevant features. Learn about the available SQL functions that simplify and automate the feature engineering process, and how to ensure your data is optimally prepared for model training.
Models: Discover how to manage, evaluate, and make predictions with advanced statistical models using SQL. Understand the core SQL processes that define the life cycle of these models on your datasets.
Algorithms: Explore the advanced statistical modeling algorithms supported by Data Distiller, including clustering, classification, and regression. This document details how to use the available algorithms and their parameters, and how to generate customer-specific models with the SQL extension to meet your business needs.
To learn how to perform sophisticated machine learning tasks with Data Distiller capabilities, start with the Feature Engineering document. It outlines how to transform your data into features that are ready for modeling. Next, proceed to the Models document, which guides you through the process of creating, training, and managing trusted models using the features you’ve engineered. Finally, explore the Implement advanced statistical models document to learn about the various trusted models available and how to implement them within your SQL workflows.