Data Science Workspace is no longer available for purchase.
This documentation is intended for existing customers with prior entitlements to Data Science Workspace.
The Model Insights Framework provides the data scientist with tools in Data Science Workspace to make quick and informed choices for optimal machine learning models based on experiments. The framework improves the speed and effectiveness of the machine learning workflow as well as the ease of use for data scientists. This is done by providing a default template for each machine learning algorithm type to assist with model tuning. The end result allows data scientists and citizen data scientists to make better model optimization decisions for their end customers.
After implementing and training a model, the next step for a data scientist is to find out how well the model will perform. Various metrics are used to compare how effective a model is relative to others. Some examples of metrics used include precision, recall, and the confusion matrix.
Currently, the Model Insights Framework supports the following runtimes: Scala (Spark), Python, Tensorflow, and R.
Sample code for recipes can be found in the experience-platform-dsw-reference repository under `recipes`. Specific files from this repository will be referenced throughout this tutorial.
There are two ways to bring metrics into the recipes: one is to use the default evaluation metrics provided by the SDK, and the other is to write custom evaluation metrics.
Default evaluation metrics are calculated as part of the classification algorithms. Here are the default evaluators that are currently implemented:
| Evaluator Class | `evaluation.class` |
| --- | --- |
| DefaultBinaryClassificationEvaluator | `com.adobe.platform.ml.impl.DefaultBinaryClassificationEvaluator` |
| DefaultMultiClassificationEvaluator | `com.adobe.platform.ml.impl.DefaultMultiClassificationEvaluator` |
| RecommendationsEvaluator | `com.adobe.platform.ml.impl.RecommendationsEvaluator` |
The evaluator can be defined in the recipe's `application.properties` file in the recipe folder. Sample code enabling the `DefaultBinaryClassificationEvaluator` is shown below:
evaluation.class=com.adobe.platform.ml.impl.DefaultBinaryClassificationEvaluator
evaluation.labelColumn=label
evaluation.predictionColumn=prediction
training.evaluate=true
After an evaluator class is enabled, a number of metrics will be calculated during training by default. Default metrics can be declared explicitly by adding the following line to your `application.properties`.
evaluation.metrics=com.adobe.platform.ml.impl.Constants.DEFAULT
If the metric is not defined, the default metrics will be active.
A specific metric can be enabled by changing the value of `evaluation.metrics`. In the following example, the F-Score metric is enabled.
evaluation.metrics=com.adobe.platform.ml.impl.Constants.FSCORE
The following table states the default metrics for each class. A user can also use the values in the `evaluation.metric` column to enable a specific metric.
| `evaluator.class` | Default Metrics | `evaluation.metric` |
| --- | --- | --- |
| DefaultBinaryClassificationEvaluator | Precision, Recall, Confusion Matrix, F-Score, Accuracy, Receiver Operating Characteristics, Area Under the Receiver Operating Characteristics | `PRECISION`, `RECALL`, `CONFUSION_MATRIX`, `FSCORE`, `ACCURACY`, `ROC`, `AUROC` |
| DefaultMultiClassificationEvaluator | Precision, Recall, Confusion Matrix, F-Score, Accuracy, Receiver Operating Characteristics, Area Under the Receiver Operating Characteristics | `PRECISION`, `RECALL`, `CONFUSION_MATRIX`, `FSCORE`, `ACCURACY`, `ROC`, `AUROC` |
| RecommendationsEvaluator | Mean Average Precision (MAP), Normalized Discounted Cumulative Gain, Mean Reciprocal Rank, Metric K | `MEAN_AVERAGE_PRECISION`, `NDCG`, `MRR`, `METRIC_K` |
The custom evaluator can be provided by extending the interface of `MLEvaluator.scala` in your `Evaluator.scala` file. In the example `Evaluator.scala` file, we define custom `split()` and `evaluate()` functions. Our `split()` function splits the data randomly with a ratio of 8:2, and our `evaluate()` function defines and returns three metrics: MAPE, MAE, and RMSE.
For the `MLMetric` class, do not use `"measures"` for `valueType` when creating a new `MLMetric`, or the metric will not populate in the custom evaluation metrics table.

Do this: `metrics.add(new MLMetric("MAPE", mape, "double"))`
Not this: `metrics.add(new MLMetric("MAPE", mape, "measures"))`
Once the custom evaluator is defined, the next step is to enable it in the recipe. This is done in the `application.properties` file in the project's `resources` folder. Here the `evaluation.class` is set to the `Evaluator` class defined in `Evaluator.scala`:
evaluation.class=com.adobe.platform.ml.Evaluator
In Data Science Workspace, the user can see the insights in the “Evaluation Metrics” tab on the experiment page.
As of now, there are no default evaluation metrics for Python or Tensorflow. Thus, to get the evaluation metrics for Python or Tensorflow, you will need to create a custom evaluation metric. This can be done by implementing the `Evaluator` class.
For custom evaluation metrics, there are two main methods that need to be implemented for the evaluator: `split()` and `evaluate()`.

For Python, these methods would be defined in evaluator.py for the `Evaluator` class. Follow the evaluator.py link for an example of the `Evaluator`.
Creating evaluation metrics in Python requires the user to implement the `evaluate()` and `split()` methods.
The `evaluate()` method returns the metric object, which contains an array of metric objects with the properties `name`, `value`, and `valueType`.
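For example, assuming the array is represented as a list of Python dictionaries (that representation, the sample values, and the use of NumPy below are illustrative assumptions, not prescribed by the SDK), the structure returned by `evaluate()` might be built like this:

```python
import numpy as np

# Illustrative actual and predicted values; in a real recipe these would come
# from the trained model and the test data produced by split().
y_true = np.array([10.0, 12.0, 15.0])
y_pred = np.array([11.0, 11.5, 14.0])

mae = float(np.mean(np.abs(y_true - y_pred)))
rmse = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Each metric object carries the name, value, and valueType properties.
metrics = [
    {"name": "MAE", "value": mae, "valueType": "double"},
    {"name": "RMSE", "value": rmse, "valueType": "double"},
]
```

The `"double"` value type mirrors the `MLMetric` example shown earlier for the Scala evaluator.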
The purpose of the `split()` method is to input data and to output a training and a testing dataset. In our example, the `split()` method inputs data using the `DataSetReader` SDK and then cleans up the data by removing unrelated columns. From there, additional features are created from existing raw features in the data.
The `split()` method should return a training and a testing dataframe, which are then used by the `pipeline()` methods to train and test the ML model.
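A rough sketch of such a `split()` method is shown below, assuming pandas is available in the runtime. The CSV read stands in for the `DataSetReader` call, and the column names, derived feature, and 80/20 ratio are illustrative placeholders rather than part of the SDK:

```python
import pandas as pd

def split(self, config={}):
    # Placeholder load step; the reference recipe reads its dataset with the
    # DataSetReader SDK instead of a local CSV file.
    df = pd.read_csv(config.get("trainingDataPath", "retail.csv"))

    # Clean up the data by dropping columns unrelated to the prediction task
    # (the column names here are purely illustrative).
    df = df.drop(columns=["storeType", "isHoliday"], errors="ignore")

    # Create an additional feature from an existing raw feature.
    df["week"] = pd.to_datetime(df["date"]).dt.isocalendar().week

    # Return a training and a testing dataframe (random 80/20 split).
    train = df.sample(frac=0.8, random_state=42)
    test = df.drop(train.index)
    return train, test
```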
For Tensorflow, similar to Python, the methods `evaluate()` and `split()` in the `Evaluator` class will need to be implemented. For `evaluate()`, the metrics should be returned, while `split()` returns the train and test data sets.
from ml.runtime.python.Interfaces.AbstractEvaluator import AbstractEvaluator

class Evaluator(AbstractEvaluator):
    def __init__(self):
        print("initiate")

    def evaluate(self, data=[], model={}, config={}):
        # Compute the evaluation metrics here and return them as an array of
        # metric objects with name, value, and valueType properties.
        metrics = []
        return metrics

    def split(self, config={}):
        # Read the input data and return the training and testing data sets.
        return 'train', 'test'
As of now, there are no default evaluation metrics for R. Thus, to get the evaluation metrics for R, you will need to define the `applicationEvaluator` class as part of the recipe.
The main purpose of the `applicationEvaluator` is to return a JSON object containing key-value pairs of metrics.
This applicationEvaluator.R can be used as an example. In this example, the `applicationEvaluator` is split into three familiar sections:
Data is first loaded to a dataset from a source as defined in retail.config.json. From there, the data is cleaned and engineered to fit the machine learning model. Lastly, the model is used to make a prediction using our dataset, and from the predicted values and actual values, metrics are calculated. In this case, MAPE, MAE, and RMSE are defined and returned in the `metrics` object.
The Sensei Model Insights Framework will support one default template for each type of machine learning algorithm. The table below shows common high-level machine learning algorithm classes and corresponding evaluation metrics and visualizations.
| ML Algorithm Type | Evaluation Metrics | Visualizations |
| --- | --- | --- |
| Regression | RMSE, MAPE, MASE, MAE | Predicted vs actual values overlay curve |
| Binary classification | Confusion matrix, Precision-recall, Accuracy, F-score (specifically F1, F2), AUC, ROC | ROC curve and confusion matrix |
| Multi-class classification | Confusion matrix; for each class: precision-recall, accuracy, F-score (specifically F1, F2) | ROC curve and confusion matrix |
| Clustering (with ground truth) | NMI (normalized mutual information score), AMI (adjusted mutual information score), RI (Rand index), ARI (adjusted Rand index), homogeneity score, completeness score, V-measure, FMI (Fowlkes-Mallows index), Purity, Jaccard index | Clusters plot showing clusters and centroids, with relative cluster sizes reflecting the number of data points falling within each cluster |
| Clustering (without ground truth) | Inertia, Silhouette coefficient, CHI (Calinski-Harabasz index), DBI (Davies-Bouldin index), Dunn index | Clusters plot showing clusters and centroids, with relative cluster sizes reflecting the number of data points falling within each cluster |
| Recommendation | Mean Average Precision (MAP), Normalized Discounted Cumulative Gain, Mean Reciprocal Rank, Metric K | TBD |
| TensorFlow use cases | TensorFlow Model Analysis (TFMA) | Deepcompare neural network model comparison/visualization |
| Other/error capture mechanism | Custom metric logic (and corresponding evaluation charts) defined by the model author; graceful error handling in case of template mismatch | Table with key-value pairs for evaluation metrics |