Predictive Model Markup Language (PMML) is an XML-based interchange format that allows machine learning models to be shared easily between applications and systems. Supported models include logistic regression, neural networks, decision trees, naïve Bayes, regression models, and many others. A typical PMML file consists of the following sections:
- Header containing general information
- Data dictionary, describing data types
- Data transformations, specifying steps for normalization, discretization, aggregations, or custom functions
- Model definition, including parameters
- Mining schema, listing the attributes used by the model
- Targets, allowing post-processing of the predicted results
- Output, listing the fields to be returned and other post-processing steps
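To make the sections listed above more concrete, the following is a minimal sketch that parses a small hand-written PMML document with Python's standard `xml.etree.ElementTree` module. The document itself, the field names `x` and `y`, the coefficients, and the helper functions `summarize` and `score` are all assumptions made for illustration; the namespace assumes PMML version 4.4, and optional sections (transformations, targets, output) are left out for brevity. It is not a substitute for a real scoring engine.

```python
import xml.etree.ElementTree as ET

# Assumption: PMML 4.4 namespace; other versions use a different URI.
PMML_NS = {"p": "http://www.dmg.org/PMML-4_4"}

# Hypothetical, hand-written model: y = 1.0 + 2.0 * x
PMML_DOC = """
<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="Illustrative linear regression"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def summarize(pmml_text):
    """Return top-level sections, data dictionary types, and mining fields."""
    root = ET.fromstring(pmml_text)
    sections = [local_name(child.tag) for child in root]
    data_fields = {
        f.get("name"): f.get("dataType")
        for f in root.findall("p:DataDictionary/p:DataField", PMML_NS)
    }
    mining_fields = [
        f.get("name")
        for f in root.findall(".//p:MiningSchema/p:MiningField", PMML_NS)
    ]
    return sections, data_fields, mining_fields


def score(pmml_text, row):
    """Evaluate the first RegressionTable as intercept + sum(coef * value).

    Simplification: exponents, categorical predictors, and
    transformations are ignored.
    """
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", PMML_NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("p:NumericPredictor", PMML_NS):
        result += float(predictor.get("coefficient")) * row[predictor.get("name")]
    return result


if __name__ == "__main__":
    print(summarize(PMML_DOC))
    print(score(PMML_DOC, {"x": 3.0}))
```

In this sketch the mining schema and the model parameters sit inside the model element, while the header and data dictionary are top-level children of the root `PMML` element.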
The generated PMML files can be imported into any PMML-consuming application, such as the Zementis Adaptive Decision and Predictive Analytics (ADAPA) and Universal PMML Plug-in (UPPI) scoring engines; Weka, which has built-in support for regression, general regression, neural network, TreeModel, RuleSetModel, and support vector machine (SVM) models; Spark, which can export k-means clustering, linear regression, ridge regression, lasso, binary logistic regression, and SVM models; and Cascading, which can transform PMML files into an application on Apache Hadoop.
The next generation of PMML is an emerging format called Portable Format for Analytics (PFA), which provides a common interface for deploying complete workflows across environments.