PostgresML is an end-to-end machine learning system. It enables you to train models and make online predictions using only SQL, without your data ever leaving your favorite database.
Motivation behind PostgresML
Deploying machine learning models into existing applications is not straightforward. It involves operating new services, which need to be written in specialized languages with libraries outside the experience of many software engineers. Those services tend to be architected around specialized datastores and hardware that require additional management and know-how. Data access needs to be secure across production and development environments without impeding productivity. This complexity pushes risks and costs beyond acceptable trade-off limits for many otherwise valuable use cases.
PostgresML makes ML simple by moving the code to your data, rather than copying the data all over the place. You train models using simple SQL commands, and you get the predictions in your apps via a mechanism you’re already using: a query over a standard Postgres connection.
Our goal is that anyone with a basic understanding of SQL should be able to build, deploy and maintain machine learning models in production, while receiving the benefits of a high performance machine learning platform. Ultimately, PostgresML aims to be the easiest, safest and fastest way to gain value from machine learning.
Given a Postgres table or a view, PostgresML can train a model with many commonly used algorithms. We currently support the following regression and classification models from Scikit-Learn and XGBoost:
Support Vector Machines
Training a model is then as simple as:
```sql
SELECT * FROM pgml.train(
    'Human-friendly project name',
    'regression',
    '<name of the table or view containing the data>',
    '<name of the column containing the y target values>',
    '<algorithm name>',        -- optional
    '<algorithm hyperparams>'  -- optional
);
```
PostgresML will snapshot the data from the table, train a model with the chosen algorithm, and, if the new model improves on the key performance metric, automatically deploy it to make predictions in production.
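Snapshotting matters because the live table keeps changing after training. The sketch below is a conceptual model in Python, not PostgresML's implementation: `snapshot` and `train_test_split` are hypothetical helpers, and it assumes a fixed random seed drives the train/test split. It shows that a split taken from a frozen snapshot stays reproducible even after new rows land in the live table.

```python
import random
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]  # hypothetical: one row of features plus the target


def snapshot(live_rows: Sequence[Row]) -> Tuple[Row, ...]:
    """Freeze the current contents of the live table (hypothetical helper)."""
    return tuple(live_rows)


def train_test_split(rows: Sequence[Row], test_fraction: float = 0.25,
                     seed: int = 42) -> Tuple[List[Row], List[Row]]:
    """Deterministically shuffle rows with a fixed seed, then split them."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


live_table = [(1.0, 2.0), (2.0, 4.1), (3.0, 6.2), (4.0, 7.9)]
frozen = snapshot(live_table)
first_split = train_test_split(frozen)

# New data arrives in the live table after training...
live_table.append((5.0, 10.1))

# ...but retraining from the snapshot reproduces exactly the same split.
assert train_test_split(frozen) == first_split
```

Without the snapshot, a retrain would see the extra row and produce a different split, which makes it hard to tell whether a changed score comes from the model or from the data.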
Once the model is trained, making predictions is as simple as:
```sql
SELECT pgml.predict('Human-friendly project name', ARRAY[...]) AS prediction_score;
```
ARRAY[...] is the list of feature values for one sample, in the same order as the features used in training. The resulting score can then be used in ordinary queries, for example:
```sql
SELECT
    *,
    pgml.predict(
        'Probability of buying our products',
        ARRAY[
            users.location,
            EXTRACT(EPOCH FROM NOW() - users.created_at),
            users.total_purchases_in_dollars
        ]
    ) AS likely_to_buy_score
FROM users
WHERE company_id = 5
ORDER BY likely_to_buy_score DESC
LIMIT 25;
```
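The ordering rule for ARRAY[...] is easy to get wrong when features are assembled by hand. As a hedged illustration, the hypothetical Python helper below (not part of PostgresML) builds a feature list from an explicit list of training columns, skipping the target column, under the assumption that features must appear in the same order as the training columns. The column names are made up for the example.

```python
from typing import Dict, List, Sequence


def feature_vector(row: Dict[str, float], training_columns: Sequence[str],
                   target_column: str) -> List[float]:
    """Return the row's features in training-column order, minus the target.

    Raises KeyError if a feature used in training is missing from the row,
    instead of silently shifting later values into the wrong positions.
    """
    return [float(row[col]) for col in training_columns if col != target_column]


# Hypothetical training relation: two features followed by the y target.
columns = ["account_age_days", "total_purchases_in_dollars", "bought"]
row = {"total_purchases_in_dollars": 120.5, "account_age_days": 30, "bought": 1}

print(feature_vector(row, columns, target_column="bought"))  # [30.0, 120.5]
```

Driving the order from one column list, rather than from the order in which values happen to be available, keeps prediction inputs aligned with training inputs.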
Take a look below for an example with real data.
Model and data versioning
As data in your database changes, you can retrain the model to get better predictions. With PostgresML, it's as simple as running the pgml.train command again. If the new model scores better, it will automatically be used for predictions; if not, the existing model is kept and continues to serve predictions in your queries. There is also a deployment API if you need to manually manage which model is active. We also snapshot the training data, so models can be retrained deterministically to validate and fix any issues.
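The automatic deployment rule can be modeled as a comparison of key metrics. This Python sketch is illustrative and not PostgresML's code; `should_deploy` is a hypothetical name, and it assumes mean_squared_error as the regression metric (lower is better) and f1 as the classification metric (higher is better). It also assumes, for the sketch only, that the first model trained for a project is always deployed.

```python
from typing import Optional


def should_deploy(task: str, new_score: float,
                  deployed_score: Optional[float]) -> bool:
    """Decide whether a newly trained model replaces the deployed one.

    task is 'regression' (key metric: mean_squared_error, lower is better)
    or 'classification' (key metric: f1, higher is better).
    """
    if deployed_score is None:
        return True  # in this sketch, the first model is always deployed
    if task == "regression":
        return new_score < deployed_score  # strictly lower error wins
    if task == "classification":
        return new_score > deployed_score  # strictly higher f1 wins
    raise ValueError(f"unknown task: {task!r}")


print(should_deploy("regression", 0.8, 1.2))        # True: lower MSE
print(should_deploy("classification", 0.71, 0.74))  # False: lower f1
```

Using a strict comparison means a tie keeps the model that is already serving predictions, matching the "if the model scores better" wording.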
Models are automatically deployed if their key metric (mean_squared_error for regression, where lower is better; f1 for classification, where higher is better) improves over the currently deployed version during training. If you want to manage deploys manually, you can always change which model is currently responsible for making predictions with:
```sql
SELECT pgml.deploy(project_name TEXT, strategy TEXT DEFAULT 'best_score', algorithm_name TEXT DEFAULT NULL)
```
By default, any algorithm qualifies; pass algorithm_name to restrict deployment candidates to a specific algorithm. The strategy parameter accepts the following values:
|Strategy|Description|
|---|---|
|most_recent|The most recently trained model for this project|
|best_score|The model that achieved the best key metric score|
|rollback|The model that was previously deployed for this project|
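The three strategies amount to different ways of picking one model from a project's history. The Python sketch below is a hypothetical model of that selection, not PostgresML's implementation: `TrainedModel`, `choose_model`, the in-memory lists, and the algorithm labels are illustrative, and it assumes each model records its algorithm, a higher-is-better key metric, and when it was trained. The optional `algorithm` argument mirrors the role of `algorithm_name` in `pgml.deploy`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TrainedModel:
    model_id: int
    algorithm: str
    score: float        # key metric, assumed higher-is-better (like f1) here
    trained_order: int  # larger means trained more recently


def choose_model(strategy: str, models: List[TrainedModel],
                 deploy_history: List[TrainedModel],
                 algorithm: Optional[str] = None) -> Optional[TrainedModel]:
    """Return the model a deployment strategy would pick, or None."""
    # Optional filter, analogous to the algorithm_name parameter.
    candidates = [m for m in models
                  if algorithm is None or m.algorithm == algorithm]
    if strategy == "most_recent":
        return max(candidates, key=lambda m: m.trained_order, default=None)
    if strategy == "best_score":
        return max(candidates, key=lambda m: m.score, default=None)
    if strategy == "rollback":
        # deploy_history runs oldest to newest; its last entry is live, so
        # rolling back means returning the entry before it.
        return deploy_history[-2] if len(deploy_history) >= 2 else None
    raise ValueError(f"unknown strategy: {strategy!r}")


models = [
    TrainedModel(1, "linear", 0.70, 1),
    TrainedModel(2, "xgboost", 0.82, 2),
    TrainedModel(3, "svm", 0.75, 3),
]
history = [models[0], models[1]]  # model 2 is currently deployed

print(choose_model("most_recent", models, history).model_id)                  # 3
print(choose_model("best_score", models, history).model_id)                   # 2
print(choose_model("best_score", models, history, algorithm="svm").model_id)  # 3
print(choose_model("rollback", models, history).model_id)                     # 1
```

In this sketch the algorithm filter only narrows most_recent and best_score; rollback simply steps back through the deployment history.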