data from the trenches
the nitty gritty of data science by the experts @ dataiku
Latest
Making Neural Networks Smaller for Better Deployment
Deep learning models are known for their impressive size, but how do you deploy them on edge devices, where size matters?
Vincent Houdebine
Jul 30
Towards Reliable ML Ops with Drift Detectors
Detecting drift is critical when monitoring deployed models. So, how do we detect data drift efficiently?
Simona Maggio
Jul 16
How Do Gradient Boosting Algorithms Handle Categorical Variables?
We review and experiment with the categorical encoding strategies of XGBoost, LightGBM, and CatBoost.
Pierre Louis Saint
Jul 3
The Many Flavors of Gradient Boosting Algorithms
We present the inner workings of XGBoost, LightGBM, and HistGradBoosting and compare their performance. Does one stand out?
Pierre Louis Saint
Jun 19
Hunting for the Optimal AutoML Library
Spoiler alert: on average, all algorithms except grid search produce similarly performant models.
Aimee Coelho
Jun 9
A Primer on Data Drift
When machine learning models stop being relevant, underlying data drift may be the cause. Here, we introduce and review data drift.
Du Phan
May 22
Narrowing the Search: Which Hyperparameters Really Matter?
Studying Hyperparameter Importance to Speed Up Optimization
Aimee Coelho
May 7
A (Slightly) Better Budget Allocation for Hyperband
Rounding operations can cause Hyperband to leave 7% of the available budget unused. We propose a method that reduces the unused budget to 3%.
Alexandre Abraham
Apr 30
Explaining Bias In Your Data
An in-depth review of the causes of unfairness and their roots in data.
Alexandre Landeau
Apr 23
Rediscovering Semi-Supervised Learning
How can you make the most of your unlabeled data? Can traditional semi-supervised techniques boost performance?
Gaëlle Guillou
Apr 9
The Learning Rate Black Magic
Evaluation of the Learning Rate Finder
Simona Maggio
Mar 26
Diverse Mini-Batch Active Learning: A Reproduction Exercise
Lessons learned from reproducing “Diverse Mini-Batch Active Learning”, a strategy mixing uncertainty and diversity techniques.
Alexandre Abraham
Mar 12
A Proactive Look at Active Learning Packages
An introduction to active learning through a quick benchmark of major Python packages: modAL, libact, and alipy.
Alexandre Abraham
Feb 20