data from the trenches

the nitty gritty of data science by the experts @ dataiku

Latest

Making Neural Networks Smaller for Better Deployment

Deep learning models are known for their impressive size, but how do you deploy them on edge devices where size matters?

Vincent Houdebine

Jul 30

Towards Reliable ML Ops with Drift Detectors

Detecting drift is critical when monitoring deployed models. So, how do we efficiently detect data drift?

Jul 16

How Do Gradient Boosting Algorithms Handle Categorical Variables?

We review and experiment with the categorical encoding strategies of XGBoost, LightGBM, and CatBoost.

Pierre Louis Saint

Jul 3

The Many Flavors of Gradient Boosting Algorithms

We present the inner workings of XGBoost, LightGBM, and HistGradBoosting and compare their performance. Does one stand out?

Pierre Louis Saint

Jun 19

Hunting for the Optimal AutoML Library

Spoiler alert: on average, all algorithms except grid search produce similarly performant models.

Jun 9

A Primer on Data Drift

When machine learning models are no longer relevant, underlying data drift may be the cause. Here, we introduce and review data drift.

May 22

Narrowing the Search: Which Hyperparameters Really Matter?

Studying Hyperparameter Importance to Speed Up Optimization

May 7

A (Slightly) Better Budget Allocation for Hyperband

Rounding operations can leave 7% of Hyperband's available budget unused. We propose a method that reduces the unused budget to 3%.

Alexandre Abraham

Apr 30

Explaining Bias In Your Data

An in-depth review of the causes of unfairness and their roots in data.

Alexandre Landeau

Apr 23

Rediscovering Semi-Supervised Learning

How can you make the most of your unlabeled data? Can traditional semi-supervised techniques boost performance?

Gaëlle Guillou

Apr 9

The Learning Rate Black Magic

Evaluation of the Learning Rate Finder

Mar 26

Diverse Mini-Batch Active Learning: A Reproduction Exercise

Lessons learned from reproducing “Diverse Mini-Batch Active Learning”, a strategy mixing uncertainty and diversity techniques.

Alexandre Abraham

Mar 12

A Proactive Look at Active Learning Packages

Introduction to Active Learning through a quick benchmark of major Python packages: modAL, libact, and alipy.

Alexandre Abraham

Feb 20
