Machine learning backend

You can easily connect your favorite machine learning framework to Label Studio using the Label Studio Machine Learning SDK.

This lets you use:

Pre-labeling: Use model predictions to pre-label data (e.g. use on-the-fly model predictions to create rough image segmentations for further manual refinement)
Autolabeling: Create automatic annotations
Online Learning: Update (retrain) your model as new annotations arrive
Active Learning: Perform labeling in active learning mode by selecting only the most complex examples
Prediction Service: Instantly create a running, production-ready prediction service

Tutorials

Create your own ML backend

Check the examples in the label_studio/ml/examples directory.

Quickstart

Here is a quick example tutorial on how to run the ML backend with a simple text classifier:

Clone repo

git clone https://github.com/heartexlabs/label-studio

Setup environment

cd label-studio
pip install -e .
pip install -r label_studio/ml/examples/requirements.txt

Create new ML backend

label-studio-ml init my_ml_backend --script label_studio/ml/examples/simple_text_classifier.py

Start ML backend server
```
label-studio-ml start my_ml_backend
```

Run Label Studio and connect it to the running ML backend:
```
label-studio start text_classification_project --init --template text_sentiment --ml-backends http://localhost:9090
```

You can confirm that the model has connected properly on the /model subpage of the Label Studio UI.

Getting predictions

You should see model predictions in the labeling interface. For example, in an image classification task, the model pre-selects an image class for you to verify.

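
To make the idea of a pre-selected class concrete, here is a minimal sketch of what a single choice prediction might look like as data. It assumes the result layout Label Studio uses for choice-based labeling (a `result` list plus a `score`). The helper `make_choice_prediction` and the tag names `sentiment` and `text` are hypothetical and should be matched to your own labeling config.

```python
from typing import Any, Dict


def make_choice_prediction(label: str, score: float,
                           from_name: str = "sentiment",
                           to_name: str = "text") -> Dict[str, Any]:
    """Build one prediction for a single-choice classification task.

    from_name / to_name are assumptions: they must match the names of the
    choices tag and the data tag in your labeling config.
    """
    return {
        "result": [{
            "from_name": from_name,
            "to_name": to_name,
            "type": "choices",
            "value": {"choices": [label]},
        }],
        # Model confidence for this prediction.
        "score": score,
    }


prediction = make_choice_prediction("Positive", 0.87)
```

The `score` field is the value that score-based sampling strategies can use when deciding which task to show next.
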
Model training

Model training can be triggered manually by clicking the Start Training button on the /model page, or by using an API call:
```
curl -X POST http://localhost:8080/api/train
```
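
The same call can be made from a script. The sketch below uses only the Python standard library and assumes Label Studio is reachable at http://localhost:8080, as in the curl example. The function names `build_train_request` and `trigger_training` are illustrative, not part of any SDK.

```python
import urllib.request


def build_train_request(base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request to the /api/train endpoint with an empty body."""
    url = base_url.rstrip("/") + "/api/train"
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_training(base_url: str = "http://localhost:8080") -> int:
    """Send the request and return the HTTP status code.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(build_train_request(base_url), timeout=10) as response:
        return response.status
```

Calling `trigger_training()` while Label Studio is running should have the same effect as the curl command. The response body is not inspected here.
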
In development mode, training logs are printed to the console. In production mode, runtime logs are available in my_backend/logs/uwsgi.log and RQ training logs in my_backend/logs/rq.log.

Start with Docker Compose

The Label Studio ML scripts include everything you need to create a production-ready ML backend server powered by Docker. The server uses a uWSGI + supervisord stack and handles background training jobs using RQ.

After running this command:

label-studio-ml init my-ml-backend --script label_studio/ml/examples/simple_text_classifier.py

you’ll see the configs needed to build and run a Docker image with docker-compose in the my-ml-backend/ directory.

Some preliminaries:

Ensure all requirements are specified in the my-ml-backend/requirements.txt file, e.g. add
```
scikit-learn
```

Ensure no services are currently running on ports 9090 and 6379 (otherwise change the default ports in my-ml-backend/docker-compose.yml)
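
One way to check the second preliminary is to probe the ports before starting the containers. This is a minimal sketch using the standard library. It only tests whether something accepts TCP connections on the local host, and the helper names `port_in_use` and `busy_ports` are illustrative.

```python
import socket
from typing import Iterable, List


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports: Iterable[int] = (9090, 6379)) -> List[int]:
    """Return the subset of ports that are already taken."""
    return [port for port in ports if port_in_use(port)]
```

If `busy_ports()` returns a non-empty list, either stop the conflicting services or change the ports in my-ml-backend/docker-compose.yml.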

Then, from the my-ml-backend/ directory, run:

docker-compose up

The server starts listening on port 9090, and you can connect it to Label Studio by specifying --ml-backends http://localhost:9090 or through the UI on the Model page.

Active Learning

The process of creating annotated training data for supervised machine learning models is often expensive and time-consuming. Active Learning is a branch of machine learning that seeks to minimize the total amount of data required for labeling by strategically sampling observations that provide new insight into the problem. In particular, Active Learning algorithms seek to select diverse and informative data for annotation (rather than random observations) from a pool of unlabeled data using prediction scores.

Depending on the type of score your model produces, you can select a sampling strategy:

prediction-score-min (min is the best score)
prediction-score-max (max is the best score)
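
The two strategies can be illustrated with a short sketch. It is not Label Studio's internal implementation. It only shows the selection rule, assuming each unlabeled task has one prediction score, and the function name `next_task` and the task ids are hypothetical.

```python
from typing import Dict


def next_task(scores: Dict[int, float], strategy: str) -> int:
    """Pick the id of the next task to label from {task_id: prediction_score}."""
    if not scores:
        raise ValueError("no unlabeled tasks left")
    if strategy == "prediction-score-min":
        # Lowest score first, e.g. when the score is the model's confidence.
        return min(scores, key=scores.get)
    if strategy == "prediction-score-max":
        # Highest score first, e.g. when the score measures uncertainty.
        return max(scores, key=scores.get)
    raise ValueError(f"unknown sampling strategy: {strategy}")


scores = {1: 0.95, 2: 0.51, 3: 0.70}
first_by_min = next_task(scores, "prediction-score-min")
first_by_max = next_task(scores, "prediction-score-max")
```

With these example scores, the min strategy serves task 2 first and the max strategy serves task 1 first.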

Read more about active learning sampling on the task page.
