Machine learning backend
You can connect your favorite machine learning framework to Label Studio using the Label Studio Machine Learning SDK.
This lets you use:
- Pre-labeling: Use model predictions for pre-labeling (e.g. use on-the-fly model predictions to create rough image segmentations for further manual refinement)
- Autolabeling: Create automatic annotations
- Online Learning: Simultaneously update (retrain) your model as new annotations arrive
- Active Learning: Perform labeling in active learning mode, selecting only the most complex examples
- Prediction Service: Instantly create a running, production-ready prediction service
Tutorials
- Create the simplest ML backend
- Text classification with Scikit-Learn
- Transfer learning for images with PyTorch
Create your own ML backend
Check the examples in the label_studio/ml/examples directory.
Quickstart
Here is a quick tutorial on running the ML backend with a simple text classifier:
- Clone repo
git clone https://github.com/heartexlabs/label-studio
- Setup environment
cd label-studio
pip install -e .
cd label_studio/ml/examples
pip install -r requirements.txt
- Create new ML backend
label-studio-ml init my_ml_backend --script label_studio/ml/examples/simple_text_classifier.py
- Start ML backend server
label-studio-ml start my_ml_backend
Run Label Studio connecting it to the running ML backend:
label-studio start text_classification_project --init --template text_sentiment --ml-backends http://localhost:9090
You can confirm that the model has connected properly from the /model subpage in the Label Studio UI.
Getting predictions
You should see model predictions in the labeling interface. For example, in an image classification task the model will pre-select an image class for you to verify.
Model training
Model training can be triggered manually by pushing the Start Training button on the /model page, or by using an API call:
curl -X POST http://localhost:8080/api/train
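If you would rather trigger training from a script, the same call can be made from Python. This is a minimal sketch using only the standard library; it assumes Label Studio is reachable at http://localhost:8080 as in the curl example, and the helper names build_train_request and trigger_training are hypothetical, not part of Label Studio.

```python
import urllib.request


def build_train_request(base_url="http://localhost:8080"):
    """Build (but do not send) the POST request for the training endpoint."""
    url = base_url.rstrip("/") + "/api/train"
    # Empty body: the curl call above does not send a payload either.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_training(base_url="http://localhost:8080", timeout=10):
    """Send the request and return the HTTP status code of the response."""
    request = build_train_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling trigger_training() raises urllib.error.URLError if the server is not running, which makes a failed trigger easy to spot in a script.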
In development mode, training logs are printed to the console. In production mode, runtime logs are available in my_backend/logs/uwsgi.log and RQ training logs in my_backend/logs/rq.log.
Start with docker compose
Label Studio ML scripts include everything you need to create a production-ready ML backend server powered by Docker. The server uses a uWSGI + supervisord stack and handles background training jobs using RQ.
After running this command:
label-studio-ml init my-ml-backend --script label_studio/ml/examples/simple_text_classifier.py
you’ll see in the my-ml-backend/ directory the configs needed to build and run the Docker image using docker-compose.
Some preliminaries:
- Ensure all requirements are specified in the my-ml-backend/requirements.txt file, e.g. add scikit-learn
- Ensure no services are currently running on ports 9090 and 6379 (otherwise change the default ports in my-ml-backend/docker-compose.yml)
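One way to check the port preliminary above before starting the containers is to try a TCP connection to each port. The following is a minimal sketch using the standard library; the helper name port_in_use is hypothetical, and it only checks the local loopback address by default.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, port_in_use(9090) or port_in_use(6379) returning True suggests that the corresponding default port should be changed in my-ml-backend/docker-compose.yml.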
Then, from the my-ml-backend/ directory, run:
docker-compose up
The server starts listening on port 9090, and you can connect it to Label Studio by specifying --ml-backends http://localhost:9090 or via the UI on the Model page.
Active Learning
The process of creating annotated training data for supervised machine learning models is often expensive and time-consuming. Active Learning is a branch of machine learning that seeks to minimize the total amount of data required for labeling by strategically sampling observations that provide new insight into the problem. In particular, Active Learning algorithms seek to select diverse and informative data for annotation (rather than random observations) from a pool of unlabeled data using prediction scores.
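To make the idea of an "informative" observation concrete, here is a small sketch of one common scoring rule, entropy-based uncertainty sampling. It is an illustration of the general technique, not a description of how Label Studio computes scores; the pool dictionary and its item ids are made up for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain(pool, k=1):
    """Return the k item ids whose predicted distribution has the highest entropy.

    `pool` maps a hypothetical item id to its predicted class probabilities.
    """
    ranked = sorted(pool, key=lambda item_id: entropy(pool[item_id]), reverse=True)
    return ranked[:k]


pool = {
    "a": [0.98, 0.02],  # confident prediction, low entropy
    "b": [0.55, 0.45],  # close to the decision boundary, high entropy
    "c": [0.80, 0.20],
}
print(most_uncertain(pool, k=2))  # ['b', 'c']
```

Items near the decision boundary are selected first, since labeling them is expected to tell the model more than labeling items it already classifies confidently.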
Depending on the score type, you can select a sampling strategy:
- prediction-score-min (min is the best score)
- prediction-score-max (max is the best score)
Read more about active learning sampling on the task page.
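The difference between the two strategies can be sketched as a simple ordering of tasks by score. This is only an illustration under assumptions: the function next_tasks and the task keys "id" and "score" are hypothetical and do not reflect Label Studio internals.

```python
def next_tasks(tasks, strategy="prediction-score-min", k=1):
    """Order tasks so the 'best' prediction score comes first and return k of them.

    Each task is a dict with the hypothetical keys "id" and "score".
    """
    if strategy == "prediction-score-min":
        reverse = False  # lowest score first
    elif strategy == "prediction-score-max":
        reverse = True  # highest score first
    else:
        raise ValueError(f"unknown sampling strategy: {strategy}")
    return sorted(tasks, key=lambda task: task["score"], reverse=reverse)[:k]
```

If the score is a model confidence, prediction-score-min surfaces the least confident tasks first; if the score is already an uncertainty measure, prediction-score-max does the same job.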