Tasks

Basic format

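Label Studio expects a JSON-formatted list of tasks as input. Each task is a dictionary-like structure in which some keys are reserved for internal use; the example below shows each of them:

- "id": the task identifier. It is reserved, so avoid setting it when importing tasks.
- "data": the task data. It must contain the fields referenced by the labeling config and can optionally include other fields.
- "completions": a list of annotation results that match the labeling config schema.
- "predictions": a list of model predictions, similar to "completions" but with additional ML-related fields such as "score".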

Note: if the "data" field is missing from an imported task object, the whole task body is interpreted as task["data"]. For example, [{"my_key": "my_value"}] is internally converted to [{"data": {"my_key": "my_value"}}].
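The normalization rule in the note above can be sketched in Python as follows. This is a minimal illustration of the described behavior, assuming tasks arrive as a JSON list of objects. normalize_task and normalize_tasks are hypothetical helper names, not Label Studio's internal code.

```python
import json


def normalize_task(task: dict) -> dict:
    """Return the task unchanged if it has "data"; otherwise wrap the whole body."""
    if "data" in task:
        return task
    # No "data" key: the entire task body is treated as task["data"]
    return {"data": task}


def normalize_tasks(raw_json: str) -> list:
    """Parse a JSON list of tasks and normalize each one."""
    return [normalize_task(task) for task in json.loads(raw_json)]


print(normalize_tasks('[{"my_key": "my_value"}]'))
# [{'data': {'my_key': 'my_value'}}]
```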

Example

Here is an example labeling config and a one-element task list for a text classification project:

<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
[{
  # "id" is a reserved field, avoid using it when importing tasks
  "id": 123,

  # "data" must contain the "my_text" field referenced by the labeling config,
  # and can optionally include other fields
  "data": {
    "my_text": "Opossum is great",
    "ref_id": 456,
    "meta_info": {
      "timestamp": "2020-03-09 18:15:28.212882",
      "location": "North Pole"
    } 
  },

  # "completions" is a list of annotation results matching the labeling config schema
  "completions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Positive"]
      }
    }]
  }],

  # "predictions" are similar to "completions",
  # except that they can also include ML-related fields such as the prediction "score"
  "predictions": [{
    "result": [{
      "from_name": "sentiment_class",
      "to_name": "message",
      "type": "choices",
      "value": {
        "choices": ["Neutral"]
      }
    }],
    # "score" is used for active learning sampling modes
    "score": 0.95
  }]
}]
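The example above uses # comments for explanation, so it is not valid JSON as written. Here is a minimal Python sketch for producing an importable file instead. It builds the task as a dict, checks that task["data"] contains every $variable referenced in the labeling config, and serializes the list with json. It assumes data keys appear in the config only as value="$key" attributes. required_data_keys, missing_keys and write_tasks are hypothetical helpers, and "id" is left out because it is reserved.

```python
import json
import re
import xml.etree.ElementTree as ET

LABEL_CONFIG = """
<View>
  <Text name="message" value="$my_text"/>
  <Choices name="sentiment_class" toName="message">
    <Choice value="Positive"/>
    <Choice value="Neutral"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""


def required_data_keys(config_xml: str) -> set:
    """Collect keys referenced as value="$key" anywhere in the labeling config."""
    keys = set()
    for element in ET.fromstring(config_xml.strip()).iter():
        match = re.fullmatch(r"\$(\w+)", element.get("value", ""))
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(task: dict, config_xml: str) -> set:
    """Config keys that the task's "data" dict does not provide."""
    return required_data_keys(config_xml) - set(task.get("data", {}))


def write_tasks(path: str, tasks: list) -> None:
    """Save tasks as a plain JSON list (no comments) for --input-path."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(tasks, f, indent=2)


# The example task without comments and without the reserved "id" field
task = {
    "data": {
        "my_text": "Opossum is great",
        "ref_id": 456,
        "meta_info": {"timestamp": "2020-03-09 18:15:28.212882", "location": "North Pole"},
    },
    "predictions": [{
        "result": [{
            "from_name": "sentiment_class",
            "to_name": "message",
            "type": "choices",
            "value": {"choices": ["Neutral"]},
        }],
        "score": 0.95,
    }],
}

print(missing_keys(task, LABEL_CONFIG))  # set()
```

Calling write_tasks("my_tasks.json", [task]) then produces a file that can be passed to label-studio init --input-path=my_tasks.json.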

Import formats

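There are several ways to import data files into your labeling project: from the command line with label-studio init, by uploading files on the Import page of the web UI, or through the API.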

The --input-path argument points to a file or a directory where your labeling tasks reside. By default it expects JSON-formatted tasks, but you can use any of the other formats listed below by setting the --input-format option.

JSON

label-studio init --input-path=my_tasks.json

my_tasks.json contains tasks in the basic Label Studio JSON format.

Directory with JSON files

label-studio init --input-path=dir/with/json/files --input-format=json-dir

Instead of putting all tasks into one file, you can split your input data into several JSON files and specify the directory path. Each JSON file contains tasks in the basic Label Studio JSON format.

Note: if you add more files to the directory, you need to restart the Label Studio server.
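Here is a sketch for preparing such a directory. It splits one task list into several JSON files, each holding a list in the basic format. The file naming scheme (tasks_0000.json and so on) and chunk_size are hypothetical choices. The sketch assumes Label Studio reads every JSON file in the directory passed to --input-path with --input-format=json-dir.

```python
import json
import tempfile
from pathlib import Path


def split_into_json_dir(tasks: list, out_dir: str, chunk_size: int = 100) -> list:
    """Write tasks to out_dir as several JSON lists of at most chunk_size tasks."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(tasks), chunk_size):
        # Hypothetical naming scheme: tasks_0000.json, tasks_0001.json, ...
        path = directory / f"tasks_{start // chunk_size:04d}.json"
        path.write_text(json.dumps(tasks[start:start + chunk_size]), encoding="utf-8")
        paths.append(path)
    return paths


# Demo in a temporary directory: 250 tasks split into chunks of 100
demo_tasks = [{"data": {"my_text": f"task {i}"}} for i in range(250)]
with tempfile.TemporaryDirectory() as tmp:
    written = split_into_json_dir(demo_tasks, tmp, chunk_size=100)
    sizes = [len(json.loads(p.read_text(encoding="utf-8"))) for p in written]

print(sizes)  # [100, 100, 50]
```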

CSV / TSV

When a CSV / TSV file is used, column names are interpreted as task data keys:

my_text,optional_field
this is a first task,123
this is a second task,456

Note: currently, CSV / TSV files can be imported only through the UI.

Note: if your labeling config has one TimeSeries tag, an imported CSV/TSV file is interpreted as time series data. Label Studio hosts the file as a resource and automatically creates a task with a link to the uploaded CSV/TSV.
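Because CSV / TSV files can currently be imported only through the UI, one option for the command line is to convert them to JSON first. Here is a minimal Python sketch using the standard csv module, assuming the first row holds the column names that become task["data"] keys. It does not cover the TimeSeries case above, and Label Studio's own importer may treat value types differently (csv returns every value as a string). csv_to_tasks is a hypothetical helper name.

```python
import csv
import io
import json


def csv_to_tasks(csv_text: str, delimiter: str = ",") -> list:
    """Convert CSV/TSV text into tasks; column names become task["data"] keys."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    # csv yields strings, so optional_field becomes "123", not 123
    return [{"data": dict(row)} for row in reader]


sample = """my_text,optional_field
this is a first task,123
this is a second task,456
"""

tasks = csv_to_tasks(sample)
print(json.dumps(tasks[0]))
# {"data": {"my_text": "this is a first task", "optional_field": "123"}}
```

For a TSV file, pass delimiter="\t". The resulting list can be saved with json.dump and passed to --input-path.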

Plain text

label-studio init my-project --input-path=my_tasks.txt --input-format=text --label-config=config.xml

If you use only one input data stream (in other words, only one object tag in your labeling config), you don't need the JSON format. Simply write your values to a plain text file, one per line, e.g.:

this is a first task
this is a second task

Directory with plain text files

label-studio init my-project --input-path=dir/with/text/files --input-format=text-dir --label-config=config.xml

You can split your input data into several plain text files, and specify the directory path. Then Label Studio scans each file line-by-line, creating one task per line. Each plain text file is formatted the same as above.
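Here is a sketch of the one-task-per-line behavior for both a single file and a directory. It assumes each line becomes the value of the single data key read by the config's object tag (for example my_text from $my_text). It also assumes the directory is scanned non-recursively and in name order. The helper names are hypothetical, and empty lines are kept as-is because the sketch does not assume how Label Studio treats them.

```python
import tempfile
from pathlib import Path


def text_file_to_tasks(path: Path, data_key: str) -> list:
    """One task per line; data_key is the key the config reads, e.g. "my_text"."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return [{"data": {data_key: line}} for line in lines]


def text_dir_to_tasks(directory: str, data_key: str) -> list:
    """Apply text_file_to_tasks to every file in directory, in name order."""
    tasks = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            tasks.extend(text_file_to_tasks(path, data_key))
    return tasks


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("this is a first task\nthis is a second task\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("this is a third task\n", encoding="utf-8")
    tasks = text_dir_to_tasks(tmp, "my_text")

print(len(tasks))  # 3
```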

Directory with image files

label-studio init my-project --input-path=dir/with/images --input-format=image-dir --label-config=config.xml --allow-serving-local-files

WARNING: "--allow-serving-local-files" is intended only for locally running instances. Avoid using it on remote servers unless you are sure what you are doing.

You can point to a local directory, which is scanned recursively for image files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:

http://<host:port>/data/filename?d=<path/to/the/local/directory>

Supported formats are: .png .jpg .jpeg .tiff .bmp .gif

Directory with audio files

label-studio init my-project --input-path=my/audios/dir --input-format=audio-dir --label-config=config.xml --allow-serving-local-files

WARNING: "--allow-serving-local-files" is intended only for locally running instances. Avoid using it on remote servers unless you are sure what you are doing.

You can point to a local directory, which is scanned recursively for audio files. Each file is used to create one task. Since Label Studio works only with URLs, a web link is created for each task, pointing to your local directory as follows:

http://<host:port>/data/filename?d=<path/to/the/local/directory>

Supported formats are: .wav .aiff .mp3 .au .flac
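Both directory importers build one URL per file. The Python sketch below reproduces the URL pattern shown above for either media type and URL-encodes both parts. Several details are assumptions not confirmed by this page:

- the filename part is the file's base name;
- d is the directory containing the file, so nested files keep their own directory;
- extensions match case-insensitively;
- the data key (here image) matches the config's $variable.

media_dir_to_tasks is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}
AUDIO_EXTENSIONS = {".wav", ".aiff", ".mp3", ".au", ".flac"}


def media_dir_to_tasks(directory: str, extensions: set, data_key: str,
                       host: str = "localhost:8080") -> list:
    """Recursively scan directory and create one URL task per supported file."""
    tasks = []
    for path in sorted(Path(directory).rglob("*")):
        # Assumption: extensions are matched case-insensitively
        if path.is_file() and path.suffix.lower() in extensions:
            # Pattern: http://<host:port>/data/filename?d=<path/to/the/local/directory>
            url = f"http://{host}/data/{quote(path.name)}?d={quote(str(path.parent))}"
            tasks.append({"data": {data_key: url}})
    return tasks


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "cat.jpg").touch()
    (root / "sub" / "dog.PNG").touch()
    (root / "notes.txt").touch()
    image_tasks = media_dir_to_tasks(tmp, IMAGE_EXTENSIONS, "image")

for t in image_tasks:
    print(t["data"]["image"])
```

For audio, pass AUDIO_EXTENSIONS and a key such as audio that matches your config.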

Upload resource files on Import page

For labeling configs with one data key (e.g., one input image), Label Studio supports file uploads through the GUI. Drag and drop your files (or select them in the file dialog) on the "Import" page. This option is suitable for a limited number of files.

Import using API

Use the API to import tasks in the basic Label Studio format if, for any reason, you can access neither the local filesystem nor the web UI (e.g., if you are creating a data stream):

curl -X POST -H Content-Type:application/json http://localhost:8080/api/import \
--data "[{\"my_key\": \"my_value_1\"}, {\"my_key\": \"my_value_2\"}]"
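The same request can be made from Python's standard library. This sketch only builds the urllib.request.Request, so it can be inspected without a running server. The base URL matches the curl example, and build_import_request is a hypothetical helper name. The response format is not described here, so the sketch does not parse it.

```python
import json
import urllib.request


def build_import_request(tasks: list,
                         base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST /api/import request carrying tasks as a JSON body."""
    return urllib.request.Request(
        f"{base_url}/api/import",
        data=json.dumps(tasks).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_import_request([{"my_key": "my_value_1"}, {"my_key": "my_value_2"}])
print(req.get_method(), req.full_url)  # POST http://localhost:8080/api/import

# Against a running server, send it with:
# with urllib.request.urlopen(req) as response:
#     print(response.status)
```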

Retrieve tasks using API

You can retrieve the project settings, including the total task count, in JSON format through the API:

http://<host:port>/api/project

Response example:

{
  ... 
  "task_count": 3,
  ...
}

To get tasks page by page in JSON format (order is id or completed_at, optionally prefixed with - to reverse the sort order):

http://<host:port>/api/tasks?page=1&page_size=10&order={-}[id|completed_at]

Response example:

[
  {
    "completed_at": "2020-05-29 03:31:15", 
    "completions": [
      {
        "created_at": 1590712275, 
        "id": 10001, 
        "lead_time": 4.0, 
        "result": [ ... ]
      }
    ], 
    "data": {
      "image": "s3://htx-dev/dataset/training_set/dogs/dog.102.jpg"
    }, 
    "id": 2, 
    "predictions": []
  }
]
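To collect every task, you can combine the two endpoints: read task_count from /api/project, then request each page of /api/tasks. This sketch assumes pages are numbered from 1 (as in the URL above) and that the total count stays stable while paging. fetch_all_tasks and its injectable get_json parameter are hypothetical, and fake_get_json stands in for a server so the logic can run offline.

```python
import json
import math
import urllib.parse
import urllib.request
from typing import Any, Callable


def http_get_json(url: str) -> Any:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def fetch_all_tasks(base_url: str = "http://localhost:8080", page_size: int = 10,
                    get_json: Callable[[str], Any] = http_get_json) -> list:
    """Read task_count from /api/project, then request every page of /api/tasks."""
    task_count = get_json(f"{base_url}/api/project")["task_count"]
    tasks = []
    for page in range(1, math.ceil(task_count / page_size) + 1):
        url = f"{base_url}/api/tasks?page={page}&page_size={page_size}&order=id"
        tasks.extend(get_json(url))
    return tasks


# Offline stand-in for a server holding three tasks (hypothetical data)
FAKE_TASKS = [{"id": i, "data": {}} for i in (1, 2, 3)]


def fake_get_json(url: str) -> Any:
    parsed = urllib.parse.urlparse(url)
    if parsed.path == "/api/project":
        return {"task_count": len(FAKE_TASKS)}
    query = urllib.parse.parse_qs(parsed.query)
    page, size = int(query["page"][0]), int(query["page_size"][0])
    return FAKE_TASKS[(page - 1) * size:page * size]


demo = fetch_all_tasks(page_size=2, get_json=fake_get_json)
print([t["id"] for t in demo])  # [1, 2, 3]
```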

Sampling

You can define how your imported tasks are presented to annotators. Several options are available. To enable one of them, pass --sampling=<option> on the command line.

sequential

Tasks are ordered ascending by their "id" field. This is the default mode.

uniform

Tasks are sampled with equal probabilities.

prediction-score-min

The task with the minimum average prediction score is taken next. When this option is set, each task should include a task["predictions"] list, with a "score" field in each prediction.

prediction-score-max

The task with the maximum average prediction score is taken next. When this option is set, each task should include a task["predictions"] list, with a "score" field in each prediction.
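The selection rules above can be sketched as follows. This illustrates the described modes and is not Label Studio's scheduler. It assumes tasks is the list of tasks still awaiting annotation, and that "average prediction score" means the mean of the "score" fields across a task's predictions. next_task and average_score are hypothetical names.

```python
import random
from statistics import mean
from typing import Optional


def average_score(task: dict) -> float:
    """Mean of the "score" fields across a task's predictions."""
    return mean([prediction["score"] for prediction in task["predictions"]])


def next_task(tasks: list, sampling: str = "sequential",
              rng: Optional[random.Random] = None) -> dict:
    """Pick the next task from tasks (assumed: those still awaiting annotation)."""
    if sampling == "sequential":
        return min(tasks, key=lambda task: task["id"])
    if sampling == "uniform":
        return (rng or random.Random()).choice(tasks)
    if sampling == "prediction-score-min":
        return min(tasks, key=average_score)
    if sampling == "prediction-score-max":
        return max(tasks, key=average_score)
    raise ValueError(f"unknown sampling mode: {sampling}")


# Trimmed tasks: only the fields these modes read
tasks = [
    {"id": 3, "predictions": [{"score": 0.95}]},
    {"id": 1, "predictions": [{"score": 0.40}, {"score": 0.60}]},
    {"id": 2, "predictions": [{"score": 0.10}]},
]
print(next_task(tasks)["id"])                          # 1
print(next_task(tasks, "prediction-score-min")["id"])  # 2
print(next_task(tasks, "prediction-score-max")["id"])  # 3
```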