Module Development

Note

This document has been machine translated.

Directory Structure

The directory structure of a module should follow the example below.

xdata-algo-modules/
├── module-1/  # Directory of another module project
├── module-2/  # Target module project directory
│   ├── src/  # Source code directory
│   │   ├── definitions/
│   │   │   └── definition.json  # File describing module definition information
│   │   └── xxxxx.py  # Source code to be executed within the module
│   ├── tests/  # Test code directory
│   │   └── test_xxxxx.py  # Test script
│   ├── .flake8  # Flake8 configuration file
│   ├── xxxxx-ci.yml  # Job configuration for GitLab Runner (per module)
│   ├── docker-compose.yml  # Docker‑compose file to launch the module container
│   ├── docker-compose.dev.yml  # Development‑time Docker‑compose file
│   ├── Dockerfile  # Dockerfile for module execution
│   ├── Dockerfile.dev  # Dockerfile for linter/formatter execution
│   ├── Dockerfile.test  # Dockerfile for test execution
│   ├── Makefile  # File defining various commands
│   ├── pyproject.toml  # Configuration file for tools such as linter
│   └── requirements.txt  # List of packages required at runtime
└── .gitlab-ci.yml  # Configuration file for GitLab Runner execution
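
The layout above can be checked mechanically before a module is committed. The following is a minimal sketch, not part of the official tooling: the function name `missing_entries` and the list `EXPECTED_ENTRIES` are hypothetical, and the placeholder files (`xxxxx.py`, `test_xxxxx.py`, `xxxxx-ci.yml`) are left out because their real names differ per module.

```python
from pathlib import Path

# Entries taken from the directory tree above. Files shown with the
# placeholder name "xxxxx" are omitted because their names vary per module.
EXPECTED_ENTRIES = [
    "src/definitions/definition.json",
    "tests",
    ".flake8",
    "docker-compose.yml",
    "docker-compose.dev.yml",
    "Dockerfile",
    "Dockerfile.dev",
    "Dockerfile.test",
    "Makefile",
    "pyproject.toml",
    "requirements.txt",
]


def missing_entries(module_dir):
    """Return the expected entries that do not exist under module_dir."""
    root = Path(module_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_entries("module-2")` from the repository root would return an empty list for a module that follows the structure completely.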

Source Code to Execute Within the Module

Place the source code that should be launched when the module runs in the src directory.
The primary language is Python, but other languages may be used if necessary.

definition.json

To run a module in xData-FL, describe the module in
src/definitions/definition.json.
The definition file has the following structure (subject to change):

{
  "name": "<Module name. Use a fully qualified, dot-separated name (similar to a Java package name) so that the module is uniquely identifiable>",
  "description": "<Module description>",
  "supports": "<TBD list of features supported by the module>",
  "definitions": [
    "Comment: <Define parameter types here>",
    {
      "name": "<Type name. Must be unique within definitions>",
      "description": "<Description of the type>",
      "definition": {
        "type": "<Data kind: model, model_set, data, others>",
        "form": "<file, directory, STDIN/STDOUT; if type is model_set only directory is allowed>",
        "files": [
          "Comment: <When form is directory, define the files inside the directory here>",
          {
            "name": "<Name of input/output data (label)>",
            "path": "<File path within the directory; wildcards allowed>",
            "form": "<file or directory>",
            "files": "<When form is directory, define files inside the directory>",
            "format": {
              "": "Comment: <When form is file, define the file format here>",
              "type": "<File type such as CSV, JSON, Movie, etc.>",
              "structure": "<Structure of the file; varies by type>"
            }
          }
        ],
        "format": {
          "": "Comment: <When form is file, define the file format here>",
          "type": "<File type such as CSV, JSON, Movie, etc.>",
          "structure": "<Structure of the file; varies by type>"
        }
      }
    }
  ],
  "scripts": [
    {
      "name": "<Script label. When calling a Web API, provide the module name and this script name to specify the desired operation. The script name must be unique within the module. Combining the module name and script name uniquely identifies the script across modules>",
      "description": "<Human‑readable description of the script (or for an LLM)>",
      "path": "<Relative path from the module's top directory to the script file>",
      "type": "<Script type: training (input model & data, output model), prediction (input model & data, output data [and optional model update]), aggregation (multiple models → model), model_transformation (model → model), data_transformation (data → data)>",
      "input": [
        "Comment: <Define input data here>",
        {
          "name": "<Input/output data label. Used when calling a Web API to specify which table the data is retrieved from or stored to. The name must be unique within the script>",
          "description": "<Human‑readable description of the data>",
          "path": "<Relative path within the input/output directory>",
          "paramtype": "<Name of the parameter type (string) or the definition itself (associative array)>"
        }
      ],
      "output": [
        "Comment: <Define output data here>",
        {
          "name": "<Output/input data label. Used when calling a Web API to specify which table the data is retrieved from or stored to. The name must be unique within the script>",
          "description": "<Human‑readable description of the data>",
          "path": "<Relative path within the input/output directory>",
          "paramtype": "<Name of the parameter type (string) or the definition itself (associative array)>"
        }
      ]
    }
  ]
}

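Example of a module definition JSON file. Unlike the template above, this example writes definitions, scripts, input, and output as objects keyed by name instead of arrays: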

{
  "name": "module.fed-avg-module.aggregate",
  "description": "FedAvg aggregate module",
  "supports": "",
  "definitions": {
    "the_model": {
      "description": "Model before and after aggregation",
      "definition": {
        "type": "model",
        "form": "directory",
        "files": [
          {
            "name": "model_data",
            "path": "model.dat.xdata_meta",
            "form": "file",
            "format": {
              "type": "json",
              "structure": ""
            }
          },
          {
            "name": "model_file",
            "path": "model.dat",
            "form": "file",
            "format": {
              "type": "state_dict",
              "structure": ""
            }
          }
        ]
      }
    }
  },
  "scripts": {
    "fed_avg_aggregate": {
      "description": "FedAvg aggregate",
      "path": "aggregate",
      "type": "aggregation",
      "input": {
        "input_model_set": {
          "description": "input models",
          "path": "models",
          "paramtype": {
            "type": "model_set",
            "form": "directory",
            "paramtype": "the_model"
          }
        }
      },
      "output": {
        "output_model": {
          "description": "output models",
          "path": "aggregated_model",
          "paramtype": "the_model"
        }
      }
    }
  }
}
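
In the example above, a `paramtype` is either the name of an entry in `definitions` or an inline definition. The sketch below shows one way a consumer might resolve both forms; it assumes the keyed-by-name layout of the example, and the helper names `resolve_paramtype` and `describe_scripts` as well as the trimmed `EXAMPLE` dictionary are hypothetical, not part of xData-FL.

```python
def resolve_paramtype(paramtype, definitions):
    """Return the definition dict that a paramtype refers to.

    A string is looked up by name in `definitions`; a dict is treated as an
    inline definition and returned unchanged.
    """
    if isinstance(paramtype, str):
        if paramtype not in definitions:
            raise KeyError(f"unknown parameter type: {paramtype}")
        return definitions[paramtype]["definition"]
    return paramtype


def describe_scripts(module_def):
    """Map each script name to the data kind of every input and output."""
    definitions = module_def.get("definitions", {})
    summary = {}
    for script_name, script in module_def.get("scripts", {}).items():
        summary[script_name] = {
            direction: {
                label: resolve_paramtype(item["paramtype"], definitions)["type"]
                for label, item in script.get(direction, {}).items()
            }
            for direction in ("input", "output")
        }
    return summary


# Trimmed copy of the example definition, reduced to the fields used here.
EXAMPLE = {
    "definitions": {
        "the_model": {"definition": {"type": "model", "form": "directory"}}
    },
    "scripts": {
        "fed_avg_aggregate": {
            "type": "aggregation",
            "input": {
                "input_model_set": {
                    "path": "models",
                    "paramtype": {
                        "type": "model_set",
                        "form": "directory",
                        "paramtype": "the_model",
                    },
                }
            },
            "output": {
                "output_model": {"path": "aggregated_model", "paramtype": "the_model"}
            },
        }
    },
}

print(describe_scripts(EXAMPLE))
```

For the trimmed example this reports `model_set` for `input_model_set` and `model` for `output_model`. A reference to a name that is missing from `definitions` raises `KeyError`, which makes typos in `paramtype` visible early.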

Test Code

Place test code for the source files in the tests directory.
Add test files as needed so that the CI/CD pipeline can run them.
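
As an illustration of what a pytest-style test file might contain, the sketch below tests a simplified weighted averaging function. Everything here is an assumption made for the example: the function `fed_avg` is a pure-Python stand-in that averages lists of floats, whereas a real aggregation module would presumably operate on the model files in its input directory. In a module project the function would live under src/ and the tests under tests/.

```python
def fed_avg(models, weights=None):
    """Average parameter lists element-wise, optionally weighted.

    `models` is a list of dicts that map a parameter name to a list of floats.
    `weights` would typically hold each client's sample count; equal weights
    are used when it is omitted.
    """
    if not models:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("weights and models must have the same length")
    total = float(sum(weights))
    averaged = {}
    for name in models[0]:
        length = len(models[0][name])
        averaged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) / total
            for i in range(length)
        ]
    return averaged


def test_fed_avg_equal_weights():
    models = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    assert fed_avg(models) == {"w": [2.0, 3.0]}


def test_fed_avg_weighted():
    models = [{"w": [0.0]}, {"w": [4.0]}]
    assert fed_avg(models, weights=[1, 3]) == {"w": [3.0]}
```

pytest collects functions whose names start with `test_` from files named `test_*.py`, so no extra registration is needed.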

.flake8

Configuration file for the Flake8 linter.
Specify it as follows:

[flake8]
max-line-length = 88
extend-ignore = E203,E402
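
These settings keep Flake8 compatible with Black: 88 is Black's default line length, E203 is "whitespace before ':'", and E402 is "module level import not at top of file". The snippet below is a small, made-up illustration of code that can trigger both ignored checks; whether E203 is actually reported for the slice depends on the installed pycodestyle version.

```python
import os
import sys

# Running code before a later import makes that import trigger E402
# (module level import not at top of file).
sys.path.insert(0, os.path.join(os.getcwd(), "src"))

import json  # reported as E402 unless the check is ignored

values = list(range(10))
offset = 2

# Black puts spaces around ":" when the slice bounds are expressions,
# which Flake8 can report as E203 (whitespace before ':').
window = values[offset + 1 : offset + 4]
print(json.dumps(window))  # [3, 4, 5]
```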

Job Configuration for GitLab Runner (Per Module)

Define each module's jobs in a YAML file that the parent .gitlab-ci.yml includes.
The file name is arbitrary. Paths used inside jobs are relative to the parent .gitlab-ci.yml, which is why the jobs below first change into the module directory.

fed-avg-lint-format-check:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir flake8 black
    - flake8 .
    - black . --check

fed-avg-unit-test-cpu:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  variables:
    MODEL_DIR: ./test_models
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir -r requirements.txt
    - pip install --no-cache-dir pytest
    - pytest

docker-compose.yml

Docker‑compose file to launch the module container.

version: '3'
services:
  aggregate:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./mnist_models:/work/models
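
Because the Dockerfile sets the working directory to /work, the relative MODEL_DIR value ./models resolves to /work/models, the mount point of the volume. A module script might read the variable roughly as follows. This is a sketch under assumptions: the function name `find_model_files` is hypothetical, the file name model.dat is borrowed from the definition example, and the directory layout under MODEL_DIR is not specified by this document.

```python
import os
from pathlib import Path


def find_model_files(model_dir=None, filename="model.dat"):
    """Return sorted paths of `filename` found anywhere below the model directory.

    The directory comes from the argument, then from the MODEL_DIR
    environment variable, and finally falls back to ./models.
    """
    base = Path(model_dir or os.environ.get("MODEL_DIR", "./models"))
    if not base.is_dir():
        return []
    return sorted(base.rglob(filename))
```

Reading the location from the environment lets the same script run unchanged in the module container, in the test container, and in the CI job, each of which sets MODEL_DIR to its own directory.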

docker-compose.dev.yml

Docker‑compose file to launch containers for linting, formatting, and test execution.

version: '3'
services:
  test_aggregate:
    build:
      context: .
      dockerfile: Dockerfile.test
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
      - ./test_models:/work/models
  format:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: black .
  lint:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: flake8 .

Dockerfile

Dockerfile for module execution and build.

FROM python:3.10-slim-bullseye

WORKDIR /work

COPY ./requirements.txt /work/
RUN pip install --no-cache-dir -r requirements.txt

COPY ./src/aggregate.py /work/

CMD ["python", "aggregate.py"]

Dockerfile.dev

Dockerfile for development tools such as linter and formatter.

FROM python:3.10-slim-bullseye

RUN pip install --no-cache-dir flake8 black
COPY ./pyproject.toml /work/
COPY ./.flake8 /work/

WORKDIR /work

Dockerfile.test

Dockerfile for test execution.

FROM python:3.10-slim-bullseye

COPY ./requirements.txt /work/

WORKDIR /work

RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir pytest

CMD ["pytest"]

Makefile

Convenient commands for development tasks.

# Recipe lines must start with a tab character, not spaces.
.PHONY: dev/test dev/format dev/lint

dev/test:
	docker compose -f docker-compose.dev.yml run --rm test_aggregate
dev/format:
	docker compose -f docker-compose.dev.yml run --rm format
dev/lint:
	docker compose -f docker-compose.dev.yml run --rm lint

Run commands with make, e.g., make dev/lint.

pyproject.toml

Central configuration file for the Python project.
Example configuration for Black:

[tool.black]
skip-string-normalization = true
target-version = ['py38']

requirements.txt

List of packages required at runtime.

fsspec==2023.12.1
torch
torchvision
torchaudio
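
In the list above only fsspec is pinned to an exact version; the other packages resolve to whatever release is current at build time, so builds may differ over time. The sketch below separates pinned from unpinned entries. The helper `split_requirements` is hypothetical and deliberately simple: it only recognizes `name==version` lines and does not handle the full requirements syntax (extras, environment markers, URLs, or options such as `-r`).

```python
def split_requirements(lines):
    """Split requirement lines into exactly pinned (name==version) and others."""
    pinned, unpinned = {}, []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            pinned[name.strip()] = version.strip()
        else:
            unpinned.append(line)
    return pinned, unpinned


# Contents of the requirements.txt shown above.
REQUIREMENTS = """\
fsspec==2023.12.1
torch
torchvision
torchaudio
"""

pinned, unpinned = split_requirements(REQUIREMENTS.splitlines())
print(pinned)    # {'fsspec': '2023.12.1'}
print(unpinned)  # ['torch', 'torchvision', 'torchaudio']
```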

Parent .gitlab-ci.yml

The parent .gitlab-ci.yml sits one level above the module project directories.
GitLab Runner reads this file to execute jobs. Include each module's job file as follows:

stages:
  - test
  - build
  - deploy

default:
  tags:
    - cpu

include:
  - local: fed-avg-module/fed-avg-ci.yml
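
When the repository holds many modules, the include list can be derived from the files on disk. The following sketch assumes that every per-module job file is named `<something>-ci.yml` and sits directly inside its module directory; this naming is only a convention suggested by the examples here (file names may be arbitrary), and the function name `include_entries` is hypothetical.

```python
from pathlib import Path


def include_entries(repo_root):
    """Return the lines of an `include:` block for every */*-ci.yml file."""
    root = Path(repo_root)
    paths = sorted(
        path.relative_to(root).as_posix() for path in root.glob("*/*-ci.yml")
    )
    return ["include:"] + [f"  - local: {path}" for path in paths]
```

For a repository that contains only fed-avg-module/fed-avg-ci.yml, the function returns the same two lines as the include block shown above.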