Module Development
Note
This document has been machine translated.
Directory Structure
The directory structure of a module should follow the example below.
xdata-algo-modules/
├── module-1/ # Directory of another module project
├── module-2/ # Target module project directory
│ ├── src/ # Source code directory
│ │ ├── definitions/
│ │ │ └── definition.json # File describing module definition information
│ │ └── xxxxx.py # Source code to be executed within the module
│ ├── tests/ # Test code directory
│ │ └── test_xxxxx.py # Test script
│ ├── .flake8 # Flake8 configuration file
│ ├── xxxxx-ci.yml # Job configuration for GitLab Runner (per module)
│ ├── docker-compose.yml # Docker Compose file to launch the module container
│ ├── docker-compose.dev.yml # Docker Compose file used during development
│ ├── Dockerfile # Dockerfile for module execution
│ ├── Dockerfile.dev # Dockerfile for linter/formatter execution
│ ├── Dockerfile.test # Dockerfile for test execution
│ ├── Makefile # File defining various commands
│ ├── pyproject.toml # Configuration file for tools such as the formatter (Black)
│ └── requirements.txt # List of packages required at runtime
└── .gitlab-ci.yml # Configuration file for GitLab Runner execution
Source Code to Execute Within the Module
Place the source code that the module runs in the src directory.
The primary language is Python, but other languages may be used if necessary.
definition.json
To run a module in xData-FL, describe its definition information in src/definitions/definition.json.
The structure of the definition file is as follows (subject to change):
{
  "name": "<Module name. Use an FQDN-like dotted name, as with Java package names, so that the module is uniquely identifiable>",
  "description": "<Module description>",
  "supports": "<TBD: list of features supported by the module>",
  "definitions": [
    "Comment: <Define parameter types here>",
    {
      "name": "<Type name. Must be unique within definitions>",
      "description": "<Description of the type>",
      "definition": {
        "type": "<Data kind: model, *model_set*, data, others>",
        "form": "<file, directory, or STDIN/STDOUT; if type is model_set, only directory is allowed>",
        "files": [
          "Comment: <When form is directory, define the files inside the directory here>",
          {
            "name": "<Name (label) of the input/output data>",
            "path": "<File path within the directory; wildcards allowed>",
            "form": "<file or directory>",
            "files": "<When form is directory, define the files inside the directory>",
            "format": {
              "": "Comment: <When form is file, define the file format here>",
              "type": "<File type such as CSV, JSON, Movie, etc.>",
              "structure": "<Structure of the file; varies by type>"
            }
          }
        ],
        "format": {
          "": "Comment: <When form is file, define the file format here>",
          "type": "<File type such as CSV, JSON, Movie, etc.>",
          "structure": "<Structure of the file; varies by type>"
        }
      }
    }
  ],
  "scripts": [
    {
      "name": "<Script label. When calling a Web API, specify the module name and this script name to select the operation. The script name must be unique within the module; the combination of module name and script name uniquely identifies the script across modules>",
      "description": "<Description of the script for humans (or an LLM)>",
      "path": "<Relative path from the module's top directory to the script file>",
      "type": "<Script type: training (input model & data, output model), prediction (input model & data, output data [and optional model update]), aggregation (multiple models → model), model_transformation (model → model), data_transformation (data → data)>",
      "input": [
        "Comment: <Define input data here>",
        {
          "name": "<Data label. When calling a Web API, it specifies the table the data is retrieved from or stored to. The name must be unique within the script>",
          "description": "<Human-readable description of the data>",
          "path": "<Relative path within the input/output directory>",
          "paramtype": "<Name of the parameter type (string) or the definition itself (associative array)>"
        }
      ],
      "output": [
        "Comment: <Define output data here>",
        {
          "name": "<Data label. When calling a Web API, it specifies the table the data is retrieved from or stored to. The name must be unique within the script>",
          "description": "<Human-readable description of the data>",
          "path": "<Relative path within the input/output directory>",
          "paramtype": "<Name of the parameter type (string) or the definition itself (associative array)>"
        }
      ]
    }
  ]
}
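Example of a module definition JSON file (here, definitions, scripts, input, and output are written as objects keyed by name rather than as arrays of objects with a name field):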
{
  "name": "module.fed-avg-module.aggregate",
  "description": "FedAvg aggregate module",
  "supports": "",
  "definitions": {
    "the_model": {
      "description": "Model before and after aggregation",
      "definition": {
        "type": "model",
        "form": "directory",
        "files": [
          {
            "name": "model_data",
            "path": "model.dat.xdata_meta",
            "form": "file",
            "format": {
              "type": "json",
              "structure": ""
            }
          },
          {
            "name": "model_file",
            "path": "model.dat",
            "form": "file",
            "format": {
              "type": "state_dict",
              "structure": ""
            }
          }
        ]
      }
    }
  },
  "scripts": {
    "fed_avg_aggregate": {
      "description": "FedAvg aggregate",
      "path": "aggregate",
      "type": "aggregation",
      "input": {
        "input_model_set": {
          "description": "input models",
          "path": "models",
          "paramtype": {
            "type": "model_set",
            "form": "directory",
            "paramtype": "the_model"
          }
        }
      },
      "output": {
        "output_model": {
          "description": "output models",
          "path": "aggregated_model",
          "paramtype": "the_model"
        }
      }
    }
  }
}
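A paramtype is either the name of an entry in definitions or an inline definition, as in the example above. The following Python sketch shows one way a tool could resolve those references. It assumes the object-keyed layout of the example file; load_definition, resolve_paramtype, and describe_script are hypothetical helpers, not an xData-FL API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union

# A paramtype is either the name of an entry in "definitions" or an
# inline definition object.
ParamType = Union[str, Dict[str, Any]]


def load_definition(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a definition.json file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def resolve_paramtype(
    definition: Dict[str, Any], paramtype: ParamType
) -> Dict[str, Any]:
    """Return the concrete definition that a paramtype refers to.

    A string is looked up in the top-level "definitions" object. A dict is
    treated as an inline definition; its own "paramtype" (for example the
    element type of a model_set) is resolved recursively.
    """
    if isinstance(paramtype, str):
        try:
            return definition["definitions"][paramtype]["definition"]
        except KeyError:
            raise KeyError(f"unknown paramtype: {paramtype!r}") from None
    resolved = dict(paramtype)
    if "paramtype" in resolved:
        resolved["paramtype"] = resolve_paramtype(definition, resolved["paramtype"])
    return resolved


def describe_script(
    definition: Dict[str, Any], script_name: str
) -> Dict[str, Dict[str, Any]]:
    """Map "input:<label>" / "output:<label>" to resolved definitions."""
    script = definition["scripts"][script_name]
    resolved: Dict[str, Dict[str, Any]] = {}
    for direction in ("input", "output"):
        for label, spec in script.get(direction, {}).items():
            key = f"{direction}:{label}"
            resolved[key] = resolve_paramtype(definition, spec["paramtype"])
    return resolved
```

With the example file, describe_script(load_definition("src/definitions/definition.json"), "fed_avg_aggregate") would map input:input_model_set to a model_set whose paramtype resolves to the the_model directory definition, and output:output_model directly to that definition.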
Test Code
Place test code for the source files in the tests directory, and add test sources as needed so that the tests can run in CI/CD.
.flake8
Configuration file for the Flake8 linter.
Use the following settings:
[flake8]
max-line-length = 88
extend-ignore = E203,E402
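The document does not say why E203 and E402 are ignored. A common reason (an assumption here) is that Black, configured in pyproject.toml, formats some slices in a way pycodestyle may report as E203, and that test or entry-point code sometimes adjusts sys.path before importing. The hypothetical module below contains both patterns and should pass flake8 with the settings above.

```python
"""Hypothetical module showing code the .flake8 settings above allow."""
import sys
from pathlib import Path

# Putting src/ on sys.path before importing from it is a common reason for
# E402 ("module level import not at top of file").
sys.path.insert(0, str(Path("src").resolve()))

import json  # E402 without the extend-ignore entry


def window(ham, lower, upper, offset):
    # Black writes complex slice bounds with spaces around ":", which
    # pycodestyle may report as E203 ("whitespace before ':'").
    return ham[lower + offset : upper + offset]


def dump_window(ham, lower, upper, offset):
    # Uses the late import so that it is not flagged as unused (F401).
    return json.dumps(window(ham, lower, upper, offset))
```

max-line-length = 88 matches Black's default line length, so the two tools agree on where lines must be wrapped.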
Job Configuration for GitLab Runner (Per Module)
Define YAML job files that the parent .gitlab-ci.yml includes.
Any file name can be used. When specifying file paths within jobs, write them relative to the location of the parent .gitlab-ci.yml.
fed-avg-lint-format-check:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir flake8 black
    - flake8 .
    - black . --check

fed-avg-unit-test-cpu:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  variables:
    MODEL_DIR: ./test_models
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir -r requirements.txt
    - pip install --no-cache-dir pytest
    - pytest
docker-compose.yml
Docker Compose file that launches the module container.
version: '3'
services:
  aggregate:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./mnist_models:/work/models
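Both the CI job and the Compose files pass the model location through the MODEL_DIR environment variable as a relative path. The following is a minimal sketch of how module code might read it, assuming the path is resolved against the working directory (/work in the container) and that model files are named model.dat as in the example definition; model_dir and list_model_files are hypothetical names.

```python
import os
from pathlib import Path
from typing import List, Optional


def model_dir(default: str = "./models") -> Path:
    """Resolve MODEL_DIR against the current working directory.

    With WORKDIR /work and MODEL_DIR=./models, this yields /work/models,
    the container path on which ./mnist_models is mounted.
    """
    return Path(os.environ.get("MODEL_DIR", default)).resolve()


def list_model_files(
    root: Optional[Path] = None, pattern: str = "**/model.dat"
) -> List[Path]:
    """Return matching model files under root (MODEL_DIR by default), sorted."""
    base = model_dir() if root is None else root
    return sorted(p for p in base.glob(pattern) if p.is_file())
```

Because the path is relative, the same code finds ./test_models when the CI job runs pytest from fed-avg-module/, and /work/models inside the container.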
docker-compose.dev.yml
Docker Compose file that launches containers for linting, formatting, and test execution.
version: '3'
services:
  test_aggregate:
    build:
      context: .
      dockerfile: Dockerfile.test
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
      - ./test_models:/work/models
  format:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: black .
  lint:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: flake8 .
Dockerfile
Dockerfile used to build and run the module.
FROM python:3.10-slim-bullseye
WORKDIR /work
COPY ./requirements.txt /work/
RUN pip install --no-cache-dir -r requirements.txt
COPY ./src/aggregate.py /work/
CMD ["python", "aggregate.py"]
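The Dockerfile runs aggregate.py, whose contents the document does not show. The skeleton below is a hypothetical illustration of the FedAvg step the example module is named after: a weighted element-wise average of model parameters. Plain Python lists stand in for the PyTorch tensors that the real module presumably loads from model.dat (a state_dict, per the example definition), and file I/O is omitted.

```python
"""Hypothetical skeleton for src/aggregate.py (not the actual module code)."""
import os
from pathlib import Path
from typing import Dict, List, Optional, Sequence


def fed_avg(
    states: Sequence[Dict[str, List[float]]],
    weights: Optional[Sequence[float]] = None,
) -> Dict[str, List[float]]:
    """Weighted element-wise average of parameter dicts (FedAvg).

    Plain lists stand in for tensors; a real module would operate on
    PyTorch state_dicts instead.
    """
    if not states:
        raise ValueError("at least one model is required")
    if weights is None:
        weights = [1.0] * len(states)
    if len(weights) != len(states):
        raise ValueError("weights and states must have the same length")
    total = float(sum(weights))
    averaged: Dict[str, List[float]] = {}
    for key in states[0]:
        length = len(states[0][key])
        averaged[key] = [
            sum(w * s[key][i] for w, s in zip(weights, states)) / total
            for i in range(length)
        ]
    return averaged


def main() -> None:
    models = Path(os.environ.get("MODEL_DIR", "./models")).resolve()
    print(f"aggregating models under {models}")
    # Loading the model.dat files, calling fed_avg(), and writing the
    # aggregated model are module-specific and omitted from this sketch.


if __name__ == "__main__":
    main()
```

In FedAvg the weights are usually the number of training samples each client used; when no weights are given, this sketch counts every model equally.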
Dockerfile.dev
Dockerfile for development tools such as the linter and formatter.
FROM python:3.10-slim-bullseye
RUN pip install --no-cache-dir flake8 black
COPY ./pyproject.toml /work/
COPY ./.flake8 /work/
WORKDIR /work
Dockerfile.test
Dockerfile for test execution.
FROM python:3.10-slim-bullseye
COPY ./requirements.txt /work/
WORKDIR /work
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir pytest
CMD ["pytest"]
Makefile
Convenient commands for development tasks.
.PHONY: dev/test dev/format dev/lint

dev/test:
	docker compose -f docker-compose.dev.yml run --rm test_aggregate
dev/format:
	docker compose -f docker-compose.dev.yml run --rm format
dev/lint:
	docker compose -f docker-compose.dev.yml run --rm lint
Run commands with make, e.g., make dev/lint.
pyproject.toml
Central configuration file for the Python project.
Example configuration for Black:
[tool.black]
skip-string-normalization = true
target-version = ['py38']
requirements.txt
List of packages required at runtime.
fsspec==2023.12.1
torch
torchvision
torchaudio
Parent .gitlab-ci.yml
The .gitlab-ci.yml located one level above the module directories.
GitLab reads this file to create pipelines, and GitLab Runner executes the resulting jobs. Include the module-specific job files as follows:
stages:
  - test
  - build
  - deploy

default:
  tags:
    - cpu

include:
  - local: fed-avg-module/fed-avg-ci.yml