Module Development
Note
This document has been machine translated.
Directory structure
The following format should be used as a reference for the directory structure of the module.
xdata-algo-modules/
├── module-1/ # Another module project directory
├── module-2/ # module project directory to be developed
│ ├── src/ # source code directory
│ │ ├── definitions/
│ │ │ └── definition.json # File describing module definition information
│ │ └── xxxxx.py # Source code to run in the module
│ ├── tests/ # Test code directory
│ │ └── test_xxxxxxx.py # test code
│ ├── .flake8 # Flake8 configuration file
│ ├── xxxxxxx-ci.yml # Configuration file for jobs run by GitLab Runner (for each module)
│ ├── docker-compose.yml # docker-compose.yml file to launch the container for a module
│ ├── docker-compose.dev.yml # docker-compose.yml file for development
│ ├── Dockerfile # Dockerfile for module execution
│ ├── Dockerfile.dev # Dockerfile for linter and formatter execution
│ ├── Dockerfile.test # Dockerfile for test code execution
│ ├── Makefile # Makefile with frequently used development commands
│ ├── pyproject.toml # Configuration file for linter, etc.
│ └── requirements.txt # List of packages required to run the source code
└── .gitlab-ci.yml # Configuration file for GitLab Runner
Source code to run in the module
Place the source code you want to run when the module is executed in the src directory.
Basically, it should be written in Python, but it can be written in another language if necessary.
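As an illustration, the following is a minimal sketch of such a script for a FedAvg-style aggregation. The file names and the JSON model format here are hypothetical and used only for illustration; the real input/output conventions for a module are declared in definition.json, and MODEL_DIR is supplied via docker-compose.yml or CI variables.

```python
# aggregate.py -- minimal sketch of a module script (hypothetical example).
# The JSON model format used here is an assumption for illustration; the
# real input/output conventions are declared in definition.json.
import json
import os


def average_models(models):
    """Element-wise average of a list of {param_name: [values]} dicts."""
    keys = models[0].keys()
    return {
        key: [sum(vals) / len(models) for vals in zip(*(m[key] for m in models))]
        for key in keys
    }


def main():
    # MODEL_DIR is set via docker-compose.yml or CI variables.
    model_dir = os.environ.get("MODEL_DIR", "./models")
    if not os.path.isdir(model_dir):
        return  # nothing to aggregate outside the container
    models = []
    for name in sorted(os.listdir(model_dir)):
        if name.endswith(".json"):
            with open(os.path.join(model_dir, name)) as f:
                models.append(json.load(f))
    if models:
        with open(os.path.join(model_dir, "aggregated.json"), "w") as f:
            json.dump(average_models(models), f)


if __name__ == "__main__":
    main()
```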
definition.json
The definition information for the module to work with xData-FL is written in src/definitions/definition.json.
The format of the definition information is as follows. (Note: this may change in the future, as many elements are still undecided.)
{
  "name": "<Name of the module, in FQDN form as in Java. This allows the module to be uniquely identified>",
  "description": "<Description of the module>",
  "supports": "<TBD. List of the features supported by the module>",
  "definitions": [
    "comment: <parameter type definitions are written here>",
    {
      "name": "<Name of the type. Must be unique within definitions>",
      "description": "<Description of the type>",
      "definition": {
        "type": "<Type of the data: model, model_set (set of models), data, or others>",
        "form": "<file, directory, or STDIN/STDOUT. Only directory is allowed when type is model_set>",
        "files": [
          "comment: <when form is directory, the files in the directory are defined here>",
          {
            "name": "<Name (label) of the input/output data>",
            "path": "<Path of the file within the directory. Wildcards are allowed>",
            "form": "<file or directory>",
            "format": {
              "comment": "<defined when form is file>",
              "type": "<File format: CSV, JSON, Movie, etc.>",
              "structure": "<File structure. Depends on the file type>"
            }
          }
        ],
        "format": {
          "comment": "<defined when form is file>",
          "type": "<File format: CSV, JSON, Movie, etc.>",
          "structure": "<File structure. Depends on the file type>"
        }
      }
    }
  ],
  "scripts": [
    {
      "name": "<Name (label) of the script. When calling the Web API, the process to execute is specified by the module name (defined separately) together with this script name. The script name must be unique within the module; combining the module name and the script name uniquely identifies the script among an arbitrary set of modules>",
      "description": "<Description of the script. Intended to be read by a human (or an LLM)>",
      "path": "<Relative path of the script from the top directory of the code placed in the module by the developer>",
      "type": "<Type of the script: training (takes a model and data as input and outputs a model), prediction (takes a model and data as input and outputs data (and a model); a different label is used for the type that updates the model while predicting)>",
      "input": [
        "comment: <input data definitions are written here>",
        {
          "name": "<Name (label) of the input/output data. When calling the Web API, this specifies which data to (get from a table | store in a table) for each input/output. Input/output names must be unique within a script>",
          "description": "<Description of the input/output data. For human (or LLM) reading>",
          "path": "<Relative path within the directory used for the input/output>",
          "paramtype": "<Name of a parameter type (string), or a parameter type definition itself (associative array)>"
        }
      ],
      "output": [
        "comment: <output data definitions are written here>",
        {
          "name": "<Name (label) of the input/output data. When calling the Web API, this specifies which data to (get from a table | store in a table) for each input/output. Input/output names must be unique within a script>",
          "description": "<Description of the input/output data. For human (or LLM) reading>",
          "path": "<Relative path within the directory used for the input/output>",
          "paramtype": "<Name of a parameter type (string), or a parameter type definition itself (associative array)>"
        }
      ]
    }
  ]
}
An example of a JSON file for module definition information is shown below.
{
  "name": "module.fed-avg-module.aggregate",
  "description": "FedAvg aggregate module",
  "supports": "",
  "definitions": {
    "the_model": {
      "description": "Models before and after aggregation",
      "definition": {
        "type": "model",
        "form": "directory",
        "files": [
          {
            "name": "model_data",
            "path": "model.dat.xdata_meta",
            "form": "file",
            "format": {
              "type": "json",
              "structure": ""
            }
          },
          {
            "name": "model_file",
            "path": "model.dat",
            "form": "file",
            "format": {
              "type": "state_dict",
              "structure": ""
            }
          }
        ]
      }
    }
  },
  "scripts": {
    "fed_avg_aggregate": {
      "description": "FedAvg aggregate",
      "path": "aggregate",
      "type": "aggregation",
      "input": {
        "input_model_set": {
          "description": "input models",
          "path": "models",
          "paramtype": {
            "type": "model_set",
            "form": "directory",
            "paramtype": "the_model"
          }
        }
      },
      "output": {
        "output_model": {
          "description": "output models",
          "path": "aggregated_model",
          "paramtype": "the_model"
        }
      }
    }
  }
}
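During development it can be convenient to sanity-check definition.json before committing. The following is a minimal sketch; the required keys are assumptions derived from the example above (which uses the object-keyed form), and should be adjusted as the definition format evolves.

```python
# check_definition.py -- minimal sanity check for definition.json (sketch).
# The required keys below are assumptions based on the example in this
# document; adjust them as the definition format evolves.
import json
import os

REQUIRED_TOP_KEYS = {"name", "description", "supports", "definitions", "scripts"}
REQUIRED_SCRIPT_KEYS = {"description", "path", "type", "input", "output"}


def check_definition(definition):
    """Return a list of problems found in a parsed definition.json."""
    problems = [
        f"missing top-level key: {key}"
        for key in sorted(REQUIRED_TOP_KEYS - definition.keys())
    ]
    for script_name, script in definition.get("scripts", {}).items():
        for key in sorted(REQUIRED_SCRIPT_KEYS - script.keys()):
            problems.append(f"script {script_name!r}: missing key {key!r}")
    return problems


if __name__ == "__main__":
    path = "src/definitions/definition.json"
    if os.path.exists(path):
        with open(path) as f:
            problems = check_definition(json.load(f))
        print("\n".join(problems) or "OK")
```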
Test Code
Place the test code for the source code in the tests directory.
Create test source code as needed to run CI/CD.
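For example, a unit test might look like the sketch below. The `average_models` helper is a stand-in defined inline for illustration; in a real module you would import the function under test from the src directory.

```python
# tests/test_aggregate.py -- sketch of a pytest unit test.
# `average_models` is a stand-in defined inline for illustration; in a real
# module you would import the function under test from the src directory.


def average_models(models):
    """Element-wise average of a list of {param_name: [values]} dicts."""
    keys = models[0].keys()
    return {
        key: [sum(vals) / len(models) for vals in zip(*(m[key] for m in models))]
        for key in keys
    }


def test_average_of_two_models():
    models = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}]
    assert average_models(models) == {"w": [2.0, 4.0]}


def test_single_model_is_unchanged():
    assert average_models([{"w": [1.5]}]) == {"w": [1.5]}
```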
.flake8
This file configures the Flake8 linter. Write it in the following format.
[flake8]
max-line-length = 88
extend-ignore = E203,E402
Configuration files for jobs run by GitLab Runner (for each module)
A YAML file that defines the jobs to be included from the .gitlab-ci.yml in the parent directory.
The file name can be chosen freely. An example job configuration is shown below.
Note that file paths inside a job must be specified relative to the parent directory (that is, relative to the .gitlab-ci.yml in the parent directory).
fed-avg-lint-format-check:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir flake8 black
    - flake8 .
    - black . --check

fed-avg-unit-test-cpu:
  stage: test
  tags:
    - cpu
  image: python:3.10-slim-bullseye
  variables:
    MODEL_DIR: ./test_models
  script:
    - cd fed-avg-module/
    - pip install --upgrade pip
    - pip install --no-cache-dir -r requirements.txt
    - pip install --no-cache-dir pytest
    - pytest
docker-compose.yml
docker-compose.yml file to start the container for the module. It is written in the following format.
version: '3'
services:
  aggregate:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./mnist_models:/work/models
docker-compose.dev.yml
docker-compose.yml file to launch containers for running linter, formatter and test code. Describe the services for running linter, formatter and test code as follows.
version: '3'
services:
  test_aggregate:
    build:
      context: .
      dockerfile: Dockerfile.test
    environment:
      MODEL_DIR: ./models
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
      - ./test_models:/work/models
  format:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: black .
  lint:
    build:
      context: .
      dockerfile: Dockerfile.dev
    volumes:
      - ./src:/work/src
      - ./tests:/work/tests
    command: flake8 .
Dockerfile
Dockerfile for module execution and build. It should be written in the following format.
FROM python:3.10-slim-bullseye
WORKDIR /work
COPY ./requirements.txt /work/
RUN pip install --no-cache-dir -r requirements.txt
COPY ./src/aggregate.py /work/
CMD ["python", "aggregate.py"]
Dockerfile.dev
Dockerfile to launch tools needed during development such as linter and formatter. The following is an example of launching a Python container with flake8 and black installed.
FROM python:3.10-slim-bullseye
RUN pip install --no-cache-dir flake8 black
COPY ./pyproject.toml /work/
COPY ./.flake8 /work/
WORKDIR /work
Dockerfile.test
Dockerfile for launching a container to execute test code. The following is an example of launching a container with pytest installed.
FROM python:3.10-slim-bullseye
COPY ./requirements.txt /work/
WORKDIR /work
RUN pip install --no-cache-dir -r requirements.txt
RUN pip install --no-cache-dir pytest
CMD ["pytest"]
Makefile
This file collects commands that are frequently used during development so that they can be invoked easily.
dev/test:
	docker compose -f docker-compose.dev.yml run --rm test_aggregate

dev/format:
	docker compose -f docker-compose.dev.yml run --rm format

dev/lint:
	docker compose -f docker-compose.dev.yml run --rm lint
Run these with the make command, specifying the target you want, e.g. make dev/lint.
pyproject.toml
This file consolidates configuration for the entire Python project; settings for each supported tool can be written here. The following is an example black configuration.
[tool.black]
skip-string-normalization = true
target-version = ['py38']
requirements.txt
This file lists the packages required to run the source code. Write it in the following format, specifying package versions as necessary.
fsspec==2023.12.1
torch
torchvision
torchaudio
Configuration file to run GitLab Runner (.gitlab-ci.yml on the parent level)
This is the .gitlab-ci.yml file located one level above the module directories.
GitLab Runner reads this file and runs jobs.
To read and run a job defined in each module, write the file in the following format.
stages:
  - test
  - build
  - deploy

default:
  tags:
    - cpu

include:
  - local: fed-avg-module/fed-avg-ci.yml