Module definition file

Note

This document has been machine translated.

What is a module definition file?

A module definition file declares a module's interface specification: it defines the processing (scripts) the module provides and the data structures those scripts handle.
It is written in JSON and saved as definition.json in the module.
xData FL reads this definition.json. When the module execution Web API provided by xData FL is called, xData FL uses the definition to interpret the parameters, prepare the data required for the requested processing, interpret the processing results, and save the resulting data.

Structure of module definition file

Overall structure

The module definition file is a JSON file with the following structure.

{
 name: <Name of the module, as a fully qualified dot-separated name in the style of Java package names. This allows the module to be uniquely identified.>,
 description: <Module description>,
 supports: <TBD. List of the functions supported by the module>,
# Type definition section
 definitions: {
    <Parameter type definitions, see type definitions below>,
    ...
 },
# Script definition section
 scripts: {
   <Script name (label). When calling the Web API, the process to run is selected by passing this script name together with the separately specified module name. The script name must be unique within the module. The combination of module name and script name uniquely identifies a script across any set of modules.>: {
      description: <Script description. For human (or LLM) reading.>,
      path: <Name of the service to run, as defined in docker-compose.yml>,
      type: <Type of script: training (input: model and data; output: model), prediction (input: model and data; output: data, and also a model when the model is updated while predicting), aggregation (input: multiple models; output: model), model_transformation (input: model; output: model), data_transformation (input: data; output: data)>,
      input: { # Define the inputs required by the script
          <Definition of the first input data, see input/output definitions below>,
          <Definition of the second input data>,
          ...
      },
      output: { # Define the outputs of the script
          <Definition of the first output data, see input/output definitions below>,
          <Definition of the second output data>,
          ...
      },
      ...
   },
   <script name>: {
      type: <script type>,
      ...
   },
   ...
 }
}
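
As a rough illustration of the structure above, the following Python sketch loads a definition and lists every script under an identifier built from the module name and the script name. The function name `list_scripts`, the `:` separator, the set of keys treated as required, and the inline sample module are all assumptions made for this example; they are not part of xData FL.

```python
import json

# Assumption: these top-level keys are treated as required for the sketch.
# "supports" is left out because the document marks it as TBD.
REQUIRED_TOP_LEVEL = ("name", "description", "definitions", "scripts")


def list_scripts(definition: dict) -> list:
    """Return '<module name>:<script name>' for every script in the definition."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in definition]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    module_name = definition["name"]
    return [f"{module_name}:{script}" for script in definition["scripts"]]


# Hypothetical definition.json content, used only to exercise the function.
sample = json.loads("""
{
  "name": "module.example",
  "description": "hypothetical module",
  "supports": "",
  "definitions": {},
  "scripts": {
    "train": {"description": "", "path": "train", "type": "training",
              "input": {}, "output": {}}
  }
}
""")
print(list_scripts(sample))
```

Because script names only need to be unique within one module, pairing them with the module name is what makes the identifier unique across modules.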

Type definitions

Format of type declarations in definitions

The type declaration part of definitions in the overall structure has the following format.

<type name, must be unique within definitions>:{
  description: <type description>,
  definition: <definition of the type structure, in the format described below>
}

Defining the type structure

The type structure is defined in the following format. It is used as the value of definition in the type declaration in definitions and as the value of paramtype in the input/output definition described below.

{
    type: <type: model (a model), model_set (a set of models), data (data), or others (anything else)>,
    form: <file, directory, or STDIN/STDOUT; only directory is allowed if type is model_set>,
    files: [ # only if form is directory
      <file definition; list the files to be placed in the directory>,
      <file definition>,
      ...
    ],
    format: <only if form is not directory; the file format definition described below>,
    paramtype: <only if type is model_set; either the name of a type defined in definitions in this definition file, or JSON with the same structure as the type structure definition in this section. In either case, its type must be model>
}

The format of the type structure definition depends on whether type is model_set or one of model, data, and others, and on whether form is directory or something else (file or STDIN/STDOUT).

If type is model, data, or others and form is directory

{
  type: <type of data: model, data, or others>,
  form: "directory",
  files: [
    <file definition information>
  ]
}

If type is model, data, or others and form is file or STDIN/STDOUT

{
  type: <type of data: model, data, or others>,
  form: <file or STDIN/STDOUT>,
  format: <the file format definition described below>
}

If type is model_set

{
  type: "model_set",
  form: "directory",
  paramtype: <name of a type definition whose type is model, or JSON with the same structure as the definition part of such a type definition>
}
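
The three variants above amount to a small set of consistency rules, which the following Python sketch checks. It is only an illustration: the function name `check_type_structure` and the error messages are invented here, and because the document does not give the exact spelling of the standard-stream form values, the sketch assumes "STDIN" and "STDOUT".

```python
# Assumption: standard streams are spelled "STDIN" and "STDOUT" as form values.
KNOWN_TYPES = ("model", "model_set", "data", "others")
KNOWN_FORMS = ("file", "directory", "STDIN", "STDOUT")


def check_type_structure(ts: dict) -> list:
    """Return a list of rule violations (empty if the structure looks consistent)."""
    errors = []
    kind, form = ts.get("type"), ts.get("form")
    if kind not in KNOWN_TYPES:
        errors.append(f"unknown type: {kind!r}")
    if form not in KNOWN_FORMS:
        errors.append(f"unknown form: {form!r}")
    if kind == "model_set":
        # A set of models is always a directory and names its element type.
        if form != "directory":
            errors.append("model_set requires form 'directory'")
        if "paramtype" not in ts:
            errors.append("model_set requires paramtype")
    else:
        if "paramtype" in ts:
            errors.append("paramtype is only for model_set")
        if form == "directory" and "files" not in ts:
            errors.append("form 'directory' requires files")
        if form in ("file", "STDIN", "STDOUT") and "format" not in ts:
            errors.append("non-directory form requires format")
    return errors


print(check_type_structure(
    {"type": "model_set", "form": "directory", "paramtype": "the_model"}))
print(check_type_structure({"type": "data", "form": "directory"}))
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in a definition at once.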

File definition

If the form of the type structure is directory, the names and types of files to be placed in the directory must be listed in an array in files. The format is as follows.

{
  name: <name of the file (a label, not the file name)>,
  path: <file name, relative to the directory; wildcards are allowed>,
  form: <file or directory>,
  files: [ # only if form is directory. Defines the files to be placed in the directory.
     <file definition JSON, recursively following the definition in this section>,
     <file definition JSON>,
     ...
  ],
  format: <only if form is file; the file format definition described below>
}
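
Since file definitions can nest, a consumer of the definition typically has to walk them recursively. The sketch below shows one way to flatten a files array into relative paths; the function name `iter_file_paths` and the sample entries (including the "shards" directory) are hypothetical and only serve the example.

```python
from pathlib import PurePosixPath


def iter_file_paths(files, base=""):
    """Yield (label, relative path, format type) for each file, recursing into directories."""
    for entry in files:
        path = str(PurePosixPath(base) / entry["path"])
        if entry.get("form") == "directory":
            # Nested directory: its own "files" array uses the same definition.
            yield from iter_file_paths(entry.get("files", []), path)
        else:
            fmt = entry.get("format") or {}
            yield entry["name"], path, fmt.get("type")


# Hypothetical files array with one plain file and one nested directory.
sample_files = [
    {"name": "meta", "path": "model.dat.xdata_meta", "form": "file",
     "format": {"type": "json", "structure": ""}},
    {"name": "shards", "path": "shards", "form": "directory", "files": [
        {"name": "shard", "path": "*.csv", "form": "file",
         "format": {"type": "CSV", "structure": {}}},
    ]},
]
for label, path, fmt in iter_file_paths(sample_files):
    print(label, path, fmt)
```

Wildcards in path are passed through unchanged here; expanding them against a real directory would be a separate step.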

File format definition

Declares the format of a file.

{
   type: <file format: CSV, JSON, Movie, etc.>,
   structure: <structure of the file contents; how it is written depends on the file format>
}

How to specify structure when type is CSV

{
   lineseparator: <line separator, default CR LF>,
   delimiter: <column delimiter, default comma>,
   columns:
      {
         name: <name of the column (label)>,
         type: <type of the column: string, number, date, geometry, etc.>
      },
      ... (repeated for each column)
}

Other file formats will be defined as needed.
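
To make the CSV structure concrete, here is a minimal Python sketch that parses CSV text according to such a structure. It rests on several assumptions not stated in the document: that columns is given as a list of name/type objects, that the file has no header row, and that only the number type needs conversion (to float). The function name `read_csv_with_structure` is invented for the example.

```python
import csv


def read_csv_with_structure(text: str, structure: dict) -> list:
    """Parse CSV text into dicts keyed by the column labels in the structure."""
    delimiter = structure.get("delimiter", ",")            # default: comma
    lineseparator = structure.get("lineseparator", "\r\n")  # default: CR LF
    casts = {"number": float}  # assumption: other column types stay as strings
    # Splitting on the separator first is a simplification: quoted fields that
    # contain the line separator are not supported by this sketch.
    lines = [line for line in text.split(lineseparator) if line]
    rows = []
    for record in csv.reader(lines, delimiter=delimiter):
        row = {}
        for column, value in zip(structure["columns"], record):
            row[column["name"]] = casts.get(column["type"], str)(value)
        rows.append(row)
    return rows


structure = {
    "delimiter": ";",
    "columns": [
        {"name": "id", "type": "string"},
        {"name": "score", "type": "number"},
    ],
}
print(read_csv_with_structure("a;1.5\r\nb;2\r\n", structure))
```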

How to pass a model

When type is set to model, the system handles the following two kinds of files. They are distinguished by the type specified in the file format definition.

Model metadata. A JSON file, declared with "json" as the type in its format.
The model body. For the time being this is a PyTorch state_dict, declared with "state_dict" as the type in its format.

When a model is defined as an input and its form is directory, the system writes the metadata to the path of the files entry whose format.type is "json", and writes the binary data of the model to the path of the files entry whose format.type is "state_dict".

When a model is defined as an output, the system reads the metadata from the json file and treats the state_dict file as the body of the model.
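
The mapping from format.type to the role of each file can be sketched as follows. This is an illustrative helper, not xData FL code: the name `locate_model_files` and the role labels "metadata" and "body" are assumptions.

```python
def locate_model_files(type_structure: dict) -> dict:
    """Map 'metadata' and 'body' to the relative paths declared in a directory-form model type."""
    roles = {"json": "metadata", "state_dict": "body"}
    found = {}
    for entry in type_structure.get("files", []):
        role = roles.get((entry.get("format") or {}).get("type"))
        if role:
            found[role] = entry["path"]
    return found


# Type structure in the shape described by this document.
model_type = {
    "type": "model",
    "form": "directory",
    "files": [
        {"name": "model_data", "path": "model.dat.xdata_meta", "form": "file",
         "format": {"type": "json", "structure": ""}},
        {"name": "model_file", "path": "model.dat", "form": "file",
         "format": {"type": "state_dict", "structure": ""}},
    ],
}
print(locate_model_files(model_type))
```

A script would then typically load the body file with its framework's own loader, which is outside the scope of this sketch.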

Input/output definitions

<name (label) of the input/output data. When calling the Web API, this name is used to specify, for each input and output, which data to use (for inputs, the data retrieved from a table; for outputs, the table the data is stored in). The name must be unique among the inputs/outputs of the script>: {
  description: <description of input/output data. For human (or LLM) reading>, 
  path: <relative path in the input/output directory>, 
  paramtype: <type name (string) of the parameter or definition of the above type structure>
}
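
Because paramtype may be either a type name or an inline type structure, code that reads a definition needs to normalize the two forms. The following Python sketch shows one possible approach; the function name `resolve_paramtype` is hypothetical, and resolving the nested paramtype of a model_set in the same pass is a design choice of this example.

```python
def resolve_paramtype(paramtype, definitions: dict) -> dict:
    """Return the type structure for a paramtype given by name or inline."""
    if isinstance(paramtype, str):
        # A string refers to a type declared in the definitions section.
        if paramtype not in definitions:
            raise KeyError(f"unknown type name: {paramtype}")
        return definitions[paramtype]["definition"]
    resolved = dict(paramtype)  # shallow copy so the input is not modified
    if resolved.get("type") == "model_set":
        # The element type of a model_set is itself a name or an inline structure.
        resolved["paramtype"] = resolve_paramtype(resolved["paramtype"], definitions)
    return resolved


definitions = {
    "the_model": {
        "description": "hypothetical model type",
        "definition": {"type": "model", "form": "directory", "files": []},
    }
}
inline = {"type": "model_set", "form": "directory", "paramtype": "the_model"}
print(resolve_paramtype("the_model", definitions))
print(resolve_paramtype(inline, definitions))
```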

Example of module definition description

The following is an example of a module definition file that implements FedAvg, a model aggregation algorithm for federated learning.

This module contains a script that implements a model aggregation process (a process that takes multiple models as input and outputs a single aggregated model). The input data is a set of models obtained under certain conditions, and the output is a single model.

{
    "name": "module.fed-avg-module.aggregate",
    "description": "FedAvg aggregate module",
    "supports": "",
    "definitions": {
        "the_model": {  # This module defines the_model as a common type to handle models in input/output.
            "description": "Models before and after aggregation.",
            "definition": {
                "type": "model",         # type is "model" because it is a model
                "form": "directory",
                "files": [
                    {
                        "name": "model_data",
                        "path": "model.dat.xdata_meta",
                        "form": "file",
                        "format": {
                            "type": "json",       # For a model, the metadata is stored in the file whose format type is json.
                            "structure": ""
                        }
                    },
                    {
                        "name": "model_file",
                        "path": "model.dat",
                        "form": "file",
                        "format": {
                            "type": "state_dict",  # For a model, the model itself is stored in the file whose format type is state_dict.
                            "structure": ""
                        }
                    }
                ]
            }
        }
    },
    "scripts": {
        "fed_avg_aggregate": {   # This module contains a script that implements the aggregation process. It is named fed_avg_aggregate.
            "description": "FedAvg aggregate",
            "path": "aggregate",
            "type": "aggregation",  # Script type is aggregation
            "input": {   # Input data is a set of models
                "input_model_set": {    # The system creates a models directory under the input directory, then creates further directories under models that hold the files defined by "the_model" in definitions.
                    "description": "input models",
                    "path": "models",
                    "paramtype": {
                        "type": "model_set",
                        "form": "directory",
                        "paramtype": "the_model"
                    }
                }
            },
            "output": {
                "output_model": {  # There is one model output. The model information is stored in a directory named aggregated_model under the output directory according to the structure defined in the_model.
                    "description": "output models",
                    "path": "aggregated_model",
                    "paramtype": "the_model"
                }
            }
        }
    }
}
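
From the script's point of view, the definition above implies a directory layout that the aggregation code has to walk. The Python sketch below shows how a script might enumerate its input models. It assumes one subdirectory per model under models, as the comments in the example suggest; the subdirectory names "0" and "1", the function name `collect_input_models`, and the temporary directory standing in for the input directory are all made up for this illustration.

```python
import tempfile
from pathlib import Path


def collect_input_models(input_dir: Path) -> list:
    """Return (metadata file, model file) path pairs, one per subdirectory of <input_dir>/models."""
    pairs = []
    for model_dir in sorted((input_dir / "models").iterdir()):
        if model_dir.is_dir():
            # File names follow the "the_model" type in the example definition.
            pairs.append((model_dir / "model.dat.xdata_meta", model_dir / "model.dat"))
    return pairs


# Build a throwaway layout with two hypothetical model directories.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for sub in ("0", "1"):
        (root / "models" / sub).mkdir(parents=True)
    demo = [
        (meta.relative_to(root).as_posix(), body.relative_to(root).as_posix())
        for meta, body in collect_input_models(root)
    ]
print(demo)
```

The aggregated result would then be written under aggregated_model in the output directory, using the same two file names.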