Data Loader Development Manual (Python version)

This document has been machine translated.

Notes.

  • The commands described in this manual have been tested on CentOS 8.

Operating environment

Item                          Value
Python version                3.x series (does not work with 2.x series)
Additional Python libraries   psycopg2 (https://pypi.org/project/psycopg2/)
                              configparser (https://github.com/python/cpython/blob/3.8/Lib/configparser.py)
                              geojson (https://pypi.org/project/geojson/)
                              requests (https://github.com/kennethreitz/requests/tarball/master)

Preparing the Python runtime environment

1. Install Python.

[rt-loader@localhost ~]% sudo yum install python36

2. Add libraries.

[rt-loader@localhost ~]% sudo yum install python3-psycopg2
[rt-loader@localhost ~]% sudo pip3 install configparser
[rt-loader@localhost ~]% sudo pip3 install geojson
[rt-loader@localhost ~]% sudo pip3 install requests
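
A quick way to confirm that the libraries above can be imported is sketched below. It is not part of the data loader API: the function name missing_modules is hypothetical, and the import names are assumed to match the package names listed in the operating environment.

```python
import importlib.util
import sys

# Import names assumed to match the libraries listed in the operating environment.
REQUIRED_MODULES = ("psycopg2", "configparser", "geojson", "requests")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The manual requires the Python 3.x series.
print("Python 3.x:", sys.version_info[0] == 3)
print("Missing modules:", missing_modules())
```

An empty list after "Missing modules:" suggests that every library is visible to the python3 interpreter that ran the check.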

Get the data loader API (Python library)

This procedure assumes that you are creating a data loader named sample_loader.

1. Obtain the data loader API files.

The name of the directory that contains the data loader API becomes the name of the data loader.

[rt-loader@localhost ~]% mkdir sample_loader
[rt-loader@localhost ~]% cd sample_loader
[rt-loader@localhost sample_loader]% git clone -b release_v0.6 http://<git_host>/<group>/<project>.git

Directory structure of the data loader API (Python library)

sample_loader
|-- bin/
|   `-- loader-start.sh          <- data loader startup script
|-- etc/
|   |-- common.ini               <- data loader configuration file
|   `-- logging.properties       <- log configuration file
|-- loader-api/                  <- data loader API
|   |-- docs/                    <- data loader API manual
|   |   |-- md/
|   |   |   |-- add_record.md
|   |   |   |-- api_env.md
|   |   |   |-- api_list.md
|   |   |   |-- api_remarks.md
|   |   |   |-- en/
|   |   |   |   |-- add_record.md
|   |   |   |   |-- api_env.md
|   |   |   |   |-- api_list.md
|   |   |   |   |-- api_remarks.md
|   |   |   |   |-- error_code.md
|   |   |   |   |-- history.md
|   |   |   |   |-- index.md
|   |   |   |   |-- init_record.md
|   |   |   |   |-- register_loader.md
|   |   |   |   `-- sample_loader.md
|   |   |   |-- error_code.md
|   |   |   |-- history.md
|   |   |   |-- index.md
|   |   |   |-- init_record.md
|   |   |   |-- register_loader.md
|   |   |   `-- sample_loader.md
|   |   |-- misc/
|   |   `-- mkdocs.yml
|   |-- etc/
|   |   `-- loader_api.ini
|   `-- src/
|       |-- __init__.py
|       |-- api_common.py
|       |-- loader_api.py
|       |-- loader_ctrl.py
|       `-- postgresql.py
|-- sample_datasource/           <- HTTP server that provides the data retrieved by sample_loader
|   |-- cgi-bin/
|   |   `-- datasource.py*
|   |-- sample_server.py
|   `-- sample_server.sh*
|-- src/
|   `-- loader.py                <- main body of the data loader
`-- var/
    `-- loader.log               <- log file

Data loader development steps.

1. Change the configuration of the data loader.

  • etc/common.ini
[HTTP]
url = http://127.0.0.1:8000/cgi-bin/datasource.py <- URL for data retrieval

[WEB_API]
data_name=testloader_zs001 <- unique name associated with the event table
options=override <- behavior when a record is added whose unique key (start_datetime, end_datetime, geometry) is already registered
                    override : overwrite the existing record
                    skip : do not register that record
                    error : do not register any of the records

  • bin/loader-start.sh
#!/bin/bash
#
# proxy settings
#
#export HTTP_PROXY="<proxy_host>:<proxy_port>" <- set if behind a proxy
#export HTTPS_PROXY="<proxy_host>:<proxy_port>" <- set if behind a proxy
#
# do not change anything below this line
#
BINDIR=$(cd $(dirname $0); pwd)
UPDIR=`/usr/bin/dirname ${BINDIR}`
export LOADER_NAME=`/usr/bin/basename ${UPDIR}`
export LOADER_HOME=$HOME/$LOADER_NAME
export LOADER_SRC=$LOADER_HOME/src
export LOADER_API=$LOADER_HOME/loader-api/src
export PYTHONPATH=$LOADER_API:$LOADER_SRC:$PYTHONPATH
export LOADER_PROGRAM=$LOADER_SRC/loader.py
export LOADER_INI=$LOADER_HOME/etc/common.ini
#trap 'echo ${PID};kill -SIGTERM ${PID}' SIGTERM
trap 'kill -SIGTERM ${PID}' SIGTERM
PROCESS_COUNT=`ps -aux | grep "$LOADER_PROGRAM" | grep -v grep | wc -l`
if [ ${PROCESS_COUNT} = 0 ]; then
    python3 $LOADER_PROGRAM -i $LOADER_INI &
    PID=$!
    wait
else
    echo "Already running $LOADER_NAME"
fi
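
The startup script passes etc/common.ini to loader.py with the -i option. The following is a minimal sketch of how a loader script might accept that option and read the settings shown above. The function names load_settings and parse_args are hypothetical, and the actual handling inside loader.py may differ. The sample text omits the "<-" annotations, which are explanations in this manual and not part of the file.

```python
import argparse
import configparser

# Values documented for the "options" key in [WEB_API].
ALLOWED_OPTIONS = {"override", "skip", "error"}


def parse_args(argv):
    """Parse the -i option that the startup script passes to the loader."""
    parser = argparse.ArgumentParser(description="sample loader (sketch)")
    parser.add_argument("-i", dest="ini_path", required=True,
                        help="path to the loader configuration file")
    return parser.parse_args(argv)


def load_settings(ini_text):
    """Read the [HTTP] and [WEB_API] sections and validate 'options'."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    settings = {
        "url": config["HTTP"]["url"],
        "data_name": config["WEB_API"]["data_name"],
        "options": config["WEB_API"]["options"],
    }
    if settings["options"] not in ALLOWED_OPTIONS:
        raise ValueError("unsupported options value: " + settings["options"])
    return settings


SAMPLE_INI = """
[HTTP]
url = http://127.0.0.1:8000/cgi-bin/datasource.py

[WEB_API]
data_name=testloader_zs001
options=override
"""

args = parse_args(["-i", "etc/common.ini"])
settings = load_settings(SAMPLE_INI)
print(args.ini_path, settings)
```

Rejecting an unknown options value at startup is a design choice made for this sketch; it reports a misspelled setting before any data is collected.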

2. Check the format of the source data to be collected.

  • We have prepared a sample_datasource to check the operation of sample_loader.
  • The sample_datasource returns the time, location, and a single number.

(Reference) How to run sample_datasource and how to check the data format.

1. Run sample_datasource.

[rt-loader@localhost ~]% cd sample_loader/sample_datasource
[rt-loader@localhost sample_datasource]% ./sample_server.sh
Serving HTTP on 127.0.0.1 port 8000 (http://127.0.0.1:8000/) ...

2. Access sample_datasource.

[rt-loader@localhost ~]% curl http://127.0.0.1:8000/cgi-bin/datasource.py
2020-08-20 21:36:20+09 <- Time
35.000000, 135.000000 <- Location information
574 <- One number (integer)
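
When checking the format, it can help to parse a captured response into typed values. The sketch below assumes the three-line layout shown above (without the "<-" annotations); the function name parse_datasource_response is hypothetical. Because the sample time ends with a two-digit offset (+09), the sketch pads it to +0900 before parsing.

```python
import re
from datetime import datetime


def parse_datasource_response(text):
    """Split a three-line response into (datetime, (float, float), int)."""
    lines = text.splitlines()
    if len(lines) < 3:
        raise ValueError("expected three lines: time, location, number")

    timestamp = lines[0].strip()
    # strptime's %z needs at least hours and minutes, so "+09" becomes "+0900".
    if re.search(r"[+-]\d{2}$", timestamp):
        timestamp += "00"
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S%z")

    # Two comma-separated numbers, kept in the order the data source sends them.
    first, second = (float(value) for value in lines[1].split(","))

    number = int(lines[2])
    return when, (first, second), number


sample = "2020-08-20 21:36:20+09\n35.000000, 135.000000\n574\n"
when, location, number = parse_datasource_response(sample)
print(when.isoformat(), location, number)
```

A response that does not match the layout raises ValueError, which makes a format change in the data source visible early.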

3. Parse the source data and build the records to be sent.

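import urllib.request


def __data_collect(ini_info):
    # Fetch the source data from the URL configured in [HTTP] url.
    req = urllib.request.Request(url=ini_info['HTTP']['url'])
    response = urllib.request.urlopen(req)
    # Line 0: time, line 1: location (two numbers), line 2: one number.
    data = response.read().decode().splitlines()

    record_list = []
    one_record_dic = {}
    # The source provides a single time, so start and end are the same.
    one_record_dic['start_datetime'] = data[0]
    one_record_dic['end_datetime'] = data[0]
    # Build a Point from the two comma-separated numbers.
    one_record_dic['location'] = {}
    one_record_dic['location']['type'] = "Point"
    one_record_dic['location']['coordinates'] = list(map(float, data[1].split(',')))
    # The number is passed through as the string read from the response.
    one_record_dic['data'] = data[2]

    record_list.append(one_record_dic)
    return record_list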
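
To see the shape of the record that __data_collect produces without starting the HTTP server, the same decomposition can be applied to a captured response. This is a sketch for inspection only; the helper name build_record is hypothetical and is not part of the data loader API.

```python
import json


def build_record(lines):
    """Apply the same field layout as __data_collect to already-split lines."""
    return {
        "start_datetime": lines[0],
        "end_datetime": lines[0],
        "location": {
            "type": "Point",
            "coordinates": [float(value) for value in lines[1].split(",")],
        },
        "data": lines[2],
    }


# Captured sample response, without the "<-" annotations used in this manual.
sample = "2020-08-20 21:36:20+09\n35.000000, 135.000000\n574\n"
record_list = [build_record(sample.splitlines())]
print(json.dumps(record_list, indent=2))
```

Printing the list as JSON makes it easy to compare the record against the fields the unique key refers to (start_datetime, end_datetime, and the location geometry).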

4. Start the data loader and post the data.

[rt-loader@localhost sample_loader]% ./bin/loader-start.sh