import_category

This document has been machine translated. Registers category definitions to be used for extracting correlation patterns.

The API is executed over JSON-RPC v2.0.
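
For orientation, a JSON-RPC 2.0 request is a JSON object with the members jsonrpc, method, params and id. The sketch below only builds such an envelope. The remote method name "process" and the layout of params are assumptions inferred from the client call shown later on this page, not a documented wire format, and the ddc name is a placeholder.

```python
import json

def build_jsonrpc_request(method, params, request_id=1):
    """Serialize a JSON-RPC 2.0 request object to a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",   # fixed version string required by JSON-RPC 2.0
        "method": method,
        "params": params,
        "id": request_id,   # echoed back by the server in its response
    })

# Assumption: the remote method is named "process" and takes the same
# api_method / api_params pair as the Python client.
payload = build_jsonrpc_request("process", {
    "api_method": "import_category",
    "api_params": {
        "output_ddc": "ddc:example_category",  # placeholder ddc name
        "category_json": json.dumps([
            {"item": "agg_xrain_max_value", "min": 1, "max": 5, "category": "rf1"},
        ]),
    },
})
```

In practice the xdata_prov client shown in the example request presumably handles this envelope for you, so the sketch is mainly useful for reading request logs.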

Example request

import_category is one of the analysis APIs. It is executed by specifying api_method="import_category" in the process method of the Provenance API. The following example starts a Provenance session, executes import_category, and then closes the session.

import json
from xdata_prov.client import Api
api = Api()
api.begin_session()

api.process(api_method="import_category", api_params={
    "output_ddc": "ddc:jartic_xrain_category",
    "category_json": json.dumps([
        { "item": "agg_jartic_max_length", "min": 10, "max": 100, "category": "cl10" },
        { "item": "agg_jartic_max_length", "min": 100, "max": 300, "category": "cl100" },
        { "item": "agg_jartic_max_length", "min": 300, "max": 600, "category": "cl300" },
        { "item": "agg_xrain_max_value", "min": 0.1, "max": 1, "category": "rf01" },
        { "item": "agg_xrain_max_value", "min": 1, "max": 5, "category": "rf1" },
        { "item": "agg_xrain_max_value", "min": 5, "max": 10, "category": "rf5" }
    ])
})

api.commit()
api.end_session()

Parameters

When calling the process method with api_method="import_category", api_params is a dict containing the following keys. Parameters with a blank default value are required.

key            description                            default value
output_ddc     ddc to which the output is registered
output_mode    output mode (overwrite or error)       error
category_json  category definition (in JSON format)
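
A minimal sketch of assembling api_params according to the table above. The helper name build_import_category_params is hypothetical and not part of the Provenance API. It only checks the two required keys and the two allowed output modes, then serializes the category list with json.dumps.

```python
import json

def build_import_category_params(output_ddc, categories, output_mode="error"):
    """Build the api_params dict for api_method="import_category".

    Hypothetical helper: output_ddc and categories are required,
    output_mode defaults to "error" as in the parameter table.
    """
    if not output_ddc:
        raise ValueError("output_ddc is required")
    if not isinstance(categories, list):
        raise ValueError("categories must be a list of dicts")
    if output_mode not in ("overwrite", "error"):
        raise ValueError("output_mode must be 'overwrite' or 'error'")
    return {
        "output_ddc": output_ddc,
        "output_mode": output_mode,
        # category_json is passed as a JSON string, not as a Python list
        "category_json": json.dumps(categories),
    }

params = build_import_category_params(
    "ddc:jartic_xrain_category",
    [{"item": "agg_xrain_max_value", "min": 0.1, "max": 1, "category": "rf01"}],
)
```

The resulting dict can be passed as api_params to api.process.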

Input data

Category definition

The category definition specified by category_json is a JSON array whose elements are JSON objects. Each element of the array has the following keys; min and max may be omitted, as described in the conversion rules.

key       type    description
item      string  column name
min       number  minimum value (inclusive)
max       number  maximum value (exclusive)
category  string  category string
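
Checking a definition against these key types before sending it can catch mistakes early. The following is a sketch of such a client-side check. The function name is hypothetical, and the rule that min must be smaller than max is an assumption (a range with min >= max could never match), not something this page states.

```python
from numbers import Number

def validate_category_entry(entry):
    """Return a list of problems found in one category definition object."""
    errors = []
    for key in ("item", "category"):
        if not isinstance(entry.get(key), str):
            errors.append(f"{key} must be a string")
    for key in ("min", "max"):
        # bool is a subclass of int in Python, so reject it explicitly
        if key in entry and (isinstance(entry[key], bool)
                             or not isinstance(entry[key], Number)):
            errors.append(f"{key} must be a number")
    # Assumption: an empty range [min, max) is treated as a mistake.
    if not errors and "min" in entry and "max" in entry:
        if entry["min"] >= entry["max"]:
            errors.append("min must be less than max")
    return errors
```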

Interpretation of category definition

Category definitions are used by extract_items. extract_items converts each record of the input transaction into symbolic information according to the category definition.

The conversion rules are as follows.

  • If the value of the column specified by item is greater than or equal to min and less than max, output the symbol specified by category.
  • If min is not specified, the minimum value will be unrestricted.
  • If max is not specified, the maximum value will be unrestricted.
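
The three rules above describe a half-open interval test, [min, max), with either bound optional. As an illustration only (this is not the extract_items implementation, and the function name is hypothetical), they can be sketched as:

```python
def matching_categories(item, value, definitions):
    """Return the category symbols whose [min, max) range contains value."""
    symbols = []
    for d in definitions:
        if d["item"] != item:
            continue
        lo, hi = d.get("min"), d.get("max")
        if lo is not None and value < lo:    # below the inclusive lower bound
            continue
        if hi is not None and value >= hi:   # at or above the exclusive upper bound
            continue
        symbols.append(d["category"])
    return symbols

definitions = [
    {"item": "agg_jartic_max_length", "min": 10, "max": 100, "category": "cl10"},
    {"item": "agg_jartic_max_length", "min": 100, "max": 300, "category": "cl100"},
    {"item": "agg_jartic_max_length", "min": 300, "max": 600, "category": "cl300"},
]

print(matching_categories("agg_jartic_max_length", 100, definitions))  # ['cl100']
print(matching_categories("agg_jartic_max_length", 5, definitions))    # []
```

Because max is exclusive and min is inclusive, the value 100 falls into cl100 rather than cl10, and a value of 600 matches none of the three ranges.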

In item, you can extract a specific field from a timestamp type column using the form column%field. For example, start_datetime%hour applies the conversion to the hour (0 to 23) of the start_datetime column. Specifically, this is interpreted as the SQL expression extract(hour from start_datetime). Fields other than hour that SQL extract accepts can also be specified.
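
The mapping from the %field notation to an SQL expression can be sketched as follows. The function name item_to_sql is hypothetical, and the sketch does no quoting or validation of identifiers, so it only illustrates the string transformation described above.

```python
def item_to_sql(item):
    """Translate an item such as "start_datetime%hour" into an SQL expression.

    Items without "%" are returned unchanged as a plain column reference.
    """
    column, sep, field = item.partition("%")
    if not sep:
        return column
    return f"extract({field} from {column})"

print(item_to_sql("start_datetime%hour"))    # extract(hour from start_datetime)
print(item_to_sql("agg_xrain_max_value"))    # agg_xrain_max_value
```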

If the category string contains %value, that part is replaced by the column value. This notation allows the values of the column specified by item to be used directly as categories when the column contains discrete values.

Here are examples of defining a category using the % notation:

  • { "item": "start_datetime%hour", "max": 12, "category": "am" }
    • Assign the am category if the hour field of start_datetime is less than 12.
  • { "item": "start_datetime%dow", "category": "dow%value" }
    • Assign the day of the week of start_datetime as the category (Sunday: dow0 to Saturday: dow6).
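
To make the two examples concrete, here is a small Python emulation of how they could be evaluated for one record. It is a sketch, not the actual extract_items logic: the function names are hypothetical, only the hour and dow fields are handled, and the day of the week is numbered Sunday = 0 to match the description above (Python's own weekday() starts at Monday = 0).

```python
from datetime import datetime

def extract_field(ts, field):
    """Emulate SQL extract() for the two fields used in the examples."""
    if field == "hour":
        return ts.hour
    if field == "dow":
        return (ts.weekday() + 1) % 7  # shift so that Sunday=0 ... Saturday=6
    raise ValueError(f"field not covered by this sketch: {field}")

def symbol_for(record, definition):
    """Return the category symbol for one record, or None if out of range."""
    column, sep, field = definition["item"].partition("%")
    value = record[column]
    if sep:
        value = extract_field(value, field)
    lo, hi = definition.get("min"), definition.get("max")
    if lo is not None and value < lo:
        return None
    if hi is not None and value >= hi:
        return None
    # %value in the category string is replaced by the (extracted) value
    return definition["category"].replace("%value", str(value))

record = {"start_datetime": datetime(2024, 1, 7, 9, 30)}  # a Sunday morning
am_rule = {"item": "start_datetime%hour", "max": 12, "category": "am"}
dow_rule = {"item": "start_datetime%dow", "category": "dow%value"}

print(symbol_for(record, am_rule))   # am
print(symbol_for(record, dow_rule))  # dow0
```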

Output data

Category definition table

The "category definition table" will be output to the destination ddc specified by output_ddc. This table will have the following schema.

column name  data type         description
item         text              column name
min          double precision  minimum value (inclusive)
max          double precision  maximum value (exclusive)
category     text              category string
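
As a rough picture of how category_json relates to this schema, the sketch below turns a category definition into (item, min, max, category) tuples. The function name is hypothetical, and representing an omitted min or max as None (NULL) is an assumption; this page does not say how unrestricted bounds are stored.

```python
import json

def category_rows(category_json):
    """Convert a category_json string into rows shaped like the output table."""
    rows = []
    for entry in json.loads(category_json):
        lo, hi = entry.get("min"), entry.get("max")
        rows.append((
            entry["item"],
            None if lo is None else float(lo),  # double precision, None if omitted
            None if hi is None else float(hi),
            entry["category"],
        ))
    return rows

rows = category_rows(json.dumps([
    {"item": "agg_xrain_max_value", "min": 1, "max": 5, "category": "rf1"},
    {"item": "start_datetime%hour", "max": 12, "category": "am"},
]))
```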

Return value

import_category returns the ddc information of the output destination ddc. This is the behavior defined in the specification of the process method of the Provenance API.