Multi-Party Joint Operation

1. Introduction

This document describes how to define federated learning jobs using FATE Flow.

2. DAG Definition

FATE 2.0 defines a job with a new DAG format, which describes the upstream and downstream dependencies of each component.

3. Job Functional Configuration

3.1 Prediction

dag:
  conf:
    model_warehouse:                        
      model_id: '202307171452088269870'      
      model_version: '0'                    
In dag.conf.model_warehouse, define the model information that the prediction task relies on. This model will be used for prediction in the algorithm.

3.2 Job Inheritance

dag:
  conf:
    inheritance:                  
      job_id: "202307041704214920920"  
      task_list: ["reader_0"]         
In dag.conf.inheritance, specify the job and the names of the algorithm components to inherit. The newly started job directly reuses the outputs of these components.

3.3 Specifying the Scheduler Party

dag:
  conf:
    scheduler_party_id: "9999"   
In dag.conf.scheduler_party_id, you can specify the scheduler party. If it is not specified, the initiator acts as the scheduler.
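The fallback rule can be pictured with a small Python sketch. The function name and the initiator_party_id argument are illustrative assumptions and are not part of FATE Flow's API.

```python
def resolve_scheduler_party_id(dag_conf: dict, initiator_party_id: str) -> str:
    """Pick the scheduler party for a job (illustrative helper, not a FATE Flow function)."""
    # Use the configured scheduler if present; otherwise fall back to the initiator.
    return dag_conf.get("scheduler_party_id") or initiator_party_id


# "10000" is a made-up initiator party id used only for this example.
explicit = resolve_scheduler_party_id({"scheduler_party_id": "9999"}, "10000")
fallback = resolve_scheduler_party_id({}, "10000")
print(explicit, fallback)
```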

3.4 Specifying Job Priority

dag:
  conf:
    priority: 2
In dag.conf.priority, specify the scheduling weight of the job. The higher the value, the higher the priority.
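As a rough illustration of "higher value, higher priority", the sketch below orders a list of pending jobs. It is not FATE Flow's scheduler: the job dictionaries, the job ids, and the default priority of 0 for jobs that omit the field are all assumptions made for this example.

```python
def order_by_priority(jobs: list) -> list:
    """Return jobs with the highest priority first (illustrative only)."""
    # sorted() is stable, so jobs with equal priority keep their submission order.
    # Treating a missing priority as 0 is an assumption, not documented behaviour.
    return sorted(jobs, key=lambda job: job["conf"].get("priority", 0), reverse=True)


pending = [
    {"job_id": "job_a", "conf": {"priority": 1}},
    {"job_id": "job_b", "conf": {"priority": 2}},
    {"job_id": "job_c", "conf": {}},
]
ordered = [job["job_id"] for job in order_by_priority(pending)]
print(ordered)
```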

3.5 Automatic Retry on Failure

dag:
  conf:
    auto_retries: 2
In dag.conf.auto_retries, specify the number of retries after a task fails. The default is 0.
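The retry semantics can be sketched as one initial attempt plus up to auto_retries further attempts. The helper below is a generic Python illustration under that assumption; run_with_retries and flaky_task are hypothetical names, and the real retry logic lives inside FATE Flow's scheduler.

```python
def run_with_retries(run_task, auto_retries: int = 0):
    """Call run_task, retrying on failure; returns (result, attempts_used)."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return run_task(), attempt
        except Exception:
            # One initial attempt plus `auto_retries` retries are allowed.
            if attempt > auto_retries:
                raise


# A stand-in task that fails twice and then succeeds.
outcomes = iter([RuntimeError("fail"), RuntimeError("fail"), "success"])


def flaky_task():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result, attempts = run_with_retries(flaky_task, auto_retries=2)
print(result, attempts)
```

With auto_retries left at 0, the first failure is raised immediately.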

3.6 Resource Allocation

dag:
  conf:
    cores: 4
  task:
    engine_run:
      cores: 2
  • dag.conf.cores is the number of cores allocated to the entire job (job_cores), and dag.task.engine_run.cores is the number of cores allocated to each task (task_cores).
  • Task parallelism = job_cores / task_cores. A job started with this configuration therefore has a maximum parallelism of 4 / 2 = 2.
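The parallelism arithmetic can be checked with a few lines of Python. The dictionary mirrors the YAML above (cores under dag.conf and dag.task.engine_run); the function name and the use of integer division for values that do not divide evenly are assumptions.

```python
def max_task_parallelism(dag: dict) -> int:
    """Compute job_cores / task_cores for a DAG given as a plain dict."""
    job_cores = dag["conf"]["cores"]
    task_cores = dag["task"]["engine_run"]["cores"]
    if task_cores <= 0:
        raise ValueError("task cores must be positive")
    # Integer division: a partially funded task slot cannot run (assumption).
    return job_cores // task_cores


dag = {"conf": {"cores": 4}, "task": {"engine_run": {"cores": 2}}}
parallelism = max_task_parallelism(dag)
print(parallelism)
```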

3.7 Task Timeout

dag:
  task:
    timeout: 3600 # s
In dag.task.timeout, specify the task timeout in seconds. If a task is still in the 'running' state when the timeout is reached, the job is killed automatically.
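The timeout condition amounts to a simple check on status and elapsed time. The sketch below is only a model of that rule: the function name, the use of plain second counts for timestamps, and the "success" status string are assumptions, not FATE Flow internals.

```python
def exceeds_timeout(status: str, started_at: float, now: float, timeout: float) -> bool:
    """True when a task is still running once `timeout` seconds have elapsed."""
    return status == "running" and (now - started_at) >= timeout


# With timeout: 3600, a task that started at t=0 is checked at several moments.
print(exceeds_timeout("running", 0, 3599, 3600))
print(exceeds_timeout("running", 0, 3600, 3600))
print(exceeds_timeout("success", 0, 7200, 3600))
```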

3.8 Task Provider

dag:
  task:
    provider: fate:2.0.1@local
In dag.task.provider, specify the algorithm provider, version number, and execution mode for the task.
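Judging from the example value fate:2.0.1@local, the three parts appear to be laid out as name:version@mode. The parser below is a sketch under that assumption; the Provider type and its field names are hypothetical and not taken from FATE Flow.

```python
from typing import NamedTuple


class Provider(NamedTuple):
    name: str
    version: str
    mode: str


def parse_provider(spec: str) -> Provider:
    """Split a provider string of the assumed form name:version@mode."""
    name_version, at_sep, mode = spec.partition("@")
    name, colon_sep, version = name_version.partition(":")
    if not (at_sep and colon_sep and name and version and mode):
        raise ValueError(f"expected name:version@mode, got {spec!r}")
    return Provider(name, version, mode)


provider = parse_provider("fate:2.0.1@local")
print(provider.name, provider.version, provider.mode)
```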

4. Input

Upstream input is divided into two types: data and models.

4.1 Data Input

  • As parameter input to a component

    dag:
      party_tasks:
        guest_9999:
          tasks:
            reader_0:
              parameters:
                name: breast_hetero_guest
                namespace: experiment
        host_9998:
          tasks:
            reader_0:
              parameters:
                name: breast_hetero_host
                namespace: experiment
    
    The reader component supports directly passing a FATE data table as job-level data input.

  • Input of one component from another component's output

    dag:
      tasks:
        binning_0:
          component_ref: hetero_feature_binning
          inputs:
            data:
              train_data:
                task_output_artifact:
                  output_artifact_key: train_output_data
                  producer_task: scale_0
    
    binning_0 depends on the output data of scale_0.
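The producer_task references are what tie the tasks into a DAG. As a sketch of how an execution order could be derived from them, the Python below collects each task's upstream producers and sorts the tasks topologically with the standard library's graphlib (Python 3.9+). The helper names are hypothetical, the empty scale_0 entry is a placeholder, and the code only handles the single-artifact mapping shape shown above.

```python
from graphlib import TopologicalSorter


def upstream_tasks(task: dict) -> set:
    """Collect the producer_task names referenced in a task's inputs."""
    producers = set()
    for artifacts in task.get("inputs", {}).values():  # e.g. "data"
        for source in artifacts.values():              # e.g. "train_data"
            ref = source.get("task_output_artifact")
            if ref:
                producers.add(ref["producer_task"])
    return producers


def execution_order(tasks: dict) -> list:
    """Order task names so that every producer runs before its consumers."""
    graph = {name: upstream_tasks(spec) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())


tasks = {
    "binning_0": {
        "component_ref": "hetero_feature_binning",
        "inputs": {
            "data": {
                "train_data": {
                    "task_output_artifact": {
                        "output_artifact_key": "train_output_data",
                        "producer_task": "scale_0",
                    }
                }
            }
        },
    },
    "scale_0": {},  # placeholder: its own definition is not shown in this example
}
order = execution_order(tasks)
print(order)
```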

4.2 Model Input

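  • Model Warehouse

    dag:
      conf:
        model_warehouse:
          model_id: '202307171452088269870'
          model_version: '0'
      tasks:
        selection_0:
          component_ref: hetero_feature_selection
          dependent_tasks:
          - scale_0
          inputs:
            model:
              input_model:
                model_warehouse:
                  output_artifact_key: train_output_model
                  producer_task: selection_0

    selection_0 takes its input model from the model warehouse: the train_output_model artifact that selection_0 produced for the model identified by model_id and model_version.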

5. Output

The job's output includes data, models, and metrics.

5.1 Metric Output

Querying Metrics

Querying output metrics command:

flow output query-metric -j $job_id -r $role -p $party_id -tn $task_name
Example:

flow output query-metric -j 202308211911505128750 -r arbiter -p 9998 -tn lr_0

The output is as follows:
{
    "code": 0,
    "data": [
        {
            "data": [
                {
                    "metric": [
                        0.0
                    ],
                    "step": 0,
                    "timestamp": 1692616428.253495
                }
            ],
            "groups": [
                {
                    "index": null,
                    "name": "default"
                },
                {
                    "index": null,
                    "name": "train"
                }
            ],
            "name": "lr_loss",
            "step_axis": "iterations",
            "type": "loss"
        },
        {
            "data": [
                {
                    "metric": [
                        -0.07785049080848694
                    ],
                    "step": 1,
                    "timestamp": 1692616432.9727712
                }
            ],
            "groups": [
                {
                    "index": null,
                    "name": "default"
                },
                {
                    "index": null,
                    "name": "train"
                }
            ],
            "name": "lr_loss",
            "step_axis": "iterations",
            "type": "loss"
        }
    ],
    "message": "success"
}
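Each entry under "data" carries one or more points with a step and a metric value, so a loss curve can be rebuilt by collecting the points that share a metric name. The Python sketch below does this on a trimmed copy of the response above (the "groups" fields are omitted for brevity); metric_series is a hypothetical helper, not part of the flow client.

```python
import json


def metric_series(response: dict, name: str) -> list:
    """Return (step, value) pairs for the named metric, ordered by step."""
    points = []
    for entry in response["data"]:
        if entry["name"] != name:
            continue
        for point in entry["data"]:
            # Each "metric" is a one-element list in the response shown above.
            points.append((point["step"], point["metric"][0]))
    return sorted(points)


response = json.loads("""
{
    "code": 0,
    "data": [
        {"data": [{"metric": [0.0], "step": 0, "timestamp": 1692616428.253495}],
         "name": "lr_loss", "step_axis": "iterations", "type": "loss"},
        {"data": [{"metric": [-0.07785049080848694], "step": 1, "timestamp": 1692616432.9727712}],
         "name": "lr_loss", "step_axis": "iterations", "type": "loss"}
    ],
    "message": "success"
}
""")
series = metric_series(response, "lr_loss")
print(series)
```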

5.2 Model Output

Querying Models

flow output query-model -j $job_id -r $role -p $party_id -tn $task_name
Example:

flow output query-model -j 202308211911505128750 -r host -p 9998 -tn lr_0

The query result is as follows:
{
    "code": 0,
    "data": {
        "output_model": {
            "data": {
                "estimator": {
                    "end_epoch": 10,
                    "is_converged": false,
                    "lr_scheduler": {
                        "lr_params": {
                            "start_factor": 0.7,
                            "total_iters": 100
                        },
                        "lr_scheduler": {
                            "_get_lr_called_within_step": false,
                            "_last_lr": [
                                0.07269999999999996
                            ],
                            "_step_count": 10,
                            "base_lrs": [
                                0.1
                            ],
                            "end_factor": 1.0,
                            "last_epoch": 9,
                            "start_factor": 0.7,
                            "total_iters": 100,
                            "verbose": false
                        },
                        "method": "linear"
                    },
                    "optimizer": {
                        "alpha": 0.001,
                        "l1_penalty": false,
                        "l2_penalty": true,
                        "method": "sgd",
                        "model_parameter": [
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ],
                            [
                                0.0
                            ]
                        ],
                        "model_parameter_dtype": "float32",
                        "optim_param": {
                            "lr": 0.1
                        },
                        "optimizer": {
                            "param_groups": [
                                {
                                    "dampening": 0,
                                    "differentiable": false,
                                    "foreach": null,
                                    "initial_lr": 0.1,
                                    "lr": 0.07269999999999996,
                                    "maximize": false,
                                    "momentum": 0,
                                    "nesterov": false,
                                    "params": [
                                        0
                                    ],
                                    "weight_decay": 0
                                }
                            ],
                            "state": {}
                        }
                    },
                    "param": {
                        "coef_": [
                            [
                                -0.10828543454408646
                            ],
                            [
                                -0.07341302931308746
                            ],
                            [
                                -0.10850320011377335
                            ],
                            [
                                -0.10066638141870499
                            ],
                            [
                                -0.04595951363444328
                            ],
                            [
                                -0.07001449167728424
                            ],
                            [
                                -0.08949052542448044
                            ],
                            [
                                -0.10958756506443024
                            ],
                            [
                                -0.04012322425842285
                            ],
                            [
                                0.02270071767270565
                            ],
                            [
                                -0.07198350876569748
                            ],
                            [
                                0.00548586156219244
                            ],
                            [
                                -0.06599288433790207
                            ],
                            [
                                -0.06410090625286102
                            ],
                            [
                                0.016374297440052032
                            ],
                            [
                                -0.01607361063361168
                            ],
                            [
                                -0.011447405442595482
                            ],
                            [
                                -0.04352564364671707
                            ],
                            [
                                0.013161249458789825
                            ],
                            [
                                0.013506329618394375
                            ]
                        ],
                        "dtype": "float32",
                        "intercept_": null
                    }
                }
            },
            "meta": {
                "batch_size": null,
                "epochs": 10,
                "init_param": {
                    "fill_val": 0.0,
                    "fit_intercept": false,
                    "method": "zeros",
                    "random_state": null
                },
                "label_count": false,
                "learning_rate_param": {
                    "method": "linear",
                    "scheduler_params": {
                        "start_factor": 0.7,
                        "total_iters": 100
                    }
                },
                "optimizer_param": {
                    "alpha": 0.001,
                    "method": "sgd",
                    "optimizer_params": {
                        "lr": 0.1
                    },
                    "penalty": "l2"
                },
                "ovr": false
            }
        }
    },
    "message": "success"
}
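In the result above, the trained weights sit under data.output_model.data.estimator.param, with coef_ stored as a list of one-element lists. A minimal Python sketch for flattening them follows; the helper name is hypothetical, the sample response keeps only the first two coefficients from the output above, and other components may lay out their models differently.

```python
def linear_model_weights(response: dict):
    """Return (coefficients, intercept) from a query-model response of the shape shown above."""
    param = response["data"]["output_model"]["data"]["estimator"]["param"]
    # coef_ is a column of single-value rows; flatten it to a plain list.
    coefficients = [row[0] for row in param["coef_"]]
    return coefficients, param["intercept_"]


response = {
    "code": 0,
    "data": {
        "output_model": {
            "data": {
                "estimator": {
                    "param": {
                        "coef_": [[-0.10828543454408646], [-0.07341302931308746]],
                        "dtype": "float32",
                        "intercept_": None,
                    }
                }
            }
        }
    },
    "message": "success",
}
coefficients, intercept = linear_model_weights(response)
print(coefficients, intercept)
```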

Downloading Models

flow output download-model -j $job_id -r $role -p $party_id -tn $task_name -o $download_dir
Example:

flow output download-model -j 202308211911505128750 -r host -p 9998 -tn lr_0 -o ./

The download result is as follows:
{
    "code": 0,
    "directory": "./output_model_202308211911505128750_host_9998_lr_0",
    "message": "Download success, please check the path: ./output_model_202308211911505128750_host_9998_lr_0"
}

5.3 Data Output

Querying Data Tables

flow output query-data-table -j $job_id -r $role -p $party_id -tn $task_name
Example:

flow output query-data-table -j 202308211911505128750 -r host -p 9998 -tn binning_0

The query result is as follows:
{
    "train_output_data": [
        {
            "name": "9e28049c401311ee85c716b977118319",
            "namespace": "202308211911505128750_binning_0"
        }
    ]
}

Previewing Data

flow output display-data -j $job_id -r $role -p $party_id -tn $task_name
Example:

flow output display-data -j 202308211911505128750 -r host -p 9998 -tn binning_0

Downloading Data

flow output download-data -j $job_id -r $role -p $party_id -tn $task_name -o $download_dir
Example:

flow output download-data -j 202308211911505128750 -r guest -p 9999 -tn lr_0 -o ./

The result is as follows:
{
    "code": 0,
    "directory": "./output_data_202308211911505128750_guest_9999_lr_0",
    "message": "Download success, please check the path: ./output_data_202308211911505128750_guest_9999_lr_0"
}
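All of the flow output commands in this section share the same -j, -r, -p, and -tn options, and the download commands add -o. When scripting them, the argument list can be assembled in one place. The Python helper below is a hypothetical convenience wrapper that only builds the command; it uses no flags beyond those shown above and does not run anything.

```python
import shlex


def flow_output_command(action, job_id, role, party_id, task_name, download_dir=None):
    """Build the argv list for a `flow output <action>` call (illustrative helper)."""
    argv = ["flow", "output", action,
            "-j", job_id, "-r", role, "-p", str(party_id), "-tn", task_name]
    if download_dir is not None:
        # -o is only used by the download-model and download-data commands above.
        argv += ["-o", download_dir]
    return argv


argv = flow_output_command("download-data", "202308211911505128750", "guest", 9999, "lr_0", "./")
command_line = shlex.join(argv)
print(command_line)
```

The resulting list can be passed to subprocess.run(argv) on a machine where the flow client is installed and initialized.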