Commit e7090a6 — Merge pull request #97 from MyYuan/master: update docs of ipfs&dnn
2 parents 8d80600 + ff2e713

File tree: 11 files changed (+94, −32 lines)


docs/source/introduction/concepts.md — 1 addition, 1 deletion

@@ -28,7 +28,7 @@ There are two kinds of tasks in PaddleDTX:
 ## Algorithms
 The algorithms in PaddleDTX are, generally, machine learning algorithms adapted for distributed execution, i.e. federated learning algorithms.
-Currently open-sourced are **vertical federated learning** algorithms, including **multiple linear regression** and **multiple logistic regression**.
+Currently open-sourced are **vertical federated learning** algorithms, including **multiple linear regression**, **multiple logistic regression**, and **neural networks**.

 ## Training samples and prediction datasets
 Training samples and prediction datasets in PaddleDTX are stored as files in the decentralized storage network and are specified by the **requester node** when a training or prediction task is published.

docs/source/others/issues.md — 2 additions, 2 deletions

@@ -26,11 +26,11 @@ A: In real deployments, users can set up nodes according to their needs; storage nodes …

 **Q: Which model training algorithms are currently supported?**

-A: Vertical federated learning algorithms for linear regression and logistic regression have been open-sourced; vertical algorithms such as decision trees and deep neural networks, as well as horizontal federated learning algorithms, will be open-sourced later. Stay tuned.
+A: Vertical federated learning algorithms for linear regression, logistic regression, and neural networks have been open-sourced; a vertical decision-tree algorithm and horizontal federated learning algorithms will be open-sourced later. Stay tuned.

 **Q: Which storage engines does the decentralized storage XuperDB support? Is IPFS supported?**

-A: Only the local file system is supported at the moment; storage nodes will later support NAS, NFS, IPFS, etc., and IPFS support will be open-sourced in the next release.
+A: Yes, it was open-sourced in version 2.0. Users can choose the storage method by changing the type setting under the storage node's storage.mode; the local file system and IPFS are currently supported.

 **Q: Where do the sample data used for model training and prediction come from? How does a data consumer find the data it needs?**
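The type switch described in the answer above amounts to a small change in the storage node's configuration. A minimal sketch selecting IPFS, reusing the path, host, and timeout values that appear in the xdb-config examples of this same commit:

```toml
# Storage mode used by the storage node: `local` or `ipfs`
[storage.mode]
type = "ipfs"

[storage.mode.local]
# Used when type = "local": location of file fragments
rootPath = "/root/xdb/data/slices"

[storage.mode.ipfs]
# Used when type = "ipfs": peers in the IPFS cluster
hosts = [
    "127.0.0.1:5001",
    "127.0.0.1:5002"
]
# Timeout for IPFS requests, in milliseconds
timeout = 5000
```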

docs/source/others/ongoing.md — 1 addition, 1 deletion

@@ -2,7 +2,7 @@

 The main features we plan to support next are:

-1. More machine learning algorithms and their distributed adaptations, mainly including neural networks, decision trees, etc.;
+1. More machine learning algorithms and their distributed adaptations, such as a decision-tree algorithm;
 2. Horizontal federated learning algorithms, starting with adaptations of multiple linear regression and multiple logistic regression;
 3. Performance optimization of Paillier, the additive homomorphic algorithm currently in use;
 4. A load-balancing strategy for the decentralized storage service that, based on each storage node's remaining resources and past performance, finds the optimal node list when distributing files;

docs/source/projectcases/dnn-paddlefl.md — 6 additions, 4 deletions

@@ -24,17 +24,19 @@
 Usage:
 ./paddledtx_test.sh <mode> [-f <sample files>] [-m <model task id>] [-i <task id>]
 <mode> - one of 'upload_sample_files', 'start_vl_linear_train', 'start_vl_linear_predict', 'start_vl_logistic_train'
-    'start_vl_logistic_predict', 'tasklist', 'gettaskbyid'
+    'start_vl_logistic_predict', 'start_vl_dnn_train', 'start_vl_dnn_predict', 'tasklist', 'gettaskbyid'
 - 'upload_sample_files' - save linear and logistic sample files into XuperDB
 - 'start_vl_linear_train' - start vertical linear training task
 - 'start_vl_linear_predict' - start vertical linear prediction task
 - 'start_vl_logistic_train' - start vertical logistic training task
 - 'start_vl_logistic_predict' - start vertical logistic prediction task
-- 'start_vl_dnn_train' - start vertical logistic training task
-- 'start_vl_dnn_predict' - start vertical logistic prediction task
+- 'start_vl_dnn_train' - start vertical paddlefl-dnn training task
+- 'start_vl_dnn_predict' - start vertical paddlefl-dnn prediction task
 - 'tasklist' - list task in PaddleDTX
 - 'gettaskbyid' - get task by id from PaddleDTX
 -f <sample files> - linear or logistic sample files
+-e <model evaluation> - whether to perform model evaluation on the training task, default false; if true, the evaluation rule is 'Cross Validation'
+-l <live model evaluation> - whether to perform live model evaluation, default false
 -m <model task id> - finished train task ID from which to obtain the model, required for predict task
 -i <task id> - training or prediction task id

@@ -46,7 +48,7 @@ Usage:
 ./paddledtx_test.sh start_vl_logistic_train -f b31f53a5-0f8b-4f57-a7ea-956f1c7f7991,f3dddade-1f52-4b9e-9253-835e9fc81901
 ./paddledtx_test.sh start_vl_logistic_predict -f 1e97d684-722f-4798-aaf0-dffe955a94ba,b51a927c-f73e-4b8f-a81c-491b9e938b4d -m d8c8865c-a837-41fd-802b-8bd754b648eb
 ./paddledtx_test.sh start_vl_dnn_train -f 34cf2ee3-81b2-4865-907d-a9eab3c5b384,9dc7e0b7-18dd-4d5a-a3a1-6dace6d04fc8,3eaee2ea-4680-4b0b-bde3-ab4a4949159e
-./paddledtx_test.sh start_vl_dnn_predict -f c21b367f-2cb8-4859-87d8-18c52d397b13,043b9f55-68f6-4587-be8b-2340ea4432c2,b36442b6-ea3d-4530-910a-ec44291cd66c -m 91d9c0b7-996b-4954-86e8-95048e91a3b8
+./paddledtx_test.sh start_vl_dnn_predict -f 25ec6fd0-904e-4737-9bcc-c1cc11df1170,4442acae-90a2-4b92-b05f-cf1503c9d55e,73176b51-07f1-4f50-82c8-2d9d8908849b -m d8c8865c-a837-41fd-802b-8bd754b648eb
 ./paddledtx_test.sh gettaskbyid -i 9b3ff4be-bfcd-4520-a23b-4aa6ea4d59f1
 ./paddledtx_test.sh tasklist
 ```
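A common flow with this script is to run a training command, capture the returned task id, and pass it to the matching predict command via -m. The snippet below sketches only the id-extraction step; the `TaskID:` output format is an assumption for illustration, not the script's documented output:

```shell
#!/bin/sh
# Hypothetical train output; the real paddledtx_test.sh output format may differ.
train_output="TaskID: d8c8865c-a837-41fd-802b-8bd754b648eb"

# Extract the id that follows the "TaskID:" label
model_task_id=$(printf '%s\n' "$train_output" | awk -F': ' '/TaskID/ {print $2}')

# The id would then feed the predict command, e.g.:
# ./paddledtx_test.sh start_vl_dnn_predict -f <prediction files> -m "$model_task_id"
echo "$model_task_id"
```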

docs/source/projectcases/linear.md — 7 additions, 1 deletion

@@ -43,15 +43,19 @@
 Usage:
 ./paddledtx_test.sh <mode> [-f <sample files>] [-m <model task id>] [-i <task id>]
 <mode> - one of 'upload_sample_files', 'start_vl_linear_train', 'start_vl_linear_predict', 'start_vl_logistic_train'
-    'start_vl_logistic_predict', 'tasklist', 'gettaskbyid'
+    'start_vl_logistic_predict', 'start_vl_dnn_train', 'start_vl_dnn_predict', 'tasklist', 'gettaskbyid'
 - 'upload_sample_files' - save linear and logistic sample files into XuperDB
 - 'start_vl_linear_train' - start vertical linear training task
 - 'start_vl_linear_predict' - start vertical linear prediction task
 - 'start_vl_logistic_train' - start vertical logistic training task
 - 'start_vl_logistic_predict' - start vertical logistic prediction task
+- 'start_vl_dnn_train' - start vertical paddlefl-dnn training task
+- 'start_vl_dnn_predict' - start vertical paddlefl-dnn prediction task
 - 'tasklist' - list task in PaddleDTX
 - 'gettaskbyid' - get task by id from PaddleDTX
 -f <sample files> - linear or logistic sample files
+-e <model evaluation> - whether to perform model evaluation on the training task, default false; if true, the evaluation rule is 'Cross Validation'
+-l <live model evaluation> - whether to perform live model evaluation, default false
 -m <model task id> - finished train task ID from which to obtain the model, required for predict task
 -i <task id> - training or prediction task id

@@ -62,6 +66,8 @@ Usage:
 ./paddledtx_test.sh start_vl_linear_predict -f cb40b8ad-db08-447f-a9d9-628b69d01660,2a8a45ab-3c5d-482e-b945-bc45b7e28bf9 -m 9b3ff4be-bfcd-4520-a23b-4aa6ea4d59f1
 ./paddledtx_test.sh start_vl_logistic_train -f b31f53a5-0f8b-4f57-a7ea-956f1c7f7991,f3dddade-1f52-4b9e-9253-835e9fc81901
 ./paddledtx_test.sh start_vl_logistic_predict -f 1e97d684-722f-4798-aaf0-dffe955a94ba,b51a927c-f73e-4b8f-a81c-491b9e938b4d -m d8c8865c-a837-41fd-802b-8bd754b648eb
+./paddledtx_test.sh start_vl_dnn_train -f 34cf2ee3-81b2-4865-907d-a9eab3c5b384,9dc7e0b7-18dd-4d5a-a3a1-6dace6d04fc8,3eaee2ea-4680-4b0b-bde3-ab4a4949159e
+./paddledtx_test.sh start_vl_dnn_predict -f 25ec6fd0-904e-4737-9bcc-c1cc11df1170,4442acae-90a2-4b92-b05f-cf1503c9d55e,73176b51-07f1-4f50-82c8-2d9d8908849b -m d8c8865c-a837-41fd-802b-8bd754b648eb
 ./paddledtx_test.sh gettaskbyid -i 9b3ff4be-bfcd-4520-a23b-4aa6ea4d59f1
 ./paddledtx_test.sh tasklist
 ```

docs/source/quickstart/compile-install.md — 21 additions, 4 deletions

@@ -131,10 +131,19 @@ PaddleDTX is developed in golang; when you compile and install it from source …
 userName = "Admin"
 orgName = "org1"

+[storage.prover]
+localRoot = "/root/xdb/data/prove"
+
 [storage.mode]
-type = "local"
-[storage.mode.local]
-rootPath = "./slices"
+type = "local"
+[storage.mode.local]
+rootPath = "/root/xdb/data/slices"
+[storage.mode.ipfs]
+hosts = [
+    "127.0.0.1:5001",
+    "127.0.0.1:5002"
+]
+timeout = 5000

 [storage.monitor]
 challengingSwitch = "on"

@@ -145,7 +154,7 @@ PaddleDTX is developed in golang; when you compile and install it from source …
 level = "debug"
 path = "./logs"
 ```
-Here, listenAddress and publicAddress specify the address the service listens on and the address it exposes; the blockchain section uses the account mnemonic, contract account, and contract name created when the blockchain network was deployed; rootPath specifies the local path for file storage.
+Here, listenAddress and publicAddress specify the address the service listens on and the address it exposes; the blockchain section specifies the account mnemonic, contract account, and contract name created when the blockchain network was deployed; storage.mode defines the storage method the storage node uses, supporting the local file system and IPFS.

@@ -246,6 +255,7 @@ PaddleDTX is developed in golang; when you compile and install it from source …
 Keep the key pair you created safe; you will use it frequently in later configuration and on the command line.
 !!! note ""
     Note: after a task is published, the executor nodes send file-authorization requests to the data-owner nodes, which can approve or reject the sample-file authorization requests.
+    The currently open-sourced multiple linear regression and multiple logistic regression algorithms use two executor nodes, while the neural-network algorithm needs three; if you want to use neural networks, deploy 3 executor nodes.

 1. Prepare the configuration of the two executor nodes
 ```

@@ -261,6 +271,10 @@ PaddleDTX is developed in golang; when you compile and install it from source …
 # executor1
 listenAddress = ":8184"
 publicAddress = "127.0.0.1:8184"
+# Address of the container that provides the PaddleFL runtime
+paddleFLAddress = "paddlefl-env1:38302"
+paddleFLRole = 0
+
 # private key created by genkey
 keyPath = "./keys"

@@ -293,6 +307,9 @@ PaddleDTX is developed in golang; when you compile and install it from source …
 # executor2
 listenAddress = ":8185"
 publicAddress = "127.0.0.1:8185"
+# Address of the container that provides the PaddleFL runtime
+paddleFLAddress = "paddlefl-env1:38303"
+paddleFLRole = 2
 # private key created by genkey
 keyPath = "./keys"
docs/source/quickstart/quickstart.md — 24 additions, 10 deletions

@@ -15,10 +15,14 @@
 $ git clone git@github.com:PaddlePaddle/PaddleDTX.git
 $ cd PaddleDTX/scripts
 $ sh network_up.sh start
-$ # To support the three-party DNN algorithm you need to start PaddleFL nodes; run the following command instead of the one above
-$ # sh network_up.sh start -p true
+
+# To start DAI on top of the ipfs storage network:
+$ sh network_up.sh start -s ipfs
+# To support the three-party DNN algorithm you need to start PaddleFL nodes; run the following command instead of the ones above:
+$ sh network_up.sh start -p true
 ```

 The script can also tear the network down quickly:
 ```
 $ sh network_up.sh stop

@@ -30,32 +34,38 @@ $ sh network_up.sh stop

 We recommend installing in a Linux environment; if you start on a Mac, raise Docker's resource limits to CPUs > 4, Memory > 4GB, Swap > 4GB.

-After the network starts, docker ps shows the services the script launched: 3 blockchain nodes, 2 data-owner nodes, 3 storage nodes, and 2 trusted computing nodes.
+After the network starts, docker ps shows the services the script launched: 3 blockchain nodes, 3 data-owner nodes, 3 storage nodes, and 3 trusted computing nodes; if you start with `sh network_up.sh start -s ipfs -p true`, one ipfs node and 3 paddlefl nodes are launched as well.

 If you do not need model training, you can start only the decentralized storage network (XuperDB); see [XuperDB service startup and command usage](https://github.com/PaddlePaddle/PaddleDTX/tree/master/xdb/scripts):
 ``` shell
 # Start XuperDB based on XChain
 $ cd PaddleDTX/xdb/scripts
-$ sh network_up.sh start
+$ sh network_up.sh start -b xchain

 # Start XuperDB based on a Fabric network
 $ cd PaddleDTX/xdb/scripts
-$ sh network_up.sh start fabric
+$ sh network_up.sh start -b fabric
+
+# Start XuperDB with the ipfs storage network
+$ cd PaddleDTX/xdb/scripts
+$ sh network_up.sh start -b xchain -s ipfs
 ```

 ### 1.3 Publishing and running tasks
 The ./paddledtx_test.sh script provides shortcuts for uploading and downloading files, publishing training and prediction tasks, and more:
 ``` shell
 Usage:
 ./paddledtx_test.sh <mode> [-f <sample files>] [-m <model task id>] [-i <task id>]
 <mode> - one of 'upload_sample_files', 'start_vl_linear_train', 'start_vl_linear_predict', 'start_vl_logistic_train'
-    'start_vl_logistic_predict', 'tasklist', 'gettaskbyid'
+    'start_vl_logistic_predict', 'start_vl_dnn_train', 'start_vl_dnn_predict', 'tasklist', 'gettaskbyid'
 - 'upload_sample_files' - save linear and logistic sample files into XuperDB
 - 'start_vl_linear_train' - start vertical linear training task
 - 'start_vl_linear_predict' - start vertical linear prediction task
 - 'start_vl_logistic_train' - start vertical logistic training task
 - 'start_vl_logistic_predict' - start vertical logistic prediction task
+- 'start_vl_dnn_train' - start vertical paddlefl-dnn training task
+- 'start_vl_dnn_predict' - start vertical paddlefl-dnn prediction task
 - 'tasklist' - list task in PaddleDTX
 - 'gettaskbyid' - get task by id from PaddleDTX
 -f <sample files> - linear or logistic sample files

@@ -71,26 +81,30 @@ Usage:
 ./paddledtx_test.sh start_vl_linear_predict -f cb40b8ad-db08-447f-a9d9-628b69d01660,2a8a45ab-3c5d-482e-b945-bc45b7e28bf9 -m 9b3ff4be-bfcd-4520-a23b-4aa6ea4d59f1
 ./paddledtx_test.sh start_vl_logistic_train -f b31f53a5-0f8b-4f57-a7ea-956f1c7f7991,f3dddade-1f52-4b9e-9253-835e9fc81901
 ./paddledtx_test.sh start_vl_logistic_predict -f 1e97d684-722f-4798-aaf0-dffe955a94ba,b51a927c-f73e-4b8f-a81c-491b9e938b4d -m d8c8865c-a837-41fd-802b-8bd754b648eb
+./paddledtx_test.sh start_vl_dnn_train -f 34cf2ee3-81b2-4865-907d-a9eab3c5b384,9dc7e0b7-18dd-4d5a-a3a1-6dace6d04fc8,3eaee2ea-4680-4b0b-bde3-ab4a4949159e
+./paddledtx_test.sh start_vl_dnn_predict -f 25ec6fd0-904e-4737-9bcc-c1cc11df1170,4442acae-90a2-4b92-b05f-cf1503c9d55e,73176b51-07f1-4f50-82c8-2d9d8908849b -m d8c8865c-a837-41fd-802b-8bd754b648eb
 ./paddledtx_test.sh gettaskbyid -i 9b3ff4be-bfcd-4520-a23b-4aa6ea4d59f1
 ./paddledtx_test.sh tasklist
 ```
 !!! note "Notes"

     Run cat ./paddledtx_test.sh to see the storage namespaces and upload file list the script creates by default; customize them if you have additional needs;
-    The start_vl_linear_train, start_vl_linear_predict, start_vl_logistic_train, and start_vl_logistic_predict commands essentially walk you through the Boston house-price prediction and iris classification project cases; see [Project cases](../projectcases/linear.md)
+    The start_vl_linear_train, start_vl_linear_predict, start_vl_logistic_train, start_vl_logistic_predict, start_vl_dnn_train, and start_vl_dnn_predict commands essentially walk you through project cases for the multiple linear regression, multiple logistic regression, and neural-network algorithms; see [Project cases](../projectcases/linear.md)

 1. Upload training and prediction sample files
 ```shell
-# upload_sample_files creates storage namespaces for data-owner nodes A/B and uploads the sample files needed for training and prediction tasks
-# This command uploads 8 files in total: the files data owners A/B need to publish vertical linear regression and vertical logistic regression training and prediction tasks
+# upload_sample_files creates storage namespaces for data-owner nodes A/B/C and uploads the sample files needed for training and prediction tasks
+# This command uploads 14 files in total: the 8 sample files data owners A/B need to publish vertical linear and logistic regression training and prediction tasks, plus the 6 sample files data owners A/B/C need to publish vertical deep neural network training and prediction tasks
 ./paddledtx_test.sh upload_sample_files

 # After execution the command returns:
 # Vertical linear train sample files: sample IDs needed by the vertical linear training task
 # Vertical linear prediction sample files: sample IDs needed by the vertical linear prediction task
 # Vertical logistic train sample files: sample IDs needed by the vertical logistic regression training task
 # Vertical logistic prediction sample files: sample IDs needed by the vertical logistic regression prediction task
+# PaddleFL train sample files: sample IDs needed by the vertical deep neural network training task
+# PaddleFL prediction sample files: sample IDs needed by the vertical deep neural network prediction task
 ```
 2. Start a vertical linear regression training task; $vlLinTrainfiles is the "Vertical linear train sample files" value obtained in step 1

docs/source/tutorial/dai-config.md — 7 additions, 1 deletion

@@ -43,6 +43,12 @@ listenAddress = ":8184"
 # If your network mode is 'host', it is the machine's ip and the port in [server].listenAddress in before section.
 publicAddress = "10.144.94.17:8184"

+# PaddleFLAddress is the endpoint of the container which has a running environment of PaddleFL.
+# Containers belonging to different executors constitute an MPC network.
+paddleFLAddress = "paddlefl-env1:38302"
+# PaddleFLRole is the role of the container in the paddlefl MPC network.
+paddleFLRole = 0
+
 # The private key of the trusted computing server.
 # Different keys express different identities.
 # Only need to choose one from 'privateKey' and 'keyPath'; if both exist, 'privateKey' takes precedence over 'keyPath'

@@ -132,7 +138,7 @@ path = "./logs"

 !!! note "Configuration notes"

-    1. The executor node config sets the port the node listens on, its identity, and other startup information;
+    1. The executor node config sets the port the node listens on, its identity, and other startup information; paddleFLAddress defines the address of the container needed to run the neural-network algorithm;
     2. executor.mode specifies the node's computing mode, either proxy or self-computation: in proxy mode data-owner nodes authorize sample data to the executor node for proxy computation, while self-computation mode suits scenarios where the computing node is a client of the data-owner node;
     3. executor.storage defines the storage paths for models, evaluation results, and prediction results; prediction results can be stored encrypted in the decentralized storage network;
     4. executor.blockchain defines the blockchain network configuration the executor node operates on; currently only the XChain network is supported, with Fabric to follow;

docs/source/tutorial/xdb-config.md — 18 additions, 3 deletions

@@ -159,12 +159,26 @@ publicAddress = "10.144.94.17:8122"
 userName = "Admin"
 orgName = "org1"

-# The storage mode used by the storage node, currently only supports local file system.
+# Prover answers challenges from DataOwner to prove that the node is storing the slices
+[storage.prover]
+# local storage path to keep temporary data
+localRoot = "/root/xdb/data/prove"
+
+# The storage mode used by the storage node, currently supports local file system and IPFS.
 [storage.mode]
+# Denotes which mode you choose, `local` or `ipfs`.
 type = "local"
 [storage.mode.local]
 # Location of file fragments
 rootPath = "/root/xdb/data/slices"
+[storage.mode.ipfs]
+# Denotes peers in the IPFS cluster
+hosts = [
+    "127.0.0.1:5001",
+    "127.0.0.1:5002"
+]
+# The timeout for requesting IPFS, in milliseconds
+timeout = 5000

 # The monitor will query new tasks in blockchain regularly, and trigger the task handler's operations
 [storage.monitor]

@@ -190,8 +204,9 @@ path = "./logs"

 !!! note "Configuration notes"

     1. storage.blockchain defines the configuration the node needs to operate on the blockchain network; XChain and Fabric networks are currently supported;
-    2. storage.mode specifies the storage node's storage method; only the local file system is supported at the moment, with IPFS, NAS, etc. to follow;
-    3. storage.monitor enables the storage node's heartbeat detection, configures the file-cleanup interval, and so on;
+    2. storage.prover specifies the local path for temporary data kept while answering storage challenges;
+    3. storage.mode specifies the storage node's storage method; the local file system and IPFS are currently supported;
+    4. storage.monitor enables the storage node's heartbeat detection, configures the file-cleanup interval, and so on;


<br>
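To sanity-check that the peers listed under hosts are reachable, one can probe the go-ipfs HTTP API's version endpoint on each; a minimal sketch, with the actual network probe left commented out so the loop only prints the URLs it would hit:

```shell
#!/bin/sh
# Peers from the [storage.mode.ipfs] hosts list above
hosts="127.0.0.1:5001 127.0.0.1:5002"

for h in $hosts; do
    # go-ipfs exposes its HTTP API via POST, e.g. /api/v0/version
    echo "POST http://$h/api/v0/version"
    # curl -s -X POST "http://$h/api/v0/version"   # uncomment against a running daemon
done
```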
