Merge pull request #98 from hongyanwang/master

MyYuan · web-flow · commit 735277b4d10d · 2022-06-01T20:04:15.000+08:00
update docs and readme
diff --git a/README.md b/README.md
@@ -1,4 +1,4 @@
-[DOC](https://paddledtx.readthedocs.io/zh_CN/latest) | [中文](./README_CN.md) | English
+[DOC](https://paddledtx.readthedocs.io/zh_CN/v2.0.0) | [中文](./README_CN.md) | English
 
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
 
@@ -24,9 +24,12 @@ Currently, XuperChain is the only blockchain framework that PaddleDTX supported.
 ![Image text](./images/architecture.png)
 
 ## Vertical Federated Learning
-The open source version of PaddleDTX supports two-party vertical federated learning(VFL) algorithms, including Linear Regression and Logistic Regression, more algorithms such as two-party Neural Network will be open sourced soon, along with multi-party VFL and multi-party HFL(horizontal federated learning) algorithms. Please refer to [crypto/ml](./crypto/core/machine_learning) for more about background and implementation of these two algorithms. 
+The open source version of PaddleDTX supports vertical federated learning (VFL) algorithms, including two-party Linear Regression, two-party Logistic Regression, and three-party DNN (Deep Neural Networks).
+Please refer to [crypto/ml](./crypto/core/machine_learning) for the background and implementation of the two-party VFL algorithms.
+The DNN implementation relies on the [PaddleFL](https://github.com/PaddlePaddle/PaddleFL) framework, and all neural network models provided by PaddleFL can be used in PaddleDTX.
+More algorithms will be open sourced soon, including multi-party VFL and multi-party HFL (horizontal federated learning) algorithms.
 
-Training and predicting steps of VFL are shown as follows:
+Taking the two-party VFL algorithms as an example, the training and prediction steps are as follows:
 
 ![Image text](./images/vertical_learning.png)
 
diff --git a/README_CN.md b/README_CN.md
@@ -1,4 +1,4 @@
-[DOC](https://paddledtx.readthedocs.io/zh_CN/latest) | [English](./README.md) | 中文
+[DOC](https://paddledtx.readthedocs.io/zh_CN/v2.0.0) | [English](./README.md) | 中文
 
 [![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE)
 
@@ -24,9 +24,11 @@ SMPC是一个支持多个学习过程并行运行的框架，会陆续集成更
 ![Image text](./images/architecture.png)
 
 ## 二、纵向联邦学习
-PaddleDTX 开源部分目前支持两方的纵向联邦学习算法，包括多元线性回归和多元逻辑回归。算法具体原理和实现参见 [crypto/ml](./crypto/core/machine_learning)，未来将支持更丰富的两方纵向联邦学习算法、多方的纵向联邦学习和横向联邦学习算法。
+PaddleDTX 开源部分目前支持纵向联邦学习算法，包括两方的多元线性回归和多元逻辑回归、三方的神经网络。两方纵向联邦学习算法具体原理和实现参见 [crypto/ml](./crypto/core/machine_learning)。
+神经网络算法实现依赖了 [PaddleFL 框架](https://github.com/PaddlePaddle/PaddleFL) ，可以使用 PaddleFL 提供的所有神经网络算法模型。
+PaddleDTX 未来将支持更丰富的纵向联邦学习和横向联邦学习算法。
 
-纵向联邦训练和预测步骤如下：
+以两方为例，纵向联邦训练和预测步骤如下：
 
 ![Image text](./images/vertical_learning.png)
 
diff --git a/crypto/README.md b/crypto/README.md
@@ -27,12 +27,12 @@ The model is based on multivariate linear regression model. It is continuously d
 The closer to 1, the greater the possibility it is the specified value. The training process is to look for optimal coefficients &theta; by iteration to ensure errors on training samples is as small as possible. 
 
 ## Vertical Federated Learning Algorithms
-The project currently supported two-party vertical federated learning protocol. 
+The project currently implements a two-party vertical federated learning protocol.
 In training process, each party calculates partial gradient and cost using own samples. Intermediate parameters are exchanged and integrated to obtain each party's model without leaking any data confidentiality.
 In prediction process, each party calculate local result using own model and deduce final result by the sum of all partial results.
 
 Two parties' sample numbers in training or prediction process may be different.
-Samples need to be aligned by ID list of each party. Please referr to [psi](./core/machine_learning/linear_regression/gradient_descent/mpc_vertical/psi.go) for more details about sample alignment.  
+Samples need to be aligned by each party's ID list. Please refer to [psi](./core/machine_learning/linear_regression/gradient_descent/mpc_vertical/psi.go) for more details about sample alignment.
 
 The vertical federated learning steps of linear and logistic regression are shown as follows, suppose sample alignment has already been finished:
 
diff --git a/crypto/README_CN.md b/crypto/README_CN.md
@@ -23,7 +23,7 @@ y = 1 / (1 + e<sup>-&theta;X</sup>)
 该模型是基于线性回归模型变化得到的，模型连续可导，且可以保证目标特征是(0,1)之间的数值，越接近1表明样本是指定值的概率越大。学习过程就是通过迭代找到合适的参数&theta;，使得模型在训练集合的误差尽量小。
 
 ## 二、纵向联邦学习算法
-项目暂支持两方的纵向联邦学习算法，训练过程中，双方利用各自的样本数据计算部分梯度和损失，交换中间参数并进行整合，在保证不泄露隐私的前提下计算各自的模型。预测时利用各自模型计算部分预测值，并利用预测结果之和推导出最终预测结果。
+项目实现了两方的纵向联邦学习算法，训练过程中，双方利用各自的样本数据计算部分梯度和损失，交换中间参数并进行整合，在保证不泄露隐私的前提下计算各自的模型。预测时利用各自模型计算部分预测值，并利用预测结果之和推导出最终预测结果。
 
 训练和预测时，双方样本可能会有数量不一致的情况，需要根据各方数据ID对数据进行对齐，具体原理和实现详见[隐私求交](./core/machine_learning/linear_regression/gradient_descent/mpc_vertical/psi.go)。
 
diff --git a/docs/README.md b/docs/README.md
@@ -10,8 +10,8 @@
 2. 服务启动：
     ```
     cd docs
-    mkdoc server
+    mkdocs serve
     ```
 
-3. View the site on [`localhost:8000`](https://localhost:8000).
+3. View the site on [`localhost:8000`](http://localhost:8000).
 
diff --git a/docs/source/details/DAI.md b/docs/source/details/DAI.md
@@ -16,7 +16,7 @@ PaddleDTX实现的多方安全计算框架，具备以下特征：
 - 可执行模型评估和动态模型评估
 - 以区块链、隐私计算、ACL技术为支撑，保证数据、模型的隐私性和可信性
 
-<img src='../../_static/smpc.png' width = "100%" height = "100%" align="middle"/>
+<img src='../../_static/smpc.png' width = "70%" height = "70%" align="middle"/>
 
 ## 3. 可信联邦学习
 PaddleDTX中，联邦学习分为训练过程和预测过程。计算需求方通过发布训练任务，任务执行节点会向数据持有节点做数据可信性背书，继而触发训练过程，最终得到满足条件的模型。如果有预测需求，计算需求方发布预测任务，任务执行节点会向数据持有节点做数据可信性背书，继而触发预测过程，最终得到预测结果。目前已集成的算法及其原理和实现，在 [crypto](./crypto.md#id2) 部分有更多体现。
diff --git a/docs/source/details/XuperDB.md b/docs/source/details/XuperDB.md
@@ -22,7 +22,7 @@ XuperDB 具备高安全、高可用、可审计的特点：
 ## 3. 架构设计
 XuperDB 系统架构如下图所示：
 
-<img src='../../_static/xdb.png' width = "100%" height = "100%" align="middle"/>
+<img src='../../_static/xdb.png' width = "80%" height = "80%" align="middle"/>
 
 XuperDB 网络由三类节点构成：
 
diff --git a/docs/source/details/crypto.md b/docs/source/details/crypto.md
@@ -16,7 +16,7 @@ PaddleDTX 的 crypto 模块实现了若干机器学习算法和对应的分布
 - **联邦迁移学习**：参与方样本与特征重叠都较少，该场景下不对数据进行切分，而是利用迁移学习来克服数据或标签不足的情况。
 
 ## 2. 机器学习算法
-PaddleDTX 目前已经开源多元线性回归和多元逻辑回归算法，决策树、神经网络等更丰富的机器学习算法即将开源。
+PaddleDTX 目前已经开源多元线性回归、多元逻辑回归和神经网络算法，决策树等更丰富的机器学习算法即将开源。
 
 ### 2.1 多元线性回归
 多元线性回归用来描述一个变量受多个因素影响，且他们的关系可以用多元线性方程表示的场景。如房屋价格受房屋大小、楼层数、周边环境等因素影响。
@@ -36,12 +36,16 @@ y = 1 / (1 + e<sup>-&theta;X</sup>)
 
 该模型是基于线性回归模型变化得到的，模型连续可导，且可以保证目标特征是(0,1)之间的数值，越接近1表明样本是指定值的概率越大。学习过程就是通过迭代找到合适的参数&theta;，使得模型在训练集合的误差尽量小。
 
+### 2.3 神经网络
+神经网络是一种由大量的节点(或称为神经元)相互联接构成的运算模型，理论上可以逼近任意函数。
+在神经网络模型定义和训练的过程中，有很多标准的算法和流程，因此诞生了深度学习算法框架。PaddleDTX的神经网络算法实现依赖了应用广泛的 [PaddleFL 框架](https://github.com/PaddlePaddle/PaddleFL/blob/master/README_cn.md)，可以使用 PaddleFL 提供的所有神经网络算法模型。
+
 ## 3. 纵向联邦学习
-PaddleDTX 目前已经开源两方的纵向联邦学习算法，包括多元线性回归和多元逻辑回归。多方横向联邦学习和多方纵向联邦学习相关算法即将开源，敬请期待。
+PaddleDTX 目前开源了两方的纵向联邦学习算法，包括多元线性回归和多元逻辑回归。多方横向联邦学习和多方纵向联邦学习相关算法即将开源，敬请期待。
 
-纵向联邦训练和预测步骤如下：
+多元线性回归与多元逻辑回归的纵向联邦训练和预测步骤如下：
 
-<img src='../../_static/vertical_learning.png' width = "100%" height = "100%" align="middle"/>
+<img src='../../_static/vertical_learning.png' width = "80%" height = "80%" align="middle"/>
 
 ### 3.1 数据准备
 计算任务会指定参与方的样本数据，数据存在去中心化存储系统(XuperDB)中。任务启动前，任务计算方(即数据持有方)需要从XuperDB获取自己的样本数据。
diff --git a/docs/source/details/framework.md b/docs/source/details/framework.md
@@ -2,7 +2,7 @@
 
 PaddleDTX 主要由计算需求方、任务执行节点、数据持有节点、存储节点和区块链节点组成，部署架构如下图所示：
 
-<img src='../../_static/deployment.png' width = "100%" height = "100%" align="middle"/>
+<img src='../../_static/deployment.png' width = "60%" height = "60%" align="middle"/>
 
 !!! note "说明"
 
diff --git a/docs/source/index.md b/docs/source/index.md
@@ -31,7 +31,7 @@
     <div class="card-holder container">
         <div class="card rocket container">
             <p class="introtitle">应用案例</p>
-            <p class="introcontent">通过测试案例可以评估模型训练、预测的效果，PaddleDTX提供了基于波士顿房价预测的线性回归算法和基于鸢尾花的逻辑回归算法。</p>
+            <p class="introcontent">通过测试案例可以评估模型训练、预测的效果，PaddleDTX提供了基于波士顿房价预测的线性回归、神经网络算法和基于鸢尾花的逻辑回归算法。</p>
             <p class="introdetails"><b><a href="./projectcases/linear">查看详情</a></b></p>
         </div>
     </div>
diff --git a/docs/source/introduction/introduction.md b/docs/source/introduction/introduction.md
@@ -12,7 +12,7 @@ PaddleDTX的主要特征如下:
 ## 架构概览
 PaddleDTX由多方安全计算网络、去中心化存储网络、区块链网络构建而成。
 
-<img src='../../_static/architecture.png' width = "100%" height = "100%" align="middle"/>
+<img src='../../_static/architecture.png' width = "80%" height = "80%" align="middle"/>
 
 ### 1.1 多方安全计算网络
 有预测需求的一方为计算需求节点。可获取样本数据进行模型训练和预测的一方为任务执行节点，多个任务执行节点组成一个SMPC（多方安全计算）网络。计算需求节点将任务发布到区块链网络，任务执行节点确认后执行任务。数据持有节点对任务执行节点的计算数据做信任背书。
diff --git a/docs/source/others/issues.md b/docs/source/others/issues.md
@@ -50,7 +50,7 @@ A：在保护样本数据的安全性方面，纯软方案中采用的是资源
 
 A：DAI任务执行节点的性能与机器性能相关，任务执行的超时时间限制、最大并发数均可以在配置文件中修改，当前默认任务训练超时时间1小时、并发数100。
 
-**Q：PaddleDTX当前实现的两类纵向联邦学习算法，均采用Paillier同态进行加密参数传输，在训练的迭代过程中需多次进行同态加解密运算，而Paiilier算法的性能会大大影响分布式AI的整体性能，后续是否会优化该算法？**
+**Q：PaddleDTX当前实现的纵向线性回归和逻辑回归算法，均采用Paillier同态进行加密参数传输，在训练的迭代过程中需多次进行同态加解密运算，而Paillier算法的性能会大大影响分布式AI的整体性能，后续是否会优化该算法？**
 
 A：该Topic正在计划中，敬请关注。
 
diff --git a/docs/source/quickstart/quickstart.md b/docs/source/quickstart/quickstart.md
@@ -90,7 +90,7 @@ Usage:
 
     用户可通过cat ./paddledtx_test.sh查看脚本默认创建的文件存储命名空间、上传文件列表等，如有额外需求，可自定义配置；
 
-    脚本执行的 start_vl_linear_train、start_vl_linear_predict、start_vl_logistic_train、start_vl_logistic_train、start_vl_dnn_train、start_vl_dnn_predic 命令，本质为用户展示了多元线性回归、多元逻辑回归和神经网络算法的项目案例，参考 [项目案例](../projectcases/linear.md)
+    脚本执行的 start_vl_linear_train、start_vl_linear_predict、start_vl_logistic_train、start_vl_logistic_predict、start_vl_dnn_train、start_vl_dnn_predict 命令，本质为用户展示了多元线性回归、多元逻辑回归和神经网络算法的项目案例，参考 [项目案例](../projectcases/linear.md)
 
 1. 上传训练及预测样本文件
    ```shell