
Commit 7ef02b0

Remove legacy Research directory (#4428)
* delete contrib Research and legacy
* add acl2019-arnor
* add readme.md for PaddleKG
* update CoKE
1 parent d7fbd95 commit 7ef02b0


730 files changed: +15 -356004 lines changed


PaddleKG/CoKE/README.md

Lines changed: 1 addition & 176 deletions
@@ -1,176 +1 @@
# CoKE: Contextualized Knowledge Graph Embedding

## Introduction

This is the [PaddlePaddle](https://www.paddlepaddle.org.cn/) implementation of the [CoKE](https://arxiv.org/abs/1911.02168) model for Knowledge Graph Embedding (KGE).

CoKE is a novel KGE paradigm that learns dynamic, flexible, and fully contextualized entity and relation representations for a given Knowledge Graph (KG).
It takes a sequence of entities and relations as input and uses the [Transformer](https://arxiv.org/abs/1706.03762) to obtain contextualized representations for its components.
These representations are hence dynamically adaptive to the input, capturing the contextual meanings of entities and relations therein.
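
To make the input scheme concrete, here is a minimal sketch of how a link prediction query could be rendered as a token sequence with the missing entity masked. The helper and token names are illustrative, not the repository's actual API:

```
# Illustrative sketch only: names here are ours, not the repository's API.
MASK = "[MASK]"

def triple_to_sequence(head, relation, tail=None):
    """Render (head, relation, tail) as input tokens for the Transformer.

    A missing entity is replaced by [MASK]; the model predicts the entity
    at the masked position from the surrounding context.
    """
    return [head, relation, tail if tail is not None else MASK]

print(triple_to_sequence("Paris", "capital_of"))
# -> ['Paris', 'capital_of', '[MASK]']
```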

Evaluation on a wide variety of public benchmarks verifies the superiority of CoKE in link prediction (also known as Knowledge Base Completion, or KBC for short) and path query answering tasks.
CoKE performs consistently better than, or at least on par with, the current state of the art in almost every case.

## Requirements

The code has been tested in the following environments:

- Python 3.6.5 with the following dependencies:
  - PaddlePaddle 1.5.0
  - numpy 1.16.3
  - Python 2.7.14 for data preprocessing (`data_preprocess.sh`)
- GPU environment:
  - CUDA 9.0, cuDNN v7, and NCCL 2.3.7
  - all datasets run on a single P40 GPU with the given configurations

## Model Training and Evaluation

### Step 1. Download the dataset files

Download the dataset files used in our paper by running:

```
sh wget_datasets.sh
```

This will first download the 4 widely used KBC datasets ([FB15k & WN18](http://papers.nips.cc/paper/5071-translating-embeddings-for-modeling-multi-relational-data.pdf), [FB15k-237](https://www.aclweb.org/anthology/W15-4007/), [WN18RR](https://arxiv.org/abs/1707.01476)) and the 2 path query answering datasets ([wordnet_paths and freebase_paths](https://arxiv.org/abs/1506.01094)).

It then organizes the train/valid/test files into the following `data` directory:

```
data
├── fb15k
│   ├── test.txt
│   ├── train.txt
│   └── valid.txt
├── fb15k237
│   ├── test.txt
│   ├── train.txt
│   └── valid.txt
├── pathqueryFB   # the original data name is: freebase_paths
│   ├── dev
│   ├── test
│   └── train
├── pathqueryWN   # the original data name is: wordnet_paths
│   ├── dev
│   ├── test
│   └── train
├── wn18
│   ├── test.txt
│   ├── train.txt
│   └── valid.txt
└── wn18rr
    ├── test.txt
    ├── train.txt
    └── valid.txt
```

### Step 2. Data preprocessing

Data preprocessing commands are given in `data_preprocess.sh`.
The script takes the raw train/valid/test files as input and generates the CoKE training and evaluation files:

```
sh data_preprocess.sh
```
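
For reference, the raw KBC files hold one triple per line. Below is a minimal reader sketch, assuming the standard tab-separated head/relation/tail layout of these benchmarks (the path query files use their own path-structured format); this is not the repository's own loader:

```
# Reads FB15k-style files: one "head<TAB>relation<TAB>tail" triple per line.
# Assumption: standard benchmark layout; not the repository's own loader.
def read_triples(path):
    triples = []
    with open(path) as f:
        for line in f:
            head, relation, tail = line.rstrip("\n").split("\t")
            triples.append((head, relation, tail))
    return triples

# e.g. train = read_triples("data/fb15k/train.txt")
```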

### Step 3. Training

Model training commands are given in `kbc_train.sh` for the KBC datasets and in `pathquery_train.sh` for the path query datasets.
Each script takes a configuration file and GPU ids as input arguments and trains the model with the given configuration.

For example, the following commands train on *fb15k* and *pathqueryFB*, each with its own configuration file:

```
sh kbc_train.sh ./configs/fb15k_job_config.sh 0
sh pathquery_train.sh ./configs/pathqueryFB_job_config.sh 0
```

### Step 4. Evaluation

Model evaluation commands are given in `kbc_test.sh` for the KBC datasets and in `pathquery_test.sh` for the path query datasets.
Each script takes a configuration file and GPU ids as input arguments.

For example, the following commands evaluate on *fb15k* and *pathqueryFB*:

```
sh kbc_test.sh ./configs/fb15k_job_config.sh 0
sh pathquery_test.sh ./configs/pathqueryFB_job_config.sh 0
```

We also provide trained model checkpoints for the 4 KBC datasets. Download them into the `kbc_models` directory using the following command:

```
sh wget_kbc_models.sh
```

The `kbc_models` directory contains the following files:

```
kbc_models
├── fb15k
│   ├── models
│   └── vocab.txt   # md5: 0720db5edbda69e00c05441a615db152
├── fb15k237
│   ├── models
│   └── vocab.txt   # md5: e843936790e48b3cbb35aa387d0d0fe5
├── wn18
│   ├── models
│   └── vocab.txt   # md5: 4904a9300fc3e54aea026ecba7d2c78e
└── wn18rr
    ├── models
    └── vocab.txt   # md5: c76aecebf5fc682f0e7922aeba380dd6
```

Before evaluating with these models, check that your preprocessed `vocab.txt` files are identical to ours.
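
One quick way to verify is to compare md5 digests against the values listed above, e.g. with a few lines of Python (the path to your preprocessed vocabulary below is an assumption; adjust it to your setup):

```
import hashlib

# The vocab path is an assumption; point it at your preprocessed file.
def md5_of(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

assert md5_of("data/fb15k/vocab.txt") == "0720db5edbda69e00c05441a615db152"
```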

## Results

Results on the KBC datasets:

| Dataset  | MRR   | HITS@1 | HITS@5 | HITS@10 |
|---|---|---|---|---|
| FB15K    | 0.852 | 0.823  | 0.868  | 0.904   |
| FB15K237 | 0.361 | 0.269  | 0.398  | 0.547   |
| WN18     | 0.951 | 0.947  | 0.954  | 0.960   |
| WN18RR   | 0.475 | 0.437  | 0.490  | 0.552   |

Results on the path query datasets:

| Dataset  | MQ    | HITS@10 |
|---|---|---|
| Freebase | 0.948 | 0.764   |
| WordNet  | 0.942 | 0.674   |

## Reproducing the results

Here are the configurations used to reproduce our results.
They are also given in the `configs/${TASK}_job_config.sh` files.

| Dataset     | NetConfig       | lr   | softlabel | epoch | batch_size | dropout |
|---|---|---|---|---|---|---|
| FB15K       | L=6, H=256, A=4 | 5e-4 | 0.8  | 300 | 512  | 0.1 |
| WN18        | L=6, H=256, A=4 | 5e-4 | 0.2  | 500 | 512  | 0.1 |
| FB15K237    | L=6, H=256, A=4 | 5e-4 | 0.25 | 800 | 512  | 0.5 |
| WN18RR      | L=6, H=256, A=4 | 3e-4 | 0.15 | 800 | 1024 | 0.1 |
| pathqueryFB | L=6, H=256, A=4 | 3e-4 | 1    | 10  | 2048 | 0.1 |
| pathqueryWN | L=6, H=256, A=4 | 3e-4 | 1    | 5   | 2048 | 0.1 |
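
Reading the NetConfig column in standard Transformer notation, L is the number of layers, H the hidden size, and A the number of attention heads. As an illustration of the FB15K row (the key names below are ours, not the variables actually used in the `*_job_config.sh` files):

```
# Illustrative only: key names are ours, not the config files' variables.
fb15k_config = {
    "num_layers": 6,            # L
    "hidden_size": 256,         # H
    "num_attention_heads": 4,   # A
    "learning_rate": 5e-4,      # lr
    "soft_label": 0.8,          # softlabel (label-smoothing strength, our reading)
    "epochs": 300,
    "batch_size": 512,
    "dropout": 0.1,
}
```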

## Citation

If you use any source code included in this project in your work, please cite the following paper:

```
@article{wang2019:coke,
  title={CoKE: Contextualized Knowledge Graph Embedding},
  author={Wang, Quan and Huang, Pingping and Wang, Haifeng and Dai, Songtai and Jiang, Wenbin and Liu, Jing and Lyu, Yajuan and Wu, Hua},
  journal={arXiv:1911.02168},
  year={2019}
}
```

## Copyright and License

Copyright 2019 Baidu.com, Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

The file now contains a single line:

This work has moved to a new address: [CoKE](https://github.com/PaddlePaddle/Research/tree/master/KG/CoKE)

PaddleKG/CoKE/bin/evaluation.py

Lines changed: 0 additions & 180 deletions
This file was deleted.
