Commit 6d4039d

Add need to know
1 parent 3c6137b commit 6d4039d

1 file changed, +9 -4 lines changed
README.md

Lines changed: 9 additions & 4 deletions
@@ -9,10 +9,10 @@ The purpose of this repository is to let people evaluate the quality of datasets
 
 * [Google Colab Notebook](https://colab.research.google.com/drive/1c8rWB2gtUrBHQcmmvA_NAXxc7Cexn1vM?usp=sharing)
 
+## Running the app
+### Instructions
 
-## Instructions
-
-1.Prerequisites
+1. Prerequisites
 Note that the code only works `Python >= 3.9` and `streamlit >= 1.23.1`
 
 ```
@@ -26,11 +26,16 @@ $ cd HuggingFace-Datasets-Text-Quality-Analysis
 $ pip install -r requirements.txt
 ```
 
-3.Run Streamlit application
+3. Run Streamlit application
 ```
 python -m streamlit run app.py
 ```
 
+### Need to know
+
+When the dataset you download from Hugging Face is too large, running the application may exceed the memory of your machine and causes some errors. Sample the data or refer to some libraries that can run Pandas on a cluster, such as Xorbits, Dask.
+
+
 ## Todos
 
 - [ ] Introduce more dimensions to evaluate the dataset quality
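
The "Need to know" note added in this commit suggests sampling the data when a dataset is too large for memory. One way to do that, sketched here under the assumption that rows can be read one at a time as a stream, is reservoir sampling, which keeps at most `k` rows in memory regardless of the dataset size. The names `reservoir_sample` and `fake_rows` are hypothetical and are not part of this repository; the sketch uses only the Python standard library.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream.

    Only k rows are held in memory at any time (hypothetical helper,
    not taken from the repository).
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Replace a kept row with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                sample[j] = row
    return sample


def fake_rows(n: int) -> Iterator[dict]:
    """Stand-in for a large dataset read row by row (assumption)."""
    for i in range(n):
        yield {"id": i, "text": f"example document {i}"}


subset = reservoir_sample(fake_rows(100_000), k=1_000)
print(len(subset))
```

In practice `fake_rows` would be replaced by whatever row iterator is available, for example a split loaded from the Hugging Face `datasets` library with `streaming=True`. Whether the app can consume such a sample directly is an assumption that would need checking against `app.py`.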
