This repository contains tutorials and experiments for generative models on network trace datasets. These instructions are for the AINTEC'2025 tutorial and are tailored to its specific environment and setup. For general-purpose use, please refer to the README.md in the main branch.
Note
Hardware Requirements: This README.md is for the AINTEC'2025 tutorial ONLY.
Step 1: Clone the repo to your server
cd scratch3
git clone https://github.com/netsharecmu/generative-trace-tutorials.git
cd generative-trace-tutorials
Step 2: Use this salloc command to allocate a GPU node to your account:
salloc -p gpu_a100 -q 2c-1h_gpu-a100_1g.10g --gres=gpu:1 --reservation=aintec-workshop
Wait until a node is allocated to you.
Step 3: Use the module command to load the workshop environment:
module load aintec-2025
Then run the following commands in the generative-trace-tutorials folder to install the dependencies:
pip install mmh3
pip install -e src/netshare
Step 4: Run this command to start the Jupyter Notebook:
srun run-notebook
Step 5: While the notebook is starting, open another terminal window and run the ssh command printed by the previous command:
# look for the ssh command that looks like this and run it on another terminal window:
ssh -vv -NL <port>:<hostname>:<port> <user>@saliksik.asti.dost.gov.ph
# then enter the same password
Step 6: Go back to the previous terminal window and use the localhost (127.0.0.1) link (the third link from the screenshot below) to connect to the notebook using a browser. You are now connected to COARE’s Saliksik HPC Cluster through Jupyter Notebook.
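Once the notebook is open, a quick check in the first cell can confirm that the packages installed in Step 3 are visible to the kernel. The sketch below is not part of the tutorial materials. The import name `netshare` is an assumption based on the `src/netshare` path, and finding `nvidia-smi` on the PATH is only a rough hint that a GPU driver is present.

```python
import importlib.util
import shutil

# Import names are assumptions: `pip install mmh3` provides `mmh3`, and the
# editable install of src/netshare is assumed to expose a `netshare` package.
REQUIRED_PACKAGES = ["mmh3", "netshare"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def gpu_tool_available():
    """Return True if the `nvidia-smi` executable is on PATH (a rough GPU hint)."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    print("Missing packages:", missing if missing else "none")
    print("nvidia-smi found:", gpu_tool_available())
```

If a package is reported missing, rerunning the Step 3 commands from the generative-trace-tutorials folder is the first thing to try.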
In this tutorial we have two datasets:
- Tabular Dataset: data/sample_tabular_data.csv.
- Network Dataset: data/caida-10k.csv.
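Before opening the notebooks, it can help to confirm that both files are present and to look at their columns. The following is a minimal sketch using only the standard library. It assumes each CSV starts with a header row, which this README does not state, and the helper name `summarize_csv` is hypothetical.

```python
import csv


def summarize_csv(path, preview_rows=3):
    """Return (column_names, row_count, first_rows) for a CSV with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # assumed to be the header line
        preview, row_count = [], 0
        for row in reader:
            if row_count < preview_rows:
                preview.append(row)
            row_count += 1
    return header, row_count, preview


if __name__ == "__main__":
    # Paths are relative to the repository root.
    for path in ("data/sample_tabular_data.csv", "data/caida-10k.csv"):
        try:
            columns, n_rows, _ = summarize_csv(path)
        except FileNotFoundError:
            print(f"{path}: not found (run this from the repository root)")
        else:
            print(f"{path}: {n_rows} rows, columns = {columns}")
```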
For each dataset, we have notebooks for training, evaluation and downstream tasks.
Tabular dataset notebooks:
- Training: tabular_CTGAN.ipynb uses CTGAN and tabular_RealTabFormer.ipynb uses RealTabFormer.
- Evaluation: tabular_quality_check.ipynb evaluates the data quality using both average JSD (Jensen-Shannon divergence) and customized queries written by domain experts.
- Downstream Task: tabular_tasks.ipynb uses synthetic data for data augmentation in ML predictor training. We use two different ML predictors (SVM and Decision Tree) here.
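To make the "average JSD" metric concrete, here is a minimal sketch of one common way to compute it: take the Jensen-Shannon divergence between the real and synthetic value distributions of each column, then average over columns. This is an illustration under stated assumptions, not the code in the quality-check notebooks, whose exact definition may differ. It treats every column as categorical (continuous fields would need binning first), and the column names in the example are hypothetical.

```python
import math
from collections import Counter


def distribution(values):
    """Empirical distribution of a non-empty list of hashable values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence (log base 2, so in [0, 1]) of two discrete
    distributions given as {value: probability} dicts."""
    support = set(p) | set(q)
    mixture = {v: 0.5 * (p.get(v, 0.0) + q.get(v, 0.0)) for v in support}

    def kl_to_mixture(dist):
        return sum(
            prob * math.log2(prob / mixture[v]) for v, prob in dist.items() if prob > 0
        )

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def average_jsd(real_columns, synthetic_columns):
    """Mean per-column JSD over the columns present in both datasets.

    Both arguments map a column name to the list of values in that column.
    """
    shared = sorted(set(real_columns) & set(synthetic_columns))
    if not shared:
        raise ValueError("the two datasets share no columns")
    scores = [
        jsd(distribution(real_columns[c]), distribution(synthetic_columns[c]))
        for c in shared
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy columns, not taken from the tutorial datasets.
    real = {"proto": ["TCP", "TCP", "TCP", "UDP"], "flag": ["S", "A"]}
    synthetic = {"proto": ["TCP", "UDP", "UDP", "UDP"], "flag": ["S", "A"]}
    print("average JSD:", average_jsd(real, synthetic))
```

A score of 0 means the per-column distributions match exactly, and 1 means they share no values, so lower is better.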
Network dataset notebooks:
- Training: network_ctgan.ipynb uses CTGAN and network_netshare.ipynb uses NetShare.
- Evaluation: network_quality_check.ipynb evaluates the data quality using both average JSD (Jensen-Shannon divergence) and customized queries written by domain experts.
- Downstream Task: network_tasks.ipynb uses synthetic data for network measurement system testing. Specifically, we test two measurement algorithms (SpaceSaving and Count-Min Sketch + Heap) on their hit rate for Top-K most frequent flow identification.
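The Top-K hit rate test can be sketched as follows: run each algorithm over a stream of flow IDs, take its reported top K, and measure what fraction of the true top K it recovered. This is a compact illustration, not the implementation in network_tasks.ipynb. The function names, the sketch dimensions (`width`, `depth`), and the use of `hashlib` as the hash function are all assumptions made to keep the example dependency-free.

```python
import hashlib
import heapq
from collections import Counter


def space_saving(stream, capacity):
    """Space-Saving: keep at most `capacity` counters; evict the smallest when full."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Replace the current minimum and inherit its count plus one.
            # A linear scan keeps the sketch short; it is not the fast variant.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def _bucket(item, row, width):
    """Hash `item` into one of `width` columns, salted by the row index."""
    digest = hashlib.blake2b(
        repr(item).encode(), digest_size=8, salt=row.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % width


def count_min_top_k(stream, k, width=256, depth=4):
    """Count-Min Sketch plus a min-heap tracking the k largest estimates."""
    if k <= 0:
        return {}
    table = [[0] * width for _ in range(depth)]
    tracked = {}  # item -> latest estimate, for the current top-k candidates
    heap = []     # min-heap of (estimate, item); may hold stale entries
    for item in stream:
        columns = [_bucket(item, row, width) for row in range(depth)]
        for row, col in enumerate(columns):
            table[row][col] += 1
        estimate = min(table[row][col] for row, col in enumerate(columns))
        if item in tracked or len(tracked) < k:
            tracked[item] = estimate
            heapq.heappush(heap, (estimate, item))
            continue
        # Drop stale heap entries until the top describes a live candidate.
        while tracked.get(heap[0][1]) != heap[0][0]:
            heapq.heappop(heap)
        if estimate > heap[0][0]:
            _, victim = heapq.heappop(heap)
            del tracked[victim]
            tracked[item] = estimate
            heapq.heappush(heap, (estimate, item))
    return tracked


def top_k_hit_rate(stream, estimated_counts, k):
    """Fraction of the true top-k items that appear in the reported top-k."""
    true_top = {item for item, _ in Counter(stream).most_common(k)}
    reported = set(heapq.nlargest(k, estimated_counts, key=estimated_counts.get))
    return len(true_top & reported) / k


if __name__ == "__main__":
    # Hypothetical flow IDs: fifty one-packet flows, then five heavy flows.
    stream = [f"light-{i}" for i in range(50)]
    for rank, size in enumerate([100, 200, 300, 400, 500]):
        stream += [f"heavy-{rank}"] * size
    print("SpaceSaving hit rate:", top_k_hit_rate(stream, space_saving(stream, 20), 5))
    print("Count-Min + heap hit rate:", top_k_hit_rate(stream, count_min_top_k(stream, 5), 5))
```

The stream must be a list (or another re-iterable sequence) of hashable, mutually comparable items such as strings, because it is scanned once by the algorithm and once more to compute the ground truth. Comparing the hit rate obtained on a real trace with the one obtained on a synthetic trace is one way to judge whether the synthetic data is a usable stand-in for testing.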