netsharecmu/generative-trace-tutorials

AINTEC Tutorial Instructions

This repository contains tutorials and experiments for generative models on network trace datasets. These instructions are for the AINTEC'2025 tutorial and are tailored to its specific environment and setup. For general-purpose use, please refer to the README.md in the main branch.

Set up environment

Note

Hardware Requirements: This README.md is for the AINTEC'2025 tutorial ONLY.

Step 1: Clone the repo to your server

cd scratch3
git clone https://github.com/netsharecmu/generative-trace-tutorials.git
cd generative-trace-tutorials

Step 2: Use this salloc command to allocate a GPU node to your account:

salloc -p gpu_a100 -q 2c-1h_gpu-a100_1g.10g --gres=gpu:1 --reservation=aintec-workshop

Wait until a node is allocated to you.

Step 3: Use the module command to load the workshop environment:

module load aintec-2025

Then run the following commands in the generative-trace-tutorials folder to install the dependencies:

pip install mmh3
pip install -e src/netshare

Step 4: Run this command to start the Jupyter Notebook:

srun run-notebook

Step 5: While the notebook is starting, open another terminal window and run the ssh command printed by the previous command:

# look for the ssh command that looks like this and run it on another terminal window:
ssh -vv -NL <port>:<hostname>:<port> <user>@saliksik.asti.dost.gov.ph
# then enter the same password

Step 6: Go back to the previous terminal window and open the localhost (127.0.0.1) link (the third link in the screenshot below) in a browser to connect to the notebook. You are now connected to COARE’s Saliksik HPC Cluster through Jupyter Notebook.

Run experiments

In this tutorial we have two datasets:

  • Tabular Dataset: data/sample_tabular_data.csv.
  • Network Dataset: data/caida-10k.csv.
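
Before opening the notebooks, it can help to look at the columns of each file. The helper below is a minimal sketch using only the standard library; `preview_csv` is a hypothetical name, and the sketch assumes each CSV starts with a header row, which this README does not state.

```python
import csv
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the header and the first n_rows data rows of a CSV file.

    Assumes the first line of the file is a header row.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, calling `preview_csv("data/caida-10k.csv")` from the repository root should return the column names and the first five records.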

For each dataset, we have notebooks for training, evaluation and downstream tasks.

Tabular Dataset

  • Training: tabular_CTGAN.ipynb trains CTGAN and tabular_RealTabFormer.ipynb trains RealTabFormer.
  • Evaluation: tabular_quality_check.ipynb evaluates the data quality using both the average JSD and custom queries written by domain experts.
  • Downstream Task: tabular_tasks.ipynb uses synthetic data for data augmentation when training ML predictors. We use two different ML predictors here (SVM and Decision Tree).
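
To make the average JSD metric concrete, here is a minimal sketch of Jensen-Shannon divergence for categorical columns, averaged over a set of columns. It is an illustration under assumptions, not the notebook's code: the function names are hypothetical, rows are represented as dicts, the logarithm is base 2 (so values fall in [0, 1]), and continuous columns would need to be binned first. The notebook may use a different definition, for example the JS distance, which is the square root of the divergence.

```python
import math
from collections import Counter


def jsd(real_values, synthetic_values):
    """Jensen-Shannon divergence (base 2) between two categorical samples."""
    p_counts, q_counts = Counter(real_values), Counter(synthetic_values)
    n_p, n_q = len(real_values), len(synthetic_values)
    total = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts[key] / n_p
        q = q_counts[key] / n_q
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total


def average_jsd(real_rows, synthetic_rows, columns):
    """Mean of the per-column JSD over the given columns (rows are dicts)."""
    scores = [
        jsd([r[c] for r in real_rows], [s[c] for s in synthetic_rows])
        for c in columns
    ]
    return sum(scores) / len(scores)
```

A score of 0 means the two column distributions are identical; a score of 1 means they share no values at all.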

Network Dataset

  • Training: network_ctgan.ipynb trains CTGAN and network_netshare.ipynb trains NetShare.
  • Evaluation: network_quality_check.ipynb evaluates the data quality using both the average JSD and custom queries written by domain experts.
  • Downstream Task: network_tasks.ipynb uses synthetic data to test network measurement systems. Specifically, we test two measurement algorithms (SpaceSaving and Count-Min Sketch+Heap) on their hit rate for identifying the Top-K most frequent flows.
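
The Top-K hit-rate test can be sketched as follows. This is a simplified illustration, not the notebook's implementation: all names are hypothetical, the hash comes from the standard library's `hashlib` rather than whatever the notebook uses, and a plain dict of candidates stands in for the heap to keep the example short. Hit rate is taken here to mean the fraction of the true Top-K flows that the algorithm reports, which is an assumption about the notebook's definition.

```python
import hashlib
import heapq
from collections import Counter


def space_saving(stream, capacity):
    """SpaceSaving: keep at most `capacity` counters over a stream."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, item, row):
        # One hash per row, derived by salting the key with the row index.
        data = f"{row}:{item}".encode()
        digest = hashlib.blake2b(data, digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item):
        """Increment the item's cells and return its updated estimate."""
        cells = []
        for row, counts in enumerate(self.table):
            col = self._column(item, row)
            counts[col] += 1
            cells.append(counts[col])
        return min(cells)  # estimates never undercount


def cms_top_k(stream, k, width=256, depth=4):
    """Track k candidate heavy hitters using Count-Min Sketch estimates."""
    sketch = CountMinSketch(width, depth)
    top = {}  # candidate item -> latest estimate (stands in for the heap)
    for item in stream:
        estimate = sketch.add(item)
        if item in top or len(top) < k:
            top[item] = estimate
        else:
            weakest = min(top, key=top.get)
            if estimate > top[weakest]:
                del top[weakest]
                top[item] = estimate
    return top


def top_k_items(counters, k):
    """Return the k keys with the largest counts."""
    return heapq.nlargest(k, counters, key=counters.get)


def hit_rate(stream, reported_items, k):
    """Fraction of the true top-k items that appear in reported_items."""
    true_top = {item for item, _ in Counter(stream).most_common(k)}
    return len(true_top & set(reported_items)) / k
```

Running both algorithms on a real trace and on a synthetic trace, then comparing the resulting hit rates, gives a rough sense of whether the synthetic data preserves the heavy-hitter structure that these algorithms depend on.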

About

Tutorials on generative models for synthetic network trace generation.
