AINTEC Tutorial Instructions

This repository contains tutorials and experiments for generative models on network trace datasets. These instructions are for the AINTEC'2025 tutorial and are tailored to its specific environment and setup. For general-purpose use, please refer to the README.md in the main branch.

Set up environment

Note

Hardware Requirements: This README.md is for the AINTEC'2025 tutorial ONLY.

Step 1: Clone the repo to your server

cd scratch3
git clone https://github.com/netsharecmu/generative-trace-tutorials.git
cd generative-trace-tutorials

Step 2: Use this salloc command to allocate a GPU node to your account:

salloc -p gpu_a100 -q 2c-1h_gpu-a100_1g.10g --gres=gpu:1 --reservation=aintec-workshop

Wait until a node is allocated to you.

Step 3: Use the module command to load the workshop environment:

module load aintec-2025

Then run the following commands in the generative-trace-tutorials folder to install the dependencies:

pip install mmh3
pip install -e src/netshare
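
To confirm that both installs took effect before starting the notebook, a short import check can help. This is a minimal sketch, not part of the tutorial itself: the helper `missing_modules` is hypothetical, and the module name `netshare` is an assumption inferred from the `src/netshare` path.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "netshare" is an assumed import name for the package installed from src/netshare.
missing = missing_modules(["mmh3", "netshare"])
print("missing:", missing if missing else "none")
```

An empty result suggests the environment is ready; any listed name points to the `pip install` command that should be rerun.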

Step 4: Run this command to start the Jupyter Notebook:

srun run-notebook

Step 5: While the notebook is starting, open another terminal window and run the ssh command printed by the previous command:

# look for the ssh command that looks like this and run it on another terminal window:
ssh -vv -NL <port>:<hostname>:<port> <user>@saliksik.asti.dost.gov.ph
# then enter the same password
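
If the browser later cannot reach the notebook, it may help to check whether the forwarded port is actually listening on your local machine. The following is a rough sketch using only the Python standard library. The helper name `port_is_open` is hypothetical, and the port to test is whatever `<port>` value appeared in the printed ssh command.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example (replace 8888 with the <port> from the printed ssh command):
# print(port_is_open("127.0.0.1", 8888))
```

A `False` result usually means the ssh tunnel is not running yet or was started with a different port.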

Step 6: Go back to the previous terminal window and use the localhost (127.0.0.1) link (the third link in the screenshot below) to connect to the notebook using a browser. You are now connected to COARE’s Saliksik HPC Cluster through Jupyter Notebook.

Run experiments

In this tutorial we have two datasets:

Tabular Dataset: data/sample_tabular_data.csv.
Network Dataset: data/caida-10k.csv.

For each dataset, we have notebooks for training, evaluation and downstream tasks.

Tabular Dataset

Training: We have tabular_CTGAN.ipynb using CTGAN and tabular_RealTabFormer.ipynb using RealTabFormer.
Evaluation: tabular_quality_check.ipynb evaluates the data quality using both average JSD and customized queries written by domain experts.
Downstream Task: tabular_tasks.ipynb uses synthetic data for data augmentation in ML predictor training. We use two different ML predictors (SVM and Decision Tree) here.
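
The "average JSD" metric mentioned above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the notebook's implementation: it treats every column as categorical, uses base-2 logarithms so each score lies in [0, 1], and represents rows as plain dictionaries. The function names are hypothetical, and continuous columns would need binning or a different distance.

```python
import math
from collections import Counter


def distribution(values, support):
    """Empirical probability of each value in `support`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in support]


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x == 0 contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def average_jsd(real_rows, synth_rows, columns):
    """Mean per-column JSD between real and synthetic rows (lists of dicts)."""
    scores = []
    for col in columns:
        real = [row[col] for row in real_rows]
        synth = [row[col] for row in synth_rows]
        support = list(set(real) | set(synth))
        scores.append(jsd(distribution(real, support), distribution(synth, support)))
    return sum(scores) / len(scores)
```

A score of 0 means the real and synthetic columns have identical value frequencies; a score of 1 means they share no values at all.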

Network Dataset

Training: We have network_ctgan.ipynb using CTGAN and network_netshare.ipynb using NetShare.
Evaluation: network_quality_check.ipynb evaluates the data quality using both average JSD and customized queries written by domain experts.
Downstream Task: network_tasks.ipynb uses synthetic data for network measurement system testing. Specifically, we test two measurement algorithms (SpaceSaving and Count-Min Sketch+Heap) on their hit rate for Top-K most frequent flow identification.
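
For readers unfamiliar with the two measurement algorithms, the sketch below shows one simple way each could be written. It is illustrative only and may differ from the notebook's code: all names are hypothetical, `hashlib` stands in for a faster hash such as mmh3, and "hit rate" is assumed to mean the fraction of the true Top-K flows that the algorithm reports.

```python
import hashlib
import heapq
from collections import Counter


def space_saving(stream, capacity):
    """SpaceSaving: track at most `capacity` counters, replacing the smallest."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the current minimum; the newcomer inherits its count + 1.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def space_saving_top_k(stream, k, capacity):
    counters = space_saving(stream, capacity)
    return heapq.nlargest(k, counters, key=counters.get)


class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row of the sketch.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(repr(item).encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


def cms_heap_top_k(stream, k, width=1024, depth=4):
    """Count-Min Sketch plus a min-heap holding up to k candidate flows (k >= 1)."""
    sketch = CountMinSketch(width, depth)
    tracked = {}  # candidate item -> latest estimate
    heap = []     # (estimate, item); may hold stale entries, skipped lazily
    for item in stream:
        sketch.add(item)
        est = sketch.estimate(item)
        if item not in tracked and len(tracked) >= k:
            # Discard stale heap entries until the top is a live candidate.
            while tracked.get(heap[0][1]) != heap[0][0]:
                heapq.heappop(heap)
            if est <= heap[0][0]:
                continue  # not heavy enough to displace the smallest candidate
            del tracked[heapq.heappop(heap)[1]]
        tracked[item] = est
        heapq.heappush(heap, (est, item))
    return heapq.nlargest(k, tracked, key=tracked.get)


def hit_rate(reported, stream, k):
    """Fraction of the true Top-K items that appear in `reported`."""
    truth = {item for item, _ in Counter(stream).most_common(k)}
    return len(truth & set(reported)) / k
```

Here `stream` is any iterable of hashable, mutually comparable flow keys (for example, strings or 5-tuples). The lazy heap keeps the code short at the cost of extra memory; a production version would bound the heap size.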
