@Mukesh-P

Issue link: #136

Description

Adds a feature to SemanticModel that auto-loads datasets from a directory. Users can now initialize the semantic model by passing a folder path; it scans the folder for supported data files (CSV, Parquet, Excel) and loads each one as a dataset using the DuckDB adapter.

Type of Change

✨ New feature (non-breaking change that adds functionality)

Related Issue(s)

Fixes #136: [HELP WANTED] Feature: Auto-load datasets from directory in SemanticModel

Changes Made

  • Extended SemanticModel.__init__ to accept a str directory path
  • Added an _initialize_from_folder method that maps file extensions to formats (.csv→csv, .parquet→parquet, .xlsx→xlsx)
  • Implemented automatic DataSet creation for each discovered file using the DuckDB adapter configuration
  • Added error handling for invalid paths and empty directories
  • Maintained full backward compatibility with existing dict/list initialization
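As a rough illustration of the discovery step described in the list above, the sketch below scans a folder, maps file suffixes to format names, and raises on invalid paths and empty directories. The names discover_datasets and EXTENSION_TO_FORMAT are hypothetical, not the PR's actual code. The sketch assumes a non-recursive scan that uses the file stem as the dataset name, and it deliberately stops before building DataSet objects, since the DuckDB adapter configuration is specific to this project.

```python
from pathlib import Path

# Hypothetical mapping mirroring the PR description: file suffix -> format name.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".xlsx": "xlsx",
}


def discover_datasets(folder: str) -> dict[str, tuple[Path, str]]:
    """Return {dataset_name: (file_path, format)} for supported files in `folder`.

    Raises FileNotFoundError if the path does not exist, NotADirectoryError if
    it is not a directory, and ValueError if no supported files are found.
    """
    root = Path(folder)
    if not root.exists():
        raise FileNotFoundError(f"Data directory not found: {root}")
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")

    datasets: dict[str, tuple[Path, str]] = {}
    # Non-recursive scan (an assumption), sorted so dataset order is deterministic.
    for path in sorted(root.iterdir()):
        fmt = EXTENSION_TO_FORMAT.get(path.suffix.lower())
        if fmt is None or not path.is_file():
            continue  # skip unsupported files and subdirectories
        name = path.stem
        if name in datasets:
            # e.g. sales.csv and sales.parquet would both map to "sales"
            raise ValueError(f"Duplicate dataset name {name!r} from {path.name}")
        datasets[name] = (path, fmt)

    if not datasets:
        raise ValueError(f"No supported data files (.csv, .parquet, .xlsx) in {root}")
    return datasets
```

Lowercasing the suffix lets files such as report.XLSX match. Raising on duplicate stems, rather than silently overwriting one dataset with another, is a design choice the real implementation may or may not make.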

Testing

Test commands:

uv run python test_auto_load.py
uv run pytest tests/semantic_search/ -v
uv run pytest tests/ -v --tb=short

All tests pass (63 passed, 18 skipped), plus the custom auto-load tests ✅

Checklist

- [x] My code follows the code style of this project
- [x] Unit tests pass locally
- [x] New and existing functionality works
- [x] No breaking changes

Additional Context
This feature simplifies getting started: users can pass a data directory instead of constructing datasets by hand. For example, sm = SemanticModel("./my_data_folder") auto-discovers and loads all the CSV, Parquet, and Excel (.xlsx) files it finds.
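To show how a single constructor can accept a folder path while leaving dict/list input unchanged, here is a minimal sketch of the type dispatch. SemanticModelSketch is a hypothetical stand-in, not the real SemanticModel, and its _initialize_from_folder only returns {stem: path} for supported files; the actual method is described as creating DuckDB-backed DataSet objects instead.

```python
from pathlib import Path
from typing import Union

SUPPORTED_SUFFIXES = {".csv", ".parquet", ".xlsx"}


class SemanticModelSketch:
    """Hypothetical stand-in for SemanticModel showing constructor dispatch only."""

    def __init__(self, datasets: Union[str, dict, list]):
        if isinstance(datasets, str):
            # New path: treat the string as a directory and auto-discover files.
            self.datasets = self._initialize_from_folder(datasets)
        elif isinstance(datasets, (dict, list)):
            # Existing behaviour is left untouched for dict/list input.
            self.datasets = datasets
        else:
            raise TypeError(f"Unsupported datasets argument: {type(datasets).__name__}")

    @staticmethod
    def _initialize_from_folder(folder: str) -> dict:
        root = Path(folder)
        if not root.is_dir():
            raise NotADirectoryError(f"Not a directory: {root}")
        # Placeholder: the real method would build DataSet objects here.
        found = {
            p.stem: str(p)
            for p in sorted(root.iterdir())
            if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
        }
        if not found:
            raise ValueError(f"No supported data files found in {root}")
        return found
```

Because the check is isinstance(datasets, str), passing a pathlib.Path would raise TypeError in this sketch. Widening the check to os.PathLike is an easy extension if Path objects should also be accepted.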
