Feature auto load datasets #165 #171
Open
Issue link: #136
Description
Adds automatic dataset loading from a directory to SemanticModel. Users can now initialize the semantic model with a folder path; it scans that folder for supported data files (CSV, Parquet, Excel) and loads each one as a dataset through the DuckDB adapter.
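A minimal sketch of the directory scan described above, assuming discovery is a non-recursive pass that filters by file extension. The helper name `discover_data_files` and the exact extension set are hypothetical and only illustrate the idea; they are not taken from the code in this PR.

```python
from pathlib import Path

# Assumed extension set; the description only says "CSV, Parquet, Excel".
SUPPORTED_EXTENSIONS = {".csv", ".parquet", ".xlsx", ".xls"}


def discover_data_files(folder):
    """Return supported data files directly inside `folder`, sorted by path.

    Hypothetical helper: subdirectories are not searched, and extension
    matching is case-insensitive.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    return sorted(
        p
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Raising on a non-directory path keeps a typo in the folder name from silently producing a model with no datasets.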
Type of Change
✨ New feature (non-breaking change which adds functionality)
Related Issue(s)
Fixes the issue "[HELP WANTED] Feature: Auto-load datasets from directory in SemanticModel".
Changes Made
Testing
Test Commands:
uv run python test_auto_load.py
uv run pytest tests/semantic_search/ -v
uv run pytest tests/ -v --tb=short
All tests pass: 63 passed, 18 skipped, plus the custom auto-load tests ✅
Checklist
- [x] My code follows the code style of this project
- [x] Unit tests pass locally
- [x] New and existing functionality works
- [x] No breaking changes
Additional Context
This feature simplifies getting started: users can pass a data directory instead of constructing datasets manually. For example, `sm = SemanticModel("./my_data_folder")` auto-discovers and loads all CSV, Parquet, and Excel files in that folder.
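One way the example above could map files to datasets is to derive a dataset name from each file stem and choose a DuckDB reader by extension. This is a sketch under stated assumptions, not the implementation in this PR: `dataset_name` and `reader_sql` are hypothetical helpers, and only the existing DuckDB table functions `read_csv_auto` and `read_parquet` are shown. Excel is left out here because reading it in DuckDB depends on an extension.

```python
import re
from pathlib import Path

# DuckDB table functions keyed by file extension (Excel omitted on purpose).
READERS = {".csv": "read_csv_auto", ".parquet": "read_parquet"}


def dataset_name(path):
    """Turn a file name into a lowercase, SQL-friendly dataset name (hypothetical)."""
    stem = Path(path).stem
    name = re.sub(r"\W+", "_", stem).strip("_").lower()
    # Prefix names that are empty or start with a digit so they stay valid identifiers.
    if not name or name[0].isdigit():
        name = "t_" + name
    return name


def reader_sql(path):
    """Build a SELECT statement that reads `path` with the matching DuckDB function."""
    p = Path(path)
    reader = READERS.get(p.suffix.lower())
    if reader is None:
        raise ValueError(f"No reader configured for {p.suffix!r}")
    # Double single quotes so the path is a safe SQL string literal.
    escaped = p.as_posix().replace("'", "''")
    return f"SELECT * FROM {reader}('{escaped}')"
```

The resulting statement could then be handed to a DuckDB connection, for example as the body of a `CREATE VIEW`, with the derived name as the view name.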