Feature auto load datasets #165 #171

Mukesh-P · 2025-11-28T01:49:13Z

Issue link: #136

Description

Adds an auto-load-datasets-from-directory feature to SemanticModel. Users can now initialize the semantic model by passing a folder path; it automatically scans the folder for supported data files (CSV, Parquet, Excel) and loads each one as a dataset using the DuckDB adapter.
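A minimal sketch of the scanning step, using only the standard library. The helper name discover_data_files, the top-level-only (non-recursive) scan, the case-insensitive suffix match, and the naming rule (dataset name = file stem) are all assumptions for illustration; this PR description does not show the actual implementation.

```python
import tempfile
from pathlib import Path

# Extension-to-format mapping as listed in this PR; lower-casing the suffix
# (so "report.XLSX" also matches) is an assumption of this sketch.
EXTENSION_TO_FORMAT = {".csv": "csv", ".parquet": "parquet", ".xlsx": "xlsx"}


def discover_data_files(folder: str) -> list[tuple[str, Path, str]]:
    """Return (dataset_name, path, format) for each supported file in folder.

    Hypothetical helper: scans only the top level of the folder (no recursion),
    skips sub-directories and unsupported files, and names each dataset after
    the file stem. Results are sorted by path so the order is deterministic.
    """
    found = []
    for path in sorted(Path(folder).iterdir()):
        fmt = EXTENSION_TO_FORMAT.get(path.suffix.lower())
        if fmt is not None and path.is_file():
            found.append((path.stem, path, fmt))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ["sales.csv", "events.parquet", "notes.txt"]:
            (Path(d) / name).write_text("")
        # prints [('events', 'parquet'), ('sales', 'csv')]
        print([(name, fmt) for name, _, fmt in discover_data_files(d)])
```

Unsupported files such as notes.txt are silently skipped here; whether the real feature warns about them is not stated in the PR.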

Type of Change

✨ New feature (non-breaking change which adds functionality)
Related Issue(s)

Fixes #[HELP WANTED] Feature: Auto-load datasets from directory in SemanticModel

Changes Made

- Enhanced SemanticModel.__init__ to accept a str parameter holding a directory path
- Added the _initialize_from_folder method, which detects each file's format from its extension (.csv → csv, .parquet → parquet, .xlsx → xlsx)
- Implemented automatic DataSet creation using the DuckDB configuration
- Added error handling for invalid paths and empty directories
- Maintained full backward compatibility with the existing dict/list initialization
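The constructor dispatch described in the list above (a str triggers folder loading, dict/list keep the existing path, invalid paths and empty directories raise errors) might look roughly like the following. SemanticModelSketch and the plain-dict dataset entries are hypothetical stand-ins for the real SemanticModel and DataSet classes; the specific exception types and the TypeError for other argument types are assumptions of this sketch, not confirmed behaviour of the PR.

```python
import tempfile
from pathlib import Path

SUPPORTED = {".csv": "csv", ".parquet": "parquet", ".xlsx": "xlsx"}


class SemanticModelSketch:
    """Hypothetical stand-in for SemanticModel, showing only the constructor dispatch."""

    def __init__(self, datasets):
        if isinstance(datasets, str):
            # New path: treat the string as a directory and auto-load its files
            self.datasets = self._initialize_from_folder(datasets)
        elif isinstance(datasets, (dict, list)):
            # Existing dict/list initialization is passed through unchanged
            self.datasets = datasets
        else:
            raise TypeError(f"Unsupported datasets argument: {type(datasets).__name__}")

    @staticmethod
    def _initialize_from_folder(folder: str) -> dict:
        path = Path(folder)
        if not path.exists():
            raise FileNotFoundError(f"Data directory not found: {folder}")
        if not path.is_dir():
            raise NotADirectoryError(f"Not a directory: {folder}")
        # Plain dicts stand in for DataSet objects built with the DuckDB config
        datasets = {
            f.stem: {"path": str(f), "format": SUPPORTED[f.suffix.lower()]}
            for f in sorted(path.iterdir())
            if f.is_file() and f.suffix.lower() in SUPPORTED
        }
        if not datasets:
            raise ValueError(f"No CSV, Parquet or Excel files found in {folder}")
        return datasets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "sales.csv").write_text("id,amount\n1,9.99\n")
        (Path(d) / "notes.txt").write_text("ignored")
        print(SemanticModelSketch(d).datasets)
```

Dispatching on isinstance keeps existing callers unaffected, since only a str argument reaches the new code path.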

Testing
Test Commands:

uv run python test_auto_load.py
uv run pytest tests/semantic_search/ -v
uv run pytest tests/ -v --tb=short
Result: all tests pass (63 passed, 18 skipped), plus the custom auto-load tests ✅

Checklist

- [x] My code follows the code style of this project
- [x] Unit tests pass locally
- [x] New and existing functionality works
- [x] No breaking changes

Additional Context
This feature simplifies getting started: users can pass a data directory instead of constructing datasets manually. For example, sm = SemanticModel("./my_data_folder") will auto-discover and load all CSV/Parquet/Excel files.
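The PR states that discovered files are loaded through the DuckDB adapter but does not show how. One plausible approach, sketched here purely as an assumption, is to register each file as a DuckDB view over a table function, so that SemanticModel("./my_data_folder") would end up with one view per file. The helper names (READERS, view_sql, quote_identifier, quote_literal) are hypothetical. read_csv_auto and read_parquet are standard DuckDB table functions, while read_xlsx assumes the Excel extension available in recent DuckDB releases.

```python
# DuckDB table function per detected format. read_xlsx is assumed to come
# from DuckDB's excel extension (recent versions only).
READERS = {"csv": "read_csv_auto", "parquet": "read_parquet", "xlsx": "read_xlsx"}


def quote_identifier(name: str) -> str:
    # SQL identifier quoting: double any embedded double quotes, then wrap
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # SQL string literal quoting: double any embedded single quotes, then wrap
    return "'" + value.replace("'", "''") + "'"


def view_sql(name: str, path: str, fmt: str) -> str:
    """Build a CREATE VIEW statement exposing one data file as a queryable dataset."""
    reader = READERS[fmt]
    return (
        f"CREATE OR REPLACE VIEW {quote_identifier(name)} AS "
        f"SELECT * FROM {reader}({quote_literal(path)})"
    )


if __name__ == "__main__":
    # prints CREATE OR REPLACE VIEW "sales" AS SELECT * FROM read_csv_auto('./my_data_folder/sales.csv')
    print(view_sql("sales", "./my_data_folder/sales.csv", "csv"))
```

A view leaves the data in the original files and re-reads them on each query; materializing with CREATE TABLE ... AS SELECT would be the alternative if repeated scans become expensive.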


Hsekumsti@gmail.com added 2 commits November 28, 2025 07:11

Fix Qdrant cloud indexing: Add keyword index for 'type' field during collection creation (df23e9d)

Add auto-load datasets from directory feature to SemanticModel (7644827)
