@thewebscraping (Owner) commented on Nov 29, 2025

Overview

  • Unify logging via Logger with configurable LOG_LEVEL.
  • Replace generic exceptions with specific ones from exceptions.py.
  • Migrate all configuration reads to api_settings.
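
A minimal sketch of what a Logger wrapper with a configurable LOG_LEVEL could look like. This is not the code from the PR: the constructor signature, the attribute delegation, and reading LOG_LEVEL from the environment (the PR reads it from api_settings) are assumptions made to keep the example self-contained.

```python
import logging
import os
from typing import Optional


class Logger:
    """Thin wrapper that applies one configurable level to a named logger."""

    def __init__(self, name: str, level: Optional[str] = None) -> None:
        # Assumption: fall back to the LOG_LEVEL environment variable, then INFO.
        level_name = (level or os.environ.get("LOG_LEVEL", "INFO")).upper()
        numeric = getattr(logging, level_name, None)
        if not isinstance(numeric, int):
            numeric = logging.INFO  # unknown level names fall back to INFO
        self._logger = logging.getLogger(name)
        self._logger.setLevel(numeric)

    def __getattr__(self, item: str):
        # Delegate info(), warning(), error(), etc. to the wrapped logger.
        return getattr(self._logger, item)


log = Logger("sketch.adapter", level="debug")
log.debug("adapter initialised")
```

Wrapping the standard logger rather than subclassing it keeps the level decision in one place, which is the point of the "centralized logging configuration" item in the rationale.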

Changes

  • Logging: Introduce Logger and use across adapters/core; replace logging.getLogger(__name__) with Logger(...).
  • Exceptions: Use MissingFieldError, InvalidFieldError, MissingConfigError, SearchError, etc.; update schema.py and embeddings (gemini, openai).
  • Settings: Replace os.getenv(...) calls with direct api_settings.* access for Astra, Chroma, Milvus, PGVector, and embeddings.
  • Adapters: Align astradb, chroma, milvus, pgvector to consistent patterns (init logs, metrics, connection handling).
  • Engine/Schema: Confirm engine logging approach; update schema exceptions.
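
The exception change above can be illustrated with a small sketch. The class names MissingFieldError and InvalidFieldError come from this PR, but their constructors, the shared base class, and the validate_document helper are hypothetical and only show the general pattern of raising a specific error instead of a generic ValueError.

```python
class BaseSketchError(Exception):
    """Hypothetical common base so callers can catch every library error at once."""


class MissingFieldError(BaseSketchError):
    def __init__(self, field: str) -> None:
        super().__init__(f"Missing required field: {field!r}")
        self.field = field


class InvalidFieldError(BaseSketchError):
    def __init__(self, field: str, reason: str) -> None:
        super().__init__(f"Invalid value for field {field!r}: {reason}")
        self.field = field


def validate_document(doc: dict) -> dict:
    """Hypothetical validator: raise a specific error instead of ValueError."""
    if "text" not in doc:
        raise MissingFieldError("text")
    if not isinstance(doc["text"], str):
        raise InvalidFieldError("text", "expected a string")
    return doc


try:
    validate_document({"id": "1"})
except MissingFieldError as exc:
    print(exc)  # the message names the offending field
```

Callers can then handle a missing field differently from an invalid one, or catch the base class to handle both.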

Rationale

  • Clearer error semantics and actionable messages.
  • Consistent, centralized logging configuration.
  • Reduced env drift through unified settings access.

Validation

  • Pre-commit hooks pass (ruff, format, whitespace, lint).
  • Tests pass locally (pytest -q).

- Remove wrapper types (UpsertRequest, SearchRequest, VectorStatus, UpsertInput)
- Engine methods now return ABC types directly (VectorDocument instead of dicts)
- Add helper methods: create_from_texts, upsert_from_texts, update_from_texts
- Remove types.py - replace DocumentIds with Union[str, Sequence[str]]
- Remove unused functions: normalize_documents, extract_unique_query
- Remove Document and normalize_documents from public API exports
- Add utils helpers: normalize_texts, normalize_metadatas, normalize_pks
- Enhanced search with offset and where filtering across all adapters
- Remove unique_fields parameter (only used by 1 of 4 adapters)
- Add collection management: add_collection, get_collection, get_or_create_collection
- Update Quick Start examples to use the create_from_texts() helper
- Add PRIMARY_KEY_MODE configuration docs
- Fix test fixtures to return a dict with texts/metadatas/pks
- Update all test methods to use the new API (no more wrapper types)
- Remove test_flexible_input.py (it tested internal functions that were removed)
- Add missing ABC methods to MockDBAdapter (create, get_or_create, update, update_or_create)
- Fix normalize_pks() to pad the list with None values
- Fix VectorDocument construction to use the 'id' parameter instead of 'pk'

All engine tests now pass with the simplified API.

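
The normalization helpers named in the list above could look roughly like the sketch below. The function names come from the commit message, but the signatures (in particular the count parameter) and the choice to raise when there are more keys than texts are assumptions, not the PR's actual code.

```python
from typing import List, Optional, Sequence, Union


def normalize_texts(texts: Union[str, Sequence[str]]) -> List[str]:
    """Accept one string or a sequence of strings and always return a list."""
    return [texts] if isinstance(texts, str) else list(texts)


def normalize_pks(
    pks: Union[str, Sequence[str], None], count: int
) -> List[Optional[str]]:
    """Return exactly `count` primary keys, padding missing ones with None."""
    if pks is None:
        items: List[Optional[str]] = []
    elif isinstance(pks, str):
        items = [pks]
    else:
        items = list(pks)
    if len(items) > count:
        # Assumption: surplus keys are treated as a caller error.
        raise ValueError(f"got {len(items)} pks for {count} texts")
    return items + [None] * (count - len(items))


texts = normalize_texts(["first doc", "second doc", "third doc"])
pks = normalize_pks(["doc-1"], count=len(texts))
print(list(zip(pks, texts)))
```

Padding with None lets a later step generate keys only for the entries the caller did not supply.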
- Rename the VectorDocument class (a backward-compatible alias is maintained)
- Remove SearchRequest/UpsertRequest wrappers - use direct method calls
- Add private _vector attribute with emb property
- Move generate_pk and helpers from schema to utils
- Reorganize utils.py into logical sections
- Update all docs to reflect new API and PK generation modes
- Fix integration tests to use new engine methods
- Delete obsolete test_schema.py
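
As an illustration of the "private _vector attribute with emb property" item, here is a minimal sketch. Only the names VectorDocument, _vector, emb, and the id parameter come from this PR; the other fields, the read-only property, and the alias name LegacyDocument are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence


class VectorDocument:
    """Sketch of a document whose embedding is stored privately."""

    def __init__(
        self,
        id: str,
        text: str,
        vector: Optional[Sequence[float]] = None,
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.id = id
        self.text = text
        self.metadata = dict(metadata or {})
        # The embedding lives in a private attribute.
        self._vector: Optional[List[float]] = (
            list(vector) if vector is not None else None
        )

    @property
    def emb(self) -> Optional[List[float]]:
        """Read-only view of the stored embedding (read-only is an assumption)."""
        return self._vector


# Hypothetical alias name, shown only to illustrate how an old import
# path can keep working after a class rename.
LegacyDocument = VectorDocument

doc = VectorDocument(id="doc-1", text="hello", vector=[0.1, 0.2])
print(doc.emb)
```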
@thewebscraping changed the title from "Refactor: Simplify API and Reorganize Core Components" to "refactor: standardize logging, exceptions, and settings across adapters" on Nov 29, 2025
@thewebscraping force-pushed the standard-dbs branch 8 times, most recently from 6c361b0 to 6d21ca6, on November 29, 2025 at 20:16
- Add Logger class with configurable LOG_LEVEL setting
- Replace all module loggers with Logger class
- Use specific exceptions from exceptions.py instead of generic ValueError/Exception
- Replace os.getenv with direct api_settings access
- Update all adapters (astradb, chroma, milvus, pgvector) with consistent patterns
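
To show the idea behind replacing scattered os.getenv calls with a single settings object, here is a standard-library sketch. The real api_settings object in the PR may be built differently; the field names other than LOG_LEVEL, the require helper, and the constructor of MissingConfigError are hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


class MissingConfigError(Exception):
    """Raised when a required setting has no value (constructor is assumed)."""


@dataclass(frozen=True)
class ApiSettings:
    # Each value is read from the environment once, when the object is created.
    LOG_LEVEL: str = field(
        default_factory=lambda: os.environ.get("LOG_LEVEL", "INFO")
    )
    # Hypothetical setting name, used only for illustration.
    PGVECTOR_HOST: Optional[str] = field(
        default_factory=lambda: os.environ.get("PGVECTOR_HOST")
    )

    def require(self, name: str) -> str:
        """Return a setting or raise MissingConfigError naming the missing key."""
        value = getattr(self, name, None)
        if not value:
            raise MissingConfigError(f"{name} is not configured")
        return value


api_settings = ApiSettings()
print(api_settings.LOG_LEVEL)
```

Adapters would then read api_settings.PGVECTOR_HOST (or call require) instead of repeating os.getenv with its own default in each module, which is what reduces drift between modules.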

2 participants