[FEATURE] Implement Intelligent Docstring Injection to Respect Language-Specific Conventions #24

@Artemonim

Description

Currently, AgentDocstrings inserts its Table of Contents (ToC) at the very beginning of a file's docstring. This works for files with no prior documentation, but when a docstring already exists it breaks the established documentation conventions of most supported languages.

The core issue is the "summary-first" convention: documentation generators (such as Sphinx, Javadoc, and Doxygen) treat the first line or paragraph of a docstring as the summary shown in quick-reference tables. By prepending the ToC, AgentDocstrings effectively replaces the intended summary with the ToC, which leads to incorrect generated documentation and to conflicts with code formatters such as Black.

Concrete Example (Python)

A file without a manual summary is formatted correctly:

# test_file_1.py
"""
    --- AUTO-GENERATED DOCSTRING ---
    ...
    --- END AUTO-GENERATED DOCSTRING ---
"""

A file that already has a manual summary ends up with that summary pushed below the ToC, and formatters may then "fix" the result incorrectly:

# utils.py - Before AgentDocstrings
"""Utility functions for the project."""

# After AgentDocstrings runs, it becomes:
"""
    --- AUTO-GENERATED DOCSTRING ---
    ...
    --- END AUTO-GENERATED DOCSTRING ---
Utility functions for the project.
"""
# This violates PEP 257: the summary line is no longer the first line of the docstring.
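
To make the effect concrete, here is a minimal sketch of the "summary-first" rule. It assumes a tool that simply takes the first line of the cleaned module docstring as the summary; real generators differ in detail, and `first_summary_line` is a hypothetical helper written for this issue, not part of AgentDocstrings.

```python
import ast


def first_summary_line(source: str) -> str:
    """Return the first line of the module docstring, as a summary-first tool might.

    Hypothetical helper, for illustration only.
    """
    # ast.get_docstring() cleans indentation and leading blank lines by default.
    doc = ast.get_docstring(ast.parse(source)) or ""
    lines = doc.splitlines()
    return lines[0].strip() if lines else ""


before = '"""Utility functions for the project."""\n'
after = (
    '"""\n'
    "    --- AUTO-GENERATED DOCSTRING ---\n"
    "    ...\n"
    "    --- END AUTO-GENERATED DOCSTRING ---\n"
    "Utility functions for the project.\n"
    '"""\n'
)

print(first_summary_line(before))  # Utility functions for the project.
print(first_summary_line(after))   # --- AUTO-GENERATED DOCSTRING ---
```

Under that assumption, the marker line is what would appear in a summary table after AgentDocstrings runs.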

Affected Languages

This problem affects most, if not all, supported languages:

  • Python: Breaks the PEP 257 summary-line convention.
  • Java/Kotlin (Javadoc/KDoc): The ToC becomes the class/method summary.
  • C#/Delphi (XML Docs): The ToC can break the XML structure if not injected carefully (e.g., outside the <summary> tag).
  • JS/TS (JSDoc/TSDoc): Same issue as Javadoc.
  • Go (godoc), C/C++ (Doxygen): The ToC becomes the synopsis or brief description.
  • PowerShell: The ToC can be misplaced before the .SYNOPSIS block.

Proposed Solution: Intelligent Injection Logic

I propose enhancing AgentDocstrings to intelligently inject the ToC instead of simply prepending it. The new logic should be:

  1. Detect Existing Docstring: Check if a module/class/function docstring already exists.
  2. No Docstring: If none exists, create a new one with the ToC as the content.
  3. Existing Docstring:
    • Parse the Summary: Identify and extract the first paragraph (the summary line/block).
    • Identify the Insertion Point: The ideal insertion point is after the summary and the blank line that follows it (required by PEP 257 for multi-line docstrings), but before the rest of the detailed description.
    • Inject ToC: Insert the --- AUTO-GENERATED DOCSTRING --- block at this identified insertion point.
    • Update ToC: If an old AgentDocstrings block already exists, it should be replaced, leaving the manual summary and other parts of the docstring intact.
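
The steps above could be sketched for Python roughly as follows. This is only an illustration under stated assumptions: it works on the inner text of a docstring (without the triple quotes), it treats everything up to the first blank line as the summary, and the names `inject_toc`, `START`, and `END` are hypothetical rather than existing AgentDocstrings API. Other languages would need their own summary rules.

```python
import re
from typing import Optional

# Marker lines as they appear in the examples in this issue.
START = "--- AUTO-GENERATED DOCSTRING ---"
END = "--- END AUTO-GENERATED DOCSTRING ---"

# A previously injected block, with its indentation and trailing blank lines.
_BLOCK_RE = re.compile(
    r"[ \t]*" + re.escape(START) + r".*?" + re.escape(END) + r"[ \t]*\n*",
    re.DOTALL,
)


def inject_toc(docstring: Optional[str], toc_block: str) -> str:
    """Return docstring text with toc_block placed after the summary paragraph."""
    toc = toc_block.strip("\n")
    # Step 2: no docstring at all, so the ToC becomes the whole content.
    if not docstring or not docstring.strip():
        return toc + "\n"
    # "Update ToC": drop any stale block and keep the manual text.
    body = _BLOCK_RE.sub("", docstring).strip("\n")
    if not body.strip():
        return toc + "\n"
    # "Parse the Summary": the first paragraph ends at the first blank line.
    parts = re.split(r"\n[ \t]*\n", body, maxsplit=1)
    summary = parts[0].rstrip()
    rest = parts[1].strip("\n") if len(parts) > 1 else ""
    # "Inject ToC": summary, blank line, ToC, blank line, remaining description.
    pieces = [summary, toc]
    if rest.strip():
        pieces.append(rest)
    return "\n\n".join(pieces) + "\n"


toc = (
    "    --- AUTO-GENERATED DOCSTRING ---\n"
    "    ...\n"
    "    --- END AUTO-GENERATED DOCSTRING ---"
)
doc = (
    "This is a summary line for the module.\n"
    "\n"
    "This is a more detailed description that should be preserved.\n"
)
once = inject_toc(doc, toc)
twice = inject_toc(once, toc)  # re-running replaces the block instead of stacking it
print(once)
```

Removing the old block before locating the summary means a file that was already processed with the current prepend behavior would also be repaired on the next run, since the manual summary becomes the first paragraph again.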

Example of Desired Behavior (Python):

Initial Code:

"""This is a summary line for the module.

This is a more detailed description that should be preserved.
"""

Code After Intelligent Injection:

"""This is a summary line for the module.

    --- AUTO-GENERATED DOCSTRING ---
    Table of content is automatically generated by Agent Docstrings v1.3.1
    ...
    --- END AUTO-GENERATED DOCSTRING ---

This is a more detailed description that should be preserved.
"""

This approach should keep the injected ToC compatible with language standards, documentation generators, and code formatters, because the manual summary stays in the position those tools expect.


Priority Level

😎 High - Would significantly improve my workflow

Implementation Complexity

  • 🤔 Simple - Minor change or addition
  • 😑 Moderate - Requires new parsing logic
  • 🛠️ Complex - Major feature requiring significant development
  • 😎 I don't know
