@martin-georgiev martin-georgiev commented Apr 20, 2025

Summary by CodeRabbit

  • New Features
    • Added support for PostgreSQL regular expression functions in Doctrine ORM DQL: REGEXP_COUNT, REGEXP_INSTR, REGEXP_SUBSTR, and enhanced REGEXP_REPLACE with variadic argument support.
  • Refactor
    • Updated the implementation of the REGEXP_REPLACE function to support a variable number of arguments.
  • Tests
    • Introduced comprehensive tests for REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, and REGEXP_SUBSTR functions, including validation for correct argument counts and error handling.

…SUBSTR()` and extend support for `REGEXP_REPLACE()`
coderabbitai bot commented Apr 20, 2025

Walkthrough

This update introduces new Doctrine DQL function classes for PostgreSQL regular expression operations: RegexpCount, RegexpInstr, and RegexpSubstr. The RegexpReplace class is refactored to support variadic arguments, aligning its implementation with the new functions. Each class defines argument count constraints and valid argument patterns, enabling flexible DQL usage. Corresponding PHPUnit test classes are added for each function, verifying SQL translation and argument validation, including negative tests for invalid argument counts. The changes ensure robust support for PostgreSQL regex functions in Doctrine DQL with comprehensive test coverage.
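
The argument-count check described in the walkthrough can be pictured roughly as follows. This is a minimal standalone sketch, not the library's code: `renderVariadicCall()` and its use of `\InvalidArgumentException` are hypothetical stand-ins for whatever `BaseVariadicFunction` and `InvalidArgumentForVariadicFunctionException` actually do, and the arguments are assumed to be already-rendered SQL fragments.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: validates the argument count, then renders the SQL call.
 *
 * @param list<string> $sqlArguments already-rendered SQL fragments (assumption)
 */
function renderVariadicCall(string $functionName, array $sqlArguments, int $min, int $max): string
{
    $count = \count($sqlArguments);

    // Reject calls outside the allowed range before any SQL is produced.
    if ($count < $min || $count > $max) {
        throw new \InvalidArgumentException(\sprintf(
            '%s() expects between %d and %d arguments, %d given',
            $functionName,
            $min,
            $max,
            $count
        ));
    }

    return \sprintf('%s(%s)', $functionName, \implode(', ', $sqlArguments));
}

// REGEXP_COUNT is reviewed in this thread as taking 2 to 4 arguments.
echo renderVariadicCall('regexp_count', ['c0_.text1', "'\\d\\d\\d'", '1', "'i'"], 2, 4), PHP_EOL;
```

Failing early in PHP keeps an out-of-range call from reaching PostgreSQL at all, which is the behaviour the negative tests in this PR exercise.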

Changes

| File(s) | Change Summary |
|---|---|
| src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php, RegexpInstr.php, RegexpSubstr.php | Added new classes implementing PostgreSQL's REGEXP_COUNT, REGEXP_INSTR, and REGEXP_SUBSTR as variadic DQL functions, each defining function name, argument count limits, and valid argument patterns. |
| src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php | Refactored: Now extends BaseVariadicFunction, adds getNodeMappingPattern(), and splits argument count method into getMinArgumentCount()/getMaxArgumentCount(). |
| tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php, RegexpInstrTest.php, RegexpSubstrTest.php | Added PHPUnit test classes for new regex functions, covering valid and invalid argument scenarios, SQL translation, and exception handling for argument count violations. |
| tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php | Refactored to use base variadic function test case, added tests for argument count validation, and expanded test coverage for different argument scenarios. |

Sequence Diagram(s)

sequenceDiagram
    participant DQL as Doctrine DQL Query
    participant Func as Regexp* Function Class
    participant SQL as PostgreSQL

    DQL->>Func: Parse function call with arguments
    Func->>Func: Validate argument count and pattern
    Func->>SQL: Generate SQL with appropriate REGEXP_* function and arguments
    SQL-->>Func: Return SQL result
    Func-->>DQL: Return result to Doctrine

Poem

Regexes hop through Doctrine's land,
New functions join the bunny band.
Count, replace, and search with glee—
Arguments checked, as strict as can be!
Tests abound, so bugs can't hide,
In regex fields, these rabbits glide.
🐇✨

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (4)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php (1)

26-36: Test SQL statements align with PostgreSQL function usage, but consider adding additional test cases.

The tests cover various parameter combinations, from basic usage to the full 6-parameter version. However, PostgreSQL documentation describes more complex use cases for REGEXP_INSTR that could be valuable to test:

  1. Testing return option 1 (returns the position of the character following the match instead of the position where the match starts)
  2. Testing with case-sensitive vs case-insensitive flag differences
  3. Testing matches that span multiple characters

Consider adding test cases that demonstrate more complex regex patterns and the effect of the return_option parameter, such as:

 protected function getExpectedSqlStatements(): array
 {
     return [
         'finds position of pattern' => "SELECT regexp_instr(c0_.text1, 'c(.)(...)') AS sclr_0 FROM ContainsTexts c0_",
         'finds position of digits' => "SELECT regexp_instr(c0_.text1, '\\d+') AS sclr_0 FROM ContainsTexts c0_",
         'with start position' => "SELECT regexp_instr(c0_.text1, '\\d+', 1) AS sclr_0 FROM ContainsTexts c0_",
         'with start and occurrence' => "SELECT regexp_instr(c0_.text1, '\\d+', 1, 2) AS sclr_0 FROM ContainsTexts c0_",
         'with start, occurrence and return_option' => "SELECT regexp_instr(c0_.text1, '\\d+', 1, 2, 0) AS sclr_0 FROM ContainsTexts c0_",
         'with all parameters' => "SELECT regexp_instr(c0_.text1, '\\d+', 1, 2, 0, 'i') AS sclr_0 FROM ContainsTexts c0_",
+        'with return_option 1' => "SELECT regexp_instr(c0_.text1, '\\d+', 1, 1, 1) AS sclr_0 FROM ContainsTexts c0_",
+        'case sensitivity test' => "SELECT regexp_instr(c0_.text1, '[A-Z]+') AS sclr_0 FROM ContainsTexts c0_",
+        'case insensitivity test' => "SELECT regexp_instr(c0_.text1, '[A-Z]+', 1, 1, 0, 'i') AS sclr_0 FROM ContainsTexts c0_",
     ];
 }

Make corresponding additions to the getDqlStatements() method as well.
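
To make the return_option distinction concrete, here is a rough emulation in plain PHP. It is only a sketch under stated assumptions: `regexpInstrSketch()` is a hypothetical name, it uses PCRE through `preg_match()` rather than PostgreSQL's own regex engine, it works on byte offsets (so ASCII input only), and it ignores the flags and subexpression parameters.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical emulation of regexp_instr(source, pattern, start, occurrence, return_option).
 * Returns a 1-based position, or 0 when the requested occurrence does not exist.
 */
function regexpInstrSketch(
    string $source,
    string $pattern,
    int $start = 1,
    int $occurrence = 1,
    int $returnOption = 0
): int {
    $offset = $start - 1; // PostgreSQL positions are 1-based, PHP offsets are 0-based

    for ($i = 1; $i <= $occurrence; $i++) {
        if (\preg_match($pattern, $source, $match, PREG_OFFSET_CAPTURE, $offset) !== 1) {
            return 0;
        }

        [$text, $position] = $match[0];
        $end = $position + \strlen($text);

        if ($i === $occurrence) {
            // 0 => position of the first matched character,
            // 1 => position of the character following the match.
            return ($returnOption === 1 ? $end : $position) + 1;
        }

        // Continue after this match; step one byte forward on an empty match.
        $offset = $end > $position ? $end : $end + 1;
    }

    return 0;
}

echo regexpInstrSketch('abc123def456', '/\d+/'), PHP_EOL;          // start of the first match
echo regexpInstrSketch('abc123def456', '/\d+/', 1, 1, 1), PHP_EOL; // just past the first match
```

The DQL tests in this PR only assert the generated SQL string, so a sketch like this is useful for reasoning about expected values rather than as a replacement for those tests.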

src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php (1)

17-17: Consider expanding the documentation example

While the current example is helpful, consider adding examples that demonstrate all possible parameter variations (3 and 4 parameters) to better illustrate the full functionality.

- * @example Using it in DQL: "SELECT REGEXP_COUNT(e.text, '\d\d\d') FROM Entity e"
+ * @example Using it in DQL: 
+ *   Basic usage: "SELECT REGEXP_COUNT(e.text, '\d\d\d') FROM Entity e"
+ *   With position: "SELECT REGEXP_COUNT(e.text, '\d\d\d', 1) FROM Entity e" 
+ *   With all parameters: "SELECT REGEXP_COUNT(e.text, '\d\d\d', 1, 'i') FROM Entity e"
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php (2)

26-35: Test cases duplicate "with flags" and "with all parameters"

The test cases for "with flags" and "with all parameters" are identical in both expected SQL and DQL. Consider either removing one of these duplicates or replacing it with a different test case to improve test coverage.

- 'with all parameters' => "SELECT regexp_count(c0_.text1, '\\d\\d\\d', 1, 'i') AS sclr_0 FROM ContainsTexts c0_",
+ 'with different flags' => "SELECT regexp_count(c0_.text1, '\\d\\d\\d', 1, 'g') AS sclr_0 FROM ContainsTexts c0_",

37-46: Corresponding DQL statements also have duplication

The DQL statements for "with flags" and "with all parameters" are identical. This should be updated to match any changes made to the expected SQL statements.

- 'with all parameters' => \sprintf("SELECT REGEXP_COUNT(e.text1, '\\d\\d\\d', 1, 'i') FROM %s e", ContainsTexts::class),
+ 'with different flags' => \sprintf("SELECT REGEXP_COUNT(e.text1, '\\d\\d\\d', 1, 'g') FROM %s e", ContainsTexts::class),
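
Duplicates like the ones flagged above can be spotted mechanically. The helper below is a hypothetical sketch (`findDuplicateStatements()` is not part of the test suite) that groups test labels by their SQL or DQL string and keeps only the groups with more than one label.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns each statement that appears under more than one label.
 *
 * @param array<string, string> $statements label => SQL or DQL string
 *
 * @return array<string, list<string>> statement => labels sharing it
 */
function findDuplicateStatements(array $statements): array
{
    $labelsByStatement = [];
    foreach ($statements as $label => $statement) {
        $labelsByStatement[$statement][] = $label;
    }

    // Keep only statements that are used by two or more labels.
    return \array_filter(
        $labelsByStatement,
        static fn (array $labels): bool => \count($labels) > 1
    );
}

$expected = [
    'with flags' => "SELECT regexp_count(c0_.text1, '\\d\\d\\d', 1, 'i') AS sclr_0 FROM ContainsTexts c0_",
    'with all parameters' => "SELECT regexp_count(c0_.text1, '\\d\\d\\d', 1, 'i') AS sclr_0 FROM ContainsTexts c0_",
];

\print_r(findDuplicateStatements($expected));
```
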
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e111dd2 and 45e5358.

📒 Files selected for processing (8)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstr.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstr.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (2 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/**/*.php: Use the PostgreSQL official documentation to verify that tests include comprehensive use cases and example SQL queries for the tested SQL functions and operators.

  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php
🧠 Learnings (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (2)
Learnt from: martin-georgiev
PR: martin-georgiev/postgresql-for-doctrine#318
File: tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/XmlAggTest.php:1-9
Timestamp: 2025-03-29T03:31:17.114Z
Learning: Tests in the `Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions` namespace extend a custom `TestCase` class from the same namespace (`Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions\TestCase`), rather than PHPUnit's TestCase, and therefore don't need an explicit import.
Learnt from: martin-georgiev
PR: martin-georgiev/postgresql-for-doctrine#318
File: tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/XmlAggTest.php:1-9
Timestamp: 2025-03-29T03:31:17.114Z
Learning: Tests in the `Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions` namespace extend a custom `TestCase` class from the same namespace (`Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions\TestCase`), not PHPUnit's TestCase, and therefore don't need an explicit import statement.
🧬 Code Graph Analysis (2)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (5)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/BaseVariadicFunction.php (1)
  • BaseVariadicFunction (20-135)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/Exception/InvalidArgumentForVariadicFunctionException.php (1)
  • InvalidArgumentForVariadicFunctionException (7-56)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php (1)
  • RegexpReplace (15-41)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/BaseVariadicFunctionTestCase.php (1)
  • BaseVariadicFunctionTestCase (14-42)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/TestCase.php (1)
  • buildEntityManager (116-119)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php (4)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/BaseVariadicFunction.php (1)
  • BaseVariadicFunction (20-135)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/Exception/InvalidArgumentForVariadicFunctionException.php (1)
  • InvalidArgumentForVariadicFunctionException (7-56)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstr.php (1)
  • RegexpSubstr (20-46)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/TestCase.php (1)
  • buildEntityManager (116-119)
🔇 Additional comments (15)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstr.php (2)

7-19: Well-documented PostgreSQL function implementation.

The documentation properly explains the PostgreSQL REGEXP_SUBSTR() function with a link to the official documentation and includes a helpful DQL example. The function description accurately reflects PostgreSQL's behavior of returning a substring matching the regular expression pattern or NULL if no match is found.


20-46: Implementation correctly maps to PostgreSQL's function signature.

The class properly extends BaseVariadicFunction and defines the correct argument patterns for PostgreSQL's REGEXP_SUBSTR(). The implementation supports all 6 possible parameters described in the PostgreSQL documentation:

  1. source string (required)
  2. regex pattern (required)
  3. start position (optional)
  4. occurrence (optional)
  5. flags (optional)
  6. subexpression (optional)

The argument types match PostgreSQL's requirements, with strings represented as StringPrimary and numeric values as ArithmeticPrimary.
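
The positional layout listed above can be written down as data. The sketch below is illustrative only: the constant, the function name, and the comma-joined pattern format are assumptions made for this example and are not taken from the `RegexpSubstr` class; only the parameter order and the StringPrimary/ArithmeticPrimary split come from the comment above.

```php
<?php

declare(strict_types=1);

// Parameters of REGEXP_SUBSTR in positional order, paired with the DQL node
// type the review comment associates with each of them.
const REGEXP_SUBSTR_PARAMETERS = [
    ['source string', 'StringPrimary'],
    ['regex pattern', 'StringPrimary'],
    ['start position', 'ArithmeticPrimary'],
    ['occurrence', 'ArithmeticPrimary'],
    ['flags', 'StringPrimary'],
    ['subexpression', 'ArithmeticPrimary'],
];

/**
 * Hypothetical helper: node types expected for a call with $argumentCount arguments.
 */
function nodePatternForArgumentCount(int $argumentCount): string
{
    $max = \count(REGEXP_SUBSTR_PARAMETERS);
    if ($argumentCount < 2 || $argumentCount > $max) {
        throw new \InvalidArgumentException(
            \sprintf('Expected between 2 and %d arguments, %d given', $max, $argumentCount)
        );
    }

    // Take the first N parameters and keep only their node types (column 1).
    $nodeTypes = \array_column(\array_slice(REGEXP_SUBSTR_PARAMETERS, 0, $argumentCount), 1);

    return \implode(',', $nodeTypes);
}

echo nodePatternForArgumentCount(3), PHP_EOL;
```
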

tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php (4)

12-46: Test cases validate SQL generation for all valid argument combinations.

The tests properly verify SQL generation for the REGEXP_SUBSTR() function with various argument combinations, covering basic usage, digit extraction, and all optional parameters. The DQL to SQL transformation is tested correctly.

According to PostgreSQL documentation, the test cases align with expected usage patterns:

  • Basic usage with only string and pattern parameters
  • Extraction of digit patterns with '\d+'
  • Usage with position, occurrence, flags, and subexpression parameters

However, consider adding a few more test cases to cover additional PostgreSQL regex features:

 protected function getExpectedSqlStatements(): array
 {
     return [
         'extracts pattern' => "SELECT regexp_substr(c0_.text1, 'c(.)(...)') AS sclr_0 FROM ContainsTexts c0_",
         'extracts digits' => "SELECT regexp_substr(c0_.text1, '\\d+') AS sclr_0 FROM ContainsTexts c0_",
         'extracts digits with start and N parameters' => "SELECT regexp_substr(c0_.text1, '\\d+', 1, 4) AS sclr_0 FROM ContainsTexts c0_",
         'extracts digits with start and N parameters and flags' => "SELECT regexp_substr(c0_.text1, '\\d+', 1, 4, 'i') AS sclr_0 FROM ContainsTexts c0_",
         'extracts digits with start, N and subexpr parameters and flags' => "SELECT regexp_substr(c0_.text1, '\\d+', 1, 4, 'i', 3) AS sclr_0 FROM ContainsTexts c0_",
+        'uses multiple flags' => "SELECT regexp_substr(c0_.text1, '\\w+', 1, 1, 'inx') AS sclr_0 FROM ContainsTexts c0_",
+        'uses character class alternation' => "SELECT regexp_substr(c0_.text1, '[a-z]+|[0-9]+') AS sclr_0 FROM ContainsTexts c0_",
     ];
 }

With corresponding additions to getDqlStatements().


48-55: Test properly validates minimum argument count.

The test correctly verifies that an exception is thrown when fewer than 2 arguments are provided to the REGEXP_SUBSTR() function, which aligns with PostgreSQL's requirement for at least a string and pattern parameter.


57-64: Test properly validates maximum argument count.

The test correctly verifies that an exception is thrown when more than 6 arguments are provided to the REGEXP_SUBSTR() function, which aligns with PostgreSQL's maximum of 6 parameters (string, pattern, start, occurrence, flags, subexpression).


7-11: Appropriate imports included for the test class.

All necessary classes are imported, including the exception class for validation tests and the entity fixture class used in the DQL queries.

src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstr.php (2)

7-19: Documentation looks accurate and comprehensive.

The docblock properly describes the PostgreSQL REGEXP_INSTR() function with a link to the official documentation. The examples show proper usage in DQL, which provides clear guidance for users.


20-45: Implementation aligns with PostgreSQL's REGEXP_INSTR() function.

The implementation is well-structured:

  • Extends BaseVariadicFunction appropriately
  • Sets correct min (2) and max (6) argument counts
  • Defines node mapping patterns for various valid argument combinations
  • Properly sets function name as 'regexp_instr'

The node mapping patterns accurately reflect the PostgreSQL function signature, supporting the various parameter combinations detailed in the PostgreSQL documentation.

tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php (2)

50-57: Test for too few arguments looks good.

The test properly verifies exception handling for insufficient arguments, checking both the exception type and the error message.


59-66: Test for too many arguments looks good.

The test correctly validates exception handling when providing more than the maximum allowed arguments, checking both the exception type and the error message.

src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php (1)

32-40: 🛠️ Refactor suggestion

⚠️ Potential issue

getMin/MaxArgumentCount() should reflect the PostgreSQL limits

Because PostgreSQL allows at most 4 parameters, getMaxArgumentCount() ought to return 4.
Keeping it at 6 masks invalid user input until runtime when PostgreSQL rejects the statement.

-    protected function getMaxArgumentCount(): int
-    {
-        return 6;
-    }
+    protected function getMaxArgumentCount(): int
+    {
+        // PostgreSQL 15/16: 3 mandatory + 1 optional = 4
+        return 4;
+    }

Likely an incorrect or invalid review comment.

src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php (2)

1-43: Implementation correctly aligns with PostgreSQL's REGEXP_COUNT function

The class implementation follows the proper structure for extending Doctrine with custom DQL functions. It correctly:

  • Extends BaseVariadicFunction
  • Defines minimum (2) and maximum (4) argument constraints
  • Specifies supported argument patterns
  • Includes appropriate documentation with PostgreSQL reference

23-26: Validate node mapping pattern matches PostgreSQL specification

The node mapping patterns correctly handle:

  1. Basic form: source string, pattern
  2. Extended form: source string, pattern, position, flags

This matches the PostgreSQL documentation for REGEXP_COUNT function parameters.

tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php (2)

48-55: Test coverage for invalid arguments is appropriate

The test correctly validates the function's behavior when provided with too few arguments, checking both the exception type and message.


57-64: Test coverage for too many arguments is appropriate

The test correctly validates the function's behavior when provided with too many arguments, checking both the exception type and message.

martin-georgiev commented Apr 20, 2025

@coderabbitai, you're wrong about regexp_replace. Update yourself based on the information published on https://www.postgresql.org/docs/current/functions-matching.html

coderabbitai bot commented Apr 20, 2025

🧩 Analysis chain

🌐 Web query:

PostgreSQL REGEXP_REPLACE function documentation

💡 Result:

PostgreSQL REGEXP_REPLACE Function Documentation

Overview

The REGEXP_REPLACE function in PostgreSQL is used to search a string for substrings that match a given POSIX regular expression pattern and replace those matches with a specified replacement string. It is particularly powerful for data cleaning, transformation, and advanced text manipulation tasks, enabling operations that go beyond simple literal search-and-replace[1][3][5][8].


Syntax

REGEXP_REPLACE(source, pattern, replacement [, flags])
  • source: The input string to be searched.
  • pattern: The POSIX regular expression pattern to identify what should be replaced.
  • replacement: The string to use as the replacement. You can use backreferences (e.g., \1, \2) to refer to capturing groups in the pattern.
  • flags (optional): Controls the behavior of the regex matching. Common flags:
    • 'g': Global—replace all occurrences, not just the first.
    • 'i': Case-insensitive matching.
    • 'n': Newline-sensitive matching[2][3][8].

In PostgreSQL 16 and later, additional optional arguments are available:

  • start: The character position to start searching (default is 1).
  • N: Replace only the Nth occurrence (if 0 or 'g' flag, replaces all occurrences)[1][3].

Functionality and Examples

  • Basic Replacement:

    SELECT REGEXP_REPLACE('apple pie', 'apple', 'banana');
    -- Output: 'banana pie'

    [5]

  • Case-Insensitive Replacement:

    SELECT REGEXP_REPLACE('Hello World', 'world', 'there', 'i');
    -- Output: 'Hello there'

    [5]

  • Using Backreferences:

    SELECT REGEXP_REPLACE('123-456-7890', '(\\d{3})-(\\d{3})-(\\d{4})', '(\\1) \\2-\\3');
    -- Output: '(123) 456-7890'

    [5]

  • Rearranging Names:

    SELECT REGEXP_REPLACE('John Doe', '(.*) (.*)', '\\2, \\1');
    -- Output: 'Doe, John'

    [8]

  • Removing All Alphabets:

    SELECT REGEXP_REPLACE('ABC12345xyz', '[[:alpha:]]', '', 'g');
    -- Output: '12345'

    [2][8]

  • Removing Extra Spaces:

    SELECT REGEXP_REPLACE('PostgreSQL     REGEXP_REPLACE   Function', '\\s+', ' ', 'g');
    -- Output: 'PostgreSQL REGEXP_REPLACE Function'

    [2]


Flags and Replacement Details

  • Use \& in the replacement string to insert the entire matched pattern.
  • Use backreferences like \1, \2, etc., in the replacement string to reuse capture groups from the pattern.
  • Write \\ when you need a literal backslash in the replacement or pattern[1][5].

Comparison of Arguments

| Argument | Description | Required | Notes |
|---|---|---|---|
| source | String to search | Yes | Can be a column or literal |
| pattern | POSIX regex pattern | Yes | Use double backslashes (\\) for escapes |
| replacement | Replacement string | Yes | Can use backreferences \1, \2, ... |
| flags | Optional, controls match (e.g., 'g', 'i', 'n') | No | Default: replaces first occurrence only |
| start (>=16) | Position to begin search | No | Default is 1 (first character) |
| N (>=16) | Which occurrence to replace (0 = all, use 'g' as well) | No | Must specify start if specifying N |

Usage Tips

  • Test your regular expressions with sample data before applying to large tables.
  • Remember that complex regex operations can impact performance on large datasets.
  • Properly escape backslashes in both the pattern and replacement as per SQL string rules[1][5].

Further Reading

You can use REGEXP_REPLACE for a wide range of text transformation tasks, including formatting, data cleaning, and normalization, making it a versatile tool in PostgreSQL[2][3][6].


@martin-georgiev Thank you for pointing out the error! I've reviewed the PostgreSQL documentation on REGEXP_REPLACE and here's the correct information:

The REGEXP_REPLACE function in PostgreSQL accepts between 3 and 6 arguments:

REGEXP_REPLACE(source, pattern, replacement [, flags [, start [, N]]])

Where:

  • The first 3 parameters are required (source string, pattern, replacement)
  • flags is optional (e.g., 'g' for global replacement, 'i' for case-insensitive)
  • start (available in PostgreSQL 16+) is the starting position for search
  • N (available in PostgreSQL 16+) specifies which occurrence to replace

This aligns with your implementation in the PR where you changed RegexpReplace to extend BaseVariadicFunction and set minimum argument count to 3 and maximum to 6.

I apologize for the incorrect information in my summary.


✏️ Learnings added
Learnt from: martin-georgiev
URL: https://github.com/martin-georgiev/postgresql-for-doctrine/pull/352

Timestamp: 2025-04-20T22:41:09.046Z
Learning: PostgreSQL's REGEXP_REPLACE function accepts between 3 and 6 arguments: REGEXP_REPLACE(source, pattern, replacement [, flags [, start [, N]]]). The first 3 parameters are required, and the rest are optional. The start and N parameters are available in PostgreSQL 16 and later.

coderabbitai bot commented Apr 20, 2025

🧩 Analysis chain

@martin-georgiev Thank you for the correction. You're right to point out the error. After reviewing the official PostgreSQL documentation, I understand now that regexp_replace in PostgreSQL:

  • Has the signature: regexp_replace(source, pattern, replacement [, flags [, start_position [, occurrence]]])
  • Can accept between 3 and 6 parameters
  • Supports flags to control the matching behavior
  • Can specify start position and which occurrence to replace
  • Returns the modified string with matches replaced by the replacement string

This aligns with the implementation in your PR. I appreciate you pointing out this inaccuracy.


🌐 Web query:

PostgreSQL regexp_replace function documentation from https://www.postgresql.org/docs/current/functions-matching.html

💡 Result:

PostgreSQL regexp_replace Function Documentation

Purpose

The regexp_replace function in PostgreSQL is used to search for substrings within a string that match a POSIX regular expression pattern and replace those substrings with a specified replacement string. This function supports complex pattern-based text replacements and is highly useful for data cleaning, normalization, and transformation tasks[1][3][4][8].

Syntax

regexp_replace(source, pattern, replacement [, flags])

With more recent PostgreSQL versions (PostgreSQL 16 and higher), the following extended syntax is available:

regexp_replace(source, pattern, replacement [, start [, N]] [, flags])
  • source: The input string on which to perform the replacement.
  • pattern: The POSIX regular expression pattern to search for.
  • replacement: The string that replaces each matching substring. You can use backreferences (e.g., \1, \2, etc.) to refer to captured groups from the pattern.
  • start (optional, integer ≥1): The position in the source string to start searching from (default is 1). Only available from PostgreSQL 16 onward[2].
  • N (optional, integer): The Nth match (at or after the start position) to replace. If N is 0 or the 'g' flag is set, all matches are replaced. N cannot be specified unless start is also given[1][2][4].
  • flags (optional, text): One or more single-letter flags to modify the regex behavior (see below).

Flags

  • 'g': Replace all occurrences (global replacement). If omitted, only the first occurrence is replaced.
  • 'i': Case-insensitive matching.
  • 'n': Newline-sensitive matching ('.' and negated bracket expressions never match a newline).
  • Additional flags may be available depending on your PostgreSQL version.

Return Value

A new string, with all (or the specified) matches of the pattern replaced by the replacement string[3][4].

Usage Examples

| Example Query | Description | Result |
|---|---|---|
| SELECT regexp_replace('apple pie', 'apple', 'banana'); | Replace "apple" with "banana" | "banana pie" |
| SELECT regexp_replace('Hello World', 'world', 'there', 'i'); | Case-insensitive replace | "Hello there" |
| SELECT regexp_replace('123-456-7890', '(\\d{3})-(\\d{3})-(\\d{4})', '(\\1) \\2-\\3'); | Format phone number using groups | "(123) 456-7890" |
| SELECT regexp_replace('foo bar baz', '\\s+', '_', 'g'); | Replace all whitespace with underscores | "foo_bar_baz" |

Notes

  • Be careful to double backslashes in patterns and replacement strings when writing queries in SQL, due to escape sequence handling[5].
  • In older PostgreSQL versions (prior to v16), only the four-argument form is available, so you can't specify the start position or the Nth match to replace[2][4].
  • If a match is not found, the function returns the original string unchanged.

Reference

For the most up-to-date documentation, refer to PostgreSQL's official documentation: 9.7. Pattern Matching[1][4].

Excerpt from the Official Docs:

The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, start [, N ]] [, flags ]). (Notice that N cannot be specified unless start is, but flags can be given in any case.)[1]
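
The rule in that excerpt (N requires start, flags may be given in any case) yields six valid call shapes with 3 to 6 arguments. The sketch below spells that out; `buildRegexpReplaceCall()` is a hypothetical helper written for this illustration, not part of the PR, and every argument is assumed to be an already-rendered SQL fragment.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper following the documented syntax:
 * regexp_replace(source, pattern, replacement [, start [, N ]] [, flags ])
 */
function buildRegexpReplaceCall(
    string $source,
    string $pattern,
    string $replacement,
    ?string $start = null,
    ?string $n = null,
    ?string $flags = null
): string {
    // N cannot be specified unless start is; flags can be given in any case.
    if ($n !== null && $start === null) {
        throw new \InvalidArgumentException('N cannot be specified unless start is.');
    }

    // Drop the optional arguments that were not supplied, keeping their order.
    $arguments = \array_filter(
        [$source, $pattern, $replacement, $start, $n, $flags],
        static fn (?string $argument): bool => $argument !== null
    );

    return \sprintf('regexp_replace(%s)', \implode(', ', $arguments));
}

echo buildRegexpReplaceCall('c0_.text1', "'a'", "'b'"), PHP_EOL;
echo buildRegexpReplaceCall('c0_.text1', "'a'", "'b'", null, null, "'g'"), PHP_EOL;
echo buildRegexpReplaceCall('c0_.text1', "'a'", "'b'", '1', '2', "'g'"), PHP_EOL;
```

Because a four-argument call can carry either start or flags in the fourth position, argument count alone does not tell the two shapes apart; the type of the fourth argument does.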

This sums up the function’s use and options as described in the official PostgreSQL documentation[1][4].


@martin-georgiev Thank you for the correction. You're absolutely right about regexp_replace. According to the official PostgreSQL documentation:

The full syntax is:

regexp_replace(source, pattern, replacement [, start [, N]] [, flags])

Where:

  • source, pattern, and replacement are required
  • start is the position to begin searching (optional, PostgreSQL 16+)
  • N is which occurrence to replace (optional, PostgreSQL 16+)
  • flags are modifiers like 'g' for global replacement (optional)

This means it can accept between 3 and 6 parameters, supporting the more advanced functionality you're implementing in your PR.

I appreciate you pointing me to the correct documentation.


✏️ Learnings added
Learnt from: martin-georgiev
URL: https://github.com/martin-georgiev/postgresql-for-doctrine/pull/352

Timestamp: 2025-04-20T22:41:39.223Z
Learning: PostgreSQL's regexp_replace function has the full syntax: regexp_replace(source, pattern, replacement [, start [, N]] [, flags]). In newer PostgreSQL versions, it supports between 3 and 6 parameters, with the additional parameters allowing for specifying the start position and which occurrence to replace.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 21, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (1)

41-46: DQL statements match expected SQL output.

The DQL statements properly correspond to the expected SQL statements defined earlier in the class, testing all the argument variations supported by the RegexpReplace function up to PostgreSQL 16.

However, the same PostgreSQL version compatibility concern applies to the test cases using 5 and 6 arguments, since the start and N parameters require PostgreSQL 15 or later.

🧹 Nitpick comments (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (1)

28-36: Test coverage could be enhanced with additional PostgreSQL documentation examples.

While the test covers the different argument variations, it would be beneficial to include more real-world examples from the PostgreSQL documentation to demonstrate the functionality of regexp_replace.

Consider adding test cases that demonstrate:

  1. Basic replacement (e.g., regexp_replace('Thomas', '.[mN]a.', 'M')) and replacement with backreferences (e.g., regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g'))
  2. Case-insensitive replacement with the 'i' flag
  3. Global replacement with the 'g' flag
  4. Combination of flags

These examples would better align with the PostgreSQL documentation examples and provide more comprehensive test coverage.
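
In plain SQL, the suggested cases might look like the following. This is only a sketch: the first query follows the style of the PostgreSQL documentation examples, the others use made-up strings, and none of them are copied from the PR's test suite. The default standard_conforming_strings = on is assumed.

```sql
-- Backreference: \1 re-inserts the text captured by the first group.
SELECT regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g');        -- fooXarYXazY

-- Case-insensitive match with the 'i' flag.
SELECT regexp_replace('Hello World', 'world', 'There', 'i');     -- Hello There

-- Flags combined: 'g' (all matches) together with 'i' (ignore case).
SELECT regexp_replace('Hello hello', 'hello', 'bye', 'gi');      -- bye bye
```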

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45e5358 and 4cde68c.

📒 Files selected for processing (8)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstr.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php (1 hunks)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstr.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php (1 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (2 hunks)
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstr.php
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCount.php
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpInstrTest.php
  • src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstr.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpCountTest.php
  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpSubstrTest.php
🧰 Additional context used
📓 Path-based instructions (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/**/*.php: Use the PostgreSQL official documentation to verify that tests include comprehensive use cases and example SQL queries for the tested SQL functions and operators.

  • tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php
🧠 Learnings (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (2)
Learnt from: martin-georgiev
PR: martin-georgiev/postgresql-for-doctrine#318
File: tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/XmlAggTest.php:1-9
Timestamp: 2025-03-29T03:31:17.114Z
Learning: Tests in the `Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions` namespace extend a custom `TestCase` class from the same namespace (`Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions\TestCase`), rather than PHPUnit's TestCase, and therefore don't need an explicit import.
Learnt from: martin-georgiev
PR: martin-georgiev/postgresql-for-doctrine#318
File: tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/XmlAggTest.php:1-9
Timestamp: 2025-03-29T03:31:17.114Z
Learning: Tests in the `Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions` namespace extend a custom `TestCase` class from the same namespace (`Tests\MartinGeorgiev\Doctrine\ORM\Query\AST\Functions\TestCase`), not PHPUnit's TestCase, and therefore don't need an explicit import statement.
🧬 Code Graph Analysis (1)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (5)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/BaseVariadicFunction.php (1)
  • BaseVariadicFunction (20-135)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/Exception/InvalidArgumentForVariadicFunctionException.php (1)
  • InvalidArgumentForVariadicFunctionException (7-56)
src/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplace.php (1)
  • RegexpReplace (19-47)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/BaseVariadicFunctionTestCase.php (1)
  • BaseVariadicFunctionTestCase (14-42)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/TestCase.php (1)
  • buildEntityManager (116-119)
🔇 Additional comments (4)
tests/MartinGeorgiev/Doctrine/ORM/Query/AST/Functions/RegexpReplaceTest.php (4)

8-10: Appropriate imports for new function behavior.

The added imports properly support the newly refactored RegexpReplace function, which now extends BaseVariadicFunction and needs to handle InvalidArgumentForVariadicFunctionException for argument validation.


12-17: Good implementation of BaseVariadicFunctionTestCase requirements.

The test class now correctly extends BaseVariadicFunctionTestCase and implements the required createFixture() method that instantiates the RegexpReplace function with the appropriate name.


50-57: Good validation test for minimum argument count.

This test properly verifies that the RegexpReplace function throws an appropriate exception with the correct error message when fewer than the minimum required arguments (3) are provided.


59-66: Good validation test for maximum argument count.

This test properly verifies that the RegexpReplace function throws an appropriate exception with the correct error message when more than the maximum allowed arguments (6) are provided.
