
Commit 6bce6d1

feat: add automatic REGEXP functions registration for SQLite
- Add registerRegexpFunctions() method to SqliteDialect
- Automatically register REGEXP, regexp_replace, and regexp_extract functions using PHP preg_* functions
- Add enable_regexp configuration option (default: true) to allow disabling automatic registration
- Update SQLite HelpersTests to remove manual REGEXP availability check
- Update documentation with enable_regexp option and automatic registration information

This eliminates the need for external REGEXP extensions in SQLite by using PHP's built-in regex functions.
1 parent b02f802 commit 6bce6d1

21 files changed: +1067 -14 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ Built on top of PDO with **zero external dependencies**, it offers:
 - **Enhanced Error Diagnostics** - Query context, sanitized parameters, and debug information in exceptions
 - **Connection Retry** - Automatic retry with exponential backoff
 - **PSR-14 Event Dispatcher** - Event-driven architecture for monitoring, auditing, and middleware
-- **80+ Helper Functions** - SQL helpers for strings, dates, math, JSON, aggregations, and more (REPEAT, REVERSE, LPAD, RPAD emulated for SQLite)
+- **80+ Helper Functions** - SQL helpers for strings, dates, math, JSON, aggregations, and more (REPEAT, REVERSE, LPAD, RPAD emulated for SQLite; REGEXP operations supported across all dialects)
 - **Fully Tested** - 1320 tests, 5249 assertions across all dialects
 - **Type-Safe** - PHPStan level 8 validated, PSR-12 compliant

documentation/01-getting-started/configuration.md

Lines changed: 8 additions & 6 deletions
@@ -177,12 +177,14 @@ $db = new PdoDb('sqlite', [
 ```php
 $db = new PdoDb('sqlite', [
     // Connection options
-    'pdo' => null, // Optional: Existing PDO object
-    'path' => '/path/to/database.sqlite', // Required: Path to SQLite file
-    // Use ':memory:' for in-memory database
-    'prefix'=> 'sq_', // Optional: Table prefix
-    'mode' => 'rwc', // Optional: Open mode (ro, rw, rwc, memory)
-    'cache' => 'shared' // Optional: Cache mode (shared, private)
+    'pdo' => null, // Optional: Existing PDO object
+    'path' => '/path/to/database.sqlite', // Required: Path to SQLite file
+    // Use ':memory:' for in-memory database
+    'prefix' => 'sq_', // Optional: Table prefix
+    'mode' => 'rwc', // Optional: Open mode (ro, rw, rwc, memory)
+    'cache' => 'shared', // Optional: Cache mode (shared, private)
+    'enable_regexp' => true // Optional: Enable REGEXP functions (default: true)
+    // Automatically registers REGEXP, regexp_replace, regexp_extract
 ]);
 ```

documentation/07-helper-functions/string-helpers.md

Lines changed: 206 additions & 0 deletions
@@ -264,6 +264,212 @@ $users = $db->find()
     ->get();
 ```

+## Db::regexpMatch() - Pattern Matching
+
+Check if a string matches a regular expression pattern:
+
+```php
+// Find valid email addresses
+$users = $db->find()
+    ->from('users')
+    ->where(Db::regexpMatch('email', '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'))
+    ->get();
+
+// MySQL/MariaDB: (email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')
+// PostgreSQL: (email ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')
+// SQLite: (email REGEXP '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')
+// REGEXP functions are automatically registered by PDOdb (can be disabled via enable_regexp config)
+```
+
+### Using in WHERE Clause
+
+```php
+// Find users with phone numbers containing dashes
+$users = $db->find()
+    ->from('users')
+    ->where(Db::regexpMatch('phone', '-'))
+    ->get();
+
+// With negation
+$users = $db->find()
+    ->from('users')
+    ->where(Db::not(Db::regexpMatch('email', '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$')))
+    ->get();
+```
+
+## Db::regexpReplace() - Pattern-Based Replacement
+
+Replace all occurrences of a pattern with a replacement string:
+
+```php
+// Replace dashes with spaces in phone numbers
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'phone',
+        'formatted' => Db::regexpReplace('phone', '-', ' ')
+    ])
+    ->get();
+
+// MySQL/MariaDB: REGEXP_REPLACE(phone, '-', ' ')
+// PostgreSQL: regexp_replace(phone, '-', ' ', 'g')
+// SQLite: regexp_replace(phone, '-', ' ')
+// REGEXP functions are automatically registered by PDOdb
+```
+
+### Multiple Replacements
+
+```php
+// Remove all non-alphanumeric characters
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'name',
+        'clean' => Db::regexpReplace('name', '[^a-zA-Z0-9]', '')
+    ])
+    ->get();
+```
+
+## Db::regexpExtract() - Extract Matched Substrings
+
+Extract the matched substring or a capture group from a string:
+
+```php
+// Extract domain from email
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'email',
+        'domain' => Db::regexpExtract('email', '@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', 1)
+    ])
+    ->get();
+
+// MySQL/MariaDB: REGEXP_SUBSTR(email, '@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', 1, 1, NULL, 1)
+// PostgreSQL: (regexp_match(email, '@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})'))[2]
+// SQLite: regexp_extract(email, '@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', 1)
+// REGEXP functions are automatically registered by PDOdb
+```
+
+### Extract Full Match
+
+```php
+// Extract username from email (capture group 1; the full match would include the trailing @)
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'email',
+        'username' => Db::regexpExtract('email', '^([a-zA-Z0-9._%+-]+)@', 1)
+    ])
+    ->get();
+```
+
+### Extract Multiple Groups
+
+```php
+// Extract area code and number from phone
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'phone',
+        'area_code' => Db::regexpExtract('phone', '\\+1-([0-9]{3})', 1),
+        'number' => Db::regexpExtract('phone', '\\+1-[0-9]{3}-([0-9-]+)', 1)
+    ])
+    ->get();
+```
+
+## Dialect Differences
+
+### REGEXP Support
+
+**MySQL/MariaDB:**
+- `REGEXP` operator for matching
+- `REGEXP_REPLACE()` function for replacement
+- `REGEXP_SUBSTR()` function for extraction (MySQL 8.0+, MariaDB 10.0.5+)
+
+**PostgreSQL:**
+- `~` operator for case-sensitive matching (`~*` for case-insensitive)
+- `regexp_replace()` function for replacement
+- `regexp_match()` function returns array, use array indexing for groups
+
+**SQLite:**
+- `REGEXP` operator, `regexp_replace()`, and `regexp_extract()` are registered automatically by PDOdb using PHP's `preg_*` functions
+- If `enable_regexp` is set to `false`, a REGEXP extension must be loaded or the functions registered manually
+- If neither is available, operations will fail at runtime
+
+### Pattern Syntax
+
+The regex engine differs between dialects, so prefer constructs that all of them share (character classes, anchors, quantifiers, groups):
+
+- **MySQL/MariaDB**: MySQL 8.0+ uses ICU regular expressions; MariaDB 10.0.5+ uses PCRE
+- **PostgreSQL**: Uses POSIX regular expressions (advanced regular expressions by default)
+- **SQLite**: Uses PCRE syntax, because the REGEXP functions registered by PDOdb are backed by PHP's `preg_*` functions
+
+### Group Indexing
+
+- **MySQL/MariaDB**: Group index starts at 0 (0 = full match, 1+ = capture groups)
+- **PostgreSQL**: Array indexing starts at 1 (1 = first group)
+- **SQLite**: Group index starts at 0 (0 = full match, 1+ = capture groups)
+
+PDOdb handles these differences automatically.
+
+## Best Practices
+
+### 1. Validate Email Addresses
+
+```php
+// ✅ Use regexpMatch for validation
+$validEmails = $db->find()
+    ->from('users')
+    ->where(Db::regexpMatch('email', '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'))
+    ->get();
+```
+
+### 2. Extract Structured Data
+
+```php
+// Extract components from structured strings
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'phone',
+        'country_code' => Db::regexpExtract('phone', '^\\+([0-9]+)', 1),
+        'area_code' => Db::regexpExtract('phone', '\\+[0-9]+-([0-9]+)', 1)
+    ])
+    ->get();
+```
+
+### 3. Normalize Data
+
+```php
+// Normalize phone numbers by removing formatting
+$users = $db->find()
+    ->from('users')
+    ->select([
+        'phone',
+        'normalized' => Db::regexpReplace(
+            Db::regexpReplace('phone', '[^0-9]', ''),
+            '^1',
+            ''
+        )
+    ])
+    ->get();
+```
+
+### 4. SQLite REGEXP Functions
+
+PDOdb registers REGEXP functions (`REGEXP`, `regexp_replace`, `regexp_extract`) for SQLite connections using PHP's `preg_*` functions. Registration happens when the SQLite connection is created.
+
+To disable automatic REGEXP registration:
+
+```php
+$db = new PdoDb('sqlite', [
+    'path' => '/path/to/database.sqlite',
+    'enable_regexp' => false // Disable automatic REGEXP registration
+]);
+```
+
+**Note**: If you disable automatic registration, you must manually register REGEXP functions or load a REGEXP extension. The automatic registration uses PHP's `preg_match` and `preg_replace` functions, so no external extensions are required.
+
 ## Next Steps

 - [Numeric Helpers](numeric-helpers.md) - Math operations
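
For readers who disable `enable_regexp`, the documentation above says the functions must then be registered manually. The following is a minimal sketch of what that could look like with `PDO::sqliteCreateFunction()`. The `sketch*` function names, the `/…/u` delimiter handling, and the NULL behaviour are assumptions made for illustration, not PDOdb's actual `registerRegexpFunctions()` implementation.

```php
/** Wrap a bare SQL-style pattern in PCRE delimiters (hypothetical helper). */
function sketchDelimit(string $pattern): string
{
    // Naive escaping: a pattern that already contains "\/" would need extra care.
    return '/' . str_replace('/', '\/', $pattern) . '/u';
}

/** Backs "value REGEXP pattern"; SQLite invokes it as regexp(pattern, value). */
function sketchRegexp(?string $pattern, ?string $value): ?int
{
    if ($pattern === null || $value === null) {
        return null; // SQL NULL in, SQL NULL out
    }

    return preg_match(sketchDelimit($pattern), $value) === 1 ? 1 : 0;
}

/** Replaces every match of $pattern in $value. */
function sketchRegexpReplace(?string $value, ?string $pattern, ?string $replacement): ?string
{
    if ($value === null || $pattern === null || $replacement === null) {
        return null;
    }

    return preg_replace(sketchDelimit($pattern), $replacement, $value);
}

/** Returns group $group (0 = full match) or null when nothing matches. */
function sketchRegexpExtract(?string $value, ?string $pattern, ?int $group = 0): ?string
{
    if ($value === null || $pattern === null) {
        return null;
    }
    if (preg_match(sketchDelimit($pattern), $value, $matches) !== 1) {
        return null;
    }

    return $matches[$group ?? 0] ?? null;
}

/** Registers the three callbacks on a SQLite PDO handle. */
function registerSketchFunctions(PDO $pdo): void
{
    $pdo->sqliteCreateFunction('regexp', 'sketchRegexp', 2);
    $pdo->sqliteCreateFunction('regexp_replace', 'sketchRegexpReplace', 3);
    $pdo->sqliteCreateFunction('regexp_extract', 'sketchRegexpExtract', -1); // 2 or 3 arguments
}
```

Note the argument order of `regexp`: SQLite rewrites `X REGEXP Y` as `regexp(Y, X)`, so the pattern arrives first. Because the callbacks run through `preg_*`, patterns follow PCRE syntax on SQLite.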

examples/05-helpers/01-string-helpers.php

Lines changed: 71 additions & 0 deletions
@@ -249,6 +249,77 @@
 foreach ($users as $user) {
     echo "{$user['email']} → user: {$user['email_user']}, domain: {$user['email_domain']}\n";
 }
+echo "\n";
+
+// Example 14: REGEXP operations
+echo "14. REGEXP operations - Pattern matching and extraction...\n";
+$driver = getCurrentDriver($db);
+
+// Check that REGEXP is available on SQLite (it is missing when enable_regexp is false and no extension is loaded)
+if ($driver === 'sqlite') {
+    try {
+        $db->rawQuery("SELECT 'test' REGEXP 'test'");
+    } catch (\PDOException $e) {
+        echo " ⚠ REGEXP extension not available in SQLite, skipping regexp examples\n";
+        echo "\nString helper functions example completed!\n";
+        exit(0);
+    }
+}
+
+// Insert test data with various email formats
+recreateTable($db, 'contacts', ['id' => 'INTEGER PRIMARY KEY AUTOINCREMENT', 'email' => 'TEXT', 'phone' => 'TEXT']);
+$db->find()->table('contacts')->insertMulti([
+    ['email' => 'user@example.com', 'phone' => '+1-555-123-4567'],
+    ['email' => 'admin@test.org', 'phone' => '+44-20-7946-0958'],
+    ['email' => 'invalid-email', 'phone' => '12345'],
+]);
+
+// REGEXP_MATCH - Find valid email addresses
+echo " a) REGEXP_MATCH - Find valid email addresses:\n";
+$validEmails = $db->find()
+    ->from('contacts')
+    ->select(['email'])
+    ->where(Db::regexpMatch('email', '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$'))
+    ->get();
+
+foreach ($validEmails as $contact) {
+    echo "{$contact['email']}\n";
+}
+echo "\n";
+
+// REGEXP_REPLACE - Format phone numbers
+echo " b) REGEXP_REPLACE - Format phone numbers:\n";
+$formattedPhones = $db->find()
+    ->from('contacts')
+    ->select([
+        'phone',
+        'formatted' => Db::regexpReplace('phone', '-', ' ')
+    ])
+    ->where(Db::regexpMatch('phone', '-'))
+    ->get();
+
+foreach ($formattedPhones as $contact) {
+    echo "{$contact['phone']} → {$contact['formatted']}\n";
+}
+echo "\n";
+
+// REGEXP_EXTRACT - Extract domain from email
+echo " c) REGEXP_EXTRACT - Extract domain from email:\n";
+$domains = $db->find()
+    ->from('contacts')
+    ->select([
+        'email',
+        'domain' => Db::regexpExtract('email', '@([a-zA-Z0-9.-]+\\.[a-zA-Z]{2,})', 1)
+    ])
+    ->where(Db::regexpMatch('email', '@'))
+    ->get();
+
+foreach ($domains as $contact) {
+    if ($contact['domain'] !== null) {
+        echo "{$contact['email']} → domain: {$contact['domain']}\n";
+    }
+}
+echo "\n";

 echo "\nString helper functions example completed!\n";
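
To see which fixture rows part a) of the example should return, the same email pattern can be run in plain PHP against the three sample addresses. This is only a sanity-check sketch: it uses `preg_match()` directly rather than PDOdb, and wrapping the pattern in `/…/` delimiters is an assumption about how a PCRE-backed engine would treat it.

```php
// Same pattern as in the example, wrapped in PCRE delimiters for preg_match().
$pattern = '/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/';

// Sample addresses copied from the contacts fixture in the example.
$emails = ['user@example.com', 'admin@test.org', 'invalid-email'];

// Keep only the addresses the pattern accepts, as the WHERE clause would.
$valid = array_values(array_filter(
    $emails,
    static fn (string $email): bool => preg_match($pattern, $email) === 1
));

echo implode(', ', $valid), "\n"; // user@example.com, admin@test.org
```

The third row has no `@`, so it is filtered out here and should likewise be absent from the query result.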

examples/05-helpers/README.md

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ String manipulation functions.
 - `replace()` - Replace substring
 - `reverse()` - Reverse string
 - `position()` - Find substring position
+- `regexpMatch()`, `regexpReplace()`, `regexpExtract()` - Regular expression operations

 ### 02-math-helpers.php
 Mathematical operations and functions.

src/connection/ConnectionFactory.php

Lines changed: 9 additions & 0 deletions
@@ -8,6 +8,7 @@
 use PDO;
 use Psr\EventDispatcher\EventDispatcherInterface;
 use Psr\Log\LoggerInterface;
+use tommyknocker\pdodb\dialects\SqliteDialect;
 use tommyknocker\pdodb\events\ConnectionOpenedEvent;

 /**
@@ -62,6 +63,14 @@ public function create(array $config, ?LoggerInterface $logger): Connection

         $dialect->setPdo($pdo);

+        // Register REGEXP functions for SQLite if enabled (default: true)
+        if ($driver === 'sqlite' && $dialect instanceof SqliteDialect) {
+            $enableRegexp = $config['enable_regexp'] ?? true;
+            if ($enableRegexp) {
+                $dialect->registerRegexpFunctions($pdo);
+            }
+        }
+
         // Use RetryableConnection if retry is enabled
         $retryConfig = $config['retry'] ?? [];
         $connection = null;
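
The condition added in this hunk relies on `??`, which treats a missing key and an explicit `null` the same way. The sketch below isolates that decision in a standalone function so the edge cases are easy to see. The function name is hypothetical, and the `instanceof SqliteDialect` check from the real code is left out to keep the snippet self-contained.

```php
/**
 * Hypothetical helper mirroring the enable_regexp check in the hunk above.
 */
function sketchShouldRegisterRegexp(string $driver, array $config): bool
{
    // `??` falls back to true when the key is absent or explicitly null.
    $enableRegexp = $config['enable_regexp'] ?? true;

    // Only SQLite connections get the PHP-backed functions.
    return $driver === 'sqlite' && (bool) $enableRegexp;
}
```

One consequence worth knowing: under this logic, passing `'enable_regexp' => null` leaves registration enabled, so only a falsy non-null value such as `false` turns it off.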

src/dialects/DialectInterface.php

Lines changed: 32 additions & 0 deletions
@@ -592,6 +592,38 @@ public function formatReverse(string|RawValue $value): string;
      */
     public function formatPad(string|RawValue $value, int $length, string $padString, bool $isLeft): string;

+    /**
+     * Format REGEXP match expression (returns boolean).
+     *
+     * @param string|RawValue $value Source string.
+     * @param string $pattern Regex pattern.
+     *
+     * @return string SQL expression that returns boolean (true if matches, false otherwise).
+     */
+    public function formatRegexpMatch(string|RawValue $value, string $pattern): string;
+
+    /**
+     * Format REGEXP replace expression.
+     *
+     * @param string|RawValue $value Source string.
+     * @param string $pattern Regex pattern.
+     * @param string $replacement Replacement string.
+     *
+     * @return string SQL expression for regexp replacement.
+     */
+    public function formatRegexpReplace(string|RawValue $value, string $pattern, string $replacement): string;
+
+    /**
+     * Format REGEXP extract expression (extracts matched substring).
+     *
+     * @param string|RawValue $value Source string.
+     * @param string $pattern Regex pattern.
+     * @param int|null $groupIndex Capture group index (0 = full match, 1+ = specific group, null = full match).
+     *
+     * @return string SQL expression for regexp extraction.
+     */
+    public function formatRegexpExtract(string|RawValue $value, string $pattern, ?int $groupIndex = null): string;
+
     /* ---------------- Original SQL helpers and dialect-specific expressions ---------------- */

     /**
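
To make the `formatRegexpMatch()` contract concrete, here is a rough standalone sketch that produces the per-dialect SQL shown in the string-helpers documentation (`REGEXP` for MySQL/MariaDB and SQLite, `~` for PostgreSQL). The function name, the driver strings `mysql`, `mariadb`, and `pgsql`, and the quote-doubling are assumptions for illustration. The real dialect classes are not shown in this diff and presumably use proper identifier and value quoting.

```php
/**
 * Hypothetical free-function version of DialectInterface::formatRegexpMatch().
 */
function sketchFormatRegexpMatch(string $driver, string $column, string $pattern): string
{
    // Illustrative quoting only: doubles single quotes, nothing more.
    $quoted = "'" . str_replace("'", "''", $pattern) . "'";

    return match ($driver) {
        'pgsql' => "($column ~ $quoted)",
        'mysql', 'mariadb', 'sqlite' => "($column REGEXP $quoted)",
        default => throw new InvalidArgumentException("Unsupported driver: $driver"),
    };
}

echo sketchFormatRegexpMatch('pgsql', 'phone', '-'), "\n";  // (phone ~ '-')
echo sketchFormatRegexpMatch('sqlite', 'phone', '-'), "\n"; // (phone REGEXP '-')
```

Keeping the dialect switch behind one interface method is what lets `Db::regexpMatch()` stay identical in user code across all three databases.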
