Commit 77d6e23

Add BbcodeToDjot
1 parent 8774dca commit 77d6e23

6 files changed, +943 -28 lines changed

README.md

Lines changed: 6 additions & 15 deletions
@@ -98,31 +98,22 @@ https://sandbox.dereuromark.de/sandbox/djot
9898
- [Examples](docs/README.md) - Comprehensive usage examples
9999
- [Syntax Reference](docs/syntax.md) - Complete Djot syntax guide
100100
- [API Reference](docs/api.md) - Classes and methods
101+
- [Converters](docs/converters.md) - Markdown/BBCode to Djot conversion
101102
- [Cookbook](docs/cookbook.md) - Common customizations and recipes
102103
- [Architecture](docs/architecture.md) - Internal design
103104

104105
## Security
105106

106-
**Warning:** When processing untrusted user input, sanitize the output to prevent XSS attacks. We recommend using [HTMLPurifier](http://htmlpurifier.org/):
107+
When processing untrusted user input, enable safe mode for XSS protection:
107108

108109
```php
109-
$converter = new DjotConverter();
110+
$converter = new DjotConverter(safeMode: true);
110111
$html = $converter->convert($untrustedInput);
111-
112-
$config = HTMLPurifier_Config::createDefault();
113-
$config->set('Cache.DefinitionImpl', null);
114-
$config->set('HTML.DefinitionID', 'djot-purifier');
115-
$config->set('HTML.DefinitionRev', 1);
116-
$config->set('HTML.Allowed', 'p,br,strong,em,u,s,del,ins,mark,sub[id],sup[id],a[href|title|class|id],img[src|alt|title],ul,ol,li,dl,dt,dd,blockquote,pre,code[class],h1[id],h2[id],h3[id],h4[id],h5[id],h6[id],table[class|id],thead,tbody,tr,th[align],td[align],hr,div[class|id],span[class|id],section[id]');
117-
$config->set('Attr.EnableID', true);
118-
$config->set('HTML.TargetBlank', true);
119-
$config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);
120-
121-
$purifier = new HTMLPurifier($config);
122-
$safeHtml = $purifier->purify($html);
123112
```
124113

125-
See [Security Considerations](docs/README.md#security-considerations) for details.
114+
Safe mode automatically blocks dangerous URL schemes (`javascript:`, etc.), strips event handler attributes (`onclick`, etc.), and escapes raw HTML.
115+
116+
See [Security Considerations](docs/README.md#security-considerations) for details and advanced configuration.
126117

127118
## See Also
128119

docs/README.md

Lines changed: 31 additions & 2 deletions
@@ -8,6 +8,7 @@ This directory contains detailed documentation for djot-php.
88
- [API Reference](api.md) - Classes and methods
99
- [Cookbook](cookbook.md) - Common customizations and recipes
1010
- [Architecture](architecture.md) - Internal design
11+
- [Converters](converters.md) - Markdown/BBCode to Djot conversion
1112

1213
## Examples
1314

@@ -373,9 +374,37 @@ See the [Cookbook](cookbook.md) for more examples including wiki links, hashtags
373374
- Event handler attributes (e.g., `onclick`)
374375
- Raw HTML blocks and inline HTML
375376

376-
### Recommended: Use HTMLPurifier
377+
### Recommended: Use Safe Mode
377378

378-
For user-generated content, sanitize the HTML output using [HTMLPurifier](http://htmlpurifier.org/):
379+
Enable the built-in safe mode for XSS protection:
380+
381+
```php
382+
use Djot\DjotConverter;
383+
384+
// Enable with sensible defaults
385+
$converter = new DjotConverter(safeMode: true);
386+
$html = $converter->convert($userInput);
387+
```
388+
389+
Safe mode automatically:
390+
- Blocks dangerous URL schemes (`javascript:`, `vbscript:`, `data:`, `file:`)
391+
- Strips event handler attributes (`onclick`, `onload`, etc.)
392+
- Escapes raw HTML (or strips it in strict mode)
393+
394+
For stricter protection, use `SafeMode::strict()`:
395+
396+
```php
397+
use Djot\DjotConverter;
398+
use Djot\SafeMode;
399+
400+
$converter = new DjotConverter(safeMode: SafeMode::strict());
401+
```
402+
403+
See the [API Reference](api.md#safe-mode) for full SafeMode configuration options.
404+
405+
### Alternative: Use HTMLPurifier
406+
407+
For maximum control over allowed HTML, you can also use [HTMLPurifier](http://htmlpurifier.org/):
379408

380409
```bash
381410
composer require ezyang/htmlpurifier

docs/api.md

Lines changed: 201 additions & 11 deletions
@@ -17,13 +17,15 @@ $html = $converter->convert($djotString);
1717
public function __construct(
1818
    bool $xhtml = false,
1919
    bool $warnings = false,
20-
    bool $strict = false
20+
    bool $strict = false,
21+
    bool|SafeMode|null $safeMode = null
2122
)
2223
```
2324

2425
- `$xhtml`: When `true`, produces XHTML-compatible output (self-closing tags like `<br />`).
2526
- `$warnings`: When `true`, collects warnings during parsing (see [Error Handling](#error-handling)).
2627
- `$strict`: When `true`, throws `ParseException` on parse errors (see [Error Handling](#error-handling)).
28+
- `$safeMode`: When `true` or a `SafeMode` instance, enables XSS protection (see [Safe Mode](#safe-mode)).
2729

2830
### Methods
2931

@@ -133,6 +135,124 @@ public function clearWarnings(): self
133135

134136
Clears any collected warnings.
135137

138+
#### setSafeMode
139+
140+
```php
141+
public function setSafeMode(bool|SafeMode|null $safeMode): self
142+
```
143+
144+
Enable, disable, or configure safe mode after construction. Pass `true` for defaults, a `SafeMode` instance for custom configuration, or `null`/`false` to disable.
145+
146+
## Safe Mode
147+
148+
Safe mode provides built-in XSS protection for user-generated content.
149+
150+
### Basic Usage
151+
152+
```php
153+
use Djot\DjotConverter;
154+
155+
// Enable with sensible defaults
156+
$converter = new DjotConverter(safeMode: true);
157+
$html = $converter->convert($userInput);
158+
```
159+
160+
### What Safe Mode Does
161+
162+
1. **URL Sanitization**: Blocks dangerous URL schemes in links and images
163+
   - Blocked by default: `javascript:`, `vbscript:`, `data:`, `file:`
164+
   - Safe URLs like `https:`, `mailto:`, and relative paths are allowed
165+
166+
2. **Attribute Filtering**: Strips event handler attributes
167+
   - Blocks attributes starting with `on` (e.g., `onclick`, `onload`, `onerror`)
168+
   - Blocks specific dangerous attributes (`srcdoc`, `formaction`)
169+
   - Allows safe attributes like `class`, `id`, `data-*`
170+
171+
3. **Raw HTML Handling**: Controls how raw HTML is processed
172+
   - `escape` (default): HTML-encodes raw HTML so it displays as text
173+
   - `strip`: Removes raw HTML entirely
174+
   - `allow`: Passes raw HTML through (not recommended)
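
The three checks listed above can be approximated in a few lines of plain PHP. This is a minimal sketch for illustration only, not djot-php's actual implementation: the function names (`sketchUrlScheme`, `sketchIsUrlSafe`, `sketchFilterAttributes`, `sketchEscapeRawHtml`) are hypothetical, and the default lists simply mirror the defaults described above. It assumes PHP 8.0 or later.

```php
<?php
declare(strict_types=1);

// Illustrative sketch only. These helpers are hypothetical and are NOT part of djot-php.

/** Return the lowercased scheme of a URL, or null when there is none (e.g. a relative path). */
function sketchUrlScheme(string $url): ?string
{
    // Drop control characters and whitespace first, so "java\tscript:" is still recognised.
    $clean = preg_replace('/[\x00-\x20\x7F]+/', '', $url) ?? '';
    if (preg_match('/^([a-zA-Z][a-zA-Z0-9+.\-]*):/', $clean, $m) === 1) {
        return strtolower($m[1]);
    }

    return null;
}

/** Blocklist check: a URL is accepted when it has no scheme or its scheme is not blocked. */
function sketchIsUrlSafe(string $url, array $blocked = ['javascript', 'vbscript', 'data', 'file']): bool
{
    $scheme = sketchUrlScheme($url);

    return $scheme === null || !in_array($scheme, $blocked, true);
}

/** Remove attributes whose name is blocked outright or starts with a blocked prefix. */
function sketchFilterAttributes(
    array $attributes,
    array $blockedPrefixes = ['on'],
    array $blockedNames = ['srcdoc', 'formaction']
): array {
    return array_filter(
        $attributes,
        static function (int|string $name) use ($blockedPrefixes, $blockedNames): bool {
            $name = strtolower((string) $name);
            if (in_array($name, $blockedNames, true)) {
                return false;
            }
            foreach ($blockedPrefixes as $prefix) {
                if (str_starts_with($name, $prefix)) {
                    return false;
                }
            }

            return true;
        },
        ARRAY_FILTER_USE_KEY
    );
}

/** "escape" handling: encode raw HTML so it is displayed as text instead of being rendered. */
function sketchEscapeRawHtml(string $html): string
{
    return htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```

A blocklist like this only rejects schemes it knows about, which is one reason a whitelist of allowed schemes can be the more conservative choice for untrusted input.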
175+
176+
### SafeMode Class
177+
178+
```php
179+
use Djot\SafeMode;
180+
181+
// Factory methods
182+
$safeMode = SafeMode::defaults(); // Standard protection
183+
$safeMode = SafeMode::strict(); // Strips raw HTML completely
184+
```
185+
186+
#### Configuration Methods
187+
188+
```php
189+
// URL scheme control
190+
$safeMode->setDangerousSchemes(['javascript', 'vbscript', 'data']);
191+
$safeMode->addDangerousScheme('ftp');
192+
$safeMode->getDangerousSchemes();
193+
194+
// Whitelist approach (only these schemes allowed)
195+
$safeMode->setAllowedSchemes(['https', 'mailto']);
196+
$safeMode->getAllowedSchemes();
197+
198+
// Attribute filtering
199+
$safeMode->setBlockedAttributePrefixes(['on']); // Blocks onclick, onload, etc.
200+
$safeMode->setBlockedAttributes(['srcdoc', 'formaction']);
201+
$safeMode->getBlockedAttributePrefixes();
202+
$safeMode->getBlockedAttributes();
203+
204+
// Raw HTML handling
205+
$safeMode->setRawHtmlMode(SafeMode::RAW_HTML_ESCAPE); // Default
206+
$safeMode->setRawHtmlMode(SafeMode::RAW_HTML_STRIP); // Remove completely
207+
$safeMode->setRawHtmlMode(SafeMode::RAW_HTML_ALLOW); // Pass through
208+
$safeMode->getRawHtmlMode();
209+
```
210+
211+
#### Validation Methods
212+
213+
```php
214+
$safeMode->isUrlSafe('https://example.com'); // true
215+
$safeMode->isUrlSafe('javascript:alert(1)'); // false
216+
217+
$safeMode->isAttributeSafe('class'); // true
218+
$safeMode->isAttributeSafe('onclick'); // false
219+
220+
$safeMode->sanitizeUrl('javascript:alert(1)'); // ''
221+
$safeMode->filterAttributes([
222+
    'class' => 'highlight',
223+
    'onclick' => 'hack()',
224+
]); // ['class' => 'highlight']
225+
```
226+
227+
### Custom Configuration Example
228+
229+
```php
230+
use Djot\DjotConverter;
231+
use Djot\SafeMode;
232+
233+
// Only allow HTTPS links, strip raw HTML
234+
$safeMode = SafeMode::defaults()
235+
    ->setAllowedSchemes(['https'])
236+
    ->setRawHtmlMode(SafeMode::RAW_HTML_STRIP);
237+
238+
$converter = new DjotConverter(safeMode: $safeMode);
239+
```
240+
241+
### Enabling After Construction
242+
243+
```php
244+
$converter = new DjotConverter();
245+
246+
// Enable later
247+
$converter->setSafeMode(true);
248+
249+
// Or with custom config
250+
$converter->setSafeMode(SafeMode::strict());
251+
252+
// Disable
253+
$converter->setSafeMode(false);
254+
```
255+
136256
## Error Handling
137257

138258
The parser can optionally report warnings and errors with line/column information.
@@ -429,25 +549,95 @@ $renderer->setTableCellSeparator(' | ');
429549

430550
### MarkdownRenderer
431551

432-
Renders an AST Document to CommonMark-compatible Markdown.
552+
Renders an AST Document to CommonMark-compatible Markdown. Useful for:
553+
- Converting Djot content to Markdown for systems that only support Markdown
554+
- Migrating content between formats
555+
- Generating Markdown documentation from Djot source
433556

434557
```php
558+
use Djot\DjotConverter;
435559
use Djot\Renderer\MarkdownRenderer;
436560

561+
$converter = new DjotConverter();
562+
$document = $converter->parse($djotText);
563+
437564
$renderer = new MarkdownRenderer();
438565
$markdown = $renderer->render($document);
439566
```
440567

568+
**Conversion Table:**
569+
570+
| Djot | Markdown Output |
571+
|------|-----------------|
572+
| `*strong*` | `**strong**` |
573+
| `_emphasis_` | `*emphasis*` |
574+
| `{-deleted-}` | `~~deleted~~` (GFM) |
575+
| `{+inserted+}` | `<ins>inserted</ins>` |
576+
| `{=highlighted=}` | `<mark>highlighted</mark>` |
577+
| `^superscript^` | `<sup>superscript</sup>` |
578+
| `~subscript~` | `<sub>subscript</sub>` |
579+
| `` `code` `` | `` `code` `` |
580+
| `[text](url)` | `[text](url)` |
581+
| `![alt](src)` | `![alt](src)` |
582+
| `# Heading` | `# Heading` |
583+
| `> quote` | `> quote` |
584+
| `- list` | `- list` |
585+
| `1. ordered` | `1. ordered` |
586+
| `- [ ] task` | `- [ ] task` |
587+
| `:symbol:` | `:symbol:` |
588+
| `[^note]` | `[^note]` |
589+
| `$math$` | `$math$` |
590+
| `$$display$$` | `$$display$$` |
591+
| Tables | GFM tables with alignment |
592+
| Divs | Content only (no wrapper) |
593+
| Spans | Content only (no wrapper) |
594+
| Definition lists | Bold term + `: description` |
595+
| Line blocks | Hard breaks (` \n`) |
596+
| Raw HTML | Passed through |
597+
| Comments | Stripped |
598+
441599
**Behavior:**
442-
- Converts Djot to CommonMark Markdown
443-
- Uses GFM extensions where available (strikethrough with `~~`)
444-
- Falls back to HTML for features without Markdown equivalents:
445-
- Superscript: `<sup>text</sup>`
446-
- Subscript: `<sub>text</sub>`
447-
- Highlight: `<mark>text</mark>`
448-
- Insert: `<ins>text</ins>`
449-
- Preserves table alignment
450-
- Preserves footnotes
600+
- Produces CommonMark-compatible output
601+
- Uses GFM extensions where available (strikethrough, tables, task lists, footnotes)
602+
- Falls back to inline HTML for features without Markdown equivalents
603+
- Escapes special Markdown characters in text content
604+
- Handles nested backticks in code spans and fenced blocks
605+
- Preserves table column alignment
606+
- Normalizes multiple blank lines
607+
608+
**Example:**
609+
610+
```php
611+
$djot = <<<'DJOT'
612+
# Hello *World*
613+
614+
This has {=highlighted=} and {-deleted-} text.
615+
616+
| Name | Score |
617+
|-------|------:|
618+
| Alice | 95 |
619+
DJOT;
620+
621+
$document = $converter->parse($djot);
622+
$markdown = (new MarkdownRenderer())->render($document);
623+
```
624+
625+
Output:
626+
```markdown
627+
# Hello **World**
628+
629+
This has <mark>highlighted</mark> and ~~deleted~~ text.
630+
631+
| Name | Score |
632+
| --- | ---: |
633+
| Alice | 95 |
634+
```
635+
636+
**Limitations:**
637+
- Djot divs (`::: class`) lose their class/attributes (content is preserved)
638+
- Djot spans (`[text]{.class}`) lose their attributes (content is preserved)
639+
- Definition lists are approximated (not native Markdown)
640+
- Some whitespace/formatting may differ from original
451641

452642
## AST Node Types
453643
