Processor Instructions

Matt Farina edited this page Sep 10, 2013 · 7 revisions

The HTML5 spec does not allow processing instructions. This library does: it runs on the server side, where we believe they are useful.

The Basics

Take the document:

<!DOCTYPE html>
<html>
  <?foo bar?>
</html>

The <?foo bar?> is a processing instruction. A processing instruction starts with <?, is followed by a node name (foo in this case) and optional data (bar), and closes with ?>.

When this document is parsed using \HTML5::loadHTML(), the processing instruction becomes a \DOMProcessingInstruction node with a nodeName property of foo and a data property of bar.
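
To make those two properties concrete, here is a minimal sketch that uses only PHP's built-in DOM extension. It assumes \DOMDocument::loadXML() as a stand-in for \HTML5::loadHTML() (this library is not loaded here), so it illustrates the \DOMProcessingInstruction node itself rather than this library's parser.

```php
<?php
// Stand-in for \HTML5::loadHTML(): PHP's XML loader also produces
// \DOMProcessingInstruction nodes, so the node API can be shown on its own.
$doc = new \DOMDocument();
$doc->loadXML('<html><?foo bar?></html>');

// Select every processing instruction in the document.
$xpath = new \DOMXPath($doc);
$instructions = $xpath->query('//processing-instruction()');

foreach ($instructions as $pi) {
  // nodeName holds the instruction name; data holds everything after it.
  printf("%s: %s\n", $pi->nodeName, $pi->data);
}
// prints "foo: bar"
```

The same XPath query should work on any \DOMDocument, which makes it a convenient way to check which instructions survived parsing.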

With an InstructionProcessor

Processing instructions become useful when we act on them, for example by manipulating the DOM. An instruction processor takes an instruction and acts on it. It is defined by the interface \HTML5\InstructionProcessor, which has a single method, process. For example, let's create a dummy counter.

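<?php

use HTML5\InstructionProcessor;

class foo implements InstructionProcessor {

  // Number of processing instructions seen so far.
  public $bar = 0;

  // $name is the instruction name and $data is the rest of its content.
  public function process(\DOMElement $element, $name, $data) {
    $this->bar++;

    // Return the element unchanged.
    return $element;
  }
}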

This class is deliberately simple. Each time a processing instruction is encountered, the counter is incremented and the element passed in is returned. The returned element is what is attached to the DOM, so a processor that wants a different element in its place should return that element instead.
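
A processor can also use the name and data it receives. The following is a rough sketch, not part of the library: the Greeter class and the greet instruction are hypothetical, and the interface is redeclared locally as a stand-in for \HTML5\InstructionProcessor so that the example runs on its own. It calls process() by hand instead of going through the parser.

```php
<?php
// Local stand-in for \HTML5\InstructionProcessor so this sketch runs without
// the library; the method signature is copied from the counter example.
interface InstructionProcessor {
  public function process(\DOMElement $element, $name, $data);
}

// Hypothetical processor: a "greet" instruction appends a greeting span.
class Greeter implements InstructionProcessor {

  public function process(\DOMElement $element, $name, $data) {
    if ($name === 'greet') {
      $doc = $element->ownerDocument;
      $span = $doc->createElement('span');
      $span->appendChild($doc->createTextNode('Hello, ' . trim($data)));
      $element->appendChild($span);
    }

    // Hand the same element back; instructions with other names change nothing.
    return $element;
  }
}

// Call process() by hand with the name "greet" and the data "World".
$doc = new \DOMDocument();
$body = $doc->appendChild($doc->createElement('body'));

$greeter = new Greeter();
$result = $greeter->process($body, 'greet', 'World');

echo $doc->saveXML($result), "\n";
// prints "<body><span>Hello, World</span></body>"
```

Checking $name first keeps the processor from reacting to instructions meant for something else, since a single processor is attached to the tree builder and receives every instruction.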

To be used, the instruction processor must be attached to the DOM tree builder. That requires a custom parsing function, which is straightforward to write because the building blocks already exist.

function my_parser(\HTML5\Parser\InputStream $input) {

  // Create an instance of the instruction processor.
  $foo = new foo();
  $events = new \HTML5\Parser\DOMTreeBuilder();

  // Attach it to the event-based DOM tree builder.
  $events->setInstructionProcessor($foo);

  // Tokenize the input; the tokenizer sends its events to the tree builder.
  $scanner = new \HTML5\Parser\Scanner($input);
  $parser = new \HTML5\Parser\Tokenizer($scanner, $events);
  $parser->parse();

  // Return the finished document.
  return $events->document();
}

To parse the document, use my_parser instead of one of the built-in parsers. The instruction processor will then be called once for each processing instruction.

For more details on how this works, take a peek inside \HTML5\Parser\DOMTreeBuilder.
