Tool:IllyriadParser

From Arcanum Illyria

Details

Tool: IllyriadParser
Author: HonoredMule
Type: XML Parser
Version: 1.0 (2011-01-31)
Description: Simplifies parsing datafiles
Dependencies: PHP
PHP Scripts - Tools

IllyriadParser is an easy-to-use, class-based system built specifically for Illyriad's data files. It is not a finished but inflexible solution: you write the classes that read the specific information you want and post it to whatever data store you like, using whatever structure you like. Because of this, it supports all past, present, and likely future file formats at any file size while remaining extremely simple to use. Your classes can be as complete or as brief as you need.

How to Use It

Each class you write (like AllianceParser extends RowParser) handles a single table, and each instance of that class handles one row. Each protected method you write (like parse_allianceticker($data, $attrs)) receives the text and attributes of the node it is named after, which you can store in $this->data using whatever structure and format you like. When the row is complete, processData() is called; it is the only method you must implement. From there, you can add information from parent parsers and push $this->data wherever you like. Because you define the data layout, query builders can easily and automatically submit the data to database tables you define, and other software you write can work with it just as easily, since it has exactly what it needs, where it wants it.

For added convenience, each parser object prints a list of the nodes it encountered but did not parse, which serves as a guide while you are still writing your classes. The list is emitted as an HTML paragraph, so it reads best when you run your PHP through a web server.
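As an illustration of the query-builder idea, here is a minimal sketch of a helper that processData() could call. build_insert() and the alliances table name are assumptions made for this example and are not part of IllyriadParser. The sketch only produces a parameterised statement and leaves executing it to whatever database layer you use.

```php
<?php
// Hypothetical helper, not part of IllyriadParser: turns one row of parsed
// data into a parameterised INSERT statement plus its bound values.
function build_insert($table, array $row) {
    // Only plain identifiers are accepted, since names cannot be bound as parameters.
    $names = array_merge(array($table), array_keys($row));
    foreach ($names as $name) {
        if (!preg_match('/^[A-Za-z0-9_]+$/', $name)) {
            throw new InvalidArgumentException("Bad identifier: $name");
        }
    }

    // Quote column names MySQL-style and emit one '?' placeholder per value.
    $columns = array();
    foreach (array_keys($row) as $name) {
        $columns[] = '`' . $name . '`';
    }
    $placeholders = implode(', ', array_fill(0, count($row), '?'));

    $sql = 'INSERT INTO `' . $table . '` (' . implode(', ', $columns) . ')'
         . ' VALUES (' . $placeholders . ')';
    return array($sql, array_values($row));
}

// The kind of array an AllianceParser might hold when processData() runs.
$row = array('id' => 1, 'name' => 'Harmless?', 'ticker' => 'H?');
list($sql, $params) = build_insert('alliances', $row);
echo $sql, "\n";
```

With a PDO connection in $pdo (also an assumption), processData() could then run $pdo->prepare($sql)->execute($params), which avoids hand-escaping the values.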

Parser classes can also be nested. For example, RoleParser parses the roles in an alliance and has access to the information gathered by its parent AllianceParser. Parsers can stack to any depth and still find a particular parent by node name, so all the structural and relational information you need is preserved. The system also recognizes that an alliance node inside an alliance node is not really an alliance within an alliance: the outer node is the alliance object and the inner node carries some of its data. When such a quirk is encountered, only one AllianceParser is created. The inner, data-rich node is handled by parse_alliance($data, $attrs) and the enclosing one by parse_outer_alliance($data, $attrs).
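The naming rule can be hard to picture from prose alone, so here is a small standalone sketch that replays open and close events for one row and reports which method name would be looked up at each close. handler_names() is a hypothetical helper written for this page. It imitates the per-element counter used by RowParser in the script further down rather than calling it.

```php
<?php
// Standalone sketch of the handler naming rule. handler_names() is a
// hypothetical helper: it replays open/close events for one row and returns
// the method name that would be looked up at each close.
function handler_names(array $events) {
    $nested = array();   // element name => number of 'outer_' prefixes to apply next
    $names = array();
    foreach ($events as $event) {
        list($type, $element) = $event;
        $count = isset($nested[$element]) ? $nested[$element] : 0;
        if ($type === 'open') {
            // A fresh opening cancels one level left over from an earlier sibling.
            $nested[$element] = max(0, $count - 1);
        } else {
            // Each close of the same name gets one more 'outer_' than the last.
            $names[] = 'parse_' . str_repeat('outer_', $count) . $element;
            $nested[$element] = $count + 1;
        }
    }
    return $names;
}

$events = array(
    array('open', 'alliance'),           // enclosing wrapper node
    array('open', 'alliance'),           // inner, data-rich node
    array('close', 'alliance'),
    array('open', 'allianceticker'),
    array('close', 'allianceticker'),
    array('close', 'alliance'),
);
print_r(handler_names($events));
```

For this event sequence the names come out as parse_alliance, parse_allianceticker, and parse_outer_alliance, in that order. Repeated sibling nodes of the same name all map to the same unprefixed method, because each new opening resets the counter left by the previous sibling.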

The example below demonstrates most of the system's capabilities.

Example

Usage

<?php

include('IllyriadParser.php');

// just a little helper function
function debug($data) {
    echo "<pre>";
    print_r($data);
    echo "</pre>";
}

class AllianceParser extends RowParser {

    protected function processData() {
        // just showing the data gathered
        echo "DATA: ";
        debug($this->data);
    }

    protected function parse_allianceticker($data, $attrs) {
        // stored raw: mysql_escape_string() was removed in PHP 7, so escape
        // or bind values at the point where they are written to a database
        $this->data['ticker'] = $data;
    }

    protected function parse_alliance($data, $attrs) {
        $this->data['id'] = (int)$attrs['id'];
        $this->data['name'] = $data;
    }

    protected function parse_foundedbyplayerid($data, $attrs) {
        $this->data['founder'] = (int)$attrs['id'];
    }

    protected function parse_alliancecapitaltownid($data, $attrs) {
        $this->data['capital'] = (int)$attrs['id'];
    }

    protected function parse_foundeddatetime($data, $attrs) {
        // convert the timestamp text to a Unix timestamp
        $this->data['founded'] = strtotime($data);
    }

}

class RoleParser extends RowParser {

    protected function processData() {
        // fetch information encapsulated in a parent parser
        $alliance = $this->illyParser->parentRow('alliance');
        $this->data['alliance'] = $alliance['id'];

        // just showing the data gathered
        echo "ROLE: ";
        debug($this->data);
    }

    protected function parse_role($data, $attrs) {
        $this->data['id'] = (int)$attrs['id'];
        $this->data['name'] = $data;
    }
}


$handler = new IllyriadParser();
$handler->parseFile('alliances-2010-08-07.xml');

?>

Output

ROLE:
Array
(
    [id] => 1
    [name] => Founder
    [alliance] => 1
)

Unhandled Nodes: heirarchy, outer_role

...SNIP...

ROLE:
Array
(
    [id] => 2
    [name] => New Member
    [alliance] => 1
)

Unhandled Nodes: heirarchy, outer_role

DATA:
Array
(
    [id] => 1
    [name] => Harmless?
    [founder] => 10
    [capital] => 27
    [ticker] => H?
    [founded] => 1267121887
)

Unhandled Nodes: alliancecapitallastmoved, alliancetaxrate, alliancetaxratelastchanged, membercount, totalpopulation, roles, proposedbyalliance, acceptedbyalliance, relationshiptype, relationshipttype, establishedsince, relationship, relationships, outer_alliance

ROLE:
Array
(
    [id] => 27
    [name] => Leader
    [alliance] => 2
)

Unhandled Nodes: heirarchy, outer_role

...SNIP...

The Script

<?php

// IllyriadParser.php by HonoredMule
// license: LGPL

abstract class RowParser {

    protected $data = array();
    protected $illyParser = null;
    private $unhandled_tags = array();
    private $nested_tags = array();
    private $stackData = array();
    private $stackAttrs = array();
    private $depth = 0;

    function __construct($parser) {
        $this->illyParser = $parser;
    }

    function openElement($element, $attrs) {
        $this->stackAttrs[++$this->depth] = $attrs;
        // start with an empty buffer so childless nodes pass '' to their handler
        $this->stackData[$this->depth] = '';
        // a repeated sibling cancels the nesting level left by the previous one
        $depth = isset($this->nested_tags[$element]) ? $this->nested_tags[$element] : 0;
        $this->nested_tags[$element] = max(0, $depth-1);
    }

    function dataElement($data) {
        // the XML parser may deliver one text node in several pieces, so append
        $this->stackData[$this->depth] .= $data;
    }

    function closeElement($element) {
        $i = $this->depth;
        // a node enclosing a same-named node is dispatched as parse_outer_<name>
        $nest = str_repeat('outer_', $this->nested_tags[$element]++);
        $func = 'parse_'.$nest.$element;
        if(is_callable(array($this, $func)))
            $this->$func($this->stackData[$i], $this->stackAttrs[$i]);
        else
            $this->unhandled_tags[$nest.$element] = true;
        unset($this->stackAttrs[$i]);
        unset($this->stackData[$i]);
        // depth 0 means the row's own closing tag has been reached
        if(--$this->depth == 0)
            $this->processData();
        return $this->depth;
    }

    function getData() {
        return $this->data;
    }

    // Perform any post processing--including fetching data from parent tables using
    // $this->illyParser->parentRow('nodename')--and commit the data to some data store
    // or (for example) a large SQL query builder
    abstract protected function processData();

    function __destruct() {
        echo '<p>Unhandled Nodes: ' . implode(', ', array_keys($this->unhandled_tags)) . '</p>';
    }
}

class IllyriadParser {

    public $date;
    public $server;
    private $nodeStack = array();
    private $parser;
    private $data;

    function __construct() {
        $this->parser = xml_parser_create();
        xml_parser_set_option($this->parser, XML_OPTION_CASE_FOLDING, false);

        // callable arrays bind the handlers to this object, so xml_set_object()
        // is not needed
        xml_set_element_handler($this->parser,
                                array($this, 'openElement'),
                                array($this, 'closeElement'));
        xml_set_default_handler($this->parser, array($this, 'dataElement'));
        xml_set_character_data_handler($this->parser, array($this, 'dataElement'));
    }

    function parseFile($filepath) {
        $fp = fopen($filepath, "r");
        if(!$fp)
            throw new Exception("Cannot open $filepath");

        // feed the file in chunks so memory use does not grow with file size
        while(!feof($fp)) {
            $data = fread($fp, 4096);
            if(!xml_parse($this->parser, $data, feof($fp))) {
                $error = xml_error_string(xml_get_error_code($this->parser));
                $line = xml_get_current_line_number($this->parser);
                fclose($fp);
                throw new Exception("XML error in $filepath at line $line: $error");
            }
        }

        fclose($fp);
    }

    function openElement($parser, $element, $attrs) {
        $class = strtolower($element.'Parser');
        // array_peek() returns false on an empty stack, which get_class() rejects
        $top = array_peek($this->nodeStack);
        $topClass = $top ? strtolower(get_class($top)) : '';

        // only RowParser subclasses may handle rows; a same-named node directly
        // inside its own parser does not start a second parser
        if(class_exists($class) && is_subclass_of($class, 'RowParser') && $class != $topClass)
            array_push($this->nodeStack, new $class($this));
        $node = array_peek($this->nodeStack);
        if(!empty($node))
            $node->openElement($element, $attrs);
    }

    function dataElement($parser, $data) {
        $node = array_peek($this->nodeStack);
        if(!empty($node))
            $node->dataElement($data);
    }

    function closeElement($parser, $element) {
        $node = array_peek($this->nodeStack);
        if(empty($node))
            return;

        // a row parser reports depth 0 once its own node has closed
        if($node->closeElement($element) === 0)
            array_pop($this->nodeStack);
    }

    function findParent($type) {
        if(empty($type))
            return array_peek($this->nodeStack);
        $type = strtolower($type);
        foreach($this->nodeStack as $obj) {
            if(strtolower(get_class($obj)) == $type.'parser')
                return $obj;
        }
        throw new Exception("Cannot find $type parser on the stack");
    }

    function parentRow($type) {
        return $this->findParent($type)->getData();
    }

}

// return the element $n places from the end of $array, or false if there is none
function array_peek($array, $n = 0) {
    if(!is_array($array))
        return false;
    $n++;
    $l = count($array);
    if($l < $n)
        return false;
    return $array[$l - $n];
}

?>
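
parseFile() hands the file to PHP's XML parser in 4096-byte chunks, which is what lets the script cope with large files. One consequence worth knowing when writing your own handlers: the parser may deliver the text of a single node in several pieces, so text is safest collected by appending. The sketch below shows the idea in isolation. collect_text() is a hypothetical helper written for this page, and the XML string is made up for the demonstration; neither is part of IllyriadParser.

```php
<?php
// Sketch only: collect_text() is a hypothetical helper that feeds an XML
// string to the parser in small chunks and joins the character data that
// arrives for each element.
function collect_text($xml, $chunkSize) {
    $stack = array();    // one text buffer per currently open element
    $result = array();   // element name => complete text
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = '';                           // new empty buffer on open
        },
        function ($p, $name) use (&$stack, &$result) {
            $result[$name] = trim(array_pop($stack)); // buffer is complete on close
        }
    );
    xml_set_character_data_handler($parser, function ($p, $text) use (&$stack) {
        if ($stack) {
            $stack[count($stack) - 1] .= $text;      // append, never overwrite
        }
    });

    // Feed the document piece by piece, flagging the last piece as final.
    $chunks = str_split($xml, $chunkSize);
    foreach ($chunks as $i => $chunk) {
        if (!xml_parse($parser, $chunk, $i === count($chunks) - 1)) {
            throw new Exception(xml_error_string(xml_get_error_code($parser)));
        }
    }
    return $result;
}

// Made-up input for the demonstration, split into 16-byte chunks.
$xml = '<alliance><name>Harmless?</name><ticker>H?</ticker></alliance>';
print_r(collect_text($xml, 16));
```

Because each buffer is appended to rather than replaced, the collected text does not depend on where the chunk boundaries happen to fall.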