To escape or not to escape. Part 2.

After testing and playing with the remap_string function on a real project, I've made some changes and fixes:
  1. I've introduced an encoding signature, so that a string which is already encoded cannot be re-encoded by mistake. Currently I'm using two ESC characters, chr(27) . chr(27), as the signature, which in principle should not affect any future application of this function.
  2. I have changed the multi-byte UTF-8 signature to 'y0y'. The previous signature ('0y', similar to the 0x notation for hex numbers) was not completely isolated and could have produced wrong decoding under certain conditions.
  3. I have included here the helper functions encode() and decode(), which encode and decode strings and arrays (objects represented as nested arrays). These two functions currently use a global variable, $non_encoded_fields, to skip the array keys which should not be encoded.
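
To make the two signatures concrete, here is a minimal standalone sketch of the format described above. The helper names sketch_encode_char() and sketch_sign() are made up for illustration and are not part of remap_string(); they only mirror the idea of 3-digit byte codes, the 'y0y' wrapper for multi-byte characters, and the double-ESC prefix.

```php
<?php
// Hypothetical helper (illustration only): encode ONE character as 3-digit byte codes.
function sketch_encode_char(string $char): string {
    $codes = '';
    // strlen() counts bytes, so a multi-byte UTF-8 character yields several codes.
    for ($i = 0; $i < strlen($char); $i++) {
        $codes .= str_pad((string) ord($char[$i]), 3, '0', STR_PAD_LEFT);
    }
    // A single byte is emitted bare; a multi-byte character is wrapped in y0y.
    return strlen($char) === 1 ? $codes : 'y0y' . $codes . 'y0y';
}

// Hypothetical helper (illustration only): prepend the ESC ESC signature once.
function sketch_sign(string $encoded): string {
    $signature = chr(27) . chr(27);
    // A string that already starts with the signature is returned untouched.
    return strpos($encoded, $signature) === 0 ? $encoded : $signature . $encoded;
}

echo sketch_encode_char('1'), "\n";        // 049
echo sketch_encode_char("\xC3\xA9"), "\n"; // y0y195169y0y (the UTF-8 bytes of 'é')
```

Calling sketch_sign() twice on the same value gives the same result as calling it once, which is the property the signature is meant to provide.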

/*
 * Replace every character outside a small whitelist (letters, space and some punctuation) with a 3-digit numeric code.
 * I'm using this to convert all data before storing it in MongoDB / MySQL.
 */
function remap_string( $text, $decode=false ) {
    if ($decode) {
        if (strpos($text, chr(27) . chr(27)) !== 0) return $text; // no signature: not encoded, return as is
        $text = substr($text, 2);
        return preg_replace_callback(
            array("/(y0y([0-9]{3})+y0y)/","/([0-9]{3})/"),
            function($matches) {
                if (isset($matches[2])) {
                    $code = substr($matches[0],3,strlen($matches[0])-6);
                    $r = '';
                    foreach(str_split($code, 3) as $c) $r .= chr($c);
                    return $r;
                } else return chr($matches[0]);
            },
            $text);
    }
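    // Already signed: never encode twice
    if (strpos($text, chr(27) . chr(27)) === 0) return $text;
    return chr(27) . chr(27) . preg_replace_callback(
            // The hyphen is placed last so it is a literal; written as ;-_ it formed a range from ';' to '_'
            "/([^A-Za-z ,.:;_+=!#()-]{1})/u",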
            function($matches) {
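                $l = strlen($matches[0]); // byte length: more than 1 for a multi-byte UTF-8 character
                if (1==$l) return str_pad(ord($matches[0]), 3, '0', STR_PAD_LEFT);
                $a = ''; // initialize the accumulator to avoid an undefined variable notice
                for($i=0;$i<$l;$i++) $a.=str_pad(ord($matches[0][$i]), 3, '0', STR_PAD_LEFT);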
                return "y0y" . $a . "y0y";
            },
            $text);
}

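// Encodes a string, or every leaf value of a (nested) array in place.
// A non-array value is returned encoded; the caller's variable is left unchanged.
function encode(&$obj) {
    if (!is_array($obj)) return remap_string($obj);
    array_walk_recursive( $obj,
            function(&$item, $key) {
                // List of keys to skip; must be defined as an array before calling
                global $non_encoded_fields;
                // $key===0 is tested first because in_array(0, ...) loosely matches any non-numeric string before PHP 8
                if (($key===0) || (!in_array($key, $non_encoded_fields))) {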
                    $item = remap_string($item);
                    return true;
                }
            }
    );
    return $obj;
}

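// Decodes a string, or every leaf value of a (nested) array in place.
// A non-array value is returned decoded; the caller's variable is left unchanged.
function decode(&$obj) {
    if (!is_array($obj)) return remap_string($obj, true);
    array_walk_recursive( $obj,
            function(&$item, $key) {
                // List of keys to skip; must be defined as an array before calling
                global $non_encoded_fields;
                // $key===0 is tested first because in_array(0, ...) loosely matches any non-numeric string before PHP 8
                if (($key===0) || (!in_array($key, $non_encoded_fields))) {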
                    $item = remap_string($item, true);
                    return true;
                }
            }
    );
    return $obj;
}
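
Since encode() and decode() rely on a global for the list of skipped keys, here is a rough sketch of one possible alternative, where the list is passed in as a parameter and captured by the closure. The name map_fields() and its parameters are hypothetical, and the per-value transform is injected (strtoupper stands in for remap_string) so that the snippet runs on its own.

```php
<?php
// Hypothetical variant (not the functions above): the skip list is an argument
// instead of the global $non_encoded_fields.
function map_fields(array $data, callable $transform, array $skip_keys = []): array {
    array_walk_recursive(
        $data,
        function (&$item, $key) use ($transform, $skip_keys) {
            // Strict comparison: integer keys never match string field names,
            // so no special case for key 0 is needed.
            if (!in_array($key, $skip_keys, true)) {
                $item = $transform($item);
            }
        }
    );
    return $data; // $data was received by value, so the caller's array is untouched
}

$doc = ['_id' => 'abc', 'name' => 'bob', 'tags' => ['x', 'y']];
$out = map_fields($doc, 'strtoupper', ['_id']);
// $out is ['_id' => 'abc', 'name' => 'BOB', 'tags' => ['X', 'Y']]
```

With the real function, the transform would presumably be 'remap_string' for encoding and a small closure calling remap_string($value, true) for decoding.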
