(note (code cslai))

Random Data Generator and Reader

Another day, another programming assessment test. This time I was asked to generate some random data, then examine each item to determine its data type. It is not a very difficult thing to do, and I could probably complete it in fewer lines. As usual, I am pretty sure there are better ways to do this.

In short, the generator is supposed to produce a 10MB file (assuming MB here means 10^6 bytes rather than 2^20 bytes, which is apparently now called a mebibyte, MiB). The file contains multiple lines, and each line is expected to have 4 fields: an alphabetical string, an alphanumeric string, an integer and a real number, one of each, in random order.
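As a quick check of the size assumption, the two readings of "10 MB" differ by a little under half a million bytes. The variable names below are only for illustration and do not appear in the scripts.

```php
<?php
// Decimal megabyte vs. binary mebibyte, for a nominal "10 MB" target.
$limit_mb  = 10 * 10 ** 6; // 10,000,000 bytes
$limit_mib = 10 * 2 ** 20; // 10,485,760 bytes

printf("10 MB  = %d bytes%s", $limit_mb, PHP_EOL);
printf("10 MiB = %d bytes%s", $limit_mib, PHP_EOL);
printf("difference = %d bytes%s", $limit_mib - $limit_mb, PHP_EOL);
```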

So this is the generator part of the program:


#!/usr/bin/env php
<?php
define('OUTPUT_LIMIT_FILESIZE', 10 * pow(10, 6));
 
function generator($file_path_output) {
    $output_writer = output_get_writer(file_writable_to_path($file_path_output));
 
    do {
        $output_writer(array(
            string_get_builder(array(array(65, 90), array(97, 122)), rand(1, 20)),
            string_get_builder(array(array(48, 57), array(65, 90), array(97, 122)), rand(1, 20), rand(0, 10), rand(0, 10)),
            function() {
                return rand();
            },
            function() {
                return (rand() / getrandmax()) * rand();
            }));
    } while(file_check_size($file_path_output) < OUTPUT_LIMIT_FILESIZE);
}
 
function string_get_builder(array $ascii_ranges, $string_size, $space_size_before = 0, $space_size_after = 0) {
    return $builder = function($result = '') use(&$builder, $ascii_ranges, $string_size, $space_size_before, $space_size_after) {
        return strlen($result) < $string_size ?
            $builder(sprintf("%s%s", $result, character_get_random($ascii_ranges)))
            : sprintf(
                "%s%s%s",
                str_repeat(' ', $space_size_before),
                $result,
                str_repeat(' ', $space_size_after)
            );
    };
}
 
function character_get_random(array $ascii_ranges) {
    return chr(call_user_func_array('rand', $ascii_ranges[array_rand($ascii_ranges)]));
}
 
function file_writable_to_path($file_path) {
    try {
        return fopen($file_path, 'w');
    } catch(ErrorException $error) {
        log_output_screen('Error in opening file.', TRUE);
    }
}
 
function file_check_size($file_path) {
    clearstatcache(TRUE, $file_path);

    return filesize($file_path);
}
 
function output_get_writer($file_output) {
    return function(array $content) use($file_output) {
        shuffle($content);
 
        // separator first: the legacy (array, separator) order is rejected by PHP 8
        $output_content = implode(',', array_map('call_user_func', $content));
 
        fprintf($file_output, '%s%s', $output_content, PHP_EOL);
 
        fflush($file_output);
    };
}
 
function log_output_screen($message, $debug = FALSE) {
    file_put_contents(
        $debug ? 'php://stderr' : 'php://stdout',
        sprintf("%s DEBUG: %s%s", date('c'), $message, PHP_EOL));
}
 
error_reporting(E_ALL);
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // error was suppressed with the @-operator
    if(!(error_reporting() & $errno)) {
        return FALSE;
    }
 
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});
 
generator($argv[1]);

There is not much to say about the code, and there is a lot that could be improved (better error handling, for example). I probably need to find a way to ensure the alphanumeric string always contains a digit instead of relying on the random generator to hopefully pick one (RNG no like me in Diablo III, so I don't see why it would be different here).
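One possible way to guarantee a digit, sketched here as a rough idea rather than a drop-in replacement: seed the string with one letter and one digit, fill the rest from the combined pool, then shuffle. The helper name `string_build_alphanumeric` is made up for this example, and it assumes a minimum length of 2 (the generator above allows length 1, which cannot hold both a letter and a digit).

```php
<?php
// Hypothetical helper, not part of the original generator.
// Builds a string of $length characters (at least 2) that always contains
// at least one letter and one digit.
function string_build_alphanumeric($length) {
    $letters = array_merge(range('A', 'Z'), range('a', 'z'));
    $digits = range('0', '9'); // range() yields the integers 0..9 here
    $pool = array_merge($letters, $digits);

    // Seed with one guaranteed letter and one guaranteed digit.
    $characters = array(
        $letters[array_rand($letters)],
        $digits[array_rand($digits)],
    );

    // Fill the remaining positions from the combined pool.
    while(count($characters) < max(2, $length)) {
        $characters[] = $pool[array_rand($pool)];
    }

    // Shuffle so the guaranteed characters are not always in front.
    shuffle($characters);

    // implode() converts the integer digits to strings.
    return implode('', $characters);
}

echo string_build_alphanumeric(rand(2, 20)), PHP_EOL;
```

The surrounding spaces that the generator adds before and after the string are left out here; they could be added around the result in the same way as before.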

Next is the consumer part:


#!/usr/bin/env php
<?php
 
function line_reader($input_reader) {
    $input_line = $input_reader();
 
    while($input_line !== FALSE) {
        array_map('item_print_type', $input_line);
 
        $input_line = $input_reader();
    }
}
 
function item_print_type($item) {
    $type = 'alphanumeric';
 
    if(strpos($item, '.') !== FALSE) {
        $type = 'real numbers';
    } else if(is_numeric($item) !== FALSE) {
        $type = 'integer';
    } else {
        preg_match('/[^\d]*/', $item, $matches);
 
        if(array_shift($matches) == $item) {
            $type = 'alphabetical strings';
        }
    }
 
    printf('%s - %s%s', $item, $type, PHP_EOL);
}
 
function input_get_reader($input_file) {
    return function() use($input_file) {
        $result = fgets($input_file);

        // fgets() returns FALSE once there is nothing left to read
        return $result !== FALSE ?
            array_map('trim', explode(',', $result))
            : FALSE;
    };
}
 
function file_readable_from_path($input_file_path) {
    try {
        return fopen($input_file_path, 'r');
    } catch(ErrorException $error) {
        log_output_screen('Error in opening file.', TRUE);
    }
}
 
function log_output_screen($message, $debug = FALSE) {
    file_put_contents(
        $debug ? 'php://stderr' : 'php://stdout',
        sprintf("%s DEBUG: %s%s", date('c'), $message, PHP_EOL));
}
 
error_reporting(E_ALL);
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // error was suppressed with the @-operator
    if(!(error_reporting() & $errno)) {
        return FALSE;
    }
 
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});
 
line_reader(input_get_reader(file_readable_from_path($argv[1])));

The obvious fun part is deducing the data type. I got lazy and started by determining whether the item is a real number by looking for a decimal point in the string. If there is none, I check whether the item is numeric (it is not a real number, so if it is a number it must be an integer). Finally, I check whether any digits exist in the string to decide between alphanumeric and alphabetical. This part could use some serious optimization (better rules, better regex, etc.).
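As one idea for "better rules", the checks could be made stricter with anchored patterns and the ctype functions. This is only a sketch under a few assumptions: the function name `item_get_type` is made up, it returns the label instead of printing it, it requires the ctype extension, and it assumes reals always carry a decimal point (optionally followed by an exponent, as PHP prints large floats).

```php
<?php
// Hypothetical stricter variant of item_print_type(); returns the label
// instead of printing it.
function item_get_type($item) {
    $item = trim($item);

    // Digits only, with an optional leading minus sign.
    if(preg_match('/^-?\d+$/', $item) === 1) {
        return 'integer';
    }

    // Digits, a decimal point, digits, and an optional exponent ("1.5E+18").
    if(preg_match('/^-?\d+\.\d+(?:E[+-]?\d+)?$/i', $item) === 1) {
        return 'real numbers';
    }

    // Letters only.
    if(ctype_alpha($item)) {
        return 'alphabetical strings';
    }

    // Letters and digits mixed.
    if(ctype_alnum($item)) {
        return 'alphanumeric';
    }

    // Empty fields or anything containing other characters.
    return 'unknown';
}

foreach(array('42', '3.14', '1.5E+18', 'abcXYZ', 'a1b2', '1e5') as $sample) {
    printf('%s - %s%s', $sample, item_get_type($sample), PHP_EOL);
}
```

An alphanumeric string that happens to contain only digits is still reported as an integer; the two cannot be told apart from the text alone.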

Regardless of the outcome, I found this one just as fun as the other quizzes, if not more so (yes, yes, I know I failed the previous one REALLY hard).
