Notes on codes, projects and everything

Random Data Generator and Reader

Another day, another programming assessment test. This time I was asked to generate some random data, then examine each item to work out its data type. It is not a very difficult thing to do, and I could probably complete it in fewer lines. As usual though, I am pretty sure there are better ways to do this.

In short, the generator is supposed to produce a 10MB file (assuming MB here means 10^6 bytes rather than 2^20 bytes, which is apparently now called a mebibyte, MiB). The file contains multiple lines, and each line is expected to have 4 fields: one of each of the following types, in random order.

  • alphanumeric string
  • alphabetical string
  • real number
  • integer
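
To make the size assumption concrete, here is a small sketch of the two readings of "10MB", plus one possible way to stop at a limit by counting bytes as they are written. The names `LIMIT_MB`, `LIMIT_MIB` and `write_until` are made up for this illustration and do not appear in the scripts in this post.

```php
<?php
// Two readings of "10MB"; this post assumes the decimal one.
define('LIMIT_MB', 10 * 1000 * 1000);   // 10,000,000 bytes
define('LIMIT_MIB', 10 * 1024 * 1024);  // 10,485,760 bytes

// Hypothetical helper: keep writing lines until at least $limit bytes are out.
// $next_line is any callable that returns one line without the line ending.
function write_until($handle, $limit, callable $next_line) {
    $written = 0;

    while ($written < $limit) {
        $bytes = fwrite($handle, $next_line() . PHP_EOL);

        if ($bytes === FALSE || $bytes === 0) {
            break; // give up if nothing could be written
        }

        $written += $bytes;
    }

    return $written;
}

// Small demonstration against an in-memory stream with a 100 byte limit.
$memory = fopen('php://memory', 'w+');
$total = write_until($memory, 100, function() {
    return 'abc,123,4.5,x1';
});

printf('%d bytes written%s', $total, PHP_EOL);
```

Because the limit is only checked between lines, the output can overshoot it by up to one line.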

So this is the generator part of the program:


#!/usr/bin/env php
<?php
define('OUTPUT_LIMIT_FILESIZE', 10 * pow(10, 6));
 
function generator($file_path_output) {
    $output_writer = output_get_writer(file_writable_to_path($file_path_output));
 
    do {
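        // Each entry is a callable that produces one field; the writer shuffles them.
        $output_writer(array(
            // alphabetical string, 1 to 20 letters
            string_get_builder(array(array(65, 90), array(97, 122)), rand(1, 20)),
            // alphanumeric string, padded with 0 to 10 spaces on each side
            string_get_builder(array(array(48, 57), array(65, 90), array(97, 122)), rand(1, 20), rand(0, 10), rand(0, 10)),
            // integer
            function() {
                return rand();
            },
            // real number
            function() {
                return (rand() / getrandmax()) * rand();
            }));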
    } while(file_check_size($file_path_output) < OUTPUT_LIMIT_FILESIZE);
}
 
function string_get_builder(array $ascii_ranges, $string_size, $space_size_before = 0, $space_size_after = 0) {
    // Returns a closure that recursively appends random characters until the
    // string is long enough, then pads it with the requested number of spaces.
    return $builder = function($result = '') use(&$builder, $ascii_ranges, $string_size, $space_size_before, $space_size_after) {
        return strlen($result) < $string_size ?
            $builder($result . character_get_random($ascii_ranges))
            // str_repeat() copes with a count of zero on every PHP version
            : str_repeat(' ', $space_size_before) . $result . str_repeat(' ', $space_size_after);
    };
}
 
function character_get_random(array $ascii_ranges) {
    return chr(call_user_func_array('rand', $ascii_ranges[array_rand($ascii_ranges)]));
}
 
function file_writable_to_path($file_path) {
    try {
        return fopen($file_path, 'w');
    } catch(ErrorException $error) {
        log_output_screen('Error in opening file.', TRUE);

        // Without a file handle there is nothing left to do.
        exit(1);
    }
}
 
function file_check_size($file_path) {
    clearstatcache(TRUE, $file_path);

    return filesize($file_path);
}
 
function output_get_writer($file_output) {
    return function(array $content) use($file_output) {
        shuffle($content);
 
        // Call each field generator, then join the results with commas.
        $output_content = implode(',', array_map('call_user_func', $content));
 
        fprintf($file_output, '%s%s', $output_content, PHP_EOL);
 
        fflush($file_output);
    };
}
 
function log_output_screen($message, $debug = FALSE) {
    file_put_contents(
        $debug ? 'php://stderr' : 'php://stdout',
        sprintf("%s DEBUG: %s%s", date('c'), $message, PHP_EOL));
}
 
error_reporting(E_ALL);
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // skip errors excluded by error_reporting, e.g. suppressed with the @-operator
    if(!(error_reporting() & $errno)) {
        return FALSE;
    }
 
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});
 
generator($argv[1]);

There is not much to say about the code, and there are a lot of things to improve (better error handling, for example). I probably need to find a way to ensure the alphanumeric string always contains a digit, instead of relying on the random generator to hopefully pick one (RNG no like me in Diablo III, so I don't see why it would be different here).

Next is the consumer part:
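
One way to stop depending on luck would be to reserve one digit up front and shuffle it into the rest of the string. The following is only a sketch: `string_alphanumeric_random` is a hypothetical helper that is not part of the generator above, and it assumes a length of at least 1. It guarantees at least one digit, plus at least one letter when the length is 2 or more.

```php
<?php
// Hypothetical helper: random alphanumeric string that always contains a digit
// and, when $length >= 2, at least one letter as well.
function string_alphanumeric_random($length) {
    $digits = '0123456789';
    $letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $pool = $digits . $letters;

    // Reserve the guaranteed characters first.
    $characters = array($digits[rand(0, strlen($digits) - 1)]);

    if ($length >= 2) {
        $characters[] = $letters[rand(0, strlen($letters) - 1)];
    }

    // Fill the remainder from the full pool.
    while (count($characters) < $length) {
        $characters[] = $pool[rand(0, strlen($pool) - 1)];
    }

    // Shuffle so the guaranteed characters do not always come first.
    shuffle($characters);

    return implode('', $characters);
}

echo string_alphanumeric_random(12), PHP_EOL;
```

A string of length 1 would still be a lone digit, which no reader can tell apart from an integer, so a minimum length of 2 is probably the safer choice for this field.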


#!/usr/bin/env php
<?php
 
function line_reader($input_reader) {
    $input_line = $input_reader();
 
    while($input_line !== FALSE) {
        array_map('item_print_type', $input_line);
 
        $input_line = $input_reader();
    }
}
 
function item_print_type($item) {
    // Default when none of the checks below match.
    $type = 'alphanumeric';
 
    if(strpos($item, '.') !== FALSE) {
        // Only real numbers contain a decimal point in the generated data.
        $type = 'real numbers';
    } else if(is_numeric($item)) {
        // Numeric without a decimal point, so it must be an integer.
        $type = 'integer';
    } else if(preg_match('/^\D*$/', $item) === 1) {
        // No digit anywhere in the string.
        $type = 'alphabetical strings';
    }
 
    printf('%s - %s%s', $item, $type, PHP_EOL);
}
 
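function input_get_reader($input_file) {
    return function() use($input_file) {
        // fgets() returns FALSE once there is nothing left to read; testing
        // that instead of feof() keeps a final line without a trailing newline.
        $line = fgets($input_file);
 
        return $line !== FALSE ?
            array_map('trim', explode(',', $line))
            : FALSE;
    };
}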
 
function file_readable_from_path($input_file_path) {
    try {
        return fopen($input_file_path, 'r');
    } catch(ErrorException $error) {
        log_output_screen('Error in opening file.', TRUE);

        // Without a file handle there is nothing left to do.
        exit(1);
    }
}
 
function log_output_screen($message, $debug = FALSE) {
    file_put_contents(
        $debug ? 'php://stderr' : 'php://stdout',
        sprintf("%s DEBUG: %s%s", date('c'), $message, PHP_EOL));
}
 
error_reporting(E_ALL);
set_error_handler(function($errno, $errstr, $errfile, $errline) {
    // skip errors excluded by error_reporting, e.g. suppressed with the @-operator
    if(!(error_reporting() & $errno)) {
        return FALSE;
    }
 
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
});
 
line_reader(input_get_reader(file_readable_from_path($argv[1])));

The obvious fun part is deducing the data types. I got lazy and started by deciding whether an item is a real number by looking for a decimal point in the string. If there is none, I check whether the item is numeric (it is not a real number, so if it is a number it must be an integer). Finally, I check whether any digits exist in the string at all, to tell an alphanumeric string from an alphabetical one. This part could use some serious optimization (better rules, better regex etc.).

Regardless of the outcome, I found it just as fun as other quizzes, if not more (yes, yes, I know I failed the previous one REALLY hard).
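
As one possible take on "better rules", the checks could be written as an ordered list of anchored regular expressions, so that each item has to match a whole pattern rather than just contain a dot. This is a sketch under my own assumptions, not what the consumer above does: `item_get_type` and its labels are hypothetical, real numbers are required to have a decimal point, and anything that fits no rule is reported as `unknown`.

```php
<?php
// Hypothetical stricter classifier. Rules are tried in order, first match wins.
function item_get_type($item) {
    $rules = array(
        'integer'      => '/^[+-]?\d+$/',
        // Decimal point required; the optional exponent covers output like 1.0E-5.
        'real number'  => '/^[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?$/',
        'alphabetical' => '/^[A-Za-z]+$/',
        'alphanumeric' => '/^[A-Za-z0-9]+$/',
    );

    foreach ($rules as $type => $pattern) {
        if (preg_match($pattern, $item) === 1) {
            return $type;
        }
    }

    return 'unknown';
}

foreach (array('42', '3.14', 'abc', 'a1b2', '1e5', '') as $sample) {
    printf('"%s" - %s%s', $sample, item_get_type($sample), PHP_EOL);
}
```

With these rules a string such as `1e5` is treated as alphanumeric instead of numeric, and an empty field is flagged as `unknown` instead of being counted as an alphabetical string.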
