Notes on codes, projects and everything

Learning a new language for more than 2 months (feat. Exercism)

Forth Interpreter

About half way doing this, I switched to the 2018 edition as the release is coming soon. Also the fact that I do not have any legacy code that needs porting also helps. This exercise apparently do not compile at all in the 2015 edition. So this is the final version of the interpreter. I did awefully lots of iterations on this, and even rewrote the whole thing twice just to clear up the ownership and mutation.

use std::collections::HashMap;
use std::rc::Rc;

pub type Value = i32;
pub type ForthResult = Result<(), Error>;

#[derive(Debug, PartialEq)]
pub enum Error {
    DivisionByZero,
    StackUnderflow,
    UnknownWord,
    InvalidWord,
}

#[derive(Clone)]
pub enum Token {
    Value(Value),
    BinaryOperator(Rc<dyn Fn(Value, Value) -> Result<Vec<Value>, Error>>),
    UnaryOperator(Rc<dyn Fn(Value) -> Result<Vec<Value>, Error>>),
}

pub struct Forth {
    symbols: HashMap<String, Vec<Token>>,
    stack: Vec<Value>,
}

impl Forth {
    pub fn new() -> Forth {
        Forth {
            symbols: Self::symbols_init(),
            stack: vec![],
        }
    }

    pub fn stack(&self) -> Vec<Value> {
        self.stack.to_vec()
    }

    pub fn eval(&mut self, input: &str) -> ForthResult {
        for incoming in self.tokenize(vec![], Self::input_split(input).as_slice())? {
            match incoming {
                Token::Value(value) => {
                    self.stack.push(value);
                }
                Token::BinaryOperator(operator) => {
                    let y = self.stack.pop().ok_or(Error::StackUnderflow)?;
                    let x = self.stack.pop().ok_or(Error::StackUnderflow)?;

                    self.stack.extend(operator(x, y)?);
                }
                Token::UnaryOperator(operator) => {
                    let x = self.stack.pop().ok_or(Error::StackUnderflow)?;

                    self.stack.extend(operator(x)?);
                }
            }
        }
        Ok(())
    }

    fn symbols_init() -> HashMap<String, Vec<Token>> {
        let mut result = HashMap::new();

        result.insert(
            "+".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| Ok(vec![x + y])))],
        );
        result.insert(
            "-".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| Ok(vec![x - y])))],
        );
        result.insert(
            "*".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| Ok(vec![x * y])))],
        );
        result.insert(
            "/".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| match (x, y) {
                (_, 0) => Err(Error::DivisionByZero),
                (m, n) => Ok(vec![m / n]),
            }))],
        );
        result.insert(
            "dup".to_string(),
            vec![Token::UnaryOperator(Rc::new(|x| Ok(vec![x, x])))],
        );
        result.insert(
            "drop".to_string(),
            vec![Token::UnaryOperator(Rc::new(|_| Ok(vec![])))],
        );
        result.insert(
            "swap".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| Ok(vec![y, x])))],
        );
        result.insert(
            "over".to_string(),
            vec![Token::BinaryOperator(Rc::new(|x, y| Ok(vec![x, y, x])))],
        );

        result
    }

    fn tokenize(&mut self, mut result: Vec<Token>, input: &[String]) -> Result<Vec<Token>, Error> {
        match input.first() {
            None => Ok(result),
            Some(value) => match (i32::from_str_radix(value, 10), self.symbols.get(value)) {
                (Ok(value), _) => {
                    result.push(Token::Value(value));
                    self.tokenize(result, &input[1..])
                }
                (_, Some(tokens)) => {
                    result.extend_from_slice(tokens.as_slice());
                    self.tokenize(result, &input[1..])
                }
                _ => {
                    let slice = self.tokenize_word(None, None, &input)?;
                    self.tokenize(result, slice)
                }
            },
        }
    }

    fn tokenize_word<'a>(
        &mut self,
        name: Option<String>,
        word: Option<Vec<Token>>,
        input: &'a [String],
    ) -> Result<&'a [String], Error> {
        match (input.first(), word.is_none()) {
            (Some(t), true) if t == ":" => self.tokenize_word(None, Some(vec![]), &input[1..]),
            (Some(t), _) if t == ";" => {
                self.symbols.insert(name.unwrap(), word.unwrap());
                Ok(&input[1..])
            }
            (Some(n), false) if name.is_none() => match i32::from_str_radix(n, 10) {
                Ok(_) => Err(Error::InvalidWord),
                Err(_) => self.tokenize_word(Some(n.to_string()), word, &input[1..]),
            },
            (Some(value), _) => match (i32::from_str_radix(value, 10), self.symbols.get(value)) {
                (Ok(value), _) => self.tokenize_word(
                    name,
                    word.map(|mut w| {
                        w.push(Token::Value(value));
                        w
                    }),
                    &input[1..],
                ),
                (_, Some(tokens)) => self.tokenize_word(
                    name,
                    word.map(|mut w| {
                        w.extend_from_slice(tokens);
                        w
                    }),
                    &input[1..],
                ),
                _ => Err(Error::UnknownWord),
            },
            _ => Err(Error::InvalidWord),
        }
    }

    fn input_split(input: &str) -> Vec<String> {
        input
            .split_terminator(|x| "  \u{0000}\u{0001}\r\n\t".contains(x))
            .map(|x| x.to_lowercase())
            .collect()
    }
}

Still not completely satisfied with this 12th revision, but I suppose it is the best I can deliver for now. The idea of this interpreter is to read whatever thrown to it, then first break it into tokens. These tokens are then converted to actual something, a function if it is a word, also actual numbers. All the built-in words/operators are stored in a hashmap as some sort of dictionary that can be replaced (as defined in the unit tests).

Then after the tokens are converted into rust objects, then the code proceeds to evaluate them, from left to right. In this revision, whenever an error is encountered it is thrown back immediately.

That’s it, I think in total I put in at least 3 months instead of a week to complete the whole core exercises, and still don’t feel comfortable being productive with it. Not sure if I want to pursue the remaining exercises after going through these though. A big thanks to all mentors who helped me throughout the whole learning journey, and hopefully more people sign up as a mentor for rust track in future as the backlog is getting so huge these days.

leave your comment

name is required

email is required

have a blog?

This blog uses scripts to assist and automate comment moderation, and the author of this blog post does not hold responsibility in the content of posted comments. Please note that activities such as flaming, ungrounded accusations as well as spamming will not be entertained.

Pings

Click to change color scheme