Notes on codes, projects and everything

Learning a new language for more than 2 months (feat. Exercism)

Anagram

In a lot of ways, I see this problem the same as indexing documents. Just that in this case, we are indexing words. In a lot of document indexing algorithm, words are assumed to be iid, every word is somewhat independent. Therefore, two documents with similar wordings but in different order might end up being indexed similarly.

So in this solution, I break the word into a vector of unicode characters (in string form because Rust doesn’t do magic unicode handling). Just like two documents with different order of words, an anagram is just two words with the same characters in different order. This may not be the most efficient algorithm in solving the exercise, but it is the most intuitive solution to me.

use std::collections::HashSet;

pub fn anagrams_for<'a>(word: &str, possible_anagrams: &[&'a str]) -> HashSet<&'a str> {
    let words = word_vec(&word, &possible_anagrams);
    let dictionary = char_dictionary(&words);

    let encoded = words
        .iter()
        .map(|&x| word_encode(&break_string(&x), &dictionary))
        .collect::<Vec<Vec<usize>>>();

    (1..words.len())
        .filter(|&x| encoded[x as usize] == encoded[0])
        .map(|x| possible_anagrams[x - 1])
        .fold(HashSet::new(), |mut current, incoming| {
            current.insert(&incoming);
            current
        })
}

fn word_encode(broken_word: &Vec<String>, dictionary: &Vec<String>) -> Vec<usize> {
    dictionary
        .iter()
        .map(|x| {
            broken_word
                .iter()
                .filter(|y| y.to_lowercase() == *x)
                .count()
        }).collect()
}

fn word_vec<'a>(word: &'a str, possible_anagrams: &[&'a str]) -> Vec<&'a str> {
    possible_anagrams
        .into_iter()
        .filter(|x| x.to_lowercase() != word.to_lowercase())
        .fold(vec![&word], |mut current, incoming| {
            current.push(incoming);
            current
        })
}

fn char_dictionary(words: &Vec<&str>) -> Vec<String> {
    words
        .into_iter()
        .flat_map(|&x| break_string(x))
        .fold(Vec::new(), |mut current, incoming| {
            if !current.contains(&incoming.to_lowercase()) {
                current.push(incoming.to_lowercase());
            }

            current
        })
}

fn break_string(word: &str) -> Vec<String> {
    let idx: Vec<(usize, char)> = format!("{}\0", word).char_indices().collect();
    let stack: Vec<(usize, char)> = Vec::new();
    let result: Vec<String> = Vec::new();

    word.char_indices()
        .zip(idx[1..idx.len()].iter())
        .enumerate()
        .fold(
            (stack, result),
            |(mut stack, mut result), (j, (start, end))| {
                if end.0 - start.0 == 1 {
                    // we don't know yet, keep it in the stack
                    if let Some(last) = stack.pop() {
                        // pop the previous
                        result.push(last.1.to_string());
                    }

                    if j == word.char_indices().count() - 1 {
                        result.push(start.1.to_string());
                    } else {
                        stack.push(start);
                    }
                } else {
                    // so this is a fat char
                    match stack.pop() {
                        Some(last) => {
                            // merge it with the stack
                            result.push(format!("{}{}", last.1, start.1));
                        }
                        None => {
                            // stack is empty, so just u
                            result.push(start.1.to_string());
                        }
                    }
                }

                (stack, result)
            },
        ).1
}

leave your comment

name is required

email is required

have a blog?

This blog uses scripts to assist and automate comment moderation, and the author of this blog post does not hold responsibility in the content of posted comments. Please note that activities such as flaming, ungrounded accusations as well as spamming will not be entertained.

Pings

Click to change color scheme