Learning a new language for more than 2 months (feat. Exercism)

Anagram

In a lot of ways, I see this problem the same as indexing documents. Just that in this case, we are indexing words. In a lot of document indexing algorithm, words are assumed to be iid, every word is somewhat independent. Therefore, two documents with similar wordings but in different order might end up being indexed similarly.

So in this solution, I break the word into a vector of unicode characters (in string form because Rust doesn’t do magic unicode handling). Just like two documents with different order of words, an anagram is just two words with the same characters in different order. This may not be the most efficient algorithm in solving the exercise, but it is the most intuitive solution to me.

use std::collections::HashSet;

pub fn anagrams_for<'a>(word: &str, possible_anagrams: &[&'a str]) -> HashSet<&'a str> {
    let words = word_vec(&word, &possible_anagrams);
    let dictionary = char_dictionary(&words);

    let encoded = words
        .iter()
        .map(|&x| word_encode(&break_string(&x), &dictionary))
        .collect::<Vec<Vec<usize>>>();

    (1..words.len())
        .filter(|&x| encoded[x as usize] == encoded[0])
        .map(|x| possible_anagrams[x - 1])
        .fold(HashSet::new(), |mut current, incoming| {
            current.insert(&incoming);
            current
        })
}

fn word_encode(broken_word: &Vec<String>, dictionary: &Vec<String>) -> Vec<usize> {
    dictionary
        .iter()
        .map(|x| {
            broken_word
                .iter()
                .filter(|y| y.to_lowercase() == *x)
                .count()
        }).collect()
}

fn word_vec<'a>(word: &'a str, possible_anagrams: &[&'a str]) -> Vec<&'a str> {
    possible_anagrams
        .into_iter()
        .filter(|x| x.to_lowercase() != word.to_lowercase())
        .fold(vec![&word], |mut current, incoming| {
            current.push(incoming);
            current
        })
}

fn char_dictionary(words: &Vec<&str>) -> Vec<String> {
    words
        .into_iter()
        .flat_map(|&x| break_string(x))
        .fold(Vec::new(), |mut current, incoming| {
            if !current.contains(&incoming.to_lowercase()) {
                current.push(incoming.to_lowercase());
            }

            current
        })
}

fn break_string(word: &str) -> Vec<String> {
    let idx: Vec<(usize, char)> = format!("{}\0", word).char_indices().collect();
    let stack: Vec<(usize, char)> = Vec::new();
    let result: Vec<String> = Vec::new();

    word.char_indices()
        .zip(idx[1..idx.len()].iter())
        .enumerate()
        .fold(
            (stack, result),
            |(mut stack, mut result), (j, (start, end))| {
                if end.0 - start.0 == 1 {
                    // we don't know yet, keep it in the stack
                    if let Some(last) = stack.pop() {
                        // pop the previous
                        result.push(last.1.to_string());
                    }

                    if j == word.char_indices().count() - 1 {
                        result.push(start.1.to_string());
                    } else {
                        stack.push(start);
                    }
                } else {
                    // so this is a fat char
                    match stack.pop() {
                        Some(last) => {
                            // merge it with the stack
                            result.push(format!("{}{}", last.1, start.1));
                        }
                        None => {
                            // stack is empty, so just u
                            result.push(start.1.to_string());
                        }
                    }
                }

                (stack, result)
            },
        ).1
}

leave your comment

(note (code cslai))

Learning a new language for more than 2 months (feat. Exercism)

Anagram

post meta

Search

Categories

Pages

Top Posts & Pages

Recent Comments

Meta

Search

Archives

Categories

Pages

Recent Posts

This Day in Archive

Random Posts

browse archive

pages

leave your comment

Pings

(note (code cslai))

Learning a new language for more than 2 months (feat. Exercism)

Anagram

Share this:

post meta

Search

Categories

Pages

Top Posts & Pages

Recent Comments

Meta

Search

Archives

Categories

Pages

Recent Posts

This Day in Archive

Random Posts

browse archive

pages

leave your comment

Pings