Anagram
In a lot of ways, I see this problem the same as indexing documents. Just that in this case, we are indexing words. In a lot of document indexing algorithm, words are assumed to be iid, every word is somewhat independent. Therefore, two documents with similar wordings but in different order might end up being indexed similarly.
So in this solution, I break the word into a vector of unicode characters (in string form because Rust doesn’t do magic unicode handling). Just like two documents with different order of words, an anagram is just two words with the same characters in different order. This may not be the most efficient algorithm in solving the exercise, but it is the most intuitive solution to me.
use std::collections::HashSet; pub fn anagrams_for<'a>(word: &str, possible_anagrams: &[&'a str]) -> HashSet<&'a str> { let words = word_vec(&word, &possible_anagrams); let dictionary = char_dictionary(&words); let encoded = words .iter() .map(|&x| word_encode(&break_string(&x), &dictionary)) .collect::<Vec<Vec<usize>>>(); (1..words.len()) .filter(|&x| encoded[x as usize] == encoded[0]) .map(|x| possible_anagrams[x - 1]) .fold(HashSet::new(), |mut current, incoming| { current.insert(&incoming); current }) } fn word_encode(broken_word: &Vec<String>, dictionary: &Vec<String>) -> Vec<usize> { dictionary .iter() .map(|x| { broken_word .iter() .filter(|y| y.to_lowercase() == *x) .count() }).collect() } fn word_vec<'a>(word: &'a str, possible_anagrams: &[&'a str]) -> Vec<&'a str> { possible_anagrams .into_iter() .filter(|x| x.to_lowercase() != word.to_lowercase()) .fold(vec![&word], |mut current, incoming| { current.push(incoming); current }) } fn char_dictionary(words: &Vec<&str>) -> Vec<String> { words .into_iter() .flat_map(|&x| break_string(x)) .fold(Vec::new(), |mut current, incoming| { if !current.contains(&incoming.to_lowercase()) { current.push(incoming.to_lowercase()); } current }) } fn break_string(word: &str) -> Vec<String> { let idx: Vec<(usize, char)> = format!("{}\0", word).char_indices().collect(); let stack: Vec<(usize, char)> = Vec::new(); let result: Vec<String> = Vec::new(); word.char_indices() .zip(idx[1..idx.len()].iter()) .enumerate() .fold( (stack, result), |(mut stack, mut result), (j, (start, end))| { if end.0 - start.0 == 1 { // we don't know yet, keep it in the stack if let Some(last) = stack.pop() { // pop the previous result.push(last.1.to_string()); } if j == word.char_indices().count() - 1 { result.push(start.1.to_string()); } else { stack.push(start); } } else { // so this is a fat char match stack.pop() { Some(last) => { // merge it with the stack result.push(format!("{}{}", last.1, start.1)); } None => { // stack is empty, so just u result.push(start.1.to_string()); } } } (stack, result) }, ).1 }