I was asked to evaluate fuzzy c-means to find out whether it is a good clustering algorithm for my MPhil project, so I spent the whole afternoon reading through a tutorial to get a basic understanding. Then I thought, why not implement it in Clojure, since it didn't look too complicated (I was so wrong…).
I based my implementation on the tutorial posted here.
However, I still don't understand how the termination condition is met, so for now I am setting the function to loop only about 10 times (until I understand it better). Update: I have added the objective function, and the loop now stops as soon as the next objective value is no lower than the current one.
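For reference, this is how fuzzy c-means is usually stated. The notation is assumed here rather than taken from the tutorial: N data points x_i, C centroids c_j, fuzziness index m > 1, and u_ij the degree of membership of point i in cluster j.

```latex
\begin{align*}
J_m    &= \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{\,m} \, \lVert x_i - c_j \rVert^{2}
        && \text{(objective to minimise)} \\
u_{ij} &= \frac{1}{\displaystyle\sum_{k=1}^{C}
          \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}}
        && \text{(membership update)} \\
c_j    &= \frac{\sum_{i=1}^{N} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}
        && \text{(centroid update)}
\end{align*}
```

The two updates are alternated until some stopping rule is met. A commonly quoted rule is to stop once the largest change in any membership between two iterations falls below a small threshold; stopping when J_m no longer decreases, as described above, is a related alternative.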
My current code for fuzzy c-means:
(ns sensei.clustering.fcm)
(use 'clojure.set '[incanter core stats])

(defn fcm [data cluster-count fuzziness-index]
  (let [;; initial centroids: random points with the same dimension as the data
        random-points (take cluster-count
                            (repeatedly #(matrix (take (count (first data)) (repeatedly rand)))))
        ;; degree of membership of one point in one cluster
        degree-of-membership
        (fn [point centroids cluster-index]
          (let [power #(pow % (/ 2 (- fuzziness-index 1)))]
            (/ 1 (apply + (map #(power (/ (euclidean-distance point (nth centroids cluster-index))
                                          (euclidean-distance point %)))
                               centroids)))))
        ;; one row of memberships per data point
        fuzzy-membership
        (fn [centroids]
          (map (fn [point]
                 (map #(degree-of-membership point centroids %) (range cluster-count)))
               data))
        ;; memberships of every point in a single cluster
        cluster-membership
        (fn [cluster-index membership]
          (map #(nth % cluster-index) membership))
        ;; membership-weighted mean of the data
        new-centroid
        (fn [members]
          (div (apply plus (map #(mult (pow (nth members %) fuzziness-index) (nth data %))
                                (range (count members))))
               (apply + (map #(pow (nth members %) fuzziness-index)
                             (range (count members))))))
        new-centroids
        (fn [membership]
          (map #(new-centroid (cluster-membership % membership)) (range cluster-count)))
        ;; sum of membership-weighted squared distances
        objective
        (fn [membership centroids]
          (apply + (map (fn [point-index]
                          (apply + (map #(* (pow (nth (nth membership point-index) %) fuzziness-index)
                                            (pow (euclidean-distance (nth data point-index) (nth centroids %)) 2))
                                        (range cluster-count))))
                        (range (count membership)))))
        ;; iterate until the objective stops decreasing
        cluster
        (fn re-cluster
          ([membership centroids]
           (re-cluster membership centroids (objective membership centroids)))
          ([membership centroids objective-value]
           (let [next-membership (fuzzy-membership centroids)
                 next-centroids (new-centroids next-membership)
                 next-objective (objective next-membership next-centroids)]
             (if (>= next-objective objective-value)
               (cons centroids membership)
               (recur next-membership next-centroids next-objective)))))]
    (cluster (fuzzy-membership random-points) random-points)))
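The code above stops when the objective stops decreasing. Another stopping rule often quoted for fuzzy c-means compares two successive membership tables and stops once no entry has moved by more than a small threshold. Below is a minimal sketch of that check in plain Clojure (no Incanter). The names `max-membership-change`, `converged?` and `epsilon` are hypothetical and not part of the code above; it assumes both tables are sequences of rows with the same shape.

```clojure
;; Sketch only: an epsilon-based convergence check on membership tables.
;; Both arguments are assumed to be seqs of rows (one row per data point,
;; one entry per cluster) with identical dimensions.
(defn max-membership-change
  "Largest absolute difference between corresponding entries of two tables."
  [old-membership new-membership]
  (apply max
         (map (fn [old-row new-row]
                (apply max (map #(Math/abs (double (- %1 %2))) old-row new-row)))
              old-membership
              new-membership)))

(defn converged?
  "True when no membership value changed by epsilon or more."
  [old-membership new-membership epsilon]
  (< (max-membership-change old-membership new-membership) epsilon))
```

Inside `re-cluster`, a test like `(converged? membership next-membership 0.001)` could replace or complement the objective comparison; the threshold value here is only an illustration.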
My attempt to plot a graph showing the relationship between each point and the clusters. In this example, each point is connected only to the cluster in which it has the highest degree of membership:
(ns sensei.core)
(use 'sensei.clustering.fcm '[incanter core charts stats])

(let [;; 500 random 2-D points
      data (take 500 (repeatedly #(matrix (take 2 (repeatedly rand)))))
      ;; fcm returns the centroids consed onto the membership rows
      [centroids & membership] (fcm data 5 5)
      chart (scatter-plot (map #(nth % 0) data) (map #(nth % 1) data))]
  (view (reduce (fn [chart point-index]
                  (let [point-membership (nth membership point-index)
                        max-membership (apply max point-membership)]
                    ;; draw a line from the point to its strongest cluster
                    (reduce (fn [chart cluster-index]
                              (if (= max-membership (nth point-membership cluster-index))
                                (add-lines chart
                                           (vector (nth (nth centroids cluster-index) 0) (nth (nth data point-index) 0))
                                           (vector (nth (nth centroids cluster-index) 1) (nth (nth data point-index) 1)))
                                chart))
                            chart
                            (range (count centroids)))))
                chart
                (range (count data)))))
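The inner `reduce` finds the strongest cluster by comparing every membership against the maximum. A possible alternative is to look up the index of the largest membership directly. This is a small sketch in plain Clojure; `strongest-cluster` is a hypothetical helper name, and it assumes a non-empty row of numbers.

```clojure
;; Sketch only: index of the largest value in one point's membership row.
(defn strongest-cluster
  [point-membership]
  (first (apply max-key second (map-indexed vector point-membership))))

(strongest-cluster [0.1 0.7 0.2]) ; index 1 holds the largest membership
```

With such a helper, each point would need a single `add-lines` call for the returned index instead of a loop over all clusters.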