I was asked to evaluate fuzzy c-means to find out whether it is a good clustering algorithm for my MPhil project. So I spent the whole afternoon reading through some tutorial to get some basic understanding. Then I thought why not implement it in Clojure because it doesn’t look too complicated (I was so wrong…).
I based my implementation on the tutorial posted here. However I still don’t understand how the terminate condition is met so I am setting the function to loop only like 10 times for now (until I understand it better). Added the objective function, loop will stop when the next result is higher than the current result.
My current code for fuzzy c-means:
(ns sensei.clustering.fcm)
(use 'clojure.set
'[incanter core stats])
(defn fcm[data cluster-count fuzziness-index]
(let [random-points (take
cluster-count
(repeatedly
#(matrix (take
(count (first data))
(repeatedly rand)))))
degree-of-membership (fn [point centroids cluster-index]
(let [power #(pow % (/ 2 (- fuzziness-index 1)))]
(/ 1 (apply
+
(map
#(power (/ (euclidean-distance point (nth centroids cluster-index))
(euclidean-distance point %)))
centroids)))))
fuzzy-membership (fn [centroids]
(map
(fn [point]
(map
#(degree-of-membership point centroids %)
(range cluster-count)))
data))
cluster-membership (fn [cluster-index membership]
(map #(nth % cluster-index) membership))
new-centroid (fn [members]
(div (apply
plus
(map
#(mult (pow (nth members %) fuzziness-index) (nth data %))
(range (count members))))
(apply
+
(map
#(pow (nth members %) fuzziness-index)
(range (count members))))))
new-centroids (fn [membership]
(map
#(new-centroid (cluster-membership % membership))
(range cluster-count)))
objective (fn [membership centroids]
(apply
+
(map
(fn [point-index]
(apply
+
(map
#(* (pow (nth (nth membership point-index) %) fuzziness-index)
(pow (euclidean-distance (nth data point-index) (nth centroids %)) 2))
(range cluster-count))))
(range (count membership)))))
cluster (fn re-cluster
([membership centroids] (re-cluster membership centroids (objective membership centroids)))
([membership centroids objective-value]
(let [next-membership (fuzzy-membership centroids)
next-centroids (new-centroids next-membership)
next-objective (objective next-membership next-centroids)]
(if (>= next-objective objective-value)
(cons centroids membership)
(recur next-membership next-centroids next-objective)))))]
(cluster (fuzzy-membership random-points) random-points)))
My attempt to plot a graph showing the relationship between each point and clusters. Only lines that shows the most degree of membership are shown in this example
(ns sensei.core)
(use 'sensei.clustering.fcm
'[incanter core charts stats])
(let [data (take
500
(repeatedly
#(matrix (take
2
(repeatedly rand)))))
[centroids & membership] (fcm data 5 5)
chart (scatter-plot
(map #(nth % 0) data)
(map #(nth % 1) data))]
(view (reduce
(fn [chart point-index]
(let [point-membership (nth membership point-index)
max-membership (apply max point-membership)]
(reduce
(fn [chart cluster-index]
(if (= max-membership (nth point-membership cluster-index))
(add-lines
chart
(vector (nth (nth centroids cluster-index) 0) (nth (nth data point-index) 0))
(vector (nth (nth centroids cluster-index) 1) (nth (nth data point-index) 1)))
chart))
chart
(range (count centroids)))))
chart
(range (count data)))))
Example output: