🔥Clustering - Density Based Clustering (DBSCAN) - Unsupervised Learning

邱之宇 Cosmo Chiou
3 min read · Nov 5, 2022

1. Introduction

  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
    - One of the most common clustering algorithms.
    - Groups points according to how densely they are packed together.
  • R (Radius of neighborhood)
    - If the neighborhood of radius R around a point contains enough points, we call it a dense area. (We choose R ourselves.)
  • M (Min number of neighbors)
    - The minimum number of data points we want in a neighborhood to define a cluster. (We also choose M ourselves.)
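
To make R and M concrete, here is a minimal sketch in plain Python. The helper names `neighbors` and `is_dense` are made up for illustration, and the sketch assumes Euclidean distance and that a point counts as a member of its own neighborhood (conventions differ on this).

```python
from math import dist  # Euclidean distance, Python 3.8+

def neighbors(points, i, R):
    """Indices of all points within distance R of points[i].
    Assumption: the point itself is included in its own neighborhood."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= R]

def is_dense(points, i, R, M):
    """True if the R-neighborhood of points[i] holds at least M points."""
    return len(neighbors(points, i, R)) >= M

# Four points on a unit square plus one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
R, M = 1.5, 4

print(neighbors(points, 0, R))    # [0, 1, 2, 3]
print(is_dense(points, 0, R, M))  # True
print(is_dense(points, 4, R, M))  # False
```

A larger R or a smaller M makes more areas count as dense, so the two values are usually tuned together.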

2. Category of Point

◼Core Point

  • Core point: Within the R-neighborhood of the point, there are at least M points.

◼Border Point

  • Border point: Its R-neighborhood contains fewer than M data points, but it is reachable from some core point.
  • Reachable: It is within distance R of a core point.

◼Outlier Point

  • Neither a core point nor a border point => outlier point (noise)
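
The three categories can be checked with a few lines of plain Python. This is only a sketch: the function name `classify` is hypothetical, distance is assumed to be Euclidean, and each point is counted as part of its own neighborhood.

```python
from math import dist  # Euclidean distance, Python 3.8+

def classify(points, R, M):
    """Label each point 'core', 'border', or 'outlier'."""
    n = len(points)
    # R-neighborhood of every point (assumption: a point counts itself).
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= R]
            for i in range(n)]
    core = {i for i in range(n) if len(nbrs[i]) >= M}
    labels = []
    for i in range(n):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nbrs[i]):
            # Not dense itself, but within R of a core point.
            labels.append("border")
        else:
            labels.append("outlier")
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # dense square
          (2.0, 1.0),                                      # edge of the square
          (8.0, 8.0)]                                      # isolated point
print(classify(points, R=1.5, M=4))
# ['core', 'core', 'core', 'core', 'border', 'outlier']
```

The point (2.0, 1.0) has only three points in its neighborhood, so it is not a core point, but it sits within R of two core points and therefore becomes a border point.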

3. Steps of DBSCAN

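◼Step1 of DBSCAN

Step1: Label every point as a core point, a border point, or an outlier, using the definitions in section 2.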

◼Step2 of DBSCAN

Step2: Connect core points that are neighbors and put them in the same cluster.

A cluster consists of at least one core point, the core points connected to it, and all border points reachable from them.
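
The whole procedure can be sketched end to end in plain Python. This is an illustrative version, not a reference implementation: it assumes the first step is finding the core points, uses Euclidean distance, counts a point as part of its own neighborhood, and gives a border point that is reachable from two clusters to whichever cluster reaches it first. The name `dbscan` and the `NOISE` marker are choices made for this example.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for outliers (a convention chosen for this sketch)

def dbscan(points, R, M):
    """Return one cluster label per point; outliers get NOISE."""
    n = len(points)
    # Find the core points (assumption: a point counts itself).
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= R]
            for i in range(n)]
    is_core = [len(nb) >= M for nb in nbrs]

    # Connect neighboring core points into one cluster and attach
    # the border points that are reachable from them.
    labels = [NOISE] * n
    cluster = 0
    for seed in range(n):
        if not is_core[seed] or labels[seed] != NOISE:
            continue  # only an unassigned core point starts a new cluster
        labels[seed] = cluster
        queue = deque([seed])  # holds core points only
        while queue:
            i = queue.popleft()
            for j in nbrs[i]:
                if labels[j] == NOISE:
                    labels[j] = cluster
                    if is_core[j]:
                        queue.append(j)  # keep expanding through core points
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),   # first dense group
          (10, 10), (10, 11), (11, 10), (11, 11),   # second dense group
          (5, 20)]                                  # isolated point
print(dbscan(points, R=1.5, M=4))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters is never passed in: two clusters come out of the data for the chosen R and M, and the isolated point stays labeled as noise.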

4. Advantages of DBSCAN

  1. Can find arbitrarily shaped clusters.
  2. Robust to outliers.
  3. Does not require specifying the number of clusters in advance.
