Cluster Analysis: see it 1st

Cluster analysis stayed inside academic circles for a long time, but the recent “big data” wave made it relevant to BI, Data Visualization and Data Mining users, because big data sets are in many cases just an artificial union of big data subsets that are almost unrelated to each other.

Cluster analysis can usually be defined as a method to find groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups. The main reason to use such a method is to reduce the size of large data sets! Some people confuse clustering with classification, segmentation, partitioning or the results of queries; that is a mistake.

Clustering can be ambiguous, as in the picture below, and the result depends on the type of clustering (e.g. partitional, separated, center-based, contiguous, density-based, hierarchical) and on the algorithm (e.g. K-means).

The most popular approach is partitional K-Means clustering, where each cluster is associated with a centroid (center point), each point is assigned to the cluster with the closest centroid, and the number of clusters (which is K!) must be specified in advance. The basic algorithm is very simple:

Select K points as the initial Centroids
REPEAT
    Form K clusters by assigning all points to the closest Centroid
    Recompute the Centroid for each cluster
UNTIL the Centroids don’t change or all changes are below a predefined threshold
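
The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a production implementation: the function names (`closest`, `mean`, `kmeans`), the `tol`, `max_iter` and `seed` parameters, the use of Euclidean distance, and the rule that an empty cluster keeps its old centroid are all assumptions made for this example.

```python
import math
import random

def closest(point, centroids):
    """Index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, tol=1e-9, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Select K points as the initial centroids
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Form K clusters by assigning every point to the closest centroid
        labels = [closest(p, centroids) for p in points]
        # Recompute the centroid of each cluster
        # (assumption: an empty cluster keeps its previous centroid)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            new_centroids.append(mean(members) if members else centroids[i])
        # Largest distance any centroid moved in this iteration
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        # Stop when all changes are below the predefined threshold
        if shift <= tol:
            break
    # Final assignment against the final centroids
    labels = [closest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical toy data: two well-separated groups of four points each
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, assignment = kmeans(data, k=2)
print(sorted(centers))  # [(0.5, 0.5), (10.5, 10.5)]
```

On data this cleanly separated any choice of initial points ends in the same two clusters; on real data the result can depend on the starting centroids.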

The image below demonstrates the importance of choosing the initial Centroids and shows 6 iterations leading to a successful K-Means clustering:
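
A tiny numeric experiment makes the same point about initial Centroids. The sketch below uses a hypothetical data set of four points at the corners of a 4 x 1 rectangle and a helper named `lloyd` (an assumed name, as is `sq_dist`) that runs the K-Means iterations from explicitly given starting centroids and reports the sum of squared errors (SSE) of the final clustering.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(points, centroids, max_iter=100):
    """Run K-Means iterations from the given initial centroids; return (centroids, SSE)."""
    for _ in range(max_iter):
        # Assign every point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids (an empty cluster keeps its old centroid)
        updated = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    # SSE: squared distance of each point to its nearest final centroid
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse

corners = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start with one centroid on the left side and one on the right side
_, good_sse = lloyd(corners, [(0, 0), (4, 0)])
# Start with both centroids on the left side
_, bad_sse = lloyd(corners, [(0, 0), (0, 1)])

print(good_sse, bad_sse)  # 1.0 16.0
```

Both runs converge, but the second one stops at a stable "top row versus bottom row" split with an SSE sixteen times higher. This is why K-Means is often run several times from different starting points, keeping the result with the lowest SSE.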

The K-Means algorithm is sensitive to differences in cluster size, to the density of datapoints, to non-globular cluster shapes and of course to outliers, but in combination with proper Data Visualization those problems can be solved in most cases.
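
The sensitivity to outliers follows directly from the fact that a centroid is a mean. As a small illustration with made-up numbers (the `centroid` helper is a name chosen for this example), a single distant point drags the centroid far away from the bulk of the cluster:

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

cluster = [(1, 1), (2, 2), (3, 3)]
outlier = (30, 30)

print(centroid(cluster))              # (2.0, 2.0)
print(centroid(cluster + [outlier]))  # (9.0, 9.0)
```

Plotting the points makes such an outlier obvious at a glance, which is one reason visual inspection before clustering pays off.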

Basically, Clustering means maximizing the cohesion within each cluster while maximizing the separation between that cluster and the datapoints outside of it:
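
One common way to put numbers on these two ideas (an assumption of this sketch, since other measures exist) is the within-cluster sum of squares (WSS) for cohesion and the between-cluster sum of squares (BSS) for separation. A lower WSS means tighter clusters and a higher BSS means clusters that sit further apart. The function names and the toy data below are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def cohesion_and_separation(clusters):
    """Return (WSS, BSS) for a list of clusters, each a list of points."""
    all_points = [p for cluster in clusters for p in cluster]
    overall = centroid(all_points)
    # Cohesion: squared distance of every point to its own cluster centroid
    wss = sum(sq_dist(p, centroid(cluster)) for cluster in clusters for p in cluster)
    # Separation: size-weighted squared distance of each centroid to the overall mean
    bss = sum(len(cluster) * sq_dist(centroid(cluster), overall) for cluster in clusters)
    return wss, bss

group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

wss, bss = cohesion_and_separation([group_a, group_b])
print(wss, bss)  # 4.0 400.0
```

For a fixed data set the sum WSS + BSS is constant (here 404.0, the total sum of squares around the overall mean), so reducing WSS and increasing BSS are two views of the same goal.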
