Number crunching through clustering

Scatter

Separating your data into buckets is useful in many problems, especially fraud detection. How do you mathematically ‘cluster’ your data? One statistical method is K-means clustering.

Without delving too far into the statistics, here is a spreadsheet you can use to do this for your own data.

This sheet accepts pairs of values for two variables, for example the
age of each person in a group versus the number of sick leaves that
person applied for in a year. It then sorts the data into buckets, the
number of buckets being specified by you. Once the sheet gives you the bucket classification, you can analyse it for problems. The following cases are worth further attention:

  1. too many data points falling in a single group
  2. only one or two data points in a single group
  3. any point that does not belong to the group it is in (this is only
    possible if the data has a subjective background)
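
If you would rather script this than use the spreadsheet, the bucketing and the first two checks in the list above can be sketched in Python. This is a minimal sketch of plain K-means (Lloyd's algorithm), not a description of how the spreadsheet works internally. The function names, the sample data, the farthest-first choice of starting centroids, and the `max_share` and `min_size` thresholds are all assumptions made for illustration.

```python
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two (x, y) pairs."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def initial_centroids(points, k):
    """Assumed starting rule: first point, then repeatedly the point
    farthest from the centroids chosen so far (keeps runs repeatable)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain K-means on (x, y) pairs. Returns (labels, centroids)."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the bucket of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # nothing moved, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty bucket keeps its old centroid
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def flag_buckets(labels, k, max_share=0.6, min_size=2):
    """Return (crowded, sparse) bucket ids: case 1 and case 2 above."""
    sizes = Counter(labels)
    total = len(labels)
    crowded = [c for c in range(k) if sizes[c] / total > max_share]
    sparse = [c for c in range(k) if 0 < sizes[c] <= min_size]
    return crowded, sparse


# Made-up (age, sick leaves in a year) pairs, for illustration only.
points = [(25, 2), (27, 3), (26, 1), (29, 2), (28, 3),
          (45, 4), (47, 5), (46, 3), (48, 4), (31, 15)]

labels, centroids = kmeans(points, k=3)
crowded, sparse = flag_buckets(labels, k=3)
print(labels)           # bucket id for each input pair, in input order
print(crowded, sparse)  # buckets worth a closer look
```

On this made-up data the person aged 31 with 15 sick leaves ends up alone in a bucket, which is exactly the second case in the list. Note that K-means works on raw distances, so if your two variables are on very different scales you may want to rescale them first.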

This method can be used to sample data for further analysis wherever
there is simply too much data to analyse in full. In our example, it can
be used to isolate people who may be feigning sickness to take leave. In
a test taken by people with different levels of capability, it can be
used to grade the scores. It may also help with the needle-in-a-haystack problem.
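
One way to read the sampling idea is to pick a handful of records from every bucket, so that even the small, unusual buckets are represented in what you review. The sketch below assumes you already have a bucket label for each record (from the spreadsheet or elsewhere); the function name `sample_per_bucket`, the sample records, and the labels are hypothetical and only there for illustration.

```python
import random
from collections import defaultdict


def sample_per_bucket(records, labels, per_bucket=2, seed=0):
    """Pick up to `per_bucket` records at random from each bucket."""
    buckets = defaultdict(list)
    for record, label in zip(records, labels):
        buckets[label].append(record)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return {
        label: rng.sample(members, min(per_bucket, len(members)))
        for label, members in sorted(buckets.items())
    }


# Made-up (age, sick leaves in a year) pairs and assumed bucket labels.
records = [(25, 2), (27, 3), (26, 1), (29, 2), (28, 3),
           (45, 4), (47, 5), (46, 3), (48, 4), (31, 15)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 2]

sample = sample_per_bucket(records, labels)
for label, picked in sample.items():
    print(label, picked)
```

Sampling per bucket rather than from the whole data set means a bucket with a single record, such as the one holding (31, 15) here, is always included in the review.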
