Making scores comparable

How to compare scores rated under different tests, or by different people? Find out, and use the ready spreadsheet to crunch your own numbers.

Assume that we have feedback on team members from two different project managers.

People-> U V W X Y Z
Managers |
A 70 63 82 91 56 77
B 68 60 80 80 55 60

Can we say that W performs better in team A compared to team B? It looks like yes, he works better but analyse the scores a bit deeper. Project manager B has rated all team members lower than manager A. It may be that he is using tighter scoring.

In a different situation, it may be that two teams (of different people) have gone through two different tests, and we want to compare the people against others. Or, a university might want to compare people passed out in 2001 with those passed out in 2009.

The question is, how do we compare people when the scores that we have do not use the same basis.

Bell Curve - CC Attribution License
Bell Curve - Created using HSB Grapher

The answer is: normalization. We fix the mean score, and the degree of deviation that we would like to see. For a 100 marks test, we may want 50 as the score and a 20% deviation. Now, we will compare this with the actual mean and the deviation of each of the sets, and modify the scores as needed. Please refer to the attached spreadsheet which helps you do this.

 

 

 

In the given example, A has a mean of 73, and a deviation of 13. B has a mean of 67 and a deviation of 11. Let us bring both to a mean of 70 and a deviation of 15.

People-> U V W X Y Z
Managers |
A 66.3 58.1 80.4 91 49.9 74.5
B 71.2 60 87.9 87.9 53.1 60

So we note that A is indeed better in project A, but only slightly. Also from the initial figures we might have concluded that U is better in Project A, but actually the reverse is true.

Mathematically this process is called Normalization and is useful in fitting scores to a bell curve. Read more about it here if you are interested. However the spreadsheet attached is sufficient to get you started.

Share

When NOT to normalise the database

database
When talking of Database Normalisation, textbooks often talk of BCNF, fifth and higher normal forms. However, in practice (in large software/ERPs) I have rarely noticed normalisation beyond Third Normal form. In fact, there is a certain degree of redundancy that is desirable.

While doing database design, I believe there are two critical aspects that should be kept in mind but I see ignored in a lot of common software.

The first is the time aspect of data. First – an example from finance. Consider a company having multicurrency invoicing. The tables can be designed as:

INVOICE: InvoiceID, ..., Currency, BaseCurrency, TransactionDate, ...
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

This is a design having no redundancy. On the basis of the three fields in the INVOICE relation, we can always find out the latest row from the CONVERSIONS table having EffectiveDate less than TransactionDate. Hence we can determine the RateMultiplier.

Consider another design:

INVOICE: InvoiceID, …, Currency, BaseCurrency, TransactionDate, RateMultiplier, …
CONVERSIONS: FromCurrency, ToCurrency, EffectiveDate, RateMultiplier

Here, the system determines the value of the RateMultiplier at the time of invoice creation and records it permanently within the INVOICE table itself. To me this would be more mature design. Why? Because a lot of data in the INVOICE table would actually depend on the RateMultiplier: for example the VAT details. If on 1-JAN-2009 we know that the exchange rate is 1.1. However, on 3-JAN-2009 we come to know that the rate was incorrectly recorded. Someone changes the CONVERSIONS table to reflect the new exchange rate, of 1.2. All the details in the INVOICE table for the invoices created between 1-JAN and 3-JAN become inconsistent since the BaseCurrency is now inconsistent with the RateMultiplier.

Now consider an example from HR appraisal systems. A table stores what stage an appraisal process is at for a particular employee. This is then used to decide what access he has.

STAGE_CURRENT: EmpID, Stage

Note that this has no Date, or Year field. An employee is trying to see records for the previous year appraisals, yet is unable to see some of the data, because current appraisal process is still in initial stage.

The next problem is that of storage of “under calculation” fields. For example, consider the training department maintains the scores of each student trained. The test administered is of 100 marks, but has a weightage 40. Proposed design:

SCORES: CandidateID, TestID, Score, Flag

At the time of recording, the Flag is set to N. Thereafter a process runs that multiplies the score by 0.4 and sets the Flag to Y.

In my opinion a better design would be to retain both the scores even though the pre-weightage score is not relevant to the business process, because a process can also terminate in between due to erroneous data being supplied. Hence if the process ends after setting the flag to Y, and before changing the score; or in reverse order: after changing the score and before setting the flag then we end up with inconsistent data. Improved design:

Scores: CandidateID, TestID, Score, WeightedScore

At the time of recording, Score is entered and WeightedScore is set to zero. Thereafter a process runs that multiplies the Score by 0.4 and stores the value in WeightedScore.

The central idea is to retain all information permanently so that even if the process fails, we know what data existed.

Share

Licensing and information about the blog available here.