## Ratings and Indices: Modeling, Estimation, Computation

The online index calculator is currently suspended. Below are the contents and a brief description of the service.

## Introduction

This educational resource is devoted to the problem of index construction. An index is the most informative description of an object. Given a set of objects, a rating is an ordering of the set according to the objects' indices.
There are many ways to construct indices. However, once the algorithms are chosen and some results are obtained, the following question arises:
How can one show that the calculated indices are adequate?

To answer this question, analysts invite experts. The experts express their opinions, and then a second question arises:
How can one show that the expert estimates are valid?

There is a technique that produces valid indices based on measurement data and expert estimates.

## Contents

The resource has the following sections:

Workbench, where you can calculate your own index using your data and a simple algorithm.

Samples, a library of model and real-world indices and ratings.

Help, a number of comments and articles on how ratings and indices can be created.

Support: do not hesitate to contact the index makers!

Note: this is the first version of the resource. We apologize for any mistakes and would be grateful if you sent us your comments.

## How to fill the template

Before making indices you have to prepare the following information.

1. A list of objects. You can use any names, but they must not be too long.
2. A list of features.
3. A factor for each feature. The factor is either +1 or -1. It shows whether bigger feature values correspond to bigger index values (Factor = 1) or smaller feature values correspond to bigger index values (Factor = -1).
4. A data table with no empty items.
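The four requirements above can be checked before any computation. The sketch below is only an illustration: the function name `validate_template` and the choice to raise `ValueError` are assumptions, not part of the service.

```python
def validate_template(objects, features, factors, table):
    """Check the four inputs listed above; raise ValueError on the first problem."""
    if len(factors) != len(features):
        raise ValueError("each feature needs exactly one factor")
    if any(f not in (1, -1) for f in factors):
        raise ValueError("a factor must be +1 or -1")
    if len(table) != len(objects):
        raise ValueError("the data table needs one row per object")
    for name, row in zip(objects, table):
        if len(row) != len(features):
            raise ValueError(f"row {name!r} has a wrong number of values")
        if any(v is None or v == "" for v in row):
            raise ValueError(f"row {name!r} has an empty item")
    return True


# Two objects and two features taken from the template example in this document.
ok = validate_template(
    ["Object1", "Object3"],
    ["Feature1", "Feature2"],
    [1, -1],
    [[0.69, 7.53], [4.34, 0.43]],
)
```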

Note the following simple facts.

1. Are the objects comparable? If one object has much bigger or smaller values in one or several features than the others, the indices might be unstable: they could change drastically if you exclude this object from the list.
2. Do the features describe the objects completely? You cannot make an index of a person’s intelligence using data on their vital statistics.
3. The factor values are intended to adjust the data to the principle “the bigger the better”. If you make sport indices and use the features Citius, Altius, Fortius (Faster, Higher, Stronger), then the factor values are -1 (sprint, seconds), +1 (high jump, meters), and +1 (weightlifting, kg). The bigger values of the resulting indices will then indicate the better athlete.
4. Data values should be measured on a linear (not ordinal) scale. What does this mean? Consider one feature of the objects. If any arithmetic operation on the object values is meaningful, the values are on a linear scale.

Example one (the feature has a linear scale). The objects are cars and the feature is weight. We can say that Car1 is half as heavy as Car2 if the first has the value 400 and the other has 800.

Example two (the feature has an ordinal scale). The objects are the same and the feature is usability. We can say that Car1 is more convenient than Car2; however, we cannot say that it is twice as convenient.

If you use features like the one in the second example, be very careful about the indices you obtain. For this type of scale there are special algorithms such as Pareto slicing.
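The text does not describe Pareto slicing in detail. One common reading, assumed here, is to peel off successive layers of non-dominated objects, which needs only comparisons and no arithmetic on the values. The function names and the ordinal grades below are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(rows):
    """Peel off successive sets of non-dominated rows; returns lists of row numbers."""
    remaining = list(range(len(rows)))
    layers = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(rows[j], rows[i]) for j in remaining if j != i)]
        layers.append(front)
        remaining = [i for i in remaining if i not in front]
    return layers


# Hypothetical ordinal grades (bigger is better) of four objects on two features.
grades = [[3, 3], [2, 3], [3, 1], [1, 1]]
layers = pareto_layers(grades)  # [[0], [1, 2], [3]]
```

Objects in the same layer are not ranked against each other, which is the price of using only the order of the values.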

The base algorithm for index computation is Principal Component Analysis.
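The factor adjustment from note 3 above can be sketched as a column-wise multiplication. This is an assumption about how the factors are applied, not a description of the service. The athletes' results are made-up illustrative numbers.

```python
def apply_factors(table, factors):
    """Multiply each column by its factor so that bigger always means better."""
    return [[value * factor for value, factor in zip(row, factors)] for row in table]


# Hypothetical results: sprint time (s), high jump (m), weightlifting (kg).
factors = [-1, 1, 1]
table = [
    [10.5, 1.90, 150.0],
    [11.2, 2.05, 140.0],
]
adjusted = apply_factors(table, factors)
# The faster sprinter (10.5 s) now has the bigger first value: -10.5 > -11.2.
```

Under min-max normalization to [0,1], multiplying a column by -1 before normalizing gives the same result as normalizing first and then replacing each value x by 1 - x.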

## Using feature weights

Once you have obtained an index and feature weights from your data table, you can use the weights as a fixed model for index computation. To make the next index based on the same set of objects and the same set of features, do the following.

Denote the data table as an m-by-n matrix A with items $a_{ij}$, where m is the total number of objects and n is the total number of features. Denote the feature weights as $w_1,\dots,w_n$ and the object indices as $q_1,\dots,q_m$.
The index of the i-th object is

$q_i = w_1a_{i1}+w_2a_{i2}+\dots+w_na_{in}.$

This model can be used if the source data have changed only slightly. Its strong side: if the changes affect a small number of objects, the objects whose data did not change keep their indices unchanged. Its weak side: if there were drastic changes, the indices will be incorrect. Sometimes a drastic change occurs even if a single object changes one of its features. To decide whether to keep the weighting model, recompute the indices and weights using the changed data. If the new indices and weights differ too much from the old ones, discard the old model’s weights.
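A minimal sketch of the fixed weighting model follows. It assumes the rows are normalized in the same way as when the weights were obtained. The weights and the two normalized rows are taken from the worked example in the section on the indices computation procedure; the function name is hypothetical.

```python
def weighted_indices(table, weights):
    """q_i = w_1*a_i1 + ... + w_n*a_in for every row of the table."""
    return [sum(w * a for w, a in zip(weights, row)) for row in table]


weights = [0.72, 0.69]        # feature weights from the worked example
table = [[0.8300, 0.0000],    # Obj1, normalized
         [1.0000, 0.3443]]    # Obj3, normalized
q = weighted_indices(table, weights)
# q[0] is about 0.60 and q[1] is about 0.96, matching the indices of Obj1 and Obj3.
```

Because the weights are fixed, changing the row of one object changes only that object's index.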

SUMMARY

• Use this weighting model if there were only small changes in your data and you want to keep most of the object indices unchanged.
• Do not use this model if there were drastic changes in your data.

## File operations

To make an index, take the following steps:

1. Download a template with object description data, or create the template yourself. The example is here.
2. Fill the template with data. The data must be numbers.
3. Upload the template back so that the indices can be calculated.
4. Check your table in the View section and look at the Report section.

## Get a template

There is only one template in the database for now. You can download it in CSV text format. You can also copy the data from the examples.

## Put your file to the index calculator

Click the Browse button to find the file on your computer and upload it. If the file is well-formatted, the system switches from this tab to the View tab.

## Template files example

You can upload either an XLS file or a CSV file. The XLS file has the following format.
The special word “Factor” is reserved. It shows whether bigger feature values correspond to bigger index values (Factor=1) or smaller feature values correspond to bigger index values (Factor=-1). An example XLS table is below. You can select the whole table, copy it, open MS Excel, click the worksheet cell A1, and paste the table.

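| | Feature1 | Feature2 | Feature3 |
|---|---|---|---|
| Factor | 1 | -1 | 1 |
| Object1 | 0,69 | 7,53 | 4,69 |
| Object2 | 0,69 | 7,53 | 7,38 |
| Object3 | 4,34 | 0,43 | 5,69 |
| Object4 | 8,48 | 8,37 | 5,29 |
| Object5 | 5,31 | 6,84 | 4,02 |
| Object6 | 7,08 | 0,96 | 4,02 |
| Object7 | 4,68 | 5,5 | 3,42 |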

A CSV file (Comma Separated Values) is a text file. It contains the following lines.

    , Feature1, Feature2, Feature3
    Factor, 1, -1, 1
    Object1, 0.69, 7.53, 4.69
    Object2, 0.69, 7.53, 7.38
    Object3, 4.34, 0.43, 5.69
    Object4, 8.48, 8.37, 5.29
    Object5, 5.31, 6.84, 4.02
    Object6, 7.08, 0.96, 4.02
    Object7, 4.68, 5.5, 3.42

The data are the same as in the table above. Note that the decimal separators are dots while the value separators are commas. You can select the whole text, copy it, open Notepad or another text editor, and paste it.
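A file in the CSV layout described above can be read with Python's standard `csv` module. This is a minimal sketch under the assumption that the first line holds the feature names and the second line starts with the reserved word "Factor"; the function name `read_template` is hypothetical.

```python
import csv
import io


def read_template(text):
    """Parse template text into (objects, features, factors, table)."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(text)) if row]
    features = rows[0][1:]
    if rows[1][0] != "Factor":
        raise ValueError('the second line must start with the reserved word "Factor"')
    factors = [int(cell) for cell in rows[1][1:]]
    objects = [row[0] for row in rows[2:]]
    table = [[float(cell) for cell in row[1:]] for row in rows[2:]]
    return objects, features, factors, table


# The first lines of the CSV example above.
sample = """, Feature1, Feature2, Feature3
Factor, 1, -1, 1
Object1, 0.69, 7.53, 4.69
Object2, 0.69, 7.53, 7.38
"""
objects, features, factors, table = read_template(sample)
```

Values with decimal commas, as in the XLS example, would fail in `float`, so they would have to be converted to dots first.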

## The indices computation procedure

An index is the most informative description of data. To make an index we use Singular Value Decomposition and Principal Component Analysis. They give the indices that fit the source data most precisely.

The first step is feature normalization. We need it to bring the data to a common scale. The table below shows data measured in two scales, the scale of Feature1 and the scale of Feature2, followed by the normalized values. We cannot use the source scales since the values of Feature2 are more than ten times bigger than the values of Feature1. In that case the index would be dominated by the Feature2 data and would ignore the Feature1 data. So we have to bring both features to the same scale.

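| | Feature1 | Feature2 | Feature1 (normalized) | Feature2 (normalized) |
|---|---|---|---|---|
| Factor | 1 | 1 | | |
| Obj1 | 2.66 | 30.00 | 0.8300 | 0.0000 |
| Obj2 | 1.00 | 44.63 | 0.0000 | 0.4877 |
| Obj3 | 3.00 | 40.33 | 1.0000 | 0.3443 |
| Obj4 | 1.98 | 50.50 | 0.4900 | 0.6833 |
| Obj5 | 1.23 | 60.00 | 0.1150 | 1.0000 |
| Obj6 | 2.49 | 34.55 | 0.7450 | 0.1517 |
| Obj7 | 1.89 | 55.50 | 0.4450 | 0.8500 |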

Let the scale be the segment [0,1]. The minimal value of a feature becomes 0 and the maximal value becomes 1 (so each feature has at least one 0-value and at least one 1-value). The normalization formula is as follows. Denote by $a_{ij}$ the value of the j-th feature of the i-th object. Then the new value is

$\bar{a}_{ij}=\frac{a_{ij}-\min\limits_i{a}_{ij}}{\max\limits_i{a}_{ij}-\min\limits_i{a}_{ij}}$
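The normalization formula can be checked against the table above with a few lines of Python. This sketch assumes that no feature is constant (otherwise the denominator is zero); the function name is hypothetical.

```python
def normalize(table):
    """Rescale every column of the table to the segment [0, 1]."""
    columns = list(zip(*table))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    # Assumes no constant column; max == min would divide by zero.
    return [[(v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs)]
            for row in table]


# Source values of Feature1 and Feature2 for Obj1..Obj7 from the table above.
raw = [[2.66, 30.00], [1.00, 44.63], [3.00, 40.33], [1.98, 50.50],
       [1.23, 60.00], [2.49, 34.55], [1.89, 55.50]]
normalized = normalize(raw)
# normalized[0] is approximately [0.83, 0.0], the normalized row of Obj1.
```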

The figure below shows the objects with normalized feature data. Objects are marked by asterisks. The main idea of the algorithm is as follows. We find a vector called the 1st Principal Component (the red segment in the figure). The projections of the objects onto the 1st PC have the maximal spread. The projections are denoted by dots. The distance from the point (0,0) to an object's projection is the object's index.

The table below represents the object indices.

| Objects | Indices |
|---|---|
| Obj1 | 0.60 |
| Obj2 | 0.34 |
| Obj3 | 0.96 |
| Obj4 | 0.83 |
| Obj5 | 0.77 |
| Obj6 | 0.64 |
| Obj7 | 0.91 |
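As stated in the introduction, a rating is an ordering of the objects by their indices. For the indices above it can be obtained by sorting in descending order. The variable names are illustrative.

```python
# Object indices from the table above.
indices = {"Obj1": 0.60, "Obj2": 0.34, "Obj3": 0.96, "Obj4": 0.83,
           "Obj5": 0.77, "Obj6": 0.64, "Obj7": 0.91}

# Sort the object names by index, the biggest index first.
rating = sorted(indices, key=indices.get, reverse=True)
# rating == ["Obj3", "Obj7", "Obj4", "Obj5", "Obj6", "Obj1", "Obj2"]
```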

The feature weights are very informative. The weight table shows that both features have approximately equal importance. However, other data sets yield clearly different weight values. This information reveals which features have a great (or small) impact on the indices.

| Features | Weights |
|---|---|
| Feature1 | 0.72 |
| Feature2 | 0.69 |
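The weights and indices above can be reproduced approximately with a short sketch. It assumes that the 1st Principal Component is the leading right singular vector of the normalized table taken without centering, which is consistent with measuring the index from the point (0,0). Power iteration stands in here for a library SVD routine, and the function name is hypothetical.

```python
import math


def first_component(table, iterations=200):
    """Leading right singular vector of the table, found by power iteration."""
    n = len(table[0])
    # Gram matrix G = A^T A of the (uncentered) table.
    gram = [[sum(row[j] * row[k] for row in table) for k in range(n)]
            for j in range(n)]
    w = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(gram[j][k] * w[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w


# Normalized values of Obj1..Obj7 from the normalization table.
normalized = [[0.8300, 0.0000], [0.0000, 0.4877], [1.0000, 0.3443],
              [0.4900, 0.6833], [0.1150, 1.0000], [0.7450, 0.1517],
              [0.4450, 0.8500]]
weights = first_component(normalized)
# Each index is the projection of an object's row onto the weight vector.
indices = [sum(w * a for w, a in zip(weights, row)) for row in normalized]
```

Rounded to two decimals, `weights` and `indices` agree with the two tables above.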

## Thanks

This resource is supported by the Department of Mathematical Modeling in Ecology and Medicine of the Dorodnicyn Computing Center of the Russian Academy of Sciences.