# Calculating the index

Some thought needs to go into how the index is calculated, and that in turn should be driven by how we want the index to behave. Since we are going to track ‘gone-up’, ‘gone-down’ or ‘stayed-the-same’ for each of the questions, one obvious choice is a diffusion index type methodology (i.e., index value = % gone-up + 50% × % stayed-the-same) or some variation of the same.
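As a rough illustration of the diffusion calculation, here is a minimal Python sketch. It assumes the index is computed from the percentage of respondents giving each answer; the function name and the `'up'`, `'same'`, `'down'` labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def diffusion_index(responses):
    """Return a diffusion index in the 0-100 range.

    `responses` is a list of labels: 'up', 'same' or 'down'
    (hypothetical labels used only in this sketch).
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    pct_up = 100.0 * counts["up"] / len(responses)
    pct_same = 100.0 * counts["same"] / len(responses)
    # 'gone-up' counts in full, 'stayed-the-same' counts half,
    # 'gone-down' contributes nothing.
    return pct_up + 0.5 * pct_same


print(diffusion_index(["up", "up", "same", "down"]))  # 62.5
```

If everyone answers ‘gone-up’ the result is 100, and if everyone answers ‘gone-down’ it is 0.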

Another straightforward alternative is to give a weight or assign a numerical score to each type of response, and create an average (weighted or simple) of the responses to arrive at a single numerical score that will be the index. Conceptually this isn’t too different from the diffusion index methodology.

### Range bound vs. random walk

The issue with both of the above approaches is that the index will tend to stay range bound, i.e., theoretically it would always be in the 0-100 range (if diffusion), or within some similarly fixed bounds. These approaches provide a comparison to the previous month alone, and do not provide the kind of perspective we need over time. In other words, these indices are relative, and only to the previous month; they do not provide an absolute perspective.

For example, consider the following extreme sequence of events:

- In December 2010 100% of the respondents say ‘gone-up’ to all the questions. The index will be +100.
- Then for the next 6 months (Jan – Jun 2011) 100% of the respondents say ‘gone-down’ to all the questions. The index will show as 0 for each of the six months Jan-Jun.
- In July 2011, if all 100% say ‘gone-up’, the index will flip to +100.

The issue is that the +100 in July 2011 is not the same as the +100 in Dec 2010, because the 6 months of continued decline are ignored in the July 2011 index value.

Comparing to a stock index, this would be akin to seeing the monthly returns, and not the absolute level of the index. In other words, consecutive monthly returns of -20%, +15%, -10%, 0%, +20%, -5% amount to index values (assuming a starting value of 100) of 80, 92, 82.8, 82.8, 99.36, 94.39 respectively. The question we need to address is whether we want the -20%, +15%, -10%, 0%, +20%, -5% to be the index, or whether it should be the absolute values of 80, 92, 82.8, 82.8, 99.36, 94.39, which allow much better comparison over time.
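The conversion from monthly returns to index levels is a running product. The sketch below reproduces the numbers in the stock index analogy; the function name and the default starting value of 100 are assumptions made for this example.

```python
def levels_from_returns(returns, start=100.0):
    """Compound a list of periodic returns into index levels."""
    levels = []
    level = start
    for r in returns:
        level *= 1.0 + r  # discrete compounding, one period at a time
        levels.append(level)
    return levels


monthly_returns = [-0.20, 0.15, -0.10, 0.0, 0.20, -0.05]
print([round(x, 2) for x in levels_from_returns(monthly_returns)])
# [80.0, 92.0, 82.8, 82.8, 99.36, 94.39]
```

Note that the 0% month leaves the level unchanged, so 82.8 appears twice.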

We prefer the latter. To continue our example of Dec being 100, Jan-Jun 2011 being 0, and July being 100 as a diffusion index, and assuming we have set ‘gone-up’ = +15%, ‘stayed-the-same’ = 0%, and ‘gone-down’ = -15%, the absolute values would be 100 (beginning Dec), 115 (Dec end), 97.75 (Jan end), 83.09 (Feb end), 70.62 (Mar end), 60.03 (Apr end), 51.03 (May end), 43.37 (Jun end), 49.88 (Jul end).

This is more useful; one can see that the cumulative index value at the end of July is about half of what it was at the beginning of Dec.
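As a quick check on the end-of-July figure, and assuming simple (discrete) compounding of one +15% month, six -15% months and one more +15% month from a starting value of 100:

$$100 \times 1.15 \times 0.85^{6} \times 1.15 \approx 49.88$$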

We would also make an additional refinement at this point. Assume we start a month with an index value of 100. We get one month of ‘gone-down’ followed by an immediately following month of ‘gone-up’. We would like to come back to a value of 100 if this happens, as the ‘gone-down’ has been offset by the ‘gone-up’.

But with discrete rates the index will actually go from 100 to 85 [=100*(1 - 15%)], and then to 97.75 [=85*(1 + 15%)]. That is not the result we want, because we want the index to come back up to 100. Therefore, we use continuously compounded rates and not discrete rates. If we do that, the index will go from 100 to 86.07 [=100*e^(-0.15), or 100*exp(-0.15) in Excel] and back to 100 [=86.07*exp(0.15)]. If you are still reading, thank you.
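The difference between the two conventions can be checked in a few lines. This is a sketch using the illustrative 15% step from the text; the variable names are hypothetical.

```python
import math

STEP = 0.15    # the illustrative 15% move used in the text
start = 100.0

# Discrete compounding: a down month followed by an up month.
discrete_down = start * (1 - STEP)
discrete_back = discrete_down * (1 + STEP)

# Continuous compounding: the same two moves applied as exponents.
continuous_down = start * math.exp(-STEP)
continuous_back = continuous_down * math.exp(STEP)

print(round(discrete_down, 2), round(discrete_back, 2))      # 85.0 97.75
print(round(continuous_down, 2), round(continuous_back, 2))  # 86.07 100.0
```

Only the continuously compounded version returns to the starting value after offsetting moves.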

So the decision points are:

- Should the index be relative or absolute?
- What arbitrary percentages (the 15% in the above) do we assign to up moves and down moves?
- How many levels of up and downs do we create? Is a 5-level (Likert) scale better than a 3-level one?
- Are the questions weighted equally, or is there an argument for differential weighting?

The ‘absolute’ version of the index provides another advantage: the index is no longer predictable over time and follows a random walk based on a stationary process. That makes it possible, at least in principle, to create a tradeable security on it whose price can be anything, as with real securities. Each month’s ‘returns’ (i.e., the responses) are the stationary process, and their integration into the higher-level index is calculated as:

$$ICS_t = ICS_{t-1} \times e^{s_t}$$

where $s_t$ represents the month’s responses converted to a percentage (explained below), and $ICS_t$ is the ‘Index of Cyber Security’ at time $t$.

A higher index value indicates a perception of increasing risk, while a lower index value indicates the opposite.
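As a hedged aside, if the update rule is the continuously compounded one described earlier (each month multiplies the previous value by $e^{s_t}$), then unrolling it from the starting value gives a closed form. The symbol $ICS_0$ for the starting value is notation introduced here for illustration:

$$ICS_t = ICS_{t-1}\, e^{s_t} = ICS_0 \exp\!\left(\sum_{i=1}^{t} s_i\right)$$

Because the monthly scores add up inside the exponent, a month of $-x$ followed by a month of $+x$ leaves the index exactly where it started, which is the offsetting behaviour asked for above.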

## Planned approach

1. Weight each question equally and, where a question has sub-questions, divide that question's weight equally between them.

2. Use a Likert (5-level) scale; assign a number to each of the possible answers as follows. These may be recalibrated after the first few months of the index.

- -20% (gone down significantly)
- -7.5% (gone down)
- 0% (stayed the same)
- +7.5% (gone up)
- +20% (gone up significantly)

3. Add up the scores from each of the questions and divide by the number of questions to obtain a single score for a respondent.

4. Average all responses received to arrive at a single score for the month, say $s_t$.

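5. Calculate the new index value as $ICS_t = ICS_{t-1} \times e^{s_t}$, i.e., using the continuously compounded rate described earlier.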

6. The initial value of the index at t=0 will be considered as 1000.

7. Sub-indices, or information on constituent questions, will be released together with the main index.
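The steps above can be sketched end to end in Python. This is an illustration under stated assumptions, not a definitive implementation: the answer labels, function names and sample survey are hypothetical; every question is assumed to have equal weight with no sub-questions; and step 5 is assumed to be the continuously compounded update $ICS_t = ICS_{t-1} \times e^{s_t}$.

```python
import math

# Scores from step 2 of the plan; the dictionary keys are hypothetical labels.
SCORES = {
    "down_significantly": -0.20,
    "down": -0.075,
    "same": 0.0,
    "up": 0.075,
    "up_significantly": 0.20,
}

INITIAL_INDEX = 1000.0  # step 6: value of the index at t=0


def respondent_score(answers):
    """Step 3: simple average of one respondent's question scores."""
    return sum(SCORES[a] for a in answers) / len(answers)


def monthly_score(all_answers):
    """Step 4: average the respondent scores to get the month's s_t."""
    return sum(respondent_score(a) for a in all_answers) / len(all_answers)


def next_index(previous, s_t):
    """Step 5 (assumed form): continuously compounded update."""
    return previous * math.exp(s_t)


# Made-up survey: two respondents, four equally weighted questions each.
survey = [
    ["up", "up", "same", "down"],
    ["up_significantly", "up", "up", "same"],
]

s_t = monthly_score(survey)
index = next_index(INITIAL_INDEX, s_t)
print(round(s_t, 6), round(index, 2))
```

With this made-up survey the monthly score is positive, so the index rises above 1000, signalling a perception of increasing risk.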