An Introduction to
Statistics with Klong

Data Sets

A data set is stored in a Klong vector. There are various ways to create data sets. Key them in:

    [30 28 31 30 31 30 31 31 30 31 30 31]

Use the Enumerate or Expand operators:

    5+&10
[5 5 5 5 5 5 5 5 5 5]
    !10
[0 1 2 3 4 5 6 7 8 9]
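
Both operators combine with ordinary arithmetic, which is applied element by element. The following lines are a small sketch of that; the ranges are made up for illustration and are not used elsewhere in the text:

    1+!12
[1 2 3 4 5 6 7 8 9 10 11 12]
    2*!5
[0 2 4 6 8]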

Or create a data set following a probability distribution with dist. For instance, the following program creates a data set of normally distributed data:

    &dist(ndf;7;[-2 2])
[0 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 6]

The dist function itself returns a frequency distribution which can then be expanded to a data set using Expand:

    dist(ndf;7;[-2 2])
[1 3 6 7 6 3 1]

The parameters of dist are the probability density function (PDF) of the desired distribution, the number of different data points to generate, and the desired range of the PDF. The above example creates 7 standard normally distributed data points (using the ndf function) from −2σ to +2σ.
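
As a quick plausibility check, the frequencies in a distribution should add up to the size of the expanded data set. The sketch below uses the literal result shown above in place of a call to dist, so it depends only on built-in operators:

    +/[1 3 6 7 6 3 1]
27
    #&[1 3 6 7 6 3 1]
27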

A data set can be converted (back) to a frequency distribution using the idiom Size-Each Group:

    #'=[0 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 6]
[1 3 6 7 6 3 1]
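
To see how the idiom works, it may help to apply the two operators one at a time. Group collects the positions of equal elements, and Size-Each then counts the positions in each group. The short vectors below are made-up examples:

    =[0 1 1 1 2 2]
[[0] [1 2 3] [4 5]]
    #'it
[1 3 2]

Note that Group only reports values that actually occur, in order of first appearance. A value that is missing from the data set therefore yields no zero entry, so the idiom reproduces the original frequency distribution only when every value occurs at least once and the data set is sorted:

    #'=[0 1 1 3]
[1 2 1]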

Fig.1: histogram plot of a normally distributed data set (links to a larger image and to the Klong program)

Larger data sets are best visualized using the interactive plotter interface. For instance, the following program creates the histogram plot in fig.1 from a frequency distribution.

    X::&dist(ndf;30;[-3 3])
    v.bar(#'=X)

The v.bar function sets up a grid and plots a data set as a bar graph. The Klong program linked below the image does not use the interactive interface, but uses similar instructions for batch plotting.

A normally distributed random error can be added to a data set by using the err function. For instance:

    0.0+&20
[0.0 0.0 0.0 0.0 0.0
 0.0 0.0 0.0 0.0 0.0
 0.0 0.0 0.0 0.0 0.0
 0.0 0.0 0.0 0.0 0.0]
    err(5;4;it)
[ 0.0 1.0 -2.0 0.0 0.0
 -2.0 0.0 -1.0 1.0 0.0
  1.0 1.0  0.0 0.0 0.0
  0.0 1.0  2.0 0.0 1.0]

The first parameter of err specifies the number of distinct error values, taken from the interval −0.5≤x≤0.5 with equal space between them. The lowest value is always −0.5 and the highest value is 0.5. The second parameter is multiplied with the error values, so the above example generates the error values {−2, −1, 0, +1, +2}.
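
The set of possible error values can be reproduced by hand. The lines below are only a sketch of the arithmetic described above, not the way err itself is implemented: five equally spaced values from −0.5 to 0.5, then scaled by 4.

    ((!5)%4)-0.5
[-0.5 -0.25 0.0 0.25 0.5]
    4*it
[-2.0 -1.0 0.0 1.0 2.0]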

An x/y set (also called a paired set or map) is represented by a vector of tuples. It is normally used to pair the values of two random variables that may or may not be correlated. Like a data set, it can be keyed in or created using various operators and functions. A data set can be turned into an x/y set by pairing each value in the set with some other value, e.g.:

    [3 6 7 13 17 17 21]
[3 6 7 13 17 17 21]
    (1+!7),'it
[[1 3] [2 6] [3 7] [4 13] [5 17] [6 17] [7 21]]

An x/y set can be divided into two separate data sets using the First-Each and (At-one)-Each idioms:

    XY::[[1 3] [2 6] [3 7] [4 13] [5 17] [6 17] [7 21]]
    *'XY
[1 2 3 4 5 6 7]
    {x@1}'XY
[3 6 7 13 17 17 21]
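
As an alternative sketch, the built-in Transpose operator (monadic +) splits an x/y set into both data sets at once, which can then be selected by index. This assumes that all tuples have the same length:

    +XY
[[1 2 3 4 5 6 7] [3 6 7 13 17 17 21]]
    (+XY)@1
[3 6 7 13 17 17 21]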

Fig.2: scatter plot of an x/y set with random error added (links to a larger image and to the Klong program)

Larger x/y sets are best visualized as scatter plots. For example, the program

    v.scatter2((!30),'err(20;20;0.5*!30))

will display a scatter plot like the one shown in fig.2. Of course, the actual values plotted by the program may differ from those in the figure due to the random error added.

