An Introduction to Statistics with Klong

Linear Regression

Here is another x/y set with values rounded to two decimal places. We know that the variables are probably correlated, because of the way in which the set has been created.

    XY::(!30),'rndn(;2)'err(20;20;0.15*!30)
[[ 0 0.0 ] [ 1 4.27] [ 2  1.48] [ 3 -4.26] [ 4 -5.28]
 [ 5 0.75] [ 6 0.9 ] [ 7 -1.89] [ 8 -5.27] [ 9  1.35]
 [10 0.91] [11 1.65] [12  2.39] [13  4.3 ] [14  5.63]
 [15 1.66] [16 0.64] [17  0.2 ] [18  4.46] [19 -3.62]
 [20 0.06] [21 7.27] [22  9.77] [23 -2.43] [24  4.78]
 [25 1.99] [26 3.9 ] [27  3.46] [28  7.73] [29  4.35]]

Fig.9: scatter plot of the XY set

The scatter plot in fig.9 also suggests a correlation. The lreg function of nstat can be used to compute the slope and intercept coefficients of a regression line through the x/y set:

    lreg(XY)
[0.201581757508342594 -1.21793548387096761]
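
For readers who want to see where such coefficients come from, here is a minimal sketch of an ordinary least-squares fit written out by hand. It is not the nstat source; the names avg, slope, and icept are made up for this illustration, and it assumes that XY is the set defined above.

```klong
:"split the pairs into separate X and Y vectors"
X::*'XY
Y::{x@1}'XY
:"arithmetic mean of a vector (hypothetical helper)"
avg::{(+/x)%#x}
:"slope: sum of deviation products over sum of squared X deviations"
slope::(+/(X-avg(X))*Y-avg(Y))%+/(X-avg(X))^2
:"intercept: the fitted line passes through the point of means"
icept::avg(Y)-slope*avg(X)
slope,icept
```

If lreg performs an ordinary least-squares fit, the last expression should agree with the result of lreg(XY) up to rounding.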

The function returns a tuple containing the slope and intercept values of the regression line, but these details do not have to be memorized, because the lr function will take care of them. While lreg fits a model to the data, lr uses the model to predict values of the Y variable of the x/y set given values of the X variable. In fig.10, lr is used to plot the regression line through the set. Its parameters are an independent variable and a linear regression model delivered by lreg.


Fig.10: regression line through the XY set
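
As a small illustration of how the model is used, the following sketch predicts y for a few x values with lr and then repeats one prediction by hand. It assumes that XY is the set defined above and that the tuple is ordered slope first, intercept second, as described in the text.

```klong
:"L is the [slope intercept] model fitted by lreg"
L::lreg(XY)
:"predict y at a single x"
lr(10;L)
:"predict y at several x values, one call per value"
{lr(x;L)}'[0 10 20]
:"the prediction at x=10 spelled out: slope*x + intercept"
(10*L@0)+L@1
```

The first and the last expression should print the same value if lr evaluates the line as assumed here.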

The nstat module provides two measures of the relationship between two variables: the covariance (cov) and the normalized correlation coefficient (Pearson's r, cor). Both expect each variable as a separate data set:

    cov(*'XY;{x@1}'XY)
15.1018333333333333
    cor(*'XY;{x@1}'XY)
0.201581757508342604
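
The two measures can also be written out by hand, which may make clearer what is being computed. The sketch below is an illustration, not the nstat source; avg, pcov, and pcor are hypothetical names. Dividing by n rather than n-1 is an assumption, chosen because it matches the covariance printed above.

```klong
X::*'XY
Y::{x@1}'XY
:"arithmetic mean of a vector (hypothetical helper)"
avg::{(+/x)%#x}
:"population covariance: mean product of the deviations"
pcov::{avg((x-avg(x))*y-avg(y))}
:"Pearson's r: covariance divided by both standard deviations"
pcor::{pcov(x;y)%(pcov(x;x)*pcov(y;y))^0.5}
pcov(X;Y)
pcor(X;Y)
```

Because Pearson's r is normalized by both standard deviations, it always lies between -1 and 1. For a straight-line fit, its square equals the coefficient of determination.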

Given a model, such as a linear regression, there are several ways to examine the quality of its predictions. The nstat module provides the following: the residual sum of squares (RSS), the residual standard error (RSE), the mean squared error (MSE), and the coefficient of determination (r2):

    L::lreg(XY)
[0.201581757508342594 -1.21793548387096761]
    rss(XY;lr(;L))
304.98152685205783
    rse(XY;lr(;L))
3.30033292071777115
    mse(XY;lr(;L))
10.1660508950685943
    r2(XY;lr(;L))
0.230445406440760124
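
The four measures are closely related, as the following sketch shows. It is an illustration using textbook definitions, not the nstat source; the names R, n, avg, RSS, MSE, RSE, and R2 are made up here. The definitions are consistent with the values printed above: 304.98/30 is about 10.166, and the square root of 304.98/28 is about 3.300.

```klong
X::*'XY
Y::{x@1}'XY
L::lreg(XY)
:"residuals: observed y minus predicted y"
R::Y-{lr(x;L)}'X
n::#XY
avg::{(+/x)%#x}
:"residual sum of squares"
RSS::+/R^2
:"mean squared error: RSS averaged over all n points"
MSE::RSS%n
:"residual standard error: n-2 degrees of freedom for a line"
RSE::(RSS%n-2)^0.5
:"coefficient of determination: 1 - RSS / total sum of squares"
R2::1-RSS%+/(Y-avg(Y))^2
```

The divisor n-2 in RSE reflects the two parameters (slope and intercept) estimated from the data.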
 
