home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Archive Magazine 1995
/
ARCHIVE95.iso
/
discs
/
pipeline
/
6_12
/
ReadMe
< prev
next >
Wrap
Text File
|
1993-07-25
|
10KB
|
224 lines
%OP%VS4.13 (28-Apr-92), Gerald L Fitton, R4000 5966 9904 9938
%OP%DP0
%OP%IRY
%OP%PL0
%OP%HM0
%OP%FM0
%OP%BM0
%OP%LM4
%OP%PT1
%OP%PDPipeLine
%OP%WC1026,2262,648,1748,0,0,0,0
%CO:A,72,72%
%C%Curve Fitting
%C%by Gerald L Fitton
Keywords:
Best Fit Fitton
Introduction
For over a year now I have been asked to explain how to use the
PipeDream functions linest(y,x) and trend({c,m},x). I kept promising
myself that I ought to get around to it but it has taken a problem sent
to me by John Nottage to finally make my mind up that now the time has
surely come.
I have decided that, in addition to describing the 'line of best fit'
functions linest(y,x) and the trend({c,m},x) function, I could use the
opportunity to show you how using Arrays and Names often make formulae
much more usable and readable.
Best Straight line
Let's start by creating the spreadsheet shown in the screen shot,
figureá1, below.
The formula for a straight line is yá=ám*xá+ác. The parameter c is
called 'the constant term' and m is called 'the gradient'. In the
range A12A17 I have inserted half a dozen values of x and, in the range
B12B17, the corresponding values of y have been calculated by putting c
and m (from B6 and B7) into the formula yá=ám*xá+ác. This formula is
used in many engineering and financial applications. An example is
"totalácostá=ámarginalácostáof a unitá*ánumberáofáunitsá+áfixedácost".
The "marginal cost of a unit" is the gradient, m, and in this example,
it is the extra cost of producing one more unit. The "fixed cost", c,
is the cost you incur just getting ready to make some units (such as
machinery) but it doesn't include the "marginal costs" such as raw
material and labour.
Names
In slot B12 I could have typed B$7*A12+B$6 and then replicated it down
through the range B12B17 - but I haven't done that. In my version slot
B12 contains the formula set_value(y,m*x+c). I suggest that my version
makes the spreadsheet much more readable but at the expense of a little
more effort on the part of the writer of the application.
It may help you follow the next paragraph if you refer to the screen
shot, figureá2 below.
The four letters within the set_value(y,m*x+c) function, y, m, x and c,
are all PipeDream Names. If, in the final version of this speadsheet,
you were to place the pointer over the italic f (just to the left of
the tick and cross and to the right of the PipeDream logo - near the
formula line) and click the mouse select (left) button then a sub menu
called 'functions' will open and, at the bottom of the list, you will
see 'Edit name'. If you were to run the pointer through the arrow to
the right of 'Edit name' you would see that there are seven names
defined. These are x, y, y_est, m, c, c_est, m_est. You could run the
pointer through, say, c, and you would see that the definition of the
name c is the content of the slot B6. Similarly you could establish
that the name x is the range of slots A12A17. This range contains the
values of x!
To define a name such as x as the range of slots A12A17 you click on
the italic f, run the pointer through 'Define name', type x in the
dialogue box alongside 'Name:', you type A12A17 into the dialogue box
alongside 'Refers to:' and finally click on the OK box.
The names I have defined are: xáasáA12A17, yáasáB12B17,
y_estáasáD12D17, máasáB7, cáasáB6, c_estáasáD6 and m_estáasáD7.
The set_value(,) function
The function set_value(,) is a simple way of evaluating a function many
times and 'poking' the answers into a range of slots. It is
particularly simple when used with names. Yes, I know it takes time to
define the names, but, for large ranges particularly, it is worth it.
The function in slot B12 is set_value(y,m*x+c). Having defined the
names as slots or ranges of slots set_value(y,m*x+c) calculates all
the y values from m*x+c and 'pokes' the answers into the range of slots
defined as y. What is 'clever' about PipeDream is that each value in
the y range is individually calculated using its corresponding x and
the result 'poked' into the 'correct' y slot.
The set_value(,) formula is written only once and does not have to be
replicated (or copied) down the range of slots. Using names in this
way reduces the size of the PipeDream file in memory (the function
appears in only one slot) and on the disc. Because the file is smaller
the loading time and the time to recalculate is also reduced. I
recommend using names and set_value(,) to you as a way of reducing
memory usage, disc space and speeding up loading, saving and
recalculation.
The linest(y,x) function
In slot D6 I have entered the formula index(linest(y,x),1,1). Let's
deal with linest(y,x) first.
The function linest(y,x) takes as its arguments only two ranges of
values, in our case y and x. Please note that the y range is the first
argument. In our simple example x is a single column of values, a more
complex version (which we'll deal with another day) uses one y range
and a many column x range! When used in this simple way linest(y,x)
returns an array of two numbers in a single slot. The two numbers are
the row array {c_est,m_est} where c_est and m_est are the 'constant
term' and the 'gradient' of a straight line which is the 'best'
straight line for the set of points. Strictly it is the 'least squares
regression line of y on x'!
On page 250 of the First Edition of the PipeDreamá4 Reference Guide the
function linest(,,,,) is shown with five arguments. All too often I
have been asked how to use the last three arguments. The answer is
that you can't because, in PipeDream version 4.13 they don't exist and,
as far as I know, they are never likely to. The function linest(y,x)
can take only two arguments but the second argument, x, can consist of
many columns. Let's deal with what is called 'multivariate regression'
another day. For now let it suffice that the arguments stats,
constant_parameters and measured_errors of the First Edition do not and
will not exist.
The index(array,col,row) function
To 'split up' the array returned by linest(y,x) into its two separate
terms, {c_est,m_est}, we need to use the index(array,col,row) function.
The index function I have entered into slot D6 is
index(linest(y,x),1,1). Since linest(y,x)á=á{c_est,m_est},
array, index(linest(y,x),1,1)á=á(index({c_est,m_est},1,1).
The array {c_est,m_est} has only one row and so the last (third)
argument of index({c_est,m_est},1,1) must be 1. The second argument,
1, selects the first element of the array which is c_est. Hence
index(linest(y,x),1,1)á=ác_est.
The value in D6 is our best estimate of the constant term c_est for the
'least squares regression line of y on x'!
Slot D7 contains index(linest(y,x),2,1). The only difference is the
second argument, 2, which selects the gradient, m_est. The value in D7
is our best estimate of the gradient, m_est.
A point you should realise here is that I've 'cheated' by using 'exact'
values of y calculated from the formula yá=ám*xá+ác so that, if I have
the 'right' formulae in column D, then my 'estimated' values for c_est
and m_est are bound to be 'exact' and equal to the values of c and m in
B6 and B7!
The trend({c,m},x) function
From the screen shot, figureá1, you will see that slot D12 contains the
formula set_value(y_est,trend({c_est,m_est},x)) - and we've dealt with
set_value(,). Note that y_est is the name of the range D12D17, so
let's have a look at trend({c_est,m_est},x).
The function trend({c_est,m_est},x) returns a column array when x is a
column (it is in this case) or a row array when x is a row. The column
array returned by trend({c_est,m_est},x) is exactly the same array as
that given by the formula {m_est*xá+ác_est}. We saw how {m*xá+ác}
returned an array when we looked at slot B12. I think that, when I
know the values of c and m (separately), then I would prefer to enter
m_est*xá+ác_est (without the curly brackets) instead of
trend({c_est,m_est},x); it's shorter and it does exactly the same
thing. On another occasion when we have a look at the function
logest(y,x) to produce log linear and log log graphs we shall see that
this second approach has its advantages. The function trend(,) has the
advantage if you don't want to use the values of {c,m} separately. In
those cases you can write trend(linest(y,x),x); and we'll find that we
can't do that with log linear and log log curve fitting.
To summarise where we are up to now. In the spreadsheet of figureá1
there are only five slots containing formula. These are:
B12á=áset_value(y, m*x+c)
C12á=áset_value(C12C17,x)
D12á=áset_value(y_est,trend({c_est,m_est},x))
D6 = index(linest(y,x),1,1)
D7 = index(linest(y,x),2,1)
There are seven PipeDream Names which have been defined. These are:
xá=áA12A17
yá=áb12B17
y_está=áD12D17
má=áB7
cá=áB6
m_está=áD7
c_est =áD6
You can alter the values in B6 and B7 and in the range A12A17 but, for
now, you should change nothing else.
Best Straight line
Up to now the y values in B12B17 have been generated from a formula
which uses the values c and m in B6 and B7.
Place the cursor in slot B12, delete the formula and replace it with
the number 1. Nothing much should happen but the values of y are now
no longer calculated from the formula using c and m from B6 and B7.
Enter a few y values which are different from but approximately the
same as those generated by the formula and, when the sheet has
recalculated, you will find that y and y_est do not quite match. If
you plot the values of y against x you will no longer have a straight
line (unless you're lucky) but plotting y_est against x will give you a
perfect straight line, the 'best' straight line for the x and y values.
Don't save the modified sheet unless you change the name!
Charting
To help you visualise what is going on you might like to try creating a
chart with two sets of data. The first set should use the x and y
values in A12B17 and you can plot it in red. The second set should
use the data in C12D17 and you can plot it in blue.
You won't see the blue points until you modify the y data as described
in the previous paragraph.