Linearizing Exponential Data; Introduction to Polynomials

Class Notes Precalculus I, 11/05/98

The first quiz problem was to obtain a model of the form y = A b^x for the two (x, y) data points (8, 12), (15, 9).

Substituting the values of x and y from the two data points, we obtain the equations 12 = A b^8 and 9 = A b^15.

We could solve these equations by any of several methods.

We could for example solve the first equation for A and substitute the resulting expression into the second equation.

We would then solve the resulting equation for b and substitute that value back into the first equation, which we could then solve for A.

Or we could solve one of the equations for b and substitute the result into the second, and then proceed similarly.

In this case we will divide the first equation by the second; it will quickly become apparent that this strategy will eliminate A from the system.

The steps are shown below.

Between the fourth and fifth lines, we see that A / A = 1, so A b^8 / (A b^15) = b^-7.

We then take the -1/7 power of both sides to obtain b.

We proceed to substitute this value of b into the first of the original equations and solve for A.

By straightforward steps, shown below, we obtain A = 16.6 (approximately).

We thus have A = 16.6 and b = .96.

It follows that our model is y(x) = A b^x = 16.6 (.96) ^ x.
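The elimination can be checked numerically. The following is a minimal Python sketch, not part of the original quiz solution; the variable names are chosen here for illustration.

```python
# Two (x, y) data points from the quiz problem
x1, y1 = 8, 12
x2, y2 = 15, 9

# Dividing y1 = A b^x1 by y2 = A b^x2 eliminates A:
#   y1 / y2 = b^(x1 - x2),  so  b = (y1 / y2)^(1 / (x1 - x2))
b = (y1 / y2) ** (1 / (x1 - x2))

# Substitute b into the first equation, y1 = A b^x1, and solve for A
A = y1 / b ** x1

print("b =", round(b, 2))
print("A =", round(A, 1))
```

Carrying b at full precision gives A closer to 16.7; rounding b to .96 before solving for A gives the 16.6 used in these notes.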

The quiz asked us to determine the value of y when x = 30 and of x when y = 100.

To obtain the value of y when x = 30, we evaluate y(30), obtaining an approximate value of 5, as shown below.

To obtain the value of x when y = 100, we first substitute 100 for y to obtain the equation 100 = 16.6 * .96^x (incorrectly written below as y = 16.9 * .96 ^ x).

To solve this equation for x, we first divide both sides by 16.6 to obtain .96 ^ x = 6.0 (approximately; dividing by the miscopied 16.9 would instead give 5.9).

We then take a base .96 logarithm of both sides to obtain x = log (6.0) / log(.96), which we can easily evaluate using a calculator (the result is approximately -44).
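Both quiz questions can be checked with a short Python sketch. It assumes the rounded parameters A = 16.6 and b = .96 from these notes; the names y, y_30 and x_100 are chosen here for illustration.

```python
import math

A, b = 16.6, 0.96  # rounded model parameters from the notes

def y(x):
    """Exponential model y = A * b^x."""
    return A * b ** x

# Value of y when x = 30
y_30 = y(30)

# Value of x when y = 100: solve 100 = A * b^x.
# Dividing by A gives b^x = 100 / A, so x = log(100 / A) / log(b).
x_100 = math.log10(100 / A) / math.log10(b)

print("y(30) =", round(y_30, 1))
print("x when y = 100:", round(x_100, 1))
```

The solution for x is negative. This makes sense because the model is decreasing and 100 is greater than A, the value of the model at x = 0.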

Video file #01

http://youtu.be/oixX06DucEI

Fitting an exponential function to data

As we have seen previously, we can fit an exponential function to two data points if we know the value the function approaches as an asymptote.

However, even if we know what the asymptote should be, if we have a large number of data points we must still choose two points to represent all the data.
Our model is therefore subject to our choice, as well as to errors in the data.
When we choose only two points, we do not effectively average the behavior of all the points.

A better approach would be to fit a function to all the data, as we have done using DERIVE with linear, quadratic and power function data.
However, DERIVE does not allow us to use a fit function that has the variable in the exponent (or inside most other functions); for the most part it is limited to fit functions made up of sums of power functions.

Consider the y vs. x data below.

We can see that the y values, before they were scratched out, had a common ratio of 2.
Since the x values changed by 1 each time, this tells us that y is in fact an exponential function with base 2.

We have replaced the scratched-out values with values which are close to, but not equal to, those values.

Since the 'new' y data are nearly exponential, we expect that we can find an exponential function which does a good job of approximating those values.

If we did not know in advance that this data should be nearly exponential, we would have a harder time seeing that the behavior is exponential.

For example, a quick glance at the data won't tell us whether an exponential function is more appropriate than a power function.

In such a case we might try both exponential and power-function models.

If we choose to attempt an exponential model, we can proceed as follows:

We attempt to linearize our function by an appropriate transformation.

We will then obtain a linear fit to our transformed data.

Finally we will inverse-transform our linear function to obtain a model for the original data.

It appears that the y values are approaching 0, so we will assume asymptote 0.

To linearize an exponential function we use the transformation y -> log(y), in which we replace all our y values with log(y). The table below shows the resulting log(y) vs. x data.

First observe what would have happened with our original perfectly exponential data:

Had we applied the transformation y -> log(y) to our original scratched-out data, we would have obtained the scratched-out column for log (y) in the figure below.

You should validate this, then figure out for yourself why it is obvious that this table (using the scratched-out data) would be perfectly linear.

The table with the scratched-out data is perfectly linear because the x data are separated by equal intervals and because the log(y) column is also separated by equal intervals (in this case the interval is very nearly .3).

A linear function will therefore fit our transformed scratched-out data exactly.

Now we transform the actual 'noisy' data. We obtain the second (red) log(y) column, as you should verify.

The log (y) data now has differences .33, .28 and .31 corresponding to equally spaced x values.

Thus log(y) vs. x is not perfectly uniform but is nearly so.

We are therefore encouraged to go ahead and find a linear model for log (y) vs. x.
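The transformed values and their differences can be verified with a few lines of Python. This sketch assumes x values 1 through 4 and uses the 'noisy' y values 5.8, 12.3, 23.7 and 48 quoted later in these notes.

```python
import math

x_data = [1, 2, 3, 4]
y_data = [5.8, 12.3, 23.7, 48]  # 'noisy' data values

# Apply the transformation y -> log(y), using base-10 logarithms
log_y = [math.log10(y) for y in y_data]

# Differences between successive log(y) values (x changes by 1 each time)
diffs = [log_y[i + 1] - log_y[i] for i in range(len(log_y) - 1)]

print("log(y):     ", [round(v, 2) for v in log_y])
print("differences:", [round(d, 2) for d in diffs])
```

Nearly equal differences for equally spaced x values are what we expect when log(y) vs. x is nearly linear.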

Our linearized model will thus be a model for log(y) vs. x.

This model is represented by the log (y) vs. x graph below.

As usual, after plotting our points we choose two points on the approximate best-fit straight line and use them to determine the equation which models the data.
We obtain the estimated equation log (y) = .305 x + .48 (approximately).

Alternatively we could have used an appropriate calculator or a computer algebra system to obtain the equation for the best-fit line.
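As an illustration of the computer alternative, the sketch below computes a least-squares line for log(y) vs. x directly from the standard formulas. Least squares is one common definition of 'best fit' and is an assumption here; the notes do not specify which method a calculator or computer algebra system would use. The data values are the 'noisy' values quoted later in these notes.

```python
import math

x_data = [1, 2, 3, 4]
y_data = [5.8, 12.3, 23.7, 48]
log_y = [math.log10(y) for y in y_data]

n = len(x_data)
x_mean = sum(x_data) / n
ly_mean = sum(log_y) / n

# Least-squares slope: sum of products of deviations
# divided by the sum of squared x deviations
slope = (sum((x - x_mean) * (ly - ly_mean) for x, ly in zip(x_data, log_y))
         / sum((x - x_mean) ** 2 for x in x_data))

# The least-squares line passes through the point of means
intercept = ly_mean - slope * x_mean

print("slope     =", round(slope, 3))
print("intercept =", round(intercept, 3))
```

The computed slope and intercept come out close to, though not identical with, the hand-estimated values .305 and .48.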

We now have an equation which models our transformed data, log(y) vs. x. We need an equation to model our original y vs. x data.

We return our model to the original y vs. x by inverting the transformation y -> log(y).

To invert the transformation, we solve our equation for y.

We start with

log (y) = .305 x + .48.

Recalling that log(y) = x is equivalent to 10^x = y (this is the inverse-function definition of the logarithm function), we obtain the equivalent equation

y = 10 ^ (.305 x + .48).

By the laws of exponents (x ^ (a + b) = x^a * x^b) we see that the right-hand side can be expressed as 10^.305x * 10^.48; since 10 ^ (.305 x) = (10 ^.305) ^ x = 2.02 ^ x and 10 ^.48 = 3.02, we finally obtain

y = 3.02 *2.02 ^ x.
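The inverse transformation can be checked numerically. In this Python sketch the names m and k (slope and intercept of the linear model) and the two function names are chosen here for illustration.

```python
m, k = 0.305, 0.48  # log(y) = m x + k, as estimated in the notes

# 10^(m x + k) = 10^k * (10^m)^x, so the model is y = A * b^x with:
b = 10 ** m
A = 10 ** k

def model_exponential(x):
    """Model in the form y = A * b^x."""
    return A * b ** x

def model_from_log(x):
    """Model in the form y = 10^(m x + k)."""
    return 10 ** (m * x + k)

print("b =", round(b, 2), " A =", round(A, 2))
print(model_exponential(2), model_from_log(2))  # the two forms should agree
```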

We now compare the values obtained for the given x values using the function y = 3.02 *2.02 ^ x to the original 'noisy' data from which the model was obtained.

We hope that our function values are close to the original data, and that there is no clear pattern to the residuals. If this is the case we have some confidence in the quality of our model.

We evaluate our function for x = 1, 2, 3, and 4, and compare these values to the corresponding data values 5.8, 12.3, 23.7, and 48.
We obtain the values shown in the table below.

For x = 1 we obtain y = 3.02 * 2.02 ^ 1 = 6.1 (approximately).

For x = 2 we obtain y = 3.02 * 2.02 ^ 2 = 12.3 (approximately).

We could then obtain the x = 3 and x = 4 values, which you should calculate yourself and compare to the data values.

The function fits the data fairly well, with each value within roughly 5% of the corresponding data value. With these rounded coefficients the function values at x = 3 and x = 4 lie somewhat above the data, which suggests that our estimated intercept .48 is slightly high; a more carefully determined best-fit line would reduce these residuals.
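The comparison can also be tabulated by a short script. This Python sketch uses the rounded model y = 3.02 * 2.02^x and takes each residual to be the model value minus the data value; that sign convention is an assumption made here.

```python
x_data = [1, 2, 3, 4]
y_data = [5.8, 12.3, 23.7, 48]

def model(x):
    """Rounded exponential model from the notes."""
    return 3.02 * 2.02 ** x

predicted = [model(x) for x in x_data]
residuals = [p - y for p, y in zip(predicted, y_data)]

print("x   data   model   residual")
for x, y, p, r in zip(x_data, y_data, predicted, residuals):
    print(x, "  ", y, "  ", round(p, 1), "  ", round(r, 1))
```

When judging a model we look both at the size of the residuals and at whether their signs and sizes follow a pattern.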

The figure below shows the entire process.

The process starts with the original data in the upper left-hand corner of the figure, moves to the transformed data in the upper right, then to the linear fit at lower right and the application of the inverse function 10^x to 'undo' the logarithmic transformation, and finally back to the upper left where the function values are compared to the data values.

Video file #02

http://youtu.be/M_X7J3AOrQ4

Had we not known that the data was obtained by inserting random errors (sometimes called 'noise') into exponential data, we might also have attempted a power-function transformation.

For example, we might for some reason have attempted a y -> y^.6 transformation.
The result would have been the y ^ .6 vs. x table shown in the middle of the figure below.

Our transformed data would not be particularly linear; the graph of the y ^ .6 vs. x points exhibits a gradual and consistent upward curvature.

A y ^ .5 or y ^ .4 transformation would probably have been more nearly linear.
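One rough way to compare these power transformations is to look at the differences between successive transformed values, which would be equal for perfectly linear data with equally spaced x. The Python sketch below does this for the powers .6, .5 and .4; the 'curvature ratio' (last difference divided by first) is an ad hoc measure introduced here for illustration, not a standard statistic.

```python
y_data = [5.8, 12.3, 23.7, 48]  # 'noisy' data for x = 1, 2, 3, 4

def successive_differences(power):
    """Differences between successive values of y^power."""
    t = [y ** power for y in y_data]
    return [t[i + 1] - t[i] for i in range(len(t) - 1)]

def curvature_ratio(power):
    """Last difference divided by first; values near 1 suggest near-linearity."""
    d = successive_differences(power)
    return d[-1] / d[0]

for p in (0.6, 0.5, 0.4):
    diffs = [round(d, 2) for d in successive_differences(p)]
    print("power", p, "differences", diffs, "ratio", round(curvature_ratio(p), 2))
```

Differences that grow steadily correspond to the upward curvature seen in the graph; a ratio closer to 1 indicates a transformation that comes closer to linearizing the data.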

However, it is possible that random errors in data could result in such a slight nonlinearity even though the behavior being observed should in fact be linearized by this transformation.
So we proceed to sketch a straight line attempting to model the data points (the straight line sketched below doesn't do a particularly good job of this; a DERIVE fit would do a much better job, as would anyone with a good eye and a straightedge).

The estimated fit for the transformed data gives us a linear model of y^.6 vs. x.

The linear function seems to have slope 2 and y-intercept 3.
The transformed function is therefore y ^ .6 = 2 x + 3.
We solve this for y by taking the 1/.6 power of both sides, obtaining the model y = (2 x + 3) ^ 1.7 (approximately).

We compare our model to the original data:

We create a column for the function y = (2x + 3) ^ 1.7 in our original table of y vs. x, and evaluate this function for x = 1, 2, 3 and 4.
The table shows the values we obtain.
We see that the values give us a reasonable but not overly accurate approximation to the observed data.
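The model values can also be generated with a short Python sketch. It uses the exponent 1/.6 without rounding (the notes round it to 1.7), so its output may differ slightly from a table computed with 1.7. The residual is taken here as model value minus data value.

```python
x_data = [1, 2, 3, 4]
y_data = [5.8, 12.3, 23.7, 48]

exponent = 1 / 0.6  # about 1.67; rounded to 1.7 in the notes

def power_model(x):
    """Model obtained by solving y^.6 = 2 x + 3 for y."""
    return (2 * x + 3) ** exponent

print("x   data   model   residual")
for x, y in zip(x_data, y_data):
    p = power_model(x)
    print(x, "  ", y, "  ", round(p, 1), "  ", round(p - y, 1))
```

Since the straight line was only roughly sketched, the size of these residuals reflects the quality of that sketch as well as the suitability of the transformation.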

Had we attempted both the preceding logarithmic and the present .6-power transformation of our data, with the resulting models y = 3.02 * 2.02^x and y = (2x + 3) ^ 1.7, we would be interested in which model fits the data better.

The near-linearity of the graph of log (y) vs. x, the better fit of the resulting function to the original data, and the apparent pattern of the residuals in the second model would clearly indicate that the first of these models is by far the better.
We would therefore choose the exponential model.

Sometimes we have to choose between competing models without a clue as to which model should in fact be the better.

Often, however, we have a good idea what sort of function would be expected to model the behavior of the system we are observing, and this can also be helpful in deciding which model to try first and in choosing between different models.

Video file #03

http://youtu.be/qSGJdKPSCsM

As we saw with the temperature function, exponential functions do not always have the horizontal axis as their asymptotes, and must therefore be represented by functions of the form y = A b^t + c, with c not equal to 0.

Such a function cannot be linearized by a logarithmic transformation, since log ( A b^t + c) cannot be simplified to a linear function of t (the laws of logarithms don't permit us to do anything with the log of the sum of two quantities).

In this case, if we know the value of the horizontal asymptote y = c, we can form the function yDiff = y - c = A b^t and fit an exponential to this function.

Having obtained a model for yDiff, we can add back the quantity c to obtain a model for our original data.
This was in fact the strategy we followed for the temperature vs. clock time data obtained for the Brussels sprout.
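The strategy can be sketched in Python as follows. The asymptote and the two data points below are hypothetical values made up for illustration; they are not the Brussels sprout data.

```python
# Hypothetical values, chosen only to illustrate the method
c = 20.0              # assumed known horizontal asymptote
t1, y1 = 0.0, 100.0   # first data point (t, y)
t2, y2 = 10.0, 60.0   # second data point (t, y)

# Form yDiff = y - c, which should have the form A b^t
d1, d2 = y1 - c, y2 - c

# Fit yDiff = A b^t to the two points by dividing the equations, as before
b = (d2 / d1) ** (1 / (t2 - t1))
A = d1 / b ** t1

def model(t):
    """Add c back to the yDiff model to get a model for the original data."""
    return A * b ** t + c

print("A =", A, " b =", round(b, 4))
print(model(t1), model(t2))  # should reproduce the two data points
```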

We will often wish to determine whether the yDiff = A b^t function is appropriate.

Recall that the ratios of an exponential sequence are constant.

If our y vs. t data are in fact exponential with asymptote c, and if the t values are equally spaced, then the y values will form a sequence whose difference sequence has a constant ratio.

If the data are 'noisy', with random errors, then as long as the errors are not too drastic compared to the differences in the y values, the difference sequence will have a nearly constant ratio.
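The difference-ratio test is easy to carry out in Python. The y values below are hypothetical, generated from y = 80 * .5^t + 20 at t = 0, 1, 2, 3, 4, so this sketch shows the exact (noise-free) case.

```python
# Hypothetical y values at equally spaced t = 0, 1, 2, 3, 4,
# generated from y = 80 * 0.5^t + 20 (asymptote c = 20)
y_values = [100, 60, 40, 30, 25]

# Difference sequence of the y values
differences = [y_values[i + 1] - y_values[i] for i in range(len(y_values) - 1)]

# Ratios of successive differences: constant for data of the form A b^t + c
diff_ratios = [differences[i + 1] / differences[i]
               for i in range(len(differences) - 1)]

# Ratios of the y values themselves: not constant when c is not 0
value_ratios = [y_values[i + 1] / y_values[i] for i in range(len(y_values) - 1)]

print("differences:      ", differences)
print("difference ratios:", diff_ratios)
print("value ratios:     ", [round(r, 3) for r in value_ratios])
```

The constant ratio of the differences is the base b raised to the power of the t spacing, and this test does not require us to know c in advance.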

A brief introduction to Polynomial Functions and their behavior

Consider the function y = (x - 3) (x + 5) (x -7).

We observe that this function is composed of linear factors--linear functions multiplied by one another--and that when the linear functions of this model are multiplied together using the distributive law we obtain a function of the form y = a x^3 + b x^2 + c x + d.
Such a function is called a polynomial function.

The present polynomial function is a sum of the power functions a x^3, b x^2, c x^1 and d x^0.
The powers involved are all integers and are all either positive or 0.

A polynomial function is a function formed by adding power functions with non-negative integer powers.

A product of linear factors always gives us a polynomial function.

Any function which is a sum of power functions for non-negative integer powers of the variable is a polynomial function.
The highest power represented is said to be the degree of the function.
The polynomial function in the above example is of degree three.
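The multiplication by the distributive law can be carried out mechanically on lists of coefficients. In the Python sketch below, the helper name poly_mul and the convention of listing coefficients from the highest power down are choices made here for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, highest power first."""
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            # Distributive law: every term of p multiplies every term of q
            result[i + j] += a * b
    return result

# Linear factors (x - 3), (x + 5), (x - 7)
factors = [[1, -3], [1, 5], [1, -7]]

product = [1]
for f in factors:
    product = poly_mul(product, f)

degree = len(product) - 1
print("coefficients a, b, c, d:", product)
print("degree:", degree)
```

The resulting coefficients correspond to y = x^3 - 5 x^2 - 29 x + 105, a polynomial of degree three.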

It will turn out that any polynomial function with real coefficients is a product of linear factors and irreducible quadratic factors.

Irreducible quadratic factors are quadratic factors of the form a x^2 + b x + c for which there is no real solution to the equation a x^2 + b x + c = 0; we will see shortly what this means.

We attempt to graph the present function.

We begin by asking where the zeros of the function are.

It should be clear that, for example, if x = 7 we will have y = 0.

This is because when x = 7, the factor (x - 7) will be 0, which will make the entire product zero.

We see similarly that x = 3 and x = -5 are also zeros, and we indicate these zeros on the x axis of the graph in the figure below.

We observe also that no other value of x can possibly give us a zero, since for any other value of x none of the factors will be 0 and their product will hence not be 0.

We next find the y intercept of the graph.

The y intercept occurs when x = 0.

Substituting x = 0 into the definition of the function we obtain y = 105.

We also observe that whenever x is a very large negative number, each of the factors (x - 3), (x + 5), and (x - 7) is a large negative number and their product will therefore be a really, really, really large negative number.

This tells us that the graph must approach the zero at x = -5 from the left through negative values, as indicated by the purple arrow pointing downward at the lower left of the graph.

A similar observation related to very large positive numbers tells us that for large positive numbers, the product of the factors will be a very large positive number, indicated by the purple arrow pointing upward toward the upper right of the graph.
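These observations about zeros, the y intercept and the behavior for large |x| can be spot-checked in Python. The sample points used below (-1000, 1000, and one point in each interval between zeros) are arbitrary choices made for illustration.

```python
def f(x):
    """The cubic from the notes, in factored form."""
    return (x - 3) * (x + 5) * (x - 7)

zeros = [-5, 3, 7]
print("f at the zeros:", [f(z) for z in zeros])
print("y intercept:", f(0))

# Sign of the function far to the left and far to the right
print("f(-1000) negative:", f(-1000) < 0)
print("f(1000) positive: ", f(1000) > 0)

# Sign at one sample point in each interval determined by the zeros
for x in (-6, 0, 5, 8):
    print(x, "positive" if f(x) > 0 else "negative")
```

The sign changes at each zero, which is consistent with a graph that crosses the x axis at x = -5, x = 3 and x = 7.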

When these behaviors are combined, we obtain a graph much like that in the figure below.
Actually, this graph isn't the best possible graph, because it is very unlikely that the y intercept (0,105) will coincide with a 'peak' of the function.
Plotting a variety of such functions using DERIVE will give you some intuition about how these functions behave.

The function y = (x - 3) (x ^ 2 + 2 x + 12) is also a polynomial of degree three, since it multiplies out to the form a x^3 + b x^2 + c x + d.

However, the graph of this function is somewhat different than the graph of the function in the preceding example.

This is because, while the factor (x - 3) still gives us a zero at x = 3, the other factor does not give us any zeros.

We don't get any zeros from the quadratic factor because the zeros of this factor are x = [ -2 +- sqrt(2^2 - 4 * 1 * 12) ] / (2 * 1); the quantity of which we are taking the square root is in this case -44, so there are no real zeros.

We thus say that the quadratic factor in this function is irreducible.

We use this terminology because if we can factor, or reduce, a quadratic expression, we will get two linear factors and hence two zeros.

Since there are no zeros associated with the irreducible quadratic, we cannot factor it.

The graph of this function will therefore have its only zero at x = 3.
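The test for an irreducible quadratic factor amounts to checking the sign of the quantity under the square root in the quadratic formula. In this Python sketch the names disc and g are chosen here for illustration.

```python
# Quadratic factor x^2 + 2 x + 12
a, b, c = 1, 2, 12

# Quantity under the square root in the quadratic formula
disc = b ** 2 - 4 * a * c
print("b^2 - 4 a c =", disc)
print("irreducible:", disc < 0)  # negative means no real zeros

def g(x):
    """The cubic y = (x - 3)(x^2 + 2 x + 12)."""
    return (x - 3) * (a * x ** 2 + b * x + c)

print("g(3) =", g(3))  # the zero from the linear factor
print("g(0) =", g(0))  # the y intercept
```

Another way to see that the quadratic factor has no real zeros is to write it as (x + 1)^2 + 11, which is at least 11 for every real x.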

The y intercept is easily found to be at y = -36.
The behavior at extremely large negative values of x and at extremely large positive values of x is seen to be the same as before.
We therefore obtain a graph something like the one shown below.

Video file #04

http://youtu.be/mU9_L-VlHUw

http://youtu.be/QHx5CbAwwg4