is either very large (larger than the square root of the overflow threshold) or very small (smaller than the square root of the underflow threshold). Is it true that both the real and imaginary parts of the complex product are always computed to high relative accuracy?
Proof.
For a complex calculation, we only need to calculate its Re and Im accurately. Let . In order to prove that there exists a such that , we only need to prove that
1) For addition , our algorithm is
Now let's analyse its error.
For the relative accuracy, let be the relative accuracy,
Re and Im have high relative accuracy.
2) For subtraction , our algorithm is
Now let's analyse its error.
For the relative accuracy, let be the relative accuracy,
Re and Im have high relative accuracy.
3) For multiplication , our algorithm is
Now let's analyse its error.
For relative accuracy,
When have different signs, while and have the same signs, the relative accuracy of Re might be low; a similar result holds for Im.
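The cancellation just described can be demonstrated numerically. The following is an illustrative sketch (the inputs a, b, c, d are chosen for the demonstration and are not from the original text): when ac and bd nearly cancel, the computed real part ac - bd retains few correct digits relative to the exact value.

```python
from fractions import Fraction

# Multiply z1 = a + bi by z2 = c + di; the real part is a*c - b*d.
# Choosing a = c = 1 + 2^-27 and b = d = 1 makes a*c and b*d nearly
# cancel, so the rounding error of a*c dominates the tiny result.
a = c = 1.0 + 2.0**-27
b = d = 1.0

re_float = a * c - b * d                       # computed real part
re_exact = Fraction(a) * Fraction(c) - Fraction(b) * Fraction(d)
rel_err = abs((Fraction(re_float) - re_exact) / re_exact)
# rel_err is about 2^-28, vastly larger than machine epsilon 2^-53.
```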
4) For division , our algorithm is
Now let's analyse its error; we only analyse the case ,
For relative accuracy,
Since we never multiply two large numbers, the problem of overflow will not happen. is easy to see from the analysis above.
This is tested with the following Python code.
>>> def complex_divide(a1, a2, b1, b2):
...     if abs(b1) < abs(b2):
...         return [(b1/b2*a1+a2)/(b1/b2*b1+b2), (b1/b2*a2-a1)/(b1/b2*b1+b2)]
...     else:
...         return [(a1+b2/b1*a2)/(b1+b2/b1*b2), (a2-b2/b1*a1)/(b1+b2/b1*b2)]
...
>>> complex_divide(2**666, 2**666, 2**666, 2**666)
[1.0, 0.0]
>>> complex_divide(2**-666, 2**-666, 2**-666, 2**-666)
[1.0, 0.0]
>>> complex_divide(2**-666, 2**666, 2**-666, 2**666)
[1.0, -0.0]
>>> complex_divide(2**666, 2**-666, 2**666, 2**-666)
[1.0, 0.0]

Question 1.13
Question. Prove Lemma 1.3. Let (or ) and be an inner product. Then there is an -by- s.p.d. matrix such that . Conversely, if is s.p.d., then is an inner product.
Proof.
We only prove the case in ; the case in is a special case of the former.
The matrix form of the inner product: let be a basis of ; then we can rewrite the inner product as
We shall show that A is s.p.d.,
which means . , , , iff , iff .
If A is s.p.d., we shall show that satisfies the four properties of an inner product.
1) .
2).
3) .
4) Since A is h.p.d., , iff .
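As a quick numerical companion to the proof (a sketch with a randomly generated s.p.d. matrix, not part of the original argument), one can spot-check that x^T A y satisfies the inner-product axioms when A is s.p.d.:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B + 4 * np.eye(4)        # s.p.d. by construction

def inner(x, y):
    """Candidate inner product <x, y> = x^T A y."""
    return x @ A @ y

x, y, z = rng.standard_normal((3, 4))
sym_ok = np.isclose(inner(x, y), inner(y, x))                            # symmetry
lin_ok = np.isclose(inner(2 * x + z, y), 2 * inner(x, y) + inner(z, y))  # linearity
pos_ok = inner(x, x) > 0                                                 # positive definiteness
```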
Question 1.14
Question. Prove Lemma 1.5: ,
Proof.
Question 1.15
Question. Prove Lemma 1.6:
An operator norm is a matrix norm.
Proof.
1)
If , then , . If , then , which means .
2)
3)
Question 1.16
Question. Prove all parts except 7 of Lemma 1.7. Hint for part 8: use the fact that if and are both -by- , then and have the same eigenvalues. Hint for part 9: use the fact that a matrix is normal if and only if it has a complete set of orthonormal eigenvectors.
Part 1. Prove that is true for a vector norm and its corresponding operator norm, or for the vector two-norm and the matrix Frobenius norm.
Proof. For the operator norm, this follows immediately from
if, for any nonzero vector , we take .
For the matrix Frobenius norm, if we rewrite , its row vector representation, we have
The last step is to sum the squares of all the inequalities above, i.e.
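A numerical spot-check of part 1 (a sketch, not a proof): for random A and x, ||Ax||_2 is bounded both by ||A||_2 ||x||_2 and by ||A||_F ||x||_2.

```python
import numpy as np

rng = np.random.default_rng(1)
ok = True
for _ in range(100):
    A = rng.standard_normal((5, 5))
    x = rng.standard_normal(5)
    lhs = np.linalg.norm(A @ x)
    # operator (spectral) norm bound and Frobenius norm bound
    ok = ok and lhs <= np.linalg.norm(A, 2) * np.linalg.norm(x) + 1e-12
    ok = ok and lhs <= np.linalg.norm(A, 'fro') * np.linalg.norm(x) + 1e-12
```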
Part 2. Prove that for any operator norm or the Frobenius norm. In other words, any operator norm (or the Frobenius norm) is mutually consistent with itself.
Proof. For operator norm,
For the Frobenius norm, this follows by rewriting , its column vector representation, and following what we did in part 1.
Part 3. Prove that the max norm and the Frobenius norm are not operator norms.
Proof. For the max norm, we show that it violates part 2 of Lemma 1.7 with the counterexample
It can easily be shown that the Frobenius norm is not an operator norm, since . (Note: here we only consider the case that ; if you know the proof of the more general case, which is more extensive, you can contact us.)
Part 4. if are orthogonal or unitary, for the Frobenius norm and for the operator norm induced by . This is really just the Pythagorean theorem.
Proof. We only prove the orthogonal case, since the unitary case is the same. For the operator norm induced by
since when is orthogonal.
For the Frobenius norm, this is the Pythagorean theorem. If is its row vector representation, and is the column vector representation of the orthogonal matrix, then for any , take its orthogonal decomposition, where is the ( )-th element of . Square both sides of the equality and sum from 1 to ; we get on the left and on the right. The same argument works for multiplication from the left.
Part 5. Prove that , the maximum absolute row sum.
Proof. This can be obtained by taking , where each element's sign can be plus or minus. The maximum is then attained by one of these vectors; the infinity norm is the maximum of the absolute row sums.
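The extremal plus/minus vector in the proof can be exhibited numerically (an illustrative sketch): taking x to be the signs of the maximizing row, ||Ax||_inf attains the maximum absolute row sum.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
row_sums = np.abs(A).sum(axis=1)
i = int(np.argmax(row_sums))
x = np.sign(A[i])                     # the +/-1 vector from the proof
attained = np.abs(A @ x)[i]           # row i of Ax hits the row sum exactly
match = (np.isclose(attained, row_sums.max())
         and np.isclose(np.linalg.norm(A, np.inf), row_sums.max()))
```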
Part 6. Prove that , the maximum absolute column sum.
Proof. This is trivial from the viewpoint of functional analysis: , which is defined on , is the adjoint operator of , and .
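The adjoint relation used in the proof, ||A||_1 = ||A^T||_inf, can be checked numerically (a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 7))
col_sums = np.abs(A).sum(axis=0)      # one-norm is the max absolute column sum
agree = (np.isclose(np.linalg.norm(A, 1), col_sums.max())
         and np.isclose(np.linalg.norm(A, 1), np.linalg.norm(A.T, np.inf)))
```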
Part 8. Prove that .
Proof. This is the same as the proof of part 6.
Part 9. Prove that if is normal, i.e. .
Proof. Recall that being normal is equivalent to the existence of a unitary matrix such that is diagonal. We have
This is true for of course.
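For a concrete check of part 9 (a sketch using a symmetric, hence normal, matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M + M.T                           # symmetric, therefore normal
eigs = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
close = np.isclose(np.linalg.norm(A, 2), np.abs(eigs).max())
```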
Part 10. If is -by- , then .
Proof. There exists ,
where the last inequality is a corollary of the inequality of means. There exists ,
Part 11. If is -by- , then .
Proof. This follows directly from parts 6, 8, and 10.
Part 12. If is -by- , then .
Proof. This follows directly from parts 10 and 11.
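The norm-equivalence bounds in parts 10 through 12 can be spot-checked for random rectangular matrices. This is a hedged sketch verifying two standard inequalities, ||A||_2 <= sqrt(||A||_1 ||A||_inf) and ||A||_2 <= ||A||_F <= sqrt(min(n, m)) ||A||_2:

```python
import numpy as np

rng = np.random.default_rng(5)
holds = True
for _ in range(50):
    A = rng.standard_normal((4, 6))
    n1 = np.linalg.norm(A, 1)
    n2 = np.linalg.norm(A, 2)
    ninf = np.linalg.norm(A, np.inf)
    nf = np.linalg.norm(A, 'fro')
    holds = holds and n2 <= np.sqrt(n1 * ninf) + 1e-12
    holds = holds and n2 <= nf + 1e-12
    holds = holds and nf <= np.sqrt(min(A.shape)) * n2 + 1e-12
```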
Part 13. If is -by- , then .
Proof. For the first part, it suffices to find one: define as the -th column vector of ,
The second part is true, since
Question 1.17
Question. We mentioned that on a Cray machine the expression caused an error, because round-off caused to exceed 1. Show that this is impossible using IEEE arithmetic, barring overflow or underflow. Hint: You will need to use more than the simple model with small. Think about evaluating , and show that, barring overflow or underflow, exactly; in numerical experiments done by A. Liu, this failed about 5% of the time on a Cray YMP. You might try some numerical experiments and explain them. Extra credit: Prove the same result using correctly rounded decimal arithmetic. (The proof is different.)
Proof.
Now consider the case where we want to calculate . In IEEE arithmetic the exponent part and the fraction part are separate, so we are actually dealing with , where and is a rational number with at most 52 digits in its binary representation (defined as by convention).
For any , there exists an even (positive) number , s.t. . Here is the machine accuracy. Then, first we have the conclusion that
As and will always contribute no less than when rounding to any precision.
Define . Obviously,
We want to prove that . Since is non-decreasing, we only need to prove the case for and .
...(Wait for completion. If you can do it, please contact us)
When is calculated exactly as it should be, this assures that .
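A small numerical experiment in the spirit of the question (a sketch; the sample values are arbitrary): with IEEE double precision, the computed x/sqrt(x*x + y*y) never exceeds 1, so acos never receives an out-of-range argument.

```python
import math

vals = [1.0, 1.5, 1e3, 1e-3, 7.1, 1.0 / 3.0, math.pi]
worst = 0.0
for x in vals:
    for y in vals + [0.0]:
        r = x / math.sqrt(x * x + y * y)
        worst = max(worst, r)
        math.acos(r)   # would raise ValueError if r > 1; never happens here
```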
Question 1.18
Question. Suppose that and are normalized IEEE double precision floating point numbers, and consider the following algorithm, running with IEEE arithmetic:
if , swap and
Prove the following facts:
Barring overflow or underflow, the only round-off error committed in running the algorithm is computing . In other words, both subtractions and are computed exactly.
exactly. This means that is actually the round-off error committed when rounding the exact value of to get .
Thus, this program in effect simulates quadruple precision arithmetic, representing the true sum as the higher-order bits ( ) and the lower-order bits ( ).
Using this and similar tricks in a systematic way, it is possible to efficiently simulate all four basic floating-point operations in arbitrary precision arithmetic, using only the underlying floating point instructions and no "bit-fiddling". 128-bit arithmetic is implemented this way on the IBM RS6000 and Cray (but much less efficiently on the Cray, which does not have IEEE arithmetic).
Proof. We shall prove the following lemma: if floating point numbers and satisfy , then , i.e. is an exact floating-point number.
Proof of lemma. Recall that we are in the binary system, and that means shifting one digit of to the left. If has the same exponent as , the proof is trivial. When has the same exponent as , will have exponent less than or equal to that of , since the highest digits of and cancel. Notice that , and this concludes the lemma.
Assume , , in their floating point representation. For the case , (and similarly for ), notice that Hence, is computed exactly by the lemma. And for , one of the following holds:
When ( holds automatically), or when and both hold, the result is trivial.
When but , rounding up occurs; is exact, and lies in .
When , might round up or round down. Rounding up means , while rounding down means has the same exponent as . Both cases give exact computation.
For the case , (or similarly for ),...(If you can do this, please contact us)
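The algorithm analyzed here is the classic Fast2Sum error-free transformation. As a sketch (exactness verified with exact rational arithmetic), after ensuring |a| >= |b|, s = fl(a + b) and e = fl(b - fl(s - a)) satisfy a + b = s + e exactly:

```python
from fractions import Fraction

def fast_two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and e the exact rounding error."""
    if abs(b) > abs(a):
        a, b = b, a          # ensure |a| >= |b|, as in the algorithm
    s = a + b
    e = b - (s - a)          # both subtractions are exact, as proved above
    return s, e

exact = True
for a, b in [(1.0, 2.0**-60), (1e16, 3.14159), (-1.0, 1e-20), (0.1, 0.2)]:
    s, e = fast_two_sum(a, b)
    exact = exact and (Fraction(a) + Fraction(b) == Fraction(s) + Fraction(e))
```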
Question 1.19
Question. This question illustrates the challenges in engineering highly reliable numerical software. Your job is to write a program to compute the two-norm given . The most obvious (and inadequate) algorithm is
for to end for
This algorithm is inadequate because it does not have the following desirable properties:
It must compute the answer accurately (i.e., nearly all the computed digits must be correct) unless is (nearly) outside the range of normalized floating-point numbers.
It must be nearly as fast as the obvious program above in most cases.
It must work on any "reasonable" machine, possibly including ones not running IEEE arithmetic. This means it may not cause an error condition unless is (nearly) larger than the largest floating-point number.
To illustrate the difficulties, note that the obvious algorithm fails when and is larger than the square root of the largest floating-point number (in which case overflows, and the program returns in IEEE arithmetic and halts in most non-IEEE arithmetics) or when and is smaller than the square root of the smallest normalized floating-point number (in which case underflows, possibly to zero, and the algorithm may return zero). Scaling the by dividing them all by does not have property 2) because division is usually many times more expensive than either multiplication or addition. Multiplying by risks overflow in computing , even when .
This routine is important enough that it has been standardized as a Basic Linear Algebra Subroutine, or BLAS, which should be available on all machines.
Answer.
To avoid overflow, one can separate out the exponent part of a floating-point number and compute the scaled squares. One implementation is as follows.
import numpy as np

def frexp(x):
    """Fake frexp to separate a double precision floating point
    number's exponent and tail."""
    b = '{:0>64s}'.format(bin(np.float64(x).view(np.uint64))[2:])
    exp = int(b[1:12], base=2) - 1023
    if exp > -1023:
        return exp, int(b[12:], base=2) * (2**-52) + 1
    else:
        extra = b[1:].find('1')
        return exp - extra + 11, int(b[extra:], base=2) * (2**(extra-62))

def norm2(x):
    if len(x) == 0:
        return 0
    if len(x) == 1:
        return abs(x[0])
    s = 0
    size = -1076
    for i in range(len(x)):
        x_exp, x_frac = frexp(x[i])  # get the exponent part and the fraction part
        if size < x_exp:
            s *= 2 ** (size - x_exp)
            size = x_exp
        else:
            x_frac *= 2 ** (x_exp - size)
        s += x_frac ** 2
    return s**0.5 * 2**size

This code is tested for
>>> norm2([2**-1024]) == 2**-1024
True
>>> norm2([2**-1054, 2**-1055]) == 2**-1055 * 5**0.5
True
>>> norm2([0.+2**1020]) == 2**1020
True
>>> norm2([0.+2**1020, 0.+2**1020]) == 2**1020 * 2**0.5
True
>>> norm2([2**-1054, 2**1020]) == 2**1020
True
>>> norm2([2**1020, 2**-1054]) == 2**1020
True

Though for convenience of coding we did not use the built-in C function frexp and implemented it in a slow way, the performance can still be evaluated. Assuming the frexp operation can be performed properly (at machine level, separating the exponent part and the tail part) and counted as three operations, this algorithm costs at most operations in total (including addition/subtraction, multiplication, and bit moves), about 4 times longer than , the provided inadequate algorithm.
Further, the code is implemented in C. (Please see the attachment on gitee.)
https://gitee.com/j7168908jx/shuzhidaishu/blob/master/code/c1-19/code.c
Test results are given in Table 1. norm2simple is the baseline algorithm given in the question; norm2lapack is the one from LAPACK in Fortran code; norm2 and norm22 are the algorithms designed in this answer, where norm22 uses more efficient but harder-to-read operations and might not work with denormalized floating-point numbers. The vector is generated from the standard normal distribution.
While completing the C code, we reached several conclusions. The most important for this problem is that sometimes division is not so expensive compared with other techniques that may add more complexity to the code and to the running time; limited CPU structure (registers, cache, variable storage) is to blame.
Table 1. Test result for question 1.19

Iterations, length( )    norm2simple    norm2lapack    norm22    norm2
                         11.25s         13.45s         20.49s    22.38s

Question 1.20
Question. We will use a Matlab program to illustrate how sensitive the roots of a polynomial can be to small perturbations in the coefficients. Polyplot takes an input polynomial specified by its roots r and then adds random perturbations to the polynomial coefficients, computes the perturbed roots, and plots them. The inputs are
r: vector of roots of the polynomial,
e: maximum relative perturbation to make to each coefficient of the polynomial,
m: number of random polynomials to generate, whose roots are plotted.
The first part of your assignment is to run this program for the following inputs. In all cases choose m high enough that you get a fairly dense plot but don't have to wait too long. m = a few hundred or perhaps 1000 is enough. You may want to change the axes of the plot if the graph is too small or too large.
Also try your own example with complex conjugate roots. Which rootsare most sensitive?
r=(1:10);e=1e-3,1e-4,1e-5,1e-6,1e-7,1e-8,
r=(1:20);e=1e-9,1e-11,1e-13,1e-15,
r=[2,4,8,16,...,1024];e=1e-1,1e-2,1e-3,1e-4.
The second part of your assignment is to modify the program to compute the condition number c(i) for each root. In other words, a relative perturbation of e in each coefficient should change root r(i) by at most about e*c(i). Modify the program to plot circles centered at r(i) with radii e*c(i), and confirm that these circles enclose the perturbed roots (at least when e is small enough that the linearization used to derive the condition number is accurate). You should turn in a few plots with circles and perturbed roots, and some explanation of what you observe.
In the last part, notice that your formula for c(i) "blows up" if . This condition means that r(i) is a multiple root of . We can still expect some accuracy in the computed value of a multiple root, however, and in this part of the question, we will ask how sensitive a multiple root can be: First, write , where and is the multiplicity of the root r(i). Then compute the roots nearest r(i) of the slightly perturbed polynomial , and show that they differ from r(i) by . So if , for instance, the root r(i) is perturbed by , which is much larger than if . Higher values of yield even larger perturbations. If is around machine epsilon and represents rounding errors in computing the root, this means an -tuple root can lose all but -th of its significant digits.
Answer.
1. Here we let m = 1000.
1) The case r=1:10, let e=1e-3,1e-4,1e-5,1e-6,1e-7,1e-8. See the figures below.
Figure. When r=1:10
2) The case r=1:20, let e=1e-9,1e-11,1e-13,1e-15. See the figures below.
Figure. When r=1:20
3) The case r=[2,4,8,...,1024], let e=1e-1,1e-2,1e-3,1e-4. See the figures below.
Figure. When r=[2,4,8,...,1024]
2. Let's start with the theoretical analysis. Let , where are the roots. Let
where denotes the perturbation of the coefficient. Without loss of generality, we focus on . Here we assume that the perturbation is small enough that the perturbation of the root will be small enough for our analysis. Then
So
For our numerical experiment, we assume that is much smaller than the minimum distance between the roots. We show case 1) with e = 1e-7; see the figure below.
Figure. r=1:10, e=1e-7
We can see that the circles in the middle are much larger than the circles on the edge, but the circle of root 5 or 6 is not the largest. It is easy to see that increases monotonically with , but the derivative of at is much smaller in the middle than (see the figure below); from the estimation of , we know that the largest circle will appear in the second half of the roots.
Figure. derivative
3. The same trick as in 2: notice that the k-th derivative of p(x) will be 0 if , then
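Part 3 can also be illustrated numerically. In this sketch (p(x) = (x-1)^3 and eps = 1e-9 are chosen purely for illustration), perturbing the constant coefficient of an m-fold root by eps moves the roots by about eps**(1/m):

```python
import numpy as np

eps = 1e-9
coeffs = np.poly([1.0, 1.0, 1.0])    # coefficients of (x - 1)^3
coeffs[-1] += eps                    # perturb the constant term
roots = np.roots(coeffs)             # roots of (x - 1)^3 + eps
shift = np.max(np.abs(roots - 1.0))  # distance of roots from the triple root
predicted = eps ** (1.0 / 3.0)       # eps^(1/m) with m = 3, i.e. about 1e-3
ratio = shift / predicted
```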
See the code appendix c1-20
https://gitee.com/j7168908jx/shuzhidaishu/tree/master/code/c1-20
Question 1.21
Question. Apply Algorithm 1.1, Bisection, to find the roots of , where is evaluated using Horner's rule. Use the Matlab implementation, or else write your own. Confirm that changing the input interval slightly changes the computed root drastically. Modify the algorithm to use the error bound discussed in the text to stop bisecting when the round-off error in the computed value of gets so large that its sign cannot be determined.
Answer.
The bisection is implemented as
import numpy as np
from matplotlib import pyplot as plt

def bisect(coeff, a, b, tol):
    low = a
    high = b
    poly = np.poly1d(coeff, r=True)
    p_low = poly(low)
    p_high = poly(high)
    if abs(p_low) < tol:
        return low, low
    elif abs(p_high) < tol:
        return high, high
    while high - low > 2 * tol:
        mid = (low + high) / 2
        p_mid = poly(mid)
        if p_mid * p_low < 0:
            high = mid
            p_high = p_mid
        elif p_mid * p_high < 0:
            low = mid
            p_low = p_mid
        else:
            low = high = mid
            p_low = p_high = p_mid
    return low, high

result = [bisect([2] * 9, 1, h, 1e-5) for h in np.linspace(3, 3.5, 100)]
plt.plot(np.linspace(3, 3.5, 100), result)
plt.xlabel("search interval: [1, x]")
plt.ylabel('result')
plt.title("different bisection result when initial value varies")
plt.show()

and the great difference in the results can be seen in the figure below.
Figure. Result for question 1.21
The modified bisection search code is listed below.
def bisect(roots, a, b, tol):
    low = a
    high = b
    poly = np.poly1d(roots, r=True)
    coeff = poly.coef
    poly2 = np.poly1d(np.abs(coeff))
    p_low = poly(low)
    p_high = poly(high)
    if abs(p_low) < tol:
        return low, low
    elif abs(p_high) < tol:
        return high, high

    def stop(mi, pmi):
        # error bound on the computed p(mi); the sign of p(mi) is
        # undeterminable once |p(mi)| falls below this bound
        err = 2 * len(roots) * poly2(np.abs(mi)) * 2**-52
        if pmi > 0:
            return pmi - err <= 0
        else:
            return pmi + err >= 0

    while True:
        mid = (low + high) / 2
        p_mid = poly(mid)
        if p_mid * p_low < 0:
            high = mid
            p_high = p_mid
        elif p_mid * p_high < 0:
            low = mid
            p_low = p_mid
        else:
            low = high = mid
            p_low = p_high = p_mid
        if high - low < 2 * tol:
            break
        if stop(mid, p_mid):
            low = high = mid
            p_low = p_high = p_mid
            break
    return low, high

Respectively, the outcome of the code, when given different initial values, is shown in the figure below.
Figure. Result for question 1.21
If you have any questions, feel free to leave a comment below!