@whuber has pointed you to three good answers, but perhaps I can still write something of value. Your explicit question, as I understand it, is:
Given my fitted model, $\hat y_i = \hat m x_i + \hat b$ (notice that I added 'hats'), and assuming that my residuals are normally distributed, $\mathcal N(0, \hat\sigma^2_e)$, can I predict that an as yet unobserved response, $y_{\text{new}}$, with a known predictor value, $x_{\text{new}}$, will fall within the interval $(\hat y - \sigma_e,\ \hat y + \sigma_e)$ with probability 68%?
Intuitively, the answer seems like it should be 'yes', but the true answer is maybe. This will be the case when the parameters (i.e., $m$, $b$, & $\sigma$) are known and without error. Since you estimated these parameters, we need to take their uncertainty into account.
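To make the 'maybe' concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration (the true values `m=2`, `b=1`, `sigma=1`, the five-point design, and the function name `naive_coverage`); none of it comes from your data. It refits the line on a fresh small sample each time and checks how often a new response lands inside $\hat y \pm s$.

```python
import math
import random

def naive_coverage(n_sims=20000, m=2.0, b=1.0, sigma=1.0, seed=0):
    """Share of new responses inside y_hat +/- s when m, b and sigma
    are all re-estimated from a small sample in every simulation."""
    rng = random.Random(seed)
    x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical design with N = 5
    x_new = 3.0                      # hypothetical location of the new point
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    hits = 0
    for _ in range(n_sims):
        # Simulate a training sample from the true line plus normal noise.
        y = [m * xi + b + rng.gauss(0.0, sigma) for xi in x]
        y_bar = sum(y) / n
        # Ordinary least squares estimates of slope and intercept.
        m_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        b_hat = y_bar - m_hat * x_bar
        # Residual standard deviation with N - 2 degrees of freedom.
        sse = sum((yi - (m_hat * xi + b_hat)) ** 2 for xi, yi in zip(x, y))
        s = math.sqrt(sse / (n - 2))
        # Draw the as yet unobserved response and test the naive interval.
        y_new = m * x_new + b + rng.gauss(0.0, sigma)
        y_hat = m_hat * x_new + b_hat
        if abs(y_new - y_hat) < s:
            hits += 1
    return hits / n_sims

coverage = naive_coverage()
print(f"coverage of y_hat +/- s: {coverage:.3f}")
```

With so few observations the printed rate should come out well below 68%; with a much larger training sample it should move toward 68%.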
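Let's first consider the standard deviation of your residuals. Because it is estimated from your data, the distribution used to form the prediction interval should be $t_{\text{df error}}$ rather than the normal, although the $t$ converges to the normal quickly as the degrees of freedom grow.
So, can we just use $\hat y_{\text{new}} \pm t_{(1-\alpha/2,\ \text{df error})}\, s$, instead of $\hat y_{\text{new}} \pm z_{(1-\alpha/2)}\, s$, and go about our merry way? Unfortunately, no. The bigger issue is that there is uncertainty about your estimate of the conditional mean of the response at that location due to the uncertainty in your estimates $\hat m$ & $\hat b$. Thus, the standard deviation of your predictions needs to incorporate more than just $s_{\text{error}}$. Because variances add, the estimated variance of the predictions will be: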
$$s^2_{\text{predictions(new)}} = s^2_{\text{error}} + \text{Var}(\hat m x_{\text{new}} + \hat b)$$
Notice that the "$x$" is subscripted to represent the specific value for the new observation, and that the "$s^2$" is correspondingly subscripted. That is, your prediction interval is contingent on the location of the new observation along the $x$ axis. The standard deviation of your predictions can be more conveniently estimated with the following formula:
$$s_{\text{predictions(new)}} = \sqrt{s^2_{\text{error}}\left(1 + \frac{1}{N} + \frac{(x_{\text{new}} - \bar x)^2}{\sum (x_i - \bar x)^2}\right)}$$
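As a rough sketch of how this formula might be computed by hand, the following uses only the Python standard library. The function names `fit_line` and `prediction_sd` and the toy data are my own, hypothetical choices, and $s^2_{\text{error}}$ is taken to be the residual sum of squares divided by $N - 2$.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + b; returns (m_hat, b_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    m_hat = sxy / sxx
    b_hat = y_bar - m_hat * x_bar
    return m_hat, b_hat

def prediction_sd(x, y, x_new):
    """Return (y_hat_new, s_predictions(new)) for a simple linear regression."""
    n = len(x)
    m_hat, b_hat = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # s^2_error: residual sum of squares over the error degrees of freedom.
    sse = sum((yi - (m_hat * xi + b_hat)) ** 2 for xi, yi in zip(x, y))
    s2_error = sse / (n - 2)
    # The formula above: s_error inflated by the 1/N and distance-from-mean terms.
    s_pred = math.sqrt(s2_error * (1 + 1 / n + (x_new - x_bar) ** 2 / sxx))
    return m_hat * x_new + b_hat, s_pred

# Toy data, made up purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat_mid, s_mid = prediction_sd(x, y, 3)   # at the mean of x
y_hat_out, s_out = prediction_sd(x, y, 6)   # outside the observed range
print(y_hat_mid, s_mid)
print(y_hat_out, s_out)
```

Multiplying the returned standard deviation by a $t$ critical value with $N - 2$ degrees of freedom gives the half-width of the interval; if SciPy is available, `scipy.stats.t.ppf(1 - alpha / 2, n - 2)` is one way to obtain that value.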
As an interesting side note, we can infer a few facts about prediction intervals from this equation. First, prediction intervals will be narrower the more data we had when we built the prediction model (this is because there's less uncertainty in $\hat m$ & $\hat b$). Second, predictions will be most precise if they are made at the mean of the $x$ values you used to develop your model, as the numerator for the third term will be $0$. The reason is that the fitted line always passes through $(\bar x, \bar y)$, so uncertainty in the estimated slope contributes nothing at the mean of $x$; only the uncertainty about the true vertical position of the regression line remains. Thus, some lessons to be learned for building prediction models are: that more data is helpful, not with finding 'significance', but with improving the precision of future predictions; and that you should center your data collection efforts on the interval where you will need to be making predictions in the future (to minimize that numerator), but spread the observations as widely from that center as you can (to maximize that denominator).
Having calculated the correct value in this manner, we can then use it with the appropriate t distribution as noted above.
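The lessons about sample size and design can be checked numerically with a short sketch. It isolates the factor that multiplies $s_{\text{error}}$ in the formula, so the comparison depends only on the $x$ values; the helper name `width_factor` and the three designs are hypothetical, picked only to illustrate the point.

```python
import math

def width_factor(x, x_new):
    """sqrt(1 + 1/N + (x_new - x_bar)^2 / sum((x_i - x_bar)^2)):
    the factor that scales s_error up to the prediction standard deviation."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

# Three hypothetical designs, all centered at x = 5.
small = [4, 5, 6]            # few points, narrow spread
more_data = [3, 4, 5, 6, 7]  # more points
wide = [2, 5, 8]             # same count as `small`, wider spread

at_mean_small = width_factor(small, 5)    # predicting at the mean of x
at_mean_more = width_factor(more_data, 5) # more data, same location
off_mean_small = width_factor(small, 7)   # predicting away from the mean
off_mean_wide = width_factor(wide, 7)     # same location, wider design

print(at_mean_small, at_mean_more, off_mean_small, off_mean_wide)
```

In this comparison the factor shrinks with more data, grows as `x_new` moves away from the mean, and grows less when the observations are spread more widely. Keep in mind that $s_{\text{error}}$ is itself estimated, so actual interval widths will also vary from sample to sample.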