Applying SAS/IML: the EM Algorithm, Generating Multivariate Normal Data, and the Bootstrap
Application: the EM algorithm
Complete-data specification (exponential family): $f(x \mid \Phi) = b(x)\, \exp\{\Phi\, t(x)^\top\} / a(\Phi)$.
E-step: estimate the complete-data sufficient statistics $t(x)$ by finding $t^{(p)} = E[\,t(x) \mid y, \Phi^{(p)}\,]$.
M-step: determine $\Phi^{(p+1)}$ as the solution of the equations $E[\,t(x) \mid \Phi\,] = t^{(p)}$.
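In regression analysis: estimation of missing data
Assume the regression model $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I)$.
E-step: estimate the missing data, using the sufficient statistic $t(x) = X$ when the missing values are in the independent variable and $t(x) = y$ when they are in the dependent variable.
M-step: the maximum likelihood estimator of $\beta$ is the value of $\beta$ that maximizes the log-likelihood
$\ell(\beta, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)$,
so the result is $\hat\beta = (X^\top X)^{-1} X^\top y$.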
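The step from the log-likelihood to $\hat\beta$ is short. Assuming $X^\top X$ is invertible, setting the gradient with respect to $\beta$ to zero gives the normal equations (note that $\hat\beta$ does not depend on $\sigma^2$); for the one-predictor model used in Exercise 1 this reduces to the familiar slope and intercept formulas:

```latex
\ell(\beta,\sigma^2) = -\tfrac{n}{2}\ln 2\pi - \tfrac{n}{2}\ln\sigma^2
  - \tfrac{1}{2\sigma^2}\left(y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\right)

\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}\left(X^\top y - X^\top X\beta\right) = 0
\;\Longrightarrow\; X^\top X\,\hat\beta = X^\top y
\;\Longrightarrow\; \hat\beta = (X^\top X)^{-1}X^\top y

\text{For } Y = b_0 + b_1 X:\quad
\hat b_1 = \frac{\sum_i (x_i-\bar x)(y_i-\bar y)}{\sum_i (x_i-\bar x)^2},\qquad
\hat b_0 = \bar y - \hat b_1 \bar x
```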
Exercise 1
Suppose we have the following data for a regression analysis (the last column shows the same data with some X values missing, marked "."):
Y     X (complete)   X (with missing values)
313   31             31
384   38             38
484   48             48
523   52             .
633   63             63
673   67             .
754   75             75
844   84             84
894   89             .
993   99             99
When the data are incomplete, estimate the missing values using the EM algorithm for the model Y = b0 + b1X.
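Answer 1
data reg;
   input y x;
   cards;
313 31
384 38
484 48
523 .
633 63
673 .
754 75
844 84
894 .
993 99
;
run;

proc iml;
   use reg;
   read all var{y} into y;
   read all var{x} into x;
   x0 = {100, 100, 100};            /* initial guesses for the three missing x values */
   n = nrow(y);
   e = 10;                          /* convergence measure, starts above the tolerance */
   x1 = x;                          /* copy of x that keeps the missing-value pattern */
   i = 1;                           /* iteration counter */
   do while(e > 0.00001);
      x[loc(x1=.)] = x0;            /* plug the current imputations into x */
      xb = j(n,1,1) || x;           /* design matrix with an intercept column */
      beta = inv(xb`*xb)*xb`*y;     /* M-step: ML (least-squares) estimate of beta */
      xbaru = (y - beta[1])/beta[2];/* E-step: predict x from y with the fitted line */
      x2 = xbaru[loc(x1=.)];        /* new imputations for the missing x values */
      e = sum(abs(x2 - x0));
      x0 = x2;
      i = i + 1;
   end;
   print x2 i;
quit;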
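For readers without SAS, the same iteration can be sketched in plain Python. This is a hypothetical translation of the SAS/IML loop in Answer 1, not part of the original material; `fit_line` and `em_impute_x` are illustrative names. Missing X values (`None`) are filled in, the line is refit by least squares, and the missing X values are re-predicted from Y until the total change drops below the tolerance.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def em_impute_x(ys, xs, start=100.0, tol=1e-5, max_iter=1000):
    """Iteratively impute missing x values (None), mirroring the SAS/IML loop.

    Returns (imputed values, number of iterations used).
    """
    missing = [i for i, x in enumerate(xs) if x is None]
    x0 = [start] * len(missing)              # initial guesses, like x0 = {100,100,100}
    for iteration in range(1, max_iter + 1):
        filled = list(xs)
        for k, i in enumerate(missing):      # plug the current guesses into x
            filled[i] = x0[k]
        b0, b1 = fit_line(filled, ys)        # M-step: refit the regression line
        x2 = [(ys[i] - b0) / b1 for i in missing]  # E-step: predict x from y
        change = sum(abs(new - old) for new, old in zip(x2, x0))
        x0 = x2
        if change <= tol:                    # same stopping rule as e > 0.00001
            return x2, iteration
    raise RuntimeError("no convergence within max_iter iterations")


Y = [313, 384, 484, 523, 633, 673, 754, 844, 894, 993]
X = [31, 38, 48, None, 63, None, 75, 84, None, 99]

imputed, iterations = em_impute_x(Y, X)
print([round(v, 2) for v in imputed], iterations)
```

At convergence the imputed points lie exactly on the fitted line, so they contribute no residuals; the fitted line is then the least-squares line of the seven complete cases, and the imputations are its inverse predictions (roughly 52, 67 and 89, close to the complete table).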
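Multivariate Normal Distribution
A random vector $Y$ that follows a multivariate normal distribution is written $Y \sim MVN(\mu, \Sigma)$, where
$Y = (Y_1, \dots, Y_p)^\top$, $\mu = (\mu_1, \dots, \mu_p)^\top$, $\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_p^2 \end{pmatrix}$.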
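Properties of the MVN distribution
- Any linear combination of the components of $X$ is normally distributed: if $X \sim MVN(\mu, \Sigma)$, then $a^\top X = a_1 X_1 + a_2 X_2 + \dots + a_p X_p \sim N(a^\top \mu,\ a^\top \Sigma a)$.
- If $X \sim N_p(\mu, \Sigma)$, then every subset of the components of $X$ is also (multivariate) normal.
- If $X_1$ and $X_2$ are independent with $X_1 \sim MVN(\mu_1, \Sigma_{11})$ and $X_2 \sim MVN(\mu_2, \Sigma_{22})$, then their joint distribution is multivariate normal: $\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim MVN\left(\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}\right)$.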
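As a small worked instance of the linear-combination property, take $p = 2$ and $a = (1, 1)^\top$; the numerical values of $\mu$ and $\Sigma$ below are chosen only for illustration:

```latex
a^\top X = X_1 + X_2, \qquad
a^\top \mu = \mu_1 + \mu_2, \qquad
a^\top \Sigma a = \sigma_1^2 + 2\sigma_{12} + \sigma_2^2

\text{With } \mu = \begin{pmatrix} 1 \\ 4 \end{pmatrix},\;
\Sigma = \begin{pmatrix} 2 & -0.5 \\ -0.5 & 1 \end{pmatrix}:\quad
X_1 + X_2 \sim N\big(1 + 4,\; 2 + 2(-0.5) + 1\big) = N(5,\, 2)
```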
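Generating MVN data
Algorithm for generating $Y \sim MVN(\mu, \Sigma)$:
1. Generate $Z_i \sim N(0,1)$, $i = 1, 2, \dots, p$, with $\mathrm{cov}(Z_i, Z_j) = 0$ for $i \neq j$, and store them as the vector $z$.
2. Find a matrix $T$ such that $\Sigma = T^\top T$ (for example the Cholesky factor; in SAS/IML, half(Σ) returns such an upper-triangular $T$).
3. Compute $Y = \mu + T^\top z$; then $Y \sim MVN(\mu, \Sigma)$, because $\mathrm{Cov}(T^\top z) = T^\top I\, T = \Sigma$.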
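The three steps can be sketched in plain Python without a matrix library. This is an illustration rather than the SAS implementation: `cholesky_lower` and `rmvn` are hypothetical helper names, and the factor is computed in lower-triangular form $L = T^\top$, so that $\Sigma = LL^\top$ and $Y = \mu + Lz$. The parameters are the $\mu$ and $\Sigma$ entered in data1 for Illustration 2.

```python
import math
import random


def cholesky_lower(sigma):
    """Return lower-triangular L with sigma = L * L^T (sigma must be positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L


def rmvn(mu, sigma, n, rng):
    """Draw n vectors Y = mu + L z with z ~ N(0, I); L plays the role of T^T."""
    L = cholesky_lower(sigma)                          # step 2
    p = len(mu)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]    # step 1: independent N(0,1)
        y = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]  # step 3
        draws.append(y)
    return draws


mu = [1.0, 4.0, 10.0]
sigma = [[2.0, -0.5, -1.0],
         [-0.5, 1.0, 0.2],
         [-1.0, 0.2, 1.0]]
sample = rmvn(mu, sigma, 20000, random.Random(1))
means = [sum(row[j] for row in sample) / len(sample) for j in range(3)]
print([round(m, 2) for m in means])
```

With 20,000 draws the sample means should land close to (1, 4, 10). Python's random stream differs from SAS's rannor, so individual draws will not match SAS output.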
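Illustration 2
Suppose we want to generate random vectors $Y \sim MVN(\mu, \Sigma)$ with
$\mu = \begin{pmatrix} 1 \\ 4 \\ 10 \end{pmatrix}$, $\Sigma = \begin{pmatrix} 2 & -0.5 & -1 \\ -0.5 & 1 & 0.2 \\ -1 & 0.2 & 1 \end{pmatrix}$.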
Answer 2
data data1;
   input s1 s2 s3 means;
   cards;
2 -0.5 -1 1
-0.5 1 0.2 4
-1 0.2 1 10
;
run;

proc iml;
   use data1;
   read all var{s1 s2 s3} into sigma;
   read all var{means} into mu;
   p = nrow(sigma);
   n = 1000;                     /* number of random vectors to generate */
   T = half(sigma);              /* upper-triangular T with sigma = T`*T */
   hasily = j(n, p, .);          /* preallocate the n x p result matrix */
   do i = 1 to n;
      z = rannor(j(p,1,1));      /* p independent N(0,1) draws */
      y = mu + T`*z;
      hasily[i,] = y`;
   end;
   create datamvn from hasily;
   append from hasily;
quit;

proc means data=datamvn;
run;
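Illustration 3
Suppose we want to generate random vectors $Y \sim MVN(\mu, \Sigma)$ with standard deviations $\sigma_1 = 2$, $\sigma_2 = 5$, $\sigma_3 = 10$, mean vector and correlation matrix
$\mu = \begin{pmatrix} 100 \\ 40 \\ 10 \end{pmatrix}$, $\rho = \begin{pmatrix} 1 & -0.5 & -0.9 \\ -0.5 & 1 & 0.2 \\ -0.9 & 0.2 & 1 \end{pmatrix}$.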
Answer 3
data data2;
   input r1 r2 r3 means sd;
   cards;
1 -0.5 -0.9 100 2
-0.5 1 0.2 40 5
-0.9 0.2 1 10 10
;
run;

proc iml;
   use data2;
   read all var{r1 r2 r3} into R;
   read all var{means} into mu;
   read all var{sd} into sd;
   p = nrow(R);
   n = 1000;                     /* number of random vectors to generate */
   D = diag(sd);                 /* diagonal matrix of standard deviations */
   DRD = D*R*D`;                 /* covariance matrix sigma = D*R*D */
   T = half(DRD);                /* upper-triangular T with sigma = T`*T */
   hasily = j(n, p, .);          /* preallocate the n x p result matrix */
   do i = 1 to n;
      z = rannor(j(p,1,1));
      y = mu + T`*z;
      hasily[i,] = y`;
   end;
   create datamvn2 from hasily;
   append from hasily;
quit;

proc corr data=datamvn2;
run;
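The DRD line in Answer 3 is the standard conversion from a correlation matrix to a covariance matrix, $\Sigma_{ij} = \sigma_i \rho_{ij} \sigma_j$ (D is diagonal, so D` = D). A plain-Python sketch of the same arithmetic, with `corr_to_cov` as a hypothetical helper name and the values of Illustration 3, gives variances 4, 25 and 100 and covariances -5, -18 and 10:

```python
def corr_to_cov(sd, R):
    """Sigma = D R D with D = diag(sd), i.e. Sigma[i][j] = sd[i] * R[i][j] * sd[j]."""
    p = len(sd)
    return [[sd[i] * R[i][j] * sd[j] for j in range(p)] for i in range(p)]


sd = [2.0, 5.0, 10.0]
R = [[1.0, -0.5, -0.9],
     [-0.5, 1.0, 0.2],
     [-0.9, 0.2, 1.0]]
Sigma = corr_to_cov(sd, R)
for row in Sigma:
    print(row)
```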
Bootstrap
Definition
The bootstrap is a method for assigning measures of accuracy (such as standard errors or confidence intervals) to sample estimates. It allows estimation of the sampling distribution of almost any statistic using only very simple methods. It belongs to the broader class of resampling methods.
Situations where bootstrapping is useful:
- when the theoretical distribution of a statistic of interest is complicated or unknown;
- when the sample size is insufficient for straightforward statistical inference;
- when power calculations have to be performed and a small pilot sample is available.
Types of bootstrap scheme: case resampling
The bootstrap is generally useful for estimating the distribution of a statistic (e.g., the mean or variance) without relying on normal theory (e.g., the z-statistic or t-statistic).
Illustration 4
The dataset data3 contains a sample of 120 body heights distributed N(160, 100). Use the bootstrap to obtain the empirical bootstrap distribution of the sample mean.
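Answer 4
data data3;
   do i = 1 to 120;
      tinggi = 160 + 10*rannor(1);   /* height ~ N(160, 100) */
      output;
   end;
   drop i;
run;

proc iml;
   use data3;
   read all var{tinggi} into tinggi;
   n = nrow(tinggi);                          /* 120 observations */
   ulangan = 1000;                            /* number of bootstrap replicates */
   contoh = ceil(n*ranuni(j(n,ulangan,1)));   /* n x ulangan row indices drawn with replacement */
   M = j(ulangan, 1, .);                      /* bootstrap means, one per replicate */
   do i = 1 to ulangan;
      ci = contoh[,i];
      tinggis = tinggi[ci];                   /* i-th bootstrap sample */
      M[i] = tinggis[:];                      /* its mean */
   end;
   create meanb var{M};
   append;
quit;

proc univariate data=meanb;
   histogram M;
run;

proc means data=meanb mean std;
   var M;
run;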
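A plain-Python sketch of the same case-resampling procedure, assuming data3 is simulated as in the DATA step above. The random streams differ from SAS's rannor and ranuni, so the numbers will not match the SAS output; `bootstrap_means` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(1)

# Stand-in for data3: 120 simulated heights from N(160, 10^2).
tinggi = [160 + 10 * rng.gauss(0.0, 1.0) for _ in range(120)]


def bootstrap_means(data, reps, rng):
    """Case resampling: draw len(data) values with replacement, keep the mean, repeat."""
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]


M = bootstrap_means(tinggi, 1000, rng)
print(round(statistics.fmean(M), 2), round(statistics.stdev(M), 3))
```

The standard deviation of M is the bootstrap standard error of the mean; it should be close to s/√120, and therefore near 10/√120 ≈ 0.91, since the data were simulated with σ = 10.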
Thank you