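Consider $m$ samples $\{x_i, y_i\}$ such that $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$. The hypothesis $h_\theta$ is the logistic function
$$h_\theta(x_i) = \sigma(\omega^T x_i) = \sigma(z_i) = \frac{1}{1 + e^{-z_i}},$$
where $\omega \in \mathbb{R}^d$ and $z_i = \omega^T x_i$.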
The loss function is then
$$l(\omega) = \sum_{i=1}^{m} -\Big(y_i \log \sigma(z_i) + (1 - y_i)\log\big(1 - \sigma(z_i)\big)\Big).$$
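As a minimal sketch of the loss above, the following evaluates $l(\omega)$ on a tiny made-up data set using plain Python lists. The function names, the sample values, and the restriction to labels in $\{0, 1\}$ are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, xs, ys):
    """Sum over samples of -(y log s + (1 - y) log(1 - s)), with s = sigmoid(w . x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = sum(wj * xj for wj, xj in zip(w, x))
        s = sigmoid(z)
        total -= y * math.log(s) + (1 - y) * math.log(1.0 - s)
    return total

# Hypothetical data: m = 3 samples in R^2 with labels assumed to be 0 or 1.
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
ys = [1, 0, 1]

# At w = 0 every z_i is 0, so sigmoid(z_i) = 1/2 and each sample contributes log 2.
loss_at_zero = loss([0.0, 0.0], xs, ys)
```

With $\omega = 0$ the result should equal $m \log 2$, which is a handy sanity check for any implementation.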
Note that
$$1 - \sigma(z) = 1 - \frac{1}{1 + e^{-z}} = \frac{e^{-z}}{1 + e^{-z}} = \frac{1}{1 + e^{z}} = \sigma(-z).$$
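Also note that
$$\frac{\partial}{\partial z}\sigma(z) = \frac{\partial}{\partial z}\left(1 + e^{-z}\right)^{-1} = e^{-z}\left(1 + e^{-z}\right)^{-2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = \sigma(z)\big(1 - \sigma(z)\big).$$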
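The two sigmoid identities can be checked numerically. The sketch below compares $1 - \sigma(z)$ with $\sigma(-z)$ and compares $\sigma(z)(1 - \sigma(z))$ with a central finite-difference estimate of the derivative. The helper names, the test points, and the step size `h` are illustrative choices, not part of the derivation.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_fd(z, h=1e-6):
    """Central finite-difference approximation of d(sigma)/dz."""
    return (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

# Largest deviation from each identity over a few arbitrary points.
points = (-3.0, -0.5, 0.0, 1.2, 4.0)
max_err_reflection = max(abs((1.0 - sigmoid(z)) - sigmoid(-z)) for z in points)
max_err_derivative = max(
    abs(sigmoid_prime_fd(z) - sigmoid(z) * (1.0 - sigmoid(z))) for z in points
)
```

Both errors should be at the level of floating-point and finite-difference noise.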
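The Hessian of $l(\omega)$ is $\vec{\nabla}^2 l(\omega)$. Before computing it, recall that
$$\frac{\partial z}{\partial \omega} = \frac{\partial\, x^T \omega}{\partial \omega} = x^T \quad\text{and}\quad \frac{\partial z}{\partial \omega^T} = \frac{\partial\, \omega^T x}{\partial \omega^T} = x.$$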
Let
$$l_i(\omega) = -y_i \log \sigma(z_i) - (1 - y_i)\log\big(1 - \sigma(z_i)\big).$$
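By the chain rule,
$$\begin{aligned}
\frac{\partial \log \sigma(z_i)}{\partial \omega^T} &= \frac{1}{\sigma(z_i)}\frac{\partial \sigma(z_i)}{\partial \omega^T} = \frac{1}{\sigma(z_i)}\frac{\partial \sigma(z_i)}{\partial z_i}\frac{\partial z_i}{\partial \omega^T} = \big(1 - \sigma(z_i)\big)x_i, \\
\frac{\partial \log\big(1 - \sigma(z_i)\big)}{\partial \omega^T} &= \frac{1}{1 - \sigma(z_i)}\frac{\partial \big(1 - \sigma(z_i)\big)}{\partial \omega^T} = -\sigma(z_i)\,x_i.
\end{aligned}$$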
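It is now straightforward to show that
$$\vec{\nabla} l_i(\omega) = \frac{\partial l_i(\omega)}{\partial \omega^T} = -y_i x_i\big(1 - \sigma(z_i)\big) + (1 - y_i)\,x_i\,\sigma(z_i) = x_i\big(\sigma(z_i) - y_i\big).$$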
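The per-sample gradient $x_i(\sigma(z_i) - y_i)$ can be sanity-checked against a finite-difference gradient of $l_i$. This is a minimal sketch; the helper names and the numbers chosen for `w`, `x`, and `y` are hypothetical, and the label is assumed to be 0 or 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def loss_i(w, x, y):
    """Per-sample loss l_i(w) = -y log s - (1 - y) log(1 - s), s = sigmoid(w . x)."""
    s = sigmoid(dot(w, x))
    return -y * math.log(s) - (1 - y) * math.log(1.0 - s)

def grad_i(w, x, y):
    """Closed-form gradient x * (sigmoid(w . x) - y)."""
    r = sigmoid(dot(w, x)) - y
    return [r * xj for xj in x]

def grad_i_fd(w, x, y, h=1e-6):
    """Central finite-difference gradient, one coordinate at a time."""
    g = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += h
        w_minus[j] -= h
        g.append((loss_i(w_plus, x, y) - loss_i(w_minus, x, y)) / (2.0 * h))
    return g

# Arbitrary illustrative values.
w = [0.3, -1.2, 0.5]
x = [1.0, 2.0, -0.5]
y = 1

analytic = grad_i(w, x, y)
numeric = grad_i_fd(w, x, y)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
```

If the closed form is right, `max_err` should be tiny compared with the gradient entries themselves.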
With the gradient in hand, our last step is to compute the Hessian:
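$$\vec{\nabla}^2 l_i(\omega) = \frac{\partial^2 l_i(\omega)}{\partial \omega\, \partial \omega^T} = x_i x_i^T\, \sigma(z_i)\big(1 - \sigma(z_i)\big).$$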
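For $m$ samples we have $\vec{\nabla}^2 l(\omega) = \sum_{i=1}^{m} x_i x_i^T\, \sigma(z_i)\big(1 - \sigma(z_i)\big)$. This is equivalent to concatenating the column vectors $x_i \in \mathbb{R}^d$ into a matrix $X$ of size $d \times m$ such that $\sum_{i=1}^{m} x_i x_i^T = X X^T$. The scalar terms are combined in a diagonal matrix $D$ such that $D_{ii} = \sigma(z_i)\big(1 - \sigma(z_i)\big)$. Finally, we conclude that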
$$\vec{H}(\omega) = \vec{\nabla}^2 l(\omega) = X D X^T.$$
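To make the matrix form concrete, the sketch below builds the Hessian twice, once as the sum of weighted outer products and once as $X D X^T$, and compares the two. It uses plain Python lists to stay dependency-free; the helpers `matmul` and `transpose` and the sample data are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[dot(row, col) for col in zip(*B)] for row in A]

def hessian_sum(w, xs):
    """Sum over samples of x_i x_i^T * sigma(z_i) * (1 - sigma(z_i))."""
    d = len(w)
    H = [[0.0] * d for _ in range(d)]
    for x in xs:
        s = sigmoid(dot(w, x))
        c = s * (1.0 - s)
        for a in range(d):
            for b in range(d):
                H[a][b] += c * x[a] * x[b]
    return H

def hessian_matrix(w, xs):
    """X D X^T with X of size d x m (samples as columns) and D diagonal."""
    m = len(xs)
    X = transpose(xs)  # xs holds samples as rows, so X has them as columns
    D = [[0.0] * m for _ in range(m)]
    for i, x in enumerate(xs):
        s = sigmoid(dot(w, x))
        D[i][i] = s * (1.0 - s)
    return matmul(matmul(X, D), transpose(X))

# Hypothetical data: m = 3 samples in R^2.
xs = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
w = [0.4, -0.7]

H_sum = hessian_sum(w, xs)
H_mat = hessian_matrix(w, xs)
max_diff = max(abs(H_sum[a][b] - H_mat[a][b]) for a in range(2) for b in range(2))
```

In practice one would avoid forming the $m \times m$ matrix $D$ explicitly and scale the columns of $X$ instead; the dense version is kept here only because it mirrors the formula.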
A faster approach is to consider all samples at once from the beginning and work with matrix derivatives instead. As an extra note, with this formulation it is easy to show that $l(\omega)$ is convex. Let $\delta$ be any vector in $\mathbb{R}^d$. Then
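$$\delta^T \vec{H}(\omega)\,\delta = \delta^T \vec{\nabla}^2 l(\omega)\,\delta = \delta^T X D X^T \delta = (X^T\delta)^T D\,(X^T\delta) = \left\| D^{1/2} X^T \delta \right\|^2 \ge 0,$$
since $D_{ii} = \sigma(z_i)\big(1 - \sigma(z_i)\big) > 0$ for every $i$, so $D^{1/2}$ is well defined. This implies $\vec{H}$ is positive semidefinite and therefore $l$ is convex (but not strongly convex).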
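The quadratic-form argument can also be checked numerically. The sketch below draws random directions $\delta$, evaluates $\delta^T \vec{H} \delta$ from an explicitly built Hessian, and compares it with $\sum_i D_{ii}\,(x_i^T \delta)^2 = \|D^{1/2} X^T \delta\|^2$. The data, the seed, and the helper names are arbitrary assumptions, and a random test like this illustrates the inequality rather than proving it.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hessian(w, xs):
    """Sum over samples of x_i x_i^T * sigma(z_i) * (1 - sigma(z_i))."""
    d = len(w)
    H = [[0.0] * d for _ in range(d)]
    for x in xs:
        s = sigmoid(dot(w, x))
        c = s * (1.0 - s)
        for a in range(d):
            for b in range(d):
                H[a][b] += c * x[a] * x[b]
    return H

def quad_form(H, delta):
    """delta^T H delta."""
    return dot(delta, [dot(row, delta) for row in H])

def weighted_norm_sq(w, xs, delta):
    """||D^{1/2} X^T delta||^2 written as sum_i D_ii * (x_i . delta)^2."""
    total = 0.0
    for x in xs:
        s = sigmoid(dot(w, x))
        total += s * (1.0 - s) * dot(x, delta) ** 2
    return total

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [[rng.uniform(-2, 2) for _ in range(3)] for _ in range(5)]
w = [rng.uniform(-1, 1) for _ in range(3)]
H = hessian(w, xs)

results = []
for _ in range(100):
    delta = [rng.uniform(-1, 1) for _ in range(3)]
    results.append((quad_form(H, delta), weighted_norm_sq(w, xs, delta)))
```

Every pair in `results` should agree up to rounding, and no quadratic form should be negative beyond rounding error.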