I'll give this a try. I'm going to use Yao's original notation, which will make it easier to compare with his paper and his definitions.
Let $I$ be a finite set of inputs, and let $A_0$ be a finite set of deterministic algorithms that may fail to give a correct answer for some inputs. Also let $\epsilon(A,x)=0$ if $A$ gives the correct answer for $x$, and $\epsilon(A,x)=1$ otherwise. Denote by $r(A,x)$ the number of queries made by $A$ on input $x$, or equivalently, the depth of $A$'s decision tree.
Average Cost: Given a probability distribution $d$ on $I$, the average cost of an algorithm $A \in A_0$ is $C(A,d) = \sum_{x \in I} d(x) \cdot r(A,x)$.
Distributional Complexity: Let $\lambda \in [0,1]$. For any distribution $d$ on the inputs, let $\beta(\lambda)$ be the subset of $A_0$ given by $\beta(\lambda) = \{A : A \in A_0,\ \sum_{x \in I} d(x) \cdot \epsilon(A,x) \le \lambda\}$. The distributional complexity with error $\lambda$ for a computational problem $P$ is defined as $F_{1,\lambda}(P) = \max_d \min_{A \in \beta(\lambda)} C(A,d)$.
$\lambda$-tolerance: A distribution $q$ on the family $A_0$ is $\lambda$-tolerant if $\max_{x \in I} \sum_{A \in A_0} q(A) \cdot \epsilon(A,x) \le \lambda$.
Expected Cost: For a randomized algorithm $R$, let $q$ be a probability distribution on $A_0$ that is $\lambda$-tolerant. The expected cost of $R$ for a given input $x$ is $E(R,x) = \sum_{A \in A_0} q(A) \cdot r(A,x)$.
Randomized Complexity: Let $\lambda \in [0,1]$. The randomized complexity with error $\lambda$ is $F_{2,\lambda} = \min_R \max_{x \in I} E(R,x)$.
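For a quick feel for these definitions, here is a tiny made-up instance (mine, not from Yao's paper). Take $I=\{x_1,x_2\}$ with $d$ uniform and $A_0=\{A_1,A_2\}$, where $A_1$ errs only on $x_2$ and makes one query on every input, while $A_2$ errs only on $x_1$ and makes two queries on every input. Then $C(A_1,d)=1$ and $C(A_2,d)=2$; each algorithm has distributional error $\sum_{x \in I} d(x)\cdot\epsilon(A,x)=1/2$, so $\beta(\lambda)=A_0$ precisely when $\lambda \ge 1/2$; and the uniform distribution $q$ on $A_0$ is $\lambda$-tolerant precisely when $\lambda \ge 1/2$, with $E(R,x_1)=E(R,x_2)=3/2$.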
Now we are ready to get down to business. What we want to prove is that, given a distribution $d$ on the inputs and a randomized algorithm $R$ (i.e., a distribution $q$ on $A_0$),
Yao's Minimax Principle for Monte Carlo Algorithms:
$$\max_{x \in I} E(R,x) \;\ge\; \frac{1}{2}\min_{A \in \beta(2\lambda)} C(A,d)$$
for $\lambda \in [0,1/2]$.
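As a quick sanity check on the statement (not part of the proof), here is a small Python sketch that brute-forces both sides of the inequality on a toy instance; the tables `eps` and `r`, the distributions `d` and `q`, and the value `lam` are all made up for illustration.

```python
# Toy instance: 2 inputs, 3 deterministic algorithms, each described only by its
# error table eps[A][x] (0 = correct, 1 = wrong) and query-count table r[A][x].
# All concrete numbers are made up for illustration.
inputs = [0, 1]
algos = [0, 1, 2]
eps = [[0, 1],   # algorithm 0 errs on input 1
       [1, 0],   # algorithm 1 errs on input 0
       [0, 0]]   # algorithm 2 is always correct but more expensive
r   = [[1, 1],
       [1, 1],
       [3, 3]]

d = [0.5, 0.5]          # distribution on inputs
q = [0.25, 0.25, 0.5]   # randomized algorithm R = distribution on algos
lam = 0.25              # error parameter lambda, in [0, 1/2]

def C(A):               # average cost of algorithm A under d
    return sum(d[x] * r[A][x] for x in inputs)

def E(x):               # expected cost of R on input x
    return sum(q[A] * r[A][x] for A in algos)

def dist_err(A):        # distributional error of A under d
    return sum(d[x] * eps[A][x] for x in inputs)

# Check that q is lambda-tolerant.
tolerance = max(sum(q[A] * eps[A][x] for A in algos) for x in inputs)
assert tolerance <= lam, "q is not lambda-tolerant for this lambda"

# beta(2*lambda): algorithms whose distributional error is at most 2*lambda.
beta = [A for A in algos if dist_err(A) <= 2 * lam]

lhs = max(E(x) for x in inputs)
rhs = 0.5 * min(C(A) for A in beta)
print(f"max_x E(R,x) = {lhs:.3f} >= {rhs:.3f} = (1/2) min_beta C(A,d): {lhs >= rhs}")
```

Note that representing each deterministic algorithm only by its error and query-count tables is all the principle needs; the actual decision trees never appear.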
I will follow an approach given by Fich, Meyer auf der Heide, Ragde and Wigderson (see Lemma 4). Their approach does not yield a characterization for Las Vegas algorithms (only the lower bound), but it is sufficient for our purposes. From their proof, it is easy to see that for any $A_0$ and $I$
Claim 1. $\max_{x \in I} E(R,x) \ge \min_{A \in A_0} C(A,d)$.
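For completeness, here is how I read the averaging argument behind Claim 1 (a sketch, not a quote from their paper):

$$\max_{x \in I} E(R,x) \;\ge\; \sum_{x \in I} d(x)\cdot E(R,x) \;=\; \sum_{x \in I} d(x)\sum_{A \in A_0} q(A)\cdot r(A,x) \;=\; \sum_{A \in A_0} q(A)\cdot C(A,d) \;\ge\; \min_{A \in A_0} C(A,d),$$

since a maximum is at least any $d$-weighted average, and a $q$-weighted average is at least the minimum.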
To get the correct constants, we'll do something similar. Given that the probability distribution $q$ given by the randomized algorithm $R$ is $\lambda$-tolerant on $A_0$, we have that
$$\lambda \;\ge\; \max_{x \in I}\Big\{\sum_{A \in A_0} q(A)\cdot\epsilon(A,x)\Big\} \;\ge\; \sum_{x \in I} d(x)\sum_{A \in A_0} q(A)\cdot\epsilon(A,x) \;=\; \sum_{A \in A_0} q(A)\sum_{x \in I} d(x)\cdot\epsilon(A,x) \;\ge\; \min_{A \in A_0}\Big\{\sum_{x \in I} d(x)\cdot\epsilon(A,x)\Big\}.$$
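One consequence of this chain is worth recording explicitly, since we will use it below: it shows that $\sum_{A \in A_0} q(A)\sum_{x \in I} d(x)\cdot\epsilon(A,x) \le \lambda$, i.e., the $q$-average of the distributional error is at most $\lambda$. By Markov's inequality, the algorithms whose distributional error exceeds $2\lambda$ therefore carry $q$-weight at most $1/2$, so

$$\sum_{A \in \beta(2\lambda)} q(A) \;\ge\; \frac{1}{2}$$

(when $\lambda = 0$ this holds trivially, because every $A$ with $q(A) > 0$ must then have zero distributional error).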
If we replace the family $A_0$ with $\beta(2\lambda)$, we see that
$$\lambda \;\ge\; \max_{x \in I}\Big\{\sum_{A \in A_0} q(A)\cdot\epsilon(A,x)\Big\} \;\ge\; \max_{x \in I}\Big\{\sum_{A \in \beta(2\lambda)} q(A)\cdot\epsilon(A,x)\Big\} \;\ge\; \sum_{x \in I} d(x)\sum_{A \in \beta(2\lambda)} q(A)\cdot\epsilon(A,x) \;=\; \sum_{A \in \beta(2\lambda)} q(A)\sum_{x \in I} d(x)\cdot\epsilon(A,x) \;\ge\; \min_{A \in \beta(2\lambda)}\Big\{\frac{1}{2}\sum_{x \in I} d(x)\cdot\epsilon(A,x)\Big\},$$
where the second inequality follows because $\beta(2\lambda) \subseteq A_0$ and every term is nonnegative, and the last inequality holds because, as noted above, $\sum_{A \in \beta(2\lambda)} q(A) \ge 1/2$: each summand $\sum_{x \in I} d(x)\cdot\epsilon(A,x)$ with $A \in \beta(2\lambda)$ is at least the minimum over $\beta(2\lambda)$, and the weights $q(A)$ on $\beta(2\lambda)$ add up to at least $1/2$. Hence,
$$\max_{x \in I}\Big\{\sum_{A \in A_0} q(A)\cdot\epsilon(A,x)\Big\} \;\ge\; \frac{1}{2}\min_{A \in \beta(2\lambda)}\Big\{\sum_{x \in I} d(x)\cdot\epsilon(A,x)\Big\}.$$
By noting that $\epsilon$ maps into $\{0,1\}$ whereas $r$ maps into $\mathbb{N}$, and by Claim 1 above, we can now safely run the same argument with $r(A,x)$ in place of $\epsilon(A,x)$ to obtain the desired inequality.
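To spell this last step out (my own rendering, using the bound $\sum_{A \in \beta(2\lambda)} q(A) \ge 1/2$ recorded above):

$$\max_{x \in I} E(R,x) \;\ge\; \sum_{x \in I} d(x)\sum_{A \in A_0} q(A)\cdot r(A,x) \;=\; \sum_{A \in A_0} q(A)\cdot C(A,d) \;\ge\; \sum_{A \in \beta(2\lambda)} q(A)\cdot C(A,d) \;\ge\; \frac{1}{2}\min_{A \in \beta(2\lambda)} C(A,d),$$

where the second inequality drops the nonnegative terms outside $\beta(2\lambda)$ and the last one uses $\sum_{A \in \beta(2\lambda)} q(A) \ge 1/2$, exactly as in the chain above; this is the inequality we set out to prove.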