Luca, since a year has passed, you have probably researched your own answer by now. I'm answering some of your questions here just for the record. I review some Lagrangian-relaxation algorithms for the problems you mention and sketch the connection to learning (in particular, following expert advice). I don't comment here on SDP algorithms.
Note that the specific algorithms you mention do not run in nearly linear time. (There is a nearly-linear-time algorithm for explicitly given packing or covering problems; see Beating Simplex for fractional packing and covering linear programs.) The algorithms you have in mind typically do have variants that run in a nearly linear number of iterations, but each iteration typically requires at least linear time as well. I discuss some of these algorithms below.
Some useful functions
Before we start, here are a couple of functions that we will use in the proof sketches. (If you are interested in the algorithms but not the proof details, you can skip ahead.) For any vector y, define Lmax(y) to be ln ∑_i exp(y_i). This function is an upper bound on max_i y_i:
max_i y_i ≤ Lmax(y) ≤ max_i y_i + ln m.
Analogously, define Lmin(y) to be −Lmax(−y), a lower bound on min_i y_i.
For convenience in what follows, we use g(y) to denote ∇Lmin(y), the gradient of Lmin. We use G(y) to denote ∇Lmax(y), the gradient of Lmax.
Explicitly, g_i(y) is exp(−y_i) / ∑_{i′} exp(−y_{i′}),
while G_i(y) is exp(y_i) / ∑_{i′} exp(y_{i′}).
Lmin and Lmax are smooth in the following sense: for any vector d ∈ [0,ε]^n and y ∈ R^n,
Lmin(y+d) ≥ Lmin(y) + (1−O(ε)) d⋅g(y)
and
Lmax(y+d) ≤ Lmax(y) + (1+O(ε)) d⋅G(y).
Note that both of the gradients have 1-norm equal to 1:
|G(y)| = |g(y)| = 1. (Throughout we use |z| to denote the 1-norm.)
Note also that, for a matrix A,
the gradient of the function x ↦ Lmin(Ax) with respect to x
is (by the chain rule) (g(Ax))^T A.
More explicitly, the partial derivative of the function with respect to x_j
is ∑_i A_ij exp(−A_i x) / ∑_i exp(−A_i x).
Likewise, the partial derivative of Lmax(Ax)
with respect to x_j is ∑_i A_ij exp(A_i x) / ∑_i exp(A_i x).
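For concreteness, here is a small numerical sketch of these functions in Python (my own illustration, not part of the original discussion). The names Lmax, Lmin, g and G follow the notation above; numpy is assumed, and the usual max-shift is applied for numerical stability.

```python
import numpy as np

def Lmax(y):
    """Smooth upper bound on max_i y_i: ln(sum_i exp(y_i))."""
    y = np.asarray(y, dtype=float)
    m = y.max()
    return m + np.log(np.exp(y - m).sum())   # shifted by max(y) for numerical stability

def Lmin(y):
    """Smooth lower bound on min_i y_i: -Lmax(-y)."""
    return -Lmax(-np.asarray(y, dtype=float))

def G(y):
    """Gradient of Lmax: G_i(y) = exp(y_i) / sum_i' exp(y_i')."""
    y = np.asarray(y, dtype=float)
    z = np.exp(y - y.max())
    return z / z.sum()

def g(y):
    """Gradient of Lmin: g_i(y) = exp(-y_i) / sum_i' exp(-y_i')."""
    return G(-np.asarray(y, dtype=float))

# Sanity checks: max_i y_i <= Lmax(y) <= max_i y_i + ln m, and |G(y)| = |g(y)| = 1.
y = np.array([0.3, 1.2, -0.5])
assert y.max() <= Lmax(y) <= y.max() + np.log(len(y))
assert abs(G(y).sum() - 1) < 1e-9 and abs(g(y).sum() - 1) < 1e-9
```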
Fractional Set Cover
Fix a Set-Cover instance. Let A denote the element/set incidence matrix, so A_es = 1 if e ∈ s, else 0, and A_e x is the extent to which the fractional cover x covers element e.
The LP is min{ |x| : Ax ≥ 1; x ≥ 0 }. Given ε ∈ (0,1), the algorithm is:
1. Initialize all x_s = 0. Let N = log(n)/ε.
2. Repeat until min_e A_e x ≥ N:
   2.1. Choose s maximizing the partial derivative of Lmin(Ax) w.r.t. x_s.
        (Explicitly, choose s maximizing ∑_{e∈s} exp(−∑_{s′∋e} x_{s′}).)
   2.2. Increase x_s by ε.
3. Return x / min_e A_e x.
The algorithm returns a (1+O(ε))-approximate solution
in O(|x*| log(n)/ε²) iterations, where n is the number of elements
and x* is the optimal fractional set cover (trivially |x*| ≤ n).
(A similar algorithm appears in the paper Chandra mentioned.
Vertex Cover is of course a special case.)
(Remark: Note that the iteration bound does not depend on the number
of sets, just the number of elements. Thus, the algorithm can be used
with an implicitly defined set system, as long as, given weights on the
elements, one can efficiently find a set of maximum (or near-maximum) total weight.
This kind of oracle is the same as the separation oracle required to apply
the ellipsoid algorithm to the dual problem. For packing problems such
as set packing, you need an oracle that, given weights on the elements,
returns a set minimizing the total weight. For problems such as
multi-commodity flow, you might, for example, need to find a path
minimizing the sum of some given edge weights.)
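For concreteness, here is a rough Python sketch of this algorithm (my own rendering, not optimized). It assumes the incidence matrix A is given explicitly as a 0-1 numpy array rather than through an oracle; the function name fractional_set_cover is mine.

```python
import numpy as np

def fractional_set_cover(A, eps=0.1):
    """A: 0-1 element/set incidence matrix (rows = elements, columns = sets)."""
    n_elements, n_sets = A.shape
    N = np.log(n_elements) / eps
    x = np.zeros(n_sets)
    Ax = A @ x                           # coverage A_e.x of each element, kept incrementally
    while Ax.min() < N:
        w = np.exp(-(Ax - Ax.min()))     # element weights exp(-A_e.x), rescaled for stability
        s = np.argmax(A.T @ w)           # set maximizing the partial derivative of Lmin(Ax)
        x[s] += eps
        Ax += eps * A[:, s]
    return x / Ax.min()                  # normalize so that min_e A_e.x = 1
```

Note that the choice of s only needs a set of maximum total weight under the element weights w, which is exactly the oracle discussed above.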
Here's a sketch of the proof of the performance guarantee.
In each iteration, the partial derivative w.r.t. the chosen s
is at least 1/|x*|, where x* is the optimal fractional set cover.
(To see why, recall that the gradient of Lmin(Ax) with respect to x is (g(Ax))^T A.
If we were to choose a set s′ at random from the distribution x*/|x*|,
the expected value of the partial derivative with respect to x_{s′}
would thus be (g(Ax))^T A x* / |x*|.
Since Ax* ≥ 1, this is at least |g(Ax)| / |x*|.
Since |g(Ax)| = 1, this is at least 1/|x*|.
Thus, there must exist some s giving partial derivative at least 1/|x*|.
Since the algorithm chooses x_s in each iteration to maximize
the partial derivative, it achieves a partial derivative of at least 1/|x*|.)
Then, the step size ε is chosen just small enough so that
no coordinate of Ax increases by more than ε.
Thus, because of the smoothness of Lmin, increasing x_s
to x_s + ε increases Lmin(Ax) by at least
(1−O(ε)) ε / |x*|.
In this way, the algorithm maintains the invariant
Lmin(Ax) ≥ (1−O(ε)) |x|/|x*| − ln n.
(Note that Lmin(0̄) equals −ln n.)
At termination, in the invariant, the ln n term is O(ε) times the left-hand side,
so by calculation one gets min_e A_e x ≥ (1−O(ε)) |x|/|x*|.
After the normalization in the last line of the algorithm,
this implies |x| ≤ (1+O(ε)) |x*|.
FWIW, the inequalities involved in proving the invariant are essentially
the same as those involved in proving the Chernoff bound.
(In fact, this algorithm can be derived by applying the
method of conditional probabilities to a randomized-rounding
scheme that repeatedly samples sets from the distribution x*/|x*|
(with replacement), increasing x_s for each sampled set s.
Derandomizing that scheme gives the algorithm:
the underlying invariant is just that the pessimistic estimator stays below 1.
The exponential penalties in the pessimistic estimator come from
using the Chernoff bound in the analysis of the rounding scheme.
This basic idea is explained further in the paper Chandra mentioned.)
Fractional Weighted Set Cover (and general fractional Covering)
To handle problems such as Weighted Set Cover efficiently,
we modify the algorithm to use non-uniform increments
(an idea due to Garg and Konemann).
The LP is min{ c⋅x : (∀e) ∑_{s∋e} x_s ≥ 1 },
where e ranges over the elements, s ranges over the sets,
and all variables are non-negative.
To present the algorithm, first rewrite the problem as a general covering problem.
Let A_es = 1/c_s for e ∈ s and A_es = 0 otherwise.
Then (with a change of variables, scaling each x_s by c_s),
the LP is min{ |x| : Ax ≥ 1; x ≥ 0 },
which we can view as a general covering LP.
Here is the algorithm:
1. Initialize all x_s = 0. Let N = log(n)/ε.
2. Repeat until all covering constraints have been deleted:
   2.1. Choose s maximizing the partial derivative of Lmin(Ax) w.r.t. x_s.
        (Explicitly, choose s maximizing ∑_{e∈s} exp(−∑_{s′∋e} x_{s′}) / c_s.)
   2.2. Increase x_s by δ, where δ is chosen maximally such that, for every remaining covering constraint e, the increase in A_e⋅x is at most ε.
   2.3. Delete all covering constraints e such that A_e⋅x ≥ N.
3. Return x / min_e A_e⋅x.
The algorithm returns a (1+O(ε))-approximate solution
in O(n log(n)/ε²) iterations, where n is the number of covering constraints.
(Each iteration increases some remaining A_e x by ε; this can happen
only N/ε times to a constraint before it is deleted.)
The proof of correctness is via essentially the same invariant as for Set Cover.
Weighted Vertex Cover is a special case.
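Here is a rough Python sketch of this algorithm (again my own rendering). It assumes A is an explicit non-negative numpy matrix and does not handle degenerate or infeasible instances.

```python
import numpy as np

def fractional_covering(A, eps=0.1):
    """A: non-negative constraint matrix; for Weighted Set Cover, A[e, s] = 1/c_s if e in s."""
    n_constraints, n_vars = A.shape
    N = np.log(n_constraints) / eps
    x = np.zeros(n_vars)
    Ax = A @ x
    active = np.ones(n_constraints, dtype=bool)            # covering constraints not yet deleted
    while active.any():
        shift = Ax[active].min()
        w = np.where(active, np.exp(-(Ax - shift)), 0.0)   # weights exp(-A_e.x) on active rows
        j = np.argmax(A.T @ w)            # variable maximizing d/dx_j Lmin(Ax) over active rows
        col = A[active, j]
        # largest step so no remaining constraint increases by more than eps;
        # assumes column j hits some active constraint (else the LP is infeasible)
        delta = eps / col.max()
        x[j] += delta
        Ax += delta * A[:, j]
        active &= Ax < N                  # delete covering constraints that reached N
    return x / Ax.min()
```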
Maximum Fractional Bipartite Matching
Given a graph G = (U, W, E),
the natural LP for the problem is max{ |x| : ∀v. ∑_{e∋v} x_e ≤ 1 }.
In matrix representation, this is a packing LP max{ |x| : Ax ≤ 1; x ≥ 0 }
with 0-1 coefficients (A_ve = 1 if v ∈ e).
Such problems do not require non-uniform increments, so a simple algorithm analogous to the unweighted Set Cover algorithm (but for packing) will do:
1. Initialize all x_e = 0. Let N = log(n)/ε.
2. While Ax < N:
   2.1. Choose e minimizing the partial derivative of Lmax(Ax) w.r.t. x_e.
        (Explicitly, choose e to minimize ∑_{v∈e} exp(∑_{e′∋v} x_{e′}).)
   2.2. Increase x_e by ε.
3. Return x / max_v A_v x.
The algorithm returns a (1−O(ε))-approximate solution in O(n log(n)/ε²) iterations.
(This is because each iteration increases |x| by ε,
and finally, before normalization, |x| = O(Nn).)
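A rough Python sketch of this packing algorithm (my own rendering; A is assumed to be the explicit 0-1 vertex/edge incidence matrix, and the function name is mine):

```python
import numpy as np

def fractional_matching(A, eps=0.1):
    """A: 0-1 vertex/edge incidence matrix (rows = vertices, columns = edges)."""
    n_vertices, n_edges = A.shape
    N = np.log(n_vertices) / eps
    x = np.zeros(n_edges)
    Ax = A @ x                           # load A_v.x on each vertex, kept incrementally
    while Ax.max() < N:
        w = np.exp(Ax - Ax.max())        # vertex weights exp(A_v.x), rescaled for stability
        e = np.argmin(A.T @ w)           # edge minimizing the partial derivative of Lmax(Ax)
        x[e] += eps
        Ax += eps * A[:, e]
    return x / Ax.max()                  # normalize so that max_v A_v.x = 1
```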
Just for fun, here is a curious alternative algorithm for Perfect Bipartite Matching.
Recall that G = (U, W, E). Let n = |U| = |W|.

1. Initialize all x_e = 0. Let N = 4 ln(n)/ε.
2. Repeat nN times:
   2.1. Choose u uniformly at random from U.
   2.2. Choose w such that (u,w) ∈ E minimizing ∑_{e∋w} x_e.
   2.3. Increase x_{uw} by ε.
3. Return x/N.
If G has a perfect matching, the algorithm returns
an x such that |x| = n, and, with high probability,
for all vertices u ∈ U, 1−O(ε) ≤ ∑_{e∋u} x_e ≤ 1+O(ε),
and for all vertices w ∈ W, ∑_{e∋w} x_e ≤ 1+O(ε).
If you are interested in the details of the proof, please ask...
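For what it's worth, here is a rough Python sketch of this process (my own rendering). It assumes U = {0, …, n−1}, that adj[u] lists the neighbors of u in W, and that every u has at least one neighbor; it does not verify that a perfect matching exists.

```python
import math
import random
from collections import defaultdict

def random_fractional_matching(adj, n, eps=0.1):
    """adj: maps each u in {0, ..., n-1} to its list of neighbors w in W."""
    N = int(round(4 * math.log(n) / eps))
    x = defaultdict(float)            # x[(u, w)]: fractional value on edge (u, w)
    load = defaultdict(float)         # load[w]: sum of x over edges incident to w
    for _ in range(n * N):
        u = random.randrange(n)                       # step 2.1: uniform random u in U
        w = min(adj[u], key=lambda v: load[v])        # step 2.2: least-loaded neighbor of u
        x[(u, w)] += eps                              # step 2.3
        load[w] += eps
    return {edge: value / N for edge, value in x.items()}   # return x / N
```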
Mixed Packing and Covering
You may have asked about bipartite matching hoping for an example
of a mixed packing and covering problem, that is, one of the form
∃x? Px ≤ p; Cx ≥ c; x ≥ 0.
Here's one algorithm for such problems.
First, normalize so that p = 1̄ and c = 1̄.
Let m be the number of constraints (rows in P plus rows in C).
1. Initialize all x_j = 0. Let N = 2 ln(m)/ε.
2. While Px < N:
   2.1. Choose j so that the partial derivative of Lmax(Px) with respect to x_j is at most the partial derivative of Lmin(Cx) with respect to x_j.
        (Explicitly, choose j such that ∑_i P_ij exp(P_i x) / ∑_i exp(P_i x) ≤ ∑_i C_ij exp(−C_i x) / ∑_i exp(−C_i x).)
   2.2. Increase x_j by δ, where δ is chosen maximally such that no constraint P_i x or remaining constraint C_i x increases by more than ε.
   2.3. Delete all covering constraints i such that C_i x ≥ N.
3. Return x / max_i P_i x.
Assuming the given problem is feasible, the algorithm
returns an x such that Px ≤ 1
and Cx ≥ 1−O(ε).
The number of iterations is O(m ln(m)/ε²),
because each iteration increases some constraint by ε,
and this can happen for each constraint at most N/ε times.
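Here is a rough Python sketch of this algorithm (my own rendering). It assumes P and C are explicit non-negative numpy matrices, already normalized so that p = c = 1̄; it also stops once every covering constraint has been deleted, and it does not handle degenerate or infeasible instances.

```python
import numpy as np

def mixed_packing_covering(P, C, eps=0.1):
    """P, C: non-negative matrices for Px <= 1, Cx >= 1 (already normalized)."""
    m = P.shape[0] + C.shape[0]
    N = 2 * np.log(m) / eps
    x = np.zeros(P.shape[1])
    Px, Cx = P @ x, C @ x
    active = np.ones(C.shape[0], dtype=bool)      # covering constraints not yet deleted
    while Px.max() < N and active.any():
        gP = np.exp(Px - Px.max()); gP /= gP.sum()                 # G(Px), gradient of Lmax(Px)
        gC = np.where(active, np.exp(-(Cx - Cx[active].min())), 0.0)
        gC /= gC.sum()                                             # g(Cx) over active rows
        dP, dC = P.T @ gP, C.T @ gC       # partial derivatives of Lmax(Px) and Lmin(Cx)
        ok = np.where(dP <= dC)[0]        # admissible variables j (step 2.1)
        if ok.size == 0:
            raise ValueError("no admissible j: the instance appears infeasible")
        j = ok[np.argmax(dC[ok] - dP[ok])]          # any admissible j works
        # largest delta so no packing row, nor any active covering row, rises by more than eps
        # (assumes column j is not identically zero on these rows)
        rise = max(P[:, j].max(), C[active, j].max())
        delta = eps / rise
        x[j] += delta
        Px += delta * P[:, j]
        Cx += delta * C[:, j]
        active &= Cx < N                  # delete covering constraints that reached N
    return x / Px.max()
```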
The proof of correctness is via the invariant
Lmax(Px) ≤ 2 ln(m) + (1+O(ε)) Lmin(Cx).
The invariant implies
max_i P_i x ≤ 2 ln(m) + (1+O(ε)) min_i C_i x.
At termination the left-hand side is Ω(log(m)/ε),
proving the performance guarantee.
In Step 2.1, the desired j must exist as long as the original problem is feasible.
(This is because, for any feasible x* and any x,
if we were to choose a random j′ from the distribution x*/|x*|,
the expected value of the partial derivative of Lmax(Px) with respect to x_{j′}
would be at most 1/|x*| (see the previous proof sketch for Set Cover).
Likewise, the expected value of the partial derivative of Lmin(Cx) with respect to x_{j′}
would be at least 1/|x*|.
Thus, there is a j such that the partial derivative of Lmax(Px) with respect to x_j
is at most the partial derivative of Lmin(Cx) with respect to x_j.)
Then the invariant is maintained in each iteration because,
by the choice of x_j and δ,
and the smoothness of Lmin and Lmax,
increasing x_j to x_j + δ
increases Lmax(Px)
by at most 1+O(ε)
times the increase in Lmin(Cx).
Learning (following experts / boosting)
One reference for understanding this connection is
Adaptive game playing using multiplicative weights, by Freund and Schapire.
Here is a quick summary to give the technical idea.
Consider the following repeated game. In each round t:

- You choose a probability distribution p_t on [n] (the n so-called experts).
- Knowing p_t, the adversary then chooses a payoff vector a_t ∈ [0,1]^n.
- You receive payoff p_t⋅a_t for the round.

The game stops after some number of rounds.
Your goal is to minimize your regret
in comparison to any single expert (i.e., pure strategy) i.
That is, your goal is to minimize (max_i ∑_t a_{ti}) − ∑_t p_t⋅a_t.
Fix any ε > 0.
Let the vector y_t denote ε ∑_{s≤t} a_s, that is,
ε times the vector sum of the payoff vectors up to time t.
Recall that G(y) is the gradient of Lmax(y).
Here's the basic strategy we will analyze:
On round t, choose p_t to be G(y_{t−1}).
By inspection, this gives you payoff a_t⋅G(y_{t−1}) in round t.
Because of the smoothness property of Lmax,
Lmax(y_t) ≤ Lmax(y_{t−1}) + (1+O(ε)) ε a_t⋅G(y_{t−1}).
That is, in each round, Lmax(y_t) can't increase by more than ε(1+O(ε)) times your payoff.
Since Lmax(0̄) = ln n,
this maintains the invariant that Lmax(y_t)
is at most your total payoff times ε(1+O(ε)),
plus ln(n).
On the other hand, the total payoff of the best expert i
is max_i ∑_t a_{ti}, i.e., ε⁻¹ max_i y_{ti},
which is in turn at most ε⁻¹ Lmax(y_t).
Thus, your regret is at most ε⁻¹ ln(n), plus O(ε) times your total payoff.
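Here is a small Python sketch of this strategy (my own illustration). For simplicity it replays a fixed sequence of payoff vectors, i.e., an oblivious adversary; the function name play_hedge is mine.

```python
import numpy as np

def play_hedge(payoffs, eps=0.1):
    """payoffs: sequence of payoff vectors a_t in [0,1]^n."""
    payoffs = [np.asarray(a, dtype=float) for a in payoffs]
    n = len(payoffs[0])
    y = np.zeros(n)               # y_t = eps * sum_{s <= t} a_s
    total = 0.0
    for a in payoffs:
        z = np.exp(y - y.max())
        p = z / z.sum()           # p_t = G(y_{t-1}), the gradient of Lmax
        total += p @ a            # payoff received this round
        y += eps * a
    best = (y / eps).max()        # best single expert's total payoff, max_i sum_t a_ti
    return total, best            # regret = best - total <= ln(n)/eps + O(eps) * total
```

For example, play_hedge(np.random.rand(1000, 5)) runs the strategy against 1000 random payoff vectors over 5 experts.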
Remark: I think, as Freund and Schapire point out, a "boosting" algorithm (in learning theory) is also implicit in this analysis. See their paper for more details.
Minimizing total payoff
You can derive a similar strategy for the setting where the goal is to minimize,
rather than maximize, the total payoff.
Your regret, which you still want to minimize, is ∑_t p_t⋅a_t − min_i ∑_t a_{ti}.
In that case, the corresponding strategy is to choose p_t to be g(y_{t−1}), the gradient of Lmin.
With this strategy your regret is again at most
ε⁻¹ ln n plus O(ε) times your total payoff.
Connection to Lagrangian-relaxation algorithms
To see the connection to Lagrangian-relaxation algorithms, fix a Set-Cover instance.
Consider the latter type of game (with the goal of minimizing payoff),
where the experts correspond to the elements e of your set system.
In each round, choose the probability distribution p_t
to be g(y_{t−1}), the gradient of Lmin, as above,
and have the adversary choose the payoff vector a_t as a function of p_t as follows:
choose the set s_t maximizing ∑_{e∈s} p_{te},
then let a_{te} = 1 if e ∈ s_t, and a_{te} = 0 otherwise.
Given the correct stopping condition (discussed below),
this process gives you exactly the Set-Cover algorithm discussed at the start.
The performance guarantee of the algorithm follows from the regret bound as follows.
Let X_s be the number of times the adversary chose set s during the play.
Let x* be the optimal fractional set cover.
Let T = |X| be the number of rounds played.
The regret bound implies
∑_t a_t⋅p_t ≤ ε⁻¹ ln(m) + min_e ∑_t a_{te}.
Using the definition of a_t, the t-th payoff
(the t-th term in the sum on the left)
equals ∑_{e∈s_t} p_{te}.
The adversary chose s_t to maximize this payoff.
If the adversary had instead chosen s_t randomly
from the distribution x*/|x*|, the expectation of
the payoff would have been
∑_s (x*_s/|x*|) ∑_{e∈s} p_{te} = (1/|x*|) ∑_e p_{te} ∑_{s∋e} x*_s ≥ (1/|x*|) ∑_e p_{te} = 1/|x*|.
(Above we use that ∑_{s∋e} x*_s ≥ 1 for all e, and |p_t| = 1.)
Since each payoff is at least 1/|x*|, the regret bound implies
T/|x*| ≤ ε⁻¹ ln(m) + min_e ∑_t a_{te}.
By the definition of X,
we have |X| = T (each round chooses one set),
and ∑_t a_{te} = ∑_t [e ∈ s_t] = ∑_{s∋e} X_s,
giving
|X|/|x*| ≤ ε⁻¹ ln(m) + min_e ∑_{s∋e} X_s.
We make the process stop when min_e ∑_{s∋e} X_s = Ω(ε⁻² ln m),
so then (rearranging terms)
|X| / min_e ∑_{s∋e} X_s ≤ (1+O(ε)) |x*|.
That is, normalizing X gives a fractional set cover
of size at most (1+O(ε)) times optimum.
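Here is a small Python sketch of this game-playing view of the Set-Cover algorithm (my own illustration). A is the explicit 0-1 element/set incidence matrix, and ln(m)/ε² is used as one concrete choice of the Ω(ε⁻² ln m) stopping threshold.

```python
import numpy as np

def set_cover_as_game(A, eps=0.1):
    """A: 0-1 element/set incidence matrix (rows = elements, columns = sets)."""
    n_elements, n_sets = A.shape
    stop = np.log(n_elements) / eps**2      # one concrete Omega(eps^-2 ln m) threshold
    y = np.zeros(n_elements)                # y = eps * (A X), the scaled cumulative payoffs
    X = np.zeros(n_sets)                    # X_s = number of rounds in which set s was chosen
    while (y / eps).min() < stop:           # i.e. until min_e sum_{s contains e} X_s reaches stop
        z = np.exp(-(y - y.min()))
        p = z / z.sum()                     # your move: p_t = g(y_{t-1}), gradient of Lmin
        s = np.argmax(A.T @ p)              # adversary's move: set of maximum total weight
        X[s] += 1
        y += eps * A[:, s]                  # payoff vector a_t = incidence vector of s_t
    return X / (A @ X).min()                # normalize X into a fractional cover
```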
Remark: In a sense, this learning theory interpretation
generalizes the algorithmic interpretation. However, some
of the algorithmic techniques necessary for efficiency
(such as non-uniform increments and dropping satisfied
covering constraints) don't seem to carry over into the
learning theory setting naturally. Likewise, algorithms for
mixed packing and covering LPs (e.g. these)
don't seem to have natural analogues in the learning-theory setting.