Luca, since a year has passed, you have probably researched your own answer by now. I'm answering some of your questions here just for the record. I review some Lagrangian-relaxation algorithms for the problems you mention and sketch the connection to learning (in particular, following expert advice). I don't comment here on SDP algorithms.
Note that the specific algorithms you mention do not run in nearly linear time. (There is a nearly-linear-time algorithm for explicitly given packing or covering problems; see Beating Simplex for fractional packing and covering linear programs.) The algorithms you have in mind typically do have variants that run in a nearly linear number of iterations, but each iteration typically requires at least linear time as well. I discuss some of these algorithms below.
Some useful functions
Before we start, here are a couple of functions that we will use in the proof sketches. (If you are interested in the algorithms but not the proof details, you can skip ahead.) For any vector y, define Lmax(y) to be ln ∑_i exp(y_i). This function is an upper bound on max_i y_i:
max_i y_i ≤ Lmax(y) ≤ max_i y_i + ln m.
Analogously, define Lmin(y) to be −Lmax(−y), a lower bound on min_i y_i.
For convenience in what follows, we use g(y) to denote ∇Lmin(y), the gradient of Lmin. We use G(y) to denote ∇Lmax(y), the gradient of Lmax.
Explicitly, g_i(y) is exp(−y_i) / ∑_{i′} exp(−y_{i′}),
while G_i(y) is exp(y_i) / ∑_{i′} exp(y_{i′}).
Lmin and Lmax are smooth in the following sense: for any vector d ∈ [0,ε]^n and y ∈ R^n,
Lmin(y+d) ≥ Lmin(y) + (1−O(ε)) d⋅g(y)
and
Lmax(y+d) ≤ Lmax(y) + (1+O(ε)) d⋅G(y).
Note that both of the gradients have 1-norm equal to 1:
|G(y)| = |g(y)| = 1. (Throughout we use |z| to denote the 1-norm.)
Note also that, for a matrix A,
the gradient of the function x ↦ Lmin(Ax) with respect to x
is (by the chain rule) (g(Ax))^T A.
More explicitly, the partial derivative of the function with respect to x_j
is ∑_i A_ij exp(−A_i x) / ∑_i exp(−A_i x).
Likewise, the partial derivative of Lmax(Ax)
with respect to x_j is ∑_i A_ij exp(A_i x) / ∑_i exp(A_i x).
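For concreteness, here is a small numerical sketch of these functions in Python (my own illustration, not part of the original discussion). The names Lmax, Lmin, g and G follow the notation above; numpy is assumed, and the usual max-shift is applied for numerical stability.

```python
import numpy as np

def Lmax(y):
    """Smooth upper bound on max_i y_i: ln(sum_i exp(y_i))."""
    y = np.asarray(y, dtype=float)
    m = y.max()
    return m + np.log(np.exp(y - m).sum())   # shifted by max(y) for numerical stability

def Lmin(y):
    """Smooth lower bound on min_i y_i: -Lmax(-y)."""
    return -Lmax(-np.asarray(y, dtype=float))

def G(y):
    """Gradient of Lmax: G_i(y) = exp(y_i) / sum_i' exp(y_i')."""
    y = np.asarray(y, dtype=float)
    z = np.exp(y - y.max())
    return z / z.sum()

def g(y):
    """Gradient of Lmin: g_i(y) = exp(-y_i) / sum_i' exp(-y_i')."""
    return G(-np.asarray(y, dtype=float))

# Sanity checks: max_i y_i <= Lmax(y) <= max_i y_i + ln m, and |G(y)| = |g(y)| = 1.
y = np.array([0.3, 1.2, -0.5])
assert y.max() <= Lmax(y) <= y.max() + np.log(len(y))
assert abs(G(y).sum() - 1) < 1e-9 and abs(g(y).sum() - 1) < 1e-9
```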
Fractional Set Cover
Fix a Set-Cover instance. Let A denote the element/set incidence matrix, so A_es = 1 if e ∈ s, else 0, and A_e x is the extent to which the fractional cover x covers element e.
The LP is min{ |x| : Ax ≥ 1; x ≥ 0 }. Given ε ∈ (0,1), the algorithm is:
1. Initialize all x_s = 0. Let N = log(n)/ε.
2. Repeat until min_e A_e x ≥ N:
   2.1. Choose s maximizing the partial derivative of Lmin(Ax) w.r.t. x_s.
        (Explicitly, choose s maximizing ∑_{e∈s} exp(−∑_{s′∋e} x_{s′}).)
   2.2. Increase x_s by ε.
3. Return x / min_e A_e x.
The algorithm returns a (1+O(ε))-approximate solution
in O(|x*| log(n)/ε²) iterations, where n is the number of elements
and x* is the optimal fractional set cover (trivially |x*| ≤ n).
(A similar algorithm appears in the paper Chandra mentioned.
Vertex Cover is of course a special case.)
(Remark: Note that the iteration bound does not depend on the number
of sets, just the number of elements. Thus, the algorithm can be used
with an implicitly defined set system, as long as, given weights on the
elements, one can efficiently find a set of maximum (or near-maximum) total weight.
This kind of oracle is the same as the separation oracle required to apply
the ellipsoid algorithm to the dual problem. For packing problems such
as set packing, you need an oracle that, given weights on the elements,
returns a set minimizing the total weight. For problems such as
multi-commodity flow, you might, for example, need to find a path
minimizing the sum of some given edge weights.)
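For concreteness, here is a rough Python sketch of this algorithm (my own rendering, not optimized). It assumes the incidence matrix A is given explicitly as a 0-1 numpy array rather than through an oracle; the function name fractional_set_cover is mine.

```python
import numpy as np

def fractional_set_cover(A, eps=0.1):
    """A: 0-1 element/set incidence matrix (rows = elements, columns = sets)."""
    n_elements, n_sets = A.shape
    N = np.log(n_elements) / eps
    x = np.zeros(n_sets)
    Ax = A @ x                           # coverage A_e.x of each element, kept incrementally
    while Ax.min() < N:
        w = np.exp(-(Ax - Ax.min()))     # element weights exp(-A_e.x), rescaled for stability
        s = np.argmax(A.T @ w)           # set maximizing the partial derivative of Lmin(Ax)
        x[s] += eps
        Ax += eps * A[:, s]
    return x / Ax.min()                  # normalize so that min_e A_e.x = 1
```

Note that the choice of s only needs a set of maximum total weight under the element weights w, which is exactly the oracle discussed above.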
Here's a sketch of the proof of the performance guarantee.
In each iteration, the partial derivative w.r.t. the chosen s
is at least 1/|x*|, where x* is the optimal fractional set cover.
(To see why, recall that the gradient of Lmin(Ax) with respect to x is (g(Ax))^T A.
If we were to choose a set s′ at random from the distribution x*/|x*|,
the expected value of the partial derivative with respect to x_{s′}
would thus be (g(Ax))^T A x* / |x*|.
Since Ax* ≥ 1, this is at least |g(Ax)| / |x*|.
Since |g(Ax)| = 1, this is at least 1/|x*|.
Thus, there must exist some s giving partial derivative at least 1/|x*|.
Since the algorithm chooses x_s in each iteration to maximize
the partial derivative, it achieves a partial derivative of at least 1/|x*|.)
Then, the step size ε is chosen just small enough so that
no coordinate of Ax increases by more than ε.
Thus, because of the smoothness of Lmin, increasing x_s
to x_s + ε increases Lmin(Ax) by at least
(1−O(ε)) ε / |x*|.
In this way, the algorithm maintains the invariant
Lmin(Ax) ≥ (1−O(ε)) |x|/|x*| − ln n.
(Note that Lmin(0̄) equals −ln n.)
At termination, in the invariant, the ln n term is O(ε) times the left-hand side,
so by calculation one gets min_e A_e x ≥ (1−O(ε)) |x|/|x*|.
After the normalization in the last line of the algorithm,
this implies |x| ≤ (1+O(ε)) |x*|.
FWIW, the inequalities involved in proving the invariant are essentially
the same as those involved in proving the Chernoff bound.
(In fact, this algorithm can be derived by applying the
method of conditional probabilities to a randomized-rounding
scheme that repeatedly samples sets from the distribution x*/|x*|
(with replacement), increasing x_s for each sampled set s.
Derandomizing that scheme gives the algorithm:
the underlying invariant is just that the pessimistic estimator stays below 1.
The exponential penalties in the pessimistic estimator come from
using the Chernoff bound in the analysis of the rounding scheme.
This basic idea is explained further in the paper Chandra mentioned.)
Fractional Weighted Set Cover (and general fractional Covering)
To handle problems such as Weighted Set Cover efficiently,
we modify the algorithm to use non-uniform increments
(an idea due to Garg and Konemann).
The LP is min{ c⋅x : (∀e) ∑_{s∋e} x_s ≥ 1 },
where e ranges over the elements, s ranges over the sets,
and all variables are non-negative.
To present the algorithm, first rewrite the problem as a general covering problem.
Let A_es = 1/c_s for e ∈ s and A_es = 0 otherwise.
Then (with a change of variables, scaling each x_s by c_s),
the LP is min{ |x| : Ax ≥ 1; x ≥ 0 },
which we can view as a general covering LP.
Here is the algorithm:
1. Initialize all x_s = 0. Let N = log(n)/ε.
2. Repeat until all covering constraints have been deleted:
   2.1. Choose s maximizing the partial derivative of Lmin(Ax) w.r.t. x_s.
        (Explicitly, choose s maximizing ∑_{e∈s} exp(−∑_{s′∋e} x_{s′}) / c_s.)
   2.2. Increase x_s by δ, where δ is chosen maximally such that, for every remaining covering constraint e, the increase in A_e⋅x is at most ε.
   2.3. Delete all covering constraints e such that A_e⋅x ≥ N.
3. Return x / min_e A_e⋅x.
The algorithm returns a (1+O(ε))-approximate solution
in O(n log(n)/ε²) iterations, where n is the number of covering constraints.
(Each iteration increases some remaining A_e x by ε; this can happen
only N/ε times to a constraint before it is deleted.)
The proof of correctness is via essentially the same invariant as for Set Cover.
Weighted Vertex Cover is a special case.
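Here is a rough Python sketch of this algorithm (again my own rendering). It assumes A is an explicit non-negative numpy matrix and does not handle degenerate or infeasible instances.

```python
import numpy as np

def fractional_covering(A, eps=0.1):
    """A: non-negative constraint matrix; for Weighted Set Cover, A[e, s] = 1/c_s if e in s."""
    n_constraints, n_vars = A.shape
    N = np.log(n_constraints) / eps
    x = np.zeros(n_vars)
    Ax = A @ x
    active = np.ones(n_constraints, dtype=bool)            # covering constraints not yet deleted
    while active.any():
        shift = Ax[active].min()
        w = np.where(active, np.exp(-(Ax - shift)), 0.0)   # weights exp(-A_e.x) on active rows
        j = np.argmax(A.T @ w)            # variable maximizing d/dx_j Lmin(Ax) over active rows
        col = A[active, j]
        # largest step so no remaining constraint increases by more than eps;
        # assumes column j hits some active constraint (else the LP is infeasible)
        delta = eps / col.max()
        x[j] += delta
        Ax += delta * A[:, j]
        active &= Ax < N                  # delete covering constraints that reached N
    return x / Ax.min()
```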
Maximum Fractional Bipartite Matching
Given a graph G = (U, W, E),
the natural LP for the problem is max{ |x| : ∀v. ∑_{e∋v} x_e ≤ 1 }.
In matrix representation, this is a packing LP max{ |x| : Ax ≤ 1; x ≥ 0 }
with 0-1 coefficients (A_ve = 1 if v ∈ e).
Such problems do not require non-uniform increments, so a simple algorithm analogous to the unweighted Set Cover algorithm (but for packing) will do:
1. Initialize all x_e = 0. Let N = log(n)/ε.
2. While Ax < N:
   2.1. Choose e minimizing the partial derivative of Lmax(Ax) w.r.t. x_e.
        (Explicitly, choose e to minimize ∑_{v∈e} exp(∑_{e′∋v} x_{e′}).)
   2.2. Increase x_e by ε.
3. Return x / max_v A_v x.
The algorithm returns a (1−O(ε))-approximate solution in O(n log(n)/ε²) iterations.
(This is because each iteration increases |x| by ε,
and finally, before normalization, |x| = O(Nn).)
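A rough Python sketch of this packing algorithm (my own rendering; A is assumed to be the explicit 0-1 vertex/edge incidence matrix, and the function name is mine):

```python
import numpy as np

def fractional_matching(A, eps=0.1):
    """A: 0-1 vertex/edge incidence matrix (rows = vertices, columns = edges)."""
    n_vertices, n_edges = A.shape
    N = np.log(n_vertices) / eps
    x = np.zeros(n_edges)
    Ax = A @ x                           # load A_v.x on each vertex, kept incrementally
    while Ax.max() < N:
        w = np.exp(Ax - Ax.max())        # vertex weights exp(A_v.x), rescaled for stability
        e = np.argmin(A.T @ w)           # edge minimizing the partial derivative of Lmax(Ax)
        x[e] += eps
        Ax += eps * A[:, e]
    return x / Ax.max()                  # normalize so that max_v A_v.x = 1
```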
Just for fun, here is a curious alternative algorithm for Perfect Bipartite Matching.
Recall that G = (U, W, E). Let n = |U| = |W|.

1. Initialize all x_e = 0. Let N = 4 ln(n)/ε.
2. Repeat nN times:
   2.1. Choose u uniformly at random from U.
   2.2. Choose w such that (u,w) ∈ E minimizing ∑_{e∋w} x_e.
   2.3. Increase x_{uw} by ε.
3. Return x/N.
If G has a perfect matching, the algorithm returns
an x such that |x| = n, and, with high probability,
for all vertices u ∈ U, 1−O(ε) ≤ ∑_{e∋u} x_e ≤ 1+O(ε),
and for all vertices w ∈ W, ∑_{e∋w} x_e ≤ 1+O(ε).
If you are interested in the details of the proof, please ask...
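For what it's worth, here is a rough Python sketch of this process (my own rendering). It assumes U = {0, …, n−1}, that adj[u] lists the neighbors of u in W, and that every u has at least one neighbor; it does not verify that a perfect matching exists.

```python
import math
import random
from collections import defaultdict

def random_fractional_matching(adj, n, eps=0.1):
    """adj: maps each u in {0, ..., n-1} to its list of neighbors w in W."""
    N = int(round(4 * math.log(n) / eps))
    x = defaultdict(float)            # x[(u, w)]: fractional value on edge (u, w)
    load = defaultdict(float)         # load[w]: sum of x over edges incident to w
    for _ in range(n * N):
        u = random.randrange(n)                       # step 2.1: uniform random u in U
        w = min(adj[u], key=lambda v: load[v])        # step 2.2: least-loaded neighbor of u
        x[(u, w)] += eps                              # step 2.3
        load[w] += eps
    return {edge: value / N for edge, value in x.items()}   # return x / N
```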
Mixed Packing and Covering
You may have asked about bipartite matching hoping for an example
of a mixed packing and covering problem, that is, one of the form
∃x? Px ≤ p; Cx ≥ c; x ≥ 0.
Here's one algorithm for such problems.
First, normalize so that p = 1̄ and c = 1̄.
Let m be the number of constraints (rows in P plus rows in C).
1. Initialize all x_j = 0. Let N = 2 ln(m)/ε.
2. While Px < N:
   2.1. Choose j so that the partial derivative of Lmax(Px) with respect to x_j is at most the partial derivative of Lmin(Cx) with respect to x_j.
        (Explicitly, choose j such that ∑_i P_ij exp(P_i x) / ∑_i exp(P_i x) ≤ ∑_i C_ij exp(−C_i x) / ∑_i exp(−C_i x).)
   2.2. Increase x_j by δ, where δ is chosen maximally such that no constraint P_i x or remaining constraint C_i x increases by more than ε.
   2.3. Delete all covering constraints i such that C_i x ≥ N.
3. Return x / max_i P_i x.
Assuming the given problem is feasible, the algorithm
returns an x such that Px ≤ 1
and Cx ≥ 1−O(ε).
The number of iterations is O(m ln(m)/ε²),
because each iteration increases some constraint by ε,
and this can happen for each constraint at most N/ε times.
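Here is a rough Python sketch of this algorithm (my own rendering). It assumes P and C are explicit non-negative numpy matrices, already normalized so that p = c = 1̄; it also stops once every covering constraint has been deleted, and it does not handle degenerate or infeasible instances.

```python
import numpy as np

def mixed_packing_covering(P, C, eps=0.1):
    """P, C: non-negative matrices for Px <= 1, Cx >= 1 (already normalized)."""
    m = P.shape[0] + C.shape[0]
    N = 2 * np.log(m) / eps
    x = np.zeros(P.shape[1])
    Px, Cx = P @ x, C @ x
    active = np.ones(C.shape[0], dtype=bool)      # covering constraints not yet deleted
    while Px.max() < N and active.any():
        gP = np.exp(Px - Px.max()); gP /= gP.sum()                 # G(Px), gradient of Lmax(Px)
        gC = np.where(active, np.exp(-(Cx - Cx[active].min())), 0.0)
        gC /= gC.sum()                                             # g(Cx) over active rows
        dP, dC = P.T @ gP, C.T @ gC       # partial derivatives of Lmax(Px) and Lmin(Cx)
        ok = np.where(dP <= dC)[0]        # admissible variables j (step 2.1)
        if ok.size == 0:
            raise ValueError("no admissible j: the instance appears infeasible")
        j = ok[np.argmax(dC[ok] - dP[ok])]          # any admissible j works
        # largest delta so no packing row, nor any active covering row, rises by more than eps
        # (assumes column j is not identically zero on these rows)
        rise = max(P[:, j].max(), C[active, j].max())
        delta = eps / rise
        x[j] += delta
        Px += delta * P[:, j]
        Cx += delta * C[:, j]
        active &= Cx < N                  # delete covering constraints that reached N
    return x / Px.max()
```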
The proof of correctness is via the invariant
Lmax(Px) ≤ 2 ln(m) + (1+O(ε)) Lmin(Cx).
The invariant implies
max_i P_i x ≤ 2 ln(m) + (1+O(ε)) min_i C_i x.
At termination the left-hand side is Ω(log(m)/ε),
proving the performance guarantee.
In Step 2.1, the desired j must exist as long as the original problem is feasible.
(This is because, for any feasible x* and any x,
if we were to choose a random j′ from the distribution x*/|x*|,
the expected value of the partial derivative of Lmax(Px) with respect to x_{j′}
would be at most 1/|x*| (see the previous proof sketch for Set Cover).
Likewise, the expected value of the partial derivative of Lmin(Cx) with respect to x_{j′}
would be at least 1/|x*|.
Thus, there is a j such that the partial derivative of Lmax(Px) with respect to x_j
is at most the partial derivative of Lmin(Cx) with respect to x_j.)
Then the invariant is maintained in each iteration because,
by the choice of x_j and δ,
and the smoothness of Lmin and Lmax,
increasing x_j to x_j + δ
increases Lmax(Px)
by at most 1+O(ε)
times the increase in Lmin(Cx).
Learning (following experts / boosting)
One reference for understanding this connection is
Adaptive game playing using multiplicative weights, by Freund and Schapire.
Here is a quick summary to give the technical idea.
Consider the following repeated game. In each round t:

- You choose a probability distribution p_t on [n] (the n so-called experts).
- Knowing p_t, the adversary then chooses a payoff vector a_t ∈ [0,1]^n.
- You receive payoff p_t⋅a_t for the round.

The game stops after some number of rounds.
Your goal is to minimize your regret
in comparison to any single expert (i.e., pure strategy) i.
That is, your goal is to minimize (max_i ∑_t a_{ti}) − ∑_t p_t⋅a_t.
Fix any ε > 0.
Let the vector y_t denote ε ∑_{s≤t} a_s, that is,
ε times the vector sum of the payoff vectors up to time t.
Recall that G(y) is the gradient of Lmax(y).
Here's the basic strategy we will analyze:
On round t, choose p_t to be G(y_{t−1}).
By inspection, this gives you payoff a_t⋅G(y_{t−1}) in round t.
Because of the smoothness property of Lmax,
Lmax(y_t) ≤ Lmax(y_{t−1}) + (1+O(ε)) ε a_t⋅G(y_{t−1}).
That is, in each round, Lmax(y_t) can't increase by more than ε(1+O(ε)) times your payoff.
Since Lmax(0̄) = ln n,
this maintains the invariant that Lmax(y_t)
is at most your total payoff times ε(1+O(ε)),
plus ln(n).
On the other hand, the total payoff of the best expert i
is max_i ∑_t a_{ti}, i.e., ε⁻¹ max_i y_{ti},
which is in turn at most ε⁻¹ Lmax(y_t).
Thus, your regret is at most ε⁻¹ ln(n), plus O(ε) times your total payoff.
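Here is a small Python sketch of this strategy (my own illustration). For simplicity it replays a fixed sequence of payoff vectors, i.e., an oblivious adversary; the function name play_hedge is mine.

```python
import numpy as np

def play_hedge(payoffs, eps=0.1):
    """payoffs: sequence of payoff vectors a_t in [0,1]^n."""
    payoffs = [np.asarray(a, dtype=float) for a in payoffs]
    n = len(payoffs[0])
    y = np.zeros(n)               # y_t = eps * sum_{s <= t} a_s
    total = 0.0
    for a in payoffs:
        z = np.exp(y - y.max())
        p = z / z.sum()           # p_t = G(y_{t-1}), the gradient of Lmax
        total += p @ a            # payoff received this round
        y += eps * a
    best = (y / eps).max()        # best single expert's total payoff, max_i sum_t a_ti
    return total, best            # regret = best - total <= ln(n)/eps + O(eps) * total
```

For example, play_hedge(np.random.rand(1000, 5)) runs the strategy against 1000 random payoff vectors over 5 experts.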
Remark: I think, as Freund and Schapire point out, a "boosting" algorithm (in learning theory) is also implicit in this analysis. See their paper for more details.
Minimizing total payoff
You can derive a similar strategy for the setting where the goal is to minimize,
rather than maximize, the total payoff.
Your regret, which you still want to minimize, is ∑_t p_t⋅a_t − min_i ∑_t a_{ti}.
In that case, the corresponding strategy is to choose p_t to be g(y_{t−1}), the gradient of Lmin.
With this strategy your regret is again at most
ε⁻¹ ln n plus O(ε) times your total payoff.
Connection to Lagrangian-relaxation algorithms
To see the connection to Lagrangian-relaxation algorithms, fix a Set-Cover instance.
Consider the latter type of game (with the goal of minimizing payoff),
where the experts correspond to the elements e of your set system.
In each round, choose the probability distribution p_t
to be g(y_{t−1}), the gradient of Lmin, as above,
and have the adversary choose the payoff vector a_t as a function of p_t as follows:
choose the set s_t maximizing ∑_{e∈s} p_{te},
then let a_{te} = 1 if e ∈ s_t, and a_{te} = 0 otherwise.
Given the correct stopping condition (discussed below),
this process gives you exactly the Set-Cover algorithm discussed at the start.
The performance guarantee of the algorithm follows from the regret bound as follows.
Let X_s be the number of times the adversary chose set s during the play.
Let x* be the optimal fractional set cover.
Let T = |X| be the number of rounds played.
The regret bound implies
∑_t a_t⋅p_t ≤ ε⁻¹ ln(m) + min_e ∑_t a_{te}.
Using the definition of a_t, the t-th payoff
(the t-th term in the sum on the left)
equals ∑_{e∈s_t} p_{te}.
The adversary chose s_t to maximize this payoff.
If the adversary had instead chosen s_t randomly
from the distribution x*/|x*|, the expectation of
the payoff would have been
∑_s (x*_s/|x*|) ∑_{e∈s} p_{te} = (1/|x*|) ∑_e p_{te} ∑_{s∋e} x*_s ≥ (1/|x*|) ∑_e p_{te} = 1/|x*|.
(Above we use that ∑_{s∋e} x*_s ≥ 1 for all e, and |p_t| = 1.)
Since each payoff is at least 1/|x*|, the regret bound implies
T/|x*| ≤ ε⁻¹ ln(m) + min_e ∑_t a_{te}.
By the definition of X,
we have |X| = T (each round chooses one set),
and ∑_t a_{te} = ∑_t [e ∈ s_t] = ∑_{s∋e} X_s,
giving
|X|/|x*| ≤ ε⁻¹ ln(m) + min_e ∑_{s∋e} X_s.
We make the process stop when min_e ∑_{s∋e} X_s = Ω(ε⁻² ln m),
so then (rearranging terms)
|X| / min_e ∑_{s∋e} X_s ≤ (1+O(ε)) |x*|.
That is, normalizing X gives a fractional set cover
of size at most (1+O(ε)) times optimum.
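Here is a small Python sketch of this game-playing view of the Set-Cover algorithm (my own illustration). A is the explicit 0-1 element/set incidence matrix, and ln(m)/ε² is used as one concrete choice of the Ω(ε⁻² ln m) stopping threshold.

```python
import numpy as np

def set_cover_as_game(A, eps=0.1):
    """A: 0-1 element/set incidence matrix (rows = elements, columns = sets)."""
    n_elements, n_sets = A.shape
    stop = np.log(n_elements) / eps**2      # one concrete Omega(eps^-2 ln m) threshold
    y = np.zeros(n_elements)                # y = eps * (A X), the scaled cumulative payoffs
    X = np.zeros(n_sets)                    # X_s = number of rounds in which set s was chosen
    while (y / eps).min() < stop:           # i.e. until min_e sum_{s contains e} X_s reaches stop
        z = np.exp(-(y - y.min()))
        p = z / z.sum()                     # your move: p_t = g(y_{t-1}), gradient of Lmin
        s = np.argmax(A.T @ p)              # adversary's move: set of maximum total weight
        X[s] += 1
        y += eps * A[:, s]                  # payoff vector a_t = incidence vector of s_t
    return X / (A @ X).min()                # normalize X into a fractional cover
```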
Remark: In a sense, this learning theory interpretation
generalizes the algorithmic interpretation. However, some
of the algorithmic techniques necessary for efficiency
(such as non-uniform increments and dropping satisfied
covering constraints) don't seem to carry over into the
learning theory setting naturally. Likewise, algorithms for
mixed packing and covering LPs (e.g. these)
don't seem to have natural analogues in the learning-theory setting.