Why do political polls have such large sample sizes?

32

When I watch the news, I notice that Gallup polls for things like presidential elections have [I assume random] sample sizes of well over 1,000. From what I remember of college statistics, a sample size of 30 was a "significantly large" sample. It was made to seem that a sample size over 30 is pointless due to diminishing returns.

sampling sample-size power-analysis

— samplesize999

9

Finally, someone is here to talk about the Big Data emperor's new clothes. If you [...] the college statistics sample size of 30.

— StasK

1

StasK, that is hilarious.

— Aaron Hall

Best comment, @StasK.

— Brennan

36

Wayne has addressed the "30" issue well enough (my own rule of thumb: any mention of the number 30 in relation to statistics is likely to be wrong).

Why numbers in the vicinity of 1000 are often used

Numbers of around 1000-2000 are often used in surveys, even in the case of a simple proportion ("Are you in favor of $<$whatever$>$?").

This is done so that reasonably accurate estimates of the proportion are obtained.

If binomial sampling is assumed, the standard error* of the sample proportion is largest when the proportion is $\frac{1}{2}$, but that upper limit is still a pretty good approximation for proportions between about 25% and 75%.

* "standard error" = "standard deviation of the distribution of"

A common aim is to estimate percentages to within about $\pm 3\%$ of the true percentage, about $95\%$ of the time. That $3\%$ is called the 'margin of error'.

Using that 'worst case' standard error under binomial sampling, this leads to:

$1.96 \times \sqrt{\frac{1}{2}\cdot(1-\frac{1}{2})/n} \leq 0.03$

$0.98 \times \sqrt{1/n} \leq 0.03$

$\sqrt{n} \geq 0.98/0.03$

$n \geq 1067.11$

... or 'a bit more than 1000'.

So if you survey 1000 people at random from the population you want to make inferences about, and 58% of the sample support the proposal, you can be reasonably sure that the population proportion is between 55% and 61%.

(Sometimes other values for the margin of error, such as 2.5%, are used. If you halve the margin of error, the sample size goes up by a factor of 4.)
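The margin-of-error arithmetic can be reproduced in a few lines of Python. This is a minimal sketch rather than anything from the original answer: the function name `required_sample_size` and its parameters are illustrative, and it assumes simple binomial sampling with the normal approximation.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest n whose margin of error is at most `margin`.

    Assumes simple binomial sampling and the normal approximation;
    p=0.5 is the worst case for the standard error.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Solve z * sqrt(p * (1 - p) / n) <= margin for n, then round up.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


n_3pct = required_sample_size(0.03)
n_1p5pct = required_sample_size(0.015)
print(n_3pct, n_1p5pct)
```

Rounding up gives 1068 respondents for a 3% margin, and halving the margin to 1.5% needs roughly four times as many.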

In complex surveys where an accurate estimate of a proportion in some sub-population is needed (e.g. the proportion of black college graduates from Texas in favor of the proposal), the numbers may need to be large enough that the subgroup is several hundred in size, perhaps entailing many thousands of responses in total.

Since that can quickly become impractical, it is common to split the population into sub-populations (strata) and sample each one separately. Even so, you can end up with some very large surveys.

"It was made to seem that a sample size over 30 is pointless due to diminishing returns."

It depends on the effect size and the relative variability. The $\sqrt n$ effect on the standard error means you may need some quite large samples in some situations.

I answered a question here (I think it was from an engineer) that dealt with very large sample sizes (in the vicinity of a million, if I remember correctly), but he was looking for very small effects.

Let's see what a random sample of size 30 leaves us with when estimating a proportion.

Imagine we ask 30 people whether, overall, they approved of the State of the Union address (strongly agree, agree, disagree, strongly disagree). Further imagine that interest lies in the proportion that either agree or strongly agree.

Say 11 of those interviewed agreed and 5 strongly agreed, for a total of 16.

16/30 is about 53%. What are our bounds for the proportion in the population (with, say, a 95% interval)?

If our assumptions hold, we can pin the population proportion down to somewhere between 35% and 71% (roughly).

Not all that useful.
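The answer does not say which interval method produced the 35% to 71% bounds. As a hedged sketch, the plain normal-approximation (Wald) interval reproduces them; that choice of method and the function name `wald_interval` are assumptions, and other intervals (Wilson, exact binomial) would give slightly different limits.

```python
import math
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 16 of 30 respondents agree or strongly agree.
low, high = wald_interval(16, 30)
print(f"{low:.0%} to {high:.0%}")
```

With only 30 respondents the interval is about 36 percentage points wide, which is why such a sample says little about a close race.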

— Glen_b -Reinstate Monica

+1. The whole answer is great, but the first line alone was worth the price of admission.

— Matt Krause

1

And then of course you could reverse the calculation and compute the margin of error with a sample of 30...

— Calimo

Your last paragraph is where stratified sampling comes in, I believe. As others have said, simple random sampling from the population of eligible voters is not really done at the national level.

— Wayne

@Wayne Thanks; I went back and added a bit at the end.

— Glen_b -Reinstate Monica

2

+1, and I also like the contrarian implications of your rule of thumb.

— James Stanley

10

That particular rule of thumb suggests that 30 points are enough to assume that the data are normally distributed (i.e., look like a bell curve), but this is, at best, a rough guideline. If it matters, check your data! It does suggest that you would want at least 30 respondents for your poll if your analysis depends on these assumptions, but there are other factors too.

One major factor is the "effect size." Most races tend to be fairly close, so fairly large samples are required to reliably detect these differences. (If you are interested in determining the "right" sample size, you should look into power analysis.) If you have a Bernoulli random variable (something with two outcomes) that is approximately 50:50, then you need about 1000 trials to get the standard error down to 1.5%. That is probably accurate enough to predict a race's outcome (the last 4 US presidential elections had a mean margin of about 3.2 percent), which matches your observation nicely.

Poll data are often sliced and diced in different ways: "Is the candidate leading among men over 75?" or whatever. This requires even larger samples, because each respondent fits into only a few of these categories.

Presidential polls are sometimes "bundled" with other survey questions (e.g., Congressional races) too. Since these vary from state to state, one ends up with some "extra" polling data.

Bernoulli distributions are discrete probability distributions with only two outcomes: option 1 is chosen with probability $p$, while option 2 is chosen with probability $1-p$.

The variance of a Bernoulli distribution is $p(1-p)$, so the standard error of the mean is $\sqrt{\frac{p(1-p)}{n}}$. Plug in $p=0.5$ (the election is a tie), set the standard error to 1.5% (0.015), and solve. You will need 1,111 subjects to get to a 1.5% SE.
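As a quick check of that calculation, here is a small Python sketch; the helper name `sample_size_for_se` is illustrative and not from the answer. It simply rearranges $\sqrt{p(1-p)/n} = \text{SE}$ into $n = p(1-p)/\text{SE}^2$.

```python
def sample_size_for_se(target_se, p=0.5):
    """n at which sqrt(p * (1 - p) / n) equals target_se."""
    return p * (1 - p) / target_se ** 2


# A tied race (p = 0.5) and a target standard error of 1.5%.
n_needed = sample_size_for_se(0.015)
print(round(n_needed))
```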

— Matt Krause

4

+1, however, "30 points are enough to assume that the data are normally distributed" is not true. It may well be that people believe this, but how much data is needed for the CLT to make the sampling distribution converge adequately to a normal depends on the nature of the data distribution (see here). Instead, 30 may be (approximately) enough if the data are already normal but the SD is estimated from the same data set (cf. the t-distribution).

— gung - Reinstate Monica

@gung, totally agreed, but I didn't want to go too far off the rails. Feel free to edit further if you think the point should be made more strongly.

— Matt Krause

8

There are already some excellent answers to this question, but I want to answer why the standard error is what it is, why we use $p = 0.5$ as the worst case, and how the standard error varies with $n$.

Suppose we take a poll of just one voter, let's call him or her voter 1, and ask "will you vote for the Purple Party?" We can code the answer as 1 for "yes" and 0 for "no". Let's say that the probability of a "yes" is $p$. We now have a binary random variable $X_1$ which is 1 with probability $p$ and 0 with probability $1-p$. We say that $X_1$ is a Bernoulli variable with probability of success $p$, which we can write $X_1 \sim Bernoulli(p)$. The expected, or mean, value of $X_1$ is given by $\mathbb{E}(X_1)=\sum{xP(X_1=x)}$ where we sum over all possible outcomes $x$ of $X_1$. But there are only two outcomes, 0 with probability $1-p$ and 1 with probability $p$, so the sum is just $\mathbb{E}(X_1)=0(1-p)+1(p)=p$. Stop and think. This actually looks completely reasonable: if there is a 30% chance of voter 1 supporting the Purple Party, and we have coded the variable to be 1 if they say "yes" and 0 if they say "no", then we would expect $X_1$ to be 0.3 on average.

Let's think about what happens if we square $X_1$. If $X_1 = 0$ then $X_1^2 = 0$, and if $X_1 = 1$ then $X_1^2 = 1$. So in fact $X_1^2 = X_1$ in either case. Since they are the same, they must have the same expected value, so $\mathbb{E}(X_1^2)=p$. This gives me an easy way of calculating the variance of a Bernoulli variable: I use $Var(X_1)=\mathbb{E}(X_1^2)-\mathbb{E}(X_1)^2=p - p^2 = p(1-p)$ and so the standard deviation is $\sigma_{X_1}=\sqrt{p(1-p)}$.
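The two sums are small enough to spell out in code. The following is only an illustrative Python sketch (the function name `bernoulli_moments` is made up for the example): it computes the mean and variance of a Bernoulli variable by summing over its two outcomes.

```python
import math


def bernoulli_moments(p):
    """Mean and variance of a Bernoulli(p) variable from its two outcomes."""
    outcomes = {0: 1 - p, 1: p}  # value -> probability
    mean = sum(x * prob for x, prob in outcomes.items())
    mean_of_square = sum(x ** 2 * prob for x, prob in outcomes.items())
    # Var(X) = E(X^2) - E(X)^2
    variance = mean_of_square - mean ** 2
    return mean, variance


# Voter 1 has a 30% chance of saying "yes".
mean, variance = bernoulli_moments(0.3)
print(mean, variance, math.sqrt(variance))
```

Up to floating-point rounding, the mean is 0.3 and the variance is 0.3 x 0.7 = 0.21, as the formula $p(1-p)$ says.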

Obviously I want to talk to other voters - let's call them voter 2, voter 3, through to voter $n$ . Let's assume they all have the same probability $p$ of supporting the Purple Party. Now we have $n$ Bernoulli variables, $X_1$ , $X_2$ through to $X_n$ , with each $X_i \sim Bernoulli(p)$ for $i$ from 1 to $n$ . They all have the same mean, $p$ , and variance, $p(1-p)$ .

The total number of "yes" responses in my sample is the sum of the $X_i$ , which I will write as $X=\sum_{i=1}^{n}X_i$ . I can calculate the mean or expected value of $X$ by using the rule that $\mathbb{E}(X+Y)=\mathbb{E}(X)+\mathbb{E}(Y)$ if those expectations exist, and extending that to $\mathbb{E}(X_1+X_2+\ldots+X_n)=\mathbb{E}(X_1)+\mathbb{E}(X_2)+\ldots+\mathbb{E}(X_n)$ . But I am adding up $n$ of those expectations, and each is $p$ , so I get in total that $\mathbb{E}(X)=np$ . Stop and think. If I poll 200 people and each has a 30% chance of saying they support the Purple Party, of course I'd expect 0.3 x 200 = 60 people to say "yes". So the $np$ formula looks right. Less "obvious" is how to handle the variance.

There is a rule that says $Var(X_1+X_2+\ldots+X_n)=Var(X_1)+Var(X_2)+\ldots+Var(X_n)$ but I can only use it if my random variables are independent of each other. So fine, let's make that assumption, and by a similar logic to before I can see that $Var(X)=np(1-p)$ . If a variable $X$ is the sum of $n$ independent Bernoulli trials, with identical probability of success $p$ , then we say that $X$ has a binomial distribution, $X \sim Binomial(n,p)$ . We have just shown that the mean of such a binomial distribution is $np$ and the variance is $np(1-p)$ .
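These two results can be checked numerically against the exact binomial probabilities. This is a minimal Python sketch, assuming the 200-person, 30%-support example used earlier; `binomial_pmf` is an illustrative helper written for the check, not a library function.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 200, 0.3
# Mean and variance computed directly from the distribution.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(mean, n * p)                # both should be about 60
print(variance, n * p * (1 - p))  # both should be about 42
```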

Our original problem was how to estimate $p$ from the sample. The sensible way to define our estimator is $\hat{p}=X/n$ . For instance, if 64 out of our sample of 200 people said "yes", we'd estimate that 64/200 = 0.32 = 32% of people say they support the Purple Party. You can see that $\hat{p}$ is a "scaled-down" version of our total number of yes-voters, $X$ . That means it is still a random variable, but no longer follows the binomial distribution. We can find its mean and variance, because when we scale a random variable by a constant factor $k$ then it obeys the following rules: $\mathbb{E}(kX)=k\mathbb{E}(X)$ (so the mean scales by the same factor $k$ ) and $Var(kX)=k^2 Var(X)$ . Note how variance scales by $k^2$ . That makes sense when you know that in general, the variance is measured in the square of whatever units the variable is measured in: not so applicable here, but if our random variable had been a height in cm then the variance would be in $cm^2$ , which scales differently - if you double lengths, you quadruple area.

Here our scale factor is $\frac{1}{n}$ . This gives us $\mathbb{E}(\hat{p})=\frac{1}{n}\mathbb{E}(X)=\frac{np}{n}=p$ . This is great! On average, our estimator $\hat{p}$ is exactly what it "should" be, the true (or population) probability that a random voter says that they will vote for the Purple Party. We say that our estimator is unbiased. But while it is correct on average, sometimes it will be too small, and sometimes too high. We can see just how wrong it is likely to be by looking at its variance. $Var(\hat{p})=\frac{1}{n^2}Var(X)=\frac{np(1-p)}{n^2}=\frac{p(1-p)}{n}$ . The standard deviation is the square root, $\sqrt{\frac{p(1-p)}{n}}$ , and because it gives us a grasp of how badly our estimator will be off (it is effectively a root mean square error, a way of calculating the average error that treats positive and negative errors as equally bad, by squaring them before averaging out), it is usually called the standard error. A good rule of thumb, which works well for large samples and which can be dealt with more rigorously using the famous Central Limit Theorem, is that most of the time (about 95%) the estimate will be wrong by less than two standard errors.
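The "within two standard errors about 95% of the time" rule of thumb can be checked against exact binomial probabilities. This is a hedged Python sketch, assuming the same illustrative poll of 200 people with $p = 0.3$; `binomial_pmf` is a helper written for the example.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 200, 0.3
standard_error = math.sqrt(p * (1 - p) / n)

# Probability that the estimate k / n lands within two standard errors of p.
coverage = sum(
    binomial_pmf(k, n, p)
    for k in range(n + 1)
    if abs(k / n - p) < 2 * standard_error
)
print(round(standard_error, 4), round(coverage, 3))
```

The coverage comes out close to, though not exactly, 95%, because the binomial distribution is discrete and only approximately normal.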

Since it appears in the denominator of the fraction, higher values of $n$ - bigger samples - make the standard error smaller. That is great news, as if I want a small standard error I just make the sample size big enough. The bad news is that $n$ is inside a square root, so if I quadruple the sample size, I will only halve the standard error. Very small standard errors are going to involve very very large, hence expensive, samples. There's another problem: if I want to target a particular standard error, say 1%, then I need to know what value of $p$ to use in my calculation. I might use historic values if I have past polling data, but I would like to prepare for the worst possible case. Which value of $p$ is most problematic? A graph is instructive.

[Figure: graph of $\sqrt{p(1-p)}$ against $p$]

The worst-case (highest) standard error will occur when $p=0.5$ . To prove that I could use calculus, but some high school algebra will do the trick, so long as I know how to "complete the square".

$\sqrt{p(1-p)}=\sqrt{p-p^2}=\sqrt{\frac{1}{4}-(p^2-p+\frac{1}{4})}=\sqrt{\frac{1}{4}-(p-\frac{1}{2})^2}$

The expression in the brackets is squared, so will always return a zero or positive answer, which then gets taken away from a quarter. In the worst case (largest standard error) as little as possible gets taken away. I know the least that can be subtracted is zero, and that will occur when $p-\frac{1}{2}=0$ , so when $p=\frac{1}{2}$ . The upshot of this is that I get bigger standard errors when trying to estimate support for e.g. political parties near 50% of the vote, and lower standard errors for estimating support for propositions which are substantially more or substantially less popular than that. In fact the symmetry of my graph and equation show me that I would get the same standard error for my estimates of support of the Purple Party, whether they had 30% popular support or 70%.
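Both claims, the maximum at $p = 0.5$ and the symmetry about it, are easy to confirm numerically. This is a small Python sketch over an arbitrary grid of values; the function name `worst_case_factor` is just a label chosen for the example.

```python
import math


def worst_case_factor(p):
    """sqrt(p * (1 - p)), the part of the standard error that depends on p."""
    return math.sqrt(p * (1 - p))


grid = [i / 100 for i in range(101)]  # p = 0.00, 0.01, ..., 1.00
p_at_max = max(grid, key=worst_case_factor)

print(p_at_max, worst_case_factor(p_at_max))
print(worst_case_factor(0.3), worst_case_factor(0.7))
```

The grid maximum sits at $p = 0.5$, where the factor is 0.5, and the values at 0.3 and 0.7 agree up to floating-point rounding.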

So how many people do I need to poll to keep the standard error below 1%? This would mean that, the vast majority of the time, my estimate will be within 2% of the correct proportion. I now know that the worst case standard error is $\sqrt{\frac{0.25}{n}}=\frac{0.5}{\sqrt{n}} < 0.01$ which gives me $\sqrt{n} > 50$ and so $n > 2500$ . That would explain why you see polling figures in the thousands.

In reality low standard error is not a guarantee of a good estimate. Many problems in polling are of a practical rather than theoretical nature. For instance, I assumed that the sample was of random voters each with the same probability $p$ , but taking a "random" sample in real life is fraught with difficulty. You might try telephone or online polling - but not only does not everybody have a phone or internet access, but those who don't may have very different demographics (and voting intentions) to those who do. To avoid introducing bias to their results, polling firms actually do all kinds of complicated weighting of their samples, not the simple average $\frac{\sum{X_i}}{n}$ that I took. Also, people lie to pollsters! The different ways that pollsters have compensated for this possibility are, obviously, controversial. You can see a variety of approaches in how polling firms have dealt with the so-called Shy Tory Factor in the UK. One method of correction involved looking at how people voted in the past to judge how plausible their claimed voting intention is, but it turns out that even when they're not lying, many voters simply fail to remember their electoral history. When you've got this stuff going on, there's frankly very little point getting the "standard error" down to 0.00001%.

To finish, here are some graphs showing how the required sample size - according to my simplistic analysis - is influenced by the desired standard error, and how bad the "worst case" value of $p=0.5$ is compared to the more amenable proportions. Remember that the curve for $p=0.7$ would be identical to the one for $p=0.3$ due to the symmetry of the earlier graph of $\sqrt{p(1-p)}$

[Figure: graph of required sample sizes for different desired standard errors]
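The relationship the graphs describe can also be tabulated. This is a minimal Python sketch under the same simplistic analysis; the particular proportions and target standard errors are arbitrary choices for illustration, not values taken from the original figure.

```python
def required_n(target_se, p):
    """Sample size at which sqrt(p * (1 - p) / n) equals target_se."""
    return p * (1 - p) / target_se ** 2


proportions = [0.5, 0.3, 0.1]
target_errors = [0.03, 0.02, 0.01, 0.005]

# One row per target standard error, one column per proportion.
table = {
    se: {p: required_n(se, p) for p in proportions} for se in target_errors
}
for se, row in table.items():
    cells = ", ".join(f"p={p}: n={n:.0f}" for p, n in row.items())
    print(f"SE {se:.1%}: {cells}")
```

Halving the target standard error quadruples every entry in a column, and the $p = 0.5$ column is always the largest.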

— Silverfish

log10 scale in the y-axis might help here.

— EngrStudent - Reinstate Monica

7

The "at least 30" rule is addressed in another posting on Cross Validated. It's a rule of thumb, at best.

When you think of a sample that's supposed to represent millions of people, you're going to have to have a much larger sample than just 30. Intuitively, 30 people can't even include one person from each state! Then think that you want to represent Republicans, Democrats, and Independents (at least), and for each of those you'll want to represent a couple of different age categories, and for each of those a couple of different income categories.

With only 30 people called, you're going to miss huge swaths of the demographics you need to sample.

EDIT2: [I've removed the paragraph that abaumann and StasK objected to. I'm still not 100% persuaded, but especially StasK's argument I can't disagree with.] If the 30 people are truly selected completely at random from among all eligible voters, the sample would be valid in some sense, but too small to let you distinguish whether the answer to your question was actually true or false (among all eligible voters). StasK explains how bad it would be in his third comment, below.

EDIT: In reply to samplesize999's comment, there is a formal method for determining how large is large enough, called "power analysis", which is also described here. abaumann's comment illustrates how there is a tradeoff between your ability to distinguish differences and the amount of data you need to make a certain amount of improvement. As he illustrates, there's a square root in the calculation, which means the benefit (in terms of increased power) grows more and more slowly, or the cost (in terms of how many more samples you need) grows increasingly rapidly, so you want enough samples, but not more.
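The square-root tradeoff is easy to see with a few numbers. This is only a rough Python sketch, assuming a simple random sample, a 50:50 split and the usual 1.96 multiplier for a 95% margin of error; the sample sizes listed are arbitrary.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)


for n in [30, 100, 1000, 4000, 16000]:
    print(f"n={n:>6}: +/- {margin_of_error(n):.1%}")
```

Going from 30 to 1000 respondents shrinks the margin from roughly 18 points to roughly 3, but each further halving costs four times as many respondents.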

— Wayne

2

"The whole point of a sample -- it's entire validity -- is that it reflects the population, not that it's random." That is patently wrong! Validity (in the sense of generalizability) stems exactly from the random character of the sampling procedure. The case is rather that since you are interested in very small margins, you need a precise estimate, necessitating a large sample size.

— abaumann

3

@abaumann: As far as I understand things, there's no magic in randomization: it is just the most objective way we have for creating samples that are reflective of the population. That's why we may use randomization within strata, or use stratification and weighting to attempt to compensate for not-so-great randomization.

— Wayne

2

samplesize: This has little or nothing to do with being an "expert." For instance, US presidential candidates run weekly and daily "tracking polls" during their campaigns and these only survey about 200-300 people. These sample sizes provide an adequate balance of cost and information. At another extreme, certain health related studies like NHANES enroll tens or hundreds of thousands of people because that is needed to produce actionable information of such high value that the enormous costs of these studies become worthwhile. In both cases experts are determining the sample sizes.

— whuber

2

Technically, the generalization will be valid if the sample is representative of the population. The idea is that having a random sample guarantees the sample will be representative, but that this is harder (not necessarily impossible) to achieve if the sample is not random. FWIW, no poll uses simple random sampling.

— gung - Reinstate Monica

1

@sashkello, there is a middle ground: one could use a stratified random sample (essentially your option #1), or attempt to reweight/benchmark the sample afterward. Like Gung, I think most big surveys do something more complex than a simple random sample

— Matt Krause

0

A lot of great answers have already been posted. Let me suggest a different framing that yields the same response, but could further drive intuition.

Just like @Glen_b, let's assume we require at least 95% confidence that the true proportion who agree with a statement lies within a 3% margin of error. In a particular sample of the population, the true proportion $p$ is unknown. However, the uncertainty around this parameter of success $p$ can be characterized with a Beta distribution.

We don't have any prior information about how $p$ is distributed, so we will say that $p \sim Beta(\alpha=1, \beta=1)$ as an uninformed prior. This is a uniform distribution of $p$ from 0 to 1.

As we get information from respondents from the survey, we get to update our beliefs as to the distribution of $p$ . The posterior distribution of $p$ when we get $\delta_y$ "yes" responses and $\delta_n$ "no" responses is $p \sim Beta(\alpha=1+\delta_y, \beta=1+\delta_n)$ .

Assuming the worst-case scenario where the true proportion is 0.5, we want to find the number of respondents $n=\delta_y+\delta_n$ such that only 0.025 of the probability mass is below 0.47 and 0.025 of the probability mass is above 0.53 (to account for the 95% confidence in our 3% margin of error). Namely, in a programming language like R, we want to figure out the $n$ such that qbeta(0.025, n/2, n/2) yields a value of 0.47.

If you use $n=1067$ , you get:

> qbeta(0.025, 1067/2, 1067/2)
[1] 0.470019

which is our desired result.

In summary, 1,067 respondents who evenly split between "yes" and "no" responses would give us 95% confidence that the true proportion of "yes" respondents is between 47% and 53%.
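For readers without R at hand, the same figure can be approximated with the Python standard library. Note the swap of technique: Python's standard library has no Beta quantile function, so this sketch replaces `qbeta` with a normal approximation that matches the mean and variance of $Beta(n/2, n/2)$. That substitution, and the function name `beta_lower_quantile_approx`, are assumptions made for the example.

```python
import math
from statistics import NormalDist


def beta_lower_quantile_approx(n, prob=0.025):
    """Approximate lower quantile of Beta(n/2, n/2) via a normal distribution.

    This is an approximation, not a replacement for R's qbeta.
    """
    a = b = n / 2
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return NormalDist(mean, math.sqrt(variance)).inv_cdf(prob)


approx = beta_lower_quantile_approx(1067)
print(round(approx, 4))
```

For $n = 1067$ the approximation lands within about 0.0001 of the `qbeta` value quoted above, which is close enough to confirm the 47% lower bound.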

— mnmn