What is the problem with being a programmer without knowledge of computational complexity?


30

I was assigned an exercise at my university. I took it home and tried to program an algorithm to solve it. It was something graph-related, finding connected components, I believe.

Then I wrote the most trivial thing that came to my mind and showed it to my lecturer. After a short observation, he saw that the runtime complexity of my solution was unviable and showed something more efficient. And there is a tradition of programmers who have no idea about computational complexity (I was one of them), so what is the problem if a programmer has no idea about computational complexity at all?


3
Moderator notice: Please don't use comments for extended discussion or to post answers. You can use the chat room to discuss this question; previous comments have been moved there.
Gilles

4
Your title says programmer, but your question says student. Usually 'programmer' implies 'professional programmer', so are you asking whether it is a problem to be a professional programmer without knowing computational complexity? Or whether it is OK for a programming student not to have that knowledge? Those are two different questions, even if it turns out they have the same answer.
Corsica

Answers:


42

Yes, I would say that it is crucial for any serious programmer to know something about computational complexity. As long as you are not dealing with huge data sets you will be fine not knowing complexity, but if you want to write a program that tackles serious problems, you need it.

In your specific case, your example of finding connected components might have worked for a graph of 100 nodes. However, if you had tried a graph with 100,000 nodes, your lecturer's algorithm could probably have handled it in 1 second, while your algorithm would (depending on how bad its complexity was) have taken 1 hour, 1 day, or maybe even 1 eternity.

A somewhat common mistake students make in our algorithms course is to iterate over an array like this:

while array not empty
    examine first element of array
    remove first element from array

This might not be the most beautiful code, but in a complex program something like it might appear without the programmer being aware of it. Now, what is the problem with this program?

Suppose we run it on a data set of 100,000 elements. Compared to the following program, the previous program will run 50,000 times slower.

while array not empty
    examine last element of array
    remove last element from array

I hope you agree that having the knowledge to make your program run 50,000 times faster is probably an important thing for a programmer. Understanding the difference between the two programs requires some basic knowledge about complexity theory and some knowledge about the details of the language you are programming in.

In my pseudocode language, "removing an element from an array" shifts all the elements to the right of the removed element one position to the left. This makes removing the last element an O(1) operation, since to do that we only need to interact with 1 element. Removing the first element is O(n), since to remove the first element we need to shift all the other n - 1 elements one position to the left as well.

A very basic exercise in complexity is to prove that the first program will do n²/2 operations, while the second program only uses n operations. If you plug in n = 100,000 you will see that one program is drastically more efficient than the other.
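The gap between the two loops is easy to observe in Python, where `list.pop(0)` shifts every remaining element while `list.pop()` does not. This is a small sketch, not a proper benchmark; the exact timings depend on the machine, but the ratio is always dramatic:

```python
import timeit

def drain_front(n):
    arr = list(range(n))
    while arr:
        arr.pop(0)   # removes the first element: shifts the rest left, O(n) each time

def drain_back(n):
    arr = list(range(n))
    while arr:
        arr.pop()    # removes the last element: O(1) each time

n = 20_000
t_front = timeit.timeit(lambda: drain_front(n), number=1)
t_back = timeit.timeit(lambda: drain_back(n), number=1)
print(f"front: {t_front:.4f}s, back: {t_back:.4f}s")  # front is far slower
```

Both loops produce the same end state (an empty list); only the cost differs.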

This is only a toy example, but it already requires a basic understanding of complexity to tell the difference between the two programs, and if you are actually trying to debug/optimize a more complicated program that makes this mistake, an even bigger understanding is needed to find out where the bug is. Because a mistake like removing an element from an array in this fashion can very well be hidden by abstractions in the code.

Having a good understanding of complexity also helps when comparing two approaches to solving a problem. Suppose you had come up with two different approaches to the connected components problem on your own: in order to decide between them, it would be very useful if you could (quickly) estimate their complexity and pick the better one.
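For the connected-components problem itself, one standard near-linear approach is union-find with path compression. The question does not say which algorithms were compared, so this is only an illustrative sketch of what a low-complexity solution can look like:

```python
def connected_components(n, edges):
    """Count connected components of an undirected graph on nodes 0..n-1.

    Union-find with path compression: near-linear in n + len(edges).
    """
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv   # merge the two components

    return len({find(x) for x in range(n)})

print(connected_components(5, [(0, 1), (1, 2), (3, 4)]))  # -> 2
```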


10
"So long as you are not dealing with huge data sets you will be fine not knowing complexity" This is often true, but not always so. For instance, an O(n!) algorithm will not be viable even for relatively small data sets. If you use an O(n!) algorithm where you could have used O(n^2) your program will take 36,288 times longer to execute on a data size of 10. On a data size of 20, you're looking at 2.4 quintillion operations.
reirab
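The figures in reirab's comment are easy to verify; n!/n² is how many times slower an O(n!) algorithm is than an O(n²) one, ignoring constant factors:

```python
import math

for n in (10, 20):
    print(n, math.factorial(n), math.factorial(n) // n**2)
# n = 10: 10! = 3,628,800, and 10!/10^2 = 36,288 (the slowdown quoted above)
# n = 20: 20! is about 2.4 quintillion (2.43e18) operations
```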

1
I think @reirab's example should be included in the answer. It is more dramatic and proves your point more decisively. And I personally have been bitten by such algorithms, before I learned computational complexity.
Siyuan Ren

2
I think there is a bigger issue at play. If you do not know complexity, you easily select yourself into tasks where it is not needed, and so it can be useful to know regardless. So even if it is not critical for your current work, it is still good to know, or it may come back to bite you in the end.
joojaa

"Understanding the difference between the two programs requires some basic knowledge about complexity theory" -- I think for this particular example it doesn't. You could profile it, observe that all the time is taken in "remove element", know (without understanding complexity theory) that removing the last element is faster than removing the first, make the change, and therefore speed up the program. The advantage of understanding complexity theory is that it lets you loosely quantity such problems without profiling them, so you can "prematurely" optimize.
Steve Jessop

.. and in general I suspect that all or almost all practical examples can be solved, one by one, without reference to complexity theory. In this case, knowing that copying a lot of data is slower than not doing so, isn't "complexity theory". But of course it's still useful in programming (and any profession) to have a good mental model of principles that commonly come up, because you can analyse, discuss and solve such problems routinely by principle instead of one at a time by ad hoc means.
Steve Jessop

26

This is a rebuttal of Tom van der Zanden's answer, which states that this is a must.

The thing is, most of the time, 50,000 times slower is not relevant (unless you work at Google, of course).

If the operation you do takes a microsecond, or if your N is never above a certain threshold (a high portion of the coding done nowadays), it will NEVER matter. In those cases thinking about computational complexity will only make you waste time (and most likely money).

Computational complexity is a tool to understand why something might be slow or scale badly, and how to improve it, but most of the time it is complete overkill.

I've been a professional programmer for more than five years now and I've never found the need to think about computational complexity when looping inside a loop, O(M * N), because the operation is always really fast or M and N are so small.

There are far more important, generally used, and harder things to understand for anyone doing programming jobs (threading and profiling are good examples in the performance area).

Of course, there are some things that you will never be able to do without understanding computational complexity (for example: finding anagrams on a dictionary), but most of the time you don't need it.
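For the anagram example mentioned above: the standard trick is to group words by their sorted letters, one pass over the dictionary instead of comparing all pairs of words. A sketch with a made-up word list:

```python
from collections import defaultdict

def group_anagrams(words):
    # Words that are anagrams of each other share the same sorted-letter key.
    # One pass: O(n * k log k) for n words of length k, versus O(n^2 * k)
    # for comparing every pair of words directly.
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return [g for g in groups.values() if len(g) > 1]

print(group_anagrams(["listen", "silent", "enlist", "google", "banana"]))
# -> [['listen', 'silent', 'enlist']]
```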


3
To expand on your point, there are cases where too much emphasis on computational complexity can lead you astray. For example, there may be situations where "better" algorithm is actually slower for small inputs. The profiler is the ultimate source of truth.
Kevin Krumwiede

2
@Kevin Krumwiede, I completely agree with you that optimizing a sort for a trivial data set is overkill. But it also illustrates that having at least an understanding of complexity is still important. The understanding is what will lead you to make the decision that a bubble sort is appropriate as opposed to some other, more complex, algorithm.
Kent A.

4
When you know the data set is small in all cases you can get away with this sort of thing. You have to be very careful of excess complexity in stuff called within loops, though--not long ago I cut a minute runtime to a second this way. I've also encountered a O(n^8) problem once (data validation.) Lots of care got it down to 12 hours.
Loren Pechtel

7
I've never found the need to think about computational complexity when looping inside a loop O(M * N) because always the operation is really fast or M and N are so small. – Ironically, the argument you give shows that you did think about computational complexity. You decided that it’s not a relevant issue for what you are doing and possibly rightfully so, but you are still aware of the existence of this issue, and if it would ever pose a problem, you could react to it before serious consequences happen on the user level.
Wrzlprmft

4
Premature optimization is the root of all evil, but premature pessimization is the root of at least a good deal of annoyed users. You may not need to be able to solve a recurrence relation, but if you are, at the very least, not capable of telling the difference between O(1), O(N) and O(N^2), especially when you're nesting loops, someone is going to have to clean up the mess later. Source: the messes I had to clean up later. A factor of 50,000 is so big that you had better know whether you can still afford it later, when your inputs have grown.
Jeroen Mostert
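The O(N) vs. O(N²) distinction in nested loops that the comments above describe can hinge on something as small as the container used for a membership test; a sketch:

```python
def common_items_quadratic(a, b):
    # 'x in b' on a list scans it: O(len(b)) per test, so O(M*N) overall.
    return [x for x in a if x in b]

def common_items_linear(a, b):
    b_set = set(b)                       # built once: O(N)
    return [x for x in a if x in b_set]  # O(1) average per test: O(M+N) overall

print(common_items_quadratic([1, 2, 3], [2, 3, 4]))  # -> [2, 3]
print(common_items_linear([1, 2, 3], [2, 3, 4]))     # -> [2, 3]
```

Both functions return the same result; only their scaling differs, which is exactly the kind of difference that hides until the inputs grow.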

14

I've been developing software for about thirty years, working both as a contractor and employee, and I've been pretty successful at it. My first language was BASIC, but I quickly taught myself machine language to get decent speed out of my underpowered box. I have spent a lot of time in profilers over the years and have learned a lot about producing fast, memory efficient optimized code.

Needless to say, I'm self-taught. I never encountered big-O notation until I started interviewing a few years ago. It has never come up in my professional work EXCEPT during interviews. So I've had to learn the basics just to handle that question in interviews.

I feel like the jazz musician who can't read sheet music. I can still play just fine. I know about hashtables (heck, I invented hashtables before I learned that they had already been invented) and other important data structures, and I might even know some tricks that they don't teach in school. But I think the truth is that if you want to succeed in this profession, you will either need to go indie or learn the answers to the questions that they will ask during interviews.

Incidentally, I most recently interviewed for a front end web developer role. They asked me a question where the answer required both a knowledge of computational complexity and logarithms. I managed to remember enough math from twenty years ago to answer it more or less correctly, but it was a bit jarring. I've never had to use logarithms in any front end development.

Good luck to you!


2
So, your answer is "yes"?
Raphael

6
TL;DR: "yes". However, in my experience you're not going to be talking about computational complexity in most jobs after you're hired. Yes, know your data structures and their performance, but just knowing that an algorithm is O(n) or whatever does not a good programmer make. It's much better to focus on writing good code quickly and then optimizing the hot spots later. Readability and maintainability are usually more important for most code than performance.
Scott Schafer

3
I think it may happen that complexity comes up in a corporate setting, but the first real concern for companies is shipping: if it works, it's good enough, until there's available budget to improve the app, or a customer comes back to complain about poor performances. In b2b situations for adhoc projects, it's probably quite uncommon. In b2c, or in highly competitive markets (off the shelf products), it would probably come up more often, with the direct effect of raising the entry bar for new hires.
didierc

4
@didierc "Good enough" is also what breaks things all the time.
Raphael

1
@didierc 1) Well, people with solid backgrounds in CS do (hopefully) have a good intuition for what is correct and what is not, whereas ad-hoc problem solvers may commit "simple" mistakes. Ensuring that the execution after multiple compilations is exactly what was specified is highly non-trivial, and afaik an unsolved problem. 2) No.
Raphael

9

The question is quite subjective, so I think the answer is it depends.

It doesn't matter that much if you work with small amounts of data. In these cases, it is usually fine to use whatever e.g. the standard library of your language offers.

However, when you deal with large amounts of data, or for some other reason you insist that your program is fast, then you must understand computational complexity. If you don't, how do you know how a problem should be solved, or how quickly it is even possible to solve it? But understanding just theory is not enough to be a really good programmer. To produce extremely fast code, I believe, you also have to understand how e.g. your machine works (caches, memory layout, the instruction set), and what your compiler does (compilers do their best, but are not perfect).

In short, I think understanding complexity clearly makes you a better programmer.


1
I think you generally have the right idea, but "subjective" doesn't describe this issue adequately; "circumstantial" would be a better word. Also, one can write very slow programs that don't operate on a lot of data. I recently answered a question on math.se about polynomial representation/storage. That usually involves a pretty small amount of data, e.g. ~1000-term polynomials are typical; yet there are huge real-world differences in performance (hundreds or thousands of seconds vs. a few seconds for a multiplication) depending on the implementation.
Fizz

4

It is certainly a problem if someone who is developing significant algorithms does not understand algorithm complexity. Users of an algorithm generally rely on a good quality of implementation that has good performance characteristics. While complexity is not the only contributor to performance characteristics of an algorithm, it is a significant one. Someone who does not understand algorithm complexity is less likely to develop algorithms with useful performance characteristics.

It is less of a problem for users of an algorithm, assuming the algorithms available are of good quality. This is true for developers who use languages that have a significant, well-specified standard library - they just need to know how to pick an algorithm that meets their needs. The problem comes in where there are multiple algorithms of some type (say, sorting) available within a library, because complexity is often one of the criteria for picking between them. A developer who does not understand complexity then cannot understand the basis for picking an effective algorithm for the task at hand.

Then there are developers who focus on (for want of a better description) non-algorithmic concerns. For example, they may focus on developing intuitive user interfaces. Such developers will often not need to worry about algorithm complexity although, again, they may rely on libraries or other code being developed to a high quality.


3

It depends, but not on the amount of data you're working with; rather, on the kind of work you do and the programs you develop.

Let's call a programmer who doesn't know about conceptual complexity a noobish programmer.

The noobish programmer can do:

  • develop big data databases - he doesn't have to know how they work inside; all he has to know are the rules of database design. He knows things like what should be indexed, where it is better to allow redundancy in the data and where it is not...
  • make games - he just has to study how some game engine works and follow its paradigms; games and computer graphics are quite big data problems. Consider 1920*1080*32 bit = circa 7.9 MB for a single picture/frame... at 60 FPS that is at least 475 MB/s. Consider that just one unnecessary copy of a fullscreen picture would waste around 500 MB of memory throughput per second. But he doesn't need to care about that, because he only uses the engine!
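The throughput arithmetic in the games bullet can be checked directly:

```python
width, height = 1920, 1080
bytes_per_pixel = 4                       # 32-bit colour
frame_bytes = width * height * bytes_per_pixel
per_second = frame_bytes * 60             # at 60 FPS
print(frame_bytes / 2**20)                # ~7.9 MB per frame
print(per_second / 2**20)                 # ~475 MB/s
```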

The noobish programmer shouldn't do:

  • develop very frequently used complex programs, no matter the size of the data they work with. For example, small data won't make an improper solution noticeable during development, because the slowdown is smaller than the compile time, etc. So 0.5 s for one simple program isn't much from the noobish programmer's perspective. Well, consider a server that runs this program twenty times per second: it would require 10 cores just to sustain that load!
  • develop programs for embedded devices. Embedded devices work with small data, but they need to be as efficient as possible, because redundant operations cause unnecessary power consumption.

So, a noobish programmer is fine when you just want to use technologies. When it comes to developing new solutions, custom technologies, etc., it's better to hire a non-noobish programmer.

However, if a company doesn't develop new technologies and only uses already-made ones, it would be a waste of talent to hire a skilled and talented programmer. The same applies to you: if you don't want to work on new technologies and you're fine putting customers' ideas into designs and programs using already-made frameworks, then it's a waste of your time to learn something you will never need, unless it's your hobby and you like logical challenges.


1
This answer could be improved if it used a more neutral label, or no label at all, much like the other reply that used the term "incompetent programmer."
Moby Disk

1
I'm not sure what you mean by "conceptual complexity". My experience is that people who don't know enough about trees or hashtables can't make intelligent decisions regarding how to index (parts of) a big database.
Fizz

3

I'm somewhat hesitant to write an answer here but since I found myself nitpicking on several others' [some of my comments got moved to chat], here's how I see it...

There are levels/degrees of knowledge to a lot of things in computing (and by this term I mean roughly the union of computer science with information technology). Computational complexity surely is a vast field (Do you know what OptP is? Or what the Abiteboul-Vianu theorem says?) and also admits a lot of depth: most people with a CS degree can't produce the expert proofs that go into research publications in computational complexity.

The level of knowledge and skill/competence required in such matters depends a lot on what one works on. Completely clueless O(n²) sorting is sometimes said to be a major cause of slow programs[citation needed], but a 2003 SIGCSE paper noted "Insertion sort is used to sort small (sub)arrays in standard Java and C++ libraries." On the flip side, premature optimization coming from someone who doesn't understand what asymptotic means (computational complexity being such a measure) is sometimes a problem in programming practice. However, knowing at least when computational complexity matters is why you need to have some clue about it, at least at an undergraduate level.
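For reference, the insertion sort mentioned in that SIGCSE quote is only a few lines; despite its O(n²) worst case it wins on small (sub)arrays because of its tiny constant overhead. A minimal sketch:

```python
def insertion_sort(a):
    """Sort list a in place and return it. O(n^2) worst case, but very
    fast for small n, which is why libraries use it for small subarrays."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements right until key's slot is found.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # -> [1, 2, 3, 4, 5, 6]
```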

I would honestly dare compare the situation of knowing when to apply computational complexity concepts (and knowing when you can safely ignore them) with the somewhat common practice (outside of Java world) of implementing some performance-sensitive code in C and the performance-insensitive stuff in Python etc. (As an aside, this was called in a Julia talk the "standard compromise".) Knowing when you don't have to think about performance saves you programming time, which is a fairly valuable commodity too.

And one more point is that knowing computational complexity won't automatically make you good at optimizing programs; you need to understand more architecture-related stuff like cache locality, [sometimes] pipelining, and nowadays parallel/multi-core programming too; the latter has both its own complexity theory and practical considerations as well; a taste of the latter from a 2013 SOSP paper "Every locking scheme has its fifteen minutes of fame. None of the nine locking schemes we consider consistently outperforms any other one, on all target architectures or workloads. Strictly speaking, to seek optimality, a lock algorithm should thus be selected based on the hardware platform and the expected workload."


1
In the long run, developing or finding a better algorithm is usually more beneficial than changing programming language for the performance-sensitive bits. I agree with you that there is a strong association between lack of understanding of complexity and premature optimisation - because they usually target the less performance-sensitive bits for optimisation.
Rob

1
In practice, (inadvertent) Schlemiel the Painter's algorithms are much more frequent than O(n^2) sorting.
Peter Mortensen
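A textbook instance of a Schlemiel the Painter's algorithm is building a string by repeated concatenation, where each step may re-copy everything built so far. A sketch (note that CPython sometimes optimizes in-place str concatenation, so treat the quadratic bound as the general case):

```python
def join_quadratic(parts):
    s = ""
    for p in parts:
        s = s + p           # may copy everything built so far: O(total^2) worst case
    return s

def join_linear(parts):
    return "".join(parts)   # one pass over the total length: O(total)

print(join_quadratic(["ab", "c", "def"]))  # -> abcdef
print(join_linear(["ab", "c", "def"]))     # -> abcdef
```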

-1

If you don't know big-O you should learn it. It's not hard, and it's really useful. Start with searching and sorting.

I do notice that a lot of answers and comments recommend profiling, and they almost always mean use a profiling tool.

The trouble is, profiling tools are all over the map in terms of how effective they are at finding what you need to speed up. Here I've listed and explained the misconceptions that profilers suffer from.

The result is that programs, if they are larger than an academic exercise, can contain sleeping giants that even the best automatic profiler cannot expose. This post shows a few examples of how performance problems can hide from profilers.

But they cannot hide from this technique.


You claim "Big-Oh" is useful but then you advocate a different approach. Also, I don't see how learning "Big-Oh" (mathematics) can "start with searching and sorting" (algorithmic problems).
Raphael

@Raphael: I do not advocate a different approach - it's orthogonal. Big-O is basic knowledge for understanding algorithms, whereas finding performance problems in non-toy software is something you do after the code is written and run, not before. (Sometimes academics don't know this, so they continue teaching gprof, doing more harm than good.) In so doing, you may or may not find that the problem is the use of an O(n*n) algorithm, so you should be able to recognize that. (And big-O is just a mathematically defined property of algorithms, not a different subject.)
Mike Dunlavey

"And big-O is just a mathematically defined property of algorithms, not a different subject." -- that's wrong, and dangerously so. "Big-Oh" defines classes of functions; per se, it has nothing to do with algorithms at all.
Raphael

Licensed under cc by-sa 3.0 with attribution required.