Is there any supervised-learning problem on which (deep) neural networks could not plausibly outperform any other method?


33

I have seen people put a lot of effort into SVMs and kernels, and they look quite interesting as a starting point in machine learning. But if we expect that we can almost always get better results with (deep) neural networks, what is the point of trying other methods in this era?

Here are my constraints on this topic.

  1. We only care about supervised learning: regression and classification.
  2. Interpretability of the result does not count; only accuracy on the supervised-learning problem counts.
  3. Computational cost is not a consideration.
  4. I am not saying that any other method is useless.

3
Any constraints on the amount of training data available?
Jake Westfall

1
I haven't done it, but I expect you'd have one hell of a hard time training a neural network to do e.g. a singular-value decomposition on a non-trivially-sized matrix (say, rank > 10).
Mehrdad
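A minimal sketch, in Python with NumPy and scikit-learn, of a scaled-down version of the experiment described in the comment above: train a plain feed-forward network to map a flattened random matrix to its singular values, then compare against numpy.linalg.svd on held-out matrices. The matrix size, network architecture, and training settings are illustrative assumptions, not a tuned benchmark.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n_samples, k = 10_000, 10              # 10x10 random matrices

    A = rng.standard_normal((n_samples, k, k))
    X = A.reshape(n_samples, k * k)        # flattened matrices as inputs
    # Ground truth from classical linear algebra: the k singular values.
    Y = np.array([np.linalg.svd(a, compute_uv=False) for a in A])

    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

    net = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200, random_state=0)
    net.fit(X_tr, Y_tr)

    rmse = np.sqrt(np.mean((net.predict(X_te) - Y_te) ** 2))
    print(f"network RMSE on held-out singular values: {rmse:.3f}")
    print(f"typical singular-value magnitude:         {Y_te.mean():.3f}")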

1
Google Translate now uses neural networks, and it produces more curious failures for a Brazilian town name where a word-for-word dictionary lookup would have been much better.
Henry

I haven't ever heard of deep learning being used for matrix completion (although using matrix completion before deep learning is common practice). You could argue this is a computational-cost issue, but it's also worth noting that I don't know whether all the computers in the world could do deep-learning matrix completion on, say, the Netflix problem.
Cliff AB
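For reference, the kind of non-deep matrix completion the comment refers to can be sketched in a few lines of Python: iterative low-rank SVD imputation, in the spirit of the Netflix-era methods. The simulated data, target rank, and iteration count below are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, rank = 200, 100, 5

    # A low-rank "ratings" matrix with roughly 70% of its entries hidden.
    M = rng.standard_normal((n_users, rank)) @ rng.standard_normal((rank, n_items))
    observed = rng.random(M.shape) < 0.3

    est = np.where(observed, M, 0.0)       # start with zeros in the gaps
    for _ in range(50):
        # Project the current estimate onto rank-5 matrices...
        U, s, Vt = np.linalg.svd(est, full_matrices=False)
        s[rank:] = 0.0
        low_rank = (U * s) @ Vt
        # ...then restore the observed entries and keep the imputed ones.
        est = np.where(observed, M, low_rank)

    rmse = np.sqrt(np.mean((est[~observed] - M[~observed]) ** 2))
    print(f"RMSE on the held-out (unobserved) entries: {rmse:.3f}")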

@CliffAB: (tongue halfway in cheek...) it's probably worth noting that they might not be able to, but I'm not sure it's worth noting that you don't know if they are able to ;)
Mehrdad

Answers:


31

Here is one theoretical and two practical reasons why someone might rationally prefer a non-DNN approach.

  1. The No Free Lunch Theorem from Wolpert and Macready says

    We have dubbed the associated results NFL theorems because they demonstrate that if an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.

    In other words, no single algorithm rules them all; you've got to benchmark.

    The obvious rebuttal here is that you usually don't care about all possible problems, and deep learning seems to work well on several classes of problems that people do care about (e.g., object recognition), and so it's a reasonable first/only choice for other applications in those domains.

  2. Many of these very deep networks require tons of data, as well as tons of computation, to fit. If you have (say) 500 examples, a twenty-layer network is never going to learn well, while it might be possible to fit a much simpler model. There are a surprising number of problems where it's not feasible to collect a ton of data. On the other hand, one might try learning to solve a related problem (where more data is available) and then use something like transfer learning to adapt it to the specific low-data task (see the first sketch after this list).

  3. Deep neural networks can also have unusual failure modes. There are some papers showing that barely-human-perceptible changes can cause a network to flip from correctly classifying an image to confidently misclassifying it (see here and the accompanying paper by Szegedy et al.; an illustrative sketch of such a perturbation follows this list). Other approaches may be more robust against this: there are poisoning attacks against SVMs (e.g., this one by Biggio, Nelson, and Laskov), but those happen at training time rather than at test time. At the opposite extreme, there are known (but not great) performance bounds for the nearest-neighbor algorithm. In some situations, you might be happier with lower overall performance but less chance of catastrophe.
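A minimal transfer-learning sketch for point 2, assuming PyTorch and torchvision: reuse a backbone pretrained on a large dataset and fit only a small head on the 500-example task. The five-class task, the fake tensors standing in for a real DataLoader, and all hyperparameters are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torchvision import models

    # Backbone pretrained on ImageNet (weights are downloaded on first use).
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pretrained parameters so 500 examples cannot overfit them.
    for p in backbone.parameters():
        p.requires_grad = False

    # Replace the final layer with one sized for our own (hypothetical) task;
    # only this layer gets trained.
    num_classes = 5
    backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Stand-in for the real low-data problem: 500 fake RGB images and labels.
    # In practice, replace this with your own torch.utils.data.DataLoader.
    x = torch.randn(500, 3, 224, 224)
    y = torch.randint(0, num_classes, (500,))
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(x, y), batch_size=32, shuffle=True)

    backbone.train()
    for epoch in range(3):
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(backbone(xb), yb)
            loss.backward()
            optimizer.step()

Whether this actually beats a simpler model on a given 500-example task still has to be checked by cross-validation, per point 1.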
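And for point 3, a toy sketch of the fast-gradient-sign style of perturbation behind the Szegedy et al. examples, assuming PyTorch. The two-blob data, tiny network, and epsilon values are illustrative assumptions; on large image models the same mechanism can flip predictions with perturbations far too small to see.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Toy two-class data: two Gaussian blobs in 20 dimensions.
    n, d = 400, 20
    x = torch.randn(n, d)
    x[: n // 2] += 1.0                     # class 1
    x[n // 2 :] -= 1.0                     # class 0
    y = torch.cat([torch.ones(n // 2), torch.zeros(n // 2)]).long()

    model = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(200):                   # quick training loop
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

    # Take one training point and nudge it along the sign of the loss gradient.
    x0 = x[:1].clone().requires_grad_(True)
    y0 = y[:1]
    loss_fn(model(x0), y0).backward()

    print("clean prediction:", model(x0).argmax(dim=1).item())
    for eps in (0.1, 0.5, 1.0, 2.0):       # growing perturbation budgets
        x_adv = x0 + eps * x0.grad.sign()
        print(f"eps={eps}: prediction {model(x_adv).argmax(dim=1).item()}")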


I agree with everything you say. But the question states that computational issues are ignored. That means the OP assumes you'd have infinite samples and infinite computing resources.
SmallChess

17
Infinite computation != infinite samples. For example, I have access to an astonishingly large cluster for processing data. However, the lab experiments we do to actually acquire some of that data are difficult, slow, and time-consuming (on the order of hours to days for a single data point) and all the computation in the world won't help that along.
Matt Krause

2
SVMs with any given feature extractor are likely just as vulnerable to adversarial inputs as CNNs – it's just harder to find them, because we don't have easily available gradients of the feature extraction layers.
Dougal

1
A recent and interesting real-life example of the problem @MattKrause gives, and an attempt to circumvent it using transfer learning, is presented in Sim-to-Real Robot Learning from Pixels with Progressive Nets.
HBeel

@Dougal, I also wonder if it matters that the DNN-feature-extractors are learned, while the SVM ones are (usually) hand-made and correspond to features that humans notice. Part of what makes the panda example so insidious is the imperceptible difference between the adversarial example and the regular one.
Matt Krause

24

Somewhere on this playlist of lectures by Geoff Hinton (from his Coursera course on neural networks), there's a segment where he talks about two classes of problems:

  1. Problems where noise is the key feature,
  2. Problems where signal is the key feature.

I remember the explanation that while neural nets thrive in this latter space, traditional statistical methods are often better suited to the former. Analyzing high-res digital photographs of actual things in the world, a place where deep convolutional nets excel, clearly constitutes the latter.

On the other hand, when noise is the dominant feature, for example, in a medical case-control study with 50 cases and 50 controls, traditional statistical methods may be better suited to the problem.
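A small benchmarking sketch of the noise-dominated regime described above, assuming scikit-learn: 50 simulated "cases" and 50 "controls" with mostly noise features, compared by cross-validation. The data, models, and settings are illustrative; the point is the comparison setup, not a claim about which model wins.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_per_group, n_features = 50, 30

    # Only the first feature carries a weak signal; the rest is pure noise.
    X = rng.standard_normal((2 * n_per_group, n_features))
    y = np.repeat([0, 1], n_per_group)
    X[:, 0] += 0.5 * y

    models = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "small MLP": MLPClassifier(hidden_layer_sizes=(64, 64),
                                   max_iter=2000, random_state=0),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name}: mean cross-validated accuracy = {acc:.2f}")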

If anybody finds that video, please comment and I'll update.


Excellent answer. Exactly why we turn to deep learning for things we can already do (like recognize images and write text) but may turn to other models for things that may be intuitively difficult.
Mustafa S Eisa

I personally find this answer the most interesting. Thank you so much for the answer.
Robin

13

Two perfectly linearly correlated variables. Can a deep network with 1 million hidden layers and 2 trillion neurons beat a simple linear regression?
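A tiny sketch of this thought experiment, assuming scikit-learn (the network here is, of course, far smaller than the hypothetical one above): two noiselessly linearly related variables, fit by ordinary linear regression and by an over-parameterized MLP, then evaluated on a test range that includes extrapolation. Sizes and settings are illustrative.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    x = rng.uniform(-10, 10, size=(200, 1))
    y = 3.0 * x[:, 0] + 2.0                       # exact linear relationship

    x_test = rng.uniform(-20, 20, size=(200, 1))  # includes extrapolation
    y_test = 3.0 * x_test[:, 0] + 2.0

    lin = LinearRegression().fit(x, y)
    mlp = MLPRegressor(hidden_layer_sizes=(100, 100, 100),
                       max_iter=5000, random_state=0).fit(x, y)

    for name, model in [("linear regression", lin), ("over-sized MLP", mlp)]:
        mse = np.mean((model.predict(x_test) - y_test) ** 2)
        print(f"{name}: test MSE = {mse:.4f}")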

EDITED

In my experience, sample collection is more expensive than computation. I mean, we can just rent some Amazon instances, run the deep-learning training, and come back a few days later. The cost in my field is about $200 USD, which is minimal; my colleagues earn more than that in a day.

Sample collection generally requires domain knowledge and specialized equipment. Deep learning is only suitable for problems with cheap and easily accessible data sets, such as natural language processing, image processing, and anything you can scrape off the Internet.


1
Of course any MLE method will out-perform deep learning conditional on the generating model meeting the assumptions of the MLE. However, this never happens on real data, or at least for any interesting problems (i.e., not predicting the outcome of a coin flip). So I think the OP is asking for examples involving real questions of interest with real data.
Cliff AB

It is a very nice answer. You have offered a very intuitive and realistic viewpoint. Thank you so much.
Robin