How does backpropagation work in a Siamese neural network?

I am studying the architecture of the Siamese neural network introduced by Yann LeCun and his colleagues in 1994 for signature recognition ("Signature verification using a Siamese time delay neural network", PDF, NIPS 1994).

I understand the general idea of this architecture, but I really cannot understand how backpropagation works in this case. I cannot see what the target values of the neural network are, that is, the values that would let backpropagation set the weights of each neuron properly.

Image from "Probabilistic Siamese Network for Learning Representations" by Chen Liu (University of Toronto, 2013)

In this architecture, the algorithm computes the cosine similarity between the final representations of the two neural networks. The paper states: "The desired output is for a small angle between the outputs of the two subnetworks (f1 and f2) when two genuine signatures are presented, and a large angle if one of the signatures is a forgery".

I cannot really understand how they could use a binary function (the cosine similarity between two vectors) as the target to run backpropagation.

How is backpropagation computed in Siamese neural networks?

neural-networks

— DavideChicco.it

I cannot download the paper... Do you have another source, or a Dropbox link?

— ব্র্যাথলজ

NIPS archive: papers.nips.cc

— Yannis

Both networks share the same architecture, and they are constrained to have the same weights, as described in Section 4 of the publication [1].

Their goal is to learn features that maximize the cosine similarity between their output vectors (a small angle) when both signatures are genuine, and minimize it (a large angle) when one of them is forged. This is the backprop objective as well, but the actual loss function is not presented in the paper.

The cosine similarity $\cos(A,B) = {A \cdot B \over \|A\| \|B\|}$ of two vectors $A, B$ is a measure of similarity that gives you the cosine of the angle between them, so its output is a real number in $[-1, 1]$, not a binary value. If your concern is how you can backprop through a function that outputs either true or false, think of the case of binary classification.
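To make the "not binary" point concrete, here is a minimal sketch in plain Python. The function name `cosine_similarity` and the example vectors are my own illustration and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors (assumed, not from the paper).
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])    # angle of 0 degrees
in_between = cosine_similarity([1.0, 0.0], [1.0, 1.0])        # angle of 45 degrees
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])        # angle of 90 degrees
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])        # angle of 180 degrees

print(same_direction, in_between, orthogonal, opposite)
```

The four results cover the whole range from $1$ down to $-1$, with intermediate values for intermediate angles, which is what makes the function usable inside a differentiable loss.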

You shouldn't change the output layer: it consists of trained neurons with real-valued (non-binary) outputs, and it's a higher-level abstraction of your input. The whole network should be trained together. Both outputs $O_1$ and $O_2$ are passed through a $\cos(O_1,O_2)$ function that outputs their cosine similarity (close to $1$ if they are similar, and lower, down to $-1$, if they are not). Given that, and that we have two sets of input tuples $X_{Forged}, X_{Genuine}$, an example of the simplest possible loss function you could train against could be:

$$\mathcal{L}=\sum_{(x_A,x_B) \in X_{Forged}} \cos(x_A,x_B) - \sum_{(x_C,x_D) \in X_{Genuine}} \cos(x_C,x_D)$$
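As a sketch of why this loss can be backpropagated: the cosine is differentiable with respect to each subnetwork output wherever both outputs are non-zero. Writing $O_1$ and $O_2$ for the two outputs of a pair (the symbol $W$ for the shared weights is my own notation, not the paper's), the gradient with respect to $O_1$ is

$$\frac{\partial \cos(O_1,O_2)}{\partial O_1} = \frac{O_2}{\|O_1\|\,\|O_2\|} - \cos(O_1,O_2)\,\frac{O_1}{\|O_1\|^2}$$

and symmetrically for $O_2$. For each pair, backprop starts from these two vectors (with sign $+$ for a forged pair and $-$ for a genuine pair, following $\mathcal{L}$) and then proceeds through each subnetwork by the usual chain rule:

$$\frac{\partial \mathcal{L}}{\partial W} = \sum_{\text{pairs}} \pm \left[ \left(\frac{\partial O_1}{\partial W}\right)^{\!\top} \frac{\partial \cos}{\partial O_1} + \left(\frac{\partial O_2}{\partial W}\right)^{\!\top} \frac{\partial \cos}{\partial O_2} \right]$$

So no per-neuron target value is needed at the output layer: the error signal for each output neuron is the corresponding component of the cosine gradient.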

After you have trained your network, you just input the two signatures, get the two outputs, pass them to the $\cos(O_1,O_2)$ function, and check their similarity.

Finally, there are several ways to keep the weights of the two networks identical (they are used in recurrent neural networks too); a common approach is to average the gradients of the two networks before performing the gradient descent update step.
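The whole procedure can be sketched in plain Python. This is only an illustration under strong assumptions, not the paper's method: each subnetwork is reduced to a single linear layer `W`, and the toy data, the initial weights, the learning rate and all function names are hypothetical. It uses the simple loss above and averages the gradients of the two branches before each update.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def cosine_grad(u, v):
    """Gradient of cosine(u, v) with respect to u."""
    c, nu, nv = cosine(u, v), norm(u), norm(v)
    return [vi / (nu * nv) - c * ui / (nu * nu) for ui, vi in zip(u, v)]

def forward(W, x):
    """One subnetwork, assumed here to be a single linear layer: output = W x."""
    return [dot(row, x) for row in W]

def outer(d, x):
    return [[di * xj for xj in x] for di in d]

def loss_and_grad(W, genuine, forged):
    """Return L = sum_forged cos - sum_genuine cos and the averaged branch gradient."""
    rows, cols = len(W), len(W[0])
    loss = 0.0
    grad_a = [[0.0] * cols for _ in range(rows)]  # gradient through branch 1
    grad_b = [[0.0] * cols for _ in range(rows)]  # gradient through branch 2
    for pairs, sign in ((forged, 1.0), (genuine, -1.0)):
        for xa, xb in pairs:
            oa, ob = forward(W, xa), forward(W, xb)  # same W in both branches
            loss += sign * cosine(oa, ob)
            ga = outer([sign * g for g in cosine_grad(oa, ob)], xa)
            gb = outer([sign * g for g in cosine_grad(ob, oa)], xb)
            for i in range(rows):
                for j in range(cols):
                    grad_a[i][j] += ga[i][j]
                    grad_b[i][j] += gb[i][j]
    # Average the two branch gradients so that one update keeps the weights tied.
    # (The exact gradient for tied weights is the sum; the average is half of it.)
    shared = [[0.5 * (a + b) for a, b in zip(ra, rb)]
              for ra, rb in zip(grad_a, grad_b)]
    return loss, shared

def sgd_step(W, grad, lr):
    return [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, grad)]

# Hypothetical toy data: the sign of the third feature separates genuine from forged.
GENUINE = [([1.0, 0.0, 0.5], [0.9, 0.1, 0.6]),
           ([0.0, 1.0, 0.5], [0.1, 0.9, 0.4])]
FORGED = [([1.0, 0.0, 0.5], [0.8, 0.2, -0.5]),
          ([0.0, 1.0, 0.5], [0.2, 0.8, -0.6])]
W0 = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # arbitrary initial shared weights

W = W0
initial_loss, _ = loss_and_grad(W, GENUINE, FORGED)
for _ in range(200):
    _, grad = loss_and_grad(W, GENUINE, FORGED)
    W = sgd_step(W, grad, lr=0.02)
final_loss, _ = loss_and_grad(W, GENUINE, FORGED)
print("loss before:", initial_loss, "loss after:", final_loss)
```

Note that there is a single weight matrix and a single update per step, so the two branches cannot drift apart. With a real multi-layer network the same idea applies layer by layer, and an autodiff framework would compute `cosine_grad` and the chain rule for you.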

[1] http://papers.nips.cc/paper/769-signature-verification-using-a-siamese-time-delay-neural-network.pdf

— Yannis Assael

I know that the target is to minimize the cosine similarity, but I cannot understand what I should insert in the output layer of my neural network. When I create the neural network, I put the targets in the last output layer. If they're values, that's alright. But if the target is a function, where do I find the values to fill? Thanks

— DavideChicco.it

I have updated my answer. Your output layer will just be another normal layer whose output goes to the cosine similarity function. The two networks connected to the cosine similarity function should be trained together against a loss criterion. Finally, I've suggested the simplest loss you could have in this case.

— Yannis Assael

Thanks. I think I have understood my problem: it is the implementation of this neural network architecture in Torch7. There, before training and testing, during the construction of the neural network, I have to specify a lot of things, including the input dataset values and the output-target layer values. In the past, I dealt with supervised problems that always had fixed output-target layer values (e.g. true/false labels, or values in the [0, 1] interval). But this time it is different: the output layer depends on two functions that will be computed during training. Is this correct?

— DavideChicco.it

Exactly, this time you have real values in the output layer, not binary ones (so it's just the output of the neuron). Furthermore, you don't have direct output-target values in this case, but you have a loss function to optimize. Finally, the output layer is the output of $n_{output}$ neurons (the number of units $n_{output}$ is defined by the model architecture and is referenced in the paper). Depending on the activation function chosen (tanh, sigmoid, etc.), the neurons have continuous, not binary, activations in [-1,1] or [0,1] respectively.

— Yannis Assael

Thanks @iassael. Do you have any idea how to implement this in Torch7?

— DavideChicco.it