Understanding The Intuitions Behind Deep Learning

Getting insights about the basics can boost your learning to the next level

I have not been able to sleep well for a few weeks now. Every time I get into bed, a lot of thoughts come to my mind without control: thoughts about what I have been learning over the last months, Deep Learning. They arrive in a weird order, from here, from there, from every direction, touching every little piece of knowledge I managed to gather before. All of this comes from the fact that I am trying to understand everything as best I can, trying to assemble the puzzle. It is painful not to be able to order my thoughts, and that is the reason I decided to write my first article: to give order to my disorder and, along the way, give insights to other people like me who are starting this journey!

What is a mathematical function? We all know this: it is a combination of different mathematical rules and building blocks that, together and in a certain order, are able to transform inputs into a meaningful output. Mathematicians took on the job of finding the correct combination and order of these building blocks and rules, guided by the observations they made of the real world and by their intuition, all with the hope of getting answers.

This process of finding the correct combination of building blocks and rules is very expensive and time consuming; many of the personalities we remember took years to finish their work. All that human effort was put in because it was worth it: the fact that most of the answers in our universe can be described by a mathematical function with the correct combination of blocks and rules is powerful and very attractive. In the end, what all of us look for are answers, and having the superpower of getting those answers is something we cannot ignore.

This leads us to a question: which is the correct combination that will give us the answers we want? That is not an easy one. We can split this question in two:

1. How many building blocks does our function need?
2. Which values inside those blocks are the correct ones?

For the first one, we can deduce that the more blocks we add to our function, the more complex the answers it can produce. The second one is a bit harder to answer, but we can infer that there may be only a few combinations that will give us the information we want; additionally, there can be lots of combinations that will produce sub-optimal but acceptable results.

In short, from the previous explanation: we can get any answer we want using a large enough network of operations and one of the correct combinations of values inside that network.

From here, the big question is: how do we reach that perfect (or acceptable) combination of network and values? That depends on the complexity of the question. If we want to know the double of a given number, there is no doubt about the solution:

f(x) = 2 · x

There we have only one tunable parameter: the factor (2) that scales our input x. If we put a (3) there, we do not answer our question (we answer a different question that is not in our scope), and the same happens with (1). So (2) is the perfect value. But maybe we can try (1.9999): that is close to the perfect solution, an acceptable one, but not the perfect one.
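A tiny sketch of this idea in plain Python (the `loss` helper is my own illustrative addition: it measures how far a candidate factor is from the answers we want):

```python
# Our one-parameter function: f(w, x) = w * x.
def f(w, x):
    return w * x

# Mean squared error against the ground truth: the double of each input.
def loss(w, xs):
    return sum((f(w, x) - 2 * x) ** 2 for x in xs) / len(xs)

data = [1.0, 2.0, 3.0, 4.0]
print(loss(2.0, data))     # the perfect value: loss is exactly 0.0
print(loss(1.9999, data))  # acceptable: loss is tiny but not zero
print(loss(3.0, data))     # answers a different question: large loss
```

The loss gives us a single number that says how wrong a parameter choice is, which is exactly the kind of signal an automatic tuning procedure can work with.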

But for complex questions we need complex functions. Maybe we can still build them by hand, carrying out complex and time-consuming studies of the problem we want to solve, just the way mathematicians have done over the centuries: fine-tuning the blocks of their functions, gathering data from here and there, scratching out every bit of meaningful information and intuition they can get.

But for even more complex questions we need even more complex functions: larger, and with many more parameters, millions or billions of them. (Yes, there is no limit to the size of these networks. Spoiler: the limit is ourselves, our possibilities and, ultimately, our limited human minds.) How can we reach the perfect combination of parameters in that situation? It is clearly an infeasible task to do it by hand.

This is the crossroad the big minds behind Machine Learning faced last century. They developed a very good solution; you surely heard about it before reaching this article: the well-known bundle of Gradient Descent plus Backpropagation, a pair of algorithms capable of tuning the parameters of a complex network of operations step by step, little by little, with the objective of reducing the difference between the output of the network and a ground truth we know beforehand (a.k.a. the kind of answers we need).
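Returning to the doubling example, here is a minimal sketch of gradient descent in plain Python. The gradient is derived by hand for this one-parameter case; backpropagation is what automates exactly this derivative computation for deep networks:

```python
# Inputs and the ground truth we know beforehand (their doubles).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2 * x for x in xs]

w = 0.0              # start from a wrong guess
learning_rate = 0.01

for step in range(200):
    # Loss: L(w) = mean((w*x - y)^2)
    # Hand-derived gradient: dL/dw = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad   # step a little against the gradient

print(round(w, 4))  # → 2.0, the perfect value, reached little by little
```

Each iteration nudges the parameter in the direction that shrinks the error, which is the "step by step, little by little" behavior described above.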

But all of us already know that our imperfect minds almost always reach imperfect solutions. Gradient Descent and Backpropagation are among the best approaches ever developed for the optimization problem we describe in this article, but they are far, far away from perfection.

At this point we need to get off the cloud we imagined at first: there is no magical way to get the perfect parameter combination for every question. What we have is this dirty algorithm, which at least for now we must accept as the best solution, and which introduces a package of side effects that matter; we cannot ignore them.

Among the problems we must deal with from now on, we can mention a few:

- The vanishing gradient problem, which stops us from creating larger and larger networks to answer more complex questions.
- These algorithms are iterative and resource hungry (compute power and data), and those are limited resources for us.
- Over-fitting, which arises partly because humankind does not have all the information about everything that training these complex functions requires (that is part of the reason we developed this solution in the first place!), and the data we do have is imperfect.
- Under-fitting, which happens, among other reasons, because the function needs to be optimized further or the network of operations is not big enough.

If that were not enough, there is no guarantee of reaching the perfect solution; we must settle for acceptable ones. I miss the magical, non-existing solution that just spawns the perfect parameters :(.

Gradient Descent and Backprop notably hurt the perfection of the diamond proposed in the first paragraphs; it is now a rough diamond, and like every rough diamond, we need to polish it to get a higher value. Here is where the big minds put their effort into patching or minimizing the problems caused by this dirty way of finding acceptable parameters. They came up with lots of ideas, some that work and others not so much, which just get forgotten or replaced with better ones. To name a few examples: Gradient Clipping, Parameter Regularization, Batch Normalization, Dropout, Gradient Descent with Momentum, Mini-Batch Gradient Descent, Skip Connections, among many others. And every day new techniques arise to improve something here, something there, and everything in between.
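One of those patches, Gradient Descent with Momentum, can be sketched by extending the doubling example (plain Python again, hand-derived gradient as before; the hyperparameter values here are illustrative choices, not canonical ones):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2 * x for x in xs]

w = 0.0
velocity = 0.0
learning_rate = 0.01
momentum = 0.9       # fraction of the previous step we keep

for step in range(400):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # Momentum accumulates past gradients, smoothing the trajectory
    # and speeding up progress along consistently downhill directions.
    velocity = momentum * velocity - learning_rate * grad
    w += velocity

print(round(w, 4))  # → 2.0
```

The accumulated velocity is what helps plain gradient descent traverse flat or noisy regions of the loss surface, one of the side effects being polished away.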

This imperfection in the whole topic pushes the Deep Learning research field to find new architectures that approach different kinds of data more efficiently: for example, Convolutional Neural Networks for image processing, or Recurrent Neural Networks for data with an ordered nature (one sample after the next), like sentences or time-distributed samples. There are also lots of architectures in between, like Recurrent Convolutional Neural Networks, which help a lot when processing videos, which are no more than sequences of ordered images! And there are Unsupervised Learning techniques, which address the fact that our data is not labeled by nature (i.e., it lacks the ground truth that the Gradient Descent and Backprop design requires).

Do not let yourself be infected by my pessimism, but I hope my vision of this science and tech field helps you get a deeper insight into what is going on here. Understanding this is key to keep going and keep patching the problems we have. There is plenty of room for research and improvement in the Deep Learning field; that was one of the main reasons I decided to focus on it. And any one of us, or better, all of us, will shape the future with our ideas! Hoping humanity does not self-destruct by the time we publish our first paper :”).

To sum up everything we talked about here: math functions let us transform any input into a meaningful output using parameters that must be found. To find those parameters, GD and BP were designed, but they are not perfect solutions, and many of the advances in the Deep Learning research field have been aimed at patching the shortcomings of those algorithms.

This article was aimed at Data Science beginners like myself, just opening a path into this field. But if you are an experienced Data Scientist, I hope you enjoyed reading my very first post. Sorry if I am not accurate in some of the points explained here; I am still learning :).
