Building a deep learning model has always been a daunting topic for me. There is always so much to do. Before you even start training, a lot has to be decided, from determining the number of input and output units to choosing the activation and optimization functions, and worst of all, when and where to do what. In classical machine learning things are much simpler, all thanks to the scikit-learn library: after some pre-processing of the data, all you have to do is create the learner, fit it to the training data in one step, make predictions and measure the accuracy. In three or four cheerfully short and clean steps, you’re done with a simple machine learning model. Unfortunately, things are not that simple when building deep learning models. Of course, there is a reason to choose this computationally expensive, complex architecture: the nonlinearity in your data and target can be too complex for a classical ML algorithm to learn. So I have tried to outline the general steps in deep learning that every practitioner goes through, regardless of architecture. This is, of course, a general way of explaining things, but sometimes a road map really helps an absolute beginner. I have summarized the highlights of this article in a chart at the end, so let’s get started.
Get your data
As a beginner, I’ve mostly relied on Kaggle for deep learning data, and the best thing is that you can read the notebooks others have submitted if you’re not sure how to approach the data. But I would urge you to try it yourself first and keep other people’s notebooks as a last resort, for when you can’t make head or tail of the dataset or want to compare your work. If you are more familiar with machine learning (which you probably are), try working with a familiar dataset at first, so you don’t have to worry about how to pre-process the data. The UCI Machine Learning Repository also offers some fantastic datasets, and GitHub is regularly updated with good ones too.
Data Cleaning and Analysis
The main cleanup involves dealing with missing values. If your dataset is fairly large, say over 10,000 examples, then as a beginner I would suggest you simply delete the examples with missing data, because the imputation techniques used to fill these gaps can sometimes mess up the training process later. But if you must keep them, you can fill in the mean, or the mode if the column is categorical; and if the columns are correlated, you can even use machine learning to estimate the missing value. Your next hit would be outliers in the data. Some professionals prefer to “fix” the value: simply put, if a value lies more than 1.5 times the Inter Quartile Range (IQR) below the first quartile, they raise it to that lower fence, and if it lies more than 1.5 times the IQR above the third quartile, it is clipped to the upper fence. But I would ask you to tread carefully here. A lousy data entry operator might have added an extra ‘0’ to the price of Dizzyland’s 4BHK house and taken it to INR 5600K while the 3BHK in the neighborhood sold for just INR 450K; on the other hand, a remedy that barely lengthened Mr. Gupta’s life by 3 weeks may genuinely have given Mr. Shen 4 more years, a fact and not a data entry error. In short, spotting outliers may not be a simple task and in most cases requires some domain knowledge. So if your dataset is too noisy and that is hindering learning, I would ask you to set it aside, try a new one and revisit it later. Data analysis is also important, as you may want to drop some unnecessary columns to make learning easier. For example, if a Starbucks 250 yards from a house doesn’t contribute much to its price, that column can safely be ignored. Finally, as with machine learning, this is ultimately a numbers game, so you will need to transform all your categorical data into numbers and also normalize and scale your data, and you’re free to use your old friend scikit-learn for the job. There may also be some application-specific preprocessing, such as padding or trimming text data, if you’re building a natural language processing application.
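To make this concrete, here is a minimal preprocessing sketch using pandas and scikit-learn. The file name and the `price` and `city` columns are hypothetical placeholders, and the choices (dropping rows, IQR clipping, one-hot encoding, standard scaling) are just one reasonable combination of the steps described above, not the only way to do it.

```python
# Minimal preprocessing sketch (column names and file are hypothetical).
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("houses.csv")

# 1. Missing values: with a large dataset, simply drop incomplete rows.
df = df.dropna()

# 2. Outliers: clip a numeric column to the 1.5 * IQR fences.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df["price"] = df["price"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 3. Categorical columns become numbers (one-hot encoding here).
df = pd.get_dummies(df, columns=["city"])

# 4. Normalize / scale the features so no column dominates training.
features = df.drop(columns=["price"])
X = StandardScaler().fit_transform(features)
y = df["price"].values
```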
Create the Network
The simplest model will have an input layer, a hidden layer and an output layer. Since a hidden layer by itself is linear, you should of course add an activation to introduce nonlinearity into your model. Also, say you are working with a Multi Layer Perceptron (MLP): the number of hidden layers and the number of hidden units in each is another hyperparameter to be decided, and there is no general rule. You should get your hands dirty, try, fail and try again; like a new teacher, your first few days teaching your model will not be a pretty process, but it gets easier with time. RNNs and CNNs will mostly be used as intermediate layers if your data is text or images, and their output will mostly be fed to an MLP output layer that tells whether the image was of a dog or a cat, or whether a certain piece of speech was offensive. Your output layer will have as many units as there are classes in a classification dataset, or just one in case of regression.
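As a sketch of that simplest possible network, here is an MLP in PyTorch. The sizes (20 input features, 64 hidden units, 3 output classes) are placeholder assumptions; swap in whatever your dataset needs.

```python
# A minimal MLP: input layer -> hidden layer + activation -> output layer.
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),   # input features -> hidden units
    nn.ReLU(),           # activation adds the nonlinearity
    nn.Linear(64, 3),    # hidden units -> one output per class
)
```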
Fix hyperparameters
Even if you are intuitive enough not to struggle with the intermediate layers of your neural network, there are still a few parameters that need to be coded. First, of course, is the learning rate; as a golden rule, start with a small number, as most AI and DL practitioners suggest. Next, decide on the optimizer to use to drive down the empirical error on your dataset; choose from the Gradient Descent, RMSProp or Adam family of algorithms. Then comes parameter initialization, in simple words, initializing the weights and biases of each individual layer. You won’t really have to think about whether to fix the W matrix to 0.00001 for all values in the first layer and keep the bias at zero; the designer angels of the deep learning libraries have it covered, with off-the-shelf functions in popular frameworks like Keras and PyTorch. Typically, if you are using a sigmoid-type activation in a layer, use Xavier (Glorot) initialization, and if your choice is ReLU or one of its variants, choose one of the He initializations. In short, the choice of initialization depends on the activation function you use, so that a sigmoidal curve does not degenerate into a straight line because of badly initialized W and b. Then choose a loss function: for starters, MSE for regression and Cross Entropy for classification are quite good. Next, consider the batch size. After much trial and error, the minibatch approach to optimizing the loss function has emerged as the generally accepted way of training a model. A very popular choice is 64, but don’t be afraid to try 32 or 128 or whatever you prefer. Last but not least, specify the number of epochs to train on your data. An epoch is completed when the model has trained on every example in the dataset once, so you are deciding how many times to show the dataset to your model. Unlike a human pupil, this artificial pupil will only improve up to a limit; the number can be anywhere from 25 to 250, or even more or less than that, so again there is no general rule here.
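Here is how those choices might look in PyTorch, continuing the tiny MLP from the previous sketch. The specific values (learning rate 1e-3, batch size 64, 50 epochs, He initialization with ReLU) are illustrative assumptions, not rules.

```python
# Sketch of the hyperparameter choices discussed above.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

# He (Kaiming) initialization pairs well with ReLU; use Xavier/Glorot for sigmoid or tanh.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_uniform_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

criterion = nn.CrossEntropyLoss()                          # Cross Entropy for classification
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # start with a small learning rate

batch_size = 64   # try 32 or 128 as well
num_epochs = 50   # no general rule; watch the validation accuracy
```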
Start training
If you are using PyTorch, you will need to batch the train and test datasets with data loaders. Next, you should set the model to training mode. While the vanishing gradient problem is largely addressed by modern activations and initializations, you might still want to define a function to take care of exploding gradients; for example, you can go through my transformer model and jump to the training section, where a gradient clipping function is defined. Track the model’s accuracy per epoch to see whether it has started to overfit (which is also a way to find out how many epochs are enough for your cyber learner).
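A bare-bones training loop along these lines might look like the following. It assumes the `model`, `criterion`, `optimizer` and `num_epochs` from the previous sketch and the preprocessed `X`, `y` arrays from the earlier one; the clipping threshold of 1.0 is just an example.

```python
# Bare-bones PyTorch training loop with gradient clipping.
import torch
from torch.utils.data import DataLoader, TensorDataset

train_ds = TensorDataset(torch.tensor(X, dtype=torch.float32),
                         torch.tensor(y, dtype=torch.long))
train_loader = DataLoader(train_ds, batch_size=64, shuffle=True)  # batch the data

for epoch in range(num_epochs):
    model.train()                          # set the training mode
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        # keep exploding gradients in check
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```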
Now cross your fingers, compile and run the model. Are you happy with the accuracy? If so, congratulate yourself and go get some fresh air. If not, don’t worry, you’re not alone. Let’s explore a little deeper.
An overfitting model : Time to tidy up your model. Add some dropout in the intermediate layers and some weight decay to make it harder for the model to simply memorize the training data. Both are hyperparameters your layers and optimizer already support; you just need to pass values as parameters where you define them (see the library documentation and the sketch after this list).
An underfitting model : Add one or more intermediate layers, or increase the number of hidden units in each.
Neither of the above : Well, there may be no single explainable reason for this. If you are using a new dataset, check whether the data can be pre-processed better, for example by rescaling or handling outliers more carefully. Also try changing the batch size and the learning rate. This, to be honest, cannot be prescribed well, so just revisit all the previous steps and see what changes can be made. You have to go slower here; please do not make all the changes at the same time.
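For the overfitting case above, here is a small sketch of both fixes in PyTorch: a dropout layer between the hidden and output layers, and weight decay passed to the optimizer. The dropout probability of 0.3 and weight decay of 1e-4 are placeholder values to tune, not recommendations.

```python
# Two quick anti-overfitting tweaks: dropout and weight decay.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # randomly zeroes 30% of hidden activations during training
    nn.Linear(64, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```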
While an entire book could be written on this topic, I have tried to add my two cents. I hope it helps you form a clearer picture of the whole process.