Hello, readers! Deep learning has become a powerful engine of modern artificial intelligence. Vivid illustrations and simple code examples spare you from wrestling with the more complex aspects of building deep learning models, making difficult material accessible and fun.

Jon Krohn, Grant Beyleveld, and the illustrator Aglaé Bassens use vivid examples and analogies to explain what deep learning is, why it is so popular, and how it relates to other approaches to machine learning. The book is ideal for developers, data scientists, researchers, analysts, and novice programmers who want to apply deep learning in their work. The theory is complemented by working Python code in Jupyter notebooks. You will learn the techniques for building effective models in TensorFlow and Keras, and also get to know PyTorch.

A basic knowledge of deep learning will allow you to create real applications - from computer vision and natural language processing to image generation and game-playing algorithms.

### Keras-based intermediate depth network

At the end of this chapter, we will put this new theoretical knowledge into practice in a neural network and see whether we can surpass the previous shallow_net_in_keras.ipynb model at classifying handwritten digits.

The first few steps of building the intermediate-depth network in the Jupyter notebook intermediate_net_in_keras.ipynb are identical to those of its predecessor, the shallow network. First, the same Keras dependencies are loaded, and the MNIST dataset is loaded and preprocessed in the same way. As you can see in Listing 8.1, the fun part begins where the neural network architecture is defined.
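For readers who skipped the shallow-network chapter, those shared preprocessing steps boil down to flattening each image and scaling it, then one-hot encoding the labels. The sketch below demonstrates the transformations on a random stand-in batch (the real notebook calls keras.datasets.mnist.load_data() instead; the variable names here are illustrative):

```python
import numpy as np

# Stand-in for MNIST: random 28x28 "images" and digit labels 0-9.
# The real notebook loads these with keras.datasets.mnist.load_data().
rng = np.random.default_rng(42)
X_train = rng.integers(0, 256, size=(128, 28, 28))  # fake grayscale pixels
y_train = rng.integers(0, 10, size=128)             # fake digit labels

# Flatten each 28x28 image into a 784-element vector and scale to [0, 1].
X_train = X_train.reshape(len(X_train), 784).astype('float32') / 255

# One-hot encode the labels (the job keras.utils.to_categorical performs).
y_train = np.eye(10)[y_train]

print(X_train.shape, y_train.shape)  # (128, 784) (128, 10)
```

The 784-element vectors are what the input_shape=(784,) argument in Listing 8.1 expects, and the 10-element one-hot rows match the 10-neuron softmax output layer.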

Listing 8.1. Code Defining Neural Network Architecture with Intermediate Depth

```python
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(784,)))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
```

The first line in this code fragment, model = Sequential(), is the same as in the previous network (Listing 5.6); it instantiates a neural network model object. The divergence begins on the next line, where we replace the sigmoid activation function in the first hidden layer with relu, as recommended in Chapter 6. All other parameters of the first layer remain the same: it still consists of 64 neurons, and the input dimension is still 784.

The other significant change in Listing 8.1 relative to the shallow architecture of Listing 5.6 is the second hidden layer of artificial neurons. By calling the model.add() method, we effortlessly add a second Dense layer of 64 relu neurons, justifying the word intermediate in the notebook's name. By calling model.summary(), you can see, as shown in Fig. 8.9, that this extra layer adds 4160 additional trainable parameters compared to the shallow architecture (see Fig. 7.5). The parameters break down as follows:

• 4096 weights, one for the connection of each of the 64 neurons in the second hidden layer with each of the 64 neurons in the first hidden layer (64 × 64 = 4096);
• plus 64 biases, one for each neuron in the second hidden layer;
• for a total of 4160 parameters: n_parameters = n_w + n_b = 4096 + 64 = 4160.
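The same arithmetic covers every Dense layer in the model, so the full parameter count that model.summary() prints can be checked by hand. A minimal sketch (dense_params is a hypothetical helper, not a Keras function):

```python
# Parameter count of a Dense layer:
# (inputs x neurons) weights, plus one bias per neuron.
def dense_params(n_inputs, n_neurons):
    return n_inputs * n_neurons + n_neurons

hidden_1 = dense_params(784, 64)  # first hidden layer
hidden_2 = dense_params(64, 64)   # the extra layer discussed above
output   = dense_params(64, 10)   # softmax output layer

print(hidden_2)                      # 4160, matching the breakdown above
print(hidden_1 + hidden_2 + output)  # 55050, the model's total
```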

In addition to the changes to the model architecture, we also changed the model's compilation options, as shown in Listing 8.2.

Listing 8.2. Intermediate Depth Neural Network Compilation Code

```python
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.1),
              metrics=['accuracy'])
```

These lines from Listing 8.2:
• define a cost function based on cross-entropy: loss='categorical_crossentropy' (the shallow network used mean squared error, loss='mean_squared_error');
• define stochastic gradient descent as the method for minimizing cost: optimizer=SGD;
• set the learning-rate hyperparameter: lr=0.1 (1);
• indicate that, in addition to the loss feedback that the Keras library provides by default, we also want feedback on model accuracy: metrics=['accuracy'] (2).
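To make the cost-function choice concrete, here is what categorical cross-entropy and mean squared error each compute for a single prediction, sketched by hand in NumPy (the numbers are made up for illustration):

```python
import numpy as np

# One-hot true label (the digit "3") and a plausible softmax output.
y_true = np.array([0, 0, 0, 1, 0, 0, 0, 0, 0, 0], dtype=float)
y_pred = np.array([0.02, 0.03, 0.05, 0.70, 0.05,
                   0.05, 0.03, 0.03, 0.02, 0.02])

# categorical_crossentropy: -sum(y_true * log(y_pred)); with a one-hot
# label, only the probability assigned to the true class matters.
cross_entropy = -np.sum(y_true * np.log(y_pred))

# mean_squared_error, the cost the shallow network used.
mse = np.mean((y_true - y_pred) ** 2)

print(round(cross_entropy, 4))  # 0.3567, i.e. -log(0.70)
print(round(mse, 4))
```

Note how cross-entropy punishes the 0.70 assigned to the correct class much more sharply than MSE does, which is part of why it pairs well with softmax outputs.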

(1) Try increasing the learning rate by several orders of magnitude, then decreasing it by several orders of magnitude, and observe how this affects training.
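The qualitative effect of that experiment can be felt on a toy problem far simpler than the Keras model: plain gradient descent on f(w) = w², whose gradient is 2w. This sketch is only an analogy, not the book's network:

```python
# Toy version of the footnote's learning-rate experiment:
# gradient descent on f(w) = w**2, gradient 2*w.
def descend(lr, steps=20, w=1.0):
    for _ in range(steps):
        w = w - lr * 2 * w  # the SGD update: w <- w - lr * gradient
    return abs(w)           # distance from the minimum at w = 0

print(descend(lr=0.1))    # converges steadily toward 0
print(descend(lr=0.001))  # orders of magnitude smaller: barely moves
print(descend(lr=10.0))   # orders of magnitude larger: diverges
```

A too-small rate creeps, a too-large rate overshoots and explodes; lr=0.1 sits in the workable middle, just as it does for the network in Listing 8.2.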

(2) Loss is the most important indicator for seeing how a model's quality changes over time, but its specific value depends on the characteristics of the particular model and, as a rule, cannot be interpreted on its own or compared across models. So although losses should be as close to zero as possible, it is often hard to say how close to zero the loss of a particular model really is. Accuracy, on the other hand, is easy to interpret and to generalize: we know exactly what it means (for example, “the shallow neural network correctly classified 86 percent of the handwritten digits in the test dataset”), and we can compare it with the accuracy of any other model (“an accuracy of 86 percent is worse than the accuracy of our deep neural network”).
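What metrics=['accuracy'] reports for classification boils down to the fraction of samples whose highest-probability class matches the true label. A small sketch with made-up network outputs:

```python
import numpy as np

# Four true digit labels and made-up softmax outputs for each sample.
y_true = np.array([3, 7, 1, 3])
y_prob = np.array([
    [0.01]*3 + [0.91] + [0.01]*6,  # argmax 3 -> correct
    [0.05]*7 + [0.55] + [0.05]*2,  # argmax 7 -> correct
    [0.30, 0.25] + [0.05625]*8,    # argmax 0 -> wrong (true label is 1)
    [0.01]*3 + [0.91] + [0.01]*6,  # argmax 3 -> correct
])

# Accuracy: fraction of samples where the predicted class matches the label.
accuracy = np.mean(np.argmax(y_prob, axis=1) == y_true)
print(accuracy)  # 0.75
```

Unlike loss, this 0.75 means the same thing for any classifier on any dataset, which is exactly why it is the easier number to compare.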

Finally, we train the intermediate network by running the code in Listing 8.3.
Listing 8.3. Intermediate Depth Neural Network Training Code

```python
model.fit(X_train, y_train,
          batch_size=128, epochs=20,
          verbose=1,
          validation_data=(X_valid, y_valid))
```

The only thing that changed in training the intermediate network relative to the shallow one (see Listing 5.7) is that the epochs hyperparameter was reduced by an order of magnitude, from 200 to 20. As you will see, the more efficient intermediate architecture needs far fewer epochs to train. Fig. 8.10 presents the results of the first four epochs of training. As you may remember, our shallow architecture plateaued at 86% accuracy on validation data after 200 epochs. The intermediate-depth network far surpassed it: as the val_acc field shows, it reached 92.34% accuracy after the first epoch of training. Accuracy exceeded 95% by the third epoch and appears to have plateaued around 97.6% by the twentieth. We are making serious progress!

Let us examine the output of model.fit(), shown in Fig. 8.10, in more detail:

• The progress bar shown below fills over the course of the 469 “rounds of training” (see Fig. 8.5):
60000/60000 [==============================]
• 1s 15us/step means that all 469 rounds of the first epoch took 1 second, an average of about 15 microseconds per training sample.
• loss shows the average cost on the training data for the epoch. It is 0.4744 for the first epoch, decreases steadily from epoch to epoch through stochastic gradient descent (SGD) and backpropagation, and ultimately falls to 0.0332 by the twentieth epoch.
• acc is the classification accuracy on the training data for the epoch. The model correctly classified 86.37% of the training samples after the first epoch and exceeded 99% by the twentieth. Since a model can overfit, one should not be overly impressed by high accuracy on this metric.
• Fortunately, the cost on the validation dataset (val_loss) generally also decreases, ultimately plateauing near 0.08 over the final five epochs of training.
• As the cost decreases, accuracy on the validation data (val_acc) increases. As already mentioned, validation accuracy reached 97.6%, far above the 86% of our shallow network.

### Summary

In this chapter, we covered a lot of ground. First, we learned how a neural network with fixed parameters processes information. Then we worked through the interacting methods - cost functions, stochastic gradient descent, and backpropagation - that allow a network's parameters to be adjusted to approximate any true value y that has a continuous relationship with some input x. Along the way, we became acquainted with several hyperparameters, including the learning rate, the batch size, and the number of training epochs, as well as practical rules of thumb for setting each of them. At the end of the chapter, we applied this new knowledge to build a neural network of intermediate depth, which significantly surpassed the previous shallow network on the same task of classifying handwritten digits. Next, we will look at methods for improving the stability of artificial neural networks as they deepen, which will allow us to develop and train a full-fledged deep learning model.