Why Neural Networks?
--------------------
• Neural networks have a huge capacity for utilising data. Traditional ML algorithms have a data-capacity limit: as more and more data is fed into them, they reach a point where they stop utilising the new information. Given current data volumes, deep neural networks (DNNs) are very relevant today.

Model Class or Type
-------------------
• The choice of model is data dependent.
• Linear models
• Non-linear models

Model Training
--------------
• How do we come up with the best parameter settings for a model? The process of parameter tuning is called training: determining the choice of parameters that produces the best output.
• We may not be able to determine parameters (w) that suit all features. Our best bet is to find the w that best fits the features, even if it is not the optimum value.
• Loss (loss function): the loss measures the error the model makes because its parameters are not optimal. For example, the squared loss is (y1 - y2)², where y1 is the true target value and y2 is the value the model predicts.
• The data scientist must choose the following before determining the best w:
-- the model type (linear or non-linear)
-- the loss function (e.g. squared loss, cross-entropy loss, etc.)
-- the optimizer, which minimises the loss (e.g. gradient descent, stochastic gradient descent, Adam, etc.)
• All the choices you have to make before starting to minimise your loss function (there could be more than the ones above) are called HYPERPARAMETERS.

Occam's Razor
-------------
• If two models are equally good in use, the simpler model is better.

Overfitting
-----------
• A model overfits when it learns the training set perfectly but underperforms on unseen data.
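A tiny sketch of this behaviour, using a synthetic noisy dataset and numpy's polynomial fitting (the data, degrees, and split here are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy data: a noisy linear pattern y = 2x + 1 + noise.
x = rng.uniform(0, 3, size=20)
y = 2 * x + 1 + rng.normal(0, 0.3, size=20)

x_train, y_train = x[:8], y[:8]
x_test, y_test = x[8:], y[8:]

def mse(w, xs, ys):
    """Mean squared loss of polynomial coefficients w on (xs, ys)."""
    return float(np.mean((np.polyval(w, xs) - ys) ** 2))

simple = np.polyfit(x_train, y_train, 1)    # rigid model: a straight line
flexible = np.polyfit(x_train, y_train, 7)  # flexible model: can interpolate all 8 points

print("simple  : train", mse(simple, x_train, y_train), "test", mse(simple, x_test, y_test))
print("flexible: train", mse(flexible, x_train, y_train), "test", mse(flexible, x_test, y_test))
```

The flexible model drives its training loss toward zero by chasing the noise in the eight training points, while the simple line keeps a small, honest training loss that reflects the noise level.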
• This happens when a complicated model is tuned to fit the training data perfectly. By doing this, the model loses the original pattern in the data and goes after the outliers and noise in the set. The major cause is deploying a very flexible (complicated) function.

Underfitting
------------
• Deploying a very rigid function without giving room for any flexibility. It underperforms because it fails to capture the original pattern.
• The best model is not the one that minimises the training loss; it is the one that captures the pattern of the data.

How to Avoid Overfitting
------------------------
• Use a simpler model.
• Sample more training data.
• Apply regularization: a restriction on the model that prevents a parameter (w) from using its full range of values. For instance, if w1 could range from -1 to +infinity, I could restrict it to 0-3, and do the same for other parameters. The regularization strength is also a hyperparameter.

Generalization
--------------
• The performance of your model on unseen data.
• You can either train and deploy the model and wait for users to tell you how it performs, or keep a test set to test the model yourself.
• If your training loss is low and your test loss is high, your model has overfitted. If, on the other hand, both your training and test losses are low, your model is balanced.

Data Snooping
-------------
• Data snooping is a phenomenon in which the test set is unknowingly exposed to the model, thereby contributing to the model's performance.
• When you use your training set for modelling, your first attempt may not give the best parameter combination or the best result when you measure performance on the test set.
• You could adjust the model parameters and repeat the process.
• You could end up running the model multiple times before finally settling on the model.
• However, each time you expose your test set to the function, the test set becomes more and more impure.
• This process of test-set impurity is called data snooping.
• To avoid data snooping, a validation (development) set is used.
• The dataset is split into three sets, say training (80%), validation (15%), and testing (5%).
• The validation set is used in place of the test set during tuning. After all the parameter tuning, the test set is used for the final evaluation of the model, thereby avoiding data snooping.
• It is also a way to avoid overfitting.

Cross Validation
----------------
• Cross validation is a technique for assessing how a model generalises to an independent dataset.
• It evaluates machine learning models by training several models on subsets of the available input data and evaluating them on the complementary subset of the data. Using cross-validation, there is a high chance of detecting overfitting with ease.
• K-fold is the most popular way of computing cross validation.
• Initially, the entire training dataset (after the test set has been removed) is broken up into k equal parts. One part is kept as the holdout set and the remaining k-1 parts are used to train the model.
• The above process is repeated k times, changing the holdout part each time. Thus, every data point gets an equal opportunity to be included in the holdout set.
• from sklearn.model_selection import cross_val_score
  print(cross_val_score(model, X_train, y_train, cv=5))
• The method returns a list of k accuracy values, one per fold. In general, we take their average and use it as a consolidated cross-validation score:
• import numpy as np
  print(np.mean(cross_val_score(model, X_train, y_train, cv=5)))
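The snippets above can be put into a runnable form. This sketch first holds out a final test set (the split that guards against data snooping), then runs 5-fold cross-validation on the training portion only; the iris dataset and logistic-regression model are illustrative assumptions, not from the notes:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)

# Remove the final test set first, so cross-validation never sees it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation on the training portion only.
scores = cross_val_score(model, X_train, y_train, cv=5)
print("per-fold accuracy:", scores)
print("mean CV accuracy :", np.mean(scores))
```

Only after all tuning is done would the held-out `X_test`/`y_test` be used once, for the final evaluation.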