# NPTEL Introduction to Machine Learning – IITM Assignment 2021

NPTEL Introduction to Machine Learning (IITM) is a course aimed at helping students work in data-driven disciplines. With the increased availability of a variety of data from varied sources, there has been increasing attention paid to the various methods of analytics and machine learning.

NPTEL Introduction to Machine Learning is a MOOC offered by IIT Madras on the NPTEL platform. This course is intended to introduce some of the basic concepts of machine learning. The course is developed by Prof. Balaraman Ravindran, who is currently a Professor in Computer Science at IIT Madras and a Mindtree Faculty Fellow.

1. Who Can Join: This is an elective course. Intended for senior UG/PG students. BE/ME/MS/PhD
2. Requirements/Prerequisites: We will assume that the students know programming for some of the assignments. It would be helpful if the students have done introductory courses on probability theory and linear algebra. We will review some of the basic topics in the first two weeks as well.
3. INDUSTRY SUPPORT: Any company in the data analytics/data science/big data domain would value this course.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 8 assignments out of the total 12 assignments given in the course.

Exam score = 75% of the proctored certification exam score out of 100.

Final score = Average assignment score + Exam score.

Students will be eligible for a certificate only if the average assignment score >= 10/25 and the exam score >= 30/75. If either of the two criteria is not met, the student will not get the certificate even if the final score >= 40/100.
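The scoring rule above can be sketched in a few lines of Python. The function name and argument layout are illustrative assumptions and are not part of the NPTEL portal. The sketch also assumes each assignment is scored out of 100 and that missing assignments count as zero.

```python
def final_score(assignment_scores, exam_score):
    """Return (assignment_part, exam_part, total, eligible) under the stated rule.

    Assumption: every score is on a 0-100 scale.
    """
    # Keep the best 8 assignment scores; dividing by 8 treats missing ones as zero.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_part = 0.25 * sum(best_eight) / 8   # out of 25
    exam_part = 0.75 * exam_score                  # out of 75
    total = assignment_part + exam_part            # out of 100
    # Both thresholds must be met; the total alone is not enough.
    eligible = assignment_part >= 10 and exam_part >= 30
    return assignment_part, exam_part, total, eligible
```

For example, perfect assignments with an exam score of 30 give a total of 47.5 but no certificate, because the exam part (22.5) is below 30.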

## NPTEL Introduction to machine learning Assignment Week 12 Answers:-

Q1. In solving a classification problem, if there is a large difference between the output of the learned model and the expected output of the learned model over various sources of variability, then we can expect the ________ component of the generalisation error to be high.

Q2. Given below are some properties of different classification algorithms. In which among the following would you expect feature

Q3. Which of the following measures best analyzes the performance of a classifier?

Q4. As discussed in the lecture, most of the classifiers minimize the empirical risk. Which among the following is an exceptional case?

Q5. What do you expect to happen to the variance component of the generalisation error of your model as the size of the training data set increases?

Q6. What happens when your model complexity (such as interaction terms in linear regression, order of polynomial in SVM, etc.) increases?

Q7. Suppose we want an RL agent to learn to play the game of golf. For training purposes, we make use of a golf simulator program. Assume

Q8. You want to toss a fair coin a number of times and obtain the probability of getting heads by taking a simple average. What is the

Q9. You face a particularly challenging RL problem, where the reward distribution keeps changing with time. In order to gain maximum

## NPTEL Introduction to machine learning Assignment Week 11 Answers:-

Q1. During parameter estimation for a GMM model using data X, which of the following quantities are you minimizing (directly or indirectly)?

Q2. When executing the Expectation Maximization algorithm, a common problem is the sheer complexity of the number of parameters to estimate. For a typical K-Gaussian Mixture Model in an n-dimensional space, how many independent parameters are being estimated in total?

Q3. Which of the following is an assumption that reduces Gaussian Mixture Models to K-means?

Q4. Given N samples $x_1, x_2, \ldots, x_N$ drawn independently from a Gaussian distribution with variance $\sigma^2$ and unknown mean $\mu$. Assume that the prior distribution of the mean is also a Gaussian distribution, but with parameters mean $\mu_p$ and variance $\sigma_p^2$. Find the MAP estimate of the mean.
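For reference, a short derivation sketch of the MAP estimate under exactly the assumptions stated in the question (Gaussian likelihood with known variance, Gaussian prior on the mean). The symbols follow the question; nothing else is assumed.

```latex
\begin{align*}
\log p(\mu \mid x_1,\ldots,x_N)
  &= -\frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2
     -\frac{1}{2\sigma_p^2}(\mu-\mu_p)^2 + \text{const} \\
\frac{\partial}{\partial\mu}\log p(\mu \mid x_1,\ldots,x_N)
  &= \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i-\mu)
     -\frac{1}{\sigma_p^2}(\mu-\mu_p) = 0 \\
\Rightarrow\quad
\hat{\mu}_{\mathrm{MAP}}
  &= \frac{\sigma_p^2\sum_{i=1}^{N}x_i + \sigma^2\mu_p}{N\sigma_p^2+\sigma^2}
\end{align*}
```

As the prior variance grows, the estimate tends to the sample mean; as it shrinks, the estimate tends to the prior mean.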

Q5. You are presented with a dataset that has hidden/missing variables that influence your data. You are asked to use the Expectation Maximization algorithm to best capture the data. How would you define the E and M in Expectation Maximization?

Q6. During parameter estimation for a GMM model using data X, which of the following quantities are you minimizing (directly or indirectly)?

Q7. You are given n p-dimensional data points. The task is to learn a classifier to distinguish between k classes. You come to know that the dataset has missing values. Can you use the EM algorithm to fill in the missing values (without making any further assumptions)?

## NPTEL Introduction to machine learning Assignment Week 10 Answers:-

Q1. Considering single-link and complete-link hierarchical clustering, is it possible for a point to be closer to points in other clusters
than to points in its own cluster? If so, in which approach will this tend to be observed?

Q2. Consider the following one dimensional data set: 12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8 and 18. Given k = 3 and initial cluster centers to be 5, 6 and 31, what are the final cluster centres obtained on applying the k-means algorithm?
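A minimal sketch of one-dimensional k-means (Lloyd's algorithm) can be used to check a hand computation for this kind of question. The function name and the tie-breaking rule (lowest-index centre wins) are assumptions of this sketch, not something specified in the assignment.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd iterations on scalar data, starting from the given centres."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: each centre becomes the mean of its cluster
        # (an empty cluster keeps its previous centre).
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # no change, so the algorithm has converged
            break
        centers = new_centers
    return centers


data = [12, 22, 2, 3, 33, 27, 5, 16, 6, 31, 20, 37, 8, 18]
final_centers = kmeans_1d(data, [5, 6, 31])
```

How many iterations are counted until convergence depends on whether the final, unchanged pass is included, so the sketch deliberately does not report an iteration count.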

Q3. For the previous question, in how many iterations will the k-means algorithm converge?

Q4. In the lecture on the BIRCH algorithm, it is stated that using the number of points N, sum of points SUM and sum of squared
points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the centroid of the combined cluster? (In terms of N,SUM and SS of both the clusters)
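The additivity of the clustering feature (N, SUM, SS) can be illustrated with a small sketch. The tuple layout and the function names `merge_cf` and `centroid` are hypothetical. SS is taken here as the scalar sum of squared norms, which is one common convention.

```python
def merge_cf(cf_a, cf_b):
    """Combine two clustering features (N, SUM, SS) component-wise."""
    n_a, sum_a, ss_a = cf_a
    n_b, sum_b, ss_b = cf_b
    merged_sum = [a + b for a, b in zip(sum_a, sum_b)]
    return n_a + n_b, merged_sum, ss_a + ss_b


def centroid(cf):
    """Centroid is the linear sum divided by the number of points."""
    n, linear_sum, _ = cf
    return [s / n for s in linear_sum]


# Cluster A holds (0, 0) and (2, 0); cluster B holds (4, 6).
cf_a = (2, [2.0, 0.0], 4.0)
cf_b = (1, [4.0, 6.0], 52.0)
merged = merge_cf(cf_a, cf_b)
merged_centroid = centroid(merged)
```

Because all three components simply add, the centroid of the combined cluster needs only the N and SUM values of A and B.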

Q5. What assumption does the CURE clustering algorithm make with regards to the shape of the clusters?

Q6. What would be the effect of increasing MinPts in DBSCAN while retaining the same Eps parameter? (Note that more than
one statement may be correct)

Q7. Visualize the dataset DS1. Which of the following algorithms will be able to recover the true clusters (first check by visual inspection and then write code to see if the result matches to what you expected).

Q8. For two independent runs of K-means clustering, is it guaranteed that we get the same clustering results?
Note: the seed value is not preserved across independent runs.

Q9. Consider the similarity matrix given below: Which of the following shows the hierarchy of clusters created by the single link
clustering algorithm.

Q10. For the similarity matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm.

## NPTEL Introduction to machine learning Assignment Week 9 Answers:-

Q1. Consider the Bayesian network shown below.

Two students – Manish and Trisha make the following claims:

• Manish claims P(D|{S, L, C}) = P(D|{L, C})
• Trisha claims P(D|{S, L}) = P(D|L)

Q2. Consider the Bayesian graph shown below in Figure 2.

Q3. Using the data given in the previous question, compute the probability of following assignment, P(i=1,g=1,s=1,l=0)
irrespective of the difficulty of the course? (up to 3 decimal places)

Q4. Consider the Bayesian network shown below in Figure 3

Two students – Manish and Trisha make the following claims:

• Trisha claims P(H|{S, G, J}) = P(H|{G, J})
• Manish claims P(H|{S, C, J}) = P(H|{C, J})

Q5. Consider the Markov network shown below in Figure 4

Which of the following variables are NOT in the Markov blanket of variable “4” shown in Figure 4 above? (multiple answers may be correct)

Q6. In the Markov network given in Figure 4, two students make the following claims:

• Manish claims variable “1” is dependent on variable “7” given variable “2”.
• Trina claims variable “2” is independent of variable “6” given variable “3”.

Q7. Four random variables are known to follow the given factorization

$$P(A_1=a_1, A_2=a_2, A_3=a_3, A_4=a_4) = \frac{1}{Z}\,\psi_1(a_1,a_2)\,\psi_2(a_1,a_4)\,\psi_3(a_1,a_3)\,\psi_4(a_2,a_4)\,\psi_5(a_3,a_4)$$

The corresponding Markov network would be

Q8. Consider the following Markov Random Field.

Which of the following nodes will have no effect on H given the Markov Blanket of H?

Q9. Select the correct pairs of (Inference Algorithm, Graphical Model) (note: more than one option may be correct)

Q10. Here is a popular toy graphical model. It models the grades obtained by a student in a course and its implications. Difficulty represents the difficulty of the course, Intelligence is an indicator of how intelligent the student is, SAT represents the SAT scores of the student, and Letter represents the event of the student receiving a letter of recommendation from the faculty teaching the course.

## NPTEL Introduction to machine learning Assignment Week 8 Answers:-

Q1. In a given classification problem, there are 6 different classes. In building a classification model, we want to penalise specific
errors made by the model depending upon the actual and predicted class label. For example, given a training data point belonging to class 1, if the model predicts it as class 2, then the penalty for this will be different if for the same data point, the model had predicted it as class 3. To build such a model, we need to select an appropriate

Q2. The Naive Bayes classifier makes the assumption that the ________ are independent given the ________ .

Q3. Consider the problem of learning a function $X \to Y$, where Y is Boolean. X is an input vector $(X_1, X_2)$, where $X_1$ is categorical and takes 3 values, and $X_2$ is a continuous variable (normally distributed). What would be the minimum number of parameters required to define a Naive Bayes model for this function?

Q4. In boosting, the weights of data points that were misclassified are _________ as training progresses.

Q5. In a random forest model let m<<p be the number of randomly selected features that are used to identify the best split at
any node of a tree. Which of the following are true? (p is the original number of features) (Multiple options may be correct)

Q6. Consider the following data for 500 instances of home, 600 instances of office and 700 instances of factory type buildings

Q7. Consider the following graphical model, which of the following are false about the model? (multiple options may be correct)

Q8. Consider the Bayesian network given in the previous question. Let ‘A’, ‘B’, ‘C’, ‘D’ and ‘E’ denote the random variables shown in the network. Which of the following can be inferred from the network structure?

## NPTEL Introduction to machine learning Assignment Week 7 Answers:-

Q1. For the given confusion matrix, compute the recall

Q2. Which of the following are true?
TP – True Positive, TN – True Negative, FP – False Positive, FN – False Negative

Q3. How does bagging help in improving the classification performance?

Q4. Which method among bagging and stacking should be chosen in case of limited training data, and what is the appropriate reason for your preference?

Q5. Which of the following statements are false when comparing Committee Machines and Stacking

Q6. Which of the following measures best analyzes the performance of a classifier?

Q7. For the ROC curve of True positive rate vs False positive rate, which of the following are true?

Q8. Which of the following are true about using 5-fold cross validation with a data set of size n = 100 to select the value of k in the kNN algorithm.

## NPTEL Introduction to machine learning Assignment Week 6 Answers:-

Q1. Decision trees can be used for __________ .

Q2. In building a decision tree model, to control the size of the tree, we need to control the number of regions. One approach to do
this would be to split tree nodes only if the resultant decrease in the sum of squares error exceeds some threshold. For the described method,
which among the following are true?

Q3. In a decision tree, if we decide to swap out the usual splits (of the form $x_i < k$ or $x_i > k$) and use a linear combination of features instead (like $\beta^T X + \beta_0$), where the parameters of the hyperplane $\beta, \beta_0$ are also simultaneously learnt, which of the following statements would be true?

Q4. Having built a decision tree, we are using reduced error pruning to reduce the size of the tree. We select a node to collapse.
For this particular node, on the left branch, there are 3 training data points with the following outputs: 5, 7, 9.6 and for the right branch,
there are four training data points with the following outputs: 8.7, 9.8, 10.5, 11. The average value of the outputs of data points denotes the response of a branch. The original responses for data points along the two branches (left and right respectively) were response_left and response_right, and the new response after collapsing the node is response_new. What are the values for response_left, response_right and response_new (numbers in the options are given in the same order)?
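Since the question defines a branch's response as the average output of its training points, the three quantities can be sketched directly. The variable names mirror the question; the helper `mean` is just an illustrative convenience.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


left_outputs = [5, 7, 9.6]
right_outputs = [8.7, 9.8, 10.5, 11]

response_left = mean(left_outputs)
response_right = mean(right_outputs)
# Collapsing the node pools both branches into a single leaf.
response_new = mean(left_outputs + right_outputs)
```

Note that response_new is the average over all seven points, which is not the same as averaging the two branch responses, because the branches hold different numbers of points.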

Q5. Which among the following split-points for the feature1 would give the best split according to the information gain measure?

Q6. For the same dataset, which among the following split-points for feature2 would give the best split according to the gini index measure?

Q7. In which of the following situations is it appropriate to introduce a new category ’Missing’ for missing values? (multiple options may be correct)

## NPTEL Introduction to machine learning Assignment Week 5 Answers:-

Q1. You are given the N samples of input (x) and output (y) as shown in the figure below. What will be the most appropriate model y=f(x)

Q2. Given N samples $x_1, x_2, \ldots, x_N$ drawn independently from a Gaussian distribution with variance $\sigma^2$ and unknown mean $\mu$, find the MLE of the mean.

Q3. Consider the following function.

Q4. Using the notations used in class, evaluate the value of the neural network with a 3-3-1 architecture (2-dimensional input with 1 node for the bias term in both the layers). The parameters are as follows

Q5. Which of the following statements are true:

Q6. We have a function which takes a two-dimensional input $x=(x_1,x_2)$ and has two parameters $w=(w_1,w_2)$, given by $f(x,w)=\sigma(\sigma(x_1 w_1)w_2+x_2)$, where $\sigma(x)=\frac{1}{1+e^{-x}}$. We use backpropagation to estimate the right parameter values. We start by setting both the parameters to 2. Assume that we are given a training point $x_2=1$, $x_1=0$, $y=3$. Given this information, answer the next two questions. What is the value of $\frac{\partial f}{\partial w_2}$?
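The partial derivative asked for here follows from the chain rule, and a short sketch can check it numerically. The sketch assumes σ is the logistic sigmoid 1/(1+e^(−x)); the function names are illustrative. It evaluates only the derivative of f with respect to w2, not a weight update, since the update also depends on the loss function used in the lecture.

```python
import math


def sigmoid(z):
    """Logistic sigmoid (assumed form of the activation in the question)."""
    return 1.0 / (1.0 + math.exp(-z))


def f(x1, x2, w1, w2):
    """The two-parameter function from the question."""
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)


def df_dw2(x1, x2, w1, w2):
    """Chain rule: d f / d w2 = f * (1 - f) * sigmoid(x1 * w1)."""
    inner = sigmoid(x1 * w1)
    out = sigmoid(inner * w2 + x2)
    return out * (1.0 - out) * inner


grad = df_dw2(x1=0.0, x2=1.0, w1=2.0, w2=2.0)

# Central finite difference as an independent sanity check.
eps = 1e-6
numeric = (f(0.0, 1.0, 2.0, 2.0 + eps) - f(0.0, 1.0, 2.0, 2.0 - eps)) / (2 * eps)
```

With x1 = 0 the inner sigmoid equals 0.5 regardless of w1, so the gradient reduces to 0.5 · σ(2) · (1 − σ(2)).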

Q7. If the learning rate is 0.5, what will be the value of w2 after one update using backpropagation algorithm?

Q8. Which of the following are true when comparing ANNs and SVMs?

Q9. Which of the following are correct?

Q10. Which of the following are false?

## NPTEL Introduction to machine learning Assignment Week 4 Answers:-

Q1. Suppose we use a linear kernel SVM to build a classifier for a 2-class problem where the training data points are linearly separable. In general, will the classifier trained in this manner produce the same decision boundary as the classifier trained using the perceptron training algorithm on the same training data?

Q2. Consider the data set given below. Claim: PLA (perceptron learning algorithm) can be used to learn a classifier that achieves zero misclassification error on the training data. This claim is:

Q3. For a support vector machine model, let $x_i$ be an input instance with label $y_i$. If $y_i(\hat{\beta}_0 + x_i^T\hat{\beta}) > 1$, where $\hat{\beta}_0$ and $\hat{\beta}$ are the estimated parameters of the model, then

Q4. Suppose we use a linear kernel SVM to build a classifier for a 2-class problem where the training data points are linearly separable. In general, will the classifier trained in this manner be always the same as the classifier trained using the perceptron training algorithm on the same training data?

Q5. Train a linear regression model (without regularization) on the above dataset. Report the coefficients of the best fit model.
Report the coefficients in the following format: $\beta_0, \beta_1, \beta_2, \beta_3$.

Q6. Train an l2 regularized linear regression model on the above dataset. Vary the regularization parameter from 1 to 10. As you increase the regularization parameter, absolute value of the coefficients (excluding the intercept) of the model:

Q7. Train an l2 regularized logistic regression classifier on the modified iris dataset. We recommend using sklearn. Use only the first
two features for your model. We encourage you to explore the impact of varying different hyperparameters of the model. Kindly note that the C parameter mentioned below is the inverse of the regularization parameter λ. As part of the assignment train a model with the following hyperparameters: Model: logistic regression with one-vs-rest classifier, C=1e4
For the above set of hyperparameters, report the best classification accuracy

Q8. Train an SVM classifier on the modified iris dataset. We recommend using sklearn. Use only the first two features for your model. We encourage you to explore the impact of varying different hyperparameters of the model. Specifically try different kernels and the associated hyperparameters. As part of the assignment train models with the following set of hyperparameters
RBF-kernel, gamma = 0.5, one-vs-rest classifier, no-feature-normalization. Try C = 0.01, 1, 10. For the above set of hyperparameters, report the best classification accuracy along with total number of support vectors on the test data.

## NPTEL Introduction to machine learning Assignment Week 3 Answers:-

Q1. Consider the case where two classes follow Gaussian distribution which are centered at (4, 7) and (−4, −1) and have identity covariance matrix. Which of the following is the separating decision boundary using LDA assuming the priors to be equal?
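With identity covariance and equal priors, the LDA boundary is the perpendicular bisector of the segment joining the two class means. A minimal sketch of that special case follows; the function name is hypothetical, and the formula applies only under the two stated assumptions.

```python
def lda_boundary(mu1, mu2):
    """Return (w, c) such that the boundary is w . x = c.

    Valid only for identity covariance and equal class priors, where
    w = mu1 - mu2 and c = (|mu1|^2 - |mu2|^2) / 2.
    """
    w = [a - b for a, b in zip(mu1, mu2)]
    c = 0.5 * (sum(a * a for a in mu1) - sum(b * b for b in mu2))
    return w, c


w, c = lda_boundary((4, 7), (-4, -1))
```

Here the result is 8x + 8y = 24, which simplifies to x + y = 3; the midpoint of the two means, (0, 3), lies on this line as expected.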

Q2. Consider the following data with two classes. The color indicates different class.

Q3. We discussed the use of MLE for the estimation of parameters of logistic regression model. We used which of the following assumptions to derive the likelihood function ?

Q4. Which of the following statements is true about LDA regarding outliers?

Q5. Consider the following distribution of training data:

Q6. Suppose that we have two variables, X and Y (the dependent variable). We wish to find the relation between them. An expert tells us that the relation between the two has the form $Y = m\log(X) + c$. Available to us are samples of the variables X and Y. Is it possible to apply linear regression to this data to estimate the values of m and c?

Q7. In a binary classification scenario where x is the independent variable and y is the dependent variable, logistic regression assumes that the conditional distribution y|x follows a

Q8. Assuming that you apply LDA to this data, what is the estimated covariance matrix?

Q9. Given the following 3D input data, identify the principal component. (Steps: center the data, calculate the sample covariance matrix, calculate the eigenvectors and eigenvalues, identify the principal component)

Q10. For the data given in the previous question, find the transformed input along the first two principal components.

## NPTEL Introduction to machine learning Assignment Week 2 Answers:-

Q1. Given a training dataset, the following visualization shows the fit of three different models (in blue line). Assume that the test data and training data come from the same distribution. What can you conclude from the following visualizations? Multiple options can be correct.

Q2. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge regression with tuning parameter lambda to reduce its complexity. Choose the option below which describes relationship of bias and variance with lambda.

Q3. Given a training data set of 10,000 instances, with each input instance having 17 dimensions and each output instance having 2 dimensions, the dimensions of the design matrix used in applying linear regression to this data is

Q4. Suppose we want to add a regularizer to the linear regression loss function, to control the magnitudes of the weights $\beta$. We have a choice between $\Omega_1(\beta)=\sum_{i=1}^{p}|\beta_i|$ and $\Omega_2(\beta)=\sum_{i=1}^{p}\beta_i^2$. Which one is more likely to result in sparse weights?

Q5. Consider forward selection, backward selection and best subset selection with respect to the same data set. Which of the following is true?

Q6. In the formulation of the method, we observe that in iteration k, we regress the entire dataset on $z_0, z_1, \ldots, z_{k-1}$. It seems like a waste of computation to recompute the coefficients for $z_0$ a total of $p$ times, $z_1$ a total of $p-1$ times and so on. Can we re-use the coefficients computed in iteration $j$ for iteration $j+1$ for $z_{j-1}$?

Q7. Consider the following five training examples. We want to learn a function f(x) of the form f(x)=ax+b which is parameterised by (a,b). Using squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum loss?

Q8. Here is a data set of words in two languages.

## NPTEL Introduction to machine learning Assignment Week 1 Answers:-

Q1. Which of the following is a supervised learning problem?

Q2 – Which of the following is not a classification problem?

Q3 – Which of the following is a regression task? (multiple options may be correct)

Note: We never promote copying, and we do not claim that these answers are 100% correct. They are based solely on our own knowledge and are posted only as a reference for students, so we urge you to do your assignment on your own.

Q4 – Which of the following is an unsupervised task?

Q5 – Which of the following is a categorical feature?

Q6 – Let X and Y be uniformly distributed random variables over the intervals [0, 4] and [0, 6] respectively. If X and Y are independent, compute the probability P(max(X,Y)>3).
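The event max(X,Y) > 3 is the complement of "both X ≤ 3 and Y ≤ 3", and independence lets the two CDF values multiply. A small exact-arithmetic sketch follows; the function name is an illustrative assumption.

```python
from fractions import Fraction


def prob_max_exceeds(t, upper_x, upper_y):
    """P(max(X, Y) > t) for independent X ~ U[0, upper_x], Y ~ U[0, upper_y]."""

    def cdf(value, upper):
        # Uniform CDF on [0, upper], clipped to the range [0, 1].
        return min(max(Fraction(value, upper), Fraction(0)), Fraction(1))

    # Complement of the event "both variables are at most t".
    return 1 - cdf(t, upper_x) * cdf(t, upper_y)


p = prob_max_exceeds(3, 4, 6)
```

Here P(X ≤ 3) = 3/4 and P(Y ≤ 3) = 1/2, so the result is 1 − 3/8 = 5/8.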

Q7 – Let the trace and determinant of a matrix $A=\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ be 6 and 16 respectively. The eigenvalues of A are:
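For a 2×2 matrix the eigenvalues are the roots of λ² − (trace)λ + (determinant) = 0, so they can be recovered from the trace and determinant alone. A minimal sketch, with an illustrative function name:

```python
import cmath


def eigenvalues_2x2(trace, det):
    """Roots of lambda^2 - trace * lambda + det = 0 (may be complex)."""
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2


lam1, lam2 = eigenvalues_2x2(6, 16)
```

With trace 6 and determinant 16 the discriminant is 36 − 64 = −28, so the eigenvalues form the complex-conjugate pair 3 ± i√7.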

Q8 – What happens when your model complexity increases? (multiple options may be correct)

Q9 – A new phone, E-Corp X1 has been announced and it is what you’ve been waiting for, all along. You decide to read the reviews before buying it. From past experiences, you’ve figured out that good reviews mean that the product is good 90% of the time and bad reviews mean that it is bad 70% of the time. Upon glancing through the reviews section, you find out that the X1 has been reviewed 1269 times and only 172 of them were bad reviews. What is the probability that, if you order the X1, it is a bad phone?
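This question is an application of the law of total probability over the two kinds of review. The sketch below assumes the natural reading of the statement: P(good | good review) = 0.9 and P(bad | bad review) = 0.7, with the review proportions taken from the counts given. The variable names are illustrative.

```python
total_reviews = 1269
bad_reviews = 172

# Empirical proportions of each review type.
p_bad_review = bad_reviews / total_reviews
p_good_review = 1 - p_bad_review

# Conditional probabilities read off the problem statement (assumed reading).
p_bad_given_good_review = 1 - 0.90
p_bad_given_bad_review = 0.70

# Law of total probability over the review type.
p_bad_phone = (
    p_bad_given_good_review * p_good_review
    + p_bad_given_bad_review * p_bad_review
)
```

Numerically this is (0.1 · 1097 + 0.7 · 172) / 1269, roughly 0.181.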

Q10 – Which of the following are false about bias and variance of overfitted and underfitted models? (multiple options may be correct)

## NPTEL Introduction to machine learning Assignment Week 0 Answers:-

Q1. There are n bins of which the k-th bin contains $k-1$ blue balls and $n-k$ red balls. You pick a bin at random and remove two balls at random without replacement. Find the probability that:

Q2. A medical company touts its new test for a certain genetic disorder. The false negative rate is small: if you have the disorder, the probability that the test returns a positive result is 0.999. The false positive rate is also small: if you do not have the disorder, the probability that the test returns a positive result is only 0.005. Assume that 2% of the population has the disorder. If a person chosen uniformly from the population is tested and the result comes back positive, what is the probability that the person has the disorder?
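The posterior probability follows from Bayes' rule using the three numbers given in the question. A minimal sketch, with illustrative variable names:

```python
p_disorder = 0.02            # prior: 2% of the population has the disorder
p_pos_given_disorder = 0.999
p_pos_given_healthy = 0.005

# Total probability of a positive test result.
p_positive = (
    p_pos_given_disorder * p_disorder
    + p_pos_given_healthy * (1 - p_disorder)
)

# Bayes' rule: P(disorder | positive).
posterior = p_pos_given_disorder * p_disorder / p_positive
```

The result is roughly 0.80, noticeably lower than the 0.999 sensitivity, because the healthy 98% of the population still contributes false positives.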

Q3. In an experiment, n coins are tossed, with each one showing up heads with probability p independently of the others. Each of the coins which shows up heads is then tossed again. What is the probability of observing 5 heads in the second round of tosses, if we toss 15 coins in the first round and p = 0.4?
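A coin contributes a head in the second round only if it shows heads twice in a row, which happens with probability p². The number of second-round heads is therefore Binomial(n, p²). A sketch under that reasoning, with an illustrative function name:

```python
import math


def prob_second_round_heads(k, n, p):
    """P(exactly k heads in round two) when n coins start and heads are re-tossed."""
    q = p * p  # a coin must land heads in both rounds
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


prob = prob_second_round_heads(k=5, n=15, p=0.4)
```

For n = 15 and p = 0.4 this evaluates C(15,5) · 0.16⁵ · 0.84¹⁰, which is a little above 0.055.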

Q4. Consider two random variables X and Y having joint density function $f(x,y)=2e^{-x-y}$, $0<x<y<\infty$. Are X and Y independent? Find the covariance of X and Y.

Q5. An airline knows that 5 percent of the people making reservations on a certain flight will not show up. Consequently, their policy is to sell 52 tickets for a flight that can hold only 50 passengers. What is the probability that there will be a seat available for every passenger who shows up?
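Everyone gets a seat unless 51 or 52 of the 52 ticket holders show up. Assuming passengers show up independently with probability 0.95 (an assumption the question implies but does not state), the number who show up is Binomial(52, 0.95). The names below are illustrative.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


tickets_sold = 52
seats = 50
p_show = 0.95

# Overbooking fails only when more than `seats` passengers show up.
p_overflow = sum(
    binom_pmf(k, tickets_sold, p_show) for k in range(seats + 1, tickets_sold + 1)
)
p_seat_for_all = 1 - p_overflow
```

This works out to approximately 0.74.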

Q6. Let X have mass function $f(x)=\begin{cases}\{x(x+1)\}^{-1} & \text{if } x=1,2,\ldots,\\ 0 & \text{otherwise,}\end{cases}$

Q7. Is the following a distribution function?

Q8. Can the value of a probability density function be greater than one? What about the cumulative distribution function?

Answer:- B – PDF: yes, CDF: no

Q9. You are given a biased coin for which the probability of seeing a head is p = 0.6 and the probability of seeing a tail is q = 0.4. Suppose you toss the coin 10 times. What is the probability of getting a head at most 2 times? Also, what is the probability of getting a head for the first time on your fourth attempt?
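The two parts use different distributions: the count of heads in 10 tosses is binomial, while the index of the first head is geometric. A short sketch with illustrative variable names:

```python
import math

p, q, n = 0.6, 0.4, 10

# Binomial: sum the probabilities of 0, 1 and 2 heads in 10 tosses.
p_at_most_2 = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(3))

# Geometric: three tails followed by the first head on toss four.
p_first_head_4th = q**3 * p
```

The first value is about 0.0123 and the second is 0.0384.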