# NPTEL Introduction to Machine Learning Assignment 10 Answers 2022

Are you looking for the answers to NPTEL Introduction to Machine Learning Assignment 10 (IIT Madras)? This article will help you with the answers to Assignment 10 of the National Programme on Technology Enhanced Learning (NPTEL) course “Introduction to Machine Learning”.

## What is Introduction to Machine Learning?

With the increased availability of data from varied sources, there has been increasing attention paid to data-driven disciplines such as analytics and machine learning. This course introduces some of the basic concepts of machine learning from a mathematically well-motivated perspective. It covers the different learning paradigms and some of the more popular algorithms and architectures used in each of them.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of best 8 assignments out of the total 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either of the two criteria is not met, you will not get the certificate even if the Final score >= 40/100.
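The scoring rule above can be sketched as a few lines of arithmetic. This is only an illustration: the function name `certificate_status` is hypothetical, and it assumes all 12 assignment scores and the exam score are given out of 100.

```python
def certificate_status(assignment_scores, exam_score):
    """Apply the certificate criteria to raw scores (each out of 100).

    Hypothetical helper for illustration; assumes a list of 12 assignment scores.
    """
    # Average of the best 8 assignments, scaled to 25% of the final score.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_component = 0.25 * sum(best_eight) / 8   # out of 25

    # Proctored exam, scaled to 75% of the final score.
    exam_component = 0.75 * exam_score                  # out of 75

    final_score = assignment_component + exam_component

    # Both thresholds must be met independently.
    eligible = assignment_component >= 10 and exam_component >= 30
    return assignment_component, exam_component, final_score, eligible
```

For example, assignment scores of 30 in every week with a perfect exam give a final score of 82.5, yet the candidate is not eligible because the assignment component (7.5/25) falls below 10/25.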

Below you can find the answers for NPTEL Introduction to Machine Learning Assignment 10.

## NPTEL Introduction to Machine Learning Assignment 10 Answers

Q1. The pairwise distances between 6 points are given below. Which of the options shows the hierarchy of clusters created

Q2. For the pairwise distance matrix given in the previous question, which of the following shows the hierarchy of clusters created by the complete link clustering algorithm?

Q3. In BIRCH, using the number of points N, the sum of points SUM, and the sum of squared points SS, we can determine the centroid and radius of the combination of any two clusters A and B. How do you determine the radius of the combined cluster (in terms of N, SUM, and SS of both clusters A and B)?
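As a sketch of the reasoning behind this question: the three statistics are additive, so the combined cluster has N = N_A + N_B, SUM = SUM_A + SUM_B, and SS = SS_A + SS_B. Assuming the usual BIRCH definition of radius as the root-mean-square distance of the points to the centroid, this gives $R = \sqrt{\frac{SS_A + SS_B}{N_A + N_B} - \left\lVert \frac{SUM_A + SUM_B}{N_A + N_B} \right\rVert^2}$. The helper names below (`clustering_feature`, `merge`, `radius`) are hypothetical and use plain Python lists rather than any library's BIRCH implementation.

```python
import math

def clustering_feature(points):
    """Return (N, SUM, SS) for a list of equal-length point tuples."""
    n = len(points)
    dim = len(points[0])
    vec_sum = [sum(p[d] for p in points) for d in range(dim)]
    sq_sum = sum(x * x for p in points for x in p)  # scalar sum of squared norms
    return n, vec_sum, sq_sum

def merge(cf_a, cf_b):
    """Combine two clusters by adding their statistics component-wise."""
    n_a, sum_a, ss_a = cf_a
    n_b, sum_b, ss_b = cf_b
    return n_a + n_b, [x + y for x, y in zip(sum_a, sum_b)], ss_a + ss_b

def radius(cf):
    """RMS distance to the centroid: sqrt(SS/N - ||SUM/N||^2)."""
    n, vec_sum, sq_sum = cf
    centroid_sq_norm = sum((x / n) ** 2 for x in vec_sum)
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.sqrt(max(sq_sum / n - centroid_sq_norm, 0.0))

cf_a = clustering_feature([(0.0, 0.0), (2.0, 0.0)])
cf_b = clustering_feature([(0.0, 2.0), (2.0, 2.0)])
combined_radius = radius(merge(cf_a, cf_b))
```

Here the four points are the corners of a square centred at (1, 1), so each lies at distance √2 from the combined centroid and `combined_radius` comes out as √2 without revisiting the individual points.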

Q4. Statement 1: CURE is robust to outliers.

Statement 2: Because of multiplicative shrinkage, the effect of outliers is dampened.

Q5. Run K-means on the input features of the iris dataset using the following initialization:
KMeans(n_clusters=3, random_state=seed)

Usually, for clustering tasks, we are not given labels, but since we do have labels for our dataset, we can use accuracy to determine how good our clusters are.

Label the prediction class for all the points in a cluster as the majority true label. E.g. {a, a, b} would be labeled as {a, a, a}. What is the accuracy of the resulting labels?
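The majority-label procedure described above can be sketched as follows. The helpers `majority_relabel` and `accuracy` are hypothetical names written in plain Python; `kmeans_iris_accuracy` assumes scikit-learn is installed, and the value of `seed` is not given in the question, so it is left as a parameter. The resulting accuracy can vary with the seed and the scikit-learn version.

```python
from collections import Counter

def majority_relabel(y_true, cluster_ids):
    """Replace each cluster id with the most common true label in that cluster."""
    majority = {}
    for c in set(cluster_ids):
        members = [t for t, k in zip(y_true, cluster_ids) if k == c]
        majority[c] = Counter(members).most_common(1)[0][0]
    return [majority[k] for k in cluster_ids]

def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def kmeans_iris_accuracy(seed):
    """Cluster iris with K-means and score the majority-relabeled clusters."""
    # Imported here so the helpers above work without scikit-learn.
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    clusters = KMeans(n_clusters=3, random_state=seed).fit_predict(X)
    y_pred = majority_relabel(list(y), list(clusters))
    return accuracy(list(y), y_pred)
```

On the example from the question, `majority_relabel(["a", "a", "b"], [0, 0, 0])` gives `["a", "a", "a"]`, which scores an accuracy of 2/3 against the true labels.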

Q6. For the same clusters obtained in the previous question, calculate the rand-index. Formula for rand-index:

$$R = \frac{a + b}{\binom{n}{2}}$$

Where a = number of times a pair of elements occurs in the same cluster in both sequences, b = number of times a pair of elements occurs in different clusters in both sequences, and the denominator is the total number of pairs among the n elements, n(n−1)/2.
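A direct, unoptimized sketch of this definition counts agreeing pairs one by one. The function name `rand_index` is hypothetical; it takes two label sequences of equal length and needs only the standard library.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: (a + b) divided by the number of pairs, n(n-1)/2."""
    same_in_both = 0       # a: pair shares a cluster in both labelings
    different_in_both = 0  # b: pair is split in both labelings
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        total_pairs += 1
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            same_in_both += 1
        elif not same_a and not same_b:
            different_in_both += 1
    return (same_in_both + different_in_both) / total_pairs
```

Because only pair co-membership matters, renaming the clusters does not change the result: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0. Recent scikit-learn releases also provide `sklearn.metrics.rand_score`, which should agree with this sketch.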

Q7. In the rand-index, a can be viewed as true positives (pairs of points belonging to the same cluster) and b as true negatives (pairs of points belonging to different clusters). How, then, are the rand-index and accuracy from the previous two questions related?

Q8. Run BIRCH on the input features of the iris dataset using Birch(n_clusters=3, threshold=1). What is the rand-index obtained?

Q9. Run BIRCH with each of the following values of the threshold parameter: [0.01, 0.02, 0.03, …, 0.99, 1.00], using the same command as given in the previous question. What value of threshold achieves the best rand-index?
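One way to set up the two BIRCH questions is sketched below. The helper names are hypothetical; `birch_rand_index` assumes scikit-learn is installed and that its `rand_score` metric is available (it was added in a relatively recent release). When several thresholds tie for the best score, `max` returns the first one in the grid, which may or may not be the tie-breaking rule the assignment intends.

```python
def threshold_grid():
    """Thresholds 0.01, 0.02, ..., 1.00, rounded to avoid float drift."""
    return [round(0.01 * i, 2) for i in range(1, 101)]

def birch_rand_index(threshold):
    """Rand index of Birch(n_clusters=3, threshold=...) against the iris labels."""
    # Imported here so the grid helpers work without scikit-learn.
    from sklearn.cluster import Birch
    from sklearn.datasets import load_iris
    from sklearn.metrics import rand_score

    X, y = load_iris(return_X_y=True)
    labels = Birch(n_clusters=3, threshold=threshold).fit_predict(X)
    return rand_score(y, labels)

def best_threshold(score_fn, thresholds):
    """Return the threshold with the highest score (first one on ties)."""
    return max(thresholds, key=score_fn)
```

With scikit-learn available, `birch_rand_index(1)` corresponds to Q8 and `best_threshold(birch_rand_index, threshold_grid())` to Q9.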

Q10. Run PCA on the iris dataset input features with n_components=2. Now run DBSCAN using DBSCAN(eps=0.5, min_samples=5) on both the original features and the PCA features. What is the number of outliers/noisy points detected by DBSCAN in each case?

As an extra, you can plot the PCA features on a 2D plot using matplotlib.pyplot.scatter with parameter c=y_pred (where y_pred is the cluster prediction) to visualise the clusters and outliers.
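A possible setup for this question is sketched below, assuming scikit-learn and matplotlib are installed. The function names are hypothetical. The sketch relies on scikit-learn's DBSCAN marking noise points with the label -1; the counts themselves are not stated here because they should be read off from an actual run.

```python
def count_noise(labels):
    """Count points that DBSCAN labeled as noise (label -1)."""
    return sum(1 for label in labels if label == -1)

def dbscan_noise_counts():
    """Run DBSCAN on the original and the 2-component PCA features of iris."""
    # Imported here so count_noise works without scikit-learn.
    from sklearn.cluster import DBSCAN
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)
    X_pca = PCA(n_components=2).fit_transform(X)

    labels_original = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    labels_pca = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)
    return count_noise(labels_original), count_noise(labels_pca), X_pca, labels_pca

def plot_pca_clusters(X_pca, y_pred):
    """Scatter the two PCA components, coloured by cluster prediction."""
    import matplotlib.pyplot as plt

    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_pred)
    plt.xlabel("PCA component 1")
    plt.ylabel("PCA component 2")
    plt.show()
```

Calling `dbscan_noise_counts()` returns the two noise counts along with the PCA features and labels, which can be passed straight to `plot_pca_clusters`.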