# NPTEL Data Science for Engineers Assignment 8 Answers 2022

## What is Data Science for Engineers?

Data Science for Engineers is a fun-filled course. Domain Certification helps learners gain expertise in a specific area or domain. This can be helpful for learners who wish to work in a particular area as part of their job or research, who are preparing for a competitive exam, who want to become job-ready, or who want to specialise in an area of study.

Every domain comprises core courses and elective courses. Once learners complete the requisite courses per the stated criteria, they receive a Domain Certificate showing their scores and their domain of expertise.

## CRITERIA TO GET A CERTIFICATE

- Average assignment score = 25% of the average of the best 6 assignments out of the total 8 assignments given in the course.
- Exam score = 75% of the proctored certification exam score out of 100.

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND THE EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate, even if the final score is >= 40/100.
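The scoring rule above is simple arithmetic, so here is a minimal Python sketch of it. It assumes every assignment and the exam are scored out of 100; the function name `certificate_result` and the sample scores are hypothetical, and NPTEL's own computation may round differently.

```python
def certificate_result(assignment_scores, exam_score):
    """Apply the stated criteria. Assumes every score is out of 100 (hypothetical helper)."""
    best_six = sorted(assignment_scores, reverse=True)[:6]
    avg_assignment = 0.25 * sum(best_six) / len(best_six)  # out of 25
    exam = 0.75 * exam_score                               # out of 75
    final = avg_assignment + exam                          # out of 100
    eligible = avg_assignment >= 10 and exam >= 30         # both criteria must hold
    return avg_assignment, exam, final, eligible

# Hypothetical scores for the 8 assignments, for illustration only.
scores = [100, 90, 80, 70, 60, 50, 40, 30]
print(certificate_result(scores, exam_score=40))  # (18.75, 30.0, 48.75, True)
```

The two separate checks are what matter: with `[0] * 8` as assignment scores and a perfect exam, the final score is 75.0 but `eligible` is still `False`.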

## NPTEL Data Science for Engineers Assignment 8 Answers

Q1. The Euclidean distance between the two data points X(−5,5) and Y(10,10) is _________ (rounded off to three decimal places).
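Q1 is a direct application of the Euclidean distance formula, d(X, Y) = sqrt((x1 - y1)^2 + (x2 - y2)^2). A short Python check follows; the helper name is hypothetical, and Python 3.8+ also provides `math.dist`, which gives the same value.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

X = (-5, 5)
Y = (10, 10)
d = euclidean_distance(X, Y)  # sqrt(15**2 + 5**2) = sqrt(250)
print(round(d, 3))            # 15.811
```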

Q2. kNN is used for both function approximation and classification problems.
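To illustrate the statement in Q2, here is a minimal k-nearest-neighbours sketch in plain Python; the function names and toy data are hypothetical. The same neighbour search serves both tasks: classification takes a majority vote over the k nearest labels, while function approximation (regression) averages the k nearest target values.

```python
import math
from collections import Counter

def knn_neighbors(train_X, query, k):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k):
    """Classification: majority vote among the k nearest labels."""
    idx = knn_neighbors(train_X, query, k)
    return Counter(train_y[i] for i in idx).most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k):
    """Function approximation: mean of the k nearest target values."""
    idx = knn_neighbors(train_X, query, k)
    return sum(train_y[i] for i in idx) / k

# Hypothetical toy data, for illustration only.
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(X, labels, (0.5, 0.5), k=3))  # a
print(knn_regress(X, targets, (5.5, 5.5), k=3))  # 11.0
```

Ties in distance are broken by training order here, because Python's `sorted` is stable; real libraries may break ties differently.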

Consider the dataset “USArrests.csv”.

This data set contains statistics, in arrests per 100,000 residents, for assault, murder, and rape in each of the 50 US states in 1973. It also gives the percentage of the population living in urban areas.

- Set the column “States” as the index of the data frame while reading the data.
- Seed the random number generator with `set.seed(123)`.
- Normalize the data using the `scale` function and build the K-means model with the given conditions:
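The three steps can be sketched in plain Python to show what they do. This is a minimal sketch, not the course's R workflow: `scale_columns` mimics R's `scale()` (subtract each column mean, divide by the sample standard deviation), `kmeans` is a textbook Lloyd's algorithm, and the toy data and `k=2` are hypothetical because the assignment's conditions are not listed here. Python's `random` generator seeded with 123 does not reproduce R's `set.seed(123)`, so starting centers and results will differ from R.

```python
import math
import random

def scale_columns(rows):
    """Z-score every column like R's scale(): subtract the column mean,
    then divide by the sample standard deviation (n - 1 denominator)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def kmeans(points, k, max_iter=100, seed=123):
    """Textbook Lloyd's algorithm. Python's RNG seeded with 123 does NOT
    reproduce R's set.seed(123), so the starting centers differ from R."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k randomly chosen rows
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical toy data (two well-separated groups), NOT the USArrests data.
data = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 8.0], [8.5, 7.5], [7.8, 8.2]]
scaled = scale_columns(data)
labels, centers = kmeans(scaled, k=2)
print(labels)
```

R's `kmeans()` uses the Hartigan-Wong algorithm by default, so this sketch is only a conceptual match; the cluster count and any restarts should follow the conditions the assignment specifies.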

Q3. According to the built model, the within-cluster sum of squares for each cluster is __________ (the order of values in each option could be different).

Q4. According to the built model, the size of each cluster is __________ (the order of values in each option could be different).
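For Q3 and Q4, R's `kmeans()` result exposes these quantities as the `withinss` and `size` components. The plain-Python sketch below shows what they measure for a hypothetical hard assignment of six 2-D points (the data, labels, and `cluster_summary` helper are illustrative, not from the assignment): the size counts the points in each cluster, and the within-cluster sum of squares adds up the squared Euclidean distances from each point to its cluster mean.

```python
def cluster_summary(points, labels, k):
    """Per-cluster size and within-cluster sum of squares around each cluster mean."""
    sizes, withinss = [], []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = [sum(col) / len(members) for col in zip(*members)]
        sizes.append(len(members))
        withinss.append(sum(sum((a - c) ** 2 for a, c in zip(p, center))
                            for p in members))
    return sizes, withinss

# Hypothetical points and cluster labels, for illustration only.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0], [12.0, 10.0], [12.0, 12.0]]
labels = [0, 0, 1, 1, 1, 1]
sizes, withinss = cluster_summary(points, labels, k=2)
print(sizes)     # [2, 4]
print(withinss)  # [2.0, 8.0]
```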

Q5. The between-cluster sum of squares (BCSS) of the built K-means model is _________ (choose the appropriate range).

Q6. The total sum of squares of the built K-means model is _________ (choose the appropriate range).
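Q5 and Q6 rest on the decomposition total SS = total within-cluster SS + between-cluster SS, which holds when each center is the mean of its cluster (as at k-means convergence). R reports these as `totss`, `tot.withinss`, and `betweenss`. Below is a plain-Python sketch on hypothetical toy data; the `sum_of_squares` helper is illustrative. The between-cluster SS is the size-weighted squared distance from each cluster mean to the overall mean.

```python
import math

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def sum_of_squares(points, labels, k):
    """Return (totss, tot_withinss, betweenss), using cluster means as centers."""
    n, dim = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]  # overall mean
    totss = sum(squared_distance(p, grand) for p in points)
    tot_withinss = betweenss = 0.0
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        tot_withinss += sum(squared_distance(p, center) for p in members)
        betweenss += len(members) * squared_distance(center, grand)
    return totss, tot_withinss, betweenss

# Hypothetical points and cluster labels, for illustration only.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0], [12.0, 10.0], [12.0, 12.0]]
labels = [0, 0, 1, 1, 1, 1]
totss, tot_w, betw = sum_of_squares(points, labels, k=2)
print(round(totss, 3), tot_w, round(betw, 3))  # 304.667 10.0 294.667
print(math.isclose(totss, tot_w + betw))       # True
```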

Q7. A K-means clustering model becomes better as

Q8. Larger K values in K-means clustering __________
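One fact that bears on Q7 and Q8: the lowest achievable total within-cluster sum of squares never increases as K grows, and it reaches zero once K equals the number of points, because every point can then be its own center. A smaller within-cluster SS at a larger K therefore does not by itself mean a better model. The plain-Python sketch below evaluates three hand-picked assignments of the same hypothetical points; the helper and data are illustrative.

```python
def tot_withinss(points, labels):
    """Total within-cluster sum of squares for a given hard assignment."""
    total = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        center = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((a - c) ** 2 for a, c in zip(p, center)) for p in members)
    return total

# Hypothetical points, for illustration only.
points = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0], [12.0, 10.0], [12.0, 12.0]]
k1 = tot_withinss(points, [0] * 6)             # K = 1: equals the total sum of squares
k2 = tot_withinss(points, [0, 0, 1, 1, 1, 1])  # K = 2: the natural grouping
k6 = tot_withinss(points, list(range(6)))      # K = n: every point is its own center
print(round(k1, 3), k2, k6)                    # 304.667 10.0 0.0
```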