The **NPTEL Data Science for Engineers** training program teaches engineers the fundamentals of data analytics and how to apply them. This certification course is useful for anyone who wants to learn more about big data without needing a sponsorship or navigating red tape.

**DATA SCIENCE FOR ENGINEERS** is a MOOC offered by IIT Madras on the NPTEL platform. The course helps students develop analytical thinking through data science techniques. It was developed by Prof. Ragunathan Rengasamy.

**Who Can Join:** Any interested learner

**Requirements/Prerequisites:** Nil

**Industry Support:** All industry

**CRITERIA TO GET A CERTIFICATE**

Average assignment score = 25% of the average of the best 8 assignments out of the total 12 assignments given in the course.

Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

Students will be eligible for a CERTIFICATE ONLY IF AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75. If either criterion is not met, the student will not get the certificate, even if the final score is >= 40/100.
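The scoring rules above can be checked with a short sketch. It assumes each weekly assignment is scored out of 100 (the page does not state the scale), and the learner's marks in the example are hypothetical.

```python
def certificate_result(assignment_scores, exam_score):
    """Apply the scoring rules described above.

    assignment_scores: the 12 weekly assignment scores, each assumed to be out of 100.
    exam_score: proctored certification exam score out of 100.
    """
    # Average of the best 8 assignments, scaled to 25%.
    best_eight = sorted(assignment_scores, reverse=True)[:8]
    assignment_part = 0.25 * sum(best_eight) / len(best_eight)
    # Exam score scaled to 75%.
    exam_part = 0.75 * exam_score
    final = assignment_part + exam_part
    # Both thresholds must hold; together they already imply final >= 40.
    eligible = assignment_part >= 10 and exam_part >= 30
    return assignment_part, exam_part, final, eligible


# Hypothetical learner: full marks on 8 assignments, 38/100 in the exam.
print(certificate_result([100] * 8 + [0] * 4, 38))  # (25.0, 28.5, 53.5, False)
```

This illustrates the last rule: a final score of 53.5 is above 40, but the exam component of 28.5 is below 30, so no certificate is issued.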

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 8 ANSWER:-


**Q1.** What is the accuracy (in %) of the model over **test_data**?

**(Choose the appropriate range)**

**Answer:-** **C**

**Q2.** The total number of misclassified samples obtained from the predictions of the kNN model built in Q1 is: –

**Answer:-** **A**

**Q3.** The sensitivity of the kNN model built in Q1 is **(rounded off to two decimal places)**: –

**Answer:-** **B**
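Accuracy, the misclassification count and sensitivity in Q1 to Q3 all come from the same confusion-matrix counts. The sketch below is a generic illustration with hypothetical labels, not the assignment's `test_data`; which class counts as "positive" changes the sensitivity, so it should match the convention used in the assignment.

```python
def classification_metrics(actual, predicted, positive):
    """Accuracy (%), misclassification count and sensitivity from paired labels."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1  # true positive
            else:
                fn += 1  # false negative
        else:
            if p == positive:
                fp += 1  # false positive
            else:
                tn += 1  # true negative
    total = tp + fn + fp + tn
    return {
        "accuracy_pct": 100 * (tp + tn) / total,
        "misclassified": fp + fn,
        # Sensitivity (true positive rate) = TP / (TP + FN)
        "sensitivity": tp / (tp + fn),
    }


# Hypothetical labels for illustration only.
actual = ["yes", "yes", "yes", "no", "no"]
predicted = ["yes", "no", "yes", "no", "yes"]
m = classification_metrics(actual, predicted, positive="yes")
print(m["accuracy_pct"], m["misclassified"], round(m["sensitivity"], 2))  # 60.0 2 0.67
```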

**Q4.** Which of the following products has the highest annual spending unit?

**Answer:-** **A**

**Q5.** Which of the following interpretations are true with respect to the distribution of the variable **Grocery**?

**Answer:-** **C**

**Q6.** What is the percentage of customers buying the product from each region?

**Answer:-** **A**

**Q7.** The Within Cluster Sum-of-Squares (WCSS) values for each cluster (in no specific order) are: –

**Answer:-** **A**

**Q8.** The size of each cluster (in no specific order) is: –

**Answer:-** **B**

**Q9.** What is the Between Cluster Sum-of-Squares (BCSS) value of the K-means model? **(Choose the appropriate range)**

**Answer:-** **C**

**Q10.** What is the Total Sum-of-Squares value of the k-means model? **(Choose the appropriate range)**

**Answer:-** **A**
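Q7 to Q10 ask for the quantities that R's `kmeans()` reports as `withinss`, `size`, `betweenss` and `totss`. The sketch below computes them by hand for a given clustering using squared Euclidean distance (the usual K-means choice) and confirms that TSS = sum of WCSS + BCSS when each centroid is its cluster mean. The four points are hypothetical, not the assignment's customer data.

```python
def cluster_sums_of_squares(points, labels):
    """Per-cluster WCSS, cluster sizes, BCSS and TSS for a given clustering."""
    dim = len(points[0])

    def mean(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sq_dist(a, b):
        # Squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    grand_mean = mean(points)
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)

    wcss, sizes, bcss = {}, {}, 0.0
    for k, pts in clusters.items():
        centroid = mean(pts)
        wcss[k] = sum(sq_dist(p, centroid) for p in pts)
        sizes[k] = len(pts)
        # Each cluster contributes size * (distance of its centroid to the grand mean)^2
        bcss += len(pts) * sq_dist(centroid, grand_mean)
    tss = sum(sq_dist(p, grand_mean) for p in points)
    return wcss, sizes, bcss, tss


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
wcss, sizes, bcss, tss = cluster_sums_of_squares(points, labels)
print(wcss, sizes, bcss, tss)  # {0: 2.0, 1: 2.0} {0: 2, 1: 2} 100.0 104.0
```

Because TSS is fixed by the data, a clustering with a smaller total WCSS necessarily has a larger BCSS.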

**Q11.** An elbow plot can be used to decide the optimal k value in both kNN and K-means clustering problems.

**Answer:-** **A**

**Q12.** The most commonly used distance metric for calculating the distance between the centroid of each cluster and the data points in the K-means algorithm is

**Answer:-** **C**

**Q13.** A k-Means Clustering model becomes better as

**Answer:-** **C**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 7 ANSWER:-

**Q1.** The total number of missing values in the data frame is.

**Answer:-** **b**

**Q2.** The Pearson’s correlation coefficient between **mpg** & **acceleration** is (rounded off to two decimal places): –

**Answer:-** **a**
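Pearson's correlation coefficient (asked for here and again in Week 6 Q1) is the covariance of the two variables divided by the product of their standard deviations; R's `cor(x, y)` uses this method by default. A minimal sketch, with hypothetical values rather than the assignment's **mpg** and **acceleration** columns:

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical values for illustration only.
print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 4]), 2))  # 0.72
```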

**Q3.** What is the value of adjusted R-Squared for **“lr_model”**?

**Answer:-** **c**
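Adjusted R-squared (reported as "Adjusted R-squared" by R's `summary()` of an `lm` fit, and asked for again in Week 6 Q6) penalises R-squared for the number of predictors. The sketch uses the standard formula with n observations and p predictors excluding the intercept; the numbers in the example are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """n = number of observations, p = number of predictors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical model: R^2 = 0.8 from 11 observations and 2 predictors.
print(round(adjusted_r_squared(0.8, 11, 2), 4))  # 0.75
```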

**Q4.** The coefficient of the variable ‘displacement’ is:

**Answer:-** **c**

**Q5.** Which of the variables is not significant in “lr_model”?

**Answer:-** **a**

**Q6.** The accuracy of the model is (rounded off to two decimal places): –

**Answer:-** **c**

**Q7.** The sensitivity pertaining to the given confusion matrix is (rounded off to two decimal places)

**Answer:-** **a**

**Q8.** Which command is used to build a logistic regression model in R?

**Answer:-** **a**

**Q9.** Logistic regression tends to overfit when a large number of independent variables are present.

**Answer:-** **a**

**Q10.** An ROC curve is plotted between.

**Answer:-** **b**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 6 ANSWER:-

**Q1.** The Pearson correlation coefficient for the given data is ___

**Answer:-** **c**

**Q2.** Data was collected from a laptop manufacturer for a study to understand the impact of a laptop's battery life on customer satisfaction (in terms of ratings). The data consisted of **ratings (y)** provided for each laptop based on its **battery life (x)** (in years). The following regression model was obtained:

**Answer:-** **c**

**Q3.** Homoscedasticity in regression analysis is the condition in which the

**Answer:-** **a**

**Q4.** For the best linear regression model, $R^2$

**Answer:-** **c**

**Q5.** Based on the image given above, a model was built with an objective to predict the salary of an individual based on the years of experience. From the given output, what does the p-value indicate with respect to hypothesis testing?

**Answer:-** **a,c**

**Q6.** The adjusted $R^2$ for the linear model is ___

**Answer:-** **d**

**Q7.** The third quartile residual value for the linear model built is ___

**Answer:-** **a**

**Q8.** The t value corresponding to the coefficient of weight is ___

**Answer:-** **a**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 5 ANSWER:-

**Q1.** The problem $\min f(x,y) = 3x + y$ subject to the constraint $x^2 + y^2 < 10$ is an example of

**Answer:-** **c**

**Q2.** We intend to find the maximum of $f(x,y) = 3x + y$ subject to the constraint $x^2 + y^2 = 10$. The Lagrangian function is:

**Answer:-** **c**

**Q3.** The values of the stationary points $x^*$, $y^*$ and $\lambda^*$ for the objective function $f(x,y) = 3x + y$ subject to the constraint $x^2 + y^2 - 10 = 0$ are:

**Answer:-** **a**

**Q4.** The values of the stationary points $x^*$, $y^*$ and $\lambda^*$ for the objective function $f(x,y) = 5x - 3y$ subject to the constraint $x^2 + y^2 = 136$ are:

**Answer:-** **b**
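Q2 to Q4 share one structure: a linear objective $f(x,y) = ax + by$ on the circle $x^2 + y^2 = r^2$. Writing the Lagrangian as $L = f - \lambda(x^2 + y^2 - r^2)$ gives $a = 2\lambda x$ and $b = 2\lambda y$, so $\lambda = \pm\sqrt{a^2+b^2}/(2r)$. The sketch below assumes that sign convention; with $L = f + \lambda g$ the sign of $\lambda^*$ flips, so compare against whichever convention the options use.

```python
import math


def linear_on_circle_stationary_points(a, b, r_squared):
    """Stationary points of f(x, y) = a*x + b*y subject to x**2 + y**2 = r_squared.

    Assumes the Lagrangian L = f - lam * (x**2 + y**2 - r_squared), so that
    a = 2*lam*x and b = 2*lam*y at a stationary point.
    """
    points = []
    for sign in (1, -1):
        lam = sign * math.sqrt(a * a + b * b) / (2 * math.sqrt(r_squared))
        x = a / (2 * lam)
        y = b / (2 * lam)
        points.append((x, y, lam, a * x + b * y))  # (x*, y*, lambda*, f(x*, y*))
    return points


print([tuple(round(v, 4) for v in p) for p in linear_on_circle_stationary_points(3, 1, 10)])
# [(3.0, 1.0, 0.5, 10.0), (-3.0, -1.0, -0.5, -10.0)]
```

For Q4 the same call with `(5, -3, 136)` gives $(10, -6)$ with $\lambda^* = 1/4$ and $(-10, 6)$ with $\lambda^* = -1/4$ under this convention.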

**Q5.** The Hessian matrix for the function $f(x,y) = -5x^2 + 4xy + 3y^2 + 2x - y$ is:

**Answer:-** **c**

**Q6.** The eigenvalues of the Hessian matrix obtained in Q5 are: –

**Answer:-** **c**
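For a quadratic $ax^2 + bxy + cy^2$ plus linear terms, the Hessian is the constant matrix $\begin{bmatrix} 2a & b \\ b & 2c \end{bmatrix}$, because the linear terms vanish after two derivatives. The sketch below builds it for Q5 and gets the eigenvalues of the symmetric 2×2 matrix from its characteristic equation; it is a hand-rolled illustration rather than a general eigen-solver.

```python
import math


def hessian_of_quadratic(a, b, c):
    """Hessian of f(x, y) = a*x**2 + b*x*y + c*y**2 + (linear terms)."""
    return [[2 * a, b], [b, 2 * c]]


def eigenvalues_symmetric_2x2(h):
    # Roots of lambda^2 - trace*lambda + det = 0; always real for a symmetric matrix.
    half_trace = (h[0][0] + h[1][1]) / 2
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    root = math.sqrt(half_trace ** 2 - det)
    return half_trace + root, half_trace - root


H = hessian_of_quadratic(-5, 4, 3)  # f(x, y) = -5x^2 + 4xy + 3y^2 + 2x - y
print(H)  # [[-10, 4], [4, 6]]
print(eigenvalues_symmetric_2x2(H))  # -2 + sqrt(80) and -2 - sqrt(80)
```

One eigenvalue is positive and the other negative, so this Hessian is indefinite and the stationary point of $f$ is a saddle point.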

**Q7.** A predictive modeling problem where the class label is predicted for the input data is a type of

**Answer:-** **b**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 4 ANSWER:-

**Q1.** If $f(x) = 3x^4 - 2x^3 - 3x^2 + 6$, then the first-order necessary condition for either a maximum or a minimum of $f(x)$ is

**Answer:-** **b**

**Q2.** For the function $f(x) = 3x^4 - 2x^3 - 3x^2 + 6$, which of the following point(s) is/are stationary point(s) of $f(x)$?

**Answer:-** **a,b,c**

**Q3.** For the function $f(x) = 3x^4 - 2x^3 - 3x^2 + 6$, the stationary point(s) which maximize(s) the value of $f(x)$ is

**Answer:-** **b**

**Q4.** For the function $f(x) = 3x^4 - 2x^3 - 3x^2 + 6$, the stationary point(s) which minimize(s) the value of $f(x)$ is

**Answer:-** **a,c**
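Q1 to Q4 follow from $f'(x) = 12x^3 - 6x^2 - 6x = 6x(2x+1)(x-1)$, whose roots are $x = -1/2, 0, 1$, and the second-derivative test with $f''(x) = 36x^2 - 12x - 6$. The sketch uses exact fractions so the classification is not affected by rounding.

```python
from fractions import Fraction


def f(x):
    return 3 * x**4 - 2 * x**3 - 3 * x**2 + 6


def f_prime(x):
    # f'(x) = 12x^3 - 6x^2 - 6x = 6x(2x + 1)(x - 1)
    return 12 * x**3 - 6 * x**2 - 6 * x


def f_double_prime(x):
    return 36 * x**2 - 12 * x - 6


def classify_stationary_points():
    results = []
    for x in (Fraction(-1, 2), Fraction(0), Fraction(1)):  # roots of f'(x) = 0
        assert f_prime(x) == 0
        curvature = f_double_prime(x)
        if curvature > 0:
            kind = "local minimum"
        elif curvature < 0:
            kind = "local maximum"
        else:
            kind = "inconclusive"
        results.append((x, kind, f(x)))
    return results


for x, kind, value in classify_stationary_points():
    print(x, kind, value)
# -1/2 local minimum 91/16
# 0 local maximum 6
# 1 local minimum 4
```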

**Q5.** If the objective function, inequality constraints, equality constraints are all linear functions, then the type of optimization problem is:

**Answer:-** **c**

**Q6.** For any two points $x_1$ and $x_2$ in the domain and any $0 < \lambda < 1$, if $f(x)$ is a convex function, then:

**Answer:-** **a**
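For reference, the standard textbook definition of a convex function, which is the property Q6 tests, is:

$$
f\big(\lambda x_1 + (1-\lambda)\,x_2\big) \;\le\; \lambda f(x_1) + (1-\lambda)\, f(x_2), \qquad 0 < \lambda < 1 .
$$

Geometrically, the chord joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$ never lies below the graph of $f$.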

**Q7.** Consider an objective function $f(x)$. If $x$ is the decision variable and $f$ is the function to be minimized, then the type of optimization problem is

**Answer:-** **b**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 3 ANSWER:-

**Q1.** The sum of the deviations about the mean is

**Answer:-** **c**
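The sum of the deviations about the arithmetic mean is always zero, since $\sum_i (x_i - \bar{x}) = \sum_i x_i - n\bar{x} = 0$. A quick check with a hypothetical sample, using exact fractions so no floating-point residue appears:

```python
from fractions import Fraction


def deviations_about_mean(data):
    mean = Fraction(sum(data), len(data))  # exact mean, no rounding error
    return [x - mean for x in data]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
devs = deviations_about_mean(data)
print([str(d) for d in devs])  # ['-3', '-1', '-1', '-1', '0', '0', '2', '4']
print(sum(devs))  # 0
```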

**Q2.** The mode of the normal distribution is

**Answer:-** **d**

**Q3.** For the positively skewed distribution the extreme values will lie in _____

**Answer:-** **a**

**Q4.** The domain of the t distribution is

**Answer:-** **b**

**Q5.** The statistical power of a test is denoted by

**Answer:-** **c**

**Q6.** If the type I error decreases,

**Answer:-** **b**

**Q7.** The average number of car drivers killed after the law was in effect is ___

**Answer:-** **c**

**Q8.** How many front-seat passengers were injured or killed in the year 1984?

**Answer:-** **b**

**Q9.** Calculate the variance for the variables “front” and “rear” and choose the correct option.

**Answer:-** **c**

**Q10.** Maximum kms driven by the driver is ___?

**Answer:-** **a**

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 2 ANSWER:-

Consider a data matrix ‘M’ containing information on 150 cricket players: each player’s jersey number, number of matches played, total runs, average, highest score, number of 100s and number of 50s.

**Q1.** What would be the size of the matrix ‘**M**’?

**Answer:-** **b**

**Q2.** If the rank of the matrix ‘**M**’ is 3, what would be the nullity of ‘**M**’?

**Answer:-** **c**
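If the 150 players are the rows and the seven listed attributes are the columns (an assumption about orientation), ‘M’ is a 150 × 7 matrix, and the rank-nullity theorem gives nullity = number of columns − rank = 7 − 3 = 4. The sketch below computes rank by Gauss-Jordan elimination with exact fractions and checks the theorem on a small hypothetical matrix rather than the real player data.

```python
from fractions import Fraction


def matrix_rank(matrix):
    """Rank via Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((i for i in range(rank, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for i in range(n_rows):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


def nullity(matrix):
    # Rank-nullity theorem: rank + nullity = number of columns
    return len(matrix[0]) - matrix_rank(matrix)


# Hypothetical 4 x 7 matrix whose 4th row is the sum of the first two.
e = [[1 if j == i else 0 for j in range(7)] for i in range(3)]
example = e + [[a + b for a, b in zip(e[0], e[1])]]
print(matrix_rank(example), nullity(example))  # 3 4
```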

**Q3.** The rank of the given matrix $A$ is

**Answer:-** **4**

**Q4.** The eigenvalues of the given matrix D are

**Answer:-** **d**

**Q5.** The eigenvectors of the given matrix D are

**Answer:-** **b**

**Q6.** The product of the roots of the characteristic equation of a square matrix is equal to

**Answer:-** **a**

**Q7.** Which of the following vector(s) is / are orthogonal?

**Answer:-**
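Two vectors are orthogonal when their dot product is zero. The vectors from the question are not reproduced on this page, so the sketch below checks hypothetical ones.

```python
def dot(u, v):
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def are_orthogonal(u, v, tol=1e-12):
    # Orthogonal when the dot product is zero (within a small tolerance for floats).
    return abs(dot(u, v)) <= tol


print(are_orthogonal((1, 2, -1), (1, 0, 1)))  # True
print(are_orthogonal((1, 1), (1, 0)))  # False
```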

**Q8.** If ???? and ???? are any two square matrices of SAME dimensions such that ???????? = ???? and if ???? is non-singular, then

**Answer:-** **b**

**Q9.** The point is in the ______ half-space of the hyperplane $x_1 - 9x_2 + 3x_3 + 2x_4 = 8$

**Answer:-** **a**
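A point lies in one half-space or the other depending on the sign of $x_1 - 9x_2 + 3x_3 + 2x_4 - 8$ evaluated at that point. The sketch assumes the common convention that a positive value means the "positive" half-space; the point from the question is not reproduced here, so the points below are hypothetical.

```python
def half_space_side(point, coeffs, rhs):
    """Classify a point against the hyperplane coeffs . x = rhs by the sign of coeffs . x - rhs."""
    value = sum(a * x for a, x in zip(coeffs, point)) - rhs
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "on the hyperplane"


COEFFS = (1, -9, 3, 2)  # x1 - 9*x2 + 3*x3 + 2*x4 = 8
RHS = 8

# Hypothetical points for illustration only.
for p in [(10, 0, 0, 0), (0, 1, 0, 0), (8, 0, 0, 0)]:
    print(p, half_space_side(p, COEFFS, RHS))
```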

**Q10.** The trace of a matrix A can be found by

**Answer:-** **a,b**
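The trace of a matrix is the sum of its diagonal entries, and it also equals the sum of its eigenvalues; likewise, the product of the eigenvalues (the roots of the characteristic equation, as in Q6) equals the determinant. The sketch checks both identities on a hypothetical 2×2 matrix, taking the eigenvalues from $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$.

```python
import cmath


def trace(a):
    # Sum of the diagonal entries
    return sum(a[i][i] for i in range(len(a)))


def det_2x2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def eigenvalues_2x2(a):
    # Roots of lambda^2 - trace(a)*lambda + det(a) = 0;
    # cmath keeps this valid when the roots are complex.
    t, d = trace(a), det_2x2(a)
    disc = cmath.sqrt(t * t - 4 * d)
    return (t + disc) / 2, (t - disc) / 2


A = [[4, 1], [2, 3]]  # hypothetical example matrix
l1, l2 = eigenvalues_2x2(A)
print(trace(A), l1 + l2)  # 7 (7+0j)
print(det_2x2(A), l1 * l2)  # 10 (10+0j)
```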

## DATA SCIENCE FOR ENGINEERS ASSIGNMENT WEEK 1 ANSWER:-

**Q1.** Choose the variable name that is invalid in R.

**Answer:-** **C –** `1data = 10`

**Q2.** The command to access help in R Studio is—

**Answer:-** **B –** `help(topic)`

**Q3.** In the R code given below, the value of “i” at which the loop breaks is_____

**Answer:-** **C – 10**

**Q4.** The library that supports the `right_join()` function in R is______

**Answer:-** **A – dplyr**

**Q5.** The value of sum and month when i = 9 is____

**Answer:-** **C – “Oct” “63”**

**Q6.** Table 1 provides the scores of the students in three subjects. Create a data frame called students_scores out of Table 1. Which of the following options gives Table 2 as an output?

**Answer:-** **A –** `melt(students_scores, id.vars = c("Name"), measure.vars = c("Mathematics", "English", "Science"))`

**Q7.** Which command adds a new row to the data frame “athletics” with the following values for each vector?

**rank** = 4, **competitor** = "Asafa", **mark** = 9.72

**Answer:-** **C –** `athletics = rbind(athletics, data.frame(rank = 4, competitor = "Asafa", mark = 9.72))`

**Q8.** The command to add a new column to the data frame “**athletics**” with vector “**nationality**” taking the values "JAM", "USA", "JAM", "JAM" is:

**Answer:-** **B –** `athletics = cbind(athletics, data.frame(nationality = c("JAM", "USA", "JAM", "JAM")))`

**Q9.** The correct way to extract all elements for which “**mark**” is less than 9.69 using the “**subset**” command is_____

**Answer:-** **A –** `subset(athletics, athletics$mark < 9.69)`

**Q10.** Which of the following defined functions will return the output as 9.37?

**Answer:-** **D – None of the above**
