## What is Python for Data Science?

Python for Data Science is a fun-filled course. Domain Certification helps learners gain expertise in a specific area or domain. This is useful for learners who want to work in a particular area as part of their job or research, who are preparing for a competitive exam, or who want to become job-ready or specialise in an area of study.

Every domain comprises core courses and elective courses. Once a learner completes the requisite courses per the stated criteria, they receive a Domain Certificate showing their scores and domain of expertise.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of best 8 assignments out of the total 12 assignments given in the course.

Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

**YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND THE EXAM SCORE >= 30/75. If either of the two criteria is not met, you will not get the certificate even if the Final score >= 40/100.**
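The scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `final_score` and the sample marks are hypothetical, and all inputs are assumed to be marks out of 100.

```python
def final_score(assignment_scores, exam_score, best_n=8):
    """Return (assignment_component, exam_component, total, eligible).

    assignment_scores: marks out of 100 for each assignment (hypothetical input).
    exam_score: proctored exam marks out of 100.
    """
    # Keep only the best `best_n` assignments (8 of 12 in this course).
    best = sorted(assignment_scores, reverse=True)[:best_n]
    assignment_component = 0.25 * sum(best) / len(best)  # out of 25
    exam_component = 0.75 * exam_score                    # out of 75
    total = assignment_component + exam_component         # out of 100
    # Both thresholds must be met; the total alone is not enough.
    eligible = assignment_component >= 10 and exam_component >= 30
    return assignment_component, exam_component, total, eligible


# Made-up marks for illustration only.
marks = [80, 90, 70, 60, 100, 50, 85, 95, 40, 75, 65, 55]
print(final_score(marks, 60))       # (20.625, 45.0, 65.625, True)

# Perfect assignments but a weak exam: total is above 40, yet not eligible.
print(final_score([100] * 12, 30))  # (25.0, 22.5, 47.5, False)
```

The second call shows why the eligibility rule matters: the final score is 47.5/100, but the exam component of 22.5/75 falls below the 30/75 threshold.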

## NPTEL Python for Data Science Assignment 4 Answers

**Q1.** How many unique values are present in the Sbal feature; also, what is the most frequent value within Sbal?

**Answer:-** **c**
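For readers who want to see how a question like Q1 could be approached, here is a minimal sketch using only the standard library. The values in `sbal` are made up and merely stand in for the Sbal column; they are not taken from the assignment dataset.

```python
from collections import Counter

# Hypothetical stand-in for the Sbal column (not the real assignment data).
sbal = ["little", "moderate", "little", "none", "rich", "little", "none"]

counts = Counter(sbal)
n_unique = len(counts)                               # number of distinct values
most_frequent, frequency = counts.most_common(1)[0]  # modal value and its count

print(n_unique, most_frequent, frequency)  # 4 little 3
```

With the data loaded into a pandas DataFrame `df` (an assumed name), `df["Sbal"].nunique()` and `df["Sbal"].value_counts()` give the same information.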

**Q2.** Find the average age of those customers who have a credit history [Chist] wherein the dues are not paid earlier.

**Answer:-** **b**
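Q2 boils down to filtering rows on one column and averaging another. The sketch below assumes a tiny hand-made table; the column names `Age` and `Chist` follow the question, but the category labels and ages are hypothetical and may not match the encoding used in the real dataset.

```python
from statistics import mean

# Hypothetical rows; the real dataset's Chist labels may differ.
customers = [
    {"Age": 30, "Chist": "dues not paid earlier"},
    {"Age": 55, "Chist": "all settled"},
    {"Age": 42, "Chist": "dues not paid earlier"},
    {"Age": 36, "Chist": "dues not paid earlier"},
    {"Age": 28, "Chist": "all settled"},
]

target = "dues not paid earlier"  # assumed label for the category of interest
ages = [row["Age"] for row in customers if row["Chist"] == target]
average_age = mean(ages)

print(average_age)  # 36
```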

**Q3.** A Logistic Regression model is built in which none of the features used are standardized. The train to test proportion is 75:25 and the random state is set to 1. The accuracy of the model is ________.

**Answer:-** **c**

**Q4.** Import StandardScaler() from the sklearn.preprocessing package to standardize the features. Use the same train-test proportion and set the random state to 1. After standardizing the features and refitting the logistic regression model, by what percentage has the number of misclassified samples changed?

**Answer:-** **c**
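The "percentage change in misclassified samples" asked for in Q4 is plain arithmetic once both sets of predictions are available. A minimal sketch follows; the label and prediction lists are invented for illustration, and the helper names `misclassified` and `pct_change` are hypothetical.

```python
def misclassified(y_true, y_pred):
    """Count positions where the prediction differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred))


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Made-up labels and predictions, not results from the assignment.
y_true      = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
pred_raw    = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]  # model on unscaled features
pred_scaled = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]  # model on standardized features

before = misclassified(y_true, pred_raw)
after = misclassified(y_true, pred_scaled)
print(before, after, pct_change(before, after))  # 4 3 -25.0
```

A negative value means fewer samples were misclassified after standardization.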

**Q5.** When KNN classification is applied on the same standardized data at the optimal value for k nearest neighbours, the accuracy achieved is ______.

**Answer:-** **a**

**Q6.** A multiple linear regression model is built on the Global Happiness Index dataset “GHI_Report.csv”. What is the rmse of the baseline model?

**Answer:-** **c**
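As a rough illustration of Q6, the sketch below assumes that "baseline model" means predicting the mean of the training target for every test observation, which is a common convention but is not spelled out here. The target values are made up.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


# Hypothetical target values, not taken from GHI_Report.csv.
y_train = [5.0, 6.0, 7.0]
y_test = [4.0, 8.0]

# Assumed baseline: always predict the training-set mean.
baseline_prediction = mean(y_train)
baseline_rmse = rmse(y_test, [baseline_prediction] * len(y_test))

print(baseline_prediction, baseline_rmse)  # 6.0 2.0
```

Any fitted regression model should achieve a lower RMSE on the test data than this baseline to be considered useful.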

**Q7.** From the multiple linear regression model built on the GHI index, we get an R-squared value of _______ on the test data subset.

**Answer:-** **d**
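The R-squared value in Q7 compares the model's squared error with that of always predicting the mean. Here is a minimal sketch of the formula, R² = 1 − SS_res / SS_tot, using invented numbers rather than the GHI data; the function name `r_squared` is hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot


# Made-up test targets and predictions.
y_test = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

print(round(r_squared(y_test, y_pred), 3))  # 0.975
```

In scikit-learn, `r2_score` from `sklearn.metrics` computes the same quantity from true and predicted values.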

**Q8.** Which of the following statements about Linear Regression is/are true?

**Answer:-** **a, b, c**

**Q9.** Which of the following statements is inaccurate about Logistic Regression?

**Answer:-** **c**

**Q10.** In a KNN model, by which means do we handle categorical variables?

**Answer:-** **b**

