# NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT 2021

The NPTEL Python for Data Science course aims to equip participants to use the Python programming language to solve data science problems. Python is widely used in big data analytics and machine learning, two fields that now go hand in hand. We encourage you to master this skill now, while it is in demand, as we expect it to stay in demand well into the future.

PYTHON FOR DATA SCIENCE is a MOOC offered by IIT Madras on the NPTEL platform. The course helps students learn the Python programming language as well as data science, two of the most in-demand and emerging technologies in the world. The course was developed by Prof. Rengaswamy, a professor of Chemical Engineering.

1. INTENDED AUDIENCE: Final Year Undergraduates
2. REQUIREMENTS/PREREQUISITES: Knowledge of basic data science algorithms
3. INDUSTRY SUPPORT: All industry/companies/organizations will recognize and value this course and recommend this for their employees and trainee programs.

CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 3 assignments out of the total 4 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100
Final score = Average assignment score + Exam score

Students are eligible for a CERTIFICATE ONLY IF the AVERAGE ASSIGNMENT SCORE >= 10/25 AND the EXAM SCORE >= 30/75. If either of the two criteria is not met, the student will not get the certificate even if the Final score >= 40/100.
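The scoring rule above can be sketched in a few lines of Python. The function name `certificate_result` and the input format (each score given out of 100) are assumptions made for illustration; they are not part of the NPTEL material.

```python
def certificate_result(assignment_scores, exam_score):
    """Return (final_score, eligible) under the criteria stated above.

    assignment_scores: the 4 assignment scores, each out of 100 (assumed format).
    exam_score: the proctored exam score out of 100.
    """
    # Keep only the best 3 of the 4 assignments
    best_three = sorted(assignment_scores, reverse=True)[:3]
    avg_assignment = 0.25 * sum(best_three) / len(best_three)  # out of 25
    exam_component = 0.75 * exam_score                          # out of 75
    final_score = avg_assignment + exam_component
    # Both thresholds must be met; the final score alone is not enough
    eligible = avg_assignment >= 10 and exam_component >= 30
    return final_score, eligible


# Made-up scores, for illustration only
print(certificate_result([80, 60, 100, 40], 50))
print(certificate_result([100, 100, 100, 100], 30))
```

The second call shows the case described above: the final score is above 40, but the exam component is below 30/75, so the student is not eligible.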

## NPTEL PYTHON FOR DATA SCIENCE ONLINE PROCTORED EXAM EVENING SESSION 2021:-

Q1. How many categorical columns are there in the dataframe?

Q2. How many rows in the dataframe contain missing values? If there are null values, then drop the rows with null values.

Q3. What percentage of clients with secondary education have not subscribed to a deposit?

Q4. Using Seaborn’s countplot API, plot a grouped bar plot of marital status and deposit (as hue). Identify the category that has more clients who have subscribed to a deposit than clients who have not.

Q5. How many clients who have not subscribed to a deposit have a housing or personal loan?

Q6. What % of clients have subscribed to a term deposit as an outcome of the successful marketing campaign? (approx.)

Q7. Visualize the age and the average yearly balance of clients using a scatter plot from Seaborn API. Select the correct statement from the options below.

Q8. What is the accuracy of the model (Choose the appropriate range)?

Q9. How many samples were misclassified by the model (Choose the appropriate range)?

Q10. What is the sensitivity of the model (Choose the appropriate range)?

Q11. What is the specificity of the model (Choose the appropriate range)?

Q12. What is the prevalence of the model (Choose the appropriate range)?

Q13. What is the sensitivity of the model (Choose the appropriate range)?

Q14. What is the specificity of the model (Choose the appropriate range)?

Q15. What is the prevalence of the model (Choose the appropriate range)?
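For reference, the metrics asked about in Q8 to Q15 can all be derived from a binary confusion matrix. The sketch below uses the usual textbook definitions and made-up counts; the helper name `classification_metrics` and the numbers are assumptions, not values from the exam dataset.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fn: actual positives predicted positive/negative.
    fp/tn: actual negatives predicted positive/negative.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of correct predictions
        "misclassified": fp + fn,           # count of wrong predictions
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "prevalence": (tp + fn) / total,    # share of actual positives
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(name, value)
```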

## NPTEL PYTHON FOR DATA SCIENCE ONLINE PROCTORED EXAM 2021:-

Q1. What is the correlation between the age of the clients and their average yearly bank balance? (approx.)

Q2. How many numerical columns are present in the dataframe?

Q3. How many rows of data contain missing values?

Q4. What is the shape of the dataframe after dropping the rows containing NA values?

Q5. What % of clients with primary education have subscribed to a deposit? (approx.)

Q6. What is the average number of contacts performed during this campaign for clients who have not subscribed to a deposit? (approx.)

Q7. How many clients who have subscribed to a deposit either have a housing or a personal loan?

Q8. Using Seaborn’s countplot(), plot a grouped bar plot of marital status and deposit (as hue). Identify the category that has more clients who have subscribed to a deposit than clients who have not deposited.

Q9. What is the accuracy of the model (Choose the appropriate range)?

Q10. How many samples were misclassified by the model (Choose the appropriate range)?

Q11. What is the sensitivity of the model (Choose the appropriate range)?

Q12. What is the specificity of the model (Choose the appropriate range)?

Q13. What is the prevalence of the model (Choose the appropriate range)?

Q14. What is the accuracy of the logistic regression model (Choose the appropriate range)?

Q15. How many samples were misclassified by the logistic regression model?

## NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT WEEK 4 ANSWERS:-

Q1. Determine the type of problem from a machine learning perspective by considering the target variable as Service (Y) and other variables as input features (X)

Q2. Which of the following machine learning techniques would be appropriate to solve the problem given in the problem statement?

Q3. What is the frequency distribution (in %) of the target variable of train_data dataframe?

Q4. Build a logistic regression model between the input and output variables. What is the accuracy range (in %) of the predictions made over test data?

Q5. The total number of samples misclassified by the model built in Q4 is: (given as a range)

Q6. Build a kNN model considering 4 nearest neighbors with the distance metric as Euclidean. What is the accuracy range (in %) of the model for the predictions made on test data? Note: Read the data afresh and follow the same data preparation procedure as mentioned earlier

Q7. If the test data is as follows, the corresponding flower species would be classified under

Q8. Which of the following is/are valid decision rules?

Q9. Check the correlation between all variables and identify the variable that is highly correlated with the output variable, Current.

Q10. Build a multiple linear regression model using train data. What % of variation (given as a range) in the current month’s milk production is explained by the variation of independent features of test data?

## NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT WEEK 3 ANSWERS:-

Q1. NumPy method(s) that are used to manipulate values based on multiple conditions is/are:

Q2. Choose the appropriate command(s) to subset details of customers who have opted for a Fiber optic internet service?

Q3. From the given data, typically, customerID should have 10 characters. By performing a logical check, identify the indexes of customer IDs whose lengths are not equal to 10.

Q4. From the given data, ‘Dependents’ should have only 2 levels (Yes / No) but it has 3 levels. Hence, replace the other level with the modal value of the variable and select the correct distribution of the levels of the variable ‘Dependents’ after replacement.

Q5. Plot a box plot between the variables MonthlyCharges and InternetService. Which of these interpretations can be made from the box plot?

Q6. Filter the rows of the churn dataframe to identify how many customers have not opted for an internet service but have signed up for any allied service.

Q7. Clean the churn dataframe using the logic given below and answer the question that follows.

Q8. Clean the variable tenure and verify if it has outliers. If yes, what would be the most appropriate measure from the following options to replace the outliers present in that variable?

Q9. Which of the following commands would be best suited to impute the missing values in the SeniorCitizen variable?

Q10. Calculate and perform mean imputation on TotalCharges and MonthlyCharges by grouping the rows based on the levels of the variable Churn. The average total charges and monthly charges paid by the customer after imputation are: (approx.)

## NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT WEEK 2 ANSWERS:-

Q1. Given a list, ls = [1, 2, 3, 3, 2, 3, 1, 4, 5, 6, 5, 6, 3, 2, 1, 1, 1, 7, 8, 9, 7, 8, 9, 7, 8, 9, 7, 8, 9, 10, 10, 1, 2, 3, 9, 10], which of the following methods would be the most efficient in determining the unique elements present in ls?

Answer:- B – By converting ls into a set
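A short sketch of the set conversion, using the list from the question. The variable name `unique_values` is ours, chosen for illustration.

```python
ls = [1, 2, 3, 3, 2, 3, 1, 4, 5, 6, 5, 6, 3, 2, 1, 1, 1, 7, 8, 9,
      7, 8, 9, 7, 8, 9, 7, 8, 9, 10, 10, 1, 2, 3, 9, 10]

# A set keeps each distinct value once
unique_values = set(ls)

# Sets are unordered, so sort before printing for a stable display
print(sorted(unique_values))
```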

Q2. What will be the output obtained from the code snippet provided below?

Q3. State whether the following statements are TRUE or FALSE.

i. A tuple can be directly initialized with its values separated by commas

ii. Dictionary is an ordered sequence data type

Answer:- D – I – TRUE, II – TRUE
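For readers who want to check both statements themselves, here is a small sketch. It assumes Python 3.7 or later, where dictionaries preserve insertion order; note that a dictionary is a mapping and not a sequence in the `collections.abc` sense, so it cannot be indexed by position. The variable names are illustrative.

```python
from collections.abc import Mapping, Sequence

# Statement i: a tuple can be created from comma-separated values
point = 1, 2, 3
print(type(point))

# Statement ii: dictionaries keep insertion order (Python 3.7+)...
d = {"b": 1, "a": 2}
keys_in_order = list(d)
print(keys_in_order)

# ...but they are mappings, not sequences
dict_is_sequence = isinstance(d, Sequence)
dict_is_mapping = isinstance(d, Mapping)
print(dict_is_sequence, dict_is_mapping)
```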

Q4. What will be the output for the following code snippet?

Answer:- A – “scitamehtaM si laitnesse ni ynam sdleif ”

Q5. State whether the following statements are TRUE or FALSE.

i. append() method takes a list as an input, and adds a list onto the existing list
ii. extend() method takes a list as an input, and appends the elements individually to the existing list

Answer:- C – I – TRUE, II – TRUE
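The difference between the two methods is easiest to see side by side. A minimal sketch with made-up lists:

```python
appended = [1, 2]
appended.append([3, 4])   # the whole list becomes one new element
print(appended)

extended = [1, 2]
extended.extend([3, 4])   # each element is added individually
print(extended)
```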

Q6. State whether the following statements are TRUE or FALSE.

i. The nbytes attribute is used to find the memory consumed by each element of a NumPy array

ii. The itemsize is used to find the total memory consumed by a NumPy array.

Answer:- C – I – FALSE, II – FALSE

Q7. What will be the output for the following code snippet?

Answer:- B – {‘l’: 6, ‘i’: 4, ‘s’: 9, ‘t’: 7, ‘a’: 3, ‘r’: 2, ‘e’: 11, ‘m’: 3, ‘u’: 4, ‘b’: 1, ‘q’: 1, ‘n’: 3, ‘c’: 4, ‘y’: 2, ‘p’: 1, ‘d’: 1, ‘o’: 8, ‘f’: 1, ‘h’: 1, ‘g’: 1}

Q8. Choose the right option based on the code snippet attached below.

Answer:- B – A copy of sample_str has been created from the original sample_str with the replaced values

Q9. Which of the following command(s) can be used to alter the size of a NumPy array?

Q10. State whether the following statements are TRUE or FALSE.

i. A list cannot be appended to a set.

ii. A set can be appended to a set.

Answer:- D – I – TRUE, II – FALSE
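A quick way to explore these statements, as a sketch: sets have no `append()` method, and their elements must be hashable, so neither a list nor a (mutable) set can be added as a single element with `add()`. The individual items can still be merged in with `update()`. The flag names below are illustrative.

```python
s = {1, 2}

# Sets do not define append() at all
has_append = hasattr(s, "append")

# Adding a list as one element fails because lists are unhashable
list_rejected = False
try:
    s.add([3, 4])
except TypeError:
    list_rejected = True

# Adding a mutable set as one element fails for the same reason
set_rejected = False
try:
    s.add({3, 4})
except TypeError:
    set_rejected = True

# update() merges the individual items instead
s.update([3, 4])
print(has_append, list_rejected, set_rejected, s)
```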

## NPTEL PYTHON FOR DATA SCIENCE ASSIGNMENT WEEK 1 ANSWERS:-

Q1. What is the output of the following code?

Answer:- D – Error: Invalid operation, unsupported operator ‘+’ used between ‘int’ and ‘str’
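The question's code snippet is not reproduced here, so the following is only a hypothetical stand-in that triggers the same kind of error: adding an `int` to a `str` raises a `TypeError`.

```python
message = ""
try:
    result = 1 + "2"   # hypothetical example, not the snippet from the question
except TypeError as err:
    message = str(err)

print(message)
```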

Q2. What would be the datatype of the output obtained by performing division using ‘/’?

Q3. Let a = 5 (101 in binary) and b = 3 (011 in binary). Which of the following operations results in the values 7 and 1?

Answer:- D – a and b ; a or b
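It may help to check the candidates directly. In Python, the values 7 and 1 come from the bitwise operators `|` and `&`, whereas the keyword operators `and` / `or` return one of their operands. A short sketch:

```python
a, b = 5, 3   # 0b101 and 0b011

bitwise_or = a | b    # 0b111
bitwise_and = a & b   # 0b001
bitwise_xor = a ^ b   # 0b110

# The keyword operators short-circuit and return an operand
keyword_and = a and b
keyword_or = a or b

print(bitwise_or, bitwise_and, bitwise_xor, keyword_and, keyword_or)
```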

Q4. Let x = 20 and y = 100. Which of the following expressions results in True?

Answer:- B – (x ** 5 > 150) or (y < 100)
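The expression in option B can be evaluated step by step. The intermediate names `left` and `right` are ours:

```python
x, y = 20, 100

left = x ** 5 > 150   # 20 ** 5 is 3200000, so this comparison holds
right = y < 100       # 100 is not less than 100
result = left or right

print(x ** 5, left, right, result)
```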

Q5. When using the floor division operator (//), if the result is negative, then the result is rounded off to the next largest integer.
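To check this statement, recall that Python's `//` rounds toward negative infinity (the same direction as `math.floor`), not toward zero. A small sketch with made-up operands:

```python
import math

positive_case = 7 // 2        # 3.5 rounds down to 3
negative_case = -7 // 2       # -3.5 rounds down to -4, not up to -3
floor_check = math.floor(-7 / 2)
truncated = int(-7 / 2)       # int() truncates toward zero instead

print(positive_case, negative_case, floor_check, truncated)
```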

Q6. Let x = “50”. Which of the following commands converts ‘x’ to the float datatype?

Q7. Identify the INCORRECT statement about Python keywords.

Answer:- B – Assigning a value to any Python keyword throws an error

Q8. Consider the following variable initialisation: var = “5+7j”. The data type of var is:
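This one can be verified in the interpreter. Because the value is wrapped in quotes, Python stores it as text; it only becomes a complex number after an explicit conversion. A minimal sketch (the name `as_complex` is ours):

```python
var = "5+7j"
print(type(var))            # the quotes make this a string

as_complex = complex(var)   # explicit conversion from the string form
print(type(as_complex), as_complex)
```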

Q9. Which of the following variable names are INVALID in Python?

Q10. What do the following commands display?
import sys
print(sys.version)

Answer:- A – A string containing the version number of the Python interpreter plus additional information on the build number and compiler used

Note:- We do not claim that these answers are 100% correct. They are based solely on our own knowledge, and we post them only to help students, so we urge you to do your assignments on your own.