# NPTEL An Introduction to Artificial Intelligence Assignment 11 Answers 2022

Are you looking for the answers to NPTEL An Introduction to Artificial Intelligence Assignment 11 from IIT Delhi? This article will help you with the answers to the National Programme on Technology Enhanced Learning (NPTEL) course "An Introduction to Artificial Intelligence", Assignment 11.

## What is An Introduction to Artificial Intelligence?

An Introduction to Artificial Intelligence by IIT Delhi introduces a variety of concepts in the field of artificial intelligence. It discusses the philosophy of AI and how to model a new problem as an AI problem. It describes a variety of models, such as search, logic, Bayes nets, and MDPs, that can be used to model a new problem, and it teaches the first algorithms for solving each formulation. The course prepares a student to take a variety of focused, advanced courses in the various subfields of AI.

## CRITERIA TO GET A CERTIFICATE

Average assignment score = 25% of the average of the best 8 assignments out of the 12 assignments given in the course.
Exam score = 75% of the proctored certification exam score out of 100

Final score = Average assignment score + Exam score

YOU WILL BE ELIGIBLE FOR A CERTIFICATE ONLY IF THE AVERAGE ASSIGNMENT SCORE >= 10/25 AND THE EXAM SCORE >= 30/75. If either criterion is not met, you will not get the certificate even if the Final score >= 40/100.

Below you can find the answers for NPTEL An Introduction to Artificial Intelligence Assignment 11

## NPTEL An Introduction to Artificial Intelligence Assignment 11 Answers:-

Q1. What is the key point about reinforcement learning in a strong simulator setting?

Q2. Suppose you are doing Passive Learning on the following state space with the given policy of actions (as indicated by the arrows in the cells). A4 and C4 are absorbing states. The reward for each of the 4 actions (up, down, left, right) is -1. The discount factor is 1.

Now, on executing this policy for learning, you reach state B3 1000 times, and on executing the up action, you reach A3 299 times, B2 294 times, B4 349 times, and C3 58 times. Note: all of these numbers indicate how many times each state is reached in the first step once we execute an Up action from B3. Then write the value of T(B3, U, C3) without any smoothing performed.
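Without smoothing, the estimated transition probability is just the empirical frequency of each successor state. A minimal sketch, using the counts stated in the question:

```python
# Maximum-likelihood transition estimate from observed counts
# (counts taken from the question statement; no smoothing applied).
counts = {"A3": 299, "B2": 294, "B4": 349, "C3": 58}
total = sum(counts.values())  # 1000 executions of Up from B3

# T(B3, U, s') is the empirical frequency of landing in s'.
T = {s: n / total for s, n in counts.items()}
print(T["C3"])  # estimate of T(B3, U, C3) -> 0.058
```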

Q3. Which of the following statements are correct about Boltzmann Exploration?
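The answer options are not reproduced here, but as a reminder of the idea being tested: Boltzmann (softmax) exploration samples actions with probability proportional to exp(Q/temperature), so higher-valued actions are preferred while every action keeps nonzero probability. A small sketch (function name and signature are illustrative):

```python
import math
import random

def boltzmann_probs(q_values, temperature):
    """Action probabilities under Boltzmann exploration.

    High temperature -> near-uniform exploration; as temperature -> 0,
    the distribution concentrates on the greedy (highest-Q) action.
    """
    prefs = [math.exp(q / temperature) for q in q_values]
    z = sum(prefs)
    return [p / z for p in prefs]

def boltzmann_action(q_values, temperature):
    """Sample one action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return random.choices(range(len(q_values)), weights=probs)[0]
```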

Q4. If we implement Q-Learning with α = 0.9, then what will be the value of Q(c, RIGHT)? All Q(s, a) pairs are initialized to zero. Assume the discount factor to be 1.
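The episode and rewards for this question come from a figure that is not reproduced here, so the answer itself is not computed below. As a sketch of the rule being applied, with the stated α = 0.9 and γ = 1 (function name and dictionary layout are illustrative):

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.9, gamma=1.0):
    """One tabular Q-learning update:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s',a')).
    Unvisited (state, action) pairs default to 0, matching the question's
    zero initialization.
    """
    sample = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * sample
    return Q[(s, a)]
```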

Q5. If we have an epsilon greedy policy with epsilon=0.2 then what is the probability of the agent taking action RIGHT in state C after the first episode is over?
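Whether RIGHT is the greedy action in state C after the first episode depends on the learned Q-values from the previous question's figure, which is not reproduced here. The general formula is: the greedy action is taken with probability 1 - ε plus its share of the uniform exploration branch, ε/|A|; any non-greedy action is taken with probability ε/|A|. A minimal sketch (the 4-action assumption mirrors the up/down/left/right setup described earlier):

```python
def epsilon_greedy_prob(is_greedy, epsilon, n_actions):
    """P(taking a given action) under epsilon-greedy:
    with probability epsilon, explore uniformly over all actions;
    otherwise take the greedy action.
    """
    p = epsilon / n_actions      # uniform exploration branch
    if is_greedy:
        p += 1 - epsilon         # exploitation branch
    return p
```

For example, with ε = 0.2 and 4 actions, the greedy action is taken with probability 0.85 and each non-greedy action with probability 0.05.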

Q6. Which of the following algorithms can we use to compute optimal policies, assuming we do not know the parameters of the MDP but a simulator for it is available?

Q7. Let us say that we wish to do feature-based Q-learning to find the optimal policy for an MDP. Assume n feature functions f1(s, a), f2(s, a), …, fn(s, a) with weights w1, w2, …, wn, all initialized to 0. Assume the discount factor and the learning rate are both equal to 1.

Assume that our initial state is s1, and on taking action a1, we transition to s2, earning a reward of 10. On updating the feature weights, we observe that w1 increased. Select the most appropriate choice(s).

Q8. Q(C, go)

Q9. Q(B, go)

Q10. Q(A, go)