CPS 841/CP8309: Reinforcement Learning
Course Management Form, Winter 2017
E-mail: mes (at) scs (dot) ryerson (dot) ca
(write CPS841 in the Subject line of your email)
Office: The Centre for Computing and Engineering, ENG275
Office hours: Friday, 12:10-13:00 (every week)
Monday, 13:10-13:45 (by appointment only)
Vitaliy Batusov (vbatusov (at) scs (dot) ryerson (dot) ca)
This course will provide a comprehensive introduction to reinforcement learning, a powerful approach to
learning from interaction to achieve goals in stochastic and deterministic environments. Reinforcement
learning has adapted key ideas from machine learning, operations research, control theory, psychology,
and neuroscience to produce some strikingly successful engineering applications. The focus is on algorithms that learn which actions to take, and when to take them, so as to optimize long-term performance.
This may involve sacrificing immediate reward to obtain greater reward in the long term, or simply to obtain more information about the environment. The course will cover Markov decision processes, dynamic
programming, temporal-difference learning, Monte Carlo reinforcement learning methods, function approximation methods, and the integration of learning and planning. The course also covers some of the key
approaches underlying the success of modern computer programs that can defeat human professional
players in Go and other classic games.
The course requires the ability to write computer programs in a modern programming
language such as C, Java or Python (CPS305 or equivalent), as well as basic probability
theory (CPS420 or equivalent) and calculus (MTH 207 or equivalent).
Additional prerequisite for undergraduate students: CPS 721.
Required Textbook:
Richard S. Sutton and Andrew G. Barto,
Reinforcement Learning: An Introduction.
Cambridge, MA: MIT Press, 1998, or the draft of the
2nd edition (2017).
Recommended reading:
Csaba Szepesvári, Algorithms for Reinforcement Learning (freely available online).
Morgan & Claypool Publishers, 2010.
Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 4.
Dimitri Bertsekas, "Dynamic Programming and Optimal Control", Vol. 2, 4th edition.
Athena Scientific, 2012. Chapter 6:
Approximate Dynamic Programming (draft of November 11, 2011).
Hector Geffner and Blai Bonet,
"A Concise Introduction to Models and Methods for Automated Planning",
Chapter 6. Morgan & Claypool Publishers, June 2013.
Synthesis Lectures on Artificial Intelligence and Machine Learning, Vol. 7, No. 2.
Available online from the Ryerson Library.
3 assignments (10% each): worth a total of 30% of the final grade.
Midterm: 25%. Final exam: 45%.
Graduate students will have to do additional work on each assignment and test.
Undergraduate students can earn bonus marks for doing extra work.
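The weighting above can be illustrated with a short calculation. The following is a minimal Python sketch, not an official grade calculator: the function name final_grade is hypothetical, marks are assumed to be percentages out of 100, the participation bonus (up to 5% extra credit, described below) is assumed to be added in percentage points, and the additional graduate work and undergraduate bonus work are not modelled.

```python
# Weights from the grading scheme: 3 assignments at 10% each, midterm 25%, final 45%.
ASSIGNMENT_WEIGHT = 0.10
MIDTERM_WEIGHT = 0.25
FINAL_WEIGHT = 0.45
# ASSUMPTION: the "up to 5%" participation extra credit is added in percentage points.
MAX_PARTICIPATION_BONUS = 5.0


def final_grade(assignments, midterm, final, participation_bonus=0.0):
    """Return a course grade out of 100 from component marks given as percentages.

    Hypothetical helper for illustration only; graduate extra work is not modelled.
    """
    if len(assignments) != 3:
        raise ValueError("expected exactly 3 assignment marks")
    weighted = (
        ASSIGNMENT_WEIGHT * sum(assignments)
        + MIDTERM_WEIGHT * midterm
        + FINAL_WEIGHT * final
    )
    # Clamp the discretionary bonus to the range [0, 5] percentage points.
    bonus = min(max(participation_bonus, 0.0), MAX_PARTICIPATION_BONUS)
    return weighted + bonus


print(round(final_grade([80, 90, 70], 60, 75), 2))  # 72.75
```

For example, assignment marks of 80, 90 and 70, a midterm mark of 60 and a final exam mark of 75 give 24 + 15 + 33.75 = 72.75.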
This course focuses on topics related to reinforcement learning. The course will cover multistage
decision making under uncertainty, heuristic search in planning, Markov decision processes, dynamic programming, temporal-difference learning (including Q-learning), Monte Carlo reinforcement learning methods, function approximation methods, and the integration of learning and planning.
Policy on collaboration in homework assignments
Students are strongly encouraged to take notes in class and to study their notes
after class. Learning is a gradual process that requires time and effort.
Students benefit from attending lectures because some important details
will be discussed only there. For this reason, attending lectures is mandatory.
Some announcements and clarifications made in class will not be
communicated by any other means. If you miss a class, it is your responsibility
to find out what was announced.
Electronic devices: turn off your mobile phones and all other electronic devices in class.
You can keep your laptop or tablet open only if you use it to take notes in class.
The quizzes, midterm test, and final exam may include short-essay and yes/no
questions, as well as problem-solving questions (but no programming questions).
These examinations will last 15 minutes, 1 hour 30 minutes,
and 2 hours 30 minutes, respectively.
There will be no supplemental examinations.
Brief quizzes may be given without prior warning
(marks will contribute to class participation).
The final exam will be cumulative and will include all the material covered throughout
the term. Grades are earned for the demonstration of knowledge. If you miss
a midterm test or the final exam for medical reasons, you must provide an official
medical certificate AND
an Academic Consideration form to the Department of Computer Science within 3 working
days. You must bring these documents in person to the CS front reception desk.
Similarly, all documentation related to special accommodation
or academic consideration should be submitted to the CS program office within
the specified time limits.
Dates are subject to change; all changes will be announced in class and
on the Web course shell.
Assignments should be submitted on or before the deadline
specified in the assignment
(you are encouraged to submit assignments early).
Your assignment is considered late if any part of it is late
(even by just 1 minute). The penalty for a late assignment is 10% off.
No assignment will be accepted more than 24 hours late.
Start working on your assignment on the day it is posted. Do not procrastinate.
There are no make-up assignments. To hand in the printout of a late assignment,
give it in person to a secretary at the CS reception desk and ask them
to stamp it to confirm that you handed it in on time.
Then send an email to the TA responsible for marking this assignment to say
that a hard copy of your assignment is available at the front desk.
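The late-submission rules above amount to a small decision procedure. The Python sketch below is illustrative only: adjusted_mark is a hypothetical helper, the deadline date is made up, and it assumes that "10% off" means a deduction of 10% of the mark earned (the policy could also be read as 10 percentage points), so check the actual interpretation with the instructor.

```python
from datetime import datetime, timedelta

LATE_PENALTY = 0.10                 # "10% off" (ASSUMPTION: 10% of the mark earned)
MAX_LATENESS = timedelta(hours=24)  # not accepted if more than 24 hours late


def adjusted_mark(mark, deadline, submitted):
    """Return the mark after applying the late policy, or None if refused.

    Hypothetical helper for illustration; `mark` is a percentage out of 100.
    """
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return mark  # on time or early: no penalty
    if lateness > MAX_LATENESS:
        return None  # more than 24 hours late: not accepted
    # Any lateness at all, even a single minute, triggers the full penalty.
    return mark * (1 - LATE_PENALTY)


deadline = datetime(2017, 2, 10, 23, 59)  # hypothetical deadline
print(adjusted_mark(80, deadline, deadline))                         # 80
print(adjusted_mark(80, deadline, deadline + timedelta(minutes=1)))  # 72.0
print(adjusted_mark(80, deadline, deadline + timedelta(hours=25)))   # None
```

Note that a submission exactly 24 hours late is still accepted in this sketch, since the policy only refuses work that is more than 24 hours late.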
All assignments must be submitted electronically using a special-purpose script
that you can run on any moon computer (log in to the Linux operating system to run this script).
You can submit your assignment either locally from the labs or remotely from home.
If you decide to submit remotely, it is your responsibility to make sure
that you have ssh software installed on your home computer. You need this software
to log in remotely to any of the moons and run the specified script there.
You are expected to know basic UNIX commands and utilities.
Also, it is your responsibility to keep your Computer Science email account
in good standing and to know your login/password information. Contact one of the system
administrators if you have technical questions.
From time to time, I will hand out exercises.
Students are expected to solve the exercises, but
they will not be graded. However, working on the exercises
will improve your understanding of the course material
(and will help you get better marks on tests).
Up to 5% extra credit may be assigned for active class participation
throughout the term, e.g., for attending most of the classes, participating
actively by asking and answering questions, and solving exercises in class.
Class participation marks are earned for active course participation and
given at the discretion of the course instructor; they cannot be requested by students.
Unexplained absences can negatively affect your grade.
Handouts and assignments will be made available on the Web only.
You are responsible for visiting
the course Web pages regularly and reading the assignment and test information
that is provided on or linked from these pages. In particular, Frequently Asked
Questions (FAQs) related to the homework for this course will be linked from this Web page.
These FAQs are considered an integral part of the assignment.
Before sending your questions by e-mail to the instructor, check these Web pages
to see whether similar questions have already been answered.
Grades for tests and assignments will be posted on the my.ryerson.ca Web site no later than two weeks after the due date (or test date). Marking guides, the assignments, and some other course-related documents will also be posted only on my.ryerson.ca. Graded work will usually be returned to students within two weeks. If an electronic copy of an assignment was marked by a TA using a script, hard copies will not normally be returned. The lead partner who submitted an assignment on behalf of a team will receive an email message from the TA who marked the assignment. This message will include the mark for the assignment and brief explanations of when and why penalties for errors were applied. All other team members must contact their team leader to get feedback about their assignment.
Limited collaboration in discussing general approaches to problems
is allowed, but only with one other student currently taking the course;
no collaboration is allowed between teams.
However, you should never put your name on anything
you do not understand:
you must be able to reproduce and explain all solutions by yourself.
If you cannot explain a solution that you handed in, or if you cannot solve
an exercise similar to the questions in your homework, this will negatively affect
your grade. In particular, you might be asked to solve exercises during office hours,
during one of the labs, or in class (as a quiz). These unscheduled tests or evaluations
can be given at any time without prior notice. Remember that if you work with partners,
you are still expected to know the solutions to all exercises in the homework. Grades are
earned for the demonstration of knowledge. If a student fails to demonstrate
knowledge of a homework, the grade for that homework can be reduced to 0.
The first page of your homework should include the names of all
students with whom you discussed any homework problems (even briefly).
Otherwise, it is assumed that you did not discuss the homework with anyone except the
instructor. Copied work (both the original and the copies) will be graded as 0.
Involvement in plagiarism will be penalized in accordance with Academic Policy 60.
An additional penalty for copied work may be assigned as a deterrent against plagiarism.
More specifically, the additional penalty for a copied assignment (in part or in whole)
can be a deduction of up to 10% of the final course grade.
Policy on Non-Academic Conduct
No disruption of instructional activities is allowed. Among many other infractions,
the Student Code of Non-Academic Conduct specifically refers to the following as a violation:
"Disruption of Learning and Teaching - Students shall not behave in disruptive ways
that obstruct the learning and teaching environment." In particular, students may
use laptops (and similar electronic devices) in class only for taking notes.
In serious cases, penalties can be imposed by the Student Conduct Officer.
Policy on Remarking
- Grades are earned for the demonstration of knowledge.
- Read the marking guide carefully for the assignment or test you would like to have remarked.
Your grade may go up, go down, or remain the same.
- Fill in the remarking form (available online).
- Give the form and your assignment/test to the TA who marked your work or
to the instructor (at lecture time or during scheduled
office hours), who will forward it to a TA.
- If you are not satisfied with the TA's remarking, you can appeal
to the instructor.
- You may not submit a remarking request later than ONE WEEK from the
date on which the assignments/tests were returned in class.
It is your responsibility to pick up your work ASAP.
- Your mark can decrease if the TA sees something that was incorrectly
awarded too high a mark.
Tentative Course Calendar
(all date changes will be announced)
Event | Date, time, room | Grade Value (%)
Midterm test | February 27, Monday, 4-6pm, ENG203 | 25
Final exam | April 26, 15:00, ENG202 | 45