Kosta Derpanis, Ph.D.
Assistant Professor, Ryerson University
Department of Computer Science
Room ENG265, George Vari Engineering and Computing Centre
kosta _at_ scs.ryerson.ca

NEWS

Dec. 4, 2014 The NSERC Undergraduate Student Research Award (USRA) competition deadline is in February 2015. As part of this program, I'm looking to take on interns for the summer of 2015 to conduct research in my lab. My preference is for undergraduate students currently in their third year of studies. If interested, please contact me via email. Note that this award is highly competitive; an A- to A+ average is typically required to win. Additional information about the program from last year's competition is available here.

Dec. 4, 2014 I'm currently recruiting graduate students at the M.Sc. level for the fall of 2015.

Dec. 4, 2014 For the Winter 2015 term, I will be teaching my introductory computer vision course, CPS843/CP8307. The course webpage is available here.

Oct. 8, 2014 As part of NVIDIA's Academic Hardware Donation Program I received a Tesla K40 for research in deep learning. Thank you NVIDIA!

Jun. 24, 2014 Recognized by CVPR 2014 as an "outstanding reviewer".

Mar. 1, 2014 I'm a distal fellow of the NSERC Canadian Field Robotics Network (NCFRN).

Jan. 2014 1 paper accepted at ICRA 2014.

Aug. 2013 1 paper accepted at ICCV 2013.

Jan. 2013 I'm an Adjunct Professor at York University.

SUPERVISION

Adam Harley (M.Sc.)

Christopher Kong (M.Sc. co-supervisor)

SELECTED PUBLICATIONS

Single Image 3D Object Detection and Pose Estimation for Grasping
We present a novel approach for detecting objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesis bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps using our detection and pose estimate with a PR2 robot. Extensive evaluation with a novel ground truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in heavily cluttered scenes.
Menglong Zhu, Konstantinos Derpanis, Yinfei Yang, Samarth Brahmbhatt, Mabel Zhang, Cody Phillips and Kostas Daniilidis
ICRA 2014
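The final pose-refinement step lends itself to a compact illustration. Below is a minimal sketch, not the authors' implementation: it assumes a simple pinhole camera with a made-up focal length, stands in sampled 3D model contour points and noisy image contours with synthetic data, and refines a 6-DOF pose (rotation vector plus translation) by minimizing the mean distance from the projected model points to their nearest image contour points.

```python
# Illustrative sketch (not the paper's implementation) of the final
# pose-refinement step: minimize the distance between selected image
# contours and the projected 3D model over the 6-DOF pose.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

FOCAL = 500.0  # assumed pinhole focal length (pixels)

def project(points_3d, pose):
    """Project 3D model points under pose = (rotvec[3], translation[3])."""
    R = Rotation.from_rotvec(pose[:3])
    cam = R.apply(points_3d) + pose[3:]
    return FOCAL * cam[:, :2] / cam[:, 2:3]  # pinhole projection

def contour_cost(pose, model_pts, contour_tree):
    """Mean distance from projected model points to nearest image contour point."""
    dists, _ = contour_tree.query(project(model_pts, pose))
    return dists.mean()

# Synthetic stand-ins: sampled model contour points and noisy image contours.
rng = np.random.default_rng(0)
model_pts = rng.normal(size=(200, 3))
true_pose = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 5.0])
image_contours = project(model_pts, true_pose) + rng.normal(scale=1.0, size=(200, 2))

tree = cKDTree(image_contours)
init_pose = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 5.0])  # coarse detector hypothesis
result = minimize(contour_cost, init_pose, args=(model_pts, tree),
                  method="Nelder-Mead", options={"maxiter": 2000})
print("refined pose:", result.x)
```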
From Actemes to Action: A Strongly-supervised Representation for Detailed Action Understanding
This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output. This output sheds further light into detailed action understanding.
Weiyu Zhang, Menglong Zhu and Konstantinos Derpanis
ICCV 2013
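To make the sliding-volume scheme concrete, here is a hedged numpy sketch. The acteme is reduced to a random linear template (an assumption; the paper's actemes are discovered from keypoint supervision), but the mechanics are the same: score every x-y-t patch, then keep sparse, non-overlapping detections by greedy non-maximum suppression.

```python
# Hedged sketch of the sliding-volume detection idea: score every x-y-t
# patch against a (here, random stand-in) acteme template and keep sparse,
# non-overlapping detections via greedy non-maximum suppression.
import numpy as np

def sliding_volume_detections(video, template, thresh):
    """video: (T, H, W) array; template: (t, h, w) patch-classifier weights."""
    t, h, w = template.shape
    T, H, W = video.shape
    scores = np.empty((T - t + 1, H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            for k in range(scores.shape[2]):
                patch = video[i:i+t, j:j+h, k:k+w]
                scores[i, j, k] = np.sum(patch * template)  # linear score

    # Greedy NMS: take the best-scoring volume, suppress overlaps, repeat.
    detections = []
    for idx in np.argsort(scores, axis=None)[::-1]:
        i, j, k = np.unravel_index(idx, scores.shape)
        if scores[i, j, k] < thresh:
            break
        if all(abs(i - di) >= t or abs(j - dj) >= h or abs(k - dk) >= w
               for di, dj, dk in detections):  # no overlap with kept volumes
            detections.append((i, j, k))
    return detections

video = np.random.default_rng(1).normal(size=(30, 40, 40))
template = np.random.default_rng(2).normal(size=(5, 8, 8))
print(sliding_volume_detections(video, template, thresh=10.0))
```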
Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. Importantly, the descriptor allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template, derived from a single exemplar video, across candidate video sequences. The general approach presented for action spotting and recognition is amenable to efficient implementation, which is deemed critical for many important applications. For action spotting, details of a real-time GPU-based instantiation of the proposed approach are provided. Empirical evaluation of both action spotting and action recognition on challenging datasets suggests the efficacy of the proposed approach, with state-of-the-art performance documented on standard datasets.
Konstantinos Derpanis, Mikhail Sizintsev, Kevin J. Cannons, and Richard Wildes
PAMI 2013
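A loose numpy/scipy approximation of the descriptor's flavor follows. It substitutes smoothed directional-derivative energies for the paper's spacetime oriented energy filter bank (an assumption, chosen for brevity), and the orientation set and normalization constant are illustrative.

```python
# Loose approximation (not the paper's exact filters) of a spacetime
# oriented-energy descriptor: directional-derivative energy along several
# x-y-t orientations, smoothed and L1-normalized pointwise.
import numpy as np
from scipy.ndimage import gaussian_filter

def oriented_energy_descriptor(video, sigma=2.0):
    """video: (T, H, W) intensity volume -> (T, H, W, K) normalized energies."""
    # Spacetime gradient via Gaussian-derivative filtering: It, Iy, Ix.
    grads = np.stack([gaussian_filter(video, sigma, order=tuple(o))
                      for o in np.eye(3, dtype=int)], axis=-1)  # (T, H, W, 3)
    # A small, assumed set of unit spacetime orientations (t, y, x).
    dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                     [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    energies = []
    for n in dirs:
        e = (grads @ n) ** 2                        # energy along orientation n
        energies.append(gaussian_filter(e, sigma))  # aggregate over local support
    E = np.stack(energies, axis=-1)
    return E / (E.sum(axis=-1, keepdims=True) + 1e-9)  # contrast normalization

video = np.random.default_rng(0).normal(size=(16, 32, 32))
print(oriented_energy_descriptor(video).shape)  # (16, 32, 32, 6)
```

Because the descriptor is pointwise normalized, comparing two video segments reduces to comparing these distributions, which is what makes the exhaustive template search in the paper tractable.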
Dynamic Scene Understanding: The Role of Orientation Features in Space and Time in Scene Classification
Natural scene classification is a fundamental challenge in computer vision. By far, the majority of studies have limited their scope to scenes from single image stills and thereby ignore potentially informative temporal cues. The current paper is concerned with determining the degree of performance gain in considering short videos for recognizing natural scenes. Towards this end, the impact of multiscale orientation measurements on scene classification is systematically investigated, as related to: (i) spatial appearance, (ii) temporal dynamics and (iii) joint spatial appearance and dynamics. These measurements in visual space, x-y, and spacetime, x-y-t, are recovered by a bank of spatiotemporal oriented energy filters. In addition, a new data set is introduced that contains 420 image sequences spanning fourteen scene categories, with temporal scene information due to objects and surfaces decoupled from camera-induced ones. This data set is used to evaluate classification performance of the various orientation-related representations, as well as state-of-the-art alternatives. It is shown that a notable performance increase is realized by spatiotemporal approaches in comparison to purely spatial or purely temporal methods.
Konstantinos Derpanis, Matthieu Lecce, Kostas Daniilidis and Richard Wildes
CVPR 2012
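As a rough illustration of the evaluation setup, the sketch below aggregates per-voxel orientation energies over a coarse spatial grid and classifies with a chi-squared nearest-neighbor rule. The grid, distance, and scene labels are stand-ins, not the paper's protocol.

```python
# Sketch (assumed details, not the paper's code) of scene classification:
# aggregate per-voxel orientation energies over a spatial grid and
# classify with a nearest-neighbor rule under a chi-squared distance.
import numpy as np

def grid_descriptor(energies, grid=(2, 2)):
    """energies: (T, H, W, K) -> concatenated per-cell average energies."""
    T, H, W, K = energies.shape
    cells = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            cell = energies[:, gy*H//grid[0]:(gy+1)*H//grid[0],
                               gx*W//grid[1]:(gx+1)*W//grid[1]]
            cells.append(cell.mean(axis=(0, 1, 2)))
    d = np.concatenate(cells)
    return d / (d.sum() + 1e-9)

def nn_classify(query, train_descs, train_labels):
    # Chi-squared distance between normalized descriptors.
    chi2 = [np.sum((query - d)**2 / (query + d + 1e-9)) for d in train_descs]
    return train_labels[int(np.argmin(chi2))]

rng = np.random.default_rng(3)
train = [rng.random(size=(8, 16, 16, 6)) for _ in range(4)]
labels = ["beach", "traffic", "forest fire", "waterfall"]  # illustrative classes
descs = [grid_descriptor(e) for e in train]
print(nn_classify(grid_descriptor(rng.random(size=(8, 16, 16, 6))), descs, labels))
```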
Spacetime Texture Representation and Recognition Based on a Spatiotemporal Orientation Analysis
This paper is concerned with the representation and recognition of the observed dynamics (i.e., excluding purely spatial appearance cues) of spacetime texture based on a spatiotemporal orientation analysis. The term “spacetime texture” is taken to refer to patterns in visual spacetime, x-y-t, that primarily are characterized by the aggregate dynamic properties of elements or local measurements accumulated over a region of spatiotemporal support, rather than in terms of the dynamics of individual constituents. Examples include image sequences of natural processes that exhibit stochastic dynamics (e.g., fire, water, and windblown vegetation) as well as images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in imagery, such as pedestrians and vehicular traffic). Spacetime texture representation and recognition is important as it provides an early means of capturing the structure of an ensuing image stream in a meaningful fashion. Toward such ends, a novel approach to spacetime texture representation and an associated recognition method are described based on distributions (histograms) of spacetime orientation structure. Empirical evaluation on both standard and original image data sets shows the promise of the approach, including significant improvement over alternative state-of-the-art approaches in recognizing the same pattern from different viewpoints.
Konstantinos Derpanis and Richard Wildes
PAMI 2012
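A minimal sketch of recognition from distributions of spacetime orientation, with simplifying assumptions: each voxel contributes only its dominant orientation (the paper keeps the full energy distribution), and histograms are matched with the Bhattacharyya coefficient.

```python
# Minimal sketch (assumed details) of spacetime texture recognition:
# histogram the per-voxel dominant-orientation index and match
# distributions with the Bhattacharyya coefficient.
import numpy as np

def orientation_histogram(energies):
    """energies: (T, H, W, K) oriented energies -> K-bin normalized histogram."""
    dominant = energies.argmax(axis=-1)
    hist = np.bincount(dominant.ravel(),
                       minlength=energies.shape[-1]).astype(float)
    return hist / hist.sum()

def bhattacharyya(h1, h2):
    return np.sum(np.sqrt(h1 * h2))  # 1.0 = identical distributions

rng = np.random.default_rng(4)
water = rng.random(size=(8, 16, 16, 6))  # stand-ins for oriented energies
fire = rng.random(size=(8, 16, 16, 6))
print(bhattacharyya(orientation_histogram(water), orientation_histogram(fire)))
```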
The Structure of Multiplicative Motions in Natural Imagery
A theoretical investigation of the frequency structure of multiplicative image motion signals is presented, e.g., as associated with translucency phenomena. Previous work has claimed that the multiplicative composition of visual signals generally results in the annihilation of oriented structure in the spectral domain. As a result, research has focused on multiplicative signals in highly specialized scenarios where highly structured spectral signatures are prevalent, or introduced a nonlinearity to transform the multiplicative image signal to an additive one. In contrast, in this paper, it is shown that oriented structure is present in multiplicative cases when natural domain constraints are taken into account. This analysis suggests that the various instances of naturally occurring multiple motion structures can be treated in a unified manner. As an example application of the developed theory, a multiple motion estimator previously proposed for translation, additive transparency, and occlusion is adapted to multiplicative image motions. This estimator is shown to yield superior performance over the alternative practice of introducing a nonlinear preprocessing step.
Konstantinos Derpanis and Richard Wildes
PAMI 2009
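The core observation can be restated with two standard Fourier facts; the notation below is mine, and the closing sentence is one reading of the abstract's "natural domain constraints", not a quotation of the paper. A pattern translating with image velocity $\mathbf{v}$ has a spectrum confined to a plane through the frequency-domain origin,
\[
\hat{f}(\boldsymbol{\omega}_{\mathbf{x}}, \omega_t) \neq 0 \;\Rightarrow\; \omega_t + \mathbf{v}^{\top}\boldsymbol{\omega}_{\mathbf{x}} = 0 ,
\]
and multiplication in spacetime corresponds to convolution in the frequency domain, so a multiplicative composition of two translating signals satisfies
\[
g(\mathbf{x}, t) = f_1(\mathbf{x} - \mathbf{v}_1 t)\, f_2(\mathbf{x} - \mathbf{v}_2 t)
\quad\Longleftrightarrow\quad
\hat{g} = \hat{f}_1 \ast \hat{f}_2 ,
\]
i.e., the convolution of two such planes. Since natural image spectra concentrate their energy at low frequencies, this convolution remains concentrated about the two motion planes rather than spreading across frequency space, which is why oriented structure survives in the multiplicative case.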
On the Role of Representation in the Analysis of Visual Spacetime
The problems under consideration in this dissertation centre around the representation of visual spacetime, i.e., (visual) image intensity (irradiance) as a function of two-dimensional spatial position and time. In particular, the overarching goal is to establish a unified approach to representation and analysis of temporal image dynamics that is broadly applicable to the diverse phenomena in the natural world as captured in two-dimensional intensity images. Previous research largely has approached the analysis of visual dynamics by appealing to representations based on image motion. Although of obvious importance, motion represents a particular instance of the myriad spatiotemporal patterns observed in image data. A generative model centred on the concept of spacetime orientation is proposed. This model provides a unified framework for understanding a broad set of important spacetime patterns. As a consequence of this analysis, two new classes of patterns are distinguished that have previously not been considered directly in terms of their constituent spacetime oriented structure, namely multiplicative motions (e.g., translucency) and stochastic-related phenomena (e.g., wavy water and windblown vegetation). Motivated by this analysis, a representation is proposed that systematically exposes the structure of visual spacetime in terms of local, spacetime orientation. The power of this representation is demonstrated in the context of the following four fundamental challenges in computer vision: (i) spacetime texture recognition, (ii) spacetime grouping, (iii) local boundary detection and (iv) the detection and spatiotemporal localization of an action in a video stream.
Konstantinos Derpanis (Supervisor: Richard Wildes)
Dissertation, York University, 2010
CIPPRS 2010 Doctoral Dissertation Award, Honorable Mention

TEACHING