Kosta Derpanis, Ph.D.
Associate Professor, Ryerson University
Department of Computer Science
Room ENG265, George Vari Engineering and Computing Centre
kosta _at_ scs.ryerson.ca


Mar. 6, 2016 Paper accepted to IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016.

Jan. 2, 2016 Interested in visual computing (e.g., computer vision, graphics and virtual reality)? You may want to enrol in the following elective undergraduate courses. For further guidance, please feel free to contact me or Prof. McInerney.

Dec. 17, 2015 Happy to receive the Faculty of Science Dean's Teaching Award. video

Dec. 15, 2015 The NSERC Undergraduate Student Research Award (USRA) competition deadline is in February 2016. As part of this program, I'm looking to take on interns for the summer of 2016 to conduct research in my lab. Undergraduate students currently in their third year of studies are my preference. If interested, please contact me via email. Note that this award is highly competitive; an A- to A+ average is typically required to win. Additional information about the program from a previous year's competition is available here.

Dec. 15, 2015 I'm currently recruiting graduate students at the M.Sc. and Ph.D. levels for Fall 2016.

Dec. 14, 2015 As part of NVIDIA's Academic Hardware Donation Program I received a Tesla K40 for research in deep learning. Thank you NVIDIA!

Nov. 23, 2015 Promoted to Associate Professor.

Aug. 26, 2015 Adam Harley (M.Sc. candidate) wins Best Student Paper Award for our International Conference on Document Analysis and Recognition (ICDAR) 2015 paper.

May 13, 2015 Gave a talk titled "Building Seeing Machines" for Ryerson's Days of Science event for high school students (pdf, clickable mov).

May 13, 2015 Paper accepted at International Conference on Document Analysis and Recognition (ICDAR) 2015.

April 21, 2015 As part of NVIDIA's Academic Hardware Donation Program I received a Tesla K40 for research in deep learning. Thank you NVIDIA!

Feb 1, 2015 Paper accepted by the Journal of Field Robotics.

Dec. 4, 2014 For the Winter 2015 term, I will be teaching my introductory computer vision course, CPS843/CP8307. The course webpage is available here.

Oct. 8, 2014 As part of NVIDIA's Academic Hardware Donation Program I received a Tesla K40 for research in deep learning. Thank you NVIDIA!

Jun. 24, 2014 Recognized by IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014 as an "outstanding reviewer".

May 10, 2014 Gave an invited talk titled "From 3D Models to Images" at the NCFRN Annual General Meeting (pdf, clickable mov).

Mar. 1, 2014 I'm a distal fellow of the NSERC Canadian Field Robotics Network (NCFRN).

Jan. 2014 Paper accepted at IEEE International Conference on Robotics and Automation (ICRA) 2014.

Aug. 2013 Paper accepted at IEEE International Conference on Computer Vision (ICCV) 2013.

Jan. 2013 I'm an Adjunct Professor at York University.


Adam Harley (M.Sc.)

Hasan Almawi (M.Sc.)

Domenic Curro (M.Sc.)


Christopher Kong (M.Sc. co-supervisor)


Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video
This paper addresses the challenge of 3D full-body human pose estimation from a monocular image sequence. Here, two cases are considered: (i) the image locations of the human joints are provided and (ii) the image locations of joints are unknown. In the former case, a novel approach is introduced that integrates a sparsity-driven 3D geometric prior and temporal smoothness. In the latter case, the former case is extended by treating the image locations of the joints as latent variables. A deep fully convolutional network is trained to predict the uncertainty maps of the 2D joint locations. The 3D pose estimates are realized via an Expectation-Maximization algorithm over the entire sequence, where it is shown that the 2D joint location uncertainties can be conveniently marginalized out during inference. Empirical evaluation on the Human3.6M dataset shows that the proposed approaches achieve higher 3D pose estimation accuracy than state-of-the-art baselines. Further, the proposed approach outperforms a publicly available 2D pose estimation baseline on the challenging PennAction dataset.
Xiaowei Zhou, Menglong Zhu, Spyridon Leonardos, Konstantinos Derpanis and Kostas Daniilidis
arXiv (accepted at CVPR 2016)
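The role of the 2D uncertainty maps above can be illustrated with a toy sketch (this is not the paper's EM procedure): treat a joint's predicted uncertainty map as a probability distribution over pixels and take the expectation over pixel coordinates, rather than committing to a single hard 2D detection.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Expected (x, y) joint location under a 2D uncertainty map.

    The map is normalized into a probability distribution, then the
    expectation over pixel coordinates is taken -- a simple way to
    average over 2D location uncertainty instead of taking the peak.
    """
    p = heatmap / heatmap.sum()
    ys, xs = np.indices(heatmap.shape)  # row (y) and column (x) coordinate grids
    return float((p * xs).sum()), float((p * ys).sum())

# A hypothetical joint heatmap peaked around (x=3, y=2) on a 5x5 grid.
h = np.zeros((5, 5))
h[2, 3] = 4.0
h[2, 2] = 1.0
h[1, 3] = 1.0
x, y = soft_argmax_2d(h)
```

The expected location lands between the peak and its neighbours, weighted by their mass; the paper integrates such uncertainties into sequence-level inference rather than per-frame estimates.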
Learning Dense Convolutional Embeddings for Semantic Segmentation
This paper proposes a new deep convolutional neural network (DCNN) architecture that learns pixel embeddings, such that for any two pixels on the same object, the embeddings are nearly identical. Conversely, the DCNN is trained to produce dissimilar representations for pixels coming from differing objects. Experimental results show that when this embedding network is used to augment a DCNN trained on semantic segmentation, there is a systematic improvement in per-pixel classification accuracy. This strategy is complementary to many others pursued in semantic segmentation, and it is implemented efficiently in a popular deep learning framework, making its integration with existing systems very straightforward.
Adam Harley, Konstantinos Derpanis and Iasonas Kokkinos
arXiv (Workshop paper at ICLR 2016)
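The pairwise objective described above can be sketched with a simple contrastive-style loss; this is an illustrative stand-in, not the paper's exact training loss, and all names here are hypothetical.

```python
import numpy as np

def pairwise_embedding_loss(emb, labels, margin=1.0):
    """Contrastive-style loss over pixel embeddings (illustrative sketch).

    emb:    (N, D) array, one embedding vector per pixel.
    labels: length-N object ids, one per pixel.
    Same-object pairs are penalized by their embedding distance;
    different-object pairs are penalized only when closer than `margin`.
    """
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(emb[i] - emb[j])
            if labels[i] == labels[j]:
                loss += d                      # pull same-object pixels together
            else:
                loss += max(0.0, margin - d)   # push different objects apart
            count += 1
    return loss / count
```

Well-separated embeddings drive the loss to zero: same-object pixels coincide and different-object pixels sit beyond the margin.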
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification and retrieval, using features learned by deep convolutional neural networks (CNNs). In object and scene analysis, deep neural nets are capable of learning a hierarchical chain of abstraction from pixel inputs to concise and descriptive representations. The current work explores this capacity in the realm of document analysis, and confirms that this representation strategy is superior to a variety of popular handcrafted alternatives. Extensive experiments show that (i) features extracted from CNNs are robust to compression, (ii) CNNs trained on non-document images transfer well to document analysis tasks, and (iii) enforcing region-specific feature-learning is unnecessary given sufficient training data. This work also makes available a new labelled subset of the IIT-CDIP collection, containing 400,000 document images across 16 categories.
Adam Harley, Alex Ufkes and Konstantinos Derpanis
ICDAR 2015 (Best Student Paper Award)
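Retrieval with learned features, as evaluated above, reduces to nearest-neighbour search in feature space. A minimal sketch, assuming fixed-length feature vectors (in practice taken from a CNN's penultimate layer) and cosine similarity as the ranking score:

```python
import numpy as np

def retrieve(query_feat, gallery_feats, k=3):
    """Rank gallery documents by cosine similarity to a query feature.

    query_feat:    (D,) feature vector for the query document.
    gallery_feats: (N, D) feature vectors for the gallery documents.
    Returns the indices of the top-k most similar gallery documents.
    """
    q = query_feat / np.linalg.norm(query_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q                     # cosine similarity of each gallery item
    return np.argsort(-sims)[:k]     # best matches first
```

For example, with a gallery of `[[1, 0], [0, 1], [0.9, 0.1]]` and query `[1, 0]`, the ranking is document 0, then 2, then 1.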
Single Image 3D Object Detection and Pose Estimation for Grasping
We present a novel approach for detecting objects and estimating their 3D pose in single images of cluttered scenes. Objects are given in terms of 3D models without accompanying texture cues. A deformable parts-based model is trained on clusters of silhouettes of similar poses and produces hypotheses about possible object locations at test time. Objects are simultaneously segmented and verified inside each hypothesis bounding region by selecting the set of superpixels whose collective shape matches the model silhouette. A final iteration on the 6-DOF object pose minimizes the distance between the selected image contours and the actual projection of the 3D model. We demonstrate successful grasps using our detection and pose estimate with a PR2 robot. Extensive evaluation with a novel ground truth dataset shows the considerable benefit of using shape-driven cues for detecting objects in heavily cluttered scenes.
Menglong Zhu, Konstantinos Derpanis, Yinfei Yang, Samarth Brahmbhatt, Mabel Zhang, Cody Phillips and Kostas Daniilidis
ICRA 2014
From Actemes to Action: A Strongly-supervised Representation for Detailed Action Understanding
This paper presents a novel approach for analyzing human actions in non-scripted, unconstrained video settings based on volumetric, x-y-t, patch classifiers, termed actemes. Unlike previous action-related work, the discovery of patch classifiers is posed as a strongly-supervised process. Specifically, keypoint labels (e.g., position) across spacetime are used in a data-driven training process to discover patches that are highly clustered in the spacetime keypoint configuration space. To support this process, a new human action dataset consisting of challenging consumer videos is introduced, where notably the action label, the 2D position of a set of keypoints and their visibilities are provided for each video frame. On a novel input video, each acteme is used in a sliding volume scheme to yield a set of sparse, non-overlapping detections. These detections provide the intermediate substrate for segmenting out the action. For action classification, the proposed representation shows significant improvement over state-of-the-art low-level features, while providing spatiotemporal localization as additional output. This output sheds further light into detailed action understanding.
Weiyu Zhang, Menglong Zhu and Konstantinos Derpanis
ICCV 2013
Action Spotting and Recognition Based on a Spatiotemporal Orientation Analysis
This paper provides a unified framework for the interrelated topics of action spotting, the spatiotemporal detection and localization of human actions in video, and action recognition, the classification of a given video into one of several predefined categories. A novel compact local descriptor of video dynamics in the context of action spotting and recognition is introduced based on visual spacetime oriented energy measurements. This descriptor is efficiently computed directly from raw image intensity data and thereby forgoes the problems typically associated with flow-based features. Importantly, the descriptor allows for the comparison of the underlying dynamics of two spacetime video segments irrespective of spatial appearance, such as differences induced by clothing, and with robustness to clutter. An associated similarity measure is introduced that admits efficient exhaustive search for an action template, derived from a single exemplar video, across candidate video sequences. The general approach presented for action spotting and recognition is amenable to efficient implementation, which is deemed critical for many important applications. For action spotting, details of a real-time GPU-based instantiation of the proposed approach are provided. Empirical evaluation of both action spotting and action recognition on challenging datasets suggests the efficacy of the proposed approach, with state-of-the-art performance documented on standard datasets.
Konstantinos Derpanis, Mikhail Sizintsev, Kevin J. Cannons, and Richard Wildes
PAMI 2013
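The notion of spacetime oriented energy underlying the descriptor above can be conveyed with a simplified sketch: project the spatiotemporal gradient of an x-y-t volume onto a direction and square it. The paper itself uses banks of broadly tuned oriented filters; this gradient projection is only a stand-in for a single orientation.

```python
import numpy as np

def oriented_energy(volume, direction):
    """Pointwise oriented energy of an x-y-t volume (simplified sketch).

    Projects the spatiotemporal gradient (d/dt, d/dy, d/dx) onto a unit
    direction and squares it, yielding one energy value per voxel.
    """
    g = np.stack(np.gradient(volume.astype(float)))  # shape (3, T, H, W)
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    return np.tensordot(d, g, axes=1) ** 2

# A sequence that is static over time: all spatial structure, no change,
# so the energy along the pure temporal direction (1, 0, 0) vanishes.
static = np.tile(np.arange(12.0).reshape(3, 4), (5, 1, 1))
```

By the same token, the static ramp pattern carries uniform energy along the pure spatial x direction (0, 0, 1), illustrating how different spacetime orientations separate dynamics from appearance.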
Dynamic Scene Understanding: The Role of Orientation Features in Space and Time in Scene Classification
Natural scene classification is a fundamental challenge in computer vision. By far, the majority of studies have limited their scope to scenes from single image stills and thereby ignore potentially informative temporal cues. The current paper is concerned with determining the degree of performance gain in considering short videos for recognizing natural scenes. Towards this end, the impact of multiscale orientation measurements on scene classification is systematically investigated, as related to: (i) spatial appearance, (ii) temporal dynamics and (iii) joint spatial appearance and dynamics. These measurements in visual space, x-y, and spacetime, x-y-t, are recovered by a bank of spatiotemporal oriented energy filters. In addition, a new data set is introduced that contains 420 image sequences spanning fourteen scene categories, with temporal scene information due to objects and surfaces decoupled from camera-induced ones. This data set is used to evaluate classification performance of the various orientation-related representations, as well as state-of-the-art alternatives. It is shown that a notable performance increase is realized by spatiotemporal approaches in comparison to purely spatial or purely temporal methods.
Konstantinos Derpanis, Matthieu Lecce, Kostas Daniilidis and Richard Wildes
CVPR 2012
Spacetime Texture Representation and Recognition Based on a Spatiotemporal Orientation Analysis
This paper is concerned with the representation and recognition of the observed dynamics (i.e., excluding purely spatial appearance cues) of spacetime texture based on a spatiotemporal orientation analysis. The term “spacetime texture” is taken to refer to patterns in visual spacetime, x-y-t, that primarily are characterized by the aggregate dynamic properties of elements or local measurements accumulated over a region of spatiotemporal support, rather than in terms of the dynamics of individual constituents. Examples include image sequences of natural processes that exhibit stochastic dynamics (e.g., fire, water, and windblown vegetation) as well as images of simpler dynamics when analyzed in terms of aggregate region properties (e.g., uniform motion of elements in imagery, such as pedestrians and vehicular traffic). Spacetime texture representation and recognition is important as it provides an early means of capturing the structure of an ensuing image stream in a meaningful fashion. Toward such ends, a novel approach to spacetime texture representation and an associated recognition method are described based on distributions (histograms) of spacetime orientation structure. Empirical evaluation on both standard and original image data sets shows the promise of the approach, including significant improvement over alternative state-of-the-art approaches in recognizing the same pattern from different viewpoints.
Konstantinos Derpanis and Richard Wildes
PAMI 2012
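Since the representation above is a distribution (histogram) of spacetime orientation structure, recognition comes down to comparing histograms. As a sketch, the chi-square distance is a common choice for this kind of comparison (the specific measure used in the paper may differ):

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two orientation-energy histograms.

    Both histograms are normalized to sum to one; `eps` guards against
    division by zero in empty bins. Identical histograms score 0.
    """
    h1 = np.asarray(h1, float) / np.sum(h1)
    h2 = np.asarray(h2, float) / np.sum(h2)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

A query texture would then be assigned the label of the nearest stored histogram under this distance.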
The Structure of Multiplicative Motions in Natural Imagery
A theoretical investigation of the frequency structure of multiplicative image motion signals is presented, e.g., as associated with translucency phenomena. Previous work has claimed that the multiplicative composition of visual signals generally results in the annihilation of oriented structure in the spectral domain. As a result, research has focused on multiplicative signals in highly specialized scenarios where highly structured spectral signatures are prevalent, or introduced a nonlinearity to transform the multiplicative image signal to an additive one. In contrast, in this paper, it is shown that oriented structure is present in multiplicative cases when natural domain constraints are taken into account. This analysis suggests that the various instances of naturally occurring multiple motion structures can be treated in a unified manner. As an example application of the developed theory, a multiple motion estimator previously proposed for translation, additive transparency, and occlusion is adapted to multiplicative image motions. This estimator is shown to yield superior performance over the alternative practice of introducing a nonlinear preprocessing step.
Konstantinos Derpanis and Richard Wildes
PAMI 2009
On the Role of Representation in the Analysis of Visual Spacetime
The problems under consideration in this dissertation centre around the representation of visual spacetime, i.e., (visual) image intensity (irradiance) as a function of two-dimensional spatial position and time. In particular, the overarching goal is to establish a unified approach to representation and analysis of temporal image dynamics that is broadly applicable to the diverse phenomena in the natural world as captured in two-dimensional intensity images. Previous research largely has approached the analysis of visual dynamics by appealing to representations based on image motion. Although of obvious importance, motion represents a particular instance of the myriad spatiotemporal patterns observed in image data. A generative model centred on the concept of spacetime orientation is proposed. This model provides a unified framework for understanding a broad set of important spacetime patterns. As a consequence of this analysis, two new classes of patterns are distinguished that have previously not been considered directly in terms of their constituent spacetime oriented structure, namely multiplicative motions (e.g., translucency) and stochastic-related phenomena (e.g., wavy water and windblown vegetation). Motivated by this analysis, a representation is proposed that systematically exposes the structure of visual spacetime in terms of local, spacetime orientation. The power of this representation is demonstrated in the context of the following four fundamental challenges in computer vision: (i) spacetime texture recognition, (ii) spacetime grouping, (iii) local boundary detection and (iv) the detection and spatiotemporal localization of an action in a video stream.
Konstantinos Derpanis (Supervisor: Richard Wildes)
Dissertation, York University, 2010
CIPPRS 2010 Doctoral Dissertation Award, Honorable Mention