¹ The University of Tokyo
² National Institute of Advanced Industrial Science and Technology (AIST)
Overview figure: input image → depth map → triangular-patch-cloud → 3D mesh.
We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs). Point clouds are commonly used for CNN-based 3D scene reconstruction; however, they have two drawbacks: (1) they are redundant as a representation of planar surfaces, and (2) they provide no spatial relationships between points (e.g., texture and surface). As a more efficient representation, we introduce the triangular-patch-cloud, which represents the surface of the 3D structure as a set of triangular patches, and propose a CNN framework for estimating its 3D structure. In our framework, the triangular-patch-cloud is created by separating all the faces of a 2D mesh determined adaptively from the input image, and the depth and normal of every face are estimated. On a common RGB-D dataset, we show that our representation achieves better or comparable performance to existing point-cloud-based methods while using far fewer parameters.
We introduce a novel representation, the triangular-patch-cloud, composed of triangular patches. It is created by separating all the faces of a 2D mesh that is adaptively determined from the input image, and the 3D position (depth + normal) of every patch is predicted by a CNN. This representation is more efficient than a point cloud, while its rendered depth map performs better than or comparably to those of existing CNN-based point-cloud methods.
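To illustrate what a per-face prediction means geometrically, the sketch below lifts one 2D triangle to a 3D planar patch given camera intrinsics, a depth at the face centroid, and a unit normal. This is a minimal NumPy sketch under that assumed parameterization, not the authors' released implementation; the function and variable names are hypothetical.

```python
import numpy as np

def backproject_patch(K, tri_uv, centroid_uv, depth, normal):
    """Lift a 2D triangle to a 3D planar patch from one depth + normal.

    K           : (3, 3) camera intrinsics
    tri_uv      : (3, 2) pixel coordinates of the triangle vertices
    centroid_uv : (2,)   pixel coordinates of the face centroid
    depth       : predicted z-depth at the centroid
    normal      : (3,)   predicted unit normal of the face
    Returns a (3, 3) array of 3D vertex positions in camera coordinates.
    """
    K_inv = np.linalg.inv(K)

    # 3D point of the centroid: scale its viewing ray so that z == depth.
    ray_c = K_inv @ np.array([centroid_uv[0], centroid_uv[1], 1.0])
    center_3d = depth * ray_c / ray_c[2]

    # Intersect each vertex ray with the plane n . (X - center_3d) = 0.
    verts_3d = []
    for u, v in tri_uv:
        ray = K_inv @ np.array([u, v, 1.0])
        t = np.dot(normal, center_3d) / np.dot(normal, ray)
        verts_3d.append(t * ray)
    return np.stack(verts_3d)
```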
Quantitative (top) and qualitative (bottom) results showing our depth map rendered from the predicted triangular-patch-cloud. We evaluate our method on the NYU Depth v2 dataset [Silberman et al., ECCV'12] using standard depth error metrics. Our results achieve better or comparable performance to existing deep depth prediction methods ([Eigen et al., ICCV'15], [Laina et al., 3DV'16], [Fu et al., CVPR'18]), despite using far fewer parameters.
Method | RMSE | iRMSE | AbsRel | δ<1.25 | δ<1.25² | δ<1.25³ | #Param. |
---|---|---|---|---|---|---|---|
[Eigen et al., ICCV'15] | .5781 | .1028 | .1635 | .7533 | .9444 | .9866 | 307K |
[Laina et al., 3DV'16] | .5112 | .0933 | .1368 | .8149 | .9538 | .9881 | 307K |
[Fu et al., CVPR'18] | .5459 | .1007 | .1287 | .8258 | .9431 | .9776 | 307K |
Ours | .5102 | .0905 | .1411 | .8178 | .9577 | .9893 | 30K |
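For reference, the error columns above are the standard single-view depth metrics. The sketch below shows how they are commonly computed from a predicted and a ground-truth depth map, assuming invalid ground-truth pixels have already been masked out; it is illustrative, not the evaluation code used for this table.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics on valid pixels (depths in meters, same shape)."""
    pred, gt = pred.ravel(), gt.ravel()
    rmse   = np.sqrt(np.mean((pred - gt) ** 2))               # RMSE
    irmse  = np.sqrt(np.mean((1.0 / pred - 1.0 / gt) ** 2))   # iRMSE
    absrel = np.mean(np.abs(pred - gt) / gt)                   # AbsRel
    ratio  = np.maximum(pred / gt, gt / pred)                   # max(d/d*, d*/d)
    delta1 = np.mean(ratio < 1.25)                              # δ < 1.25
    delta2 = np.mean(ratio < 1.25 ** 2)                         # δ < 1.25²
    delta3 = np.mean(ratio < 1.25 ** 3)                         # δ < 1.25³
    return rmse, irmse, absrel, delta1, delta2, delta3
```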
Since our depth map is rendered from a 2D mesh extracted along Canny edges, its object boundaries are sharper than those of pixel-wise deep depth prediction methods.
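One simple way to obtain such an edge-aligned 2D mesh, sketched below, is to sample points along Canny edges, add a coarse grid of anchor points, and Delaunay-triangulate them. This is an illustrative approximation using OpenCV and SciPy, not necessarily the exact mesh-extraction procedure of the paper; the parameters are placeholders.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def edge_aware_mesh(image_gray, step=10, grid_spacing=40):
    """Build a rough edge-aligned 2D mesh from a uint8 grayscale image."""
    h, w = image_gray.shape
    edges = cv2.Canny(image_gray, 100, 200)           # binary Canny edge map
    ys, xs = np.nonzero(edges)
    edge_pts = np.stack([xs, ys], axis=1)[::step]      # subsample edge pixels

    # Coarse grid of anchor points so textureless regions are covered too.
    gx, gy = np.meshgrid(np.arange(0, w, grid_spacing),
                         np.arange(0, h, grid_spacing))
    grid_pts = np.stack([gx.ravel(), gy.ravel()], axis=1)

    pts = np.concatenate([edge_pts, grid_pts], axis=0)
    tri = Delaunay(pts)                                 # 2D triangulation
    return pts, tri.simplices                           # vertices, face indices
```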
Qualitative comparison (columns): Input image | GT | Ours | Laina et al. | Fu et al.
These are our predictions of the 3D triangular-patch-cloud (top) and the corresponding 3D mesh (bottom). A 3D mesh can be obtained easily from the triangular-patch-cloud by connecting adjacent neighboring patches. This not only reduces the number of parameters (vertices) of the 3D structure, but also makes the visualization cleaner.
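A minimal sketch of this connection step is shown below: vertices of neighboring patches that nearly coincide are welded into a single vertex and the face indices are remapped. The tolerance-based welding is an assumption for illustration, not the authors' exact procedure.

```python
import numpy as np

def weld_patches(verts, faces, tol=1e-2):
    """Merge a triangular-patch-cloud into a connected mesh.

    verts : (3*F, 3) array, three independent 3D vertices per face
    faces : (F, 3)   array of indices into verts
    tol   : distance below which vertices are considered the same
    """
    # Quantize coordinates so nearby duplicates hash to the same key.
    keys = np.round(verts / tol).astype(np.int64)
    _, unique_idx, inverse = np.unique(keys, axis=0,
                                       return_index=True, return_inverse=True)
    merged_verts = verts[unique_idx]
    merged_faces = inverse[faces]      # remap each face to the merged vertices
    return merged_verts, merged_faces
```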
Our method also works well in other indoor scenes; these are some results of our 3D mesh prediction.
@misc{kaneko19tridepth,
  Author    = {Masaya Kaneko and Ken Sakurada and Kiyoharu Aizawa},
  Title     = {TriDepth: Triangular Patch-based Deep Depth Prediction},
  Booktitle = {ICCV Deep Learning for Visual SLAM Workshop},
  Year      = {2019},
}
We thank Shizuma Kubo for modifying the code of this webpage, whose template design was borrowed from the keypointnet webpage.