¹ The University of Tokyo
² National Institute of Advanced Industrial Science and Technology (AIST)
(Teaser figure: input image → depth map → triangular-patch-cloud → 3D mesh)
We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs). Point clouds are commonly used for CNN-based 3D scene reconstruction; however, they have two drawbacks: (1) they are redundant as a representation of planar surfaces, and (2) they carry no spatial relationships between points (e.g., texture and surface). As a more efficient representation, we introduce the triangular-patch-cloud, which represents the surface of the 3D structure using a set of triangular patches, and we propose a CNN framework for estimating its 3D structure. In our framework, the triangular-patch-cloud is created by separating all the faces of a 2D mesh determined adaptively from the input image, and the depth and normal of every face are estimated. On a common RGB-D dataset, we show that our representation achieves better or comparable performance to existing point-cloud-based methods, despite using far fewer parameters.
- We introduce a novel representation, the triangular-patch-cloud, composed of triangular patches.
- It is created by separating all the faces of a 2D mesh that is adaptively determined from the input image.
- The 3D pose (depth + normal) of every patch is predicted by a CNN (see the sketch below).
- This representation is more efficient than a point cloud, while its rendered depth map performs better than or comparably to those of existing CNN-based point-cloud methods.
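As a rough illustration of the geometry (not the authors' code; `faces_2d`, `depths`, `normals`, and the intrinsics `K` are hypothetical names), the sketch below back-projects each 2D triangular face into a 3D planar patch from its predicted depth and normal, assuming a pinhole camera model:

```python
import numpy as np

def backproject_patches(faces_2d, depths, normals, K):
    """Lift 2D triangular faces to 3D planar patches.

    faces_2d : (F, 3, 2) pixel coordinates of each triangle's vertices
    depths   : (F,) predicted depth at each face centroid
    normals  : (F, 3) predicted unit normal of each face (camera frame)
    K        : (3, 3) pinhole camera intrinsics
    Returns  : (F, 3, 3) 3D vertex positions of each triangular patch
    """
    K_inv = np.linalg.inv(K)
    faces_3d = np.empty((len(faces_2d), 3, 3))
    for i, (tri, d, n) in enumerate(zip(faces_2d, depths, normals)):
        # Anchor point of the patch plane: centroid back-projected at depth d.
        c_ray = K_inv @ np.append(tri.mean(axis=0), 1.0)
        p0 = d * c_ray / c_ray[2]
        for j, uv in enumerate(tri):
            ray = K_inv @ np.append(uv, 1.0)   # viewing ray of this vertex
            # Intersect the ray with the plane (p - p0) . n = 0.
            t = (n @ p0) / (n @ ray)
            faces_3d[i, j] = t * ray
    return faces_3d
```

Each patch stays independent here; connecting adjacent patches into a single mesh is sketched further below.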
Quantitative (top) and qualitative (bottom) results for our depth maps rendered from the predicted triangular-patch-cloud. We evaluate our method on the NYU Depth v2 dataset [Silberman et al., ECCV'12] using standard depth error metrics (sketched in code below the table). Our results achieve better or comparable performance to existing deep depth prediction methods ([Eigen et al., ICCV'15], [Laina et al., 3DV'16], [Fu et al., CVPR'18]), despite using far fewer parameters.
Method | RMSE | iRMSE | AbsRel | δ1 | δ2 | δ3 | #Param. |
---|---|---|---|---|---|---|---|
[Eigen et al., ICCV'15] | .5781 | .1028 | .1635 | .7533 | .9444 | .9866 | 307K |
[Laina et al., 3DV'16] | .5112 | .0933 | .1368 | .8149 | .9538 | .9881 | 307K |
[Fu et al., CVPR'18] | .5459 | .1007 | .1287 | .8258 | .9431 | .9776 | 307K |
Ours | .5102 | .0905 | .1411 | .8178 | .9577 | .9893 | 30K |
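For reference, here is a minimal sketch of the error metrics reported above (our own implementation for illustration, not the evaluation code used in the paper):

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth-estimation error metrics over valid pixels."""
    mask = gt > 0                        # ignore missing ground-truth depth
    pred, gt = pred[mask], gt[mask]
    rmse   = np.sqrt(np.mean((pred - gt) ** 2))
    irmse  = np.sqrt(np.mean((1.0 / pred - 1.0 / gt) ** 2))
    absrel = np.mean(np.abs(pred - gt) / gt)
    ratio  = np.maximum(pred / gt, gt / pred)
    deltas = [np.mean(ratio < 1.25 ** k) for k in (1, 2, 3)]  # δ1, δ2, δ3
    return rmse, irmse, absrel, deltas
```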
Since our depth map is rendered from a 2D mesh extracted along Canny edges, its object boundaries are sharper than those of pixel-wise deep depth prediction methods.
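The paper defines the exact adaptive mesh extraction; as a loose stand-in, the sketch below samples vertices along Canny edges and triangulates them with a Delaunay triangulation (`stride` and the edge thresholds are arbitrary illustrative values):

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def edge_aware_mesh(image, stride=8, low=50, high=150):
    """Build a rough edge-aligned 2D triangle mesh (a simplified stand-in
    for the paper's adaptive mesh extraction)."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)
    ys, xs = np.nonzero(edges)
    pts = np.stack([xs, ys], axis=1)[::stride]      # subsample edge pixels
    h, w = gray.shape
    corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
    verts = np.vstack([pts, corners])               # cover the whole image
    tri = Delaunay(verts)                           # 2D triangulation
    return verts, tri.simplices                     # (V, 2), (F, 3)
```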
(Qualitative comparison grid, columns: Input image | GT | Ours | Laina et al. | Fu et al.)
Predictions of our 3D triangular-patch-cloud (top) and the corresponding 3D mesh (bottom). A 3D mesh can easily be obtained from the triangular-patch-cloud by connecting adjacent patches (sketched in code below). This not only reduces the parameter count (the number of vertices) of the 3D structure, but also produces a cleaner visualization.
(Image grid: predicted triangular-patch-clouds (top) and corresponding 3D meshes (bottom))
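A minimal sketch of that patch-to-mesh step, under our own simplifying assumption that adjacent patches can be welded by snapping nearby vertices together (`tol` is an arbitrary illustrative threshold):

```python
import numpy as np

def weld_patches(faces_3d, tol=1e-2):
    """Merge a triangular-patch-cloud (F, 3, 3) into a shared-vertex mesh.

    Vertices closer than `tol` are snapped together, so adjacent patches
    become connected faces of a single mesh.
    """
    verts = faces_3d.reshape(-1, 3)
    # Quantize coordinates so nearby vertices map to the same key.
    keys = np.round(verts / tol).astype(np.int64)
    _, first, inverse = np.unique(keys, axis=0,
                                  return_index=True, return_inverse=True)
    merged_verts = verts[first]             # one vertex per cluster
    merged_faces = inverse.reshape(-1, 3)   # triangles re-indexed into clusters
    return merged_verts, merged_faces
```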
Our method also works well in other indoor scenes. Below are additional results of our 3D mesh prediction.
@misc{kaneko19tridepth,
  author    = {Masaya Kaneko and Ken Sakurada and Kiyoharu Aizawa},
  title     = {TriDepth: Triangular Patch-based Deep Depth Prediction},
  booktitle = {ICCV Deep Learning for Visual SLAM Workshop},
  year      = {2019},
}
We thank Shizuma Kubo for modifying the code of this webpage, whose template design was borrowed from the keypointnet webpage.