¹ The University of Tokyo
² National Institute of Advanced Industrial Science and Technology (AIST)
Overview figure: input image → depth map → triangular-patch-cloud → 3D mesh.
We propose a novel and efficient representation for single-view depth estimation using Convolutional Neural Networks (CNNs). Point clouds are commonly used for CNN-based 3D scene reconstruction; however, they have two drawbacks: (1) they are redundant as a representation of planar surfaces, and (2) they provide no spatial relationships between points (e.g., texture and surface). As a more efficient representation, we introduce the triangular-patch-cloud, which represents the surface of the 3D structure as a set of triangular patches, and propose a CNN framework for estimating its 3D structure. In our framework, the triangular-patch-cloud is created by separating all the faces of a 2D mesh determined adaptively from the input image, and the depth and normal of every face are estimated. On a common RGB-D dataset, we show that our representation achieves better or comparable performance to existing point-cloud-based methods while using far fewer parameters.
We introduce a novel representation, the triangular-patch-cloud, composed of triangular patches. It is created by separating all the faces of a 2D mesh that is adaptively determined from the input image, and the 3D position (depth + normal) of every patch is predicted by a CNN. This representation is more efficient than a point cloud, while its rendered depth map performs better than or comparably to those of existing CNN-based point-cloud methods.
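To illustrate what a per-face prediction means geometrically, the sketch below lifts one 2D triangle to a 3D planar patch given camera intrinsics, a depth at the face centroid, and a unit normal. This is a minimal NumPy sketch under that assumed parameterization, not the authors' released implementation; the function and variable names are hypothetical.

```python
import numpy as np

def backproject_patch(K, tri_uv, centroid_uv, depth, normal):
    """Lift a 2D triangle to a 3D planar patch from one depth + normal.

    K           : (3, 3) camera intrinsics
    tri_uv      : (3, 2) pixel coordinates of the triangle vertices
    centroid_uv : (2,)   pixel coordinates of the face centroid
    depth       : predicted z-depth at the centroid
    normal      : (3,)   predicted unit normal of the face
    Returns a (3, 3) array of 3D vertex positions in camera coordinates.
    """
    K_inv = np.linalg.inv(K)

    # 3D point of the centroid: scale its viewing ray so that z == depth.
    ray_c = K_inv @ np.array([centroid_uv[0], centroid_uv[1], 1.0])
    center_3d = depth * ray_c / ray_c[2]

    # Intersect each vertex ray with the plane n . (X - center_3d) = 0.
    verts_3d = []
    for u, v in tri_uv:
        ray = K_inv @ np.array([u, v, 1.0])
        t = np.dot(normal, center_3d) / np.dot(normal, ray)
        verts_3d.append(t * ray)
    return np.stack(verts_3d)
```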
Quantitative (top) and qualitative (bottom) results showing our depth map rendered from the predicted triangular-patch-cloud. We evaluate our method on the NYU Depth v2 dataset [Silberman et al., ECCV'12] using standard depth error metrics. Our results achieve better or comparable performance to existing deep depth prediction methods ([Eigen et al., ICCV'15], [Laina et al., 3DV'16], [Fu et al., CVPR'18]), despite using far fewer parameters.
Method | RMSE | iRMSE | AbsRel | δ<1.25 | δ<1.25² | δ<1.25³ | #Param. |
---|---|---|---|---|---|---|---|
[Eigen et al., ICCV'15] | .5781 | .1028 | .1635 | .7533 | .9444 | .9866 | 307K |
[Laina et al., 3DV'16] | .5112 | .0933 | .1368 | .8149 | .9538 | .9881 | 307K |
[Fu et al., CVPR'18] | .5459 | .1007 | .1287 | .8258 | .9431 | .9776 | 307K |
Ours | .5102 | .0905 | .1411 | .8178 | .9577 | .9893 | 30K |
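For reference, the error columns above are the standard single-view depth metrics. The sketch below shows how they are commonly computed from a predicted and a ground-truth depth map, assuming invalid ground-truth pixels have already been masked out; it is illustrative, not the evaluation code used for this table.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard depth metrics on valid pixels (depths in meters, same shape)."""
    pred, gt = pred.ravel(), gt.ravel()
    rmse   = np.sqrt(np.mean((pred - gt) ** 2))               # RMSE
    irmse  = np.sqrt(np.mean((1.0 / pred - 1.0 / gt) ** 2))   # iRMSE
    absrel = np.mean(np.abs(pred - gt) / gt)                   # AbsRel
    ratio  = np.maximum(pred / gt, gt / pred)                   # max(d/d*, d*/d)
    delta1 = np.mean(ratio < 1.25)                              # δ < 1.25
    delta2 = np.mean(ratio < 1.25 ** 2)                         # δ < 1.25²
    delta3 = np.mean(ratio < 1.25 ** 3)                         # δ < 1.25³
    return rmse, irmse, absrel, delta1, delta2, delta3
```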
Since our depth map is rendered from a 2D mesh extracted along Canny edges, its object boundaries are sharper than those of pixel-wise deep depth prediction methods.
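One simple way to obtain such an edge-aligned 2D mesh, sketched below, is to sample points along Canny edges, add a coarse grid of anchor points, and Delaunay-triangulate them. This is an illustrative approximation using OpenCV and SciPy, not necessarily the exact mesh-extraction procedure of the paper; the parameters are placeholders.

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def edge_aware_mesh(image_gray, step=10, grid_spacing=40):
    """Build a rough edge-aligned 2D mesh from a uint8 grayscale image."""
    h, w = image_gray.shape
    edges = cv2.Canny(image_gray, 100, 200)           # binary Canny edge map
    ys, xs = np.nonzero(edges)
    edge_pts = np.stack([xs, ys], axis=1)[::step]      # subsample edge pixels

    # Coarse grid of anchor points so textureless regions are covered too.
    gx, gy = np.meshgrid(np.arange(0, w, grid_spacing),
                         np.arange(0, h, grid_spacing))
    grid_pts = np.stack([gx.ravel(), gy.ravel()], axis=1)

    pts = np.concatenate([edge_pts, grid_pts], axis=0)
    tri = Delaunay(pts)                                 # 2D triangulation
    return pts, tri.simplices                           # vertices, face indices
```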
Qualitative comparison (columns): Input image | GT | Ours | Laina et al. | Fu et al.
These are our predictions of the 3D triangular-patch-cloud (top) and the corresponding 3D mesh (bottom). A 3D mesh can be obtained easily from the triangular-patch-cloud by connecting adjacent neighboring patches. This not only reduces the number of parameters (vertices) of the 3D structure, but also makes the visualization cleaner.
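A minimal sketch of this connection step is shown below: vertices of neighboring patches that nearly coincide are welded into a single vertex and the face indices are remapped. The tolerance-based welding is an assumption for illustration, not the authors' exact procedure.

```python
import numpy as np

def weld_patches(verts, faces, tol=1e-2):
    """Merge a triangular-patch-cloud into a connected mesh.

    verts : (3*F, 3) array, three independent 3D vertices per face
    faces : (F, 3)   array of indices into verts
    tol   : distance below which vertices are considered the same
    """
    # Quantize coordinates so nearby duplicates hash to the same key.
    keys = np.round(verts / tol).astype(np.int64)
    _, unique_idx, inverse = np.unique(keys, axis=0,
                                       return_index=True, return_inverse=True)
    merged_verts = verts[unique_idx]
    merged_faces = inverse[faces]      # remap each face to the merged vertices
    return merged_verts, merged_faces
```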
Our method also works well in other indoor scenes; these are some results of our 3D mesh prediction.
@misc{kaneko19tridepth,
  Author    = {Masaya Kaneko and Ken Sakurada and Kiyoharu Aizawa},
  Title     = {TriDepth: Triangular Patch-based Deep Depth Prediction},
  Booktitle = {ICCV Deep Learning for Visual SLAM Workshop},
  Year      = {2019},
}
We thank Shizuma Kubo for modifying the code of this webpage, whose template design was borrowed from the keypointnet webpage.