How to Create Volumetric Video Without Owning a Kinect?
Hi everyone, I'm Naoya Iwamoto, a researcher in the computer graphics field. I'm interested in character animation technologies: deformation, facial animation, and motion synthesis and control.

Recently I've been focusing on Volumetric Video (VV) technology, which captures the dynamic movement of live subjects. Since the start of this year I've read several research papers and run some experiments on making VV. In this article, I'd like to share the tips I picked up from the experiment shown below (Fig. 1).


How to Start Volumetric Video?


According to my sources, the system in a volumetric video studio normally generates a point cloud or mesh from around 100 synchronized DSLR cameras, much like a photogrammetry setup. Such Multi-View Stereo (MVS) approaches produce dense point clouds, but they are far too expensive, right? Don't worry. You can start with a few depth sensors such as the Azure Kinect, which directly and independently output point clouds. The result is a bit noisy, but the noise can be reduced in post-processing.


Dataset using Multi-View Kinect


At first I considered using Azure Kinect for my experiment, but it was still too expensive for me (around 450 USD per Kinect), and capturing 360-degree VV requires at least 3 or 4 of them. Then I found the PanopticStudio 3D PointCloud Dataset, a 3D point cloud dataset captured with 10 Kinect v2 sensors.


Unfortunately, the dataset is not distributed as 3D point clouds. Instead it provides the color video and depth acquired from each camera, the camera parameters (position, orientation, focal length and distortion), and time stamps for camera synchronization. In other words, you need to run the code in their GitHub repository to generate the point clouds yourself, synchronizing the images from each Kinect. Furthermore, their point cloud generation code is written in Matlab, which requires a license, so I rewrote it in Python using the Open3D library (described later). It's free :)
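As a rough sketch of the per-Kinect back-projection, here is how a single depth frame can be turned into a point cloud with Open3D. The file name, intrinsic values and the identity extrinsic are placeholders; the real values come from the dataset's calibration and synchronization files.

```python
import numpy as np
import open3d as o3d

# Placeholder path and calibration values; take the real ones from the
# PanopticStudio calibration/sync files for each Kinect.
depth = o3d.io.read_image("kinect_01/depth/frame_000100.png")
intrinsic = o3d.camera.PinholeCameraIntrinsic(512, 424, 365.0, 365.0, 256.0, 212.0)
extrinsic = np.eye(4)  # world-to-camera transform of this Kinect

# Back-project the depth image into a 3D point cloud.
pcd = o3d.geometry.PointCloud.create_from_depth_image(
    depth, intrinsic, extrinsic,
    depth_scale=1000.0)  # assuming depth is stored in millimetres

# Repeating this for all 10 Kinects and merging the clouds (pcd_a + pcd_b)
# gives the full point cloud for one frame.
```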


In the end I was able to export the merged point cloud from the multi-view Kinects as a .ply file for each frame, as shown in Fig. 4. A colored point cloud is already quite large for a single frame, so a whole sequence becomes even larger. I therefore also implemented code to remove the points belonging to the wall (dome), based on a threshold on the depth values in each depth image. This brought the point cloud down to about 25MB per frame.
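The wall removal itself is just a cut on the raw depth image before back-projection; a minimal sketch (with a made-up threshold and file names) could look like this:

```python
import cv2  # any 16-bit PNG reader/writer works

# Kinect V2 depth stored as a 16-bit PNG, assumed to be in millimetres.
depth = cv2.imread("kinect_01/depth/frame_000100.png", cv2.IMREAD_UNCHANGED)

# Zero out everything beyond the capture volume so the dome wall is never
# back-projected; zero depth is skipped when the point cloud is built.
MAX_DEPTH_MM = 3500  # hypothetical threshold, tune it to the dome radius
depth[depth > MAX_DEPTH_MM] = 0

cv2.imwrite("kinect_01/depth_clipped/frame_000100.png", depth)
```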

Making the colored point cloud was slightly tricky because the color and depth resolutions of the Kinect differ (color: 1920x1080 .jpg, depth: 512x424 .png). To pick up color information, the point cloud generated from depth has to be projected into the color image space; after sampling a color for each point, I projected the points back into 3D space. Honestly, you can skip this step, because texture mapping is applied later anyway.
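For reference, here is a minimal sketch of that color pick-up, assuming the points are already expressed in the color camera's coordinate frame and ignoring lens distortion. The intrinsic values and file names below are placeholders.

```python
import numpy as np
import open3d as o3d
import cv2

# Placeholder colour-camera intrinsics; use the dataset's calibration instead.
K_color = np.array([[1050.0,    0.0, 960.0],
                    [   0.0, 1050.0, 540.0],
                    [   0.0,    0.0,   1.0]])

def pick_colors(points, color_img):
    """Project 3D points onto the colour image and sample one colour per point."""
    uvw = (K_color @ points.T).T          # homogeneous pixel coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]         # perspective divide
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, color_img.shape[1] - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, color_img.shape[0] - 1)
    return color_img[v, u, ::-1] / 255.0  # BGR -> RGB, normalised for Open3D

pcd = o3d.io.read_point_cloud("kinect_01/cloud/frame_000100.ply")
color_img = cv2.imread("kinect_01/color/frame_000100.jpg")
pcd.colors = o3d.utility.Vector3dVector(pick_colors(np.asarray(pcd.points), color_img))
o3d.io.write_point_cloud("kinect_01/cloud_colored/frame_000100.ply", pcd)
```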

Visualization on Blender


To visualize the point clouds, I used Blender 2.82 with the Point Cloud Visualizer add-on, which is very convenient for viewing colored point cloud sequences. Loading the point cloud sequence (110 frames in my case) took quite a long time, and it may require a lot of memory on your PC. Mine is a Windows 10 (64-bit) machine with 16GB RAM and a GeForce RTX2020.

This offline process (mainly the data loading) requires some patience. I'd like to explore how to achieve it in real time once I get hold of multiple Kinects in the near future :)

3D Library for Meshing


Point cloud generation and meshing can both be done with Open3D, a 3D-computer-vision-friendly library. I believe it is probably the best choice for beginners implementing an offline VV system. Obtaining a mesh (.ply) from a point cloud was easy [sample]. However, Open3D uses an OpenCV-like axis convention (X-right, Y-down, Z-front), so be careful when using its output in OpenGL-based software (Fig. 5).
Fig. 6 shows the mesh generated from the dataset, rendered with lighting in Blender. When you use the Truncated Signed Distance Function (TSDF) integration in Open3D for meshing, you can choose the voxel resolution parameter (I used 5.0 for Fig. 6); I'll describe the details later. Each frame's mesh ended up at less than 20MB.
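Below is a minimal sketch of that TSDF meshing step. The camera records, file names, voxel size and truncation values are placeholders, and it assumes the color images have already been registered to the depth resolution (otherwise integrate geometry only and rely on the MeshLab texturing step later). Newer Open3D releases expose the module as o3d.pipelines.integration; older ones use o3d.integration.

```python
import numpy as np
import open3d as o3d

# Hypothetical per-Kinect records for one synchronized frame; fill them in
# from the dataset's calibration and sync files.
cameras = [
    {"color_path": "kinect_01/color_reg/frame_000100.png",  # registered to 512x424
     "depth_path": "kinect_01/depth/frame_000100.png",
     "intrinsic": o3d.camera.PinholeCameraIntrinsic(512, 424, 365.0, 365.0, 256.0, 212.0),
     "extrinsic": np.eye(4)},
    # ... one entry per Kinect
]

volume = o3d.pipelines.integration.ScalableTSDFVolume(
    voxel_length=0.05,  # voxel size (placeholder; pick it to match your units)
    sdf_trunc=0.2,      # truncation distance of a few voxels
    color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)

# Fuse the depth (and colour) of every Kinect into one volume per frame.
for cam in cameras:
    rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
        o3d.io.read_image(cam["color_path"]),
        o3d.io.read_image(cam["depth_path"]),
        depth_scale=1000.0, depth_trunc=4.0, convert_rgb_to_intensity=False)
    volume.integrate(rgbd, cam["intrinsic"], cam["extrinsic"])

mesh = volume.extract_triangle_mesh()
mesh.compute_vertex_normals()

# Open3D keeps the OpenCV-style axes mentioned above; flip Y and Z before
# handing the mesh to OpenGL-style tools.
mesh.transform(np.diag([1.0, -1.0, -1.0, 1.0]))
o3d.io.write_triangle_mesh("frame_000100_mesh.ply", mesh)
```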


One more tip: the Blender add-on Stop-motion-OBJ is really convenient for loading sequences of meshes (.ply or .obj), even though loading takes a while. If you remove the floor vertices first, loading becomes faster. I also implemented a floor-removal function as a Blender script.
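My actual script isn't reproduced here, but a minimal Blender-Python sketch of such a floor removal, assuming the floor sits at or below a known height on the active object's local Z axis, could look like this:

```python
import bpy
import bmesh

FLOOR_Z = 0.02  # hypothetical floor height in the object's local units

obj = bpy.context.active_object
bm = bmesh.new()
bm.from_mesh(obj.data)

# Delete every vertex at or below the assumed floor level.
floor_verts = [v for v in bm.verts if v.co.z <= FLOOR_Z]
bmesh.ops.delete(bm, geom=floor_verts, context='VERTS')

bm.to_mesh(obj.data)
bm.free()
obj.data.update()
```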

3D Tool for Texture Mapping


Texture mapping was the most difficult part of this experiment, because there is no popular library or tool for it. Fortunately, I found a texture mapping tutorial using MeshLab. Its inputs are the mesh, the camera intrinsic/extrinsic parameters, and the color image acquired from each camera. Since I couldn't find much detail about MeshLab's texture mapping process, I spent some time automating the input of the mesh, color images and camera parameters as a batch process (necessary for applying it to every frame). In the end I achieved texture mapping (with a 2K texture resolution in my setup) for all frames (Fig. 7) with a process found by trial and error; a rough batch-driving sketch follows the steps below.

  1. Synthesize a MeshLab scene file (.mlp) with self-developed Python code
  2. Load it with a MeshLab script
  3. Apply the texture mapping filter from the MeshLab script
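To give a concrete picture of steps 2 and 3, a minimal sketch of driving MeshLab in batch mode with meshlabserver might look like this. All file names are placeholders: scene_XXXX.mlp stands for the per-frame project written in step 1 (mesh, rasters and camera parameters), and texturing.mlx for a filter script saved beforehand from the MeshLab GUI that runs the parameterization and texturing from the registered rasters.

```python
import subprocess

# One meshlabserver call per frame: load the per-frame project, run the saved
# texturing filter script, and write a textured mesh with texture coordinates.
for f in range(110):  # 110 frames in my sequence
    subprocess.run(
        ["meshlabserver",
         "-p", f"scene_{f:04d}.mlp",   # MeshLab project from step 1
         "-s", "texturing.mlx",        # filter script doing the texture mapping
         "-o", f"textured_{f:04d}.obj",
         "-m", "vt", "wt"],            # keep vertex/wedge texture coordinates
        check=True)
```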