Single-shot 3D imaging with point cloud projection based on metadevice


Metasurface-based projection for single-shot 3D imaging

It is well known that by engineering the material, geometry, and inner resonance effects of individual nanostructures, one can control the phase, amplitude, and polarisation of the transmitted wavefront at the subwavelength scale, allowing the metasurface to become a functional device in either real space or the frequency domain. Because a Fourier hologram based on a metasurface has a large numerical aperture owing to its subwavelength pixel scale, a projector based on a single metasurface has a small throw ratio and a long projection range. Figure 1 depicts the mechanism of 3D imaging using a compact projector based on a single-layer metasurface. The projection pattern is composed of a random point cloud, and the local pattern in a rectangular window (blue window in Fig. 1) is unique in the entire projection plane, so it can be identified from its spatial distribution. Meanwhile, the projection pattern is clear and satisfies the projective transformation in the entire Fraunhofer diffraction region (see Supplementary note 1), offering a complete and accurate mathematical description of the structured pattern in 3D space. Therefore, by combining different positions of the metasurface and the camera, it is possible to measure the 3D shape of an object based on the principle of triangulation.

It must be noted that the illumination pattern generated by the metasurface may not be identical to the designed pattern owing to speckle noise. To overcome this challenge, a calibration and reconstruction procedure based on a reference plane and auxiliary planes is proposed (see Supplementary notes 3 and 4). A reference plane and two auxiliary planes are required to record the practical pattern and to establish the relationship between depth and pattern shift based on the cross-ratio, one of the most important invariants of the perspective transform. Depth information is then obtained from the pattern offset or deformation of the captured images, as shown in Fig. 1. The search for the corresponding pattern between the target and reference images is critical for the depth calculation, and a matching algorithm based on the pattern characteristics is proposed. Therefore, our method can achieve single-shot 3D imaging, which is very useful for human-computer interaction, such as gesture recognition, as shown in Fig. 1.
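For readers who prefer code to prose, the following minimal Python sketch illustrates the underlying triangulation geometry with a single reference plane. The paper's actual calibration uses the cross-ratio with two auxiliary planes (Supplementary notes 3 and 4); the pinhole model, the 50 mm baseline, and the 3.45 µm pixel pitch below are illustrative assumptions rather than values from this work.

```python
import numpy as np  # numerics

def depth_from_shift(disparity_px, z_ref, baseline, focal_px):
    """Depth from the pattern shift observed against a reference plane.

    disparity_px : measured pattern shift (pixels) between target and reference
    z_ref        : reference-plane distance (same units as baseline)
    baseline     : projector-camera baseline (assumed value below)
    focal_px     : camera focal length expressed in pixels
    """
    # Similar triangles give d = f * b * (1/z - 1/z_ref); invert for z.
    return 1.0 / (disparity_px / (focal_px * baseline) + 1.0 / z_ref)

# Hypothetical numbers: 3.2 px shift, 50 mm baseline, 16 mm lens with an
# assumed 3.45 um pixel (~4638 px focal length), 300 mm reference plane.
print(depth_from_shift(3.2, 300.0, 50.0, 4638.0))  # ~298.8 mm
```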

Metasurface design and characterisation

To build a single-shot 3D imaging mechanism, the uniqueness of the local pattern within the entire projection pattern must be satisfied; thus, no additional pattern is needed to determine the corresponding point. M-array code35, a type of pseudo-random coding, ensures that the pattern of any sub-window appears only once in the whole pattern, achieving local uniqueness. Therefore, M-array coding is used to design the projection pattern, as shown in Fig. 2a. The total number of spots is designed to be 1201, which can be further increased by large-area processing. The density of the projection pattern, defined as the ratio of the total area of the bright spots to the area of the projection pattern, is 50%; a large information capacity guarantees a precise calculation of the depth value. First, the uniqueness of every bright spot is demonstrated by the Hamming distance36, which is quantified by

$$h({i}_{1},\,{j}_{1};{i}_{2},\,{j}_{2})=\mathop{\sum }\limits_{i=0}^{n-1}\mathop{\sum }\limits_{j=0}^{n-1}\delta \big(p({i}_{1}+i,\,{j}_{1}+j),\,p({i}_{2}+i,\,{j}_{2}+j)\big),\qquad \delta (a,b)=\begin{cases}0, & a=b\\ 1, & a\ne b\end{cases}$$

(1)

where h(i1, j1; i2, j2) is the Hamming distance between two sub-windows, centred at the points (i1, j1) and (i2, j2) with window size n × n, which serves as an indicator of the pattern difference between two sub-windows in the projection plane. The maximum Hamming distance is n × n. A larger Hamming distance h(i1, j1; i2, j2) indicates a larger diversity between the two spots, which can then be distinguished robustly under severe noise. For convenience, the statistical histogram of h(i1, j1; i2, j2) is shown in Fig. 2b with n = 4, which quantifies the local uniqueness over the entire projection plane. As shown, a zero Hamming distance does not exist, and Hamming distances below 4 occur with a proportion of less than 0.05, indicating that bright spots can be accurately identified from the spatial information of the adjacent spots. Therefore, taking advantage of the uniqueness of the local spatial information, a fast matching algorithm to determine the corresponding spots can be designed easily.
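A direct transcription of Eq. (1) makes the sub-window test easy to reproduce. The sketch below uses a random binary pattern as a stand-in for the M-array-coded design of the paper:

```python
import numpy as np

def hamming_distance(p, i1, j1, i2, j2, n=4):
    """Eq. (1): Hamming distance between the n-by-n sub-windows of the
    binary pattern p anchored at (i1, j1) and (i2, j2)."""
    w1 = p[i1:i1 + n, j1:j1 + n]
    w2 = p[i2:i2 + n, j2:j2 + n]
    return int(np.count_nonzero(w1 != w2))  # sum of delta(a, b) over the window

# Toy check on a random binary pattern (the actual pattern is M-array coded,
# which guarantees a nonzero distance between any two distinct sub-windows):
rng = np.random.default_rng(1)
pattern = rng.integers(0, 2, size=(64, 64))
print(hamming_distance(pattern, 3, 5, 20, 30))  # between 0 and n*n = 16
```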

Fig. 2: Design, manufacture, and detection of the metasurface.

a Design of the projection pattern. The pattern is a type of pseudo-random code. b Hamming distance distribution. This can be regarded as an approximately Gaussian distribution, similar to the orange curve generated from a Gaussian expression. c Phase profile calculated using the GS algorithm. d Nanopillars and their spatial distribution based on the geometric phase principle. e Transmission coefficients obtained by sweeping the geometric parameters of a nanopillar within a unit cell. f Top-view and side-view SEM images. g Holographic reconstructed image. For the convenience of the similarity calculation, we define the sub-window and label, which are shown in the enlarged view. h Correlation of the image set at different depths. Ten images are randomly captured at different depths, and the measurements are repeated 20 times. Three contour maps of ZNSSD at different depths are displayed.

Metasurface holography, benefitting from cutting-edge nanotechnology, offers excellent performance, such as a high precision of the reconstructed image, freedom from undesired diffraction orders, and a large space-bandwidth product. In particular, because the reconstructed image is located in the far field of the metasurface, Fourier holography has a large depth of field. Phase modulation based on metasurfaces is easily achieved, and geometric metasurfaces enable superior phase control with the advantages of broadband performance, robustness against fabrication errors, and a helicity-switchable property, facilitating the encoding procedure. The Gerchberg-Saxton (GS) algorithm is used to calculate the phase hologram (see Supplementary note 2), which is then encoded on the geometric metasurface as a physical implementation. To ease the fabrication challenge, the phase profile is discretised into eight phase levels, as shown in Fig. 2c.
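Since the GS algorithm is a standard iterative Fourier-transform method, a minimal Python sketch of the loop used for a far-field hologram may be helpful. The particular FFT arrangement and the eight-level quantisation step below are our own simplified rendering, not the authors' implementation:

```python
import numpy as np

def gerchberg_saxton(target_amp, n_iter=100, seed=0):
    """Minimal GS loop for a Fourier (far-field) hologram: alternate between
    the hologram plane (unit amplitude, free phase) and the projection plane
    (target amplitude, free phase), keeping only the phase at each step."""
    rng = np.random.default_rng(seed)
    far_field = target_amp * np.exp(1j * rng.uniform(0, 2 * np.pi, target_amp.shape))
    for _ in range(n_iter):
        holo_phase = np.angle(np.fft.ifft2(np.fft.ifftshift(far_field)))
        far_field = np.fft.fftshift(np.fft.fft2(np.exp(1j * holo_phase)))
        far_field = target_amp * np.exp(1j * np.angle(far_field))
    return holo_phase

# Eight-level discretisation, as in Fig. 2c:
# phi8 = np.round(np.mod(phi, 2*np.pi) / (np.pi/4)) * (np.pi/4)
```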

We choose amorphous silicon, which can be obtained using a standard nanofabrication method. As shown in Fig. 2d, amorphous silicon nanopillars with different orientation angles are arranged on a fused silica substrate to achieve the desired phase profile. To cover phase shifts from 0 to 2π with high efficiency, the period and height of the nanopillars are chosen as 316 and 600 nm, respectively. A rigorous coupled-wave analysis (RCWA) method is used to optimise the 2D parameters of the nanopillars at an operating wavelength of 633 nm. The simulated transmission coefficients of the polarisation conversion efficiency are shown in Fig. 2e (see the efficiency analysis in Supplementary note 2). The length and width are then set to 180 and 80 nm, respectively, to maintain a high transmission efficiency. By engineering the planar nanostructures, the desired phase profile can be converted into a distribution of diverse orientations. The metasurface, composed of 1578 × 1578 nanopillars, is fabricated using electron beam lithography and reactive ion etching, and the corresponding scanning electron microscopy images with side and top views are shown in Fig. 2f.
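The mapping from the computed phase profile to pillar orientations follows the standard geometric-phase rule (a pillar rotated by θ imparts a phase of 2θ on the cross-polarised circular component). A minimal sketch, with the eight-level discretisation assumed from Fig. 2c:

```python
import numpy as np

def phase_to_orientation(phi, levels=8):
    """Geometric (Pancharatnam-Berry) phase: a nanopillar rotated by theta
    imparts phi = 2*theta on the cross-polarised circular component, so the
    orientation map is half the phase profile, here discretised to 8 levels."""
    step = 2 * np.pi / levels
    phi_q = np.round(np.mod(phi, 2 * np.pi) / step) * step
    return (phi_q / 2.0) % np.pi  # pillar orientation in radians, [0, pi)
```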

To characterise the illumination pattern generated by the metasurface, we use a conventional optical scheme to capture holographic images (see Methods section). The reconstructed image, shown in Fig. 2g, has a high degree of similarity with the designed pattern but also contains some speckle. Speckle is primarily generated by fabrication errors and unavoidable coherent laser noise. Nevertheless, such speckle may offer additional information in the inner region of every spot; more details can thus be obtained in a favourable way. For completeness, the zero-normalised sum of squared difference (ZNSSD)37 coefficients calculated for 300 different labels with their corresponding labels at three different depths are shown in Fig. 2h, demonstrating the similarity of the speckle patterns at different depths. The ZNSSD is defined as follows:

$${C}_{ZNSSD}=\mathop{\sum }\limits_{x=1}^{M}\mathop{\sum }\limits_{y=1}^{N}{\left[\frac{f(x,y)-{f}_{m}}{\sqrt{\mathop{\sum }\nolimits_{x=1}^{M}\mathop{\sum }\nolimits_{y=1}^{N}{[f(x,y)-{f}_{m}]}^{2}}}-\frac{g(x^{\prime},y^{\prime})-{g}_{m}}{\sqrt{\mathop{\sum }\nolimits_{x=1}^{M}\mathop{\sum }\nolimits_{y=1}^{N}{[g(x^{\prime},y^{\prime})-{g}_{m}]}^{2}}}\right]}^{2}$$

(2)

where f(x, y) and g(x′, y′) are the grey-level intensities at the coordinates (x, y) and (x′, y′), respectively, in the selected label of two images in different observation planes. fm and gm are the mean grey-level intensities in the subset. M and N are the sizes of the subset along the x and y directions, respectively. A few representative ZNSSD contour maps are shown in Fig. 2h, with more images and contour maps shown in Supplementary note 5. Figure 2h illustrates that the ZNSSD values are all greater than 0.9 at different depths, so this similarity can be used to determine the corresponding pixels in the inner region of the spots.
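Eq. (2) translates directly into a few lines of array code. The note on the equivalent zero-normalised cross-correlation (ZNCC) is our own algebraic observation, not a statement from the paper:

```python
import numpy as np

def znssd(f_sub, g_sub):
    """Eq. (2): zero-normalised sum of squared differences between two
    subsets; insensitive to offset and scale changes of the illumination."""
    fz = f_sub.astype(float) - f_sub.mean()
    gz = g_sub.astype(float) - g_sub.mean()
    fz /= np.sqrt(np.sum(fz ** 2))
    gz /= np.sqrt(np.sum(gz ** 2))
    return float(np.sum((fz - gz) ** 2))

# Identical subsets give C = 0; with unit-norm, zero-mean subsets the
# identity C = 2 * (1 - ZNCC) holds, i.e. ZNCC = 1 - C / 2.
```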

Matching algorithm

The proposed matching algorithm consists of a feature-based initial matching algorithm and an area-based fine matching algorithm, leveraging the spatial uniqueness and the speckle features of the labels, respectively. This method combines the robust and effective label matching of the feature-domain transformation with the dense pixel correspondences of the geometrical area deformation, efficiently leading to accurate and dense matching results. The matching process can be modelled as the establishment of correspondence relations between the deformed image and the reference image under the constraint of surface continuity, which can be described mathematically as follows:

$$\begin{array}{c}\{{u}_{i}^{\ast }(x,y)\}={{\mbox{arg}}}\,\min \mathop{\sum }\limits_{i=1}^{n}{\left|f(x,y)-g({u}_{i}(x,y))\right|}_{2}^{2},\,(x,y)\subset {\varOmega }_{i}\\ {{\mbox{s.t.}}}\ \,{F}_{{{\rm{c}}}}({u}_{i}(x,y)),\,(x,y)\subset {\varOmega }_{i}\end{array}$$

(3)

where ui*(x, y) is the optimal estimate of the correspondence function for each local correspondence estimation ui(x, y) in the subregion Ωi. n is the number of subregions, and f and g are the reference and deformed images, respectively. Fc is the constraint operator that guarantees the global continuity and compatibility of ui(x, y).

The operation of the initial match relies on the spatial uniqueness of the labels validated by the Hamming distance, allowing a transformation to the feature parameter space with handcrafted feature descriptors for label matching (the comprehensive theory, implementation, and demonstration of the initial matching algorithm are shown in Supplementary note 6). The feature descriptors are simple vectors providing a discriminative representation of each label. Formally, the initial match can be expressed as

$$C={M}_{Cd}\left({U}_{{{\rm{fd}}}}(g),\,{U}_{{{\rm{fd}}}}(f)\right)\Big|_{N=(U,\varGamma )}$$

(4)

where Ufd is the feature descriptor and C is the correspondence matrix. MCd is the match operator based on the cosine-distance measurement applied to Ud and Ur, which are the label sets of the deformed image and the reference image, respectively, as shown in Fig. 3a. The cosine distance is widely used as a metric of vector similarity. Simultaneously, a set of spatial neighbour labels N = (U, Γ) is constructed as the designed constraint, where each label U is associated with its corresponding neighbour labels Γ. N determines the processing path of the labels in Ud based on geometrical cues that utilise the neighbour information of the processed label. The labels are thereby precisely matched to the corresponding labels of the reference image in an indirect manner by the initial matching algorithm.
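As an illustration of Eq. (4), the sketch below matches label descriptors by cosine similarity; the neighbour-guided processing path N = (U, Γ) that the paper uses as a geometric constraint is deliberately omitted and only noted in a comment:

```python
import numpy as np

def cosine_match(desc_deformed, desc_reference):
    """Eq. (4): assign each label descriptor of the deformed image to the
    reference label with maximum cosine similarity (minimum cosine distance).
    Rows of each array are per-label feature descriptors U_fd."""
    d = desc_deformed / np.linalg.norm(desc_deformed, axis=1, keepdims=True)
    r = desc_reference / np.linalg.norm(desc_reference, axis=1, keepdims=True)
    sim = d @ r.T                  # pairwise cosine similarity matrix
    return np.argmax(sim, axis=1)  # index of the best reference label

# The paper additionally walks the labels along a neighbour-guided path
# N = (U, Gamma), checking each assignment against already-matched
# neighbours; that geometric constraint is omitted in this sketch.
```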

Fig. 3: Computational algorithm and strategy of correspondence search based on pattern features.

a-b Computational architecture of the correspondence search algorithm. a Initial match, which utilises a feature-domain transform to perform the similarity calculation with a designed path as a constraint of surface continuity, achieving the correspondence search for all labels. Ud and Ur are the label sets of the deformed and reference images, respectively, and the cosine distance is used to match them. b Fine match. The fine match aims to obtain more sophisticated results through shape-function optimisation and a region constraint, leveraging the intrinsic features of the labels. The initial results are used to calculate the coarse deformation function W. c-d Multi-resolution search strategy. c Capture of reference images. The reference images are captured along the z-direction at equal intervals; the equality of these intervals is achieved with a precise guide rail. d Pyramid strategy. The high-resolution image named I3 is the original image, and the other two images, named I2 and I1, are obtained by wavelet transformation. The candidate reference (Cr) in c corresponds to the image block in d with the same border colour.

Because the initial match result offers robust label correspondence, the coarse correspondence estimation can be obtained using the labels involved in the local area. Combined with the speckle features of the inner region of the labels, we address the fine match as an optimisation problem:

$${W}_{i}^{\ast }={{\mbox{arg}}}\,\min {\left|f(x,y)-g({W}_{i}(x,y;{{\bf{p}}}))\right|}_{2}^{2},\,(x,y)\subset {\varOmega }_{i}^{\ast }({{\bf{p}}}),\,i=1,2,\cdots,n$$

(5)

where p is the initial deformation parameter calculated from the initial match result C (see Supplementary note 6). W(x, y; p) is the shape function relative to the reference image and describes the mathematical relationship between the spatial positions of the deformed region and the reference region. Ωi* is the ith subarea, as shown in Fig. 3b. Equation (5) seeks the shape function that minimises the dissimilarity between the deformed and reference images after shape transformation in all adaptive subareas (the dependence of the match accuracy on the subarea size is discussed in Supplementary note 7). In particular, the adaptive subarea is constructed by progressively selecting the local region Ωi based on the geometric transform with respect to p, discarding outliers with dissimilar geometric transformations. The match optimisation subsequently yields an elaborate match using the inverse-compositional Gauss-Newton (IC-GN) algorithm38 with the initial parameter p, which minimises the dissimilarity by iterating the shape-function increment ∆p in the local subset ωi. Essentially, the initial deformation parameter p plays a significant role in constructing the constraint of spatial continuity, which is used to gradually achieve an appropriate subarea for a stable support domain and to constrain the solution space of the shape function. Finally, the IC-GN algorithm finds an appropriate correspondence solution satisfying the constraints, achieving a pixel- or sub-pixel-level match with the fine matching algorithm.
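The full IC-GN solver is beyond a short snippet, but the following simplified Gauss-Newton refinement of a pure-translation shape function conveys the idea of Eq. (5). Note that it uses the forward-additive rather than the inverse-compositional update and a fixed rather than adaptive subset, so it is a stand-in, not the authors' algorithm:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def refine_shift(f, g, center, p0, subset=15, iters=10):
    """Gauss-Newton refinement of a translation-only shape function
    W(x; p) = x + p over one subset, minimising sum [f(x) - g(W(x; p))]^2.
    `center` must lie far enough from the image border for the subset."""
    r = subset // 2
    cy, cx = center
    ys, xs = np.mgrid[cy - r:cy + r + 1, cx - r:cx + r + 1]
    f_sub = f[cy - r:cy + r + 1, cx - r:cx + r + 1].astype(float)
    p = np.asarray(p0, float)               # initial shift from the coarse match
    gy, gx = np.gradient(g.astype(float))   # image gradients for the Jacobian
    for _ in range(iters):
        coords = np.stack([ys + p[0], xs + p[1]])
        g_sub = map_coordinates(g.astype(float), coords, order=1)
        J = np.stack([map_coordinates(gy, coords, order=1).ravel(),
                      map_coordinates(gx, coords, order=1).ravel()], axis=1)
        res = (f_sub - g_sub).ravel()
        dp, *_ = np.linalg.lstsq(J, res, rcond=None)  # normal equations
        p += dp
        if np.linalg.norm(dp) < 1e-4:        # sub-pixel convergence
            break
    return p
```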

Multiresolution search strategy

A multi-resolution search strategy is proposed, as shown in Fig. 3c-d, to balance matching accuracy and computational efficiency. Multiple images are captured along the z-direction at equal intervals, as shown in Fig. 3c, all of which can be regarded as reference images. The multiresolution search method utilises the low-resolution images to obtain a coarse depth map, which is then used to select the most suitable reference images; thus, the high-resolution images can be used to calculate a more precise depth with the updated reference images. The operating principle of the multiresolution search method is described in condensed notation as follows:

$${Z}_{i}={F}_{{{\rm{rec}}}}({I}_{i},\,{{Cr}}_{i}),\qquad {{Cr}}_{i}=\begin{cases}{{Cr}}_{1}, & i=1\\ \{{{Cr}}_{i}^{\ast }\}=\min \left|{Z}_{i-1}-{Z}_{{{Cr}}_{i-1}}\right|, & i>1\end{cases}$$

(6)

where Cr is the candidate reference image, I is the deformed image, Frec is the reconstruction operator, and Z is the depth after the reconstruction calculation. First, the deformed image is transformed into multiresolution images named I1, I2, and I3 by a wavelet transform, which we call the pyramid strategy, as shown in Fig. 3d. The low-resolution image I1 is first used to calculate the coarse depth map Z1 from two fixed planes named Cr1, marked with a yellow border in Fig. 3c. Then, the two planes nearest to the coarse depth Z1 are chosen as the new reference planes Cr2, marked with a purple border in Fig. 3c. The same operation is conducted for image I2. Finally, the candidate planes move closest to the reference image at the real depth, and the depth results with the original-resolution image I3 are more precise because of the higher similarity with the reference image. Consequently, the pyramid sampling strategy can be used for a coarse-to-fine search to improve the measurement accuracy and reduce the measurement uncertainty at the expense of speed. However, the sacrifice is not severe because of the relatively low computational cost of the low-resolution images (we also discuss an acceleration method in Supplementary note 8).
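A schematic rendering of Eq. (6) in Python is given below. The reconstruction operator is left as a user-supplied callable, plain 2 × 2 averaging replaces the paper's wavelet transform, and the handling of reference-image resolution is assumed away, so this is a structural sketch only:

```python
import numpy as np

def pyramid_depth(img, ref_depths, reconstruct, levels=3):
    """Coarse-to-fine strategy of Eq. (6). `reconstruct(image, cand)` stands
    in for the whole matching + triangulation pipeline and must return a
    depth map computed against the candidate reference planes indexed by
    `cand` (it is assumed to handle resolution matching internally)."""
    pyramid = [np.asarray(img, float)]
    for _ in range(levels - 1):                      # I3 -> I2 -> I1 (Fig. 3d)
        a = pyramid[-1]
        a = a[:a.shape[0] // 2 * 2, :a.shape[1] // 2 * 2]
        pyramid.append((a[::2, ::2] + a[1::2, ::2]
                        + a[::2, 1::2] + a[1::2, 1::2]) / 4)
    cand = [0, 1]                                    # Cr_1: two fixed planes
    depth = None
    for im in reversed(pyramid):                     # coarsest level first
        depth = reconstruct(im, cand)                # Z_i = F_rec(I_i, Cr_i)
        order = np.argsort(np.abs(np.asarray(ref_depths) - np.median(depth)))
        cand = sorted(order[:2].tolist())            # planes nearest coarse Z_i
    return depth
```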

Depth-accuracy demonstration

To analyse the depth accuracy, a camera with a mounting angle of 30° relative to the baseline is used to capture images of test objects at a distance of 300 mm from the metasurface. The resolution of the camera is 2448 × 2048 pixels, and the focal length of the imaging lens is 16 mm. Five groups of two different flat objects are captured with our proposed 3D imaging device, and the height differences between the two flat objects are used for the evaluation, compared with the known thicknesses of 1.69, 2.00, 2.74, 3.69, and 4.00 mm.

The reconstructed point cloud images of the five setups and the error analysis are shown in Fig. 4, obtained with the proposed matching algorithm and multi-resolution search strategy (a comparison between the multi-resolution search strategy and a fixed reference image is shown in Supplementary note 8). The errors for the five measurement groups are 0.19, 0.01, 0.2, 0.12, and 0.02 mm, as shown in Fig. 4b. The maximum error is approximately 0.2 mm at a depth of 300 mm, indicating that the recovered height differences of the two objects are in good agreement with those of the known experimental setups. The depth accuracy can be attributed to the spatial uniqueness of the illumination pattern, combined with the principle of triangulation and the proposed matching algorithm. These results quantitatively demonstrate the effectiveness of depth perception with our proposed method, which is very promising for applications in the 3D positioning and imaging of millimetre-scale platforms. Meanwhile, data drift appears in the point clouds of the planes owing to unavoidable measurement errors, and the planeness is evaluated by the peak-valley (PV) and root-mean-square (RMS) values of all recovered points, as shown in Fig. 4c-d. The maximum PV value is 0.24 mm, indicating that the error of individual measurement points caused by noise or boundaries is less than 0.24 mm. The maximum RMS value is 4.4 × 10−4 mm, indicating the good performance resulting from the fine matching with the sub-pixel search method. Therefore, dense and accurate point cloud data can be obtained, demonstrating the potential for accurate and robust 3D information acquisition.
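For reference, one common way to compute PV and RMS planeness figures is shown below; the paper does not spell out its exact convention, so the least-squares plane fit is an assumption:

```python
import numpy as np

def planeness(points):
    """PV and RMS of the residuals about the least-squares plane, one common
    convention for evaluating recovered flats (cf. Fig. 4c-d).
    `points` is an (N, 3) array of x, y, z coordinates."""
    pts = np.asarray(points, float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    coef, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)  # fit z = ax + by + c
    resid = pts[:, 2] - A @ coef
    return resid.max() - resid.min(), np.sqrt(np.mean(resid ** 2))  # PV, RMS
```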

Fig. 4: Measurement accuracy validation with different setups.

a Recovered point cloud images with five different height differences. We measured five flat objects (ceramic slabs) with different thicknesses against one common flat object; their thicknesses are 1.69, 2.00, 2.74, 3.69, and 4.00 mm, respectively. The area of the measured scene is ~80 mm × 100 mm. The data fluctuation of the point cloud can be used for the evaluation of the reconstructed result; a local line of data is shown in the enlarged view of the second setup. b The recovered thickness. The recovered thickness is calculated from the difference in the average values of the two planes, shown as the green arrow in the inset sketch map. The thickness errors compared with the real values for the five groups are 0.19, 0.01, 0.2, 0.12, and 0.02 mm, respectively. c-d The PV and RMS values of the planes in the five setups. The legend entry plane 1 represents the higher plane in each setup, and plane 2 represents the lower one. The PV and RMS values are used to evaluate the planeness. The maximum PV and RMS values are 0.24 mm and 4.4 × 10−4 mm, respectively.

3D shape reconstruction for a variety of scenes

Easily deformable cardboard is used to validate the 3D imaging capability for continuous, low-texture surfaces. Three captured images are shown in Fig. 5b; the deformation is excited at the loose end of the cardboard by a human hand, as shown in Fig. 5a. The side views (yz plane) and reconstructed 3D shapes of the deformed cardboard at three different manual pressures are shown in Fig. 5c, d, respectively. These results verify that the proposed method enables the 3D reconstruction of object vibration and deformation. Meanwhile, the 3D reconstruction ability for low-texture objects gives this active imaging technique an advantage over passive imaging techniques39, such as binocular stereo vision and depth from defocus. Note that our method also works for measured scenes with larger object sizes (Supplementary note 10).

Fig. 5: 3D imaging of cardboard under three deformation states.

a Sketch map of the deformed cardboard under test. The area of the measured scene is ~40 mm × 80 mm. The cardboard is fixed with a splint, and the deformation is excited manually at the loose end of the cardboard. b Captured camera images of the cardboard. The boundary is plotted as the coloured dotted line. c-d Side views and 3D geometric maps under the three deformation states.

Furthermore, we demonstrate that our proposed method can achieve 3D reconstruction of a discontinuous object with varying reflectivity. We perform experimental verification by reconstructing gestures using the metasurface projection. Owing to the different reflection characteristics of human skin and the background, the pattern image on the skin has relatively rougher details, as shown in Fig. 6a. However, our algorithm implementation depends mostly on the spatial distribution feature, offering a feasible solution for the corresponding pattern search and 3D reconstruction. As expected, the depth maps and 3D point cloud maps of the three gestures are recovered as shown in Fig. 6b, c, respectively, and the position, height, and orientation of the fingertips and hand are highly similar to the camera images in Fig. 6a. Eventually, the point cloud maps of both the fingers (or hands) and the background are calculated successfully, indicating that our proposed 3D imaging method can achieve the reconstruction of real scenes with a complex reflectivity distribution.

Fig. 6: 3D imaging for gesture acquisition.

a Captured images of three gestures. The images have been cropped to show the gestures. The cropped area of the measured scene is ~60 mm × 60 mm. b Depth maps. The depth maps show similarity with the 2D (two-dimensional) contours of the captured images, such as the position and orientation of the fingertips and hand; the white curves denote the outlines of the fingertips and hand in the captured images. c Point cloud maps of the reconstructed gestures. The depths of both the fingers and the background are reconstructed, indicating the ability to reconstruct real scenes with a complex reflectivity distribution.

In addition, a discontinuous object may cause abrupt changes and large deformations of the projection pattern in the camera images. Benefitting from the adaptive balance between geometric similarity and the global constraint in our matching algorithm, the depth between the fingers or hand and the background is reconstructed successfully without depth ambiguity, showing the adaptability of the 3D imaging for recovering discontinuous objects. Note that the results in Fig. 6 have a relatively low spatial resolution, which can be improved by increasing the number of projected point cloud dots.

Spatial resolution improvement in 3D imaging

Because the reconstructed result of the proposed method mainly depends on spatial features, the density of the pattern has an important effect on the spatial resolution (see Supplementary note 9). The fundamental limitation on the dot density is the pixel number of the metasurface, which will increase with the rapid development of fabrication techniques (see Supplementary note 2). We then successfully demonstrate the spatial resolution improvement with two other samples, Sample #1 and Sample #2 (see Supplementary note 2), in which the metasurface size is 1 mm × 1 mm and the numbers of bright projection dots are 6609 and 14768, respectively. The 3D reconstructed point cloud maps of the gestures are shown in Fig. 7. Both samples achieve 3D reconstruction, but the partial point cloud map with Sample #2 has a more continuous transition than that with Sample #1, as shown in Fig. 7g, h, owing to the denser dots, which allow smaller subsets in the matching algorithm. Thus, the spatial resolution of the 3D results improves with an increasing number of dots, indicating the superiority of our method in 3D imaging.

Fig. 7: Gesture acquisition with Sample #1 (a, b, c) and Sample #2 (d, e, f).

a, b, c 3D results of gestures with Sample #1. The captured images are listed in the left column, and the top and side views of the point cloud maps are shown in the middle and right columns, respectively. d, e, f 3D results of gestures with Sample #2. The captured images and the top and side views of the point cloud maps are shown in the left, middle, and right columns, respectively. g, h Enlarged views of the regions with borders of the same colour for Sample #1 and Sample #2, respectively.