III-B Offline LHMap Generation Network
As shown in Fig. 2, we use the offline LHMap generation network to compress the pre-built LiDAR point cloud map. This is realized in two stages.
In the first stage, we perform point selection to compress the dense map and pose supervision to refine the generated local map.
To satisfy the requirements of point selection and map compression, the projected LiDAR depth is used to construct an evaluation system for the point cloud. The projected depth $D_{gt}^i$ is calculated as:
$$ z\,[u,\ v,\ 1]^T = K\, T_{gt}^i\, \tilde{P} \tag{1} $$

$$ D_{gt}^i(u, v) = z \tag{2} $$

Here, $\tilde{P}$ denotes a map point in homogeneous world coordinates, $K$ represents the camera intrinsics, and $T_{gt}^i$ represents the ground truth camera pose at each frame $i$.
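As a minimal illustration of the projection in (1) and (2), the sketch below renders map points into a sparse depth image. It is not the authors' implementation; all names (`project_to_depth`, `points_map`, `T_cam_from_world`) are hypothetical, and the pose is assumed to map world coordinates into the camera frame.

```python
import numpy as np

def project_to_depth(points_map, K, T_cam_from_world, h, w):
    """Project (N, 3) LiDAR map points into a sparse depth image of size (h, w)."""
    # Transform world points into the camera frame (homogeneous coordinates).
    pts_h = np.hstack([points_map, np.ones((points_map.shape[0], 1))])
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]
    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0]
    # Pinhole projection, as in (1): z * [u, v, 1]^T = K * [x, y, z]^T.
    uvz = (K @ pts_cam.T).T
    u = np.round(uvz[:, 0] / uvz[:, 2]).astype(int)
    v = np.round(uvz[:, 1] / uvz[:, 2]).astype(int)
    z = uvz[:, 2]
    depth = np.zeros((h, w))
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for ui, vi, zi in zip(u[inside], v[inside], z[inside]):
        # (2): store the depth; keep the nearest point on pixel collisions.
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```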
Additionally, the offline RGB image and the projected LiDAR depth are used to perform pose supervision.
Based on the initial rough camera pose $T_{init}^i$ at each frame, which can be acquired by GPS or visual odometry, the initial projected depth $D_{init}^i$ is calculated as:

$$ z'\,[u',\ v',\ 1]^T = K\, T_{init}^i\, \tilde{P} \tag{3} $$

$$ D_{init}^i(u', v') = z' \tag{4} $$

Here, the notation follows (1) and (2).
Both $D_{gt}^i$ and $D_{init}^i$ contain only the depth information of the point clouds.
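Continuing the hypothetical sketch above, the same projection routine can simply be called twice, once with the ground truth pose for (1)-(2) and once with the rough initial pose for (3)-(4):

```python
# T_gt from the annotated trajectory; T_init from GPS or visual odometry.
D_gt = project_to_depth(points_map, K, T_gt, h, w)
D_init = project_to_depth(points_map, K, T_init, h, w)
```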
Firstly, feature maps $F_I$, $F_{gt}$, and $F_{init}$ at different scales are extracted from $I$, $D_{gt}$, and $D_{init}$, respectively, through convolutional neural networks (CNNs). Then $F_{gt}$, the CNN feature of the projected LiDAR depth, is used to generate the heat feature $H$.
The point clouds are selected by evaluating heat values, which are calculated from the heat feature $H$ generated by $F_{gt}$. Each element $H_c(u, v)$ is used to calculate the heat value for point cloud evaluation as:

$$ s(u, v) = \sum_{c=1}^{C} H_c(u, v) \tag{5} $$

$$ v(u, v) = \frac{s(u, v)}{C} \tag{6} $$

Here, $C$ represents the number of channels of $H$.
Subsequently, the points exhibiting the highest heat values are selected to constitute the coarse local LHMap, denoted as $M_c^i$:

$$ \Omega^i = \mathop{\mathrm{arg\,topk}}_{(u,v)}\ v(u, v) \tag{7} $$

$$ M_c^i = \{\, P(u, v) \mid (u, v) \in \Omega^i \,\} \tag{8} $$

Here, $\Omega^i$ denotes the set of pixel locations with the highest heat values, and $P(u, v)$ is the map point projected to pixel $(u, v)$.
During the generation of $M_c^i$, pose supervision is adopted to guide the procedure. The pose supervision module incorporates two inputs: the heat feature $H$ and the optical flow embedding $E$, which is derived from $F_I$ and $F_{init}$ based on the iterative optimization structure of PWC-Net [29]. Pose supervision is realized by the pose calculation module, detailed in Sec. III-C.
Training with stage 1 alone fails to converge; therefore, we propose a second stage to refine the LHMap.
In the second stage, we apply the initial pose to the coarse local LHMap to recover the initial localization results.
The projected coarse local LHMap and the offline RGB image are used for further pose supervision.
Because both stages share the same offline RGB image, they naturally share the RGB feature maps $F_I$, while the feature maps $F_c$ of the projected coarse local LHMap are regenerated. Then the heat feature is generated from $F_c$, and the flow embedding is generated from $F_I$ and $F_c$. Finally, they work together for pose supervision, which is again realized by the pose calculation module introduced in Sec. III-C.
In this stage, we regress another set of 6-DoF poses $T_2$. Both $T_1$ and $T_2$ refine the local LHMap by optimizing the heat feature.
The output of this network is the LiDAR point cloud heat map (LHMap), combined from the refined local LHMap of each frame. Although the local LHMap contains only depth information, by taking the inverse of the projection formulation we can obtain the 3D coordinate information at each frame. With the knowledge of the ground truth camera pose $T_{gt}^i$ at frame $i$ and the points $P_c^i$ of frame $i$, we can convert them to the world frame:
$$ P_w^i = (T_{gt}^i)^{-1}\, P_c^i \tag{9} $$

Here, $P_w^i$ represents the points of frame $i$ in the world coordinate system. The LHMap is constructed by uniting all the points together through a union operation:
$$ M_{LHMap} = \bigcup_{i=1}^{N} P_w^i \tag{10} $$

where $N$ denotes the number of frames.
The loss function of the offline heat map generation network is similar to that of CMRNet [16].
Let $q_{gt}$ and $t_{gt}$ represent the rotation (as a quaternion) and the translation of the ground truth camera pose, and let $q$ and $t$ be the regressed counterparts. The angular distance between quaternions is used to evaluate the rotation loss, and the L1-smooth loss is used to evaluate the translation loss, which are defined as:

$$ L_t = \mathrm{smooth}_{L1}(t - t_{gt}) \tag{11} $$

$$ \Delta q = q \times q_{gt}^{-1} \tag{12} $$

$$ L_q = \mathrm{atan2}\!\left(\sqrt{\Delta q_x^2 + \Delta q_y^2 + \Delta q_z^2},\ |\Delta q_w|\right) \tag{13} $$

Here, $\Delta q_w$, $\Delta q_x$, $\Delta q_y$, $\Delta q_z$ are the components of the quaternion $\Delta q$, and $\times$ is the multiplicative operation between two quaternions.
The pose loss is defined as:

$$ L_{pose} = L_t + \lambda L_q \tag{14} $$

where $\lambda$ is a balancing factor.
The pose $T_1$ regressed by the pose supervision module in stage 1 and the pose $T_2$ regressed by the pose supervision module in stage 2 are both taken into account for better supervision. Therefore, the total loss is defined as:

$$ L_{total} = L_{pose}(T_1) + L_{pose}(T_2) \tag{15} $$
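The sketch below works through (11)-(14) numerically: a Huber translation term and the quaternion angular distance, combined with an assumed balancing factor `lam`. The quaternion convention (w, x, y, z) and the function names are assumptions for illustration only.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw*qw - px*qx - py*qy - pz*qz,
        pw*qx + px*qw + py*qz - pz*qy,
        pw*qy - px*qz + py*qw + pz*qx,
        pw*qz + px*qy - py*qx + pz*qw,
    ])

def pose_loss(t_pred, q_pred, t_gt, q_gt, lam=1.0):
    """Combined pose loss per (11)-(14); lam is an assumed balancing factor."""
    # (11): smooth-L1 (Huber) translation loss.
    diff = np.abs(t_pred - t_gt)
    l_t = np.where(diff < 1.0, 0.5 * diff**2, diff - 0.5).sum()
    # (12): error quaternion (conjugate of a unit quaternion is its inverse).
    q_gt_conj = q_gt * np.array([1.0, -1.0, -1.0, -1.0])
    dq = quat_mul(q_pred, q_gt_conj)
    # (13): angular distance between the two quaternions.
    l_q = np.arctan2(np.linalg.norm(dq[1:]), np.abs(dq[0]))
    # (14): weighted combination.
    return l_t + lam * l_q
```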
III-C Online Pose Regression Network
This network is used for real-time monocular localization. The inputs are the online RGB image and the real-time local LHMap, which is constructed by projecting, at each frame, the local LHMap stored by the first network onto the image plane according to the function in (4).
Firstly, feature maps $F_I$ and $F_{LH}$ are extracted from the online RGB image and the real-time local LHMap, respectively, through convolutional neural networks (CNNs).
Then the feature maps are used to calculate the 2D flow embedding $E$ together with the RGB image feature maps, while $F_{LH}$ alone is used to generate the heat feature $H$. Here, $E$ is calculated in the same way as in PWC-Net [29]. The use of the heat feature enables the pose regression to focus on effective features, so the supervision of the 2D flow embedding and the regression of the 6-DoF pose achieve better performance. The cost volume $C_v$ is then calculated by feeding $H$ into a softmax layer to generate the coefficients and multiplying the coefficients with $E$:
$$ C_v = \mathrm{softmax}_{h,w}(H) \odot E \tag{16} $$

where $\odot$ denotes the element-wise product, and $\mathrm{softmax}_{h,w}(\cdot)$ applies the softmax over the height and width dimensions of $H$.
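A minimal PyTorch sketch of (16), assuming both tensors already share the same shape (the up-sampling discussed below handles any resolution mismatch); `cost_volume` is a hypothetical name.

```python
import torch
import torch.nn.functional as F

def cost_volume(heat_feature, flow_embedding):
    """(16): spatial-softmax coefficients from H, applied element-wise to E.

    heat_feature:   (B, C, H, W) tensor, the heat feature H.
    flow_embedding: (B, C, H, W) tensor, the flow embedding E.
    """
    b, c, h, w = heat_feature.shape
    # Softmax jointly over the height-and-width dimensions.
    coeff = F.softmax(heat_feature.reshape(b, c, h * w), dim=-1).reshape(b, c, h, w)
    # Element-wise product with the flow embedding.
    return coeff * flow_embedding
```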
At last, the cost volume is fed into separate MLPs for pose regression:

$$ t = \mathrm{MLP}_t(C_v), \qquad q = \mathrm{MLP}_q(C_v) \tag{17} $$
The pose regression is realized by the pose calculation module shown in Fig. 3. The resolution of the flow feature may differ from that of the heat feature; therefore, the flow feature is passed through up-sampling layers to match the resolution of the heat feature before being multiplied with it. The multiplication result is then accumulated over the height and width dimensions before being fed into the fully connected layers, denoted as $\mathrm{FC}_t$ and $\mathrm{FC}_q$. The outputs of this network are the 6-DoF poses.
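Putting (16) and (17) together, a sketch of the pose calculation module is shown below, reusing the `cost_volume` function from the previous snippet. The up-sampling factor, hidden width, and layer structure are assumptions, not the authors' architecture.

```python
import torch.nn as nn

class PoseCalculationModule(nn.Module):
    """Sketch of the regression part (Fig. 3); all layer sizes are assumed."""
    def __init__(self, channels, hidden=256):
        super().__init__()
        # Up-sample the flow embedding to the heat feature's resolution
        # (a fixed factor of 2 is assumed here for illustration).
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        # Separate fully connected heads FC_t and FC_q.
        self.fc_t = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 3))
        self.fc_q = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 4))

    def forward(self, heat_feature, flow_embedding):
        # (16): weight the up-sampled flow embedding by the heat coefficients.
        cv = cost_volume(heat_feature, self.up(flow_embedding))
        # Accumulate all elements over the height and width dimensions.
        pooled = cv.sum(dim=(2, 3))                      # (B, C)
        # (17): regress translation and quaternion with separate heads.
        return self.fc_t(pooled), self.fc_q(pooled)
```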
The loss function used here follows [30]. Adding two trainable parameters $\hat{s}_t$ and $\hat{s}_q$, the loss function is defined as:
$$ L = L_t \exp(-\hat{s}_t) + \hat{s}_t + L_q \exp(-\hat{s}_q) + \hat{s}_q \tag{18} $$
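A sketch of (18) as a trainable module, in the style of [30]; the initial values of the two parameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """(18): learn s_t and s_q to balance translation and rotation terms."""
    def __init__(self, s_t_init=0.0, s_q_init=-2.5):
        super().__init__()
        # Initial values are assumptions; the parameters are learned jointly
        # with the rest of the network.
        self.s_t = nn.Parameter(torch.tensor(s_t_init))
        self.s_q = nn.Parameter(torch.tensor(s_q_init))

    def forward(self, l_t, l_q):
        # Each term is scaled by exp(-s) and regularized by the additive +s.
        return (l_t * torch.exp(-self.s_t) + self.s_t
                + l_q * torch.exp(-self.s_q) + self.s_q)
```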
Figure 3: Details of the regression part. The flow embedding and the up-sampled heat feature are multiplied as inputs, and the weighted features are then calculated. The result is fed into $\mathrm{FC}_t$ and $\mathrm{FC}_q$ to regress the 6-DoF poses.