Nonlinear optimization generating the Tomb Mural Blocks by GANS

Tomb murals are the special kind of murals that are buried underground. Due to the narrow exit of the tomb passage, the tomb murals were excavated by dividing the whole mural into blocks, which made lots of information missing between the blocks.The digital restoration technology Image inpainting uses the edge information around the missing parts to spread the information inside of the defect area and ﬁlls the information from the outside to the inside. But it is not suitable for ﬁlling the missing parts between the tomb mural blocks. Because these parts are large for exemplar-based inpainting which may make texture dislocation and for PDE which may make cartoon blur. It is a need to generate the information outwards to complete the information.The generative adversarial network uses deep learning training by the murals remains to generate the information from inside to outside, but the typical GAN doesn‘t have a good nonlinear feature.This paper provided a generating technology based on the deep convolution generative adversarial network to rebuild the missing information between the tomb mural blocks. It built the training data set of the simulation platform with Keras and designed a whole mural generation scheme based on DCGAN. In order to get better generated results to avoid the bad artifacts; it adds the nonlinear layers by choosing 13 layers convolution and 2 deconvolution layers of the generator and contained 5 layers convolution discriminator; it designed a new phased nonlinear loss function by using Pycharm pretreatment for Numpy array ﬁle data sets; ﬁnally, it completed the generate tomb mural information to obtain the good simulation effect.


Introduction
The tomb murals buried underground thousands of years reflect the living conditions, social customs, and artistic tastes of the ancient royal vivid. Because the tomb murals are different from these kinds of murals such as the cave murals and temple murals, the cave murals are mainly effected by windy, dusty, and ultraviolet light, and the temple murals are mainly effected by human-made damage. The tomb murals are completely enclosed spaces before being excavated, and the residual information is highly reliable. But they are too huge to excavate once for a whole mural. So, the archaeologists always make the whole mural into several blocks to move to the museum. It caused a lack of information between the mural blocks, which affected the coherence and integrity of the entire mural. Figure 1 is the blocks of playing polo mural from Crown Prince Zhanghuai. This mural is 6.88 m long and 2.29 m high and dived into 5 blocks. There is some information missing between the neighbours. It is difficult to restore the mural directly because it may lead to stitching dislocation. The first thing should do is that using information extension get the missing parts to help reconstruct the whole mural. The current method of computer-aided ancient murals for digital information restoration is digital image inpainting. It uses the residual mural information to diffuse to the filling edges and fills the information to the missing parts like peeling onions, and finally completes the mural information reconstruction. This kind of technology is mainly completed from two technical directions.On the one hand, the inpainting algorithms based on partial differential equations (PDE) use different diffusion equations to spread the residual information to the missing parts.Such as J Shen and TF Chan [1] proposed a diffusion equation of Total Variation (TV) to solve the diffusion limited optimization problem. Chan et al. [2] added Euler-Lagrange equation to Total Variation to constrain the diffusion direction.Humphrey et al. [3] added auxiliary variables and replaced a function of one variable with a function of two variables to improve inpainting efficiency. Barbu T. et al. [4] proposed a novel TV model based on the second order PDE using the nonlinear second order the Euler-Lagrange diffusion model. They are all belonging to the PDE kind of algorithm. The biggest advantage of these kinds of algorithms is that it is convenient and quick to use, but the disadvantage is that it makes blurry when the missing part is large as the interspace between the mural blocks. On the other hand, it is a texture synthesis model based on exemplar filling. Such as Criminisi et al. [5] proposed an algorithm based on exemplar filling that compares the priority values of each exemplar. [6] changed the exemplars source from global to local using the content perception Markova model. Siadati S Z et al. [7] redefined the priority value of Criminisi's algorithm by adding the weight of structure tensor. They are all belonging to the exemplar-based algorithm. The disadvantage is that the filling of a large number of similar exemplars will produce a mosaic effect. Whether the inpainting algorithm based on PDE or exemplar filling does not help to rebuild the whole mural information because the limit residual information and the large missing parts between the blocks. It is a more challenging task than recovering the deleted part of the image in the past. It needs to deduce the texture information of the extension part based on the information inside, and the generated information needs to be as real as possible. There are some new techniques to help generate the information using deep learning, such as S. Iizuka et al. [8] set up a CGAN to generate the information by adding predefined conditions, and Sabini M et al. [9]set up a DCGAN to generate the information out of the image. That kind of information generation algorithm is good for getting the losing information between the blocks. The generative adversarial network is suitable as a deep learning technique compared with the inpainting methods, and it has an excellent data fitting effect, which is manifested in higher fitting efficiency and sharper effect of generated image samples. But the traditional GAN is difficult to train and unstable, so it needs to be modified to achieve better results for the huge tomb mural restoration. So in this paper, we set up a mural generating method based on DCGAN to restore the missing information, and change the construction of the layers in convolution and deconvution by optimizing the nonlinear activation function. After adjusting the pooling layer of the structure in GANs, the tomb mural blocks extend to the two sides of the mural and get the missing information.

Set up the mural information generating network
The generative adversarial network belongs to the typical deep learning network [10]. The main parts are the generator module G and the discriminator module D occupied two pillar positions respectively in the network architecture. The generator constantly captures input image samples and generates samples similar to frame number images itself. The discriminator makes accurate judgments and scores the input data.The work of the generator and discriminator is iterated repeatedly, and the final result is more and more realistic samples generated by the generator, but it is more and more difficult for the discriminator to distinguish between true and false input data.The aim is to get the Nash equilibrium which is reached that the image generated by the generator is infinitely close to the distribution of the real image, while the discriminator can't distinguish the new image with the original image, and the prediction probability of the image output by the generator is close to 1/2 [11]. The following figure 2 is the essential GAN structure. Both the generator model and the discriminator model shown in Figure 2 can be constructed with differ-entiable functions. The sampling of the original image is used as the input data of the identification model, and the random variable is used as the input data of the generation model. The output is as close as possible to the sample G(Z) of the real data distribution set. The goal of the discriminator is to realize the dichotomy identification of data sources. If the input of the discriminator comes from real data, the judgment is true; if the input source is sample G(Z) , the judgment is false. The goal of the generator is to make the D(G(Z)) displayed on the discriminator of the fake data G(Z) generated by itself be the same as the D(G(Z)) displayed on the discriminator of the real data x.
There are more and more extension models based on GANs, and different models have different improvements [12]. The comparison of the different typical GANs has different characteristics, and they are shown in table 1. The proposal of deep convolution generating confrontation network (DCGAN) largely overcomes the technical difficulties of unstable training of original generative adversarial network and prone model collapse. Therefore, this paper adopts this network model to carry out mural information restoration. The whole structure of the generator is a convolutional neural network structure [13]. Its input is different from that of an ordinary convolutional neural network, which uses images to replace random noise. The convolutional layer mainly uses deconvolution to replace convolution operation. The activation function for the output layer is set to Tanh and the activation functions for the other layers are set to nonlinear ReLU. The discriminative model eliminates the full connection layer and uses another functional layer to replace the full connection layer. The convolutional layer set up in this way can also achieve a good sampling effect. Similar to the generative model, a batch normalization operation is added at all the layers except for the first input, setting the generative function for each layer to nonlinear LeakyReLU. Figure 3 is the network structure of DCGAN. The proposal of this network has a great promotion effect on the development of GAN, and the organic combination of convolutional neural network (CNN) and generative adversarial network (GAN) ensures the quality and diversity of images generated by this kind of structure. In the training process of DCGAN, the special technique of batch normalization (BN) is adopted to make the training process more stable and reliable. After each convolutional layer or deconvolution layer, the corresponding activation function is adopted to effectively solve the problem of gradient disappearance. But the (BN) may generate bad artifact, so it is needed to and more nonlinear layers to make the network more complex. So it is needed to normalize the image batches to a typical Gaussian noise which values from zero to one. By removing the pooling layer, the convolutional neural network can retain the advanced feature information of the image more effectively. The most important thing is that it should take balance to the generator and discriminator training throughout the training process.
3 The nonlinear optimization for generating mural information

The training scheme for the tomb mural blocks
This paper chooses DCGAN as the essential (D, G)construction, the generative model G uses the encoderdecoder convolutional neural network, and the discriminative model D uses marching convolution repeatedly to down-sampling the image for binary classification. It has a training heart CNN to capture the features of the mural blocks. The way of dimensionality reduction in the deep convolutional networks is similar to that of Galerkin reduced order model (ROM) [14], Variational Autoencoder (VAE) [15] and cluster-based reduced order model (CROM) [16]. Nevertheless, while CNN is able to capture the nonlinear image features since the effective loss functions are non-linear.
The following figure 4 is the flow chart for mural information generation based on DCGAN.
For each training image data I d , we pre-processed them to Numpy array to get I n and I p as the testing group and training group. By running generator G to load I p to obtain the output image I o = G(I p ) ∈ [0, 1] 128×128×3 , and the inside of the generator G includes 13 layers convolution and 2 layers deconvolution. The activation function of convolution is chosen nonlinear LeakyReLU and the activation function of deconvolution is chosen nonlinear ReLU, and the activation function of the final output layer is chosen nonlinear Tanh. The inside of the Fig. 4 The information generation flow chart on tomb mural discriminator D just includes 5 layers convolution and its activation function is nonlinear LeakyReLU, and the output layer is Flatten. This GAN will generate new mural information constantly to discriminate.

The design of the generator G
Since the aim is to predict the original size of the tomb mural, and the convolution often reduces the size of the mural images, a 2-layer transposed convolution is used to restore the original size, and the transposed convolution can generate images from specific input data.

Convolution function
Conv is to calculate the sum of N-dimensional convolution:The Layer-input is the input of each layer;Filtersize is the size of the filter; and Kernel -size choose 4; strides choose 2; activation function is LeakyReLU; the dilation of the convolution kernel expands is 1.
AtrousConvolution2-D is to calculate four-dimensional input and the 4-dimensional convolution kernel operations, and do a 2-D convolution for the input data. All the parameters are the same as Conv. The definition of the convolution function for the generator G shows in figure 5.

Deconvolution function
Deconv is to calculate the sum of N-dimensional deconvolution: The Layer-input is the input of each layer; Filter-size is the size of the filter; and Kernel-size choose 3; strides choose 2; activation function is ReLU. The define of the deconvolution function for the generator G shows in figure 6.
For the generator G , the original encoder-decoder structure is still maintained, and the convolution is expanded to increase the receiving field of neurons and improve authenticity. By doing 13 times convolutions and 2 times deconvolutions for the original mural image data the total parameter construction shows below. The detailed parameters of the generator are shown in Table 2 below.

The design of the discriminator D Convolution function
The convolution function for discriminator D is the same as the generator G . Dcrm-loss is the loss function for discriminator, set two tensors y-true and y-pred. Set I l is the left part of I d , and set I l is the right part of I d . Let it fast flip around the ordinate axis to help the input D l have the feedback in the left domain. For the aim of generating the prediction on I d , the discriminator D calculate the D g (I d ) , D l (I l ) and D l (I r ) separately. And put these three outputs into the connector C , and then get the final result of the discriminator D .
The discriminator uses 5 layers of 5 × 5 convolutional layers to repeatedly down-sample the mural images and performs dichotomy. The final parameters of the discriminator are shown in table 3 below. The discriminator is used for distinguishing the real mural images I d from the generated outputs mural images I o . In the discriminator, D g (I d ) = 1 if the targeted outputs are accepted while D g (I d ) = 0 if they are rejected.

Training and define the loss function
In general, the training process in GAN can be considered as a minimization-maximization problem based on across entropy loss function J(D, G): Where p u (u) is a prior distribution for the random datasetu ,and p data (I d ) is the corresponding probability data distribution for the targeted outputs I d . In particular, the min-max process consists of two steps. Stept 1: Given a fixed generator, updating the discriminator by maximizing the discriminator function max J (D) ((θ ) (D) , (θ ) (G) ).
The min-max game has a global optimum forp g (I) = p data (I d ) , ifD and G have enough capacity. The optimal discriminator for theJ (G) (G, D) can be characterized by: The min-imax game in (2) ] + E I∼P g [ p g (I) p data (I d ) + p g (I) ] When the D * (I d ) = 1 2 , the generator can use any random dataset u to synthesis the generated outputs mural image I , and the discriminator loses the ability to distinguish between the real dataset I d and generated new mural image I.
In order to stabilise the training process, the three typical loss functions are defined.
J MSE refers to the mean square error of the original image I n and the restored imageI p ,the smaller the better.J D is the loss function of the discriminator. It is hoped that the discriminator's ability to discriminate whether the image is generated by the generator or the real image, that is the probability of discrimination is infinitely close to 1/2. J G is the generator loss function, and αis an adjustable hyperparameter, which is used to weigh the error loss of the mean square and the generation of the standard generator against the network loss.Finally, the generator weight is updated by the training iteration J MSE to adjust the generator. The weight of the discriminator is updated according to the iterative L D to adjust the discriminator.

Choose training images for pre-processing and post-processing
The tomb mural is huge for digital, so it is better to capture it by 114 sub-lens images which are divided into nine groups. Each mural image is divided into 4 × 416 non-overlapping blocks, and then 114 × 16 = 1824 blocks are compressed into 256 × 256 images for unified numbering. Here we choose the tomb mural data set with a total number of 1487 images were created. The normalized data set is shown in Figure 7 below. There are totally more than 20 GB data for training in the five big blocks in figure 1 shown. In order to prepare for the image training, this paper uses the following pre-processing to give a training image I tr . Firstly the image is normalized toI n ∈ [0, 1] 128×128×3 , and then defined a mask M ∈ [0, 1] 128×128 , and let M i j can cover the middle part of the image. Secondly, it should calculate the average pixel strength u in the non-mask part I n (1 − M) . Thirdly it set as the external pixies value for each channel. To improve the quality of the final output effect, it should do little post-processing for the final I o . Using I o = 255 · I o to reformatI o , and using seamless clone to fuse I tr and I o , to output the final resultI op .

Define the Mask
After preparing the data set, normalize the data set to I n ∈ [0, 1] 128×128×3 , it should define a mask that uses a binary mask M (mask) with only two values of 0 and 1. The value of 1 means the effective information on the mural which needs to remain, and the value of 0 means the missing part which needs to be restored. Multiply the elements in I n by the elements in M. The product of the multiplication of the elements at the corresponding positions of the two matrices is the Hadamard product. Define the I m as bellow: Where the parameterµ is the average pixel value and M is the mask area. Area I n (1 − M) is an area without the mask. Add the two results toI m , and use I m and M to generate the I p ∈ [0, 1] 128×128×4 . The program outputs ( I n , I p ) is the image for generating a confrontation network training.
4 The mural generation results and analysis

The information generation for mural clocks
The mural Polo in Crown Prince Zhanghuai tomb has a total of 1487 pieces of 256 × 256 images, which requires more than 40 hours for training. The continuity of the generated information is better, and the color matches with the initial background are higher, and the pattern texture of the mural can be well restored. And, as the training time increases, the more iterations of the training cycle, the training effect will gradually improve, and the overall image integration will be better.  figure 8 it shows that different training cycle can lead to different generation results. The mural is too huge to training for a long time. It depends on the computer's operation capability. This experiment is output by a computer that has Intel(R) Xeon(R) CPU E5-1650v3@ 3.50 GHz and NVIDIA GeForce RTX 2080 Ti. The training module is set to save the log file during the repair process. The stored data is mainly the loss data of the generator and the discriminator and the recognition loss. The data file is exported and visualized statistical analyzing. The changes in the parameters of the generator loss and the discriminator loss are shown in figure 9. The generator loss parameters around 1.03 and the recognition loss parameters fluctuated around 0.02. The execution time of each step pre-sets as two minutes. With the superposition of the number of iterations, the fluctuation of the final loss function will become smaller and smaller, and eventually tend to be a stable value. The quality of the generated information is getting better and better, and the detailed features are restored. The following figures are the mural information extension on this platform.

The analysis of generated information
The experiment used Keras and Tensorflow to do the tomb mural blocks image extension. Keras is an open source library written by Python. Its biggest advantage is able to use the function library to complete the design of deep learning model debugging and successfully applied to solve specific problems. Tensorflow is a very good numerical open source library. It can clearly point to deal with the process of generating the training process. There are two groups of experimental effects. One is generated by the essential outpainting algorithm with 8 convolutions and 1 de deconvolutions DCGAN in the article [17], the other is our existential algorithm with 13 convolutions and 2 de deconvolutions DCGAN. The generating results in figure10(a1) ∼ (d1) are generated by article [17], and our DCGAN generating results are shown in figure10(a2) ∼ (d2). We choose structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) to evaluate the results.
SSIM is usually used to measure the structural similarity of the two images. The formula derivation mainly involves three measurement indicators: brightness, contrast, and overall image structure. Specifically, the average value can represent the brightness, the standard deviation represents the contrast, and the covariance represents the degree of structural similarity. It is shown in formula 11.
µ x is the average value of existing mural information x, µ y is the average value of extended mural information y, σ 2 x is the variance of x, σ 2 y is the variance of y, and σ xy is the covariance of x and y. c 1 and c 2 are constants. The range of structural similarity is 0 to 1. When generating information and existing information is exactly the same, the value of SSIM is equal to 1.
PSNR means the peak signal that reaches the noise ratio. It is generally used for evaluating the total visual result. It can measure the generated information that has the same quality as the original mural image. The definition of PSNR is shown as formula 12. It can be seen from Table 5 the different results in SSIM and PSNR. Table 5 The evaluation of the generation results in article [17] and ours Evaluation figure a figure b figure c figure d SSIM value [17] 0 By reconstructing the DCGAN for extending the inner information to the mural block side and generating the missing information to get the total mural. Our algorithm is better not only in the visual effect but also in the quantify evaluation.

Conclusion
Traditional image inpainting is to fill the holes in the image, but our restoration technique can help to join the separated mural block by generating the missing information between the mural blocks. The information training based on the modified DCGAN model using the tomb mural residual image can be used to generate the extension part of the mural blocks. It adds CNN to the typical GAN to have a better nonlinear feature. For the generatorG , we keep the original encoder-decoder structure, and exchange convolution 13 layers to 8 layers and exchange deconvolution 2 layers to 1 layer to increase the reception of neurons and improve the authenticity. The more nonlinear convolution layers make it get better results in completing the entire screen.
However, due to the scarcity of the mural residual information, there are not enough images fit for training. The final image restoration results are not perfect. The mural data set can be expanded later to optimize the restoration effect. This program only can process images with 256 × 256pixels. It makes the pixels limit by the data size of the image and makes the generated information not be clear and blurry. So to increase the tomb mural training database is the next thing to do.
T h i s p a g e i s i n t e n t i o n a l l y l e f t b l a n k