Sparsity-based compressive reservoir characterization and modeling by applying ILS-DLA sparse approximation with LARS on DisPat-generated MPS models using seismic, well log, and reservoir data

Abstract. In the earth sciences, there is only one true reality for a property of any dimension, whereas many realization models of that reality might exist. In other words, a set of interpreted multiplicities of an unknown property can be found, but only one unique fact exists, and the task is to return from the multiplicities to the uniqueness of the reality. Such an objective is mathematically provided by sparse approximation methods. The term "approximation" indicates the sufficiency of an interpretation that is close enough to the true model, i.e. reality. In geosciences, the multiplicities are provided by multiple-point statistics (MPS) methods. Realistic modeling of the earth interior demands sophisticated geostatistical methods based on true available images, i.e. the training images. Among available MPS methods, the DisPat algorithm is a distance-based MPS method which generates appealing realizations for stationary and nonstationary training images by classifying the patterns based on distance functions using kernel methods. Its advances in nonstationary image modeling are a particular strength of the DisPat method. Realizations generated by the MPS methods form the training set for the sparse approximation. Sparse approximation consists of two steps, sparse coding and dictionary update, which are applied alternately to optimize the trained dictionary. Model selection algorithms like LARS are used for sparse coding. LARS optimizes the regression model sequentially by choosing a proper number of variables and adding the best variable to the active set in each iteration. Out of the numerous dictionary training methods given in the literature, the ILS-DLA is a variant of the MOD algorithm, where the latter is inspired by the GLA and the whole trained dictionary is updated sequentially by alternating between the sparse coding and dictionary update steps. The ILS-DLA differs from the MOD in addressing the internal structure of the dictionary: it considers overlapping or non-overlapping blocks and modifies the MOD algorithm according to the internal structure of the trained dictionary. The ILS-DLA is faster than the MOD in the sense that it inverts for the smaller blocks constituting the trained dictionary rather than inverting for the entire dictionary. The subject of this paper is an integration study between sparse approximation from image processing and compressed sensing, multiple-point statistics from the field of geostatistics, and the geophysical methods and reservoir engineering from the branch of petroleum science. This paper specifically emphasizes the utilization of image processing in solving reservoir complexities and enhancing reservoir models.


Introduction
In geoscience, there is only one true model for a property of the earth, but numerous models of the real earth property exist. This concept is referred to as uncertainty in geoscience and reservoir modeling, where it is common practice to manipulate the models and find the best one among a finite set of possible models by performing further experiments and observing the results. It is neither guaranteed that the best model of a finite set of models is actually the true model, nor that it is the best practically achievable model.
Trivially, the more models that are available as population samples, the better the guarantee that the selected best model is a close translation of the reality, i.e. the true model.

Model multiplicities by the MPS methods
In reservoir characterization and modeling, providing a large set of population samples is practically impossible unless stochasticity is introduced into the models using multiple-point statistics (MPS) methods. It is known that simple two-point geostatistical methods (Deutsch and Journel, 1998; Strebelle, 2000) fail to properly generate stochastic models, especially for complex reservoirs, e.g. a reservoir which is deltaic in one part and fractured carbonate ramps in other parts. Delta properties are nonstationary, and fracture network properties can also be nonstationary. Two-point geostatistical methods use the variogram to model property variations between two points in the reservoir based on spatial distance. They fail to properly model larger distances if the changes in the property are abrupt or nonstationary. To alleviate this, different variogram models can be built for each neighborhood to capture the property variations, but even this does not completely capture them.
As a more advanced alternative, a location-independent model can be used as a source of information and statistical changes for multiple-point statistics (MPS) algorithms to train on and produce reliable reservoir static models. This location-independent model is called the training image, and its location-independence leads to the production of various stochastic reservoir models (Haldorsen and Damsleth, 1990; Mariethoz and Caers, 2015). More realistic training images and more reliable stochastic models are generated if wide and accurate prior information about the reservoir is available. Pattern-based methods like SimPat (Arpat, 2005), FilterSim (Zhang, 2006), Direct Sampling (Mariethoz and Renard, 2010), DisPat (Honarkhah, 2011), and WaveSim (Chatterjee et al., 2012) are generally the best methods for MPS reservoir modeling. The SimPat algorithm is time-consuming, and in FilterSim, defining the filters adds complexity to the algorithm. The WaveSim algorithm is similar to FilterSim, and its complexity lies in selecting an optimum scale of wavelet decomposition for the task of pattern classification. The DisPat algorithm is known as the best of this kind for its ability to integrate data based on the human visual system and for providing new algorithms for modeling images with nonstationary properties.

Sparse approximation as a mathematical tool for the sparsity-based compressed model
The MPS methods are used to generate the set of multiplicities based on one single training image, which itself is not known to be the true model. Therefore, the MPS-generated realizations are all translations of an unknown true model. The task is to obtain one single model image representing the true model from a large set of MPS realizations, which are considered manipulations of the true model. The mathematical tool to perform such a task is known as sparse approximation, from the field of image processing. The term approximation reflects the impossibility, or needlessness, of obtaining the exact true model.
In the wider branch of image processing, the large set of stochastic reservoir MPS model images (generated by any of the MPS methods) can be used as a set of training images for dictionary training in the context of image compression (Aharon et al., 2006a, 2006b; Bryt and Elad, 2008; Cheng, 2015; Elad, 2010; Elad and Aharon, 2006; Horev et al., 2012; Khaninezhad and Jafarpour, 2013; Mairal et al., 2009; Rubinstein et al., 2008, 2010a, 2010b, 2013; Skretting and Engan, 2011a, 2011b; Skretting and Husøy, 2003; Skretting et al., 1999; Starck et al., 2010). There are generally two types of dictionaries that can be employed to compress a set of model images: prespecified (analytic) and trained (learned) dictionaries. The prespecified dictionaries are off-the-shelf dictionaries with fixed elements, usually stated by analytical equations. The trained dictionaries differ from the prespecified ones in that they are trained on and adapted to a specific set of training images. Therefore, the prespecified dictionaries are general but less effective, whereas the trained dictionaries are tied to a specific set of training images but more effective on that set or similar sets.
The process of sparse approximation deals with trained dictionaries that are adapted to specific sets of training samples (images or signals). Sparse approximation is itself a sequential alternation between two steps: sparse coding and dictionary update. The sparse coding step deals with selecting the best vector (or model) from a large set of vectors (or models) by minimizing an optimization problem under some sparsity constraint. In other words, in each iteration, sparse coding methods like BP (Chen et al., 1999), MP (Mallat and Zhang, 1993), OMP (Rubinstein et al., 2010b; Tropp, 2004; Tropp and Gilbert, 2006), and ORMP (Gharavi-Alkhansari and Huang, 1998), and model selection algorithms like LARS (Efron et al., 2004; Tibshirani, 1996) and Lasso (Hastie et al., 2009), are employed with a fixed dictionary to obtain the sparse coefficients. The obtained sparse coefficients are then used in the dictionary update step.

Different methods of dictionary update are presented in the literature, e.g. MOD (Cheng, 2015; Elad and Aharon, 2006), K-SVD (Aharon et al., 2006b; Bryt and Elad, 2008; Rubinstein et al., 2010b; Skretting and Engan, 2011a, 2011b), ODL (Mairal et al., 2009), ILS-DLA (Engan et al., 2007; Skretting and Engan, 2011b), and RLS-DLA (Skretting and Engan, 2010, 2011a). The MOD algorithm is inspired by the GLA; it sequentially updates the trained dictionary by alternating between the sparse coding and dictionary update steps, inverting for the whole set of training data. The K-SVD algorithm is a generalization of the k-means clustering algorithm. In the K-SVD, the dictionary is updated sequentially by applying a singular value decomposition for each column of the trained dictionary in each iteration. The Online Dictionary Learning (ODL) algorithm also follows the two-step procedure, and it is specifically designed to handle large-scale data sets under the scope of online optimization algorithms. The ODL is fast because it is based on stochastic approximation and updates the dictionary recursively, using the previous one as a warm restart. The ILS-DLA is a variant of the MOD algorithm; it differs from the MOD in addressing the internal structure of the dictionary, considering overlapping or non-overlapping blocks and modifying the MOD algorithm according to the internal structure of the trained dictionary. The ILS-DLA is faster than the MOD in the sense that it inverts for the smaller blocks constituting the trained dictionary rather than inverting for the whole dictionary. The RLS-DLA exploits the same methodology as the MOD algorithm, except that the RLS-DLA starts with an initial dictionary and, at each step, a new data vector is introduced into the training set and the dictionary is updated accordingly. A forgetting factor is introduced into the algorithm to downweight the dictionaries of the early stages in favor of those of the late stages.
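For illustration, the per-atom K-SVD update described above can be sketched in a few lines of NumPy; this is a minimal, illustrative implementation, with variable names and the single-atom scope our own rather than taken from the cited papers:

```python
import numpy as np

def ksvd_atom_update(D, X, Y, k):
    """Update atom k of dictionary D (columns = atoms) and row k of the
    coefficient matrix X so that D @ X better approximates the data Y."""
    omega = np.nonzero(X[k, :])[0]           # signals that use atom k
    if omega.size == 0:
        return D, X                          # unused atom: leave unchanged
    # Residual of the selected signals without atom k's contribution
    E = Y[:, omega] - D @ X[:, omega] + np.outer(D[:, k], X[k, omega])
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    D[:, k] = U[:, 0]                        # new unit-norm atom
    X[k, omega] = s[0] * Vt[0, :]            # matching sparse coefficients
    return D, X
```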

The case study
In this paper, the sparsity-based compressive reservoir characterization and modeling workflow, using the ILS-DLA dictionary training method with LARS sparse coding, is explained and applied to a gas injection case of an Iranian oilfield located in southwest Iran. Gas is injected into the Asmari reservoir of this oilfield to maintain pressure and preserve the recovery factor. The Asmari reservoir in this oilfield is described as a very complicated reservoir because of the presence of interbeds of sandstones and carbonates, the possibility of a fracture network connecting the reservoir along the crest, and the presence of a deltaic depositional system in the west of the reservoir. Modeling such a complicated reservoir demands new stochastic MPS methods rather than simpler two-point geostatistical methods.
Seismic, well log, and reservoir data are used to build semi-industrial reservoir model images, and the DisPat MPS method is applied to manipulate the model images and generate a large set of stochastic reservoir model images for two cases: fracture and deltaic systems. The gas injection process is simulated on a couple of 2D models extracted from the fractured part and the deltaic part of the reservoir. A set of stochastic realizations based on the training images from the fractured part was generated under the stationarity condition, and for the deltaic part under the nonstationarity assumption. Each of these training sets was used to train dictionaries by the ILS-DLA method. The offline-trained dictionary was then utilized in a sparsity-based image compression scheme using the LARS method to represent a single model, i.e. the compressed sparsity-based model. A proper experiment for comparing the goodness of the MPS-generated stochastic realizations and the resultant compressed sparsity-based model image is to run the simulation for each model and compare the production and pressure profiles with those of the true model. Experimental results show that the compressed sparsity-based model images, in almost every case and across all tested methods, are superior to the majority of the MPS stochastic realizations, with 89.58% of the experiments falling in the area of 90%-90% (upper 10%) superiority and 95.83% in the area of 85%-85% (upper 15%). The results are even more encouraging considering that the MPS realizations are quite stochastic and are not conditioned on hard or soft data, and that the parameters involved in the sparse approximation processes could be further optimized to achieve much better results.

The subject of sparsity-based compressive reservoir characterization and modeling is an integration study which involves geostatistics (for multiple-point statistics modeling), image processing (for dictionary training, sparse approximation, and image compression), seismic data interpretation and inversion (for providing AI, saturation, and porosity inverted 3D cubes, a seismic spectral decomposition 3D cube, and interpreted horizons), petrophysics (for providing well log data and interpretation), and reservoir simulation (for providing criteria to quantify the goodness of a model image).

Paper structure
In this paper, first the major blocks of the sparsity-based compressive workflow in reservoir characterization and modeling are presented. These main blocks consist of the DisPat multiple-point statistics method and the sparse approximation algorithm. The sparse approximation algorithm consists of sparse coding and dictionary update steps. For the sparse coding step, the LARS algorithm and its variants are introduced, and for the dictionary update step, the ILS-DLA algorithm is presented. The application of the sparsity-based compressive workflow to a case study is discussed afterwards and the results are presented. The paper ends with some concluding points.

DisPat MPS Algorithm
DisPat is a distance-based, pattern-based multiple-point geostatistical method first introduced by Honarkhah (2011). In this algorithm, like the other pattern-based MPS methods, the training image is scanned with a designed template and the patterns are extracted from the training image. The patterns are then classified based on distance functions using kernel methods. A distance function measures the distance between each pair of patterns in the metric space. Any two close points in the metric space refer to two similar patterns in the pattern database. The kernel k-means clustering algorithm is used to classify the patterns in the pattern database.
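A simplified sketch of this pattern classification step is given below; it substitutes multidimensional scaling plus plain k-means for Honarkhah's kernel k-means, so it illustrates the idea rather than reproducing the original DisPat implementation, and all sizes are illustrative:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

def extract_patterns(ti, t):
    """Scan a 2D training image with a t-by-t template."""
    rows, cols = ti.shape
    return np.array([ti[i:i + t, j:j + t].ravel()
                     for i in range(rows - t + 1)
                     for j in range(cols - t + 1)])

ti = np.random.randint(0, 2, (20, 20))             # toy binary training image
patterns = extract_patterns(ti, t=5)
# Pairwise Manhattan distances between all extracted patterns
dist = squareform(pdist(patterns, metric='cityblock'))
# Map the patterns into a low-dimensional metric space and cluster them
coords = MDS(n_components=5, dissimilarity='precomputed').fit_transform(dist)
labels = KMeans(n_clusters=10).fit_predict(coords)  # pattern classes
```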

Having the patterns classified, sequential simulation is performed on the realization grid. The closest cluster to the data event is determined, and one pattern from that cluster is randomly chosen and pasted at the node location. The distance function used to find the closest cluster to the data event has to account for informed nodes, frozen nodes, and hard data nodes by assigning different weights to each type of data. The SimPat method uses the Euclidean distance function to find the most similar pattern to the data event. FilterSim uses the Manhattan distance function, whereas the DisPat algorithm uses two distance functions for two different purposes: the proximity distance transform is used to build the distance matrix, and the Manhattan distance function is used to find the most similar pattern to the data event.

Multigrid and Multiresolution
To improve the long-range spatial characteristics of the realizations, DisPat utilizes the widely used multigrid approach and the newly introduced multiresolution approach. In the multigrid approach, the grid size does not change but the template size changes from one level to another. Sequential simulation starts from the largest scale, and the final realization at each level is transferred to the lower level as informed nodes. As a result, large-scale properties are transferred from one grid level to the next finer grid level. In the multiresolution method, on the other hand, the template size is fixed and the grid size changes from one level to the next. The simulation starts from the coarsest resolution and continues to the finest resolution grid. The final realization at each level informs the next resolution level, where a one-to-one correspondence between two consecutive grids does not hold.
Correspondingly, the realization and the training image are rescaled and interpolated to the next resolution level.
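As a sketch of this rescaling step (our illustration; the paper does not tie the method to any specific library), a coarse realization can be transferred to the next finer resolution level with cubic interpolation:

```python
import numpy as np
from scipy.ndimage import zoom

def to_next_resolution(realization, factor=2):
    """Rescale a realization to the next (finer) resolution level using
    cubic interpolation (order=3), as in the multiresolution scheme."""
    return zoom(realization, factor, order=3)

coarse = np.random.rand(25, 25)     # realization at the coarse level
fine = to_next_resolution(coarse)   # 50 x 50 grid informing the next level
```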

Hard Data Integration
The DisPat algorithm considers two hard data integration schemes for the multigrid and multiresolution approaches. For the multigrid approach, DisPat strictly assigns each hard datum to the closest possible node, and if that node has already been assigned, the datum is assigned to the next closest enclosing node. Hard data left out of this procedure are rejected. In the multiresolution approach, the grids differ in size, the locations of nodes change from one level to the next, and a one-to-one correspondence does not hold. Cubic interpolation is used to transfer the hard data from one level to the next. For interpolation, either kriging or the inverse-distance weighting method is used. Hard data integration based on the multiresolution approach results in more appealing MPS realizations. The reason for this improvement is that the multiresolution approach captures the statistical information of the training image better than the multigrid approach. Furthermore, inverse-distance weighting better represents the data structure in lower-resolution grids.

Hard data refers to well data, and soft data refers to seismic data. In SimPat, along with the training image, a soft training image is considered, and the soft pattern database is formed in the same manner as for the training image itself. A weight is given to the soft data to diminish or increase its role in building the final pattern, based on the reliability weight given to the soft data. In FilterSim, Zhang (2006) and Wu (2007) proposed a new method for continuous images. They perform the conventional unconditional simulation and find the most similar pattern for the node location. Simultaneously, the corresponding soft data event is found and pasted on the uninformed part of the data event; hence, the soft data is injected into the final simulation. For categorical images, the method in SimPat is integrated with the τ model from SNESim and a new data event is produced. The algorithm searches for the most similar prototype to the data event, and the τ model is used to combine this data event with the soft data event. Another search is performed to find the closest prototype to this newest data event produced by the τ model.

In DisPat, soft data integration is performed through distance calculations. Pattern databases are formed for the two sets, the training image and the soft training image; the patterns are clustered and the prototypes are specified.
For each data event at a node location, the distance to each of the prototypes in both sets of data (the training image and the soft data) is calculated. These two distances are combined to form a new distance function in which the soft distance (i.e. the distance of the data event to the prototypes of the soft data) is weighted. The specified weight is itself the soft data reliability multiplied by the fraction defined as the ratio of the number of informed nodes in a data event to the total number of nodes in that data event. It can be proven that this distance-based soft data integration is a special case of the τ model.
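Written out, the combined distance takes the following form; the convex combination is our reading of "the soft distance is weighted", and all names are illustrative:

```python
def combined_distance(d_pattern, d_soft, reliability, n_informed, n_total):
    """Combine the pattern distance and the soft-data distance.
    The soft weight is the soft-data reliability multiplied by the
    fraction of informed nodes in the data event (as described above);
    the (1 - w) factor on the pattern distance is an assumption."""
    w = reliability * (n_informed / n_total)
    return (1.0 - w) * d_pattern + w * d_soft
```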

Nonstationarity
Training images can be stationary or nonstationary. In a stationary image the patterns are repeated and the statistical properties are steady, whereas in a nonstationary image, no prior information is expected for any location.
In SimPat, a reservoir is subdivided into different regions at the user's discretion, based on the morphological characteristics of the reservoir. Each subregion is scaled or rotated such that the transition to the adjacent subregions is smooth and gradual. In FilterSim, additional morphological information is provided for grid nodes through two other distinct grids conveying the scaling and rotation information. The drawback of these methods is the discontinuities that appear at the borders of the subregions where the corresponding training images are not coherent. Honarkhah (2011) proposed three algorithms to generate nonstationary realizations needing only the training image as the prerequisite.
In the Spatial Similarity Method (SSM), the DisPat algorithm saves the patterns in the database along with the location of each pattern in a separate location database. This method is based on the fact that the patterns closer to the data event are more similar to it than the farther patterns. The SSM searches all over the grid for the most similar pattern. The most similar pattern to the data event is the one which minimizes a distance function defined as the combination of the weighted pattern distance and the weighted location distance. These weights are complementary and add up to 1. The neighborhood-radius method (NRM) is principally similar to the SSM except that the algorithm searches within a definite circle of neighborhood, and during the simulation, similar patterns inside this neighborhood are searched for. The distance function is the distance between the data event and the patterns; the location criterion is applied on the boundary within which the patterns are selected. This method is most suitable for semi-stationary training images, where the circle of neighborhood controls the distance over which the statistical properties can be assumed stationary. The automatic segmentation method (ASM) is used for highly nonstationary images in which stationary regions cannot be defined by any border locations. As a result, such images must be automatically segmented into supposedly stationary regions. The approach to this task is to extract several features of the property in the image and cluster the training image based on these extracted features by the k-means clustering method. Gabor filter banks are used to extract image properties at a few orientation angles at constant frequencies. The energy of each filtered image is then exposed by applying the sigmoid function and convolving with a Gaussian function as the smoothing filter. Additional spatial component images are added to the set of features as input to the k-means clustering algorithm. The k-means clustering algorithm automatically segments the training image into k subregions based on the set of input features. The number of subregions, k, is defined by the user.
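A compact sketch of this ASM feature-extraction and clustering chain, assembled from standard scikit-image/scikit-learn building blocks (an illustration under our own parameter choices, not Honarkhah's original code), is:

```python
import numpy as np
from skimage.filters import gabor
from scipy.ndimage import gaussian_filter
from sklearn.cluster import KMeans

def auto_segment(ti, k=3, frequency=0.15, n_angles=4):
    """ASM-style segmentation: Gabor energies at several orientations,
    sigmoid exposure + Gaussian smoothing, spatial components, k-means."""
    feats = []
    for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
        real, imag = gabor(ti, frequency=frequency, theta=theta)
        energy = 1.0 / (1.0 + np.exp(-np.hypot(real, imag)))  # sigmoid
        feats.append(gaussian_filter(energy, sigma=3))        # smoothing
    rows, cols = np.indices(ti.shape)                # spatial components
    feats += [rows / ti.shape[0], cols / ti.shape[1]]
    F = np.stack(feats, axis=-1).reshape(-1, len(feats))
    labels = KMeans(n_clusters=k).fit_predict(F)
    return labels.reshape(ti.shape)                  # k stationary subregions

ti = np.random.rand(64, 64)        # stand-in for a nonstationary training image
regions = auto_segment(ti, k=3)
```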

Sparse Approximation via Dictionary Learning
The exact representation of a signal $y \in \mathbb{R}^N$ as a linear combination of the columns of a matrix $D \in \mathbb{R}^{N \times K}$ is given by $y = Dx$, where $x \in \mathbb{R}^K$. The sparsest solution to this problem is expressed as

$\min_x \|x\|_0 \quad \text{subject to} \quad y = Dx, \qquad (1)$

where $\|\cdot\|_0$ counts the number of non-zero entries in the solution. The most desired solution to this system of equations is the one with the fewest non-zero coefficients.
In an image processing problem, the matrix $D$ is called the dictionary and its columns are called atoms. These atoms are the building blocks of the dictionary and are normalized. The vector $y$ can be represented as a linear combination of the dictionary atoms, $y = Dx$. Curvelets, contourlets, wedgelets, bandlets, steerable wavelets, and short-time Fourier transforms are examples of prespecified dictionaries. The solution to the exact problem is $x = D^{-1}y$, which is only achievable if $D$ is invertible. In real problems, the exact solution is usually unachievable or not favorable, and an approximate solution suffices. Obtaining the approximate solution to $y = Dx$ which bears the criterion of being the sparsest is called sparse approximation.

The formulation for sparse approximation is

$\min_x \|x\|_0 \quad \text{subject to} \quad \|y - Dx\|_2 \le \epsilon. \qquad (2)$

In this case, the matrix $D$ is an overcomplete dictionary, and the whole process constitutes a compression technique. In the well-known cases of transform coding, the DCT or wavelet dictionaries are used to compress the data. The sparse transform has wide applications in compression, feature extraction, regularization in inverse problems, denoising, dynamic range compression in images, separation of texture and cartoon content in images, inpainting, facial imagery, and more.
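As a minimal, self-contained example of the sparse coding problem in Eq. (2), the following uses orthogonal matching pursuit (one of the greedy methods cited earlier) on a random, normalized dictionary; all sizes are illustrative:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)
N, K, s = 64, 256, 5                      # signal length, atoms, sparsity
D = rng.standard_normal((N, K))
D /= np.linalg.norm(D, axis=0)            # normalized atoms
x_true = np.zeros(K)
x_true[rng.choice(K, s, replace=False)] = rng.standard_normal(s)
y = D @ x_true                            # synthetic sparse signal

x_hat = orthogonal_mp(D, y, n_nonzero_coefs=s)   # greedy approximation of Eq. (2)
print(np.count_nonzero(x_hat), np.linalg.norm(y - D @ x_hat))
```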
An optimal dictionary is flexible, simple, and efficient, with well-defined objective functions.
It is appropriate to have a structured dictionary rather than a free one. A proper dictionary is either drawn from a prespecified set of linear transforms or adapted to a set of training signals. The prespecified dictionaries are simple and fast and can easily be pseudoinverted if tight frames are used. These predetermined dictionaries are also called analytic dictionaries because the algorithm to derive the coefficients is known and analytically stated.
For the same reason, they are called implicit dictionaries, because the coefficients are implicit analytical statements rather than explicitly stated numbers. These dictionaries are highly structured, practically fast, and efficient, but they lack adaptability and are only used for general-purpose image compression tasks.

Alternatively, trained content-specific dictionaries outperform the prespecified ones. One drawback of content-specific dictionaries is their computational cost. Although this drawback is downplayed nowadays by ever-growing computational capabilities, the loss of generality remains, as these content-specific dictionaries are optimized for specific classes of images. These learning-based explicit dictionaries are adaptable but costly and inefficient. The coefficients of the learning-based dictionaries are explicitly computed using machine learning algorithms which train the dictionary on a set of examples. The explicit dictionaries are tuned much more finely than the implicit dictionaries and perform significantly better. The sizes of the trained dictionary and of the processed signal are limited by complexity constraints.
As a remedy, an input-adaptive approach can be designed to restore generality while preserving adaptivity. A novel dictionary $D$ with a parametric structure is defined as the product of a fixed non-adaptive base dictionary $B$ and a sparse atom representation matrix $A$, i.e. $D = BA$. The choice of $B$ imposes a structure on the process of dictionary training, which acts as a regularizer and reduces overfitting and instability in the presence of noise.
The universal base dictionary $B$ consists of a fixed set of fundamental signals from which all observable dictionary atoms are formed. The generic matrix $B$ can have any number of atoms, but they are assumed to span the signal space. The matrix $B$ provides efficient forward and adjoint operators, which bridges the gap between implicit and explicit dictionaries (Horev et al., 2012).
Adaptive methods prefer the explicit dictionary representation over the structured ones. In fact, each atom of $D$ is a sparse combination of the atoms of $B$. In a sense, the dictionary $D$ can be viewed as an extension of existing dictionaries (the matrix $B$), adding to them a new layer of adaptivity. The adaptive structure $D = BA$ is significantly more efficient than the generic dictionary $B$, depending on the choice of $B$, and it is evidently more compact to store and transmit. Parametric dictionaries bridge the gap between complexity and adaptivity, gaining a higher degree of freedom in training while sacrificing some regularity and efficiency of the result. As a result, input-adaptive compression is applicable to a wide range of images (Horev et al., 2012) and provides a simple, flexible, adaptive, and efficient dictionary representation thanks to its low complexity, compact representation, stability under noise, and reduced overfitting.
The main difficulty of sparse representation is dealing with the $\ell_0$ norm. Basis pursuit (Chen et al., 1999) turns this problem into an $\ell_1$-norm optimization problem, i.e. Eq. (1) and Eq. (2) are made convex by replacing the $\ell_0$ semi-norm with an $\ell_1$ norm. The goal of basis pursuit is therefore to find a solution to the sparse representation problem for which the number of non-zero coefficients is minimized. On the other hand, Lasso (Tibshirani, 1996) does not minimize the number of non-zero coefficients directly, but nulls coefficients, changing the non-zero coefficients until their number reaches a definite threshold. Lasso is in fact a sparsity-constrained sparse coding problem. Lasso can be computed by a modification of the Least Angle Regression and Shrinkage (LARS) algorithm, and both are described under the scope of model selection algorithms.

Ridge Regression
The ridge coefficients minimize a penalized residual sum of squares,

$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \Big\}, \qquad (3)$

where $\lambda \ge 0$ is a shrinkage-controlling complexity factor whose larger values lead to greater shrinkage. The following is an equivalent way to write the ridge problem that explicitly states the size constraint:

$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le t. \qquad (4)$

The inputs to the ridge regression should be centered and standardized before solving for the coefficients. The ridge problem can be rewritten in matrix form as

$\mathrm{RSS}(\lambda) = (y - X\beta)^{T}(y - X\beta) + \lambda \beta^{T}\beta, \qquad (5)$

and the solution is

$\hat{\beta}^{\text{ridge}} = (X^{T}X + \lambda I)^{-1} X^{T} y. \qquad (6)$

In the case of orthonormal inputs, this formula indicates that the ridge estimates are just the ordinary least squares solution scaled by a factor of $1/(1+\lambda)$. Writing the singular value decomposition of the matrix $X$ as $X = UDV^{T}$, the least squares fitted vector can be written as

$X\hat{\beta}^{\text{ls}} = UU^{T}y, \qquad (7)$

and the ridge solution is equivalently

$X\hat{\beta}^{\text{ridge}} = \sum_{j=1}^{p} u_j \, \frac{d_j^2}{d_j^2 + \lambda} \, u_j^{T} y, \qquad (8)$

where $u_j$ is the $j$-th column of the matrix $U$. Knowing that $\lambda \ge 0$, the coefficient $d_j^2/(d_j^2+\lambda)$ is obviously no greater than 1. By comparing ridge regression to linear regression, we conclude that the coordinates are shrunk by a factor of $d_j^2/(d_j^2+\lambda)$, so greater shrinkage corresponds to smaller $d_j$. Small singular values correspond to directions in the column space of $X$ having small variance.
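The closed-form solution of Eq. (6) and the SVD shrinkage factors of Eq. (8) can be verified numerically; the data below are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))
X -= X.mean(axis=0)                        # centered inputs
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(100)
lam = 10.0

# Ridge solution, Eq. (6): (X^T X + lam I)^(-1) X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# SVD view, Eq. (8): each coordinate u_j^T y is shrunk by d_j^2 / (d_j^2 + lam)
U, d, Vt = np.linalg.svd(X, full_matrices=False)
shrink = d**2 / (d**2 + lam)   # all factors <= 1; smaller d_j, more shrinkage
print(beta_ridge, shrink)
```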

Lasso
The Lasso is in fact a constrained version of ordinary least squares. In image processing, the Lasso is known as basis pursuit. The Lasso is similar to ridge regression and is given as

$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t. \qquad (9)$

As in ridge regression, the inputs can be centered and reparametrized to eliminate the role of $\beta_0$ in the formula. The size constraint $\sum_j \beta_j^2 \le t$ of ridge regression is replaced by the size constraint $\sum_j |\beta_j| \le t$.
The Lasso Lagrangian form is given as

$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \Big\{ \tfrac{1}{2}\sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\}. \qquad (10)$

The parameter $t$ should be set adaptively, like the $\lambda$ factor. For orthonormal inputs, each of subset selection, ridge regression, and the Lasso applies a simple transformation to the least squares solution. The Lasso solution truncates the least squares solution at zero and translates the rest toward zero by $\lambda$, i.e. $\operatorname{sign}(\hat{\beta}_j)(|\hat{\beta}_j| - \lambda)_{+}$, which is known as soft thresholding.
The constraint region for ridge regression is the disk $\beta_1^2 + \beta_2^2 \le t$, whereas for the Lasso it is the diamond $|\beta_1| + |\beta_2| \le t$. The solution is the first point where the elliptical contours of the minimization function touch the constraint region. In the case of the Lasso, since the diamond has corners, the corners are specific solutions for which one of the coefficients is zero. In higher dimensions, the constraint region becomes a rhomboid with many more corners, and more coefficients will equal zero at those locations.

A generalization of the Lasso and ridge regression solutions is given by penalties of the form $\sum_j |\beta_j|^q$; the elastic net, for instance, provides a compromise between the Lasso and ridge regression while keeping the property of having sharp (sparse) solutions at the corners.
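The soft-thresholding rule for orthonormal inputs is a one-liner; a small numeric check (our example values) follows:

```python
import numpy as np

def soft_threshold(beta_ls, lam):
    """Lasso solution for orthonormal inputs: truncate at zero and
    translate the rest toward zero by lam (soft thresholding)."""
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam, 0.0)

beta_ls = np.array([3.0, -0.4, 1.2, -2.5])
print(soft_threshold(beta_ls, lam=1.0))    # -> [ 2.  -0.   0.2 -1.5]
```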

LARS
The least angle regression and shrinkage (LARS) method was introduced by Efron et al. (2004). LARS builds the regression model sequentially: starting with all coefficients at zero, it adds to the active set, at each step, the variable most correlated with the current residual, and then moves the coefficients in the direction equiangular among the active variables. With small modifications, the LARS procedure also generates the complete Lasso and Forward Stagewise solution paths; Lasso and Forward Stagewise take more steps to finish, such that Lasso stays between the other two.
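A short example of computing the LARS solution path (and, with method='lasso', the Lasso path) using scikit-learn's lars_path on synthetic data:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 8))
y = X[:, 0] - 2.0 * X[:, 3] + 0.05 * rng.standard_normal(50)

# method='lar' gives plain LARS; method='lasso' gives the Lasso path
alphas, active, coef_path = lars_path(X, y, method='lar')
print(active)            # order in which variables join the active set
print(coef_path[:, -1])  # coefficients at the final step
```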

ILS-DLA Algorithm
PCA (Zou et al., 2006), Generalized PCA (Vidal et al., 2005), MOD (Cheng, 2015; Elad, 2010), K-SVD (Aharon et al., 2006b; Bryt and Elad, 2008; Rubinstein et al., 2013; Skretting and Engan, 2011a), ODL (Mairal et al., 2009), ILS-DLA (Engan et al., 2007; Skretting and Engan, 2011b), and RLS-DLA (Skretting and Engan, 2010, 2011a) are among the dictionary learning methods presented in the literature. In the ILS-DLA, the dictionary is structured as a number of submatrices which are either blocky or overlapping. The difference between the ILS-DLA and the MOD is that in the MOD, the whole dictionary is updated using the entire set of corresponding coefficients and the entire training set in a single inversion step, whereas in the ILS-DLA, the dictionary is structured, and for each block inside the dictionary, only the corresponding data and coefficients are involved in the inversion step that updates that block dictionary. This block structure is consistent over the whole dictionary. Therefore, in each inversion step, the process of dictionary update is reduced to updating a much smaller dictionary corresponding to a block.
On the other hand, the RLS-DLA is quite different in the sense that it starts with an initial dictionary and updates the dictionary at each step as a new training vector is introduced into the set. In the RLS-DLA, the dictionary is updated recursively and no matrix inversion is involved; therefore, no internal structuring of the dictionary takes place. In a sense, the ILS-DLA stands between the MOD and the RLS-DLA.

Unconstrained block-based dictionaries
For each subset of data, i.e. signals $Y_i$, the corresponding weights are given by $W_i$, and they are related by a dictionary matrix $D_i \in \mathbb{R}^{N \times K_i}$ defined by $Y_i = D_i W_i$, for $i = 1, 2, \dots, M$. Defining the matrix $Y$ as the augmentation of the training vectors and the coefficient matrix $W$ as the augmentation of the coefficient vectors corresponding to $Y$, for the whole data set we can write

$Y = DW, \qquad (12)$

where the matrix $D$ is the augmentation of the matrices $D_i$. The optimization problem for finding the coefficients $W$, i.e. the sparse coding step, is given as

$W = \arg\min_{W} \|Y - DW\|_F^2 \quad \text{subject to} \quad \|w_k\|_0 \le s \;\; \forall k. \qquad (13)$

Solving for the coefficients is an iterative procedure, and sparse coding techniques like ORMP, LARS, and their variants can be used to obtain the coefficients. The ILS-DLA is developed to solve the optimization problem stated in Eq. (13), which is an error minimization with sparsity constraints, or problems equivalent to Eq. (13) but with additional constraints.

The ILS-DLA algorithm for unrestricted block-based dictionaries in fact reduces to the method of optimal directions (MOD), for which the algorithm can be written as below. Note that the matrix inversion in Eq. (16) involves only the $K \times K$ matrix $WW^{T}$.

Having the set of data $Y$: define an initial dictionary $D^{(0)}$ and set $t = 0$.
The sparse coding step: find the coefficient matrix $W^{(t)}$ by solving the optimization problem

$w_k = \arg\min_{w} \|y_k - D^{(t)}w\|_2^2 \quad \text{subject to} \quad \|w\|_0 \le s, \;\; \forall k, \qquad (14)$

using the ORMP method.
The dictionary update step: having the coefficient matrix $W^{(t)}$ obtained from the previous step, update the dictionary by solving the optimization problem

$D^{(t+1)} = \arg\min_{D} \|Y - DW^{(t)}\|_F^2, \qquad (15)$

using an inversion step as

$D^{(t+1)} = YW^{(t)T}\big(W^{(t)}W^{(t)T}\big)^{-1}. \qquad (16)$

The dictionary columns should then be normalized; set $t = t + 1$ and repeat. (The ILS-DLA algorithm, after Engan et al., 2007; Skretting and Engan, 2011b.)
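The algorithm above, for the unconstrained single-block case in which ILS-DLA reduces to MOD, can be sketched as follows; OMP is substituted for ORMP here because ORMP is not available in scikit-learn, and all sizes are illustrative:

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ils_dla(Y, K, s, n_iter=30, seed=0):
    """Alternate the sparse coding step (Eqs. 13-14) and the least-squares
    dictionary update (Eqs. 15-16); the unconstrained MOD iteration."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], K))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # Sparse coding (OMP in place of ORMP)
        W = orthogonal_mp(D, Y, n_nonzero_coefs=s)
        # Dictionary update, Eq. (16): D = Y W^T (W W^T)^(-1)
        D = Y @ W.T @ np.linalg.pinv(W @ W.T)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # normalize atoms
    return D, W
```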

Unconstrained overlapping dictionaries
Rather than block-based matrices, overlapping matrices also appear widely in signal processing. The ILS-DLA procedure for unconstrained overlapping dictionary matrices follows the same path as for block-based matrices, but the setting is different. The overall dictionary matrix $D$ is composed of overlapping submatrices $D_1, \dots, D_Q$, as in Eq. (17), illustrated schematically in figure 1. Substituting Eq. (17) into Eq. (12) yields the system of equations of Eq. (18). The schematic in figure 1 is helpful in understanding how the sizes of the matrices are obtained; the corresponding data and coefficient matrices are partitioned conformably with the structure of Eq. (17).
For every block and its corresponding weights and signal data, the setting of Eq. (20) holds, which is a generalization of the system of Eq. (18). In fact, the coefficients with negative indices do not exist in principle, i.e. they equal zero. The corresponding system of equations is written over all blocks of the submatrices and their corresponding coefficients and signal data. Note that Eq. (18) and Eq. (20) are equivalent (refer to Appendix A for an example).
According to Eq. (20), we can write the system compactly, and for convenience we change notation and write $Y = D\tilde{W}$, where the matrix $\tilde{W}$ is a block Toeplitz matrix. Based on the above, the ILS-DLA algorithm for the unconstrained block-based matrices can be rewritten with a sparse coding step, which uses a suboptimal vector selection algorithm or a matching pursuit method to obtain the coefficients (and thus $\tilde{W}$), and a dictionary update step written as

$D = \arg\min_{D} \|Y - D\tilde{W}\|_F^2, \qquad (22)$

to obtain $D$ (and thus the submatrices $D_i$) using the following inversion step:

$D = Y\tilde{W}^{T}\big(\tilde{W}\tilde{W}^{T}\big)^{-1}. \qquad (23)$

Again, the dictionary columns need to be normalized at each step.
As can be noticed, in each iteration of the ILS-DLA, the dictionary update takes place block by block using an inversion step; this is unlike the MOD method, in which the entire data set, dictionary, and coefficients are inverted for in a single inversion step.

Constrained dictionaries
Constraints are usually applied to 2D and 3D signal data to keep the problem from becoming very expensive and time-consuming. For dictionaries with linear constraints, the optimization problem of the sparse coding step is written as in Eq. (24), which is similar to Eq. (13) with an additional constraint. The general dictionary is structured as in Eq. (17).
The matrix $D$ is an overlapping matrix as in figure 1. With these settings, the dictionary update optimization problem for step 2 becomes Eq. (25), which follows the same setting as in the case of unconstrained overlapping dictionaries. Looking into the design of the constrained problem, the solution is obtained through a Lagrangian formulation in which $\mu$ is the Lagrange multiplier.
Another case discussed by Engan et al. (2007) is when a large overlapping dictionary is defined as the product of a smaller overlapping dictionary (figure 1a, Eq. (17)) and a block-based dictionary (figure 1b, Eq. (12)), as in Eq. (32). Therefore, instead of the raw data, a weighted vector/matrix becomes the input to the minimization problem.
As can be seen, in functionality, the idea of defining the dictionary matrix as the product of an overlapping and a block-based matrix resembles the idea of the input-adaptive approach, where we defined the dictionary as the product of a fixed non-adaptive base dictionary $B$ and an adaptive learned dictionary $A$, i.e. $D = BA$.
The ILS-DLA algorithm can be used in an image compression scheme to produce sparsely represented images.
The process of image compression/decompression consists of two stages: an offline ILS-DLA training algorithm and an online image encoding/decoding process. In this scheme, the image is first partitioned into smaller patches and the mean DC value is subtracted from each patch. The mean-subtracted patches are used to train the dictionary by the ILS-DLA algorithm, and the non-zero coefficients of the sparsely coded patches are quantized by a uniform quantizer and entropy coded (DC and Huffman coding). The discrete cosine transform (DCT) is used as the fixed dictionary matrix; other alternatives include the LOT and ELT transforms. The final trained dictionary is then employed in the online encoding/decoding stage.
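A hedged sketch of the encoding stage described above (patch partitioning, DC removal, sparse coding against a trained dictionary D, e.g. from the ILS-DLA sketch earlier, and uniform quantization; entropy coding is omitted):

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def encode_image(img, D, s=4, patch=8, step=1.0):
    """Encode an image with a trained dictionary D: partition into
    non-overlapping patches, remove the DC mean, sparse-code, and
    uniformly quantize the coefficients (entropy coding omitted)."""
    h, w = img.shape
    P = (img[:h - h % patch, :w - w % patch]
         .reshape(h // patch, patch, w // patch, patch)
         .swapaxes(1, 2).reshape(-1, patch * patch).T)
    dc = P.mean(axis=0)                     # per-patch DC value
    W = orthogonal_mp(D, P - dc, n_nonzero_coefs=s)
    W_q = np.round(W / step) * step         # uniform quantizer
    return W_q, dc                          # decoder: D @ W_q + dc
```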

Results of this study are multifold. The 3D seismic data cube was deterministically inverted, and AI, porosity, and saturation cubes were obtained. The spectral decomposition cube was generated and studied. Two reality-based permeability models were selected from the spectral decomposition cube, based on which many MPS realizations were generated using different MPS methodologies. Different dictionary learning algorithms were used to train content-specific dictionaries on the training sets of generated MPS realizations, and different sparse coding methods were applied to reconstruct the compressed models. The porosity profile corresponding to the AI profile indicates that the difference in pressure behavior between the western part and the center and eastern parts is due to a porosity change. Looking back at the AI profile confirms that the porosity change from the west to the center of the reservoir is due to the change in the depositional system. This change of depositional systems causes a drop in porosity of almost 5%, from a mean value of 16-17%, typical of consolidated fine sandstone, to a mean value of 11-12%, typical of carbonates. Further study of the porosity cube confirms a double-layered reservoir, which is compatible with reservoir engineering data. Based on the porosity cube, the petrophysical setting of the reservoir is indicated to be a layering of two porous intervals interbedded within three intact layers (upper, middle, and lower).
Studying the saturation profile further confirms that the reservoir is double-layered and water-edged. This finding is also compatible with reservoir pressure and production data. The inversion results have recently been confirmed by a newly drilled well. The above interpretations are provided to emphasize that the models selected as the basis for the training images must represent the reality of the field and must be thoroughly validated by all types of available data. Accordingly, a porosity section (figure 5a) is selected from the porosity profile extracted from the seismic cube. Furthermore, two permeability models are selected based on the features observed in the spectral decomposition profile, and stochastic MPS realizations are generated based on the fracture system (figure 5b) and delta system (figure 5d) training images, respectively (figure 7). The generated stochastic MPS realizations are equally probable. In order to introduce the most stochasticity into the models, no data conditioning is applied in the process of generating the MPS realizations.

These sets of DisPat-generated realization models are used as training sets to learn the dictionary by the ILS-DLA method with LARS. Each image contributes to the sparse approximation process represented in Eq. (1) as a column vector in $Y$. The columns of the trained dictionary are called atoms and form the basis to sparsely represent signals and images. Figures 8a and 8b illustrate 56 atoms of the dictionaries trained over the sets of stationary fracture system and nonstationary delta system MPS realizations, respectively.

The sparsity-based compressed image model, reconstructed by training the dictionary using the ILS-DLA method (Engan et al., 2007; Skretting and Engan, 2011b) with LARS (Efron et al., 2004; Hastie et al., 2009) over the training set of stationary fracture system realizations generated by the DisPat MPS algorithm (Honarkhah, 2011), is illustrated in figure 9a. For this experiment, the error of the production profile of the reconstructed image is less than the error of the production profiles of 98% of the realizations in that training set. Note that the errors are calculated against the production profile of the true image, which is used as the basis for generating the population sample (i.e. the 3000 realizations used as the training set in that specific experiment).
Considering the total of 48 experiments, the results are shown to be very encouraging. It is very interesting to note that out of the 48 performed experiments, 54.17% of the sparsity-based compressed models fall in the area of 95%-95% superiority to the population samples (the first 95% corresponds to the probability of production superiority and the second to the probability of pressure superiority, or vice versa), 89.58% of the sparsity-based compressed models fall in the area of 90%-90%, and 95.83% fall in the area of 85%-85%.
Specifically considering the setting of the experiment discussed in this paper, the results indicate superiority of the sparsity-based compressed model image over 97.08% of the total MPS realizations generated based on the stationary fracture image. The same comparison for the nonstationary image of the deltaic system indicates superiority over 94.41% of the total number of MPS realizations.

Discussions
We have exercised the sparsity-based compressive reservoir characterization and modeling algorithm, which is able to recover a close translation of the unknown true model from a set of stochastic realizations. The sparsity-based compressive reservoir characterization and modeling workflow integrates multiple-point statistics modeling from geostatistics with dictionary learning, sparse approximation, and image compression from image processing, and works on the input data to produce sparsity-based compressed model images. Quality control is performed by passing all the MPS realizations, along with the resultant sparsity-based compressed models, through the reservoir simulation; the pressure drop and production profiles are used as criteria for the goodness of the models. The input data to this workflow are the seismic inverted 3D cubes (acoustic impedance, porosity, and water saturation), the seismic spectral decomposition 3D cube, seismic interpretation data, production data, and the well logs. This workflow is applied to semi-industrial model images in the gas injection area of an Iranian Asmari reservoir, owing to the importance of the subject, to illustrate the effectiveness of the sparsity-based compressive workflow in a complex reservoir.

The case study reservoir, complexities, and uncertainties
The case study for this research is an Iranian oilfield located in southwest Iran. The Asmari reservoir is the main producing layer in this oilfield, and two types of porosity, matrix and fracture, are considered in the simulation models for this reservoir. It is believed that the reservoir communicates along its crest through the open fracture networks.

At the time of this study, 20 wells had been drilled across the reservoir. The suitable distribution of the wells has provided an appropriate bank of information gathered from the entire reservoir. Gas is injected into the reservoir to maintain pressure and preserve the recovery factor.

The pressure behavior of the western part of the reservoir is different from that of the central and eastern parts. The spectral decomposition of the 3D seismic data establishes a deltaic depositional system for the western part of the reservoir and also evidences that the genesis of the reservoir rock in the western part is different from the rest of the reservoir. Accordingly, the discussed Asmari reservoir is a complicated reservoir which is deltaic in the west and fractured carbonates with interbeds of sandstone elsewhere (figures 3a and 3b). The most probable scenario explaining the early breakthrough of gas in the observation wells is the extension of the conjugate fractures at the crest from the center to the east.
In figure 3b, the accumulation of the red color (low AI) is where the fracture networks are located. As illustrated, two parallel fracture networks are identified, with a non-fractured zone in the middle. A fault joins the northern fracture set from the northwest of the reservoir. Looking closer at the low AI features, it is clearly noticeable that they are linear and parallel but not continuous; these features are mostly recognizable as fractures. In the west, by contrast, branches of the deltaic system diverging from west to east (left quarter of the picture) are quite recognizable.

Uncertainty and Stochasticity
As the cumulative production of the Iranian oil reservoirs increases, switching to secondary and tertiary oil recovery methods becomes a necessity. A few of the southern oilfields are already under secondary recovery, among which gas injection is the most popular method. The success of pressure maintenance scenarios for a reservoir, e.g. gas injection, relies on our knowledge of the reservoir properties. These properties are either dynamic or static.

Seismic data interpretation qualitatively characterizes the reservoir, for example the reservoir structure and the existence and form of faults. The 3D seismic data cube for the reservoir under study was interpreted using commercial software, with the well log and vertical seismic profiling data utilized as inputs to perform this task. No effective fault was interpreted hitting the WOC boundary of the reservoir. Thorough interpretation of the faults is especially important when gas injection is considered: the injected gas can find its way out of the reservoir structure through faults that reach the surface, as is the case in an adjacent reservoir.
Along with the qualitative properties, quantitative properties also exist, e.g. porosity and permeability. Having seismic attributes as variables, the porosity distribution in the reservoir can be obtained by linear or nonlinear transforms constrained to the porosity data at well locations. The 3D porosity distribution cube was obtained using a multi-attribute transform.

Finding the permeability distribution in the reservoir is always more difficult and more uncertain than finding the porosity distribution. Few core sections were available from the reservoir under study, and therefore the information with which to build a proper fracture network model was poor.
Uncertainty is intrinsically related to measurements, and reservoir modeling is no exception. To study uncertainty in reservoir modeling, determining the effective parameters and providing statistical information about these parameters are the prerequisites (Caers, 2011). In order to include, quantify, and report the uncertainty,

suitable modeling methods should be selected and conducted. As the main source of information across the extent of the reservoir, seismic data should be inverted stochastically, based on reliable geostatistical information from the reservoir (Castro, 2007; Wu, 2007). For a reservoir, different models with similar or dissimilar uncertainties can give the same response; hence, more constraining information is required. 4D seismic data provide spatial constraints, mostly on the fluid content and transport in the reservoir, which are missing for the studied reservoir.

This type of data is especially practical for studying the motion of the gas injection front and its adherence to the structural highs, the fracture network directions, and the high-porosity, high-permeability layers.

MPS realizations
A fracture and lithological model based only on well data is quite uncertain, as the wells are sparsely distributed in the reservoir. Geostatistics is the technique to spread the well properties from the well locations to the extent of the reservoir. Two-point geostatistics is based on linear changes between two points in space (the variogram), and two-point geostatistical methods do not generate satisfactory models for complex reservoirs. Multiple-point statistics methods are able to generate more realistic models because they account for the changes from a point in different directions simultaneously. Among the current MPS methods, the pattern-based ones are known to have the best performance, e.g. SimPat (Arpat, 2005), FilterSim (Zhang, 2006), Direct Sampling (Mariethoz and Renard, 2010), DisPat (Honarkhah, 2011), and WaveSim (Chatterjee et al., 2012).
These algorithms stochastically reproduce the patterns extracted from the training images, which are provided based on prior information about the reservoir. The location-independence of the training images, along with the stochasticity of the pattern-based MPS algorithms, results in different models which are equally probable. The advances made in the DisPat algorithm have facilitated modeling a stationary fracture network (figure 5b) or a nonstationary deltaic system (figure 5d) based on training images and soft and hard data integration.
There are further expectations of MPS in combination with the field of reservoir engineering, and reservoir simulation in particular. As a specific case, new achievements in nonstationary image modeling, through segmenting the training images into stationary subregions, could be correlated with sectoring the reservoir extent based on pressure and production data, eventually providing better reservoir history-matching. In our example, the reservoir depositional system is a combination of a deltaic and a carbonate ramp system. It is expected that new MPS methods like DisPat can model such a complex depositional system reliably and that the pressure and production data meet this differentiation of the depositional systems.
For our set of experiments, two training images were selected from the 3D spectral decomposition seismic cube, representing the fracture network (figure 5b) and the delta system (figure 5d). The fracture network was considered stationary, whereas the delta system was assumed nonstationary. The two images were used as input to the DisPat MPS algorithm, and a large set of stochastic realizations was generated for each case. The MPS simulation was not conditioned on any hard or soft data, in order to introduce more stochasticity into the realization models. Therefore, the MPS realizations, which are utilized as training sets for dictionary training in the sparse approximation procedures, are generated from the training images under pure stochasticity with no conditioning.

Applying sparse approximation on the MPS realizations
An image is an assembly of information and data redundancy. Information and data are not the same concept; in fact, information is conveyed by the data. Different amounts of data might transfer the same amount of information.

Some parts of the data might contain no related information or might repeat the same information. This concept is referred to as data redundancy. Coding, interpixel, and psychovisual redundancies are the three sources of data redundancy.
Image compression methods are designed to eliminate data redundancy, and they are either lossy or lossless (Gonzalez and Woods, 2002). Reducing the unnecessary details of the model images generated by the MPS methods as inputs to the reservoir simulation preserves the quality of the results while reducing the computational time. Compressing a model without losing the fundamental information, but reducing the dispensable details in an almost lossless manner (Gonzalez and Woods, 2002), will accelerate reservoir simulation, especially for complex systems. Reservoir model upscaling can also be substituted by lossy compression (Gonzalez and Woods, 2002) if the amount of data loss is observed and controlled.
As has been noted, signals can be sparsely represented or approximated over dictionaries, and these dictionaries can be trained on a set of model images. Reviewing the workflow applied to the case study, the spectral decomposition cube (figure 3a) was used as the source to extract a basis for the fracture network stationary training image and a basis for the deltaic system nonstationary training image (figures 5b and 5d). The DisPat algorithm was utilized to create a large set of MPS realizations for each training image.

The sparsity-based compressed model
The sparsity-based compressive reservoir characterization and modeling workflow is applied to several sets of stationary and nonstationary DisPat-, FilterSim-, and SNESim-generated MPS realizations. The sparsity-based compression scheme was applied to these training sets using the different available dictionary training methods. The statistical information from the simulation models for each compressed model image is illustrated in figure 11. For each model image, two criteria are considered for comparison: production and pressure. A natural question is whether one needs to invent new algorithms in order to achieve such a model. This issue is even more interesting when noting that the process of vector selection in the sparse approximation scheme is a random process, which means that the chance of selecting a good model/vector for the sparse approximation is limited. Therefore, suitable vectors or models, i.e.

the MPS realizations for which the pressure and production criteria are better than those of the sparsity-based compressed model, and which form a low percentage of the population sample, have a limited chance of being selected by the sparse coding step to form a basis for reconstructing the sparsity-based compressed model. In a sense, we can loosely conclude that the goodness of a sparsity-based compressed model relies mostly on the inferior MPS realization models, which form the majority, rather than on the superior MPS realization models, which form a low fraction of the population.

Conclusions
The sparsity-based compressed reservoir characterization and modeling workflow is introduced, presented, and applied to semi-industrial scale models from a gas injection case in an Iranian Asmari reservoir. This workflow is set to achieve a close translation of an unknown true model from a set of models which are assumed to be alterations of the true model. The main blocks of this workflow are the multiple-point statistics modeling and the sparse approximation algorithms. The former generates the set of alterations of the true model, which are used by the latter to produce the final sparsity-based compressed model images.
Extracting the porosity and saturation properties from the deterministically inverted seismic data of the oilfield under study, along with discovering a new potential hydrocarbon layer interval (later confirmed by a drilled well) and defining its extents, revealed the structural complexity of this reservoir. This study confirmed the presence of sandstone interbeds in the Asmari carbonate reservoir and found that the overall fall in porosity is responsible for the different pressure-drop behavior in the western part of the reservoir. The saturation distribution based on deterministic inversion of the seismic data was able to detect the front of the injected gas.
Findings are also consistent with pressure data and the oil components in the observation wells.
The geologic complexities of the Asmari reservoir in this oilfield, namely the presence of sandstone interbeds in the carbonate reservoir, the existence of a connected and communicating fracture network at and along the crest, and the deltaic system of the western part of the reservoir, make fluid flow simulation history-matching of this reservoir a difficult task. Some of these problems stem from incomplete studies and lack of information, but others are related to the intrinsic incapability of current algorithms in building realistic static reservoir models. Accordingly, the DisPat algorithm was used to produce a set of realizations for the two training images of the fracture network and the deltaic system, because of its capabilities in hard and soft data integration at different resolution levels and its new nonstationary image modeling techniques.
The DisPat MPS algorithm was used to generate a set of stochastic realizations based on the fracture network stationary training image. A similar set of realizations was generated for the deltaic system nonstationary training image. A 2D injection model was tested on both images, and the simulation results were set as the basis for comparing the production and pressure profiles of the models.