## Abstract

State-of-the-art snapshot spectral imaging (SI) systems introduce color-coded apertures (CCAs) into their setups to obtain a flexible spatial-spectral modulation, allowing spectral information to be reconstructed from a set of coded measurements. Besides the CCA, other optical elements, such as lenses, prisms, or beam splitters, are usually employed, making systems large and impractical. Recently, diffractive optical elements (DOEs) have partially replaced refractive lenses to drastically reduce the size of SI devices. The sensing model of these systems is represented as a projection modeled by a spatially shift-invariant convolution between the unknown scene and a point spread function (PSF) at each spectral band. However, the height map of the DOE is the only free parameter that offers changes in the spectral modulation, which significantly increases the ill-posedness of the reconstruction. To overcome this challenge, our work explores the advantages of the spectral modulation of an optical setup composed of a DOE and a CCA. Specifically, the light is diffracted by the DOE and then filtered by the CCA, located close to the sensor. A shift-variant property of the proposed system is clearly evidenced, resulting in a different PSF for each pixel, where a symmetric structure constraint is imposed on the CCA to reduce the high number of resulting PSFs. Additionally, we jointly design the DOE and the CCA parameters with a fully differentiable image formation model using an end-to-end approach to minimize the deviation between the true and reconstructed image over a large set of images. Simulations show that the proposed system improves the spectral reconstruction quality by up to 4 dB compared with current state-of-the-art systems. Finally, experimental results with a fabricated prototype in indoor and outdoor scenes validate the proposed system, which can recover up to 49 high-fidelity spectral bands in the 420–660 nm range.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. INTRODUCTION

Spectral images are three-dimensional (3D) data structures that include several 2D images of the same scene measured at different wavelengths. Traditionally, these data are an enabling factor in diverse applications, including medical imaging [1], remote sensing [2], defense and surveillance [3], and food quality assessment [4]. The amount of spatial information across a multitude of wavelengths represents one of the main challenges of traditional scanning-acquisition imaging systems, since to obtain several high-definition images, these systems require long exposure times, limiting their use in real-time applications [5].

Currently, snapshot spectral imaging (SI) techniques based on compressed sensing (CS) allow a drastic reduction in the amount of spectral information acquired by sensing coded projections where a recovery process needs to be carried out [6]. In this setting, a spectral image can be accurately estimated from a linear system, whose sensing matrix represents the random measurement acquisition. Several refractive-based snapshot SI devices have been proposed. Examples include the coded aperture snapshot spectral imager (CASSI) [7,8], the dual-coded hyperspectral imager (DCSI) [9], the spatial-spectral encoded compressive hyperspectral imaging system (SSCSI) [10], the snapshot colored compressive spectral imager (SCCSI) [11], prism-mask video imaging spectrometry (PMVIS) [12], and the single-pixel camera spectrometer (SPCS) [13]. An overview of the different sampling methods of these cameras can be found in [5].

A common characteristic of the refractive-optics-based devices is that they span different coding strategies. The most general method employs a black-and-white coded aperture (CA) with opaque or transparent features that block or let the light pass through each particular spatial point. Because the same pattern encodes all spectral bands, this strategy is known as spatial coding, where a digital micromirror device (DMD) is typically used to implement it [14]. Another method consists of employing optical filter arrays, known as color-coded aperture (CCA), which performs spatial and spectral coding entailing more powerful modulation, improving the probability of recovery from an ill-posed problem [15].

One of the main disadvantages of the state-of-the-art refractive CSI imaging systems is that they require expensive, large, and heavy optical elements, resulting in relatively large form factors compared to modern cameras. An increasing and exciting research topic that has the potential of overcoming these mobility limitations of existing snapshot hyperspectral systems [16,17] is the design of diffractive optical elements (DOEs) used as a single optic lens. Mathematically, the image formation model in setups that use a DOE is the spectral light integration of a spatially shift-invariant convolution between the unknown scene and a point spread function (PSF) at each spectral wavelength. This convolutional image formation model has the advantage of low computational complexity; however, the spatial-spectral modulation is limited by the height map of the DOE, which significantly increases the ill-posedness of the reconstruction process [17].
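To make the shift-invariant sensing model concrete, the measurement can be sketched as a per-band convolution of the scene with a wavelength-dependent PSF, followed by spectral integration. This is a minimal numpy illustration of the general model, not the authors' implementation; the circular (FFT-based) convolution and the toy delta PSFs are simplifying assumptions:

```python
import numpy as np

def sensor_measurement(scene, psfs):
    """Shift-invariant sensing: convolve each spectral band of the scene
    with its wavelength-dependent PSF, then integrate over wavelength to
    form a single monochrome measurement.

    scene: (N, N, L) spectral cube; psfs: (N, N, L), one PSF per band.
    """
    N, _, L = scene.shape
    y = np.zeros((N, N))
    for k in range(L):
        # circular convolution via FFT (valid for a periodic boundary model)
        Y = np.fft.fft2(scene[:, :, k]) * np.fft.fft2(np.fft.ifftshift(psfs[:, :, k]))
        y += np.real(np.fft.ifft2(Y))
    return y

# toy example: centered delta PSFs leave each band unchanged,
# so the measurement reduces to the plain spectral sum of the scene
N, L = 8, 3
scene = np.random.rand(N, N, L)
psfs = np.zeros((N, N, L))
psfs[N // 2, N // 2, :] = 1.0
y = sensor_measurement(scene, psfs)
```

With delta PSFs the model is trivial to sanity-check; a real DOE produces extended, wavelength-varying PSFs, and the recovery problem then becomes the ill-posed inversion described above.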

Exploiting the compactness of DOEs and the rich spectral codification of CCAs, our work proposes a shift-variant colored coded diffractive (SCCD) SI system composed of a DOE and a CCA. Physically, in the proposed SCCD, the light is diffracted by the DOE and is then filtered by the CCA, which is located close to the sensor. The shift-variant property is clearly evidenced in the sensing model. Therefore, a contribution of this work is the symmetric structure constraint imposed on the CCA design, which drastically reduces the implementation complexity of the variant system, decreasing the number of PSFs from ${N^2}$ to ${Q^2}$ with $Q \ll N$. Additionally, we jointly design the DOE height and the spectral response of the CCA with a fully differentiable forward model, considering three available technologies, using an end-to-end (E2E) approach [18,19]. In particular, the E2E design consists of optimizing the optical parameters coupled with the weights of a deep neural network (DNN) that minimizes the deviation between the true and reconstructed image over a large set of images [20]. In simulations, the proposed system shows an improvement in the spectral reconstruction quality of up to 4 dB compared with state-of-the-art systems. Furthermore, an SCCD prototype was fabricated to experimentally validate the reconstruction quality in indoor and outdoor scenes, where an additional test was carried out to obtain 49 high-fidelity spectral bands from 420 to 660 nm, showing the effectiveness of the proposed system.

## 2. RELATED WORK

This work is closely related to SI systems. Traditionally, SI systems are based on scanning techniques such as whiskbroom or push-broom scanners [21,22]. Based on a dispersive optical element, such as a prism or a diffraction grating, scanning-based approaches capture each wavelength of light in isolation through a slit. While scanning yields high spatial and spectral resolution, the target subjects are limited to static objects or remote scenes. In contrast, snapshot SI systems acquire a single coded projection of the scene and recover the SI in a postprocessing step, overcoming the limitation to static scenes. State-of-the-art snapshot SI systems can be grouped into coded SI and diffractive SI. Since the proposed SCCD system uses a CCA before the sensor and a DOE as a lens, the SCCD is based on these two technologies.

*Coded SI:* These systems were introduced to mitigate the limitations of scanning methods [23–25]. For instance, computed tomography imaging spectrometry (CTIS) [23] employs a diffraction grating with imaging and relay lenses. The grating splits the collimated incident light into diffraction patterns in different directions while sacrificing spatial resolution for computed tomography. Coded aperture snapshot SI [7,11] was introduced for capturing dynamic objects. A dispersive optical element is coupled with a CA through relay lenses to encode spectral or spatial-spectral signatures. The authors of [15] replaced the binary CA [7] with a CCA, adding more powerful spatial-spectral modulation. Then, the compressive encoded measurements are used to recover the SI through an additional postprocessing procedure. These two types of snapshot SI systems require several optical elements to collimate and disperse light (or modulate light for CASSI), making them bulky and hard to handle in practice. Recently, the authors of [26] introduced a compact SI method to enhance mobility. However, since the method is still based on optical elements, it requires a prism attached in front of a digital single-lens reflex (DSLR) camera. In contrast, our system requires a single diffractive imaging lens and a CCA close to a conventional bare image sensor.

*Diffractive optical elements:* Diffractive optical elements have microstructured patterns that modify the phase of the incident wave and vary according to the spectral wavelength, i.e., the performance is wavelength-dependent. Recently, Wang and Menon [27] introduced several diffractive filter arrays for multicolor imaging without conventional Bayer-pattern color filters. However, such a DOE should be installed through a geometrical optical system with an additional imaging lens. On the other hand, the use of a DOE as a lens has been explored in [17,28,29]. These systems employ an irregular-height lens that modulates the phase of the incident wave, which is modeled using a PSF at different wavelengths. Specifically, in [29], an optimization algorithm for the design of a DOE from a concatenation of ultrathin plates is presented, as well as a computational method to adjust DOE parameters and a self-calibration. In [17], a *Spiral* DOE design was proposed, which offers rotation in the PSF with respect to the incident wavelength. However, these designs provide limited codification power due to limited degrees of freedom, i.e., the modulation produced by the DOEs does not differ significantly between neighboring pixels and contiguous bands, limiting the achievable spectral resolution. Our proposed SCCD system provides a shift-varying PSF design to overcome this challenge.

## 3. SPATIALLY SHIFT-VARIANT SYSTEM WITH ${\rm DOE + CCA}$

The proposed SCCD camera is composed of only three optical elements: a DOE, which also works as the imaging lens; a CCA placed near the sensor; and a bare image sensor. Figure 1 shows a schematic illustration of the SCCD. The DOE is located at a distance $z = {z_1} + {z_2}$ from the sensor and diffracts the incoming light up to the CCA, located at a distance ${z_2}$ from the sensor, where the light is filtered in the spectral domain. Finally, the diffracted wave propagates to the sensor. In the following, the forward propagation model of the proposed system is established.

#### A. CCA Ensemble

Assuming the wavelength response of the CCA remains approximately constant over a spatial region of size $\Delta _m^2$, the CCA can be modeled by using a rectangular function as

The model in Eq. (1) is quite general, where a square CCA with an $N \times N$ filter array and $L$ spectral bands results in ${N^2}L$ parameters ${w_{i,j,k}}$. In practice, it is not possible to arbitrarily customize the spectral responses of the filters due to fabrication constraints [15]. Thus, the complexity of the model can be reduced by constraining the CCA to be a linear combination of a given set of $R$ workable optical filters $\{{{T^\ell}(\lambda)} \}_{\ell = 0}^{R - 1}$. These filters can be seen as the primary colors of the design. Each of the filters can be expressed as
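The reduction from ${N^2}L$ free parameters to ${N^2}R$ weights over $R$ primaries can be sketched as follows; the Gaussian primaries and all array sizes are hypothetical choices for illustration only:

```python
import numpy as np

# hypothetical setup: R = 3 Gaussian "primary" filters over L = 25 bands
L, R, N = 25, 3, 4
lam = np.linspace(420, 660, L)                 # wavelengths (nm)
centers = np.array([460.0, 540.0, 620.0])      # primary filter centers (nm)
T = np.exp(-((lam[None, :] - centers[:, None]) / 40.0) ** 2)  # (R, L) primaries

# per-pixel weights w_{i,j,l}: a convex combination of the R primaries
w = np.random.rand(N, N, R)
w /= w.sum(axis=2, keepdims=True)

# resulting CCA spectral response at every pixel: (N, N, L)
cca = np.einsum('ijr,rl->ijl', w, T)
```

Only the $N^2 R$ weights `w` are free design variables; the spectral shapes themselves are fixed by the $R$ available filters, which is what makes the design fabricable.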

On some occasions, the set of filters $\{{{T^\ell}(\lambda)} \}_{\ell = 0}^{R - 1}$ does not consist of primary filters; instead, it is only a set of available filters to be used. This limitation comes from technological or manufacturer constraints, which offer a predefined and fixed set of optical filters. In this case,

#### B. Image Formation Model

This section presents the image formation model for the proposed system shown in Fig. 1. We assume that there is a set of ${N_s} \times {N_s}$ equispaced point-illuminated sources expressed as

The modulated field ${U_{\lambda ,(p,q)}}(x,y,{z_1})$ continues to propagate a distance ${z_2}$, where the sensor is placed. Therefore, the measured field, which is denoted as the PSF of the system for the $(p,q)$ point, is expressed as

#### C. Theoretical Advantages of the Proposed System

Considering the discrete model derived in Eq. (20), one could conclude that it is computationally less attractive, since it results in $Q \times Q$ different PSFs, as illustrated in Fig. 3; however, the fact that this is a spatially variant system leads to an increase in the spectral resolution in comparison with a spatially invariant system. Roughly speaking, analyzing locally the measurements of the proposed imaging architecture in Fig. 1, it can be observed that two adjacent neighborhoods are dominated by highly differentiated PSFs, allowing an increase in the spectral resolution by sacrificing spatial resolution, a scenario that is not possible in a spatially invariant system. The following lemma summarizes the result.

**Lemma 3.1.** *Let* ${{\textbf{Y}}_k}$ *be the spatially variant measurement model in* Eq. (19) *and* ${{\textbf{G}}_k}$ *its invariant counterpart measurement model, i.e., without CCA. Assuming that an image is locally similar, each neighborhood* $({{\textbf{Y}}_k})_{m,n}^{a,b} = ({{\textbf{Y}}_k}{)_{m + a,n + b}}$ *possesses a different PSF. In contrast, each neighborhood* $({{\textbf{G}}_k})_{m,n}^{a,b} = ({{\textbf{G}}_k}{)_{m + a,n + b}}$ *is dominated by a single, spatially shifted PSF*.

*Proof.* The proof is relegated to the Supplement 1.
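The complexity reduction from ${N^2}$ to ${Q^2}$ PSFs can be illustrated by modeling the structure constraint as a $Q$-periodic assignment of PSFs to sensor pixels (a simplified reading of the symmetric structure constraint, with arbitrary sizes):

```python
import numpy as np

def psf_index(m, n, Q):
    """PSF governing sensor pixel (m, n) when the CCA pattern, and hence
    the PSF set, repeats with period Q in both spatial directions."""
    return (m % Q, n % Q)

N, Q = 256, 4
# a fully shift-variant system would require N*N distinct PSFs;
# the periodic structure constraint reduces this to Q*Q
unique = {psf_index(m, n, Q) for m in range(N) for n in range(N)}
```

For $N = 256$ and $Q = 4$, the system needs only 16 PSFs instead of 65 536, which is what makes calibration and reconstruction tractable.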

Considering the mathematical observation in Lemma 3.1, formalized in Supplement 1, we show that the measurements of the proposed system can be better differentiated within a given neighborhood. To show the advantage of the proposed variant system, the maximum cross correlation between two PSFs at different wavelengths for the central point is evaluated as
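A plain numpy version of this discriminability metric can be sketched as follows, assuming circular cross correlation and energy normalization (the exact normalization used in the paper is not reproduced here):

```python
import numpy as np

def max_cross_correlation(psf_a, psf_b):
    """Maximum normalized circular cross correlation between two PSFs.
    Values near 1 mean the PSFs are nearly indistinguishable (up to a
    shift); lower values indicate better spectral discrimination."""
    A = np.fft.fft2(psf_a)
    B = np.fft.fft2(psf_b)
    corr = np.real(np.fft.ifft2(A * np.conj(B)))  # all circular lags at once
    return corr.max() / (np.linalg.norm(psf_a) * np.linalg.norm(psf_b))

rng = np.random.default_rng(0)
psf1 = rng.random((16, 16))
psf2 = rng.random((16, 16))
same = max_cross_correlation(psf1, psf1)  # identical PSFs -> exactly 1
diff = max_cross_correlation(psf1, psf2)  # <= 1 by the Cauchy-Schwarz bound
```

A designed system should drive this value down between PSFs of adjacent wavelengths, since low correlation is what lets the reconstruction separate the bands.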

The following section introduces an E2E optimization framework to jointly design the optical parameters of the SCCD system, the height maps of the DOE, the filter response of the CCA, and the recovery network parameters. Additionally, this previous affirmation is numerically and experimentally validated in Section 6.

## 4. E2E JOINT DESIGN ${\rm DOE} + {\rm CCA}$

Considering the three possible technologies studied to develop the CCA, this section presents an E2E methodology to jointly design a DOE and a CCA in the proposed SCCD system. Specifically, the E2E approach models the forward system of the optical camera as a fully differentiable model to be included as a layer in a DNN, as illustrated in Fig. 5. The DOE and the CCA are modeled as optimization parameters in an optical layer, which are optimized jointly with the network parameters, using a state-of-the-art deep-learning optimization algorithm such as stochastic optimization derived from adaptive moment estimation (Adam) [31]. This work focuses on the design of the camera parameters; we used a U-net [32] as the recovery network, as explained in the simulation section, although any predefined architecture can be adapted as a recovery network. The optical element size, sensor pixel size, propagation distance $z = {z_1} + {z_2}$, and sensor noise level are treated as hyperparameters.
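The E2E principle can be reduced to a toy sketch: a differentiable "optical layer" (here just a linear sensing matrix standing in for the SCCD forward model) is optimized jointly with a linear "decoder" (standing in for the U-net) by descending the reconstruction error. Plain gradient descent replaces Adam, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, steps, lr = 12, 6, 300, 1e-2
X = rng.normal(size=(n, 200))          # training signals (one per column)

phi = 0.1 * rng.normal(size=(m, n))    # "optical" parameters (sensing layer)
D = 0.1 * rng.normal(size=(n, m))      # "decoder" parameters

def loss(phi, D):
    # mean squared deviation between true and reconstructed signals
    return np.mean((D @ (phi @ X) - X) ** 2)

loss_before = loss(phi, D)
for _ in range(steps):
    Y = phi @ X                            # differentiable forward model
    R = D @ Y - X                          # reconstruction residual
    gD = 2 * R @ Y.T / X.size              # gradient w.r.t. the decoder
    gphi = 2 * D.T @ R @ X.T / X.size      # gradient w.r.t. the optics
    D -= lr * gD
    phi -= lr * gphi
loss_after = loss(phi, D)
```

The key point mirrored here is that the sensing operator itself receives gradients from the reconstruction loss, so the "optics" are shaped by the same objective that trains the decoder.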

#### A. Feasibility Constraints of Fabrication

To build the DOE and the CCA, we need to account for some manufacturing constraints directly in the design. For instance, the DOE is fabricated with photolithography. DOEs are flat lenses that rely on the small phase delays induced by ultrasmall features in the height map, which are usually discretized for fabrication [16]. To avoid fabrication errors, we represent the height map with Zernike polynomials [16]. Furthermore, in some of our experiments, we found that designs thicker than 1.511 µm significantly degraded the fabrication quality; therefore, this restriction also needs to be included in the design. The CCA types presented in this paper were addressed using regularization functions incorporated in the recovery loss function.
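The thickness constraint and the robustness noise used later in the design can be sketched as follows. The refractive index is inferred from the paper's own numbers (a 1.511 µm height giving a $2\pi$ delay at 665 nm implies $n - 1 \approx 665/1511 \approx 0.44$) and is therefore an assumption, as are the array sizes:

```python
import numpy as np

H_MAX = 1.511e-6   # max fabricable height (m): 2*pi phase delay at 665 nm

def phase_delay(height_map, wavelength, n_refr=1.44):
    """Phase delay of a DOE height map at one wavelength, after wrapping
    heights to the maximum fabricable thickness."""
    h = np.mod(height_map, H_MAX)               # height wrapping strategy
    return 2 * np.pi * (n_refr - 1.0) * h / wavelength

rng = np.random.default_rng(0)
h_map = rng.uniform(0, 3e-6, size=(8, 8))       # hypothetical raw heights
# +/-20 nm uniform noise for robustness to manufacturing imperfections
h_noisy = h_map + rng.uniform(-20e-9, 20e-9, size=h_map.shape)
phi = phase_delay(h_noisy, 550e-9)
```

Wrapping keeps the optimizer's height variable unconstrained while every simulated PSF respects the fabrication limit.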

### 1. DOE Parametric Model

To address the constraints of the DOE, we can use a basis representation of the height map ${{\textbf{h}}_{\textit{map}}}$ and optimize for the respective basis coefficients. A natural choice would be a representation using Zernike polynomials, which is given by

where ${{\textbf{z}}_a}$ is the $a$th Zernike polynomial, and ${\beta _a}$ is the corresponding coefficient [16]. During the optimization, we add random uniform noise in the range ${\pm}20\;{\rm nm}$ to the height map before simulating the PSF to increase robustness to manufacturing imperfections. Experimentally, we noticed that a thickness of 1.511 µm covers a phase delay of $2\pi$ for the wavelength 665 nm. Therefore, we applied a wrapping strategy using 1.511 µm as the maximum value and computed the phase for the other wavelengths in each forward propagation step in order to satisfy the maximum height constraint.

### 2. CCA Parametric Model

To address the CCA constraints, we use two different strategies: a *clip function* [33] and a regularization term in the main loss function ${\cal L}$, explained as follows. *Type 1:* when the variables ${w_{i,j,\ell}}$ are real numbers and the CCA is restricted to a set of primary colors, the cost function ${\cal L}$ is given by

The *clip function* is applied to satisfy the constraint.

*Type 2:* when the variables ${w_{i,j,\ell}}$ are real numbers, the cost function ${\cal L}$, which the network in Fig. 5 optimizes, is given by

*Type 3:* Finally, when the ${w_{i,j,\ell}}$ are restricted to be binary, we follow an idea similar to that proposed in [34,35] to address binary values through a regularization term; therefore, we propose the following regularizer, given by
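A minimal sketch of the two strategies follows: the Type 1 projection via clipping, and a binary-promoting penalty for Type 3. The specific quartic form $w^2(1 - w)^2$ is a common choice assumed here for illustration; the exact regularizer of [34,35] may differ:

```python
import numpy as np

def clip_weights(w):
    """Type 1: project real-valued weights back onto [0, 1] after each
    gradient step (the 'clip function' strategy)."""
    return np.clip(w, 0.0, 1.0)

def binary_regularizer(w):
    """Type 3 (assumed form): penalty that vanishes only at w = 0 or
    w = 1, pushing filter weights toward binary values during training."""
    return np.sum(w ** 2 * (1.0 - w) ** 2)

w = np.array([0.0, 0.25, 0.5, 1.0])
penalty = binary_regularizer(w)   # only the 0.25 and 0.5 entries contribute
```

Adding such a penalty to ${\cal L}$ lets standard gradient-based optimizers produce near-binary CCAs that can then be rounded for fabrication.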

## 5. FABRICATING CUSTOM OPTICAL ELEMENTS

#### A. Film-Based CA Fabrication

To build a color photography-film CA, six steps are followed, as illustrated in Fig. 6. In particular, this work uses FUJICHROME Velvia 50 transparency film (35 mm, 36 exposures), since its ISO of 50 provides high sharpness, daylight-balanced color, and the smallest grain on the market [36]. First, a photograph of a computer screen displaying the desired CA patterns is taken with a film camera; this work used the Canon EOS Rebel 2000. Second, the film is soaked in water for 1 min to swell the gelatin layer, facilitating the subsequent chemical treatments. Third, the developer converts the latent image into macroscopic particles of metallic silver; this work used the Kodak D-76 developer [37]. This step takes 9.5 min. Fourth, a stop bath, typically a dilute solution of acetic acid or citric acid, halts the action of the developer, a process that takes 1.5 min. A rinse with clean water may be substituted. Fifth, the fixer makes the image permanent and light-resistant by dissolving the remaining silver halide, a step that takes 2 min. A common fixer is hypo, specifically ammonium thiosulfate [37]. Finally, washing in clean water removes any remaining fixer, which could otherwise corrode the silver image, leading to discoloration, staining, and fading; this step takes 1.5 min. Figure 7 shows an experimental characterization of the film-based colored CA. We use collimated light and a spectrometer to determine the spectral signatures of the pixels of the developed film. From this experiment, it can be seen that the film indeed acts as a filter that modulates light in the wavelength dimension. Additionally, we corroborate that the spectral transmission of the pixels preserves the majority of the light.

#### B. Diffractive Lens Fabrication

The resulting DOE lens is fabricated using lithography techniques [28,38]. Specifically, a positive photoresist (AZ-1512, MicroChemicals) on a titanium-coated glass substrate is used to pattern the designed height map with a gray-scale lithography machine (MicroWriter ML3, Durham Magneto Optics). Once the photoresist is developed with a base developer (MF-319, Microposit), it is used as a mold to replicate its patterns on polydimethylsiloxane (PDMS, SYLGARD 184, Dow). When the PDMS is cured, a drop of optically clear UV-curable resin (NOA61, Norland Products) is put between the PDMS and a glass substrate and cured with a mercury vapor lamp. Finally, the PDMS mold is peeled off to form the DOE, where a light-blocking chromium–gold–chromium trilayer is used to coat the circular aperture that blocks the incoming stray light.

## 6. SIMULATIONS AND EXPERIMENTAL RESULTS

The performance of the proposed SCCD system, using the model described in Eq. (20), is evaluated using the proposed CNN trained in an E2E approach. For that reason, the ARAD hyperspectral data set [39] with 460 spectral images of $482 \times 512$ spatial pixels and 31 spectral bands from 400 to 700 nm with a 10 nm step was used, splitting 450 for training and 10 for testing. We augmented the input data sets by scaling them to two different resolutions (half and double) following [39]. Furthermore, we chose 25 spectral channels from 420 to 660 nm to match the spectral response of the camera, as explained in detail in the experimental part. All reported results correspond to the best performance over five runs with different initializations.

To evaluate the spectral reconstruction performance of the SCCD system, we simulated the Spiral DOE described in [17], designed with 550 nm as the central wavelength and a DOE-sensor distance of 50 mm. For the Spiral, we used the same network architecture employed in the proposed E2E method described below and the same hyperparameter tuning process, with the optical system fixed [40].

#### A. Recovery Network

For the recovery network, we use a U-net with skip connections [32], as shown in Fig. 5. This architecture comprises four leading operators: *Conv2D*, consisting of 2D convolutions with $F$ filters of $3 \times 3$ spatial size, with zero padding and a ReLU activation function; the $F$ values used were 16, 32, and 64, as illustrated in Fig. 5; *MaxPool2D*, a max-pooling operator of size $2 \times 2$ that reduces the spatial resolution; *UpSampling2D*, which performs a $2 \times 2$ upsampling that repeats the rows and columns of the feature map using nearest-neighbor interpolation; and *Concat*, which concatenates encoder and decoder features at the same depth level, as illustrated in Fig. 5. Each gray block in Fig. 5 performs a *Conv2D*, the red line performs a *MaxPool2D*, the blue line stands for the *UpSampling2D*, and the black line stands for the *Concat* operator. Finally, the last layer performs a $1 \times 1 \times L$ convolution only in the spectral domain.

#### B. E2E Design of the ${\rm DOE} + {\rm CCA}$

In this experiment, we evaluate the performance of the proposed SCCD system under the three different CCA technologies described in Section 4. All the measurements were simulated with 30 dB of SNR. The CCA distance was set at ${z_2} = 3\;{\rm mm}$ according to experimentation (see Supplement 1). Figure 8 shows the obtained optical systems for the three configurations: (top) the DOE, (middle) the CCA, and (bottom) some spectral signatures of the resulting filters. It can be observed that Type 1 produces nonsmooth filters, as expected, compared with Types 2 and 3. Additionally, all the proposed DOEs are irregular, i.e., no symmetry is evidenced. Table 1 shows the mean and standard deviation of the reconstruction quality measured with the peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and spectral angle mapping (SAM) for the testing data set. From Table 1, it can be concluded that the best reconstruction quality is achieved with the CCA of Type 1, since this technology has more free parameters (${Q^2}L$), expanding the feasible set. Additionally, the CCAs of Types 2 and 3 also improve the results compared with the Spiral lens, which does not employ a CCA; this evidences that the proposed variant system provides the most accurate spectral reconstruction. Figure 9 (left) shows a false RGB mapping of recovered testing images with the PSNR, SSIM, and SAM quantitative metrics, and Fig. 9 (right) depicts some spectral points, which show that the proposed method outperforms the Spiral system. Furthermore, a comparison with a refractive system using the same testing image is shown in Supplement 1.

## 7. EXPERIMENTAL RESULTS

This section evaluates the performance using a laboratory prototype. From the simulation part, we selected the resulting DOE (with a 3 mm diameter) and the CCA of Type 2, fabricated with the procedures explained in Section 5. The fabricated CCA and DOE are installed in a Canon EOS Rebel T5i camera, which has an 18-megapixel CMOS sensor, as illustrated in Fig. 10, using a custom-designed 3D-printed holder. There, the CCA is placed at 3 mm from the sensor and the DOE at 50 mm. Since the performance of the proposed method can be affected by installation errors, as shown in Supplement 1, a calibration process is required. This consists of obtaining the PSFs of the system and retraining the decoder network with a small data set of 10 spectral scenes and measurements, using 100 epochs with a small learning rate of $1{e^{- 6}}$.

To compare with the state-of-the-art method, we fabricated a Spiral DOE, which was installed in another Canon EOS Rebel T5i camera. This camera is placed next to the proposed system, verifying that the alignment of the cameras allows both to partially see the same object. Indoor and outdoor sets of scenes were considered. Additionally, to analyze the spectral behavior in each scene, several points on the same targets were measured in the laboratory with a commercially available spectrometer (Ocean Optics ${\rm USB2000 +}$), used as the spectral ground truth (GT). Finally, since the proposed SCCD system argues for obtaining high-fidelity spectral information, a test within the limits of the available data and the GPU memory of our computer (an Nvidia RTX 3090 with 24 GB) is carried out to recover 49 spectral bands from 420 to 660 nm. To the best of our knowledge, this is the first system able to recover this number of bands in the visible spectrum.

#### A. Indoor Scenes

To compare the proposed camera with the Spiral camera, we also evaluated a configuration that keeps the fabricated CCA but uses the Spiral DOE, which is called ${\rm Spiral} + {\rm CCA}$. The motivation for this configuration is to show the effect of adding the CCA to a traditional system. For the recovery method, we employed the same network architecture for the decoders (which recovered 25 bands from 420 to 660 nm), trained using the hyperparameter tuning process on the ARAD data set. Figure 11 shows a testing scene using the three evaluated systems: Spiral, ${\rm Spiral} + {\rm CCA}$, and SCCD. There, the measurements, a false RGB mapping of the recovered spectral images, and two spectral points of the scene are shown. Additionally, the spectral behavior of the chosen points is compared with the measurement from a commercial spectrometer, denoted as GT. It can be seen that all methods provide an acceptable spatial reconstruction. Notice that, although the ${\rm Spiral} + {\rm CCA}$ produces some spatial artifacts, its spectral response outperforms that of the Spiral, which does not include the CCA. Furthermore, in the SCCD system, the quality of the spectral reconstruction is closer to the GT compared with the nondesigned one (Spiral). Other testing scenes are shown in Supplement 1.

#### B. Outdoor Scenes

Due to the versatility and compactness of the proposed system, it can be used to analyze outdoor scenes. In this section, only the Spiral was used for comparison purposes. The designed DOE and the decoder remain fixed in this test. Figure 12 shows the reconstructions of the proposed SCCD system and the Spiral. It can be observed that the proposed method outperforms the Spiral in spectral behavior, since the recovered spectral points are closer to those taken by the spectrometer.

#### C. High-Fidelity SI

To validate the statement that the proposed method obtains high-fidelity spectral information, this test was developed to recover 49 spectral bands from 420 to 660 nm in steps of 5 nm. The same 10 spectral objects acquired in Section 7.A were captured with 49 spectral bands. The last layer of the decoder was modified to recover 49 spectral bands while the other layers remained fixed. Specifically, the weights of the last kernels were extrapolated, and then a fine-tuning process was employed with the small data set. As in Section 7.A, 100 epochs with a learning rate of $1{e^{- 6}}$ were carried out. Figures 13 and 14 show the reconstructions, where the measurements and false RGB color are illustrated. Furthermore, two spectral points per scene are selected to analyze the spectral behavior. It can be observed that the proposed SCCD system maintains the spatial distribution of the scene and at the same time recovers high-fidelity spectral information since, as is evident in the zoomed version, the recovered spectral signatures of the proposed system are closer to the GT, following the transitions that appear when more bands are acquired. In particular, Fig. 14 presents all the reconstructed hyperspectral bands for the proposed method and a false RGB mapping; notice that the best visual quality is observed for the proposed SCCD system.
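The extrapolation of the last-layer $1 \times 1 \times L$ kernels from 25 to 49 output bands can be sketched as a linear interpolation along the spectral axis. The exact extrapolation scheme is not detailed in the text, so this is one plausible reading, and the kernel shapes are hypothetical:

```python
import numpy as np

def expand_spectral_kernels(w, L_new):
    """Resample conv kernels along their last (spectral) axis so the
    final layer outputs L_new bands instead of the original L."""
    L = w.shape[-1]
    old = np.linspace(0.0, 1.0, L)
    new = np.linspace(0.0, 1.0, L_new)
    flat = w.reshape(-1, L)
    out = np.array([np.interp(new, old, row) for row in flat])
    return out.reshape(w.shape[:-1] + (L_new,))

rng = np.random.default_rng(0)
w25 = rng.random((1, 1, 16, 25))   # 1x1 conv weights: 16 features -> 25 bands
w49 = expand_spectral_kernels(w25, 49)
```

Because 49 = 2 × 25 − 1, every other new band coincides exactly with an original band, and the interpolated weights give a sensible initialization for the subsequent fine-tuning.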

## 8. CONCLUSION

A high-fidelity SI system based on a diffractive lens and a periodic color filter array, denoted SCCD, was presented. A reliable shift-variant model is provided based on the symmetric structure constraints imposed on the CCA. The height map of the DOE and the filters of the CCA are learned using an E2E approach, where these optical parameters are jointly learned with the decoder network parameters. We fabricated the CCA and the DOE, which were incorporated in a commercial camera to build a prototype system to capture various indoor and outdoor scenes. We have demonstrated that the proposed system provides a high spectral resolution compared with state-of-the-art hyperspectral imaging methods. As future work, the proposed system could be extended to other regions of the electromagnetic spectrum, such as longwave infrared (LWIR) and midwave infrared (MWIR), where multiple contiguous bands can bring more benefits; the use of metaoptics to modulate the light at subwavelength scales could also be considered.

## Funding

Vicerrectoría de Investigación y Extensión, Universidad Industrial de Santander (VIE-project 2699); Stanford University (ECCS-2026822); Fulbright Colombia; Army Research Office (PECASE).

## Acknowledgment

Henry Arguello was supported by a Fulbright scholarship under the project entitled: “Compressive spectral + light field imaging technology and its potential applications to the Colombian agriculture.” Gordon Wetzstein was supported by a PECASE from the ARO. Part of this work was performed at the Stanford Nano Shared Facilities (SNSF), supported by the National Science Foundation under award ECCS-2026822.

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## Supplemental document

See Supplement 1 for supporting content.

## REFERENCES AND NOTES

**1. **B. Fei, “Hyperspectral imaging in medical applications,” in *Data Handling in Science and Technology*, J. M. Amigo, ed. (Elsevier, 2020), Vol. 32, pp. 523–565.

**2. **C. Hinojosa, J. Bacca, and H. Arguello, “Coded aperture design for compressive spectral subspace clustering,” IEEE J. Sel. Top. Signal Process. **12**, 1589–1600 (2018). [CrossRef]

**3. **M. Shimoni, R. Haelterman, and C. Perneel, “Hyperspectral imaging for military and security applications: Combining myriad processing and sensing techniques,” IEEE Geosci. Remote Sens. Mag. **7**(2), 101–117 (2019). [CrossRef]

**4. **Y. Liu, H. Pu, and D.-W. Sun, “Hyperspectral imaging technique for evaluating food quality and safety during various processes: a review of recent applications,” Trends Food Sci. Technol. **69**, 25–35 (2017). [CrossRef]

**5. **X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Process. Mag. **33**(5), 95–108 (2016). [CrossRef]

**6. **D. L. Donoho, “Compressed sensing,” IEEE Trans. Inf. Theory **52**, 1289–1306 (2006). [CrossRef]

**7. **A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. **47**, B44–B51 (2008). [CrossRef]

**8. **C. Li, T. Sun, K. F. Kelly, and Y. Zhang, “A compressive sensing and unmixing scheme for hyperspectral data processing,” IEEE Trans. Image Process. **21**, 1200–1210 (2012). [CrossRef]

**9. **X. Lin, G. Wetzstein, Y. Liu, and Q. Dai, “Dual-coded hyper-spectral imaging,” Opt. Lett. **39**, 2044–2047 (2014). [CrossRef]

**10. **X. Lin, Y. Liu, J. Wu, and Q. Dai, “Spatial-spectral encoded compressive hyperspectral imaging,” ACM Trans. Graph. **33**, 233 (2014). [CrossRef]

**11. **C. V. Correa, H. Arguello, and G. R. Arce, “Snapshot colored compressive spectral imager,” J. Opt. Soc. Am. A **32**, 1754–1763 (2015). [CrossRef]

**12. **X. Cao, H. Du, X. Tong, Q. Dai, and S. Lin, “A prism-mask system for multispectral video acquisition,” IEEE Trans. Pattern Anal. Mach. Intell. **33**, 2423–2435 (2011). [CrossRef]

**13. **Y. August, C. Vachman, Y. Rivenson, and A. Stern, “Compressive hyperspectral imaging by random separable projections in both the spatial and the spectral domains,” Appl. Opt. **52**, D46–D54 (2013). [CrossRef]

**14. **Y. Wu, I. O. Mirza, G. R. Arce, and D. W. Prather, “Development of a digital-micromirror-device-based multishot snapshot spectral imaging system,” Opt. Lett. **36**, 2692–2694 (2011). [CrossRef]

**15. **H. Arguello and G. R. Arce, “Colored coded aperture design by concentration of measure in compressive spectral imaging,” IEEE Trans. Image Process. **23**, 1896–1908 (2014). [CrossRef]

**16. **V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. **37**, 1–13 (2018). [CrossRef]

**17. **D. S. Jeon, S.-H. Baek, S. Yi, Q. Fu, X. Dun, W. Heidrich, and M. H. Kim, “Compact snapshot hyperspectral imaging with diffracted rotation,” ACM Trans. Graph. **38**, 117 (2019). [CrossRef]

**18. **D. G. Stork and M. D. Robinson, “Theoretical foundations for joint digital-optical analysis of electro-optical imaging systems,” Appl. Opt. **47**, B64–B75 (2008). [CrossRef]

**19. **G. Wetzstein, A. Ozcan, S. Gigan, S. Fan, D. Englund, M. Soljačić, C. Denz, D. A. Miller, and D. Psaltis, “Inference in artificial intelligence with deep optics and photonics,” Nature **588**, 39–47 (2020). [CrossRef]

**20. **K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).

**21. **N. Brusco, S. Capeleto, M. Fedel, A. Paviotti, L. Poletto, G. M. Cortelazzo, and G. Tondello, “A system for 3D modeling frescoed historical buildings with multispectral texture information,” Mach. Vis. Appl. **17**, 373–393 (2006). [CrossRef]

**22. **W. M. Porter and H. T. Enmark, “A system overview of the airborne visible/infrared imaging spectrometer (AVIRIS),” Proc. SPIE **834**, 22–31 (1987). [CrossRef]

**23. **R. Habel, M. Kudenov, and M. Wimmer, “Practical spectral photography,” Comput. Graph. Forum **31**, 449–458 (2012). [CrossRef]

**24. **W. R. Johnson, D. W. Wilson, W. Fink, M. S. Humayun, and G. H. Bearman, “Snapshot hyperspectral imaging in ophthalmology,” J. Biomed. Opt. **12**, 014036 (2007). [CrossRef]

**25. **T. Okamoto, A. Takahashi, and I. Yamaguchi, “Simultaneous acquisition of spectral and spatial intensity distribution,” Appl. Spectrosc. **47**, 1198–1202 (1993). [CrossRef]

**26. **S.-H. Baek, I. Kim, D. Gutierrez, and M. H. Kim, “Compact single-shot hyperspectral imaging using a prism,” ACM Trans. Graph. **36**, 1–12 (2017). [CrossRef]

**27. **P. Wang and R. Menon, “Ultra-high-sensitivity color imaging via a transparent diffractive-filter array and computational optics,” Optica **2**, 933–939 (2015). [CrossRef]

**28. **X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica **7**, 913–922 (2020). [CrossRef]

**29. **F. Heide, Q. Fu, Y. Peng, and W. Heidrich, “Encoded diffractive optics for full-spectrum computational imaging,” Sci. Rep. **6**, 33543 (2016). [CrossRef]

**30. **J. W. Goodman, *Introduction to Fourier Optics* (Roberts and Company, 2005).

**31. **D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980 (2014).

**32. **O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in *International Conference on Medical Image Computing and Computer-Assisted Intervention* (Springer, 2015), pp. 234–241.

**33. **I. Goodfellow, Y. Bengio, and A. Courville, *Deep Learning* (MIT, 2016).

**34. **J. Bacca, L. Galvis, and H. Arguello, “Coupled deep learning coded aperture design for compressive image classification,” Opt. Express **28**, 8528–8540 (2020). [CrossRef]

**35. **J. Bacca, T. Gelvez, and H. Arguello, “Deep coded aperture design: An end-to-end approach for computational imaging tasks,” arXiv:2105.03390 (2021).

**36. **More details about the developing process of photographic film can be found at https://www.bhphotovideo.com

**37. **J. Hedgecoe, S. Gorton, J. Marffy, and D. Pugh, *John Hedgecoe’s New Book of Photography* (Dorling Kindersley, 1994).

**38. **H. Ikoma, C. M. Nguyen, C. A. Metzler, Y. Peng, and G. Wetzstein, “Depth from defocus with learned optics for imaging and occlusion-aware depth estimation,” in *IEEE International Conference on Computational Photography (ICCP)* (IEEE, 2021), pp. 1–12.

**39. **B. Arad, R. Timofte, O. Ben-Shahar, Y.-T. Lin, and G. D. Finlayson, “NTIRE 2020 challenge on spectral reconstruction from an RGB image,” in *Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops* (2020), pp. 446–447.

**40. **All the code was implemented in TensorFlow 2.0 and can be found at https://github.com/jorgebaccauis/Shift-Variant-System.