Noise Power Spectrum Scene-Dependency in Simulated Image Capture Systems

The Noise Power Spectrum (NPS) is a standard measure of image capture system noise. It is traditionally derived from captured uniform luminance patches, which are unrepresentative of pictorial scene signals. Many contemporary capture systems apply non-linear, content-aware signal processing, which renders their noise scene-dependent. For scene-dependent systems, measuring the NPS from uniform patch signals fails to characterize accurately: i) system noise for a given input scene, and ii) the average system noise power in real-world applications. The scene-and-process-dependent NPS (SPD-NPS) framework addresses these limitations by measuring temporally varying system noise with respect to any given input signal. In this paper, we examine the scene-dependency of simulated camera pipelines in depth by deriving SPD-NPSs from fifty test scenes. The pipelines apply either linear or non-linear denoising and sharpening, tuned to optimize output image quality at various opacity levels and exposures. Further, we present the integrated area under the mean of the SPD-NPS curves over a representative scene set as an objective system noise metric, and their relative standard deviation area (RSDA) as a metric for system noise scene-dependency. We close by discussing how these metrics can also be computed using scene-and-process-dependent Modulation Transfer Functions (SPD-MTF).


Introduction
Spatial luminance contrast signals are core to subjective impressions of image quality and its attributes of resolution, noise, sharpness and contrast. The Noise Power Spectrum (NPS) characterizes luminance (or color) noise with respect to spatial frequency. It is used routinely in the design and optimization of capture systems. It is a fundamental component of spatial image quality metrics (IQM) that aim to correlate with the perceived image quality [1].
The NPS is based upon the Fourier theory of image formation and applies linear system theory [2]. However, it is increasingly being used to measure systems that apply non-linear content-aware spatial image signal processes (ISP), such as denoising and sharpening. These adaptive processes are dependent on local spatial signal content, thus rendering system noise power scene-dependent.
The aim of measuring the NPS of a capture system is to characterize its average real-world noise power, i.e. its general noise power when capturing real scenes. The NPS is normally derived in one dimension (1D) from captured uniform luminance patches, using the discrete Fourier transform (DFT). Artmann [3] and Fry et al. [1], [4] have discussed the limitations of applying this measurement method to scene-dependent capture systems that apply non-linear ISPs. These limitations, outlined below, arise because uniform patches are unrepresentative of "average scene" signals.
The majority of non-linear denoising algorithms reduce their intensity in the presence of local image structure to mitigate perceived signal loss [4][5][6]. Removal of noise by such algorithms is highly content-dependent and impeded by structural signals. Uniform luminance patches provide optimal conditions for these algorithms to operate. Thus, the resultant uniform patch NPS is generally biased; it underestimates the average real-world NPS of non-linear systems.
Non-linear content-aware sharpening filters enhance local contrast selectively to minimize the perceived amplification of noise [7][8][9][10][11]. They amplify noise to a greater extent in regions containing edges, detail and other structural signals than in uniform luminance areas. This introduces content-dependency and scene-dependency to the system noise power, rendering the uniform patch NPS unrepresentative of the NPS of the "average sharpened scene" signal.
More recently, Artmann [3] has proposed two noise measures that are derived using the more suitable dead leaves test chart [5]. This test chart replicates the power spectrum of the "average natural scene" among other natural scene statistics (NSS). Artmann's noise measures represent a step toward measuring system noise with respect to pictorial scenes. Both measures are derived indirectly by comparing Modulation Transfer Functions (MTF) computed using different texture MTF implementations [5], [6].
However, in non-linear systems, deriving the NPS from uniform patches, or in fact any single test chart, cannot characterize [4]: 1) the system noise power with respect to a given input scene; 2) the average real-world system noise power, while accounting for the system's scene-dependency; 3) the level of scene-dependency in the noise power of the system.
The scene-and-process-dependent NPS (SPD-NPS) [4] measures employed in this paper characterize system noise power with respect to either 1), 2) or 3). They were developed for the characterization of non-linear, scene-dependent systems. Three corresponding scene-and-process-dependent MTF (SPD-MTF) measures have also been introduced and validated in [4]. Further, they have been applied in existing and novel spatial IQMs [1], [7]. Their use has improved metric correlations with perceived image quality. The next sub-section summarizes each SPD-NPS measure. Details on their calculation are provided in [4].
Despite the importance of noise to overall capture system image quality, the authors are unaware of any prior art that simulates and characterizes capture system noise scene-dependency. This paper aims to demonstrate the utility of the various SPD-NPS measures by analyzing in detail the scene-dependency of the noise power of four simulated image capture pipelines. The pipelines apply linear and non-linear ISPs, tuned at different signal-to-noise ratios (SNR) and levels of opacity. We also validate two novel objective metrics for system noise power and its level of scene-dependency. We close the paper by drawing conclusions on the scene-dependent behavior of each pipeline and the relevance of each measure and metric.

Scene-and-Process-Dependent NPSs (SPD-NPS)
The SPD-NPS measures characterize the power of temporally varying system noise directly, with respect to relevant input signals; they account for system scene-dependency. Fixed pattern noise (FPN) is unaccounted for but is less significant than temporally varying noise in contemporary capture systems under most capture conditions. It can be measured separately following ISO 15739 [8].
The pictorial image SPD-NPS and dead leaves SPD-NPS [4] are measured from a single scene and from the dead leaves test chart, respectively. Both are derived as the 1D NPS of a scene-and-process-dependent noise image, which is obtained by subtracting the mean image of ten or more replicate captures of the same signal from the captured scene (or dead leaves chart). The pictorial image SPD-NPS describes the system's noise power with respect to a given input scene, accounting for system scene-dependency. The dead leaves SPD-NPS approximates the average real-world system noise power. It accounts for system scene-dependency only to a limited extent, but is more appropriate than the uniform patch NPS [4].
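A minimal NumPy sketch of the pictorial image SPD-NPS calculation follows, assuming the replicate captures are aligned 2D luminance arrays of equal size. The radial averaging used to reduce the 2D NPS to 1D, the normalization by pixel area and pixel count, the averaging over all replicates and the M/(M-1) bias correction are our assumptions, not necessarily the exact procedure of [4]; `nps_1d` and `pictorial_spd_nps` are hypothetical helper names, and no window is applied.

```python
import numpy as np


def nps_1d(noise_image, n_bins=64, pixel_pitch=1.0):
    """Radially averaged 1D NPS of a 2D noise image (hypothetical helper)."""
    ny, nx = noise_image.shape
    # 2D NPS: squared DFT magnitude, scaled by pixel area over pixel count
    nps_2d = np.abs(np.fft.fft2(noise_image)) ** 2 * pixel_pitch**2 / (nx * ny)
    # Radial spatial frequency of every DFT sample (cycles per unit length)
    fy = np.fft.fftfreq(ny, d=pixel_pitch)
    fx = np.fft.fftfreq(nx, d=pixel_pitch)
    fy_grid, fx_grid = np.meshgrid(fy, fx, indexing="ij")
    radius = np.hypot(fy_grid, fx_grid).ravel()
    # Average the 2D NPS within annuli between zero and the Nyquist frequency
    nyquist = 0.5 / pixel_pitch
    edges = np.linspace(0.0, nyquist, n_bins + 1)
    idx = np.digitize(radius, edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[valid], weights=nps_2d.ravel()[valid], minlength=n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, sums / np.maximum(counts, 1)


def pictorial_spd_nps(captures, n_bins=64, pixel_pitch=1.0):
    """Pictorial image SPD-NPS from M >= 10 replicate captures of one scene (sketch)."""
    captures = np.asarray(captures, dtype=float)  # shape (M, H, W)
    m = captures.shape[0]
    # The replicate mean estimates the processed, noise-free scene signal
    mean_image = captures.mean(axis=0)
    curves = []
    for capture in captures:
        # Scene-and-process-dependent noise image of this capture
        freqs, nps = nps_1d(capture - mean_image, n_bins, pixel_pitch)
        curves.append(nps)
    # Assumption: average over all M noise images, then rescale by M / (M - 1),
    # because subtracting the replicate mean removes 1/M of the noise variance
    return freqs, np.mean(curves, axis=0) * m / (m - 1)
```

For example, ten captures of a fixed scene plus white Gaussian noise of standard deviation 2 should give a roughly flat curve near a power of 4 (in units of signal squared times pixel area).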
The mean pictorial image SPD-NPS and pictorial image SPD-NPS standard deviation [4] are computed as the mean and standard deviation of the SPD-NPSs of a set of pictorial images. Averaging NPSs in this way is unorthodox. However, as the number of images increases, the mean pictorial image SPD-NPS and pictorial image SPD-NPS standard deviation tend toward the system's general performance (accounting for its scene-dependency) and its level of scene-dependency, respectively [4]. This requires that the image set is representative of commonly captured scenes and that the individual pictorial image SPD-NPSs are accurately measured.
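A sketch of the aggregation step, assuming every scene's SPD-NPS is sampled at the same frequencies. Whether [4] uses the sample (ddof=1) or population standard deviation is not stated here, so ddof=1 is our assumption; `mean_and_sd_spd_nps` is a hypothetical name. It also returns the mean ± SD envelope used when the standard deviation is plotted around the mean.

```python
import numpy as np


def mean_and_sd_spd_nps(spd_nps_curves):
    """Mean pictorial image SPD-NPS and SPD-NPS standard deviation (sketch).

    spd_nps_curves: array of shape (N, n_freqs), one pictorial image SPD-NPS
    per scene, all sampled at the same spatial frequencies.
    """
    curves = np.asarray(spd_nps_curves, dtype=float)
    if curves.ndim != 2 or curves.shape[0] < 2:
        raise ValueError("need an (N, n_freqs) array with N >= 2 scenes")
    mean_nps = curves.mean(axis=0)
    # Assumption: sample standard deviation (ddof=1) across the N scenes
    sd_nps = curves.std(axis=0, ddof=1)
    # Envelope obtained by subtracting the SD from, and adding it to, the mean
    return mean_nps, sd_nps, mean_nps - sd_nps, mean_nps + sd_nps
```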

Mean Pictorial Image SPD-NPS Area Metric
The mean pictorial image SPD-NPS area metric, $A_{\mathrm{mean}}$, describes the objective level of temporally varying system noise as a single figure (Equation 1):
$$A_{\mathrm{mean}} = \int_{0}^{u_{\mathrm{Nyquist}}} \overline{\mathrm{NPS}}(u)\,\mathrm{d}u \qquad (1)$$
where $\overline{\mathrm{NPS}}(u)$ is the mean pictorial image SPD-NPS, $u$ is spatial frequency, and $u_{\mathrm{Nyquist}}$ is the Nyquist frequency. It is the only objective metric for system noise that accounts for system scene-dependency. We suggest it be used as an optimization parameter for capture system design and benchmarking.
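Equation 1 can be evaluated numerically from a sampled mean curve. The sketch below assumes the mean pictorial image SPD-NPS is sampled at ascending frequencies and uses the trapezoidal rule; if the samples do not reach exactly zero or the Nyquist frequency (for instance, radial bin centers), the area covers only the sampled range. `trapezoid_area` and `mean_spd_nps_area` are hypothetical names.

```python
import numpy as np


def trapezoid_area(y, u):
    """Trapezoidal-rule integral of samples y taken at ascending frequencies u."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))


def mean_spd_nps_area(freqs, mean_nps, nyquist):
    """Equation 1 sketch: area under the mean pictorial image SPD-NPS, 0 to Nyquist."""
    freqs = np.asarray(freqs, dtype=float)
    mean_nps = np.asarray(mean_nps, dtype=float)
    # Keep only samples inside the integration limits [0, u_Nyquist]
    keep = (freqs >= 0.0) & (freqs <= nyquist)
    return trapezoid_area(mean_nps[keep], freqs[keep])
```

A flat mean curve of 2 between 0 and 0.5 cycles/pixel gives an area of 1.0.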
This metric should not be confused with IQMs that model subjective image quality, since it does not account for display (image output) or human vision. However, it does relate directly to the output scores of various scene-and-process-dependent IQMs [7]. These include variants of the log Noise Equivalent Quanta (log NEQ) and Visual log NEQ [7], as well as revised versions of Barten's [9] Square Root Integral with Noise (SQRIn) and Topfer and Jacobson's [10] Pictorial Information Capacity (PIC). These metrics weight the SPD-NPS with a Contrast Sensitivity Function (CSF) and add neural noise [11] before integration, to model the perceived noise level.

Relative Standard Deviation Area (RSDA) Metric
The relative standard deviation area (RSDA) of the pictorial image SPD-NPS, $A_{\mathrm{RSDA}}$, presented in Equation 2, is the only metric for the relative level of scene-dependency in the temporally varying noise power of a system:
$$A_{\mathrm{RSDA}} = \frac{1}{A_{\mathrm{mean}}} \int_{0}^{u_{\mathrm{Nyquist}}} \sigma_{\mathrm{NPS}}(u)\,\mathrm{d}u \qquad (2)$$
where $\sigma_{\mathrm{NPS}}(u)$ is the pictorial image SPD-NPS standard deviation, $A_{\mathrm{mean}}$ is the mean pictorial image SPD-NPS area (Equation 1), $u$ is spatial frequency, and $u_{\mathrm{Nyquist}}$ is the Nyquist frequency. Systems with higher RSDAs can reasonably be assumed to apply greater levels of non-linear ISP, making their spatial performance less predictable.
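A sketch of the RSDA, reading Equation 2 as the integrated SPD-NPS standard deviation divided by the mean pictorial image SPD-NPS area, both taken from zero to the Nyquist frequency. The trapezoidal integration and the name `rsda` are our assumptions; the trapezoid helper is repeated so the block runs on its own.

```python
import numpy as np


def trapezoid_area(y, u):
    """Trapezoidal-rule integral of samples y taken at ascending frequencies u."""
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))


def rsda(freqs, sd_nps, mean_nps, nyquist):
    """Equation 2 sketch: area under the SPD-NPS standard deviation over A_mean."""
    freqs = np.asarray(freqs, dtype=float)
    # Both areas use the same integration limits [0, u_Nyquist]
    keep = (freqs >= 0.0) & (freqs <= nyquist)
    a_mean = trapezoid_area(np.asarray(mean_nps, dtype=float)[keep], freqs[keep])
    a_sd = trapezoid_area(np.asarray(sd_nps, dtype=float)[keep], freqs[keep])
    if a_mean <= 0.0:
        raise ValueError("mean SPD-NPS area must be positive")
    return a_sd / a_mean
```

If the standard deviation is a constant 10% of the mean at every frequency, the RSDA is 0.1; a perfectly scene-independent system (zero spread) gives 0.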

Simulation Pipelines and Test Images
The simulation pipelines and relevant ISP algorithms are described in detail in [4]. All four pipelines modelled the following processes identically: i) lens blur, by convolution with a Gaussian model of a diffraction-limited lens' Airy disk; ii) shot noise, as two-dimensional (2D) Poisson noise with linear SNRs of 40 and 5 at saturation, representing very good and very poor capture conditions, respectively; iii) read noise and dark noise, as Gaussian noise; iv) sensor quantum efficiency variation, by scaling the Poisson noise in the R, G and B channels by factors of 2, 1 and 3.3, respectively; v) gain adjustment, noise floor removal and highlight recovery; vi) 'grbg' Bayer color filter array (CFA) sampling.
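Two of these steps can be sketched in NumPy, under stated assumptions. Because Poisson noise has SNR equal to the square root of the mean count, an SNR of 40 at saturation corresponds to 40² = 1600 counts (and SNR 5 to 25 counts). The per-channel `noise_scale` reflects our reading of "scaling Poisson noise" (scaling the shot-noise deviation), and the blur, gain, noise floor removal and highlight recovery steps of [4] are omitted; `add_sensor_noise` and `grbg_mosaic` are hypothetical names.

```python
import numpy as np


def add_sensor_noise(signal, snr_at_saturation, read_noise_sd, noise_scale, rng):
    """Shot noise plus Gaussian read/dark noise for one channel (sketch).

    signal: linear values in [0, 1], with 1 at saturation. For Poisson noise,
    SNR = sqrt(mean count), so saturation corresponds to snr_at_saturation**2
    counts. noise_scale multiplies the shot-noise deviation (e.g. 2, 1, 3.3
    for R, G, B, under our reading of the channel scaling). read_noise_sd is
    in counts.
    """
    counts_at_sat = snr_at_saturation**2
    expected = signal * counts_at_sat
    shot = (rng.poisson(expected) - expected) * noise_scale
    read = rng.normal(0.0, read_noise_sd, signal.shape)
    return (expected + shot + read) / counts_at_sat


def grbg_mosaic(rgb):
    """Sample an H x W x 3 image with a 'grbg' Bayer CFA (G R / B G)."""
    h, w, _ = rgb.shape
    cfa = np.empty((h, w))
    cfa[0::2, 0::2] = rgb[0::2, 0::2, 1]  # green
    cfa[0::2, 1::2] = rgb[0::2, 1::2, 0]  # red
    cfa[1::2, 0::2] = rgb[1::2, 0::2, 2]  # blue
    cfa[1::2, 1::2] = rgb[1::2, 1::2, 1]  # green
    return cfa
```

A uniform patch at saturation, with no read noise and unit scaling, should measure an SNR close to the chosen value.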
The input parameters for denoising and sharpening ISPs were tuned, at each SNR, to optimize subjective output image quality on a calibrated 15-inch MacBook Pro Retina (2016) display at 60cm viewing distance (Nyquist frequency of 46 cycles/degree).
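The Nyquist frequencies quoted in cycles/degree follow from the display's pixel pitch and the viewing distance, assuming one image pixel per display pixel. A small sketch; the pixel pitch must come from the display's specification, and the 0.1 mm value in the usage note is hypothetical, not the pitch of either display used here. `display_nyquist_cpd` is a hypothetical name.

```python
import math


def display_nyquist_cpd(pixel_pitch_mm, viewing_distance_mm):
    """Display Nyquist frequency in cycles/degree at a given viewing distance."""
    # Screen length subtended by one degree of visual angle, centred on the axis
    mm_per_degree = 2.0 * viewing_distance_mm * math.tan(math.radians(0.5))
    pixels_per_degree = mm_per_degree / pixel_pitch_mm
    # At the Nyquist limit one cycle spans two pixels
    return pixels_per_degree / 2.0
```

For a hypothetical 0.1 mm pitch at 600 mm, one degree spans about 10.47 mm, or about 104.7 pixels, giving a Nyquist frequency of about 52.4 cycles/degree.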
For one linear and one non-linear pipeline, the opacity of the denoising and sharpening filters (defined in Equation 3) was adjusted according to the values presented in Table 1, to optimize output image quality on a calibrated Eizo ColorEdge CG-245W display at 60cm viewing distance (Nyquist frequency of 20 cycles/degree). Lowering the percentage opacity, $O$, below 100% reduced the filter's intensity in the output image, $I_{\mathrm{out}}(x,y)$, by blending a proportion of the filtered image, $I_{\mathrm{f}}(x,y)$, with the unfiltered image, $I(x,y)$:
$$I_{\mathrm{out}}(x,y) = \frac{O}{100}\,I_{\mathrm{f}}(x,y) + \left(1 - \frac{O}{100}\right) I(x,y) \qquad (3)$$
It was necessary to lower the intensity of certain ISPs to fully optimize subjective image quality at higher SNRs. Lowering the opacity of the ISPs also tested the robustness of the various SPD-NPS measures, as well as of the metrics presented in this paper.
The pipelines' input was 50 high-quality images of scenes, representing typical images captured by contemporary consumer camera systems. They were selected from [17]-[20], resized using bicubic interpolation, cropped to 512-by-512 pixels and windowed to mitigate periodic replication artefacts originating from DFT processing [4].
The pictorial image SPD-NPS curves were plotted on linear axes to examine pipeline scene-dependency thoroughly. Each curve was colored according to the magnitude of its integrated area, derived between zero and the Nyquist frequency before denoising was applied: green curves have higher integrated areas and blue curves lower ones. Before denoising, there is a smooth transition from the green to the blue curves. Their slight spread results from the minor scene-dependency of shot noise, whose power depends on pixel intensity.
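The opacity blend of Equation 3 can be sketched as follows, assuming opacity is given in percent and both images share the same shape; `apply_opacity` is a hypothetical name.

```python
import numpy as np


def apply_opacity(unfiltered, filtered, opacity_percent):
    """Equation 3 sketch: blend filtered and unfiltered images by opacity (%)."""
    if not 0.0 <= opacity_percent <= 100.0:
        raise ValueError("opacity must lie in [0, 100] %")
    alpha = opacity_percent / 100.0
    # 100% returns the filtered image; 0% returns the unfiltered image
    return alpha * np.asarray(filtered, dtype=float) + (1.0 - alpha) * np.asarray(
        unfiltered, dtype=float
    )
```

Because the blend is linear in both images, lowering the opacity mixes the filtered and unfiltered noise, which is why the unfiltered noise characteristics can come to dominate at reduced opacity.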

Analyzing Pipeline Scene-Dependency
For the linear pipeline, the order of the curves (from high to low integrated area) remained consistent after denoising, and after denoising and sharpening. This is indicated by the smooth transition between the green and blue curves, which is unchanged by these processes. The relative level of spread in these curves also remained roughly constant. Both characteristics indicate a lack of system scene-dependency, as expected from a linear system.
Non-linear content-aware denoising and sharpening, however, increased the relative level of spread between the curves and rearranged their order. Curves with a higher integrated area before denoising often ended up with a lower integrated area after denoising and/or sharpening, and vice versa. This unpredictable behavior was particularly clear at SNR 40, and becomes clearer in the SNR 5 plots if they are rescaled at each frequency. Such behavior is indicative of adaptive processing.
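The reordering of the curves is judged visually here. One way to put a number on it, which is our suggestion and not a measure used in this paper, is Spearman's rank correlation between each scene's integrated SPD-NPS area before and after a process: values near 1 indicate the preserved order expected of a linear pipeline, and lower values indicate reordering. The sketch assumes no tied areas; `rank_order_consistency` is a hypothetical name.

```python
import numpy as np


def rank_order_consistency(areas_before, areas_after):
    """Spearman rank correlation between per-scene SPD-NPS areas (sketch, no ties)."""
    # argsort of argsort converts values to ranks 0..N-1
    ranks_before = np.argsort(np.argsort(np.asarray(areas_before)))
    ranks_after = np.argsort(np.argsort(np.asarray(areas_after)))
    # Pearson correlation of the ranks equals Spearman's rho when there are no ties
    return float(np.corrcoef(ranks_before, ranks_after)[0, 1])
```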
This analysis demonstrates the utility of the pictorial image SPD-NPS measure for imaging system characterization, and the depth to which systems can be analyzed with it. It reveals in detail the compounding effect of non-linear ISPs on scene-dependency.

Characterizing Pipeline Scene-Dependency
We employ all the SPD-NPS measures derived from scenes to characterize the average real-world noise power and noise scene-dependency of the pipelines, with ISPs tuned at full and reduced opacity. The dead leaves SPD-NPS is also computed for comparison. Evaluating measurements from the reduced-opacity pipelines demonstrates the robustness of the measures. The level of bias in all the measures was previously shown to be similar to that of the uniform patch NPS [4]. The measures are shown in Figures 3 and 4 on logarithmically scaled axes, as is common in industry.
The pictorial image SPD-NPS measures (grey lines) display significant scene-dependency after non-linear denoising, regardless of the ISPs' opacity level. Lowering the opacity of denoising did not considerably increase the bias in these measurements. However, it reduced pipeline scene-dependency and introduced a noise floor, particularly at low SNRs. This was because the characteristics of the unfiltered noise (Figures 1(b) and 2(b)) began to dominate, since it has a higher power and a lower relative level of scene-dependency than the filtered noise.
The following observations from [4] also applied at lower ISP opacities, indicating that the various SPD-NPS measures are robust.
The mean pictorial image SPD-NPS (black line) suitably characterized the pipelines' average noise with respect to the 50 scenes.
The pictorial image SPD-NPS standard deviation (broken lines) effectively expressed pipeline scene-dependency when added to and subtracted from the mean pictorial image SPD-NPS. It accounted for the spread of the pictorial image SPD-NPS curves but ignored the changes in their order that are visible in Figures 1 and 2.
For the linear pipelines, the dead leaves SPD-NPS (red line) was consistent with the mean pictorial image SPD-NPS (black line). This result indicates that the former measure describes the average real-world noise power well, as expected from linear system theory. This was not the case for the non-linear pipelines, for which the dead leaves SPD-NPS did not describe the processing of noise in the "average pictorial scene". Although the dead leaves test chart is designed to replicate the signal power characteristics of natural scenes, it was denoised more effectively than most scenes by the non-linear content-aware BM3D filter, particularly at low SNRs. It also responded differently to non-linear sharpening than most scenes. We thus conclude that noise measures derived from dead leaves signals may be unrepresentative of the average real-world noise power of non-linear capture systems.

Validation of System Performance Metrics
We validate the mean pictorial image SPD-NPS area metric for temporally varying system noise, and the RSDA metric for system noise scene-dependency, by evaluating their conformity with previous observations of pipeline behavior.

Mean Pictorial Image SPD-NPS Area Metric
The mean pictorial image SPD-NPS area metric (Figure 5) appropriately expresses the following general trends observed in the individual SPD-NPS measures derived from pictorial images: 1) non-linear denoising removed more noise than linear denoising; 2) denoising at full opacity reduced noise further than at reduced opacity; 3) linear sharpening amplified noise more than non-linear sharpening; 4) sharpening at full opacity increased noise more than at lower opacity. The mean pictorial image SPD-NPS and individual pictorial image SPD-NPS measures, on which this metric is based, were found to be particularly relevant to the image quality modelling of simulated non-linear capture systems [7]. The metric is therefore expected to be relevant to the optimization of such systems.

Relative Standard Deviation Area (RSDA) Metric
The RSDA of the pictorial images' SPD-NPSs agreed with observations of the pictorial image SPD-NPS standard deviation measure (Figure 6). For example, it was unaffected by both linear denoising and linear sharpening, as expected. Non-linear denoising raised the RSDA significantly, especially at higher SNRs, consistent with the trends in Figures 3 and 4. Non-linear sharpening did not further increase the RSDA, despite changing the shape and order of the SPD-NPS curves (Figure 1(f)). This was because the RSDA does not account for changes in curve order, and the relative level of spread in the curves remained similar after filtering. The RSDA metric is expected to be particularly informative when quoted alongside the mean pictorial image SPD-NPS area metric. For example, it can be inferred that systems with a lower mean pictorial image SPD-NPS area and a higher RSDA are more likely to rely on significant non-linear ISP, rather than higher quality hardware, to yield high-quality images. This comparison follows the same principle as adding the pictorial image SPD-NPS standard deviation to, and subtracting it from, the mean pictorial image SPD-NPS, as shown in Figures 3 and 4.

Conclusions
The utility of several SPD-NPS measures and metrics has been demonstrated by characterizing the noise power of four simulated camera pipelines that apply linear and non-linear ISPs under various exposure conditions. The ISPs were either tuned at full opacity, or their opacity was adjusted to optimize output image quality. The measures and metrics were computed from a number of replicate scene captures (or dead leaves test chart captures). They account for temporally varying noise, not FPN. They also account for system scene-dependency, caused by interactions between the input signal and non-linear ISPs.
The level of pipeline scene-dependency and its causes were first investigated by analyzing the distribution and integrated areas of the pictorial image SPD-NPS curves for fifty imaged scenes. This measure describes pipeline noise with respect to a given input scene and accounts most thoroughly for the scene-dependency of non-linear denoising and sharpening ISPs. Non-linear ISPs were shown to increase the spread of the SPD-NPS curves and to rearrange their order after each process. These results, together with findings from previous research [4], [7], suggest that IQMs designed for non-linear systems should ideally account for such behavior.
Analysis of the mean pictorial image SPD-NPS, pictorial image SPD-NPS standard deviation, and dead leaves SPD-NPS measures confirmed that conclusions from their previous validation [4] still apply when the pipelines' ISPs are tuned at reduced opacities. This finding demonstrates the robustness of the measures. Novel objective metrics for temporally varying system noise, and its relative level of scene-dependency, were validated. Conclusions from the analysis of each measure/metric are summarized below.
The pictorial image SPD-NPS standard deviation [4], and the related RSDA metric introduced in this paper, successfully described the level of scene-dependent noise in the pipelines, regardless of the ISPs' opacities. However, certain aspects of system scene-dependency, such as changes in the order of the individual pictorial image SPD-NPS curves, were unaccounted for.
The average real-world noise power of each pipeline was also characterized appropriately, regardless of the ISPs' opacities, by the mean pictorial image SPD-NPS. The mean pictorial image SPD-NPS area metric also expressed the general trends in this measure.
The dead leaves SPD-NPS conveniently estimated the average real-world performance of the linear pipelines. However, it failed to describe the noise introduced to the "average" input scene by the non-linear pipelines. A related study also found that signal transfer measurements from the dead leaves chart differed from the signal transfer of the "average" scene for the non-linear pipeline [4].
We infer that the signal and noise contents of the dead leaves chart are treated differently from those of complex imaged scenes when processed by non-linear ISPs. We expect this is because the mathematically generated dead leaves chart consists of randomly distributed, overlaid discs with "perfect", low-contrast edges. In contrast, most natural scenes contain a variety of structural signals, which are not randomly distributed and have a wide range of edge gradients and contrast levels.
No test chart will yield a unique NPS (or MTF) for non-linear systems. However, we hypothesize that a more representative test chart should yield NPS (or MTF) measurements that describe average real-world system noise (or signal transfer) more suitably. Such a chart could be generated by: i) reducing the dead leaves chart's homogeneity by varying the shape of each overlaid "leaf" element; ii) ensuring the edges of the elements are representative of the range of natural scene edge gradients; iii) increasing the contrast of these elements and distributing them non-randomly. The last of these may compromise the target's useful scale-, rotation- and shift-invariance properties.
Alternatively, we propose that a representative set of images of scenes containing common features be used for capture system signal transfer and noise characterization. Each of these scenes should provoke non-linear ISP behavior that is typical of a particular "type of scene", so that each scene type yields relevant pictorial image SPD-NPS (or SPD-MTF) measurements. The mean pictorial image SPD-NPS (or SPD-MTF) of these scenes should also be approximately representative of the average real-world noise (or signal transfer) performance of non-linear systems.
The mean pictorial image SPD-NPS area metric and the RSDA metric can also be computed by substituting the corresponding SPD-MTF measures [4] for the SPD-NPS measures in Equations 1 and 2. The resulting metrics describe the average real-world level of signal transfer of a given system and its signal transfer scene-dependency, respectively. Currently, bias in the pictorial image SPD-MTF measurements limits the accuracy of both metrics. This bias results from signal-to-noise limitations and is discussed in [4]. Investigations of methods to further mitigate it are ongoing.
We have also developed new IQMs [1], [7] that use SPD-NPS measures presented in this paper, as well as scene-dependent, contextual visual models [21]. These IQMs were validated successfully with images generated by the reduced-opacity pipelines of this paper [7]. The objective metrics relate closely to these SPD-NPS measures and IQMs and are expected to be valuable for capture system design, optimization and benchmarking.