Camera System Performance Derived from Natural Scenes

The Modulation Transfer Function (MTF) is a well-established measure of camera system performance, commonly employed to characterize optical and image capture systems. It is a measure based on Linear System Theory; thus, its use relies on the assumption that the system is linear and stationary. This is not the case with modern-day camera systems that incorporate non-linear image signal processes (ISP) to improve the output image. Non-linearities result in variations in camera system performance, which are dependent upon the specific input signals. This paper discusses the development of a novel framework, designed to acquire MTFs directly from images of natural complex scenes, thus making the use of traditional test charts with set patterns redundant. The framework is based on extraction, characterization and classification of edges found within images of natural scenes. Scene derived performance measures aim to characterize non-linear image processes incorporated in modern cameras more faithfully. Further, they can produce ‘live’ performance measures, acquired directly from camera feeds.


Introduction
Ever since the transition from analog to digital imaging, the camera system has become increasingly complex. Advancements in imaging science, computer vision and computational performance have allowed the digital camera system to produce imagery of higher quality.
Consumer smartphone camera systems have been developed to increase resolution, sharpness, dynamic range and low light performance, whilst keeping the user experience simple. However, the system design is restricted by the size of the hardware, such as the size of the optics and sensor. Manufacturers have released smartphones that contain multiple camera systems to increase output performance at various focal lengths, at additional monetary cost to the user. Parallel approaches, which come at a lower expense to consumers, rely on developments in camera Image Signal Processing (ISP). These include developments in non-linear sharpening, non-linear de-noising, High Dynamic Range (HDR) multi-image processing and super resolution zoom techniques. Modern mobile phone camera systems rely heavily upon this processing; the user is given little control over it.
For scientific and computer vision applications, the combination of hardware and ISP is optimized for a specific output and relevant tasks. For instance, in autonomous driving the highly characterized optical camera systems apply specific ISP to increase the detectability of the incoming signal. Accurately measuring the performance of a capture system, and its constituent components, is imperative for camera module and system optimization.
The Modulation Transfer Function (MTF) [1][2][3] is the primary imaging performance measure used for sharpness and resolution evaluation, standardized in ISO 12233 [4]. It is equal to the modulus of the Fourier transform of the Point Spread Function (PSF), the latter being defined, for a continuous system, as the image of a point source. The one-dimensional MTF is equal to the modulus of the Fourier transform of the Line Spread Function (LSF), which is obtained by integrating the PSF over one orientation. The LSF can also be obtained by differentiating the one-dimensional edge profile, i.e. the Edge Spread Function (ESF).
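The chain from ESF to LSF to MTF can be illustrated with a minimal sketch. It is not the ISO 12233 implementation: the function name `esf_to_mtf`, the logistic test edge and the direct (unwindowed) discrete Fourier transform are all assumptions made for illustration.

```python
import cmath
import math


def esf_to_mtf(esf):
    """Differentiate an ESF to get the LSF, then return the normalized DFT modulus."""
    # A finite-difference derivative of the edge profile gives the LSF.
    lsf = [esf[i + 1] - esf[i] for i in range(len(esf) - 1)]
    n = len(lsf)
    mtf = []
    for k in range(n // 2 + 1):
        # Direct DFT term at frequency index k (k cycles per n samples).
        term = sum(lsf[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n))
        mtf.append(abs(term))
    # Normalize so that the zero-frequency response equals 1.
    return [m / mtf[0] for m in mtf]


# Hypothetical test input: a smooth (logistic) edge sampled at 65 positions.
esf = [1.0 / (1.0 + math.exp(-(x - 32) / 2.0)) for x in range(65)]
mtf = esf_to_mtf(esf)
```

For this smooth edge the response falls steadily from 1 at zero frequency; a sharper edge would yield a curve that stays higher for longer.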
There are various methods for MTF evaluation, including imaging of point sources, slits (lines), 'perfect' edges, series of sinusoids, and stationary stochastic noise fields, or patterns of known spectral contents [3,5,6]. Each of these comes with several implementations, different advantages and disadvantages, as well as associated measurement errors.
The standardized slanted-edge method [4,7] derives the MTF from a well-characterized edge input, typically a test chart with vertical and horizontal edges, on and off the camera axis. The edge is captured tilted and a super-sampled ESF is derived through resampling a projection down the edge slope [8]; this is then differentiated and Fourier transformed. As this measure is not corrected for the target's frequency content, the result is referred to as the Spatial Frequency Response (SFR) [4,7,8]. A benefit of the SFR is that it is practical to implement in digital systems. Since the ESF is resampled, the MTF can be measured beyond the Nyquist frequency.
MTF measurement is based on Linear System Theory [1,2,7]. Modern-day camera systems are non-linear and incorporate adaptive ISP. These non-linearities result in variations in camera performance, which becomes dependent upon the specific input signal.
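The resampling step can be sketched as follows, assuming a perfectly straight edge whose location shifts by a known number of pixels per row. The function `supersampled_esf`, the `slope` parameter and the 4x oversampling factor are illustrative assumptions, not the exact procedure in [4] or [8].

```python
import math


def supersampled_esf(roi, slope, oversample=4):
    """Bin ROI pixels by their horizontal distance from a straight edge.

    The edge is assumed to shift by `slope` pixels per row (a hypothetical
    parameter; in practice it would come from a fit to the edge location).
    """
    width = len(roi[0])
    n_bins = width * oversample
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y, row in enumerate(roi):
        shift = slope * y  # edge displacement of this row relative to row 0
        for x, value in enumerate(row):
            b = math.floor((x - shift) * oversample)
            if 0 <= b < n_bins:
                sums[b] += value
                counts[b] += 1
    # Average the samples in each bin; bins that received no sample stay None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Simulated ideal edge located at x = 16 + 0.125 * y (about 7 degrees from vertical).
roi = [[1.0 if x >= 16 + 0.125 * y else 0.0 for x in range(32)] for y in range(40)]
esf = supersampled_esf(roi, slope=0.125)
```

Because each row samples the edge at a slightly different sub-pixel phase, the combined profile is sampled four times more finely than the pixel grid, which is what allows the response to be estimated beyond the Nyquist frequency.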
Current methods for deriving camera MTFs or SFRs are carried out in controlled conditions to attempt to reduce the impact the ISP has upon the result. For example, low contrast edge inputs are used, since high contrast edges are prone to heavy sharpening [4,9]. With the output image quality becoming more and more dependent upon non-linear ISP, should we continue measuring camera performance overlooking its effects?
Recent work [10][11][12][13][14][15] has explored the possibilities for a noise-based MTF measure from pictorial test images. Although the input Noise Power Spectrum (NPS) must be known, and thus the method is impractical to implement with any live input signal, the research demonstrated advantages over the traditional use of test charts. Fry et al. [12] have demonstrated the benefits of implementing such scene dependent performance measures in relevant Image Quality Metrics (IQM).
In this paper we describe a novel framework, developed to measure the SFR from natural scene images through the adaptation of the standardized slanted edge method, ISO 12233 [4]. The framework was initially proposed in [16]. This publication presented key principles and initial techniques that formulated the framework. It discussed relevant edge detection requirements, the resilience of the measurements to image noise and issues resulting from measuring performance from "uncharacterized" natural scenes, compared to well characterized signals.
This paper breaks down the components of the proposed framework and provides details on the edge isolation and verification techniques that we developed. Significant edge parameters that affect the measured SFR from imaged scenes are then discussed. Results from an initial image database are then presented and compared with results from the standardized method. The paper closes with conclusions and further work.

Framework
To achieve a natural scene derived SFR, we developed an automated measuring framework that replaces a test chart capture with a real natural scene capture. The framework detects, isolates, and verifies step edges from pictorial images. The ISO 12233 standardized algorithm is then applied to the extracted edges.
The flowchart in Figure 1 describes the key stages of the framework.

Edge Detection
In our initial study [16] we compared two algorithms that are used to locate imaged scene edges, the Canny edge detector [17] and a matched filter [18], and found that the Canny edge detector is the most appropriate for this purpose. Unlike the matched filter, which missed valid edges, the Canny detector returns both step and non-step edges alike. A series of logical stages is therefore required to deselect edges that do not meet the criteria for SFR measurement. This approach ensures the maximum number of step edges are extracted from the scenes.
Digital camera systems are non-isotropic; therefore, the Canny edge detector was adapted to keep the vertical and horizontal gradients separate. Note, once detected the horizontal edges were rotated 90 degrees allowing the same processing to be subsequently applied to both orientations.

Edge Isolation
ISO 12233 requires the isolation of a step edge within a Region of Interest (ROI). When using traditional edge test charts, automated edge extraction is a simple task, since the step edges are arranged at appropriate distances apart with uniform gray tones either side. Using natural scenes this task is not straightforward.
Several factors must be removed or minimized from the imaged scene ROIs. These include:
• change in focus due to the optical depth of field,
• scene texture and increased noise,
• low gradient luminance changes,
• intersecting edges,
• other edges in close proximity.
The use of smaller ROI dimensions reduces the likelihood of including these unwanted artifacts in the isolation process. However, there is a tradeoff, since the SFR error increases as the ROI height is reduced. This is seen in Figure 2, where the Mean Absolute Error (MAE) was measured from the SFRs in comparison to the minimum recommended ROI size (64 pixels width and 128 pixels height) [19]. As the ROI height increases the error decreases, and this decrease becomes more prominent as noise increases. The decrease is due to the larger number of data points that formulate the resampled ESF. Following relevant evaluations, we have set a threshold of 128 pixels in ROI height. Longer ROIs are split into 128 pixel segments, thus balancing this tradeoff. ROIs with a height below this threshold are not deselected; the height data is stored with every ROI for further analysis.
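The segmentation of long ROIs can be sketched as below. The function name `split_roi_rows` and the treatment of the remainder (kept as a shorter segment with its height recorded) are assumptions consistent with the description above, not the authors' code.

```python
def split_roi_rows(start_row, end_row, segment_height=128):
    """Split the row range [start_row, end_row) into consecutive segments."""
    segments = []
    row = start_row
    while row < end_row:
        stop = min(row + segment_height, end_row)
        # Store the height with each segment so short ROIs can be analyzed later.
        segments.append({"start": row, "stop": stop, "height": stop - row})
        row = stop
    return segments


# A 300-row edge yields two full segments and one short remainder.
segments = split_roi_rows(0, 300)
```

Here the 300-row edge produces segments of 128, 128 and 44 rows; the short segment is retained and flagged by its stored height rather than discarded.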
The ROI width can be as narrow as the edge angle permits, as long as the full ESF within the ROI is not affected. Figure 3 demonstrates that with increasing noise levels, a narrower ROI reduces error in the SFR. This has also been demonstrated by Williams [19]. In addition, a narrow ROI makes it possible to isolate more edges from the imaged scene that are in close proximity. The minimum separation that allows edges to be isolated is limited to half the ROI width and is determined by edge angle.

Figure 3. The Mean Absolute Error (MAE) introduced by adjusting the Region of Interest (ROI) width at various Signal to Noise (SNR) levels.
High angled edges require large ROI widths to isolate them; thus adjacent textures, artifacts and other edges within the ROI become an issue. We have therefore developed an effective method to isolate imaged scene edges at the desired height, at any angle and proximity. This method is effective as long as the neighboring ESFs do not overlap. Thus, a proximity filter is used to remove edges that are less than 5 pixels apart.
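A proximity filter of this kind can be sketched in one dimension, operating on the edge positions found along a single image row. The function `proximity_filter` is hypothetical, and the choice to discard both members of a close pair is an assumption; the source states only the 5 pixel limit.

```python
def proximity_filter(edge_positions, min_separation=5):
    """Keep only edge positions whose neighbors are at least min_separation away."""
    ordered = sorted(edge_positions)
    kept = []
    for i, pos in enumerate(ordered):
        # An edge passes if the gap to each existing neighbor is large enough.
        left_ok = i == 0 or pos - ordered[i - 1] >= min_separation
        right_ok = i == len(ordered) - 1 or ordered[i + 1] - pos >= min_separation
        if left_ok and right_ok:
            kept.append(pos)
    return kept


# Edges at 10/12 and 50/54 are too close together and are removed.
kept = proximity_filter([10, 12, 30, 50, 54, 80])
```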
Our edge isolation process entails:
1. Creating an ESF mask
2. Taking a 'T' shaped median value
3. Filling each row with the appropriate median value
4. Giving the ROI a weighted Gaussian blur
The ESF mask is created by first measuring the horizontal gradient of the edge, i.e. the ESF for every row in the ROI. As the edge location is known, the ESF mask boundary is established where the gradient either side of the edge position falls away to a uniform tone. A threshold is used to determine what is considered a 'uniform' tone, taking into account the image noise floor. This threshold is currently set at 0.04, which is equivalent to a pixel value change of approximately 10 (for an 8-bit system). The resultant mask covers the area of the ESF and remains untouched in all subsequent processing. Figure 4 demonstrates this principle.
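The ESF mask boundary for a single row can be sketched as a walk outward from the known edge position until the gradient falls below the uniformity threshold. The function `esf_mask_bounds` and its handling of indices are assumptions for illustration; pixel values are taken to be normalized to the range 0 to 1 so that the 0.04 threshold applies directly.

```python
def esf_mask_bounds(row, edge_index, threshold=0.04):
    """Return (left, right) pixel indices bounding the ESF transition in one row."""
    # Horizontal gradient between neighbors; grad[i] spans pixels i and i + 1.
    grad = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    left = edge_index
    # Walk left while the step into the current pixel is still significant.
    while left > 0 and abs(grad[left - 1]) >= threshold:
        left -= 1
    right = edge_index
    # Walk right while the step out of the current pixel is still significant.
    while right < len(grad) and abs(grad[right]) >= threshold:
        right += 1
    return left, right


# Normalized pixel values across one row of an ROI, with the edge detected at index 4.
row = [0.1, 0.1, 0.1, 0.15, 0.4, 0.7, 0.88, 0.9, 0.9, 0.9]
bounds = esf_mask_bounds(row, edge_index=4)
```

For this row the mask spans pixels 2 to 6; everything outside that range is treated as uniform tone and is available for the subsequent median fill and blur.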
Once the ESF mask is obtained, the 'T' shaped median values are obtained. These values are taken for every pixel either side of the ESF mask and are calculated from four pixels in the shape of a 'T', as seen in Figure 5. This median value is used to fill the row, from the ESF mask boundary to the ROI frame, creating the 'pixel stretch'. However, due to scene textures and high levels of image noise, the resulting ROIs may contain striped artifacts. Thus, a Gaussian blur is applied, weighted strongly in two opposite corners of the ROI, i.e. decreasing the blur intensity to zero as the filter approaches the ESF mask. The diagram in Figure 6 illustrates our edge isolation technique for a noiseless and a noisy simulated ROI. This technique is similar to the filtered tails procedure that Williams and Burns demonstrated [20]. Tail filtering is a method for obtaining reliable SFRs from noisy image captures, through blurring either side of the edge without touching the ESF transition.
Testing our edge isolation technique using simulated edges with various noise levels indicated that the method reduces the effects of noise on the SFR measure in the same fashion as the tail filtering. This is shown in Figure 7, where the SFR was measured from i) a wide ROI, containing a simulated 21 degree slanted edge (yellow), ii) the same ROI cropped as narrow as possible (orange) and iii) the ROI passed through our edge isolation technique (gray).

Step Edge Verification
To verify that the isolated edges have the required step edge profile, the ROIs undergo a step edge verification. Once again, the horizontal gradient is taken for every row in the ROI. A step edge normally has a single increase or decrease in gradient. Using this logic, unsuitable ROIs are deselected. The uniformity threshold was, once again, set at 0.04. Figure 8 demonstrates this principle with seven ROIs. ROIs a), b), c) and d) all contain step edges. However, c) is deselected because its contrast is under the noise floor, and d) is deselected because it only partially contains a step edge; its center portion contains a staircase edge profile.
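One way to express this check is to require that, in every row, the gradient values exceeding the uniformity threshold form a single contiguous run of one sign. The sketch below is an interpretation of the description above, with hypothetical function names and pixel values assumed to be normalized to the range 0 to 1.

```python
def is_step_edge_row(row, threshold=0.04):
    """True if the row contains exactly one monotonic transition above the threshold."""
    grad = [row[i + 1] - row[i] for i in range(len(row) - 1)]
    significant = [i for i, g in enumerate(grad) if abs(g) >= threshold]
    if not significant:
        return False  # no transition rises above the uniformity threshold
    rising = grad[significant[0]] > 0
    same_sign = all((grad[i] > 0) == rising for i in significant)
    # A single transition means the significant gradients are adjacent to each other.
    contiguous = significant[-1] - significant[0] + 1 == len(significant)
    return same_sign and contiguous


def is_step_edge_roi(roi, threshold=0.04):
    """An ROI passes only if every one of its rows is a step edge."""
    return all(is_step_edge_row(row, threshold) for row in roi)


step = [0.1, 0.1, 0.5, 0.9, 0.9]       # single transition
staircase = [0.1, 0.4, 0.4, 0.4, 0.9]  # two transitions separated by a plateau
flat = [0.5, 0.51, 0.5]                # contrast below the threshold
```

With these inputs only `step` passes; `staircase` is rejected because its two transitions are separated by a uniform plateau, and `flat` is rejected for insufficient contrast.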

Region of Interest Verification
In addition to verifying the presence of a step edge in the ROI, other processes were implemented to detect changes in the edge direction as well as unwanted tonal changes in the uniform areas around the edge profile. If such artifacts were detected, the ROI would either be segmented into smaller, more suitable ROIs when possible, or completely deselected.

ISO12233 Algorithm
The isolated and verified edges then pass through the standardized slanted edge algorithm, ISO 12233 [4]. We have used Burns' sfrmat algorithm [21] for this purpose.
In the latest iteration of sfrmat, sfrmat4, a higher-order polynomial fit can be applied to the extracted edge profile, rather than a linear fit. This reduces the error when measuring curved edges caused by lens distortion [22], and is especially useful when measuring SFRs from edges in captured scenes, which are commonly curved. We currently use a 3rd order polynomial fitting function.

Edge Parameters
Unlike the standardized SFR measure, natural scenes are captured under uncontrolled conditions with uncharacterized edge inputs. Thus, several edge parameters must be considered and evaluated alongside each ROI, which are normally not considered in the ISO 12233 method, for a full analysis of results.

Radial Distance
The position of the edge within the image frame has an impact upon the output SFR. This is because the highest performing region of an optical lens is the center of the imaging circle; lens performance decreases towards the edge of the imaging circle.
This is seen in Figure 9, where the SFRs are extracted from a test chart input and color coded to indicate the radial location of the input edges. The color transition from green to red represents an increase in the radial distance.

Edge Angle
The angle of the edge produces a variation in the SFR measure. This is well documented in several studies [9,19,23,24]. Using an ROI size of 64 pixels width and 128 pixels height, and simulating noiseless edges ranging in angle from 0 to 45 degrees, we show this SFR variation, using the sfrmat4 algorithm, in Figure 10. Edge angles at 0 or 45 degrees are deselected, since they cannot produce unique resampled data by the slanted edge technique.
When using edges from captured scenes, the restriction on edge angles being between 2 and 7 degrees (recommended when using ISO 12233) becomes a major data gathering constraint. Our framework measures the SFR with edge angles ranging between 2.5 and 42.5 degrees. When analyzing the output SFRs, a minimum and a maximum angle threshold can be applied to determine which SFRs are used in the performance measure, rather than restricting the data gathering stage.

Edge Contrast
Edge contrast impacts the SFR when images are subject to high image noise levels, and non-linear sharpening [9]. As a result, the ISO 12233 specifies that low contrast edges must be used for the SFR measure, since noise is not an issue under controlled conditions. Once again, this restricts our data gathering when using captured scenes. SFRs in our framework are measured from edge contrast levels 0.2 and above. Relevant metadata is then used in the analysis of results.
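Applying the angle and contrast limits at the analysis stage, rather than during data gathering, can be sketched as a filter over stored ROI metadata. The dictionary keys `angle_deg` and `contrast` and the function `select_rois` are hypothetical, and the sketch assumes contrast has already been computed and stored; the source does not specify the contrast definition.

```python
def select_rois(rois, min_angle=2.5, max_angle=42.5, min_contrast=0.2):
    """Return the ROIs whose stored edge angle and contrast fall within the limits."""
    selected = []
    for roi in rois:
        angle_ok = min_angle <= abs(roi["angle_deg"]) <= max_angle
        contrast_ok = roi["contrast"] >= min_contrast
        if angle_ok and contrast_ok:
            selected.append(roi)
    return selected


# Hypothetical metadata records for four extracted ROIs.
rois = [
    {"id": 1, "angle_deg": 5.0, "contrast": 0.60},
    {"id": 2, "angle_deg": 1.0, "contrast": 0.80},   # angle too shallow
    {"id": 3, "angle_deg": 30.0, "contrast": 0.10},  # contrast too low
    {"id": 4, "angle_deg": -20.0, "contrast": 0.35},
]
selected_ids = [roi["id"] for roi in select_rois(rois)]
```

Because the limits are arguments rather than fixed constants, a stricter range (for example the 2 to 7 degrees recommended for test charts) can be applied to the same stored data without re-extracting edges.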

Additional Parameters and Further Considerations
Depth of field and ROI nonuniformity are two edge parameters that are currently not accounted for in the framework but require consideration.

Depth of Field
When capturing three-dimensional natural scenes, some edges are out of focus due to the optical depth of field. Depending upon the intent of the user and the camera system, a shallow depth of field may be an intentional decision. For a comprehensive level of SFR analysis, the optical depth of field in the image from which edges are extracted must be known. From the lens focal length, f-number, circle of confusion diameter, hyperfocal distance and focus distance, the far and near limits of the depth of field can be calculated [25][26]. The focal length and f-number can be extracted from the camera metadata, whilst the circle of confusion is calculated using the diagonal size of the imaging sensor. For a 35mm sensor format a circle of confusion diameter of 0.025-0.030 mm is commonly used [26].
However, determining the focus distance solely from a single two-dimensional image is not a straightforward operation. One potential solution is to use a neural network estimate of the depth map from a two-dimensional image [27][28][29][30]. From our framework we extract the location of the strongest edges in the frame; therefore, we can map the edge strengths to the predicted depth map to obtain the focus distance. Using the depth of field equations [25,26] we can then derive which regions of the frame are in focus.
In the image database we use to validate the framework at this stage, the depth of field is not a factor impacting the study. Our test system has a circle of confusion diameter of 0.030 mm. This results in a depth of field that ranges from approximately 2.45 meters (at 5 meters focal distance) to infinity (see Results section).
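The quoted near limit can be reproduced with the commonly used thin-lens depth of field approximations, taking the 24 mm focal length and f/4 aperture reported in the Results section. Whether these are exactly the forms given in [25,26] is an assumption, and the function name `depth_of_field` is illustrative.

```python
import math


def depth_of_field(focal_length_mm, f_number, coc_mm, focus_distance_mm):
    """Return (hyperfocal, near_limit, far_limit) in millimeters."""
    f = focal_length_mm
    s = focus_distance_mm
    # Hyperfocal distance: focusing here keeps everything to infinity acceptably sharp.
    hyperfocal = f ** 2 / (f_number * coc_mm) + f
    near = s * (hyperfocal - f) / (hyperfocal + s - 2 * f)
    if s >= hyperfocal:
        far = math.inf  # the far limit extends to infinity
    else:
        far = s * (hyperfocal - f) / (hyperfocal - s)
    return hyperfocal, near, far


# Test system: 24 mm lens at f/4, 0.030 mm circle of confusion, focused at 5 m.
hyperfocal, near, far = depth_of_field(24.0, 4.0, 0.030, 5000.0)
```

With these values the hyperfocal distance is about 4.8 m, so focusing at 5 m places the far limit at infinity and the near limit at roughly 2.45 m, in agreement with the range stated above.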

ROI Nonuniformity
In a natural scene the lighting is not uniform, resulting in low frequency gradients running through some extracted edges. The error in the measured SFRs from such edges is shown in Figure 11. Further work must take ROI nonuniformity into account, in a similar fashion to the 'nonuniformity MTF correction' that is employed in the Imatest™ software [31].

Results
Unlike the traditional method of obtaining the SFR, the input edges from captured natural scenes are not 'perfect'. The SFR relies on measurement from 'perfect' edges (with constant frequency content over the camera bandwidth) or characterized edges (with known/measured frequency content). Since our method does not produce SFRs from such inputs, we have named the resulting measure the natural scene derived SFR, or NS-SFR.
For testing the outcomes from our framework, we used a Nikon D800 DSLR as our test camera system, equipped with a lens with focal length 24mm and its aperture set at f/4. All images were captured in 16-bit RAW format. They were then converted to TIFF uncompressed files, in Adobe RGB color space. Both sharpening and noise reduction were turned off in the RAW file conversion, thus we assume no, or minimal non-linear ISP. Due to the selected focal length, aperture and focal distances, image information was all in focus, thus blur resulting from shallow depth of fields was not an issue.
The NS-SFRs derived from each captured scene form an envelope of varying performances, which are due to various factors relating to the system as well as the quality of edges extracted from the scene (see Edge Parameters section). In the traditional SFR measurements from test charts, variations in the SFRs are mainly due to the varying performance of the lens with radial distance from the center, as shown in Figure 9.
In the analysis of the NS-SFRs, separation of system effects and scene content effects must be made. From individual captured edges it is impossible to determine whether the ESF degradation is due to the edge input profile, or the camera system blur. In preliminary results presented here, the stronger NS-SFRs have been given more weight when averaging results to obtain one measure. Further work in the framework must use the measured ROI edge parameters and neighboring edges to determine which are the 'highest performing' edges for a given radial distance and depth of field. This would allow a classification of edges and separation of the effects of the edge input quality and the system performance.
To derive a single performance measure for the test camera system, a weighted mean is calculated from the SFR envelope for each captured scene. ISO 20462 [32] suggests different weights to be given to SFRs derived from on-axis edges (center edges, in 50% radial distance) and off-axis edges (corner edges), 0.43 and 0.57 respectively. The horizontal and vertical oriented SFRs are then weighted by 0.33 and 0.66 respectively.
The weighted means are kept to individual horizontal and vertical orientations for this study. The weights used are 1.00 for the center edges (0-30% of the radial distance), 0.75 for edges part way (30-75% of the radial distance) and 0.50 for edges in the corners (75-100% of the radial distance). This follows the default weight settings in the slanted edge Imatest™ algorithm [33]. These weights can be altered for the intended purpose of the NS-SFR.
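The radial weighting can be sketched as a weighted mean across SFR curves sampled at common frequencies. The function names are hypothetical and the assignment of the exact boundary values (30% and 75%) to the inner segment is an assumption, as the source does not state which segment owns them.

```python
def radial_weight(radial_fraction):
    """Weight for an edge at the given fraction (0 to 1) of the radial distance."""
    if radial_fraction <= 0.30:
        return 1.00  # center
    if radial_fraction <= 0.75:
        return 0.75  # part way
    return 0.50      # corners


def weighted_mean_sfr(sfrs, radial_fractions):
    """Weighted mean of several SFR curves sampled at the same frequencies."""
    weights = [radial_weight(r) for r in radial_fractions]
    total = sum(weights)
    n_freq = len(sfrs[0])
    return [sum(w * s[k] for w, s in zip(weights, sfrs)) / total for k in range(n_freq)]


# Three toy SFR curves (two frequency samples each) from center, part way and corner edges.
sfrs = [[1.0, 0.8], [1.0, 0.6], [1.0, 0.4]]
mean_sfr = weighted_mean_sfr(sfrs, radial_fractions=[0.10, 0.50, 0.90])
```

At the second frequency the weighted mean is about 0.644, compared with an unweighted mean of 0.6, reflecting the greater influence given to the center edge.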
We have then applied further weights to give more importance to the strongest NS-SFRs within each frame segment (center, part way and corners). A weighted median is used, rather than a weighted mean, to reduce the effect of anomalous NS-SFRs. This was achieved through:
1. Taking the peak MTF50 (MTF50P) of all NS-SFRs in each frame segment.
2. Giving the strongest 1/3 in each individual segment a weight of 1.00, the intermediate 1/3 a weight of 0.75 and the lowest 1/3 a weight of 0.25.
3. Taking the median with these set weights for each frame segment.
In addition, the 5th and 95th percentiles are identified for each of the NS-SFR envelopes. For the NS-SFRs that exceed the 95th percentile, or fall short of the 5th percentile, the weightings are decreased to 0.50 and 0.20 respectively. We have not completely deselected these outer NS-SFRs, since we have not identified the reason for their positioning within the envelope.
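The tertile weighting and weighted median can be sketched as follows for a single frame segment. For brevity the sketch operates on one scalar per NS-SFR (the MTF50P value itself) rather than on full curves, and it omits the percentile-based weight reductions; both simplifications, and the function names, are assumptions.

```python
def tertile_weights(mtf50p_values):
    """Weights of 1.00, 0.75 and 0.25 for the strongest, middle and weakest third."""
    n = len(mtf50p_values)
    # Indices ordered from the strongest to the weakest MTF50P.
    order = sorted(range(n), key=lambda i: mtf50p_values[i], reverse=True)
    weights = [0.0] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            weights[i] = 1.00
        elif rank < 2 * n / 3:
            weights[i] = 0.75
        else:
            weights[i] = 0.25
    return weights


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half of the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value


# Hypothetical MTF50P values (cycles/pixel) for six edges in one frame segment.
mtf50p = [0.30, 0.10, 0.25, 0.20, 0.15, 0.05]
weights = tertile_weights(mtf50p)
median = weighted_median(mtf50p, weights)
```

For these values the weighted median is 0.20, whereas the unweighted median would be 0.175, showing how the weighting shifts the summary towards the stronger edges while remaining robust to outliers.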
Note: For the purpose of fair comparison, the standardized SFR measure derived from test charts and the NS-SFR are both formulated using the same weighted average procedure.

Figure 11. A visualization of the low-frequency nonuniformity that is commonly present in natural scenes. The ESF and SFR demonstrate how these nonuniformities affect the result compared to the ground truth (GT). Adapted from Imatest™ [31].

Framework Assessment Using a Test Chart
To assess the accuracy of edge selection and processing in the framework a test chart was captured. From the image two measures were obtained: the first was the ISO 12233 traditional SFR with manual selection of edges; the second was the SFR obtained when the test chart image was passed through our measuring framework.
Comparing these two methods in Figure 12, it can be seen that the framework produces accurate results. The gray curves are the output horizontal SFRs (from vertical edges); the red dashed curves correspond to the 5th percentile, the weighted average and the 95th percentile from these SFRs. The green dashed lines show the same results for the ISO 12233 method.
The framework finds and isolates the correct edges. Our ROI processing, i.e. the 'T' shaped pixel stretching, has little influence on the result from perfect edge inputs. Figure 13 shows the horizontal NS-SFR envelopes for two example natural scenes captured with our test camera system. There are several observations that can be made from these NS-SFRs, which clearly demonstrate the measurement dependence on scene content (i.e. NS-SFR envelope shape, average and selected percentiles).

Natural Scene Envelopes
In Image 1 the majority of selected ROIs are located to the left and right of the frame, the weakest performing segment of the optical imaging circle, with few in the higher performing center and part way regions. Thus, Image 1 produces NS-SFRs that yield a low average system performance. In contrast, in Image 2 the selected ROIs are evenly spread across the frame, giving an NS-SFR spread and average SFR comparable to measures derived from a test chart.
In both example images, the 95th percentile curve is higher than that of the test chart in the high spatial frequencies. In Image 1, the low performing edge inputs cause the NS-SFRs to drop rapidly, but for many of the edges the NS-SFR plateaus at 0.2 modulation and below. This is due to high frequency scene textures (noise).
Figure 14 combines results by processing a small database of 30 captured scenes. The gray dashed NS-SFRs are the weighted averages from each scene, the green curve is the average SFR derived from the test chart and the red curve is the mean NS-SFR from the 30 scenes. All 30 captured scenes comprised well-lit subjects.
The mean NS-SFR closely follows the mean SFR curve, with high and low frequencies only slightly overestimated and mid-frequencies underestimated. From these preliminary results it is clear that the NS-SFR derived from each scene is highly dependent on the scene content (edge location, contrast, noise, etc.), which results in the NS-SFR distribution in Figure 14. We expect that, with the proposed improvements to the framework, as well as the inclusion of a larger set of scenes, the low and mid-frequency discrepancies in the mean NS-SFR of the system will be reduced. High frequencies will probably remain overestimated due to image noise and texture in the extracted ROIs.

Conclusions
We have demonstrated a novel approach that adapts the ISO 12233 slanted edge method to obtain camera system performance measurements directly from images of natural scenes. The input edges extracted from the captured scenes are not perfect step edge inputs, nor are they characterized in terms of spatial frequency content. The resulting measures are therefore no longer SFRs; we refer to them as natural scene derived SFRs, or NS-SFRs. These measurements do not solely describe the system, but they also relate to the input scene characteristics. This paper has outlined the key steps developed to identify, isolate and verify step edges from an image of a natural scene and has tested this framework with a small image dataset, captured from a test capturing system.
The results clearly describe how the scene content influences the output NS-SFR envelope, which contains an NS-SFR from each extracted edge in the captured scene. Parameters affecting the resulting SFRs in the envelope are the edge location, edge contrast, edge angle, texture and noise within the selected ROI and the frequency content of the input edges.
The images used for this study were all captured in well-lit conditions. They contain low image noise and subjects with a large number of edges, have a large depth of field and have been subjected to little or no non-linear ISP. All these are advantageous conditions for our measuring approach. Further work will test less suitable images, i.e. scene captures that contain fewer step edges, high amounts of texture, a shallow depth of field or high image noise from poor lighting, or that have been subjected to non-linear ISP.
In addition, further studies must analyze the NS-SFRs using the measured edge parameters in order to estimate the system performance. Additional work will include the use of Natural Scene Statistics (NSS) and seek correlation between specific scene types and the NS-SFR outputs.
The preliminary results obtained using the proposed measuring framework look promising. They build a foundation for deriving live system performance SFR measurements, as well as providing scene content information, which in turn can lead to the description of the performance of non-linear system processes.