# Difference between revisions of "Preliminary adaptations"

(→Methodology) |
(→Smoothing) |
||

Line 29: | Line 29: | ||

With ''y1'' is the last real value before the gap of data; ''y2'' the first real value after the gap, ''l'' is the length of the gap expressed in number of observation missed plus 1 and ''p'' is the relative position in the gap of the observation interpolated. If the gap is larger than five consecutive observations no interpolation is performed and all the missing values within are transformed to a very low value. This latter passage is done because is assumed that if a long gap happens is probably due to a snow/ice values that have vegetation very low indexes values. The last step of the procedure is given by the real smoothing procedure through a weighted windowed linear regression: the so-called “Swets algorithm”. A brief mathematical description of the method follows. The idea that, given a set of couple of variables (''xi, yi, i''=1,2,…n), one independent (''xi'') and one dependent (''yi'') , the means lie on the straight line, each of the yi can be described by equation [2]: | With ''y1'' is the last real value before the gap of data; ''y2'' the first real value after the gap, ''l'' is the length of the gap expressed in number of observation missed plus 1 and ''p'' is the relative position in the gap of the observation interpolated. If the gap is larger than five consecutive observations no interpolation is performed and all the missing values within are transformed to a very low value. This latter passage is done because is assumed that if a long gap happens is probably due to a snow/ice values that have vegetation very low indexes values. The last step of the procedure is given by the real smoothing procedure through a weighted windowed linear regression: the so-called “Swets algorithm”. A brief mathematical description of the method follows. The idea that, given a set of couple of variables (''xi, yi, i''=1,2,…n), one independent (''xi'') and one dependent (''yi'') , the means lie on the straight line, each of the yi can be described by equation [2]: | ||

− | <div class=" | + | <div class="formula_double"><center><math>Y_i = {\mu }_{Y|x_i} + E_i = \ \alpha + \beta x_i + E_i\ </math>[2]</center></div> |

Given ''εi'' the value assumed by the error ''Ei'' when Yi assume the value ''yi'' and estimating ''α'' with ''a'' and ''β'' with ''b'' the estimated line is expressed by [3] | Given ''εi'' the value assumed by the error ''Ei'' when Yi assume the value ''yi'' and estimating ''α'' with ''a'' and ''β'' with ''b'' the estimated line is expressed by [3] | ||

Line 42: | Line 42: | ||

− | <center><div class=" | + | <center><div class="formula_quint"><math>SWSE=\ \sum^n_{i-1}{w_i{e_i}^2}=\sum^n_{i-1}{w_i{(y_i-a-bx_1)}^2}\ </math></div> [5] |

Line 52: | Line 52: | ||

Where ''S<sub>w</sub> = Σw<sub>i</sub>; S<sub>y</sub> = Σw<sub>i</sub>y<sub>i</sub> ; S<sub>x</sub>= Σw<sub>i</sub>x<sub>i</sub> ; S<sub>xy</sub> = Σw<sub>i</sub>x<sub>i</sub>y<sub>i</sub> and S<sub>x</sub><sup>2</sup> = Σw<sub>i</sub>x<sub>i</sub><sup>2</sup>'' . | Where ''S<sub>w</sub> = Σw<sub>i</sub>; S<sub>y</sub> = Σw<sub>i</sub>y<sub>i</sub> ; S<sub>x</sub>= Σw<sub>i</sub>x<sub>i</sub> ; S<sub>xy</sub> = Σw<sub>i</sub>x<sub>i</sub>y<sub>i</sub> and S<sub>x</sub><sup>2</sup> = Σw<sub>i</sub>x<sub>i</sub><sup>2</sup>'' . | ||

+ | |||

In the algorithm computation the weights of each observation in the ''wide interval'' are given according to its relative position compared to the neighbours. Because the noise tends to lower rather than rise the signal local valleys have the weight is lower and to local peaks have a higher one. | In the algorithm computation the weights of each observation in the ''wide interval'' are given according to its relative position compared to the neighbours. Because the noise tends to lower rather than rise the signal local valleys have the weight is lower and to local peaks have a higher one. |

## Revision as of 10:07, 9 December 2013

## Flagging

At the end of the pre-processing, the radiometric quality indicators, and identification of cloud (shadow)-, snow-covered pixels are stored in a separate image layer, the status map. Also, most of the image layers are provided in integer images.

The post-processing starts with (1) conversion to byte (except for DMP and LST) and (2) application of the status map directly in each image layer. The latter is realized by reserving the range 251 to 255 (-5 to -1) in byte (integer) images for five different flags. Not all information from the status map is retained (e.g. radiometric quality per band). The ‘no data’ flag is separated into ‘water’ and ‘boreal winter’ using a land cover classification.

Most software can only deal with one flag, but GLIMPSE, which is used for the post-processing is specifically adapted to working with multiple flags.

More information on the flags and the rescaling can be found in the section Images.

## Smoothing

### Methodology

10-daily composite images still contain perturbations due to clouds, snow and missing values. The objective of this time series analysis is to scan each pixel's profile, detect the bad values and replace them with more appropriate ones – mostly via temporal interpolation of the previous/later "good" observations. The smoothing methodology used for cleaning the NDVI and fAPAR in MCYFS follows the approach presented by Swets et al. (1999) and modified by Klisch et al. (2007). It’s a regression methodology applied on a time series through a moving window of a defined number of consecutive observations. The modification of Klisch et al. (2007) concerns the unrealistic high values before and after steep slopes in the Swets et al. (1999) methodology, and also to keep the original high values. The algorithm requires the existence of a larger number of observations (i.e. the wide interval) than the ones to be cleaned. Usually from 15 to minimum 7 observations before and after the target period are required.

The smoothing takes place only if at least the 75% of the observations in the time series are valid (no clouds, no missing data, etc…) otherwise the smoothed values are flagged as missing value. Before the real smoothing a filter is applied to observations with high differences compared to the neighbours ones or if the neighbours have flagged values. The filtering procedure is applied to all observations in the wide interval and is based on a threshold that gives the possible amplitude variation between two consecutive observations.

**More detail on the smoothing procedure**:

The filtering procedure is applied to all observations in the wide interval and is based on a threshold (*tr*) that gives the possible amplitude variation between two consecutive observations. The Figure 1 resumes the rules for filtering a given observation in the considered interval.

Value y is a real value by definition, flags values are excluded at this step. Case A: observations at time x-1 and x+1 are real values. If Δ(-1) and Δ(+1) are larger than *tr(in this case tr= 0.4)* the value y is moved to a flagged value. Case B: observations at time x-1 and x+1 are real values. If Δ(-1) and Δ(+1) are larger than tr(in this case tr= 0.4) the value y is moved to a flagged value. Case C: observation at time (x-1) is flag while at (x+1) is a real value. If Δ(+1) is larger than tr(=0.12) the value y is moved to a flagged value. Case D: like C but with an inversion between data at x-1 and data at x+1. Furthermore is foreseen an interpolation for the flagged values. Defining a gap of real values for less than 4 consecutive observations, the values (v) are substitute according to equation [1]:

With *y1* is the last real value before the gap of data; *y2* the first real value after the gap, *l* is the length of the gap expressed in number of observation missed plus 1 and *p* is the relative position in the gap of the observation interpolated. If the gap is larger than five consecutive observations no interpolation is performed and all the missing values within are transformed to a very low value. This latter passage is done because is assumed that if a long gap happens is probably due to a snow/ice values that have vegetation very low indexes values. The last step of the procedure is given by the real smoothing procedure through a weighted windowed linear regression: the so-called “Swets algorithm”. A brief mathematical description of the method follows. The idea that, given a set of couple of variables (*xi, yi, i*=1,2,…n), one independent (*xi*) and one dependent (*yi*) , the means lie on the straight line, each of the yi can be described by equation [2]:

Given *εi* the value assumed by the error *Ei* when Yi assume the value *yi* and estimating *α* with *a* and *β* with *b* the estimated line is expressed by [3]

In this way each observation pair could satisfies equation [4]

Where the *ei* is the residual error for the model for data *i*. Defining a set of weights (*wi, i*=1,2,…n) the statistical method of “least sum of weighted squares” is used to estimates *a* and *b*. So to minimize the sum of weighted squared errors (SWSE) partial derivates with respect to a and b for equation [5]should be set to 0 obtaining equation [6] and [7] for estimating *a* and *b*.

Where *S _{w} = Σw_{i}; S_{y} = Σw_{i}y_{i} ; S_{x}= Σw_{i}x_{i} ; S_{xy} = Σw_{i}x_{i}y_{i} and S_{x}^{2} = Σw_{i}x_{i}^{2}* .

In the algorithm computation the weights of each observation in the *wide interval* are given according to its relative position compared to the neighbours. Because the noise tends to lower rather than rise the signal local valleys have the weight is lower and to local peaks have a higher one.

Case A in the figure, a relative peak, the observed point (marked in green) is weighted with a high weight (*wa*). Case B, a relative valley, the observation is weighted with low weight (*wb*). In case C, a plateau, the weight is neutral (*wc*), and in case D, a slope, the observation has a small negative weight (*wd*). Once the algorithm has determined the weights, a regression is performed on the wide interval. The method uses two operational windows, the regression windows of five observations and the combination windows of eleven observations. The latter is defined as the overlapping of 5 consecutive regression windows. The value to be smoothed (*X*) is the last observation of the first *regression windows* (5th value), the regression is computed and then the regression window is moved one observation forward. In the new *regression window X* is the fourth value. The operation is repeated until *X* assumes the first position. For each of the regression *X* obtains a smoothed value. The average of all the five smoothed values obtained gives the final smoothed value. If this is lower than the original observed value, then the smoothed value is omitted and the original one is kept (so called *modified Swets smoothing algorithm*). After this, the original data are substituted in the time series to the ones requested for smoothing. In the next figure, an example is provided of the used windows and the results achieved.

More information on the methodology can be found in Swets et al., 1999 and [[References|Klisch et al., 2007].

For MCYFS, the 10-daily time series of NDVI and fAPAR, coming from VGT , MODIS and NOAA-AVHRR are smoothed. This is done prior to calculation of the Long term statistics, Difference images, Similarity analysis and Cluster classification. The non-smoothed NDVI (fAPAR) has extension i (a), and the smoothed version gets extension k (b) (see Filename conventions).

The method can be applied on a past time series (back processing), but also in near real time. When applying on a past time series, the entire series is considered at once. For the near real time mode (NRT) the processing steps are described below.

### Application in NRT-mode

In near real time mode, the time series of the past 36 dekads are always used for the modified SWETS-smoothing. When a new 10-daily composite arrives, the procedure is repeated on a time series of again 36 dekads, but now shifted in time with 10 days. Smoothed images of only the last 6 dekads are created. The 6th last image is now updated for the last time and is kept as final smoothed image. The 5 last dekads are updated again when a new 10-daily composite arrives. That means that the most recent smoothed scenes are only temporary versions, which will be updated five times at later stages. Over time, the chance of obtaining “good” (non-missing) values increases, so the SWETS algorithm can provide reliable estimates. And the final version only is established after 2 months or six dekads.