This study explains the methodology used for modelling temperature and precipitation (monthly and annual). This modelling has been done using basic GIS techniques and information from meteorological stations. The geographical area used to apply and develop this model is the north east of Spain (Catalonia).
KEYWORDS:
Climatology, Temperature, Precipitation, GIS, Multiple regression analysis, Modelling.
This work attempts to develop an empirical and statistical model to predict temperature and precipitation (monthly and annual) in the north east of Spain (Catalonia). It is empirical because we use data obtained from meteorological stations both to build and to validate the model. It is statistical because it is based on a multiple regression analysis and its corresponding validation. The analysis uses temperature or precipitation as the dependent variable and, as independent variables, altitude, latitude, continentality and solar radiation (in the case of temperature) or a cloudiness factor (in the case of precipitation). The statistical analysis and the selected variables produce, from our point of view, a very simple and therefore realistic model given current climatological knowledge. In further studies its complexity could be increased, if appropriate, by adding new variables or by changing the way they contribute to explaining the variability.
This model aims to predict temperature and precipitation in a purely numerical and objective way. At least at the beginning, therefore, the intuition and knowledge of the researcher are set aside; a posteriori, however, the model can easily be changed or improved by an expert. Using GIS techniques allows us to obtain data in raster format, with the advantages that this format offers for repeatability and analysis [8].
The outcome of this work is 52 maps, classified as follows: 13 maps (monthly and annual) of mean maximum temperature, 13 of mean temperature, 13 of mean minimum temperature and 13 of total precipitation. A corrector map also exists for each of the final maps.
Catalonia, located in the NE of the Iberian Peninsula, is the geographical area on which this work is centred. It is a small area (32 000 km2) but it shows high variability and very strong contrasts due to its relief [9], its proximity to the Mediterranean Sea [10] and its geographical situation, since it receives Atlantic influences (although attenuated by the Iberian System and the Pyrenees [11]) as well as Mediterranean and even Saharan influences. Even though the area can be characterized as having a typical Mediterranean climate, as described by the pluviothermic index of Emberger [12], we find areas classified as semiarid, subhumid and humid [13, 14].
The density of meteorological stations in Catalonia is quite good (64 stations per km2 according to Clavero et al. [15]), but their location is heterogeneous across the territory (Map 1). Their siting has usually followed criteria that do not match the purposes of this kind of work. In fact, many stations are placed in flat areas (even in the Pyrenees region) and few in areas with steep slopes or complex orography. As a consequence, in the case of solar radiation for example, all the stations are very homogeneous and, as we will see later, the multiple regression model does not give positive results.
Filtering data
Even though there are different criteria for filtering climatological data, in this work we have put the emphasis on spatial patterns, always bearing in mind the length of the series. The WMO (World Meteorological Organization) gives 30 years as the optimal length for a series, but this depends on the variability of the studied area. In this case we have filtered the series using a length of 15 years for temperature and 20 years for precipitation. Since precipitation is more variable, 20 years may not be sufficient but, as noted above, we are attempting a spatial interpolation, so it is very important to cover as much of the area as possible. In the case of precipitation it may be better to show the local variability than to work with stabilized series that are less representative of the whole territory.
Different statistics have been computed to choose the length of the series. In any case, the process can obviously be repeated with longer series in the coming years.
Finally, we have worked with 86 temperature stations (of the 436 initial stations) and 142 precipitation stations (of the 664 initial stations). The density of the stations used is therefore one per 372 km2 for temperature and one per 225 km2 for precipitation.
All the station data are provided by the INM (Instituto Nacional de Meteorología) and the series belong to the 1951-1991 period. Unfortunately, when we began this work it was impossible to obtain later data.
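The filtering rule and the density figures quoted above can be sketched as follows. This is only an illustration: it assumes that the 15-year and 20-year values act as minimum series lengths, and the input structure, the function names and the example series lengths are invented for the sketch.

```python
# Series lengths (years) quoted in the text, read here as minimum lengths
# (an assumption of this sketch).
MIN_YEARS = {"temperature": 15, "precipitation": 20}
AREA_KM2 = 32_000  # approximate area of Catalonia quoted in the text


def filter_stations(series_years, variable):
    """Keep stations whose series has at least the minimum number of years.

    `series_years` maps a station id to its number of complete years
    (a hypothetical input structure).
    """
    threshold = MIN_YEARS[variable]
    return [sid for sid, years in series_years.items() if years >= threshold]


def km2_per_station(n_stations, area_km2=AREA_KM2):
    """Area covered, on average, by one station."""
    return area_km2 / n_stations


# Invented series lengths, for illustration only.
example = {"A": 30, "B": 14, "C": 15, "D": 22}
kept = filter_stations(example, "temperature")  # stations A, C and D

temp_density = km2_per_station(86)    # about 372 km2 per station
prec_density = km2_per_station(142)   # about 225 km2 per station
```

With 86 and 142 stations over 32 000 km2, the two calls reproduce the densities of one station per 372 km2 and one per 225 km2 given in the text.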
Map 1. Geographical location of the meteorological stations used in this study. Big circles represent stations with temperature data, small circles represent stations with precipitation data and dark circles show the stations with both kinds of data.
Choosing dependent variables
From the INM data we have calculated, using our own programs, monthly and annual means of the series, using complete years only. For temperature we have chosen mean maximum temperature, mean temperature and mean minimum temperature. For precipitation we show only total precipitation, because the data on the number of days with precipitation gave insufficient results in preliminary tests, probably because of the quality of the initial data and/or their unpredictable behaviour.
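As an illustration of the averaging step, the sketch below computes monthly means from complete years only. The authors' own programs are not available, so the data layout (a dictionary keyed by year and month), the function names and the unweighted annual mean are assumptions of this example.

```python
from statistics import mean


def monthly_means(records):
    """Return {month: mean} using only years that have all 12 months.

    `records` maps (year, month) to a monthly value.
    """
    by_year = {}
    for (year, month), value in records.items():
        by_year.setdefault(year, {})[month] = value
    complete = [months for months in by_year.values() if len(months) == 12]
    if not complete:
        raise ValueError("no complete years in the series")
    return {m: mean(year[m] for year in complete) for m in range(1, 13)}


def annual_mean(monthly):
    """Unweighted mean of the 12 monthly means (a simplifying assumption)."""
    return mean(monthly[m] for m in range(1, 13))


# Invented data: two complete years and one year with January only.
records = {(1951, m): 10.0 for m in range(1, 13)}
records.update({(1952, m): 12.0 for m in range(1, 13)})
records[(1953, 1)] = 99.0  # incomplete year, ignored by monthly_means

means = monthly_means(records)
```

The incomplete year does not contribute to any monthly mean, which is the behaviour the text describes for series "with whole years".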
Choosing independent variables
In this case, the selection has been made following the best-known factors cited in the literature as shaping climatic patterns. For temperature we have altitude (ALT), latitude (LAT), continentality (CON) and solar radiation (RAD). For precipitation we use the same variables, except that solar radiation is replaced by a cloudiness factor (CLO).
The altitude variable is the nominal altitude of the stations, and it captures the variability caused by the relief. The latitude variable is, in fact, the cosine of the nominal latitude of the stations because, owing to the curvature of the Earth, it is a better measure than the raw latitude value. Latitude is not included in the solar radiation model we have used (as seen below), so we have to include it in the regression model. Latitude approximates the variability generated by the zonal atmospheric circulation. Continentality is the linear distance to the sea. Non-linear models have also been tested (e.g., sigmoid models) but, for the moment, the best results are obtained with the linear one. This is probably due to the distribution of the meteorological stations and the relief in Catalonia. Other works [3] use the logarithm of the distance to the sea as the independent variable. Further studies will be essential to learn more about this problem. In this case, the model will be strongly related to the orography of the studied area. Continentality values are obtained from a raster that is built through GIS techniques and a DEM (Digital Elevation Model, Map 2).
Map 2. General view of the DEM used in this work.
We then extract, using our own programs, the values of the cells that match the location of each meteorological station. This process is done for all the variables.
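The extraction of cell values at station locations, and the cosine transform used for the latitude variable, might look like the sketch below. It assumes a north-up raster stored as a list of rows with its top-left corner at (x_min, y_max); the function names and the small continentality raster are hypothetical.

```python
import math


def cell_index(x, y, x_min, y_max, cell_size):
    """Row and column of the cell containing point (x, y).

    Assumes a north-up raster whose top-left corner is (x_min, y_max).
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    return row, col


def sample_raster(raster, x, y, x_min, y_max, cell_size):
    """Value of the raster cell under a station; None if outside the grid."""
    row, col = cell_index(x, y, x_min, y_max, cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


def lat_variable(latitude_deg):
    """LAT as described in the text: cosine of the latitude."""
    return math.cos(math.radians(latitude_deg))


# Invented 3 x 3 continentality raster (distance to the sea), 180 m cells.
continentality = [
    [540.0, 360.0, 180.0],
    [360.0, 180.0, 0.0],
    [180.0, 0.0, 0.0],
]
value = sample_raster(continentality, x=200.0, y=500.0,
                      x_min=0.0, y_max=540.0, cell_size=180.0)
```

The same sampling function can be applied to every independent-variable raster, which matches the statement that the process is done for all the variables.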
Finally, solar radiation has been obtained from a potential radiation model (proposed by Pons [16]). This model is entirely computational and based on a DEM. The optical density of the atmosphere is treated as constant (0.288), which corresponds to mean conditions for a clear forest atmosphere [17, 18]. Moreover, because of the small latitudinal variation of Catalonia, this model takes the central point of the DEM as reference. Finally, this potential model has been corrected with the meteorological data to obtain a cloudiness factor. This factor is a linear relation between potential radiation (PR) from the model and real radiation (RR) from the stations (Figure 1). This relationship has been calculated for each station and for every month of the year, and it is interpolated over the whole area. The intercept has been set to zero because, although there are no empirical data, it is reasonable to assume that if potential radiation is nil, real radiation will be nil too. Of course, real radiation will always be equal to or lower than potential radiation.
Figure 1:
It is this cloudiness factor (CLO) that we use for modelling precipitation, instead of the solar radiation that we use to model temperature. This factor expresses, in fact, the lack of clouds.
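A linear relation with the intercept fixed at zero, RR = CLO · PR, has a closed-form least-squares slope, sum(PR · RR) / sum(PR²). The sketch below shows that calculation for one station and one month. Which observations enter the fit is not stated in the text, so the sample values are invented and the function name is hypothetical.

```python
def cloudiness_factor(potential, real):
    """Slope of the least-squares line RR = CLO * PR forced through the origin."""
    if len(potential) != len(real) or not potential:
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(pr * rr for pr, rr in zip(potential, real))
    denominator = sum(pr * pr for pr in potential)
    if denominator == 0:
        raise ValueError("potential radiation is zero everywhere")
    return numerator / denominator


# Invented values for one station and one month (arbitrary units).
pr = [100.0, 200.0, 300.0]
rr = [70.0, 140.0, 210.0]
clo = cloudiness_factor(pr, rr)
```

Because real radiation never exceeds potential radiation, a factor computed this way should lie between 0 and 1, with values near 1 indicating few clouds.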
Regression model: some considerations
We have used a multiple regression analysis with the standard method of adding independent variables to the regression equation. This method forces all the independent variables to enter the equation at the same time. Tests with forward stepwise and backward stepwise selection produced the same results. In the statistical results section we show the multiple determination coefficient (R2), which estimates the fit of the model, and the non-standardized regression coefficients (B). These coefficients will be used, as seen below, to build the final maps. All the calculations have been done with a significance level of α = 0.05. The p values are not shown because in all significant cases they are very low (lower than 0.001).
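For readers who want to see the calculation, the following is a minimal sketch of a multiple regression in which all predictors enter at once, returning the non-standardized coefficients and R2. It solves the normal equations in plain Python, which is a choice made for this example and not necessarily what the authors' statistical package does. Significance tests are omitted, and the station data are invented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Fit y = INT + sum(B_k * x_k) with all predictors entered at once.

    Returns (coefficients, r2); coefficients[0] is the intercept.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, d)) for d in design]
    y_mean = sum(y) / len(y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    ss_tot = sum((v - y_mean) ** 2 for v in y)
    return coef, 1.0 - ss_res / ss_tot


# Invented stations: (altitude, continentality) and a temperature that
# follows 15 - 0.006 * ALT + 0.01 * CON exactly.
rows = [(0.0, 0.0), (500.0, 10.0), (1000.0, 40.0), (1500.0, 20.0), (2000.0, 60.0)]
y = [15.0 - 0.006 * alt + 0.01 * con for alt, con in rows]
coef, r2 = fit_ols(rows, y)
```

Since the invented temperatures follow the linear law exactly, the fit recovers the intercept and both slopes and gives an R2 of essentially 1; real station data would of course leave residuals.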
Evaluation of the model
We randomly remove 40% of the stations (with adequate series in each case) and run the multiple regression analysis on the other 60%. We then calculate the dependent variable for the removed 40% of stations from the values of their independent variables and the coefficients obtained with the first subset of stations. Finally, we compare predicted and observed values for this 40% of stations. With this method the reliability obtained is always equal to or lower than that obtained with the whole set of stations (100%).
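The random 60/40 split and the comparison of predicted with observed values can be sketched as below. The comparison statistic is taken here to be the squared Pearson correlation, which is an assumption of this example; the function names and the fixed random seed are also illustrative.

```python
import random


def split_stations(station_ids, fit_fraction=0.6, seed=0):
    """Randomly split stations into a fitting subset and a validation subset."""
    ids = list(station_ids)
    random.Random(seed).shuffle(ids)
    n_fit = round(len(ids) * fit_fraction)
    return ids[:n_fit], ids[n_fit:]


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


# The 86 temperature stations, identified here simply by index.
fit_ids, val_ids = split_stations(range(86))
```

The regression would be fitted on `fit_ids` only, and `r_squared` would then be evaluated on the observed and predicted values of the stations in `val_ids`.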
Mapping the model: final synthesis
The process described below is valid for each of the dependent variables studied.
To build the final maps of temperature and precipitation we start with one raster matrix for each of the independent variables, together with the coefficients obtained in the multiple regression analysis.
The altitude matrix is obtained from a DEM with a resolution of 180 x 180 m. The latitude and continentality matrices are built from this DEM through GIS techniques. For solar radiation and cloudiness, as mentioned above, we have used data from previous works. In the case of solar radiation it is important to note that we have used a model based on a 500 x 500 m DEM, because in previous tests we observed better fits with a 500 x 500 m DEM than with a higher-resolution DEM. Once the solar radiation matrix had been computed, we densified it (to 180 x 180 m) to obtain the same cell size as the other matrices. The better fit is explained by the fact that the relationship between radiation and temperature depends less on local factors (small hollows, etc.) than on the general location of the station. Temperature is regulated by air currents and air mixing, so it is better to use a more general DEM to model it.
Combining these matrices through elementary GIS techniques allows us to obtain the final maps. That is, we multiply each cell by the appropriate regression coefficient and then add up all the matrices (Figure 2). Note that every cell of the intercept matrix contains the same value, namely the intercept obtained in the regression analysis.
Figure 2.
Remember that in the case of precipitation the equation is basically the same, with the variable RAD replaced by the variable CLO.
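The map algebra described in this section amounts to evaluating the regression equation cell by cell. The sketch below does this for small rasters stored as lists of rows. Skipping the variables marked as not significant is an assumption of the example, and the coefficients and raster values are invented, not taken from the results tables.

```python
def combine_rasters(intercept, coefficients, rasters):
    """Cell-by-cell evaluation of INT + sum(B_k * raster_k).

    Variables whose coefficient is None (not significant, 'ns') are skipped.
    All rasters must share the same shape.
    """
    names = [k for k, b in coefficients.items() if b is not None]
    shape_source = next(iter(rasters.values()))
    n_rows, n_cols = len(shape_source), len(shape_source[0])
    return [
        [
            intercept + sum(coefficients[k] * rasters[k][i][j] for k in names)
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Invented coefficients and 2 x 2 rasters, for illustration only.
coefficients = {"ALT": -0.005, "CON": None}
rasters = {
    "ALT": [[0.0, 1000.0], [200.0, 400.0]],
    "CON": [[5.0, 5.0], [5.0, 5.0]],
}
potential_map = combine_rasters(10.0, coefficients, rasters)
```

For precipitation the same function applies once the RAD raster and coefficient are replaced by their CLO counterparts.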
Building the correctors
Each cell of the raster maps we have obtained holds the value of temperature or precipitation given by the regression model, so we call these rasters potential maps. However, by comparing potential values (obtained from the model) with real values (obtained from the stations) we get a good estimate of the error of the regression analysis at each geographical location where we have field data. In this way we obtain, for each meteorological station, an error value that we call the corrector.
We have chosen an additive corrector (Figure 3) so that it is not related to the magnitude of the values.
Figure 3.
If we interpolate the corrector values of each station over the whole area we obtain corrector (or anomaly) maps that estimate the error of the model at each point. In this step we use a typical interpolation based on the inverse of the squared distance. These corrector maps are not uniform: they show maximum variability in the most unpredictable areas and minimum variability in the predictable ones. The most unpredictable areas coincide with areas of rugged terrain, where local factors are more important than general factors. For that reason, the corrector maps are also interesting in themselves, because they show which areas are more predictable and which are less.
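An interpolation weighted by the inverse of the squared distance can be sketched as follows. The sign convention for the additive corrector (observed minus potential) is an assumption, since the defining figure is not reproduced here; the station coordinates and values are invented.

```python
def idw(x, y, stations, power=2):
    """Inverse-distance-weighted estimate at (x, y).

    `stations` is a list of (x, y, value); power=2 gives weights 1 / d**2.
    A point that coincides with a station returns that station's value.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0:
            return value
        w = 1.0 / d2 ** (power / 2)
        num += w * value
        den += w
    return num / den


def corrector(observed, potential):
    """Additive corrector at a station (sign convention assumed here)."""
    return observed - potential


# Two invented stations: (x, y, corrector value).
stations = [
    (0.0, 0.0, corrector(12.0, 11.0)),   # model too cold by 1 degree
    (2.0, 0.0, corrector(9.0, 10.0)),    # model too warm by 1 degree
]
midpoint = idw(1.0, 0.0, stations)       # equidistant: the correctors cancel
at_station = idw(0.0, 0.0, stations)     # exactly the station's corrector
real_at_station = 11.0 + at_station      # potential value plus corrector
```

With this convention, adding the interpolated corrector to the potential value reproduces the observed value at every station cell, while cells between stations receive a distance-weighted blend of the neighbouring correctors.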
Combining the corrector maps with the potential maps (Figure 4) we obtain what we call real maps. In the cells that match meteorological stations, the values of the real maps are equal to the measured ones, which is also useful for users of our maps. In the rest of the cells we obtain potential values modified by the correctors.
Figure 4:
Evaluation of the corrected model
Once the correctors have been applied to the potential maps, we test the model. We randomly choose 60% of the stations to build the model and keep the remaining 40% for validation. We build the corrector maps from the selected 60% of stations and compute the corresponding real maps. The cells of these 60% of stations will have perfectly corrected values (that is, equal to the observed values), while the remaining stations will have values estimated by our model and corrected by values interpolated from the 60% of stations. That is, they will be closer to reality than the direct predictions of our model, but not equal to the observed values. From the real maps calculated with 60% of the stations, we extract the values at the 40% of stations that we had kept. With these values we compute a multiple regression analysis, finally obtaining the coefficient of multiple determination of the corrected model (Rc2). This coefficient allows us to compare the non-corrected model with the corrected one. The final maps are, of course, generated with 100% of the stations available in each case. The fit of the model with all the stations will be equal to or better than that obtained in the validation test.
Some considerations about the accumulated error
| Month     | B (ALT) | B (LAT) | B (CON) | B (RAD) | B (INT)  | R2    |
| January   | -0.003  | ns      | -0.022  | 0.003   | 6.343    | 0.838 |
| February  | -0.005  | ns      | -0.013  | ns      | 9.410    | 0.918 |
| March     | -0.005  | 66.476  | ns      | 0.002   | -40.690  | 0.952 |
| April     | -0.006  | 40.802  | ns      | 0.002   | -20.772  | 0.951 |
| May       | -0.006  | 73.775  | 0.008   | 0.003   | -43.853  | 0.933 |
| June      | -0.006  | 157.09  | 0.014   | ns      | -96.290  | 0.902 |
| July      | -0.006  | 180.55  | 0.016   | ns      | -110.66  | 0.861 |
| August    | -0.006  | 188.85  | 0.014   | ns      | -117.13  | 0.908 |
| September | -0.005  | 141.67  | 0.006   | 0.002   | -87.200  | 0.949 |
| October   | -0.005  | 121.93  | ns      | 0.002   | -76.474  | 0.952 |
| November  | -0.004  | ns      | -0.018  | 0.003   | 10.266   | 0.891 |
| December  | -0.003  | ns      | -0.024  | ns      | 8.570    | 0.812 |
Table 1. Results of the multiple regression analysis for mean temperature. The independent variables are ALT (altitude), LAT (cosine of the latitude in degrees), CON (continentality) and RAD (solar radiation), which changes with the month. The multiple determination coefficient (R2) and the non-standardized regression coefficients (B) are also shown. ns means that the variable is not significant and R2c is the multiple determination coefficient of the corrected model.
Several sources can generate problems or errors, and it is important to consider them. These errors can come from the initial data: the digital geographical information (altimetry and planimetry of the matrices) as well as the climatological information of the stations. In the latter we can find errors in the geographical coordinates, in the location and calibration of the measuring instruments, in the reading of the data obtained by the devices and in the transcription of the information [11]. Errors can also come from our handling of the information, as well as from factors inherent to the model (omission of important parameters and linear interpolation of the correctors).
All these errors will appear in the model residuals and, finally, in the corrector maps (except those that come from interpolation). When we speak of the correctors, we therefore mean not only natural climatic variability but also these accumulated errors.
We only show the results for mean temperature. It is important to note that the results for mean maximum and mean minimum temperature, though quite good, are always poorer than those obtained for mean temperature. We find the best predictors in the spring and autumn months, while the worst predictors are found in the winter months.
Table 1 shows that all the determination coefficients of the non-corrected model (R2) are greater than 0.80. In the corrected model (Rc2) these coefficients are slightly better. The model therefore improves when we correct it with the meteorological data but, because the fits of the non-corrected model are already very good, this improvement is less important.
Chart 1 shows the statistical significance of the different independent variables for the case of mean temperature. Altitude is significant every month. However, we should not forget that the studied variables are correlated in the case of Catalonia (e.g., a more northerly latitude and a longer distance to the sea correspond to a higher altitude). Continentality is less important in the spring months, probably because the temperature gradient between inland areas and the coast is smaller. Solar radiation is not significant during the summer, probably because the solar altitude makes the radiation more homogeneous during that season. The pattern of latitude is not as clear as that of the other variables, and further work is needed to obtain more data about it.
Chart 1. Statistical importance of the independent variables during the year in the case of mean temperature. 'One' means that the variable is significant at the 95% level and 'zero' means that it is not significant.
Table 2 shows that the statistical parameters of the real mean temperature agree well with previous data [15]. The extreme months are January (T = 3.9 °C) and July (T = 20.7 °C). The annual temperature has been obtained from the monthly maps.
Table 2. Statistical parameters of the real mean temperature for the 2 545 200 cells of the matrix of Catalonia.
Chart 2 shows the histogram of frequencies for the annual mean temperature.
Chart 2. Histogram of the annual real mean temperature. The high frequency of each class is due to the matrix having 2 545 200 cells.
Precipitation
Table 3 shows that the determination coefficients are lower than those observed for temperature, as expected, because precipitation is less predictable than temperature. The least predictable months are the autumn ones (R2=0.28 and R2=0.44). This could be due to the kind of perturbations that take place during these months on the Mediterranean coast [15]. The summer months have the best predictors (R2=0.6). Unlike in the temperature case, the coefficients of the corrected model (Rc2) are significantly higher than those of the non-corrected model, so the model improves considerably once it is corrected with the meteorological station data. This is because precipitation is predicted less well than temperature from geographical data alone.
| Month     | B (ALT) | B (LAT) | B (CON) | B (CLO)  | B (INT) | R2    |
| January   | 0.017   | ns      | ns      | -271.182 | 229.794 | 0.451 |
| February  | 0.017   | ns      | ns      | -317.277 | 260.294 | 0.576 |
| March     | 0.017   | ns      | ns      | -366.646 | 310.420 | 0.507 |
| April     | 0.014   | ns      | 0.131   | -304.349 | 263.538 | 0.613 |
| May       | 0.039   | ns      | ns      | -218.150 | 211.133 | 0.596 |
| June      | 0.047   | ns      | ns      | -199.523 | 184.171 | 0.616 |
| July      | 0.020   | -1719.6 | ns      | -156.320 | 1421.59 | 0.698 |
| August    | 0.030   | -1511.0 | ns      | -262.060 | 1360.24 | 0.633 |
| September | 0.012   | ns      | ns      | -330.561 | 305.522 | 0.284 |
| October   | 0.018   | ns      | -0.226  | -397.158 | 368.023 | 0.413 |
| November  | 0.025   | ns      | ns      | -279.649 | 250.556 | 0.446 |
| December  | 0.021   | ns      | ns      | -324.561 | 278.839 | 0.437 |
Table 3. Results of the multiple regression analysis for precipitation. The independent variables are ALT (altitude), LAT (cosine of the latitude in degrees), CON (continentality) and CLO (cloudiness factor), which is the same throughout the year. The multiple determination coefficient (R2) and the non-standardized regression coefficients (B) are also shown. ns means that the variable is not significant and R2c is the multiple determination coefficient of the corrected model.
Chart 3 shows, as in the temperature case, the statistical importance of the variables used. In this case both altitude and cloudiness are significant throughout the year. Continentality is significant during spring and autumn, because the stations located farther from the sea receive more precipitation during the spring months and the nearer ones receive more during the autumn months. Latitude is significant only during summer, probably because of the presence of the Pyrenees in the north of the area, where summer rainfall is high.
Chart 3. Statistical importance of the independent variables during the year in the case of precipitation. 'One' means that the variable is significant at the 95% level and 'zero' means that it is not significant.
Table 4 shows that the statistical parameters of precipitation agree well, as in the case of temperature, with the initial data [15]. The months with the least precipitation are February (42.7 mm) and July (44.4 mm), while the months with the most precipitation are May (82.5 mm) and October (76.1 mm). The annual precipitation has been obtained from the monthly maps.
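The text does not say how the annual maps are derived from the monthly ones. One plausible reading, sketched below, is a cell-by-cell sum for total precipitation and a cell-by-cell unweighted mean for temperature. Both operations, the function names and the tiny example rasters are assumptions.

```python
def annual_total(monthly_maps):
    """Cell-by-cell sum of monthly rasters (e.g. total precipitation)."""
    n_rows, n_cols = len(monthly_maps[0]), len(monthly_maps[0][0])
    return [
        [sum(m[i][j] for m in monthly_maps) for j in range(n_cols)]
        for i in range(n_rows)
    ]


def annual_mean(monthly_maps):
    """Cell-by-cell unweighted mean of monthly rasters (e.g. temperature)."""
    n = len(monthly_maps)
    return [[v / n for v in row] for row in annual_total(monthly_maps)]


# Twelve identical invented 1 x 2 monthly precipitation maps (mm).
monthly = [[[40.0, 50.0]] for _ in range(12)]
total = annual_total(monthly)
average = annual_mean(monthly)
```

A mean weighted by the number of days in each month would be a natural refinement for temperature.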
Table 4. Statistical parameters of the precipitation for the 2 545 200 cells of the matrix of Catalonia.
Chart 4 shows the frequency histogram for the annual precipitation.
Chart 4. Histogram of the annual real precipitation. The high frequency of each class is due to the matrix having 2 545 200 cells.
It is important to bear in mind that the final results of this work are digital maps in raster format. They can therefore be queried easily and, once the process is automated, updated with new meteorological data. The models are quite simple because they require only a DEM of the area and the meteorological data of the stations.
FINAL CONCLUSIONS
The proposed models of temperature and precipitation give highly accurate predictions, especially considering that they initially work only with geographical data, without the climatological data of the stations.
REFERENCES
Miquel NINYEROLA
Miquel Ninyerola is an assistant teacher in the Botany Unit at the Autonomous University of Barcelona. He is working on his thesis, whose objective is to understand the relationship between the distribution of vegetation and environmental variables. His area of knowledge is botany, but he has been working in climatology as a basis for understanding vegetation distribution.
Botany Unit
ibbt1@uab.es
Universitat Autònoma de Barcelona
Facultat de Ciències
Edifici C
Unitat de Botànica
08193 Bellaterra (Barcelona)
Spain
Tel: +34.3 93 581 29 85
Fax: +34.3 93 581 13 21
Joan M. ROURE
Joan M. Roure is professor of Botany at the Autonomous University of Barcelona. He is the coordinator of the AridusEuromed European program.
Botany Unit
ibbt11@uab.es
Universitat Autònoma de Barcelona
Facultat de Ciències
Edifici C
Unitat de Botànica
08193 Bellaterra (Barcelona)
Spain
Tel: +34.3 93 581 20 42
Fax: +34.3 93 581 13 21
Xavier PONS
Xavier Pons received his BS degree in Biology in 1988, an MS degree in Botany in 1990, an MS degree in Geography in 1995, and a PhD degree in Remote Sensing and GIS in 1992, all from the Autonomous University of Barcelona. His main work has been in radiometric and geometric corrections of satellite imagery, in the cartography of ecological and forest parameters from airborne sensors, in studies of the spectral response of Mediterranean vegetation and in GIS development, both in terms of data structure and organization and in terms of software writing. He has recently worked on descriptive climatology models, on modelling forest fire hazards and on the analysis of landscape changes from long series of satellite images. He is professor at the Department of Geography of the Autonomous University of Barcelona and coordinates research activities in GIS and Remote Sensing at the Centre for Ecological Research and Applied Forestry.
Department of Geography and Centre for Ecological Research and Applied Forestry, CREAF.
x.pons@uab.es
Autonomous University of Barcelona (UAB)
08193 Bellaterra, Barcelona
Spain
Tel: +34 93 581 13 12
Fax: +34 93 581 13 12
URL: http://www.creaf.uab.es