3. Data Life Cycle


3.7.2 Data and Information Integration

Growing efforts to improve data access and sharing raise issues related to integrating¹ multiple sources of data. In a perfect world, all data producers would adopt international standards and use accessible, interoperable computing environments. But what is the situation in practice?

HETEROGENEITY

Everyone uses some form of calendar every day. Yet a look at how people write dates shows that heterogeneous procedures, formats, syntax and systems are major obstacles to data integration. For example, although the international standard ISO 8601 defines YYYY-MM-DD as the date format, 2015-02-12 can be found written in a wide variety of ways, as shown in the table below:
 

12/02/15      15/02/12            02/12/15        15/12/02
12-02-15      15-02-12            02-12-15        15-12-02
12-02-2015    2015-02-12          02-12-2015      2015-12-02
Feb. 12/15    February 12, 2015   12 fév. 2015    12 février 2015
12.02.2015    2015.02.12          12 de febrero   etc.

Sample ABC-009: collected 06/05 at 7 h.

Was it May 6 or June 5? AM or PM?
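
The ambiguity described above can be made concrete with a short Python sketch. It tries several two-digit date layouts on the same string and reports every calendar date the string could stand for, rewritten in ISO 8601. The list of candidate layouts and the function name are assumptions chosen for illustration; a real dataset would call for whatever layouts its contributors are known to use.

```python
from datetime import datetime

# Hypothetical list of layouts a contributor might have used.
CANDIDATE_FORMATS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/month/day": "%y/%m/%d",
    "year/day/month": "%y/%d/%m",
}

def possible_iso_dates(text):
    """Return {layout: ISO 8601 date} for every layout that can parse `text`."""
    results = {}
    for label, fmt in CANDIDATE_FORMATS.items():
        try:
            results[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # This layout cannot describe a valid date for this string.
            pass
    return results

readings = possible_iso_dates("12/02/15")
for label, iso_date in readings.items():
    print(f"{label}: {iso_date}")
```

Whenever more than one reading survives, the string alone cannot settle the date; only metadata recorded at collection time (or a format such as YYYY-MM-DD adopted from the start) removes the doubt.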

Generally speaking, we can see that combining datasets where variables are represented in different formats can cause problems. The same goes for the units, measurement precision or cartographic projections used. Rigor and consistency are therefore essential.
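
The same concern applies to units. As a minimal sketch, assuming two hypothetical datasets that report sampling depth in different units, each value can be converted to a common unit before the records are combined. The station names, field names and values below are invented for illustration only.

```python
# Conversion factors to metres (1 ft = 0.3048 m exactly).
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}

def depth_in_metres(value, unit):
    """Convert a depth to metres, refusing units that are not documented."""
    if unit not in TO_METRES:
        raise ValueError(f"unknown unit: {unit!r}")
    return value * TO_METRES[unit]

# Hypothetical records from two data producers.
dataset_a = [{"station": "A1", "depth": 12.5, "unit": "m"}]
dataset_b = [{"station": "B7", "depth": 41.0, "unit": "ft"}]

# Harmonize the unit first, then merge.
merged = [
    {
        "station": record["station"],
        "depth_m": round(depth_in_metres(record["depth"], record["unit"]), 2),
    }
    for record in dataset_a + dataset_b
]
print(merged)
```

Rejecting unknown units, rather than guessing, is a deliberate choice here: a silent assumption about units is exactly the kind of inconsistency that is hard to detect once datasets have been merged.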

ASSIMILATION IN MODELS

The work of meteorologists illustrates the use of models well: weather experts feed a variety of environmental parameters into climate models in order to produce the best possible forecasts. The data assimilation cycle in this process feeds in situ observations into the models as a way to fine-tune the forecasts. For instance, this is how the coupled water-atmosphere model developed by Saucier et al.²,³ can produce surface current forecasts for the Estuary and Gulf of St. Lawrence.⁴
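
The idea of fine-tuning a forecast with an observation can be sketched for a single quantity. The example below is a generic textbook-style update that weights the forecast and the observation by their error variances; it is not the scheme used in the model cited above, and the function name and numbers are assumptions made for illustration.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """Blend one forecast value with one observation of the same quantity.

    The gain is close to 1 when the forecast is much more uncertain than
    the observation, and close to 0 in the opposite case.
    """
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Hypothetical surface current speed in m/s: the model forecasts 0.50
# (error variance 0.04) and an in situ instrument reports 0.30
# (error variance 0.01).
analysis, analysis_var = assimilate(0.50, 0.04, 0.30, 0.01)
print(f"analysis: {analysis:.2f} m/s, variance: {analysis_var:.3f}")
```

Because the observation is assumed to be more reliable than the forecast in this example, the corrected value lands closer to the observation, and its variance is smaller than either input. In an operational cycle, the corrected state would then serve as the starting point for the next forecast.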