EDITORIAL

Big data for scientific research and discovery

(Editor-in-Chief)

With data volumes expanding beyond the petabyte and exabyte levels across many scientific disciplines, the role of big data in scientific research is becoming increasingly apparent: massive data processing has become valuable for scientific research. The term big data is not merely a buzzword and a marketing tool; it can also provide invaluable help for scientific research and discovery through the data-intensive scientific discovery paradigm.

‘Big Data for Development: Challenges & Opportunities’, published by United Nations Global Pulse, an initiative of the Secretary-General on big data, suggests that projects and programs of big data research promote a national strategy, and points out the essential role of big data in the development of society as a whole, including science and technology, economics, and decision-making. It is noteworthy that UN Global Pulse launched the Big Data Climate Challenge as part of the Secretary-General's 2014 Climate Summit, which provides data-driven evidence of the impacts of climate change. The UN spokesperson Stephane Dujarric said, ‘This initiative will help build public understanding of how big data can reveal critical insights for strengthening resilience and mitigating emissions.’

Numerous examples of big data's contribution to scientific discoveries have been identified, especially in large interdisciplinary research fields such as Digital Earth and Global Change. Unprecedentedly large datasets generated, sensed, and harvested from experiments, observations, and simulations have brought great opportunities for scientific progress for two reasons: (1) Huge datasets will serve as important inputs and will support the adjustment and validation of current theories for large scientific problems, thus leading to new findings. A good example is the new paradigm of ‘big data meets big models’ for large inverse problems. (2) Massive datasets themselves can provide endless sources of new knowledge without modeling the scientific phenomena. This has been characterized as the ‘Fourth Paradigm’ of data-intensive scientific discovery. There is no doubt that big data will significantly change the way scientific discoveries are made. Scientists must be prepared to welcome a new age in which digital data will play an important role and might dominate the methodologies of scientific research.

Scientists will inevitably face a number of challenges before these opportunities are realized. At the technology level, there are three main challenges to overcome before big data can significantly contribute to scientific discoveries. The first challenge concerns data processing infrastructures and platforms. New generations of software and hardware with high performance, high scalability, and high efficiency are required for sensing, storing, and computing big data. The second challenge faced by scientists mining big data is the processing algorithms. New models and algorithms need to be developed for efficiently locating interesting findings in vast, unstructured, heterogeneous, nonlinear, nonsteady, high-dimensional datasets. The third challenge is the methodologies for linking the processing to the discovery: how can big data be integrated with the processes of research and development and really lead to new findings? New methodologies and data science frameworks will need to be developed for different scientific disciplines.

In the era of big data, the Digital Earth concept has taken on a new connotation. Using scientific data as a basis, Digital Earth integrates massive, multispatial, multitemporal, multiresolution, and multitype Earth observation and socioeconomic data as well as analysis algorithms and models, fully conforming to the properties of big data. The birth and development of big data have introduced new challenges to Digital Earth and moved it forward from putting Earth into the computer to Big Earth Data.

In June 2014, the ‘International Workshop on Big Data for International Scientific Programmes: Challenges and Opportunities’ was held in Beijing, sponsored by the Committee on Data for Science and Technology and co-sponsored by the International Society for Digital Earth, the World Data System, Future Earth, Integrated Research on Disaster Risk, the Research Data Alliance, the Group on Earth Observations, and the Institute of Remote Sensing and Digital Earth of the Chinese Academy of Sciences. The workshop developed a joint statement of recommendations and actions, which can be found at http://www.digitalearth-isde.org/news/700. This statement emphasizes the need to develop a better understanding of the role of big data in scientific research, thereby strengthening international science for the benefit of society by developing policy frameworks, research guidelines, case studies, and best practices that will help us exploit big data effectively.

Although only a starting point, this statement is a practical step toward focusing attention on the potential of big data, recognizing that big data presents particularly significant challenges and notable opportunities for transdisciplinary, international research programs as well as for scientific data services and infrastructure providers. The major points in the statement are: (1) respond to the importance of big data for international scientific programs; (2) exploit the benefits of big data for society; (3) improve understanding of big data through international collaboration; (4) promote universal access to big data through global research infrastructures; (5) explore and address the challenges of big data stewardship; (6) encourage capacity building and skill development in big data science; and (7) foster development of policies to maximize exploitation of big data.

The workshop also developed actions designed to be achievable and beneficial. The actions are: (1) produce case studies in big data for international scientific programs; (2) promote sharing of big data solutions across scientific disciplines; (3) research policy, ethical and legal issues for big data; and (4) research stewardship and sustainability challenges for big data by establishing Working Groups on big data for scientific programmes.

Scientists around the world are invited to consider and debate these principles, and to turn the actions into reality. There is no doubt that the adoption of these principles and the associated activities can help us ensure that big data is effectively exploited for the benefit of science and society.

Huadong Guo

Editor-in-Chief
