Alison Specht, Pedro Corrêa, Lee Belbin, Hank Loescher

Critical research infrastructure: The importance of synthesis centers

In this piece, Specht, Corrêa, Belbin and Loescher elaborate their thoughts on the role of synthesis centers in facing today's challenges of research.
11 February 2020

There is more than one sort of scientific research infrastructure: those that provide technology and computational capacity, and those that are supplied by the communities that work with them (Baron et al., 2017). We suggest that computing research infrastructures provide an information environment that supports effective research, but of themselves are insufficient to provide the insights and understanding we desire. Computing ‘information infrastructures’ began in the 1960s and flourished in the 1990s as technology advanced, growing alongside the research they supported (Lee and Schmidt, 2018). The establishment of increasingly sophisticated cyberinfrastructures, such as the SuperMUC-NG of the Leibniz Supercomputing Centre, the European Open Science Cloud, the National Computational Infrastructure in Australia, and the Global Earth Observation System of Systems, has opened significant new opportunities for research. Research Infrastructures (RIs) such as these provide sophisticated, long-term instrumentation combined with the computing capacity to answer complex questions beyond the scope of the individual researcher or single laboratory. Such cyberinfrastructures are complex systems, often distributed ‘systems of systems’, locally and globally linked. They are usually consortium or government-funded and deliver open-access data.

The computational capacity and technology developed since the 1990s have inspired a variety of more domain-specific RIs, such as those relevant to environmental science, the subject of this article. These include RIs that integrate computational capacity for big data computing and simulation modelling with data provision and large storage capacity. The wealth of data now available is extraordinary but can present a logistic challenge for users as they gather and compile data, merge data from different sources, extract what they need and analyse it. The relationship between the RI and the user community is critical if full benefit is to be gained from the available data and technology (Frank et al., 2016; Finkel et al., 2020). We introduce some of the types of environmental RIs, discuss some of the unique challenges they face in provisioning data to the broad environmental science community, and consider the role the synthesis center can play in alleviating those challenges.

Environmental Research Infrastructures (ERIs) incorporate data from experiments and observations that are updated automatically and continuously and may be supplemented by individual measurements. Chabbi et al. (2017) and Peters et al. (2014) summarised ERIs such as EISCAT, a northern European scientific association which conducts ionospheric and atmospheric measurements with radars; the National Ecological Observatory Network (NEON), a continental-scale ecological observatory in the USA; the Terrestrial Ecosystem Research Network (TERN) in Australia; and the long-term ecological research networks, such as LTER in the USA and iLTER, inter alia. These ERIs are commonly regionally specific, such as the newly established Atmospheric Data Centre in Brazil (Corrêa pers. comm.) and the Biodiversity Exploratories (BE) established in Germany in 2006. DataONE, the Data Observation Network for Earth, is a cyberinfrastructure platform that was developed to support rapid data discovery across a variety of these platforms, commencing in the USA but with a global reach (Michener et al., 2012).

Critical Zone Observatories (CZOs) are a project funded by the National Science Foundation (USA), which supports watershed or catchment-based observatories focussed on processes within the soil-biota-atmosphere interface (White et al., 2015). Using a suite of linked sites, their aim is to support the development of a common infrastructure for cross-disciplinary work, optimising combined resources to advance understanding in earth system science.

The Scientific Data Repository (SDI) is another type of infrastructure that has roots in the traditional practice of scientific collections. SDIs have been enhanced by advances in genomics and by the new analytical opportunities that cyber-infrastructure has afforded. SDIs in broad terms provide a resource for the preservation and re-use of primarily biodiversity data, and include the Atlas of Living Australia (ALA), the Global Biodiversity Information Facility (GBIF) and the Barcode of Life Data System (BOLD) (Ratnasingham and Hebert, 2007). Many SDIs have developed tools for data analysis and visualisation that take them beyond the simple classification of a repository (e.g. ALA). Several, like GBIF, act as a discovery tool for other repositories.

Although broadly domain-specific, the large computational infrastructures, ERIs, CZOs and SDIs cover many sources of data within their domain specialties. Because data integration among large-scale RIs is an emergent activity, there is a natural overlap in some capabilities. On the one hand, this may appear redundant, or even confusing, to an end user (Koureas et al., 2016); on the other hand, interoperability among different domain specialties is also an opportunity to examine new types of data that may not otherwise have been discovered. There is a need, however, to integrate data infrastructures to make them more efficient and discoverable (Michener et al., 2012; Koureas et al., 2016; Hardisty et al., 2019). These RIs are not static, as the information they manage is continually updated and technology changes. Given that a great number of the pressing environmental science questions are multi-dimensional, multi-scalar, and require transdisciplinary responses, the heterogeneous nature of RIs and the linkage among them seem appropriate (Musche et al., 2019).

The importance of the ‘client’

These RIs have emerged from and are embedded in their scientific communities of practice: their client base. This is recognised by most RIs but is not always dealt with successfully; active client engagement takes time away from operating the infrastructures, and often requires skill sets the infrastructure staff do not have, or for which they do not have the time or budget. RIs approach ‘client’ engagement in different ways: some rely on passive services (e.g. periodic market analysis), while others take a more participatory approach. ‘Collaboratories’, computer-supported systems that allow scientists to work with each other across facilities and databases independent of geographical location (Finholt and Olson, 1997), emerged in the late 1990s to make network sharing and computing resources, software and computational infrastructure better able to respond to the needs of scientists. Such an approach has enabled scientists to co-develop new cyberinfrastructure tools in real time in response to their competences and needs (Belbin, 2011; Guru et al., 2016; Özmen et al., 2016). Several organisations, such as the Research Computing Centre at the University of Queensland and the Leibniz Supercomputing Centre, have developed partnership programs to engage with their users, through which they continue to co-develop scientific applications and services (Frank et al., 2016).

Although many communities of practice existed before the RIs, all have benefited from them. Advancements in the genomic field are an excellent example of the co-evolution of cyber-infrastructure and science: both started at around the same time and have developed together. It is not surprising, therefore, that the National Center for Biotechnology Information’s GenBank, established in 1982, is closely embedded in its community of practice (Costa et al., 2016).

For RIs to be valued by the client, they must be used and seen to advance science in ways that would not otherwise be available: their ‘value proposition’. All RIs are built on a number of community assumptions, and testing these assumptions and the validity of the science they provide becomes an academic effort. Community activities inform the natural evolution of the RIs, optimise their approaches, and foster a sense of ‘ownership’ in the client; if an RI works to its clients’ benefit and advances their science, they will support it.

Back to the science

The complexity and enormity of current environmental challenges require solutions that transcend disciplinary boundaries (Suresh, 2012; Lynch et al., 2015). RIs such as those described have grown in response to the need of scientists to have access to long-term, quality-driven data across a wide variety of scientific domains and scales. This has, however, created new challenges for scientists, which include: (i) aggregating and rectifying data from multiple sources, (ii) finding the time to analyse and synthesise diverse sources of information to gain new understanding and insight, (iii) understanding how best to make sense of the information thus gathered, (iv) finding methods for communicating across disciplinary, cultural, organisational and geographic boundaries, and (v) learning multiple systems, as each RI provides a different data interface and/or services (Koureas et al., 2016). Even with the vastly improved access to RI data and information, scientists by themselves generally do not have the time, skills or incentives to develop solutions to these challenges. This requires directed effort (Cheruvelil et al., 2014; Specht, 2017).

How does the synthesis center help? And how does this relate to infrastructures?

Synthesis centres (SCs) emerged in the mid-1990s in response to a growing need to support stakeholders (researchers and others) in the synthesis and analysis of existing data to address pressing complex environmental questions and produce transformative, innovative outcomes. At any one time since the 2000s there have been around eight operating synthesis centers across the globe that specifically deal with ecosystem questions (www.synthesis-consortium.org). SCs have several hallmarks: (i) the exclusive use of existing data and open-data policies for their results, (ii) multi- and transdisciplinary collaboration across generational, organisational and geographical boundaries, (iii) stimulation of theoretical, analytical and conceptual understanding leading to new knowledge, i.e. convergent outcomes, (iv) development of new methodologies, and (v) fostering data science literacy (Lynch et al., 2015). This SC approach has an excellent track record of producing innovative and widely cited outcomes. For example, the average number of citations per article produced from the French synthesis center CESAB over nine years was 10, with a maximum of 230 for one article, in comparison with 1.2 for a reference corpus in the ecological sciences (Larivière et al., 2016). Several SCs have stimulated the creation of new tools (e.g. Metacat, the metadata discovery browser which originated at NCEAS), open-access R scripts for analysis (e.g. https://gdauby.github.io/rainbio/index.html), and tools to help solve the difficulties of conflicting terminology (e.g. the CESAB Thesauform). SCs are also acknowledged for creating a culture of interdisciplinary and trans-organisational collaboration (Baron et al., 2017; Hackett et al., 2019). Again, a simple example from CESAB shows that the average number of authors per article produced is seven (2018 analysis), compared to an average of 4.1 for ecological articles in the Web of Science database (Lalonde B.S., Bibliometric fun facts).

While there is a common approach, each SC differs in (i) its thematic emphasis (from biodiversity and earth sciences through to genomics and chemistry), and (ii) its funding source. The funding sources and raison d’être of an SC determine its specific focus, as well as its operational constraints and reporting metrics. Some are funded by research grants (e.g. SESYNC), while others like CESAB are funded by a consortium of research agencies and private foundations. The Canadian synthesis center is funded through a university consortium and a scientific society. Some, like the Australian synthesis center, ACEAS, now closed, and the John Wesley Powell Center, gain(ed) much of their funding from government programs. The greater part of SC funding goes to support group participation, the remainder to a small staff, meeting rooms and computing infrastructure. No stipends are paid to group participants; their time is provided in-kind. Some SCs support postdoctoral or visiting fellows, sometimes affiliated with a specific synthesis group.

The SC facilitates synthesis activities by bringing together small groups of scientists, managers and policy makers (depending on the center) to meet over a set period of time and tackle pressing questions for which there is sufficient existing data and information. SC staff assist in the formation of these teams, providing scaffolding support both remotely and in face-to-face interaction. [‘Scaffolding’ is a term used in educational practice, entirely appropriate to the work of synthesis center staff, who accompany the group on their synthesis journey, providing support as required and ‘fading’ once the group members have acquired competence or solved their problem (Belland, 2014).]

Integrating complex data from multiple sources over multiple time-frames for synthesis activities is a common challenge. SC staff help groups find data, navigate data-use agreements and intellectual property rights, source analytical tools for data and code-sharing, and communicate their results (Pollock et al., 2019). This experience improves participants’ skill sets, their scientific understanding, and their ability to access valuable RI resources. They take this knowledge with them when they return to their ‘day job’.

SC groups provide a small, focused user community for the RI, as these communities actively engage with the RI to mutually tackle frontier science questions, solve operational challenges or some combination thereof. For example, at two synthesis centers (ACEAS and CESAB) over a period of five years, groups of scientists worked with at least ten different RIs to access and deposit data, and in turn provided insights that contributed to the RIs, enabling scientific advances and optimising operations. RI staff have worked directly with synthesis groups or as bona fide group members. As an example of this interaction, an ACEAS synthesis group provided vital input to the installation of phenocams in TERN observatories while producing refereed articles (e.g. Moore et al., 2016), and aligned this capability with other continental-scale RIs, e.g. NEON. In another example, various ACEAS groups advocated the importance of visualisation in the communication of science. This resulted in the development of a visualisation portal (Specht et al., 2015). This visualisation tool was used actively by policy-makers, influencing change in government policy and the development of at least one entirely new RI (the AusPollen network). An SDI staff member, Sujeevan Ratnasingham, was a core member of a CESAB group, facilitating access to BOLD and the submission of the group’s data to it. As a result of his interaction with the group he expanded the data model at BOLD and developed some new analytical functions (Ratnasingham pers. comm., 2019). It is only at SCs, where diverse groups of researchers and data come together with the sole purpose of advancing understanding, that such clear direction (feedback) is obtained, allowing the RI to adapt and optimise its services for these new scientific frontiers. This unique and integrated activity provides a value-added interaction that cannot otherwise be achieved.

As such, NSF NEON and NCEAS, the oldest of the modern SCs, are working to achieve these goals. The USGS John Wesley Powell Center brings together USGS scientists and other ‘outside’ researchers with the vast ScienceBase repository hosted and managed by the USGS to develop new knowledge and inform an actionable path forward for new observations, infrastructure and data needs. As SCs are high-publication hubs, the DOIs of the data they use and produce can be counted towards the effectiveness and relevance of the RIs that provided them.

The networks of experts and the collective knowledge built through the SCs are formidable (Lynch et al., 2015; Baron et al., 2017). SCs are skilled in facilitating science synthesis, and many of today’s major issues benefit from the concerted collaborative approach that they provide. The societal and scientific imperatives for SCs have never been greater. We need to think together about how best to advance the frontiers of science using the RI and the synthesis center to find sustainable solutions in a rapidly changing environment. We hope we have demonstrated that there are synergies between RIs and SCs which benefit and enhance the function of both.

Author info

Alison Specht

Alison holds a PhD from the University of Queensland. She has been a teaching and research academic for 30 years, specialising in ecosystem processes, long-term observations using different approaches (satellite image analysis, on-ground measurements etc.), collation and mapping of biodiversity data from diverse sources, extreme natural events, and innovation adoption for better conservation and land management. She is interested in how to catalyse new understanding and embed knowledge, from the application of teaching techniques, such as authentic learning, through to the functionality and productivity of inter- and trans-disciplinary working groups. She has been the director of two synthesis centers and initiated the establishment of the International Synthesis Consortium. She has been on a working group of DataONE for 10 years and is an active contributor to the development of ecosystem observatories (including TERN and ANaEE). She is a co-lead of the Belmont-funded PARSEC project.

Pedro Luiz Corrêa

Pedro has a PhD in Electrical Engineering from the University of São Paulo. He has been a visiting scholar at the Oak Ridge National Laboratory and the University of Tennessee (USA). His expertise is in scientific data management, including data management of protected areas, and he works with the Brazilian atmospheric and biodiversity scientific communities. He is the coordinator of a research group on Big Data at EPUSP. He is responsible for undergraduate courses in Information Systems, Modelling and Simulation of Computer Systems, Object-Oriented Programming and Projects of Digital Systems. He is an advisor and professor in the graduate programs (master's and PhD) of Information Systems and Electrical Engineering at USP in the research areas of Big Data and Data Science. He is a Fellow of the National Council for Scientific and Technological Development, CNPq. He is the country lead for the Belmont Forum funded project, PARSEC.

Lee Belbin

Lee holds a master's degree in Information Technology from the University of Canberra. He specialises in ecological data analysis and data mining, investigating methods for effective conservation planning, providing leadership and project management. He has worked in ecological research with two CSIRO Divisions, and established the Australian Antarctic Data Centre at the Australian Antarctic Division. In that position, Lee was responsible for establishing data sharing policies for Australian Antarctic science and for the Antarctic Treaty. He designed the research portals for the Atlas of Living Australia (ALA) and for the past seven years has been the ALA’s Science Advisor. Lee also developed the governance and processes of Biodiversity Information Standards (TDWG) and is currently the Convenor of the Data Quality Task Group on Tests and Assertions.

Hank W. Loescher

Hank’s career has been at the nexus of science, engineering and project development. He gained his PhD in Ecosystem Ecology from the University of Florida. He was the second hire for the National Ecological Observatory Network (NEON), a first-of-its-kind continental-scale major research facility for ecology, and has led numerous NEON Science Teams, namely the National Continental Design, Instruments, Mobile Platforms, and Aquatic teams. He is currently directing Battelle’s strategic development efforts for Environmental Research Infrastructures.

Hank’s research interests include determining the biotic and abiotic controls on ecosystem-level carbon and energy balance across spatial and temporal scales (re. eddy covariance). Prior to coming to NEON, he was on the faculty of the College of Forestry, Oregon State University, and managed the DOE AmeriFlux program. Hank has over 70 publications and continues to be an active member of the terrestrial biogeochemistry research community.

Digital Object Identifier (DOI)

https://doi.org/10.5281/zenodo.3660920

Cite as

Specht, A., Corrêa, P., Belbin, L., Loescher, H.W. (2020). Critical research infrastructure: The importance of synthesis centers. Elephant in the Lab. DOI: 10.5281/zenodo.3660920

References

Baron J., Specht A., Garnier E., Bishop P., Campbell A., Davis F., Fady B., Field D., Gross L., Guru S., Hampton S., McClain C., Parker J., Price R., Rawson C., Rodrigo A., Sheble L., Winter M. (2017) Synthesis centres as critical research infrastructure. BioScience 67(8): 750-59. doi: 10.1093/biosci/bix053

Belbin, L. (2011) The Atlas of Living Australia’s Spatial Portal. pp. 39-43 In, Proceedings of the Environmental Information Management Conference 2011 (EIM 2011) (Eds M.B. Jones and C. Gries) Santa Barbara, USA. doi: 10.5060/D2NC5Z4X

Belland B.R. (2014) Scaffolding: definition, current debates and future directions. pp. 505-518, In, Handbook of research on educational communications and technology. (eds J.M. Spector et al.) Springer, New York. doi: 10.1007/978-1-4614-3185-5_39

Chabbi A., Loescher H.W., Dillon M.S. (2017) Integrating environmental science and the economy: innovative partnerships between the private sector and research infrastructures. Frontiers in Environmental Science 5:49. doi: 10.3389/fenvs.2017.00049

Cheruvelil K.S., Soranno P.A., Weathers K.C., Hanson P.C., Goring S.J., Filstrup C.T., Read E.K. (2014) Creating and maintaining high-performing collaborative research teams: the importance of diversity and interpersonal skills. Frontiers in Ecology and the Environment 12: 31-38. doi: 10.1890/130001

Costa M.R., Qin J., Bratt S. (2016) Emergence of collaboration networks around large scale data repositories: a study of the genomics community using GenBank. Scientometrics 108: 21-40. doi: 10.1007/s11192-016-1954-x

Finholt T.A., Olson G.M. (1997) From laboratories to collaboratories: a new organizational form for scientific collaboration. Psychological Science 8(1): 28-36.

Finkel M., Baur A., Weber T.K.D., Osenbrück K., Rügner H., Leven C., Schwientek M., Schlögl J., Hahn U., Streck T., Cirpka O.A., Walter T., Grathwohl P. (2020) Managing collaborative research data for integrated, interdisciplinary environmental research. Earth Science Informatics doi: 10.1007/s12145-020-00441-0

Frank A., Heikkurinen M., Jamitzky F., Satzger H. (2016) In need of partnerships–an essay about the collaboration between computational sciences and IT services. Journal of Computational Science 14: 78-84. doi: 10.1016/j.jocs.2016.01.008

Guru S., Hanigan I.C., Nguyen N.A., Burns E., Stein J., Blanchard W., Lindenmayer D., Clancy T. (2016) Development of a cloud-based platform for reproducible science: a case study of an IUCN Red List of ecosystems assessment. Ecological Informatics 36: 221-230. doi: 10.1016/j.ecoinf.2016.08.003

Hackett E.J., Leahey E., Parker J.N., Rafols I., Hampton S., Corte U., Drake J.M., Penders B., Sheble L., Vermeulen N., Vision T. (2019) Do synthesis centers synthesize? A semantic analysis of diversity and performance. bioRxiv. doi: 10.1101/518605

Larivière V., Kiermer V., MacCallum C.J., McNutt M., Patterson M., Pulverer B., Swaminathan S., Taylor S. (2016) A simple proposal for the publication of journal citation distributions. bioRxiv. doi: 10.1101/062109

Lee C.P., Schmidt K. (2018) A bridge too far? Critical comments on the concept of ‘infrastructure’ in CSCW and IS. pp.117-217 in Socio-informatics: a practice-based perspective in Design and use of IT artifacts (Eds, V. Pipek, D. Randall, M. Rohde, K. Schmidt, G. Stevens) Oxford University Press, Oxford. doi: 10.1093/oso/9780198733249.001.0001

Lynch A.J.J., Thackway R., Specht A., Beggs P.J., Brisbane S., Burns E., Byrne M., Capon S.J., Casanova M., Clarke P.A., Davies J.M., Dovers S., Dwyer R., Ens E., Fisher D.O., Flanigan M., Garnier E., Guru S., Kilminster K., Locke J., Mac Nally R., McMahon K.M., Mitchell P.J., Pierson J., Rodgers E.M., Russell-Smith J., Udy J., Waycott M. (2015) Transdisciplinary synthesis for ecosystem science, policy and management: the Australian experience. Science of the Total Environment 534: 173-184. doi: 10.1016/j.scitotenv.2015.04.100

Koureas, D., Arvanitidis, C., Belbin, L., Berendsohn, W., Damgaard, C., Groom, Q., Güntsch A., Hagedorn, G., Hardisty, A., Hobern, D., Marcer, A., Mietchen, D., Morse, D., Obst, M., Penev, L., Pettersson, L., Sierra, S., Smith, V., Vos, R. (2016) Community engagement: The ‘last mile’ challenge for European research e-infrastructures. Research Ideas and Outcomes 2: e9933. doi: 10.3897/rio.2.e9933.

Michener W.K., Allard S., Budden A., Cook R.B., Douglass K., Frame M., Kelling S., Koskela R., Tenopir C., Vieglais D.A. (2012) Participatory design of DataONE–enabling cyberinfrastructure for the biological and environmental sciences. Ecological Informatics 11: 5-15. doi: 10.1016/j.ecoinf.2011.08.007

Moore C.E., Brown T., Keenan T.F., Duursma R.A., van Dijk A.I.J.M., Beringer J., Culvenor, D., Evans, B., Huete A., Hutley L.B., Maier S., Restrepo-Coupe N., Sonnentag O., Specht A., Taylor J.R., van Gorsel E., Liddell M.J. (2016) Reviews and syntheses: Australian vegetation phenology: new insights from satellite remote sensing and digital repeat photography. Biogeosciences 13: 5085-5102. doi: 10.5194/bg-13-5085-2016.

Musche M., Adamescu M., Angelstam P., Bacher S., Bäck J., Buss H.L., Duffy C., Flaim G., Gailardet J., Giannakis G.V., Haase P., Halada L., Kissling W.D., Lundin L., Matteucci G., Meesenburg H., Monteith D., Nikolaidis N.P., Pipan T., Pyšek P., Rowe E.D., Roy D.B., Sier A., Tappeiner U., Vilà M., White T., Zobel M., Klotz S. (2019) Research questions to facilitate the future development of European long-term ecosystem research infrastructures: a horizon scanning exercise. Journal of Environmental Management 250: 109479. doi: 10.1016/j.jenvman.2019.109479

Özmen Ö., Yilmaz L., Smith J. (2016) The impact of socio-technical communication styles on the diversity and innovation potential of global science collaboratories. Computational and Mathematical Organization Theory 22: 521-548. doi: 10.1007/s10588-016-9213-5

Peters D.P.C., Loescher H.W., SanClements M., Havstad K.M. (2014) Taking the pulse of a continent: role of observatories and long-term research networks to fill critical knowledge gaps. Ecosphere 5(3): 1-23. doi: 10.1890/ES13-00295.1

Pollock D., Allard S., Yan A., Parker M. (2019) The role of data & synthesis centers in convergence research. Proceedings of the Association for Information Science and Technology 56(1): 745-746. doi: 10.1002/pra2.158.

Ratnasingham S., Hebert P.D. (2007) BOLD: the barcode of life data system. Molecular Ecology Notes 7(3): 355-364. doi: 10.1111/j.1471-8286.2006.01678.x

Specht A. (2017) Synthesis centres: their relevance to and importance in the anthropocene. Pp. 469-488. In, Terrestrial Ecosystem Research Infrastructures: challenges and opportunities (Eds. A. Chabbi and H.W. Loescher), CRC Press, Boca Raton, FL, USA. doi: 10.1201/9781315368252

Specht A., Guru S.M., Houghton L., Keniger L., Driver P., Ritchie E., Lai K., Treloar A. (2015) Data management challenges in analysis and synthesis in the ecosystem sciences. Science of the Total Environment 534: 144-158. doi: 10.1016/j.scitotenv.2015.03.092

Suresh, S. (2012) Global challenges need global solutions. Nature, 490 (7420): 337-338. doi: 10.1038/490337a

White, T., Brantley, S., Banwart, S., Chorover, J., Dietrich, W., Derry, L., Lohse K., Anderson S., Aufdendkampe A., Bales R., Kumar P., Richter D., McDowell, B. (2015) The Role of Critical Zone Observatories in Critical Zone Science. In, Developments in Earth Surface Processes Vol. 19, pp. 15-78. Elsevier. doi: 10.1016/B978-0-444-63369-9.00002-1
