Bold ideas and critical thoughts on science.

In this piece, Specht, Corrêa, Belbin and Loescher elaborate on their thoughts about the role of synthesis centers in meeting today's research challenges.

There is more than one sort of scientific research infrastructure: those that provide technology and computational capacity, and those that are supplied by the communities that work with them (Baron et al., 2017). We suggest that computing research infrastructures provide an information environment that supports effective research but that, by themselves, they are insufficient to provide the insights and understanding we desire. Computing ‘information infrastructures’ began in the 1960s and flourished in the 1990s as technology advanced, growing alongside the research they supported (Lee and Schmidt, 2018). Increasingly sophisticated cyberinfrastructures, such as the SuperMUC-NG of the Leibniz Supercomputing Centre, the European Open Science Cloud, the National Computational Infrastructure in Australia and the Global Earth Observation System of Systems, have opened significant new opportunities for research. Research Infrastructures (RIs) such as these provide sophisticated, long-term instrumentation combined with the computing capacity to answer complex questions beyond the scope of the individual researcher or single laboratory. Such cyberinfrastructures are complex systems, often distributed ‘systems of systems’ that are linked locally and globally. They are usually funded by consortia or governments and deliver open-access data.

The computational capacity and technology developed since the 1990s have inspired a variety of more domain-specific RIs, such as those relevant to environmental science, the subject of this article. These include RIs that integrate computational capacity for big-data computing and simulation modelling with data provision and large storage capacity. The wealth of data now available is extraordinary but can present a logistical challenge for users as they gather data, compile it, merge data from different sources, extract the data they need and analyse it. The relationship between the RI and its user community is critical if the full benefit of the available data and technology is to be gained (Frank et al., 2016; Finkel et al., 2020). We introduce some of the types of environmental RIs, discuss some of the unique challenges they face in providing data to the broad environmental science community, and consider the role the synthesis center can play in alleviating those challenges.

Environmental Research Infrastructures (ERIs) incorporate data from experiments and observations that are updated automatically and continuously and may be supplemented by individual measurements. Chabbi et al. (2017) and Peters et al. (2014) summarised ERIs such as EISCAT, a northern European scientific association that conducts ionospheric and atmospheric measurements with radars; the National Ecological Observatory Network (NEON), a continental-scale ecological observatory in the USA; the Terrestrial Ecosystem Research Network (TERN) in Australia; and long-term ecological research networks such as LTER in the USA and ILTER, inter alia. ERIs are commonly region-specific, such as the newly established Atmospheric Data Centre in Brazil (Corrêa pers. comm.) and the Biodiversity Exploratories (BE), established in Germany in 2006. DataONE (the Data Observation Network for Earth) is a cyberinfrastructure platform developed to support rapid data discovery across a variety of these platforms; it originated in the USA but has global reach (Michener et al., 2012).

The Critical Zone Observatories (CZOs) are a project funded by the US National Science Foundation that supports watershed- or catchment-based observatories focussed on processes at the soil-biota-atmosphere interface (White et al., 2015). Through a suite of linked sites, the project aims to support the development of a common infrastructure for cross-disciplinary work, optimising combined resources to advance understanding in earth system science.

The Scientific Data Repository (SDI) is another type of infrastructure, one with roots in the traditional practice of scientific collections. SDIs have been enhanced by advances in genomics and by the new analytical opportunities that cyberinfrastructure has afforded. In broad terms, SDIs provide a resource for the preservation and re-use of primarily biodiversity data; they include the Atlas of Living Australia (ALA), the Global Biodiversity Information Facility (GBIF) and the Barcode of Life Data System (BOLD) (Ratnasingham and Hebert, 2007). Many SDIs have developed tools for data analysis and visualisation that take them beyond the simple classification of a repository (e.g. ALA). Several, like GBIF, also act as discovery tools for other repositories.

Although broadly domain-specific, the large computational infrastructures, ERIs, CZOs and SDIs cover many sources of data within their domain specialties. Because data integration among large-scale RIs is an emergent activity, there is a natural overlap in some capabilities. On the one hand, this may appear redundant or even confusing to an end user (Koureas et al., 2016); on the other hand, interoperability among different domain specialities also offers an opportunity to examine new types of data that might not otherwise have been discovered. There is nevertheless a need to integrate data infrastructures to make them more efficient and discoverable (Michener et al., 2012; Koureas et al., 2016; Hardisty et al., 2019). These RIs are not static: the information they manage is continually updated, and technology changes. Given that many of the pressing environmental science questions are multi-dimensional and multi-scalar and require transdisciplinary responses, the heterogeneous nature of RIs and the linkages among them seem appropriate (Musche et al., 2019).

The importance of the ‘client’

These RIs have emerged from, and are embedded in, their scientific communities of practice: their client base. Most RIs recognise this, but it is not always dealt with successfully; active client engagement takes time away from operating the infrastructures and often requires skill sets that infrastructure staff do not have, or for which they lack the time or budget. RIs approach ‘client’ engagement in different ways: some rely on passive services (e.g. periodic market analysis), while others take a more participatory approach. ‘Collaboratories’, computer-supported systems that allow scientists to work with each other across facilities and databases independent of geographical location (Finholt and Olson, 1997), emerged in the late 1990s to make shared networks, computing resources, software and computational infrastructure more responsive to the needs of scientists. Such an approach has enabled scientists to co-develop new cyberinfrastructure tools in real time in response to their competences and needs (Belbin, 2011; Guru et al., 2016; Özmen et al., 2016). Several organisations, such as the Research Computing Centre at the University of Queensland and the Leibniz Supercomputing Centre, have developed partnership programs through which they engage with their users and continue to co-develop scientific applications and services (Frank et al., 2016).

Although many communities of practice existed before the RIs, all have benefited from them. Advances in genomics are an excellent example of the co-evolution of cyberinfrastructure and science: both started at around the same time and have developed together. It is not surprising, therefore, that the National Center for Biotechnology Information’s GenBank, established in 1982, is closely embedded in its community of practice (Costa et al., 2016).

For RIs to be valued by their clients, they must be used and be seen to advance science in ways that would not otherwise be possible: their ‘value proposition’. All RIs are built on a number of community assumptions, and testing these assumptions and the validity of the science they provide becomes an academic effort. Community activities inform the natural evolution of the RIs, optimise their approaches and build a sense of ‘ownership’ among clients; if this is done to the clients’ benefit and advances their science, they will support the RI.

Back to the science

The complexity and enormity of current environmental challenges require solutions that transcend disciplinary boundaries (Suresh, 2012; Lynch et al., 2015). RIs such as those described have grown in response to scientists’ need for access to long-term, quality-driven data across a wide variety of scientific domains and scales. This has, however, created new challenges for scientists, including: (i) aggregating and rectifying data from multiple sources, (ii) finding the time to analyse and synthesise diverse sources of information to gain new understanding and insight, (iii) understanding how best to make sense of the information thus gathered, (iv) finding methods for communicating across disciplinary, cultural, organisational and geographic boundaries, and (v) learning multiple systems, because each RI provides a different data interface and/or services (Koureas et al., 2016). Even with vastly improved access to RI data and information, scientists by themselves generally do not have the time, skills or incentives to develop solutions to these challenges. This requires directed effort (Cheruvelil et al., 2014; Specht, 2017).

How does the synthesis center help? And how does this relate to infrastructures?

Synthesis centers (SCs) emerged in the mid-1990s in response to a growing need to support stakeholders (researchers and others) in the synthesis and analysis of existing data, in order to address pressing, complex environmental questions and produce transformative, innovative outcomes. At any one time since the 2000s, there have been around eight synthesis centers operating across the globe that deal specifically with ecosystem questions (www.synthesis-consortium.org). SCs have several hallmarks: (i) the exclusive use of existing data, and open-data policies for their results, (ii) multi- and transdisciplinary collaboration across generational, organisational and geographical boundaries, (iii) stimulation of theoretical, analytical and conceptual understanding leading to new knowledge, i.e. convergent outcomes, (iv) development of new methodologies, and (v) fostering of data science literacy (Lynch et al., 2015). This SC approach has an excellent track record of producing innovative and widely cited outcomes. For example, articles produced by the French synthesis center CESAB over nine years received an average of 10 citations each, with a maximum of 230 for a single article, compared with 1.2 for a reference corpus in the ecological sciences (Larivière et al., 2016). Several SCs have stimulated the creation of new tools (e.g. Metacat, the metadata discovery browser that originated at NCEAS), open-access R scripts for analysis (e.g. https://gdauby.github.io/rainbio/index.html) and tools to help resolve conflicting terminology (e.g. the CESAB Thesauform). SCs are also acknowledged for creating a culture of interdisciplinary and trans-organisational collaboration (Baron et al., 2017; Hackett et al., 2019). Again, a simple example from CESAB: the average number of authors per article it produced was seven (2018 analysis), compared with an average of 4.1 for ecological articles in the Web of Science database (Lalonde B.S., Bibliometric fun facts).

While there is a common approach, each SC differs in (i) its thematic emphasis (from biodiversity and earth sciences through to genomics and chemistry) and (ii) its funding source. Funding sources and each SC’s raison d’être determine its specific focus, as well as its operational constraints and reporting metrics. Some are funded by research grants (e.g. SESYNC), while others, like CESAB, are funded by a consortium of research agencies and private foundations. The Canadian synthesis center is funded through a university consortium and a scientific society. Some, like the now-closed Australian synthesis center ACEAS and the John Wesley Powell Center, gain(ed) much of their funding from government programs. The greater part of SC funding goes to support group participation, and the remainder to a small staff, meeting rooms and computing infrastructure. No stipends are paid to group participants; their time is provided in kind. Some SCs support postdoctoral or visiting fellows, sometimes affiliated with a specific synthesis group.

The SC facilitates synthesis activities by bringing together small groups of scientists, managers and policy makers (depending on the center) to meet over a set period of time and tackle pressing questions for which sufficient data and information already exist. SC staff assist in forming these teams and provide scaffolding support, both remotely and face to face. [‘Scaffolding’ is a term used in educational practice and is entirely appropriate to the work of synthesis center staff, who accompany the group on its synthesis journey, providing support as required and ‘fading’ once the group members have acquired competence or solved their problem (Belland, 2014).]

Integrating complex data from multiple sources over multiple time frames is a common challenge in synthesis activities. SC staff help groups find data, navigate data-use agreements and intellectual property rights, source analytical tools for data and code sharing, and communicate results (Pollock et al., 2019). This experience improves participants’ skill sets, their scientific understanding and their knowledge of how to access valuable RI resources. They take this knowledge with them when they return to their ‘day job’.

SC groups provide a small, focused user community for an RI, engaging actively with it to tackle frontier science questions jointly, solve operational challenges, or both. For example, at two synthesis centers (ACEAS and CESAB), over a period of five years, groups of scientists worked with at least ten different RIs to access and deposit data, and in turn provided insights that helped those RIs enable scientific advances and optimise their operations. RI staff have worked directly with synthesis groups or as bona fide group members. As an example of this interaction, an ACEAS synthesis group provided vital input to the installation of phenocams in TERN observatories and aligned this capability with other continental-scale RIs such as NEON, while also producing refereed articles (e.g. Moore et al., 2016). In another example, various ACEAS groups stressed the importance of visualisation in communicating the science, which resulted in the development of a visualisation portal (Specht et al., 2015). This visualisation tool was actively used by policy-makers, influencing changes in government policy, and contributed to the development of at least one entirely new RI (the AusPollen network). An SDI staff member, Sujeevan Ratnasingham, was a core member of a CESAB group, facilitating access to BOLD and submission of the group’s data to it. As a result of his interaction with the group, he expanded the data model at BOLD and developed some new analytical functions (Ratnasingham pers. comm., 2019). Only at these SCs, where diverse groups of researchers and data come together with the sole purpose of advancing understanding, is such clear direction (feedback) obtained to adapt and optimise the RI to facilitate these new scientific frontiers. This unique and integrated activity provides a value-added interaction that cannot otherwise be achieved.

In this spirit, NSF’s NEON and NCEAS, the oldest of the modern SCs, are working toward these goals. The USGS John Wesley Powell Center brings together USGS scientists and other ‘outside’ researchers with the vast ScienceBase repository, hosted and managed by the USGS, to develop new knowledge and inform an actionable path forward for new observations, infrastructure and data needs. Because SCs are high-output publication hubs, the DOIs of the data they use and produce can be counted as evidence of the effectiveness and relevance of the RIs that provided them.
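One way to act on the DOI-counting idea is to tally, for each RI, the distinct datasets (by DOI) and the distinct synthesis-group articles that draw on it. The Python sketch below is a minimal illustration under stated assumptions: the record layout, the placeholder identifiers and the helper `tally_by_ri` are all hypothetical, and in practice the records would come from publication and repository metadata rather than a hand-written list.

```python
from collections import defaultdict

# Hypothetical records: (synthesis-group article, dataset DOI, RI that provided it).
# The identifiers are placeholders for illustration, not real DOIs.
usage_records = [
    ("article-A", "doi:placeholder/tern-001", "TERN"),
    ("article-A", "doi:placeholder/bold-017", "BOLD"),
    ("article-B", "doi:placeholder/tern-001", "TERN"),
    ("article-C", "doi:placeholder/neon-042", "NEON"),
]


def tally_by_ri(records):
    """Count distinct dataset DOIs and distinct citing articles per RI."""
    datasets = defaultdict(set)
    articles = defaultdict(set)
    for article, doi, ri in records:
        datasets[ri].add(doi)      # a reused dataset is counted once
        articles[ri].add(article)  # reuse shows up as more articles
    return {
        ri: {"datasets": len(datasets[ri]), "articles": len(articles[ri])}
        for ri in datasets
    }


summary = tally_by_ri(usage_records)
for ri, counts in sorted(summary.items()):
    print(ri, counts)
```

Counting distinct DOIs rather than raw mentions avoids inflating the figure when several articles reuse the same dataset; the separate article count captures that reuse instead.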

The networks of experts and the collective knowledge built through the SCs are formidable (Lynch et al., 2015; Baron et al., 2017). SCs are skilled in facilitating science synthesis, and many of today’s major issues benefit from the concerted collaborative approach they provide. The societal and scientific imperatives for SCs have never been greater. We need to think together about how best to use RIs and synthesis centers to advance the frontiers of science and find sustainable solutions in a rapidly changing environment. We hope we have demonstrated that there are synergies between RIs and SCs that benefit and enhance the function of both.