Bulletin of the World Health Organization

Systematic archiving and access to health research data: rationale, current status and way forward

Manju Rani a & Brian S Buckley b

a. Western Pacific Regional Office, World Health Organization, Corner Taft and UN Avenue, Manila-1000, Philippines.
b. Philippine General Hospital, University of the Philippines, Manila, Philippines.

Correspondence to Manju Rani (e-mail: ranim@wpro.who.int).

(Submitted: 04 April 2012 – Revised version received: 21 September 2012 – Accepted: 24 September 2012 – Published online: 10 October 2012.)

Bulletin of the World Health Organization 2012;90:932-939. doi: 10.2471/BLT.12.105908

Introduction

Despite repeated global calls for increased investment in health research,1–3 securing investment can be challenging, especially in developing countries where research may compete with health service delivery for funding and personnel. Advocacy for increased investment can also be undermined by stakeholders’ doubts about the efficiency and effectiveness of research, by failure to realize the potential of previous investment due to the poor utilization of research outputs and by a low level of public trust in research.4–6 In this context, some way of increasing the accountability, efficiency and effectiveness of research is needed. In addition to universal clinical trial registration and open access to publications, two closely linked strategies have considerable potential: the systematic archiving of unaggregated data generated by research studies and wider access to databases. Both would facilitate the secondary use of data within and, preferably, between countries.

In recent decades, there have been several high-level initiatives advocating the routine archiving and sharing of health research data.7–11 The rationale for this is both scientific and economic. Sharing data reinforces the collaborative and cumulative processes involved in creating scientific knowledge.7 It can also promote new research and enable the testing of new or alternative hypotheses. For example, combination and meta-analysis of databases can allow researchers to examine trends through time and between regions.7–10,12,13 In addition, archiving and sharing data can increase the transparency and accountability of research and bolster its reliability and authority by enabling other investigators to repeat or extend analyses. Since data collection is often a significant and expensive aspect of research, ensuring that databases can be used repeatedly increases the financial return on research investment by reducing the possibility of data duplication.

Despite these benefits, systematic data archiving and sharing are not yet the norm, especially in low- and middle-income countries. Moreover, many health research databases are not efficiently cleaned, managed or used, even by primary researchers, and often data are stored informally by institutions or individual researchers, which makes secondary use impossible.14 Systematic and secure data archiving can ensure that these valuable resources are available for answering future public health questions.

This paper discusses important developments in data-sharing policy and highlights factors in health research that may affect policy implementation, with particular reference to countries in the World Health Organization (WHO) Western Pacific Region. In addition, practical strategies for fostering data sharing are considered.

Global context

In 1997, a collaboration of scientific bodies concluded that: “The value of data lies in their use. Full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research.”15 Subsequently, in 2004, the 30 countries of the Organization for Economic Cooperation and Development, along with China, Israel, the Russian Federation and South Africa, adopted the Declaration on Access to Research Data from Public Funding and, in 2007, issued principles and guidelines.7 In 2007, the European Union called upon member states to develop data-sharing policies.16 The World Bank and the Health 8 group of international health agencies made commitments towards data sharing in early 2010.

A global consultation in 2008 led to a joint statement of purpose on sharing research data by 17 major research funding organizations in 2011.10 The statement asserted that: (i) data sharing should be equitable, ethical and efficient; (ii) the interests of researchers who create the data sets, researchers who want to reuse the data and the communities and funders who expect health benefits to ensue from the research should be recognized; (iii) the privacy of individuals must be protected; and (iv) data sharing should increase the quality and value of research.10 Finally, both the WHO global strategy and plan of action for public health, innovation and intellectual property and the WHO research for health strategy call for greater access to data through improved sharing.17,18

Although important and influential, these high-level declarations were limited to general principles and provided little operational guidance on how or when data should be shared. They acknowledged the need for flexibility in individual countries’ approaches to data sharing and recognized that the cost must be balanced against the potential benefits.

Data-sharing policy

Dialogue at the global level has yet to trickle down to the national level and this is particularly true for developing countries. In the WHO Western Pacific Region, there is no ongoing discussion of policy and no specific policy on data sharing in most countries, apart from a few developed countries.19

However, in developed countries an increasing number of international research funding institutions are adopting data-sharing policies. The policies do not differ a great deal, although there is some heterogeneity. For example, the National Science Foundation in the United States of America, the Austrian Science Fund, the British Medical Research Council and Cancer Research UK require all applicants for funding to provide detailed strategies for data archiving and sharing, whereas the United States National Institutes of Health has this requirement only for research funding in excess of 500 000 United States dollars and the Wellcome Trust in the United Kingdom requires it for research that results in databases with significant value for the wider research community.

In the Western Pacific Region, the National Health and Medical Research Council in Australia and the Health Research Council of New Zealand are signatories to the joint statement of purpose on data sharing.10 The Australian body requires the open publication of and the sharing of data from any research that it funds unless a reason for limiting access can be demonstrated. In New Zealand, a data-sharing policy is being developed for introduction in 2012 through a consultation with stakeholders.19

No existing policy stipulates in detail how data should be archived or how access should be managed. This reflects a lack of consensus on best practice that may in part be due to the heterogeneity of health research, the need for flexibility and the lack of a single method that suits all forms of research and data.9,19 The policy of the National Institutes of Health leaves researchers free to decide where data are archived and how databases are shared. Researchers can use a central repository or establish databases and manage access themselves. The policy of the Austrian Science Fund states that data should be housed in subject-specific or institutional repositories. The Wellcome Trust suggests which repositories could be used but makes no stipulations as long as the archive used is accessible and can be linked to others. In Australia, research institutions must provide adequate facilities and infrastructure for secure data archiving and have a policy on database management.

Approaches also vary on when data should be shared. Most policies explicitly acknowledge that primary researchers should have exclusive access for a certain period. The National Institutes of Health require data to be made available no later than the date of acceptance of the final research report; the Australian National Health and Medical Research Council require them to be available within 12 months of the publication of a peer-reviewed research paper; and the Austrian Science Fund, within 2 years of the end of the project. Other policies, such as those of the British Medical Research Council and the Wellcome Trust, simply refer to “timely” sharing of data. In both Malaysia and Thailand, primary researchers have exclusive use of data for 2 years before sharing is required.20
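The timing rules described above can be compared side by side with a small calculation. The following Python sketch is purely illustrative: the function names, the policy labels, the milestone names and the example dates are all assumptions made for this example. Only the three rules themselves (availability by acceptance of the final report, within 12 months of publication, and within 2 years of the end of the project) are taken from the policies described in the text.

```python
import calendar
from datetime import date

# Each rule: (milestone the clock starts from, months allowed after it).
# The labels and milestone names are hypothetical; the intervals come from the text.
POLICY_RULES = {
    "National Institutes of Health": ("final_report_accepted", 0),
    "National Health and Medical Research Council": ("paper_published", 12),
    "Austrian Science Fund": ("project_end", 24),
}


def add_months(start, months):
    """Shift a date forward by whole months, clamping to the last day of the month."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(start.day, last_day))


def sharing_deadline(policy, milestones):
    """Return the latest date by which data should be available under a policy."""
    milestone, months = POLICY_RULES[policy]
    return add_months(milestones[milestone], months)


# Hypothetical project milestones, for illustration only.
example_milestones = {
    "final_report_accepted": date(2012, 9, 24),
    "paper_published": date(2012, 10, 10),
    "project_end": date(2012, 2, 29),
}

for policy_name in POLICY_RULES:
    print(policy_name, sharing_deadline(policy_name, example_milestones))
```

Policies that refer only to "timely" sharing cannot be expressed as a rule of this kind, which illustrates why such wording is harder to monitor.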

All existing policies maintain that secondary researchers should acknowledge both the primary researchers and the data source. Some organizations, such as the National Institutes of Health and the Wellcome Trust, recommend that, where appropriate, data should be shared through the collaboration of, and by mutual agreement between, primary and secondary researchers to balance the need to maximize access with the need for safeguards.11,21

Existing policies acknowledge that it may not be possible to share some data for ethical, confidentiality or privacy reasons. International agreements on ethical principles in health research involving human subjects state that an individual’s privacy and data confidentiality must be protected and that, with some exceptions, informed consent on the use of personal health information must be obtained from study participants.22,23 It could be argued that these principles are upheld if data are made anonymous before archiving for possible sharing and if the risk to study participants is minimal. However, it has been pointed out that, to date, policies have been proposed without sufficient discussion of how ethical standards can be maintained during data archiving and sharing or how risks to participants can be prevented.24 Nor do policies provide guidance on these matters.
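As an illustration of what making data anonymous before archiving might involve in practice, the Python sketch below removes direct identifiers, replaces the participant identifier with a keyed pseudonym and coarsens age into bands. It is a minimal sketch under stated assumptions, not a recommended standard: the field names, the list of direct identifiers, the 5-year band width and the example record are all hypothetical, and steps of this kind do not by themselves guarantee that participants cannot be re-identified.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonym(participant_id, secret_key):
    """Derive a stable pseudonym from an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age, width=5):
    """Coarsen an exact age into a band such as '35-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record, secret_key):
    """Return a copy of the record prepared for archiving."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym(str(record["participant_id"]), secret_key)
    cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


# Invented record, for illustration only.
example_record = {
    "participant_id": "P-0001",
    "name": "Example Person",
    "address": "1 Example Street",
    "phone": "000-0000",
    "age": 37,
    "haemoglobin_g_dl": 12.4,
}
archived = deidentify(example_record, secret_key=b"kept-by-the-data-custodian")
print(archived)
```

The key is assumed to stay with the data custodian, so that records from the same participant can still be linked within the archive without exposing the original identifier.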

Policy effectiveness

There has been little evaluation of whether data-sharing policies are effective for ensuring that researchers comply with recommendations or for increasing the amount of research carried out.25 Compliance with policy in genomics research, which is considered the frontrunner in data sharing, seems to be fairly good, though it is far from universal. One study of papers published in six key journals that required data sharing as a precondition for publication found that at least 85% of authors reported depositing data in the global deoxyribonucleic acid (DNA) data repository.26 Overall, however, a great deal remains unshared, especially data from studies of cancer and human subjects.27 Despite claims that microarray data are now routinely stored in accessible archives, less than 50% of data sets are deposited, often because of technical difficulties.28,29

Though little “direct” analysis of the public health impact of formal data-sharing policies has been carried out, databases have been shown to have a huge impact on research when they are made accessible. Databases from Demographic and Health Surveys conducted in more than 70 developing countries since the 1980s are accessible globally, which demonstrates that cultural, ethical and technical barriers to data sharing can be overcome. In 2010 alone, there were nearly 4000 requests for data from these databases.30 The number of peer-reviewed publications based on data from these surveys has increased substantially and has influenced health policy in many countries.31 Similarly, by June 2011, some 650 peer-reviewed papers based on the United States National Cancer Institute’s Surveillance, Epidemiology and End Results databases had been published; the influence of these databases on researchers’ understanding of treatment and survival in cancer is undoubted.32 The long-running Caerphilly Prospective Study of cardiovascular disease in the United Kingdom has resulted in the publication of some 150 peer-reviewed papers.33 In addition, data from the nationally representative United Kingdom National Cancer Data Repository has helped in interpreting the results of less representative experimental research.34 A case study that compared the outputs of two large-scale surveys in the Philippines demonstrated the value of data sharing but also indicated that data analysis capacity needs to be built up if the full benefits are to be realized (Box 1).

Box 1. Data sharing and its effect on research output and efficiency: a case study from the Philippines

Several health programmes and research institutions in the Philippines undertake research based on large-scale surveys that produce data which may be valuable over the long term and help answer a wide range of public health questions. In the absence of an explicit policy or measure that requires individuals and institutions to archive and share data, these data are usually retained by the institutions or individuals that produced them. Consequently, the status and quality of the archives available for future use remain uncertain.

We assessed the effect of data archiving and sharing in the Philippines by comparing the utilization and influence of data collected in two large-scale, nationally representative surveys that are carried out every 5 years: externally funded Demographic and Health Surveys (DHS), which are implemented by the National Statistics Office in each country; and domestically funded National Nutrition Surveys, which are conducted by the Food and Nutrition Research Institute. The most recent surveys – the 8th DHS and 7th National Nutrition Survey – took place in 2008. Both surveys produced data on a range of health indicators.

Whereas DHS data are systematically archived and transparent access is provided via an internationally accessible repository maintained by the external funder, the Food and Nutrition Research Institute provides only aggregated results for National Nutrition Survey data and there is no official information on how to access microdata.

A PubMed search was conducted using the terms “Demographic (and) Health Survey and Philippines” and “National Nutrition Survey (and) Philippines” to identify scientific publications that used data from the surveys. In total, 58 records mentioning a DHS were retrieved; 21 of them directly reported DHS data (10 involved comparisons with other countries and 3 reported trends over time). In contrast, only 14 records were retrieved for National Nutrition Surveys, including one comparative study and one trend analysis.

According to statistics provided by the DHS data repository, between January 2007 and July 2012, 799 distinct users downloaded DHS data from the Philippines on 914 occasions. There were 183 downloads by users within the Philippines and 586 users were from universities. However, an analysis of the authorship of publications that used DHS data showed that external researchers predominated, which indicates that efforts may be needed to build data analysis capacity in developing countries. The wide access to DHS microdata since 1993 was not associated with any resistance from in-country researchers or from the implementing agency (the National Statistics Office). Nor were there any reports that sharing data was associated with misuse or misrepresentation of the data or cultural or ethical issues.

In coming years, the Philippines is likely to witness an increase in scientific output and a better return on investment in research with the development of systematic archiving and wider access to data. Several new initiatives have been launched in the last 3 years with the assistance of the Accelerated Data Program. The National Statistics Office set up a repository for archiving microdata in October 2009, although currently it includes only data from research or surveys conducted by the National Statistics Office. In addition, the National Statistical Coordination Board brought together several agencies to form the Philippine Statistical System, which aims to archive and document microdata using international standards, and recently the National Statistics Office was given the responsibility for maintaining a central repository for archiving microdata.

Raising awareness

Currently, the attitude to data sharing in developing countries in the WHO Western Pacific Region is characterized by a widespread lack of awareness or appreciation of its benefits rather than active resistance.19 In the region, the predominance of external funding for health research and a lack of clarity on the ownership of research outputs have contributed to the indifference observed, especially in low-income countries.

Proactive advocacy is required to ensure that the concept of data sharing becomes a mainstream consideration in national discussions of research management and governance. One way of increasing awareness may be to carry out a systematic assessment of the current situation to demonstrate its inefficiencies and to highlight the loss of valuable scientific data.

Articulating and enforcing policies

Clear and enforceable data archiving and sharing policies are required. Since ensuring the efficiency and effectiveness of health research is a governance issue, it may be appropriate that the lead on data sharing be taken by national health research governance bodies, where they exist, or by elements within ministries of health, such as national health information units, in consultation with all stakeholders in health research. In addition, funders, research institutes and other stakeholders should have their own policies on data sharing, which may provide the opportunity to pilot different approaches while countries prepare national policies.

A policy should state clearly when, where, how and which data should be archived and made available. Heterogeneity in policies, a lack of clarity on ethical considerations and uncertainty about archiving and sharing methods may both frustrate researchers who want to share data and provide loopholes for those who are unwilling to share.

Clear mechanisms for enforcing and monitoring compliance with data-sharing policies should be developed. In the United States, it has been reported that non-compliance with the National Institutes of Health data-sharing policy in cancer research may have been due to a lack of clarity about data-sharing requirements and the absence of enforcement.35 The inability of funding agencies to enforce data-sharing policies has also affected compliance in genetics.29 Partnerships with scientific publishers may be useful for enforcing compliance as researchers report that the data-sharing policies of scientific journals influence their actions more than those of funders because publication is such an important currency in the world of academic research.35

Overcoming researchers’ reluctance

Many researchers have a proprietorial attitude towards data and are concerned that the benefits of data sharing might be outweighed by perceived disadvantages: the loss of academic advantage and independence; the possibility that their work may be misused, misinterpreted or misrepresented; the loss of intellectual property; and an increased workload for administration and data management.14,25,35–38 A survey of the first authors of research articles published in the Annals of Internal Medicine in 2008 demonstrated a hesitancy about data sharing: only 4% said they would share data unconditionally, whereas 57% would do so only under author-defined conditions and most would not share data without personal contact with secondary users.36 Hence, a period of exclusive data use for primary researchers – an approach advocated by data-sharing policies internationally – may be required to protect their interests and ensure they receive the appropriate benefit and recognition.39

Another issue affects the attitude of researchers and policy-makers in developing countries to sharing data internationally. Researchers in developing countries may have invested considerable effort in data collection and database generation, but often better-resourced researchers in developed countries analyse and publish data without sufficiently collaborating with or acknowledging the primary researchers. This inequity has been acknowledged by high-level global advocates of data sharing. Strategies are required to prevent this potential inequity and to encourage sharing of both skills and data between countries and regions.10,13,20 However, researchers may be encouraged by the beneficial research collaborations that can result from sharing data.14,35,40 In Thailand, national data sets are made available to international researchers only on the condition that they form skill-sharing and collaborative partnerships with local scientists.20

A realignment of the way in which research achievement is evaluated may also be beneficial. Currently, recognition of individual researchers and institutions and researchers’ advancement depend largely on peer-reviewed publications. This fosters competition and a degree of secrecy among researchers at the expense of collaboration. In this context, sharing data seems counterproductive and, consequently, the creation, curation and utility of databases are given relatively little attention. The joint statement of purpose by research funding organizations acknowledged that the generation of valuable databases deserved better recognition as a research activity.10

Increasing skills and resources

Although generating and maintaining well-organized and well-documented databases is part of good research practice,41 researchers in developing countries may have neither the skills nor the resources required.14,40,42 Data management training for researchers and the recruitment of dedicated support staff to document data and manage repositories may be needed.10,43 In addition, data archiving and sharing may be constrained by the lack of accepted protocols for data formats, security and transfer.19 The introduction of modest minimum standards and the preparation of supporting materials for research databases, which may use different formats, would make data reformatting and interpretation easier for secondary users.

The International Household Survey Network and the Accelerated Data Program, both of which started in 2006, are important initiatives in this area. They are involved in developing standards for data documentation and in building national capacity in microdata preservation, analysis, anonymization and dissemination. In addition, the Accelerated Data Program is helping countries establish national data repositories using international data standards.

Although data archiving and sharing require financial and human resources, this is counterbalanced by the resulting rise in opportunities for collaboration and increased scientific output. The joint statement of purpose acknowledged that funders should underwrite the cost of data sharing.10

Accessing databases

Both developed and developing countries in the WHO Western Pacific Region report limited awareness of the existence of many databases currently available for secondary use and difficulty in locating them, which may decrease the return on research investment in these countries. Existing data archiving and sharing models recognize that some method for locating databases is needed.

Several models for archiving and sharing research databases exist (Box 2). The portal model may be the most effective for encouraging a culture of data sharing because it allows primary researchers to retain involvement with their databases while facilitating database searching, data sharing and collaboration between primary and secondary researchers. It may also minimize the resources required in developing countries. Other models, such as centralized archiving with disseminated expert support and the subject-focused repository model, necessitate greater investment in infrastructure and require coordination. This makes them more difficult to implement in settings lacking resources, capacity and cohesive health research governance.

Box 2. Data archiving and sharing models

The portal model

A portal provides links to databases stored elsewhere, often in the institutions that created them. The portal relies on searchable metadata about the remote databases and provides users with a one-stop shop for identifying and locating databases. The portal service does not, however, manage access to databases or data transfer: these functions remain in the hands of the primary researchers and their institutions. This model is less costly than others and has the advantage that it can provide links to data archives anywhere in the world. In addition, researchers retain more control over data sharing and reuse. However, since there is no central data repository, data preservation standards cannot be assured and consistent access to data cannot be guaranteed.

The centralized model

In this model, data archiving and managing access to data are handled by a single repository to which researchers are obliged to submit their databases along with supporting documentation. Rights to the data are either ceded to the repository or terms are agreed to govern limits on access. The United Kingdom Data Archive, which has operated successfully since the late 1960s, uses this model.44 Reported advantages include: cost-effective infrastructure use; the opportunity to train and retain highly skilled data archivists; removal of the burden of data archiving from institutions; and provision of a “one-stop shop” for researchers seeking data for secondary use. However, the centralized system is the least popular model with many researchers, who place a high value on being able to monitor, influence or participate in the secondary use of their data.14,35,39,45 In addition, it has been argued that centralized data archives cannot provide the same expert understanding of research databases as the generating institutions or more specialized repositories.14

Centralized archiving with disseminated expert support

This model involves centralized data archiving and access combined with the support of experts in several participating institutions. These diverse centres of expertise, which may often include the original research groups, support the secondary use of databases but do not make decisions about access. The advantages of this model are broadly similar to those of the fully centralized model, plus the availability of expert support. However, researchers’ concerns about the loss of control remain and are compounded by the requirement to provide ongoing support, which gives them no greater rights. In addition, the cost of coordinating expert support makes the model less cost-effective.14,46

Subject-focused repositories

Databases are archived in repositories that specialize in specific research areas. Infrastructure use is reasonably cost-effective and the support provided to users benefits from greater familiarity with the data being managed. However, it has been argued that this model can work against interdisciplinary collaboration because boundaries between research areas are not always well defined, making databases harder to locate.47
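The portal model in Box 2 depends on searchable metadata that point to databases held elsewhere, with access decisions left to the data holders. The Python sketch below shows one way such a catalogue could be structured. It is an assumption-laden illustration rather than a description of any existing portal: the class names, the metadata fields and the two registered entries are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Metadata about a database that remains with the institution holding it."""
    title: str
    holder: str            # institution that stores and controls the data
    country: str
    keywords: tuple
    access_contact: str    # requests go to the holder, not to the portal


class Portal:
    """Catalogue of metadata only; it neither stores data nor grants access."""

    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def search(self, term, country=None):
        """Return records whose title or keywords contain the term."""
        needle = term.lower()
        hits = []
        for record in self._records:
            text = " ".join((record.title,) + tuple(record.keywords)).lower()
            if needle in text and (country is None or record.country == country):
                hits.append(record)
        return hits


# Invented entries, for illustration only.
portal = Portal()
portal.register(DatasetRecord(
    title="Example national nutrition survey",
    holder="Example Research Institute",
    country="Country A",
    keywords=("nutrition", "household survey"),
    access_contact="data-requests@institute.example",
))
portal.register(DatasetRecord(
    title="Example cancer registry",
    holder="Example University Hospital",
    country="Country B",
    keywords=("cancer", "registry"),
    access_contact="registry@hospital.example",
))

for hit in portal.search("survey"):
    print(hit.title, "->", hit.access_contact)
```

Because the catalogue holds only descriptions and contact points, it reflects both the low cost of the portal model and its main weakness: nothing in the portal itself can guarantee that the underlying data are preserved or remain accessible.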

Controlling access to data and quality control of data use are also important. Concerns have been expressed that unconditional access to databases may result in poor quality secondary studies, which could undermine the reputation of the data sources and primary studies.14,39,48 Just as citations to papers are monitored, some way of monitoring database usage is also needed, both to evaluate the effectiveness of data-sharing policies and to ensure that databases are appropriately referenced and acknowledged.10,39,41,43

Prioritizing data for archiving and sharing

Given the cost and infrastructure implications of data archiving and sharing, a good starting point in the short term could be the development and implementation of data-sharing policies for databases associated with large-scale surveys and registers, since these offer fewer challenges and provide the greatest benefit to health research. Many data sets from large-scale surveys, which are often externally funded and initiated, duplicate effort because separate surveys ask similar questions and the data are subsequently underused. In some countries, such as the Philippines and Viet Nam, aggregate data from national health surveys are published. However, data archiving is fragmented and there is no clear arrangement for accessing microdata.

The establishment of good archiving and data-sharing practices for these databases would enable the host bodies to achieve several objectives. First, valuable data would be made available for national health research. Second, the process of identifying, implementing and evaluating contextually appropriate methods for the wider preservation and sharing of data could begin. Third, the growth of a “data-sharing research culture” would be encouraged. This could increase awareness and understanding of the rationale for, and benefits of, data sharing and pave the way for more wide-ranging policies and strategies that could be extended to academic institutions and investigator-initiated research databases.

Conclusion

Routine data archiving and sharing offer considerable benefits: the effectiveness and efficiency of health research could be increased and science and health-care policy could advance more rapidly. However, if the potential is to be realized equitably, especially in developing countries, advocacy and leadership are needed at both national and regional levels.

The most effective way of achieving the ultimate goal of universal data archiving and sharing may be to adopt a gradual, multistage approach. Increased access to national databases hosted by statutory bodies can pave the way for data sharing by smaller, but nonetheless valuable, individual research databases. Research funders should encourage researchers to maximize the value of their databases and adopt consistent data standards and management strategies when designing new studies.

The infrastructure, skills and standards needed for data archiving and sharing may be best developed through international partnerships and skill sharing, thereby avoiding the duplication of effort. The creation of good databases and good data management should be recognized as legitimate research activities by funders and academic culture alike, and developing countries should start building capacity in data management and analysis.


Acknowledgements

We would like to thank Vicente Belizario, James D Best, Dave Carr, Jaime Montoya, Robin Olds, Kia Reinis, Robert Terry and all participants in the Expert Consultation on Health Research Management, Governance and Data Sharing in the Western Pacific convened by the WHO Regional Office for the Western Pacific in August 2011.

Competing interests:

None declared.

References
