Data management: how we’re building capacities in a complex research project

Managing data for an interdisciplinary research project with multiple research partners, working on sensitive issues in conflict-affected and fragile states, is far from straightforward. I’m Data Manager for such a project, Drugs & (dis)order, in which twelve partner organisations research how illicit drug economies can transform into peacetime economies in Afghanistan, Colombia and Myanmar.

This article outlines how we approach data management within the project, with an emphasis on building capacities for good data management. We hope this information will be useful to other projects and organisations developing their own data management strategies. You may also wish to read our Data management guidance.

Identify priorities and challenges

Drugs & (dis)order seeks to generate robust evidence for use in policy and practice. Evidence is collected through interviews, life histories, focus group discussions, observations, photographs, surveys, compilations of existing data sources and press information, and satellite imagery. Most research is carried out by local field researchers, who elicit information from key informants such as drug users, producers, farmers, traders and the public.

Needless to say, much of the research data collected are sensitive in nature. They may deal with illicit activities and include ‘stories’ or responses in which people or organisations may be identifiable. That is certainly the case for life history interviews and photographs. Unwanted disclosure of identities, or the data falling into the wrong hands, may put interviewees and researchers at risk.

An additional challenge is that the customs and requirements for data protection, research ethics and the understanding of consent vary across the three research countries, and differ from UK standards. It is therefore important to manage and secure the research data generated in the project carefully, from the moment they are ‘captured’.

Finally, the project donor, UKRI-ESRC under the umbrella of the Global Challenges Research Fund (GCRF), expects the generated data to be managed and preserved for the long term, and made available for future use.

Understand how data are managed

Strengthening capacity amongst partners and field researchers is one of the key pillars for GCRF projects. In Drugs & (dis)order we pay particular attention to strengthening capacity for data management.

In practice, this means that as the project’s research data manager I work with our partners in Afghanistan, Colombia, Myanmar and the UK on good data management practices.

Understanding current data management practices in each organisation is the first step. This starts with discussing which data they collect and use in their research activities, and understanding the different steps the data may take. Who collects the data? In what format? How are data brought back from the field to the office? What transformations take place and which people, other than the field researchers, may be involved in this?

For example, interviews may be audio-recorded on a digital recorder, or notes may be taken during an interview. Recorded interviews may be brought back to the office on the recorder, or copied onto a laptop. Interviews may then be transcribed in local languages and translated into English, either by the researchers themselves or by external transcribers and translators. Surveys can be captured on paper questionnaire forms, with responses later entered into a digital database, or captured directly in digital format via an online form.

Each data transfer or transformation, and each person involved in this data ‘pipeline’, can affect the quality of the data and will require certain data security measures.
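One way to make such a pipeline concrete is to write it down step by step, noting who handles the data and in what format. The Python sketch below does this for the interview example above. The `PipelineStep` fields, the step names and the `external` flag are hypothetical illustrations, not the project’s actual template; the point is that once the steps are listed, the ones that bring in people outside the research team stand out as needing extra security measures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineStep:
    """One step in a (hypothetical) data pipeline for a single data type."""
    action: str        # what happens to the data at this step
    handled_by: str    # role of the person carrying out the step
    data_format: str   # format of the data after this step
    external: bool     # True if someone outside the research team is involved


# Hypothetical pipeline for an audio-recorded interview; real pipelines
# will differ per partner organisation and per data type.
INTERVIEW_PIPELINE = [
    PipelineStep("record interview", "field researcher", "audio (recorder)", False),
    PipelineStep("copy to laptop", "field researcher", "audio file", False),
    PipelineStep("transcribe in local language", "external transcriber", "text", True),
    PipelineStep("translate into English", "external translator", "text", True),
]


def steps_needing_review(pipeline):
    """Return the actions that involve people outside the team, as prompts for security checks."""
    return [step.action for step in pipeline if step.external]


def people_involved(pipeline):
    """Return the distinct roles that handle the data, in order of first appearance."""
    roles = []
    for step in pipeline:
        if step.handled_by not in roles:
            roles.append(step.handled_by)
    return roles


if __name__ == "__main__":
    print(steps_needing_review(INTERVIEW_PIPELINE))
    print(people_involved(INTERVIEW_PIPELINE))
```

The same list can live just as well in a spreadsheet; what matters is recording every transfer and every person, so that none is overlooked when security measures are decided.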

Discussing every step of the research data process in detail helps to develop practical data management guidance: for example, how to store and transfer data on different devices, how to use unique codes as identifiers for interviewees, and how to anonymise and transcribe data. It also helps to identify the security measures that need to be put in place.

Security first

The safe and secure storage, transfer and handling of all collected research data has been a top priority for Drugs & (dis)order, especially since some of the local partners have a fairly basic IT infrastructure and no dedicated IT staff. Researchers may carry out all their work on laptops, with no networked server available.

We audited all the laptops (and PCs) used within the project for working with sensitive data, recording for each machine its operating system (version and edition), the antivirus software used on it, and whether or not encryption was in place.
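An audit like this fits naturally into a simple sheet with one row per machine, which can then be checked for gaps. The Python sketch below assumes a hypothetical CSV layout (`machine_id`, `os`, `os_version`, `os_edition`, `antivirus`, `encrypted`) and placeholder machine and vendor names; it lists the machines that still lack encryption or antivirus software.

```python
import csv
import io

# Hypothetical audit sheet; machine IDs and antivirus vendors are placeholders.
AUDIT_CSV = """machine_id,os,os_version,os_edition,antivirus,encrypted
LT-01,Windows,10,Pro,vendor-A,yes
LT-02,Windows,10,Home,none,no
PC-01,Windows,11,Pro,vendor-B,no
"""


def load_audit(text):
    """Parse the audit sheet into a list of dicts, one per machine."""
    return list(csv.DictReader(io.StringIO(text)))


def machines_without_encryption(records):
    """Machines where encryption is not (yet) in place."""
    return [r["machine_id"] for r in records if r["encrypted"].strip().lower() != "yes"]


def machines_without_antivirus(records):
    """Machines with no antivirus software recorded."""
    return [r["machine_id"] for r in records if r["antivirus"].strip().lower() in ("", "none")]


if __name__ == "__main__":
    records = load_audit(AUDIT_CSV)
    print("No encryption:", machines_without_encryption(records))
    print("No antivirus:", machines_without_antivirus(records))
```

In practice the sheet would be exported from whatever spreadsheet the audit was kept in; the checks are deliberately simple so they can be re-run after each follow-up.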

An encryption strategy was developed for laptops, whereby either the entire drive (disk) of the laptop was encrypted, or an encrypted ‘container’ was set up on the laptop in which sensitive files can be placed. Practical guidance for encryption was sent to all researchers, explaining which tools can be used and how encryption is implemented in practice.

Glasscubes was already used by the project as a collaborative workspace for sharing reports and information. Since it is ISO 27001 (Information Security Management Systems) certified and holds Cyber Essentials certification, it is now also used to share data files with colleagues on the project and to transfer data files securely between partners.

Hands-on data management

Visits to the partner organisations in Afghanistan and Myanmar helped to develop data management practices further.

Working directly with the researchers and staff members who handle and manage the data at different stages of the research process, I could observe how data are handled in practice and where there may be scope for improvement.

It also means that the guidance can be put into practice for the specific data an organisation works with, by the individuals responsible for data management, and in the software packages they use. Good data management needs to fit around the software packages and tools people are already used to working with.

With the data manager and researchers of the Afghanistan Research and Evaluation Unit, I worked on implementing standardised data entry and standard coding for an Excel-based database of thousands of development projects and investments, as well as on using formulas to analyse the data.
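Standard coding means that each free-text entry in a column is mapped to one agreed code, and anything that does not match is flagged for review rather than silently accepted. The sketch below shows that logic in Python rather than Excel. The sector column, the code list (`SECTOR_CODES`) and the spelling variants (`SYNONYMS`) are hypothetical; the project’s actual codebook is not reproduced here.

```python
# Hypothetical code list for one column (e.g. "sector").
SECTOR_CODES = {
    "agriculture": "AGR",
    "education": "EDU",
    "health": "HLT",
    "infrastructure": "INF",
}

# Hypothetical variants seen in free-text entry, mapped to a canonical label.
SYNONYMS = {
    "agri": "agriculture",
    "farming": "agriculture",
    "schools": "education",
    "roads": "infrastructure",
}


def to_code(raw):
    """Normalise case and whitespace, then return the standard code, or None if unknown."""
    label = " ".join(raw.strip().lower().split())
    label = SYNONYMS.get(label, label)
    return SECTOR_CODES.get(label)


def code_column(values):
    """Code a whole column; return the codes and the entries that need manual review."""
    codes, unmatched = [], []
    for value in values:
        code = to_code(value)
        codes.append(code)
        if code is None:
            unmatched.append(value)
    return codes, unmatched


if __name__ == "__main__":
    print(code_column(["Agriculture", " farming ", "HEALTH", "Roads", "mining"]))
```

Here "mining" is returned as unmatched: the decision whether to add a new code or correct the entry stays with the data manager, which keeps the code list itself under control.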

With the Organization for Sustainable Development and Research in Afghanistan, I worked on the best ways to collect questionnaire and interview data anonymously, using unique respondent and surveyor codes, and to remove and code sensitive information in the database where responses are submitted.
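The idea behind respondent and surveyor codes can be sketched as follows: each person is assigned a code the first time they appear, identifying fields are dropped from the stored response, and only the codes remain. This Python sketch is illustrative only. The code formats (`S01`, `R0001`), the field names in `DROP_FIELDS` and the `CodeRegister` class are assumptions, and it assumes the key linking codes to real identities would be kept separately and securely, never in the response database.

```python
import itertools

# Hypothetical fields that identify a respondent or surveyor directly.
DROP_FIELDS = ("name", "phone", "village", "surveyor")


class CodeRegister:
    """Assigns sequential codes (e.g. S01, R0001) to identities.

    The internal key (identity -> code) is the sensitive link; in this
    sketch it stays in memory and would need to be stored securely.
    """

    def __init__(self, prefix, width):
        self.prefix = prefix
        self.width = width
        self._counter = itertools.count(1)
        self._key = {}

    def code_for(self, identity):
        if identity not in self._key:
            self._key[identity] = f"{self.prefix}{next(self._counter):0{self.width}d}"
        return self._key[identity]


def anonymise(response, surveyors, respondents):
    """Return a copy of a response with codes in place of identities."""
    clean = {k: v for k, v in response.items() if k not in DROP_FIELDS}
    clean["surveyor_code"] = surveyors.code_for(response["surveyor"])
    # Name plus phone is used as the identity key here only for illustration.
    clean["respondent_code"] = respondents.code_for((response["name"], response["phone"]))
    return clean


if __name__ == "__main__":
    surveyors, respondents = CodeRegister("S", 2), CodeRegister("R", 4)
    raw = {"surveyor": "Surveyor A", "name": "Person 1", "phone": "000",
           "village": "X", "q1": "yes"}
    print(anonymise(raw, surveyors, respondents))
```

Ideally, surveyors would record only the codes in the field, so that identifying details never enter the database at all; the sketch shows the clean-up step for cases where they already have.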

With Kachinland Research Centre (KRC) and Shan Herald Agency for News (SHAN) in Myanmar, I worked on a system for uniquely coding interviews carried out by multiple field researchers in different areas. The codes are used in file names and in data sheets that record the demographic and organisational information of each interview, so that the audio recordings, field notes, summaries, transcripts and translations of each interview can be tracked.
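A coding system of this kind can be sketched as an interview code built from area, researcher and sequence number, followed by the document type in the file name. The Python sketch below assumes a hypothetical convention such as `AB-R02-007_transcript.docx`; the real KRC and SHAN scheme may differ. Parsing the file names then shows, for each interview, which documents are still missing, and flags files that do not follow the convention.

```python
import re
from collections import defaultdict

# Hypothetical convention: AREA-RESEARCHER-SEQUENCE_doctype.ext
DOC_TYPES = ("audio", "fieldnotes", "summary", "transcript", "translation")
FILENAME_RE = re.compile(
    r"^([A-Z]{2}-R\d{2}-\d{3})_(" + "|".join(DOC_TYPES) + r")\.\w+$"
)


def interview_code(area, researcher, seq):
    """Build an interview code from a two-letter area code, researcher number and sequence number."""
    return f"{area.upper()}-R{researcher:02d}-{seq:03d}"


def track(filenames):
    """Group files by interview code.

    Returns (missing, bad_names): the document types still missing per
    interview, and file names that do not follow the convention.
    """
    found = defaultdict(set)
    bad_names = []
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match:
            found[match.group(1)].add(match.group(2))
        else:
            bad_names.append(name)
    missing = {code: [d for d in DOC_TYPES if d not in docs] for code, docs in found.items()}
    return missing, bad_names


if __name__ == "__main__":
    print(interview_code("ab", 2, 7))
    print(track(["AB-R02-007_audio.mp3", "AB-R02-007_transcript.docx",
                 "AB-R02-008_audio.wav", "interview final v2.docx"]))
```

The same interview code would serve as the key column in the data sheet, so that each row of demographic and organisational information links directly to its files.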

Country visits were also used to provide formal data management training to partner organisations, on topics such as organising data, file naming, version control, data storage and transfer, data anonymisation, quality control, documentation and metadata, transcription, and tools for coding qualitative data.

Data management guidance

All the data management measures implemented with the partners are also written up as practical Data management guidance, which will continue to develop as the project progresses.

In the months to come the focus will shift to further requirements, such as selecting a tool to help organise, manage and tag digital images; investigating repository solutions in which partners can archive their data files; and developing capacity for coding qualitative data.