UKOLN Informatics Research Group » Postgraduate researchers

Updated postgraduate data management planning guidance

Jez Cope — Wed, 10 Jul 2013 10:04:06 +0000

Since our original post about data management planning for postgraduate researchers we’ve updated the template a couple of times. We’ve also created a guidance document to accompany it, which will help researchers develop a data management plan even if they haven’t been able to attend a face-to-face workshop.

We’ve started using the template as part of our main data management workshop for PGRs, and we’ve also had a group of Doctoral Training Centre students complete DMPs using it. Feedback from both groups has been very positive.

You can access both documents from our institutional repository:

Research data management training take 2

Jez Cope — Tue, 24 Jul 2012 13:07:59 +0000

On Thursday 28 June, Cathy and I ran the latest of our postgraduate workshops on research data management.

The session was structured similarly to our last workshop, though without the extended hands-on section on data management planning. The loss of this section was down to time pressures (we had an hour less this time).

We started by showing a series of statements about research data management, such as "I am satisfied that my data is safe", and asking the participants to rate (anonymously using clickers) how much they agreed or disagreed with each statement. The answers gave us an opportunity to start some discussions and get a picture of what the current level of knowledge was.

Cathy then gave a more formal presentation (slides available here) covering the major aspects of research data management, with me pitching in on the more technical bits.

We finished up by revisiting the statements from the start of the session to see how opinions had changed, and handing out some leaflets from the DCC.

Feedback

Feedback from the attendees was overwhelmingly positive:

95% of respondents were satisfied with the course;
91% would recommend the course to others;
95% found it relevant to their needs.

Some interesting answers to the question "What was most useful?" include:

"Need to plan for good management"
"Reminding me of how easy it would be to lose my data and how I should look after it better"
"Notes on data cycle, info on making data public"
"Knowing where to go for help, helping to focus my data management plan"

Actions that participants said they would take as a result of the workshop fell mostly into two categories:

"Prepare a management plan"; and
"Back up my data"

Although we don’t know whether anyone carried out their actions, this was very encouraging to read, as our message had clearly got through.

Room for improvement

One participant felt that there was a bias towards science. This is understandable, since both Cathy and I have science backgrounds, and science/engineering is the main focus of Research360, but we’ll see what we can do to rectify this.

Another comment referred to "lack of more contemporary ways of storing data, e.g. Dropbox". We’d intentionally steered clear of Dropbox, as the official University stance on cloud storage is still being decided. Whatever that decision turns out to be, we’ll need to deal with Dropbox and other cloud tools.

There was a request for more group discussion, and I think this would be a valuable addition, so we’ll try to make the next session a bit more interactive. Group discussions could usefully focus on differences and similarities between disciplines, for example, as I think people in different subjects would have quite a lot of pre-existing knowledge that they could share.

I’d also like to give the participants something concrete to do after they’ve left the workshop. This could be something specific like "write a data management plan", but I think these actions are more likely to be carried out if the participants take some ownership of them. One way to achieve this would be to wrap up the session with an action-planning section and ask each student to define their own data management goal or goals.

Reaching out to staff

Although the session was also open to research staff, only one staff member registered to attend and in the end they didn’t turn up.

Helping busy research staff to gain new skills is a difficult task. They have many demands on their time, pulling in many different directions, and many already work far more than their contracted hours.

We aim to deal with this in a number of ways:

Providing an e-learning module which researchers can study in their own time at their own pace;
Developing a website with concise and practical guidance, structured around specific tasks and situations;
Reaching out in a variety of different ways, including:
- A single point of contact email address for all inquiries relating to research data (this is tied into the university request tracker system, so individual requests can be passed on to the team best placed to help);
- Presentations to Deans and Heads of Department by Professor Matthew Davidson, chair of the Research Data Steering Group and Associate Dean (Research) for the Faculty of Science, as well as an active researcher himself.

We’re also in discussions with the professional services around the University to understand (and help them understand) how research data management fits into their roles and how we can provide the support they need.

View slides from this session

Postgraduate DMP template first draft

Jez Cope — Mon, 19 Mar 2012 10:46:50 +0000

A lot of people have asked for this to be made available, so here it is: Data Management Plan for PGRs v0.2. It’s also available as a PDF.

This template is licensed under a Creative Commons Attribution 3.0 Unported License.

We welcome comments and suggestions for improvements, and we’d love to hear from anyone who finds it useful. I’ll be updating it soon based on feedback from our students.

Research Data Management 101 — Data Management Planning

Jez Cope — Fri, 02 Mar 2012 08:22:37 +0000

A few weeks ago, we got together some of our students from the Doctoral Training Centre in Sustainable Chemical Technologies to run a pilot training session on data management. As part of that, we asked them to trial a selection of data management plan (DMP) templates.

We split the students into smaller groups and assigned each group a different template. We then allowed them about an hour to complete as much as they could, while we hovered around to answer questions and make notes on the discussions the students were having. After this, we used an audience response system (ARS or “clickers”) to gather feedback on how useful the templates were, using the votes from the clickers as a starting point for discussion.

The templates

DMPonline

We used the “GenInst” template in DMPonline as an example of an institutional template. As this template is aimed at Principal Investigators, students were told to skip any questions that didn’t seem relevant.

The students were immediately put off by the amount of detail they were asked to input, though on a positive note, they definitely felt that this was the most comprehensive template! None of the students using this template got anywhere near to finishing it.

The students reported that very little of what they were being asked felt relevant to them, and that for at least some of the questions it was difficult to understand what they were being asked for.

DataTrain post-graduate DMP form

This is a single-page template developed as part of the DataTrain project and now available via the Archaeology Data Service.

This was found to be the quickest and easiest template to fill in, and all of the students attempting this one completed it fully. However, not all of the students felt that it was sufficiently comprehensive.

This view is borne out by a review of the completed plans: it seems that the questions as phrased don’t bring out issues like backup and security.

Expanded post-graduate DMP form

This was developed specifically for this session as an expanded version of the DataTrain form, as an attempt to provide more structure and elicit more detailed answers.

Although this form took longer to complete, most of the students managed to finish it, and felt that it was comprehensive enough.

However, not all of the students found it completely relevant, and some found some of the questions difficult to understand; both of these could probably be improved by some rephrasing.

Twenty questions about your data

This is a set of questions devised by David Shotton of Oxford University. They are arranged under the headings What, Where, How, When, Who and Why, and include examples of possible responses to each question.

These questions were considered to be mostly relevant and easy to understand, and the students had no problem completing them in the time available. The example responses made it easier to understand what was required for each question.

The only real problem was in the ordering of the questions. Because they were arranged under What, Where, etc., the students found it difficult at times to see how the questions related to each other. Perhaps because of this, the students were undecided as to whether it was comprehensive enough.

Update: Now available on the web — David Shotton’s Twenty Questions for Research Data (now restructured based on this feedback)

Discussion

Getting the template right

The number of students trying each template was very small (2–3), so it’s difficult to draw concrete conclusions at this stage, but they have given us some hints as to how to proceed.

The DMPonline approach is attractive, because it is easy to access (being web-based) and comprehensive (mapping directly onto the DCC checklist). However, there isn’t currently a template which seems appropriate for PG students — far too much detail is required, and some of the questions that are relevant are phrased in terms that research students don’t really understand.

DMPonline is specifically designed to allow custom templates to be added easily, so it should be possible to greatly improve this situation with some work. In particular, it will be necessary to either reword some of the questions or provide some detailed guidance to clarify what each one is asking for — it became apparent from some of the discussion that part of the perceived irrelevance of some questions came from difficulties understanding them.

It would be useful to be able to not only specify which questions are included in a DMPonline template, but also what order the questions appear in so that they better mirror the research workflow and relate to the aspects of data management that students will already have some experience of.

The students fared better with the shorter templates, managing for the most part to complete them. The DataTrain template seems a good option to fill in as part of an introductory DMP training session, but needs to be augmented with further prompts, though these could perhaps be administered to the students later as their understanding of their project improves.

The structure of the expanded DMP form appeared to aid students in working through all of the questions, with the resulting plans being fairly comprehensive, while the style of the questions in the Twenty Questions template, with example responses given, made it very easy to understand. These strengths could be usefully combined to produce a better template.

Action planning

One thing common to all of these tools is that they focus on recording facts about the researcher’s data. This is valuable, but doesn’t necessarily lead to action — too often, a data management plan is seen as something that is written at the start of a project then filed away.

For PG student training, we are more concerned with students developing the skills for data management than with having a comprehensive data management audit for a project. It therefore seems that an action planning approach might be worth trying, along these lines:

Where am I now?
Where do I need to be?
What do I need to do to get there?

with the emphasis placed more on the third point than the first two. This will lead to a plan which is much easier to execute, and hopefully encourage the student to review it periodically by making it easy to measure progress against the plan.

Research Data Management 101 — Intro & definitions

Jez Cope — Fri, 24 Feb 2012 10:12:36 +0000

On Wednesday 15 February, we ran our first workshop/focus group with PhD students from the Doctoral Training Centre for Sustainable Chemical Technologies. This is the first of a series of posts summarising the outcomes of that event.

Overview

We had three aims for this session:

To introduce the participants to data management planning and have them start writing their own data management plan (DMP);
To better understand their current knowledge so that we can plan future training activities;
To get feedback on what DMP template would be appropriate for PGR students.

We ran the session with 10 students in the 2010 cohort, who all started in October 2010 and are currently in the first year of their PhD proper, having completed an MRes in 2011. We also invited the 2009 cohort (in their second PhD year), of whom 3 volunteered.

The session consisted of an introductory presentation, given by Professor Matthew Davidson, followed by a hands-on session during which the students worked through a DMP template with support from myself and Cathy Pink. Our colleagues Kara Jones and Katy Jordan from the library were also present, and made notes on what was discussed.

Data management definitions

Early on in the session, we split the students up into groups of 2–3 and asked them to discuss what they understood by a handful of common data management terms. Here’s what they came up with:

Data: There was general consensus (as you might expect from a single-discipline group) that data is information gathered directly by experiment, survey, etc. for the purposes of research. It became clear that with more thought, ‘data’ isn’t a hard-edged concept — processing data can produce new data, metadata is also data and so perhaps are the samples from which experimental data were derived.
Metadata: Metadata was described as data behind the data you want to use that gives context and background details. It was noted that this is distinct from the data itself. Chemistry is relatively rare in having a strong history of using metadata in the context of depositing crystallographic data.
Secure storage: The students immediately identified the two sides of security: both guaranteeing that data is (and remains) accessible to those who create and use it, and that it cannot be accessed without permission. It was generally agreed that your required level of security depends on how sensitive your data is.
Access: The most important aspect was seen as ensuring access for the researchers who created the data. Raw data was perceived as not being of much interest to third parties, but a need to better preserve and share experimental protocols was identified.
Intellectual property: It was generally accepted that, for PGR students, the university owns their data and the intellectual property therein. We’re hoping to clarify this with our legal team soon, as Bath is unusual in leaving ownership of “scholarly outputs” with the originators — it would be useful to know whether we define data as a scholarly output now. Good data management practice was identified as one way to create a ‘paper trail’ to prove ownership of ideas in the event of a patent dispute.

Thoughts

Katy Jordan made an interesting comment in her notes:

“Listening in, it struck me quite forcibly that this session needs academics from the relevant department(s) to lead it. A good level of familiarity with the field, its processes, the department itself, and the way research is carried out, is required to make the session meaningful for the students.”

It’s occurred to me (and others) before that although the core skills of research data management are mostly discipline-independent, there is a strong need to provide “discipline-flavoured” training sessions, with relevant examples and expertise to ensure that the participants can relate to the content.

We’ll be following up soon with more posts on the later part of the session, particularly a discussion of the DMP templates the students tried. Watch this space!

Doctoral Training Centres as catalysts for research data management

Jez Cope — Thu, 15 Dec 2011 16:49:03 +0000

As I’ve already mentioned, the focal point for a lot of the pilot work in Research360 is the Doctoral Training Centre (DTC) in Sustainable Chemical Technologies. This presents us with some interesting opportunities and challenges, and is well worth investigating: DTCs (or CDTs depending on the research council) are fast becoming the norm for PhD funding in these straitened times.

A Doctoral Training Centre is typically formed by the award of a large grant to a university or consortium to strengthen a particular area of research excellence. This funding covers the cost of training a large number of PhD students in cohorts over a period of several years, including the infrastructure and administration requirements. The expectation is that other funding sources will be found and the centre will become self-sustaining.

As an example, our DTC in SCT is funded by the EPSRC for 5 cohorts of 10–15 students. It’s a 4-year integrated PhD course, which begins with an MRes year followed by 3 years of more conventional PhD research. The MRes year involves a lot of generic and specialist training alongside two short research projects, one of which may lead into the main PhD project.

In line with the current culture in science, our research is strongly interdisciplinary, involving both chemists and chemical engineers, along with biologists, mathematicians, mechanical engineers and others.

Funding doctoral training in this way has a number of benefits. Because we recruit in cohorts which all start together at the beginning of the academic year, we have groups of students who all work and train together. Not only is this more efficient in terms of the training courses that we provide, it also means that the students are able to support each other through the challenging transition from undergraduate to professional scientist.

For Research360, this is great, as we’re able to provide RDM training to a whole cohort together, and they can (and will, in my experience) support each other as they develop data management plans at the same point in the PhD process.

Because Doctoral Training Centres are well funded and provide a consistent source of high-quality students to work on projects, academics are motivated to engage with them. And because DTCs are typically highly interdisciplinary, these academics and their collaborators will be spread right across the institution, or a whole consortium. This gives us many opportunities to roll out good data management practice organically institution-wide. If our researchers in the centre routinely practise good RDM, they will expect it of their collaborators elsewhere in the University.

Furthermore, much of the administration, such as project proposals and transfer reports, still gets processed through the graduate schools in the Faculties of Science and Engineering & Design. By having our students include data management plans in this documentation, we can get it onto the radar of the graduate schools from below as well as above.

This approach is not without its problems, however. Many of the features which make the DTC model work so well also mean that they are generally not representative of the institution as a whole. There are still many PhD students funded by specific research projects or institutional studentships who may have begun their studies at any point in the academic year, and training developed for DTC students may not be as appropriate for these.

Whatever the advantages and disadvantages, I look forward to exploring the potential of using our doctoral training centre as a catalyst to improve data management here at the University of Bath. It would be great to hear from anyone facing similar issues.