Survey: Research data in Oriental Studies
(Not barrier-free)
Concept
Research data form an essential basis for scientific work. If your research results are based on data that is no longer available, or that exists only on your own storage media, this topic should interest you.
To ensure the traceability and quality of scientific work and for subsequent use in other projects, research data should be processed and made available on a long-term basis. Not only scientific, but also legal, technical and financial aspects play a role here.
The DFG is therefore calling on researchers from all disciplines to develop a concept for the appropriate handling of research data at the project planning and proposal submission stage.
As a DFG infrastructure project, we at the Specialised Information Service Middle East would like to support our specialist community in research data management. To this end, we would like to know which data are generated in the context of Middle Eastern studies, what their special features are and how they are currently handled.
Against the backdrop of the National Research Data Infrastructure (NFDI), which is currently being established, Oriental studies are increasingly called upon to point out the specifics of their research data before the technical framework is defined by neighbouring disciplines with no experience of the particularities of the field. The FID Middle East would like to initiate a dialogue with its community and invites you to contribute your expertise.
Results of the survey
Here you will find the evaluation of our 2018 survey on the use of research data in the Oriental studies subjects.
We would like to thank all participants. Your answers provide a solid basis for building a functioning research data management system.
The answers to the individual questions are analysed below and their consequences for further procedures are presented.
What kind of data forms the basis of your scientific work?
The first question asked about the sources on which research in the field is based. As was to be expected, a large part of the research is based on literature in non-Latin scripts, manuscripts and other text material. However, websites and photos were also cited as important sources.
Support with non-Western literature is an important concern for the FID Middle East, so we see this as confirmation of our cataloguing practice. For the time being, there is no need for special measures when dealing with research data.
What kind of data types do you generate in your projects?
The second question focussed on the type of data generated. Based on the first question, we expected text data in particular, but in addition to this, there is a fairly high proportion of numerical data, photos, audio or video recordings and personal data.
Personal data requires special care when handling research data, which is why we will pay particular attention to this point in order to develop a good advisory service. Special circumstances must also be considered for the archiving of databases, for which we are making appropriate preparations.
Which data formats are used?
The third question focussed on the data formats used for your research data. Alongside the expected broad spread of formats, a concentration on Word and Excel files is noticeable. This is partly problematic, as both are proprietary data formats that are not well suited for long-term archiving. The use of JPG files raises a comparable concern, but these are solvable problems.
The use of proprietary file formats - especially for Office files - is widespread and is therefore already a well-understood issue in research data management. A general recommendation is to publish this data in an open data format such as RTF or CSV in addition to the original data format, so that long-term archiving can be ensured. Similarly, JPG files should additionally be saved as uncompressed TIFF files, as the lossy compression of the JPG format can lead to data loss. The numerous open formats mentioned can be considered unproblematic.
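The recommendation to publish tabular data in an open format alongside the original can be sketched as follows. This is a minimal illustration, assuming the table has already been read into Python as a list of records; reading an Excel file itself would need a third-party library and is not shown. The function name `rows_to_csv` and the sample records are hypothetical and not taken from the survey.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dicts to CSV text, an open plain-text format."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example records (assumption, for illustration only).
records = [
    {"id": "1", "title": "كتاب", "year": "1850"},
    {"id": "2", "title": "Defter, vol. 2", "year": "1872"},
]

csv_text = rows_to_csv(records, ["id", "title", "year"])

# Encode explicitly as UTF-8 so non-Latin scripts survive on any system.
csv_bytes = csv_text.encode("utf-8")
```

The resulting bytes can be written to a `.csv` file and deposited next to the original spreadsheet, so that the content stays readable even if the proprietary format is no longer supported.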
Where is your research data stored?
The answers to this question were more or less in line with expectations, but they clearly reveal problems. In particular, the storage of research data - i.e. data that is considered valuable - on private devices or storage media must be assessed as extremely critical, as data security is not guaranteed there, or only insufficiently.
Storage on work computers is only marginally better. We see a very clear need for information here. Research data is categorised as valuable data worthy of protection, which is why it should be stored as securely as possible and archived for the long term. Expanding MENAdoc into a repository for the secure storage of research data is therefore a sensible goal for further development.
What are the specifics of research data in Oriental studies?
This question dealt with features of research data that are specific to the Oriental studies subjects. The distribution of answers here is quite homogeneous, which was to be expected.
The difficulties mentioned can be largely avoided or reduced through consistent documentation and the use of UTF-8 encoding, which, as a positive side effect, also improves data transparency. For this reason, we will continue to expand our range of information here.
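What consistent UTF-8 encoding means in practice can be shown with a short sketch. It assumes that text is additionally normalised to Unicode form NFC before encoding; that step is an assumption added here for illustration, not a requirement stated by the survey, and the helper names `to_utf8` and `from_utf8` are hypothetical.

```python
import unicodedata


def to_utf8(text):
    """Normalise text to NFC (an assumed convention) and encode it as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_utf8(data):
    """Decode UTF-8 bytes back into text."""
    return data.decode("utf-8")


# The same letter can be stored in two different ways:
composed = "\u015f"      # "ş" as a single code point
decomposed = "s\u0327"   # "s" followed by a combining cedilla

# As raw strings the two differ, which breaks searching and comparison.
same_before = composed == decomposed

# After normalisation and encoding, both yield identical bytes.
same_after = to_utf8(composed) == to_utf8(decomposed)

# Arabic-script text survives the round trip unchanged.
word = "مخطوطة"
round_trip = from_utf8(to_utf8(word))
```

Documenting the encoding and normalisation form together with the data lets later users interpret the files without guessing.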
What are the legal/political challenges of research data in Oriental Studies?
This question takes us into an area that can pose hurdles, especially in Oriental Studies. The answers here refer to difficult political and legal situations, which must of course be handled with as much legal certainty as possible. The biggest issues here were the unstable political situation (71%) and the risk of losing sources (65%). The two are (partially) connected, so this is where we want to start with our strategy.
The risk of losing sources poses a challenge in the field of research data management. It can lead to data that would not normally be considered worthy of preservation suddenly becoming highly relevant. The transcript of an Ottoman manuscript may not seem significant at first, as any scholar with access to the manuscript could produce one themselves within a short time - however, if this manuscript is located in an unstable region where the loss of the original poses a foreseeable risk, the transcript becomes important information and a digitised copy becomes a valuable source.
The FID is planning to expand its advisory services and to provide a solution for the institutional storage of research data without open access in cases where legal conditions do not permit open access, so that the data is stored in a secure location and preserved for posterity.
The handling of personal data also requires special attention, but methods of anonymisation and pseudonymisation can be used here to protect both the data and the interviewees while keeping the data usable.
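One common way to pseudonymise interview data is to replace each name with a keyed hash. The following is a minimal sketch under stated assumptions, not the FID's method: the function name `pseudonymise`, the `P-` prefix, the 12-character length and the sample key are all hypothetical, and a real project would keep the key secret and stored separately from the data.

```python
import hashlib
import hmac


def pseudonymise(name, secret_key):
    """Map a name to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256).hexdigest()
    # Shortened for readability; the length is an arbitrary choice.
    return "P-" + digest[:12]


# Hypothetical key and names, for illustration only.
key = b"replace-with-a-long-random-secret"

first = pseudonymise("Interviewee A", key)
again = pseudonymise("Interviewee A", key)
other = pseudonymise("Interviewee B", key)
```

The same person always receives the same pseudonym, so statements can still be linked across interviews, while the name cannot be recovered without the key.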
What reasons could you have for not publishing your research data?
Of course, the question arises: if research data is generally considered important, what are the actual reasons against publication? The answers point to very specific, but above all solvable problems that allow us as FID to take a clear approach.
The most frequently mentioned answers are interrelated: the time required for research data management, lack of expertise and lack of budget. The budget is of course an item that must be taken into account in future analyses. However, the FID can advise you right from the project idea and support you in building up the expertise, which can also reduce the time required.
Fortunately, the fear of plagiarism when publishing research data is unfounded: studies show that the detection of plagiarism is considerably simplified by Open Access, which means that your research data is even better protected against plagiarism after publication.
We are also happy to advise you on questions of copyright and the handling of personal rights and, if necessary, also offer models for storing research data that do not have to be made open access - centralised storage is important so that the data can be preserved in the long term.
Of course, in some cases there may be no research data that is worthy of publication - it is important that this is not simply assumed, but can be determined in a review of the planned project or publication as part of a consultation.
What are your reasons for publishing your research data?
The reverse question to the previous one asked you about the reasons for publication. We were pleased to see that it is not the requirements of the funding organisations that are seen as the driving force here, but that the positive aspects of publishing research data clearly outweigh the negative ones.
These responses are a central part of our argumentation when promoting research data management in the field. The survey showed that ensuring the quality and traceability of the data, together with the associated citability and long-term archiving, are top priorities for you. Our plan is to move MENAdoc to a new infrastructure in the course of 2019, with which we can do exactly this for you.
You also see the improvement of exchange and co-operation between scientists and institutions as very important - a way of thinking that we at the FID naturally want to support!
Where are you unsure and would like advice?
In our last question, we wanted to know in which areas it would make sense to expand our advisory services. As you can see from the diagram, the distribution here is fairly homogeneous.
The FID is planning to set up clearly structured pages for initial information, as well as an advisory service for research data and support in the creation of data management plans. We will use the responses from this survey as a basis and continue to develop our services.