Data Management and Storage

Data management and storage are integral and essential aspects of the scientific enterprise. The principal investigator is responsible for developing and implementing acceptable principles of data management and storage, and for ensuring that all individuals associated with the research project adhere scrupulously to these principles. Acceptable principles of data management and storage must ensure the preservation of original data and must provide for the generation and maintenance of a clear, unambiguous, and comprehensible trail describing the scientific database, its construction, and any manipulations that have been performed.

Traditionally, data storage and management have been considered to fall entirely within the province of the individual investigator. According to this view, good record keeping was a personal matter, governed by the investigator's own commitment to the principles of rigorous experimentation. However, changes in the policies of federal granting agencies have emphasized an institutional responsibility for data generated by government-funded projects. Federal funding agencies now assert that because the grant was made to UTMB, UTMB is responsible for the data. One effect of this policy is that the entire community of scholars may be compromised if one of its members, through ignorance, carelessness, or malfeasance, fails to adhere to sound principles of data management.

The ad hoc Committee on Data Management and Storage has attempted to summarize a number of critical questions that pertain to acceptable principles of data management and storage. These questions are addressed in the body of this report, below. The Committee has attempted to reconcile the traditional prerogatives of the principal investigator, the reasonable rights of collaborators and co-investigators, and UTMB’s responsibility to conform to the policies of the granting agencies.

The Federal Acquisition Regulation, which pertains to contracts and grants awarded by the federal government, defines data as recorded information, regardless of form or the media on which it may be recorded. This definition applies to “any laboratory worksheets, memoranda, notes or exact copies thereof, that are the result(s) of original observations and activities of a study and are necessary for the reconstruction and evaluation of the report of that study.” The Public Health Service broadens this definition to include “Unique Research Resources such as synthetic compounds, organisms, cell lines, viruses, cell products, cloned DNA, as well as DNA sequence, mapping information, crystallographic coordinates, plants, animals and spectroscopic data.”

It is the policy of the federal funding agencies that ownership of the data generated from federally funded research resides with the university to which the grant was made. Some private agencies have similar policies, whereas other granting agencies may stipulate ownership of data in the contracts that govern the relevant research activities (e.g., contracts from drug companies). In those cases in which UTMB is the contractual owner of the data, UTMB designates the principal investigator as custodian of those data. This assignment is made with the understanding that the principal investigator, acting in the capacity of custodian, is responsible for providing access to the data and for adequate data storage and maintenance. The principal investigator maintains the data in trust and must do so in compliance with UTMB policies on data management and storage, as described below.

Custody of the data and other unique research resources (as defined above) resides with the principal investigator, who is primarily responsible for assuring access to the data and for determining how the data are manipulated, how the results are recorded, how these records are maintained and preserved, and ultimately how the results are interpreted. Other individuals and institutions also have rights of access to data that pertain to specific experiments or projects.

  • The individual who collected the data (if other than the principal investigator) has a right of access to the extent that he or she also provided significant intellectual input into the design and execution of the experiment and the interpretation of the results (i.e., is an author of publications reporting the data). Furthermore, this individual has certain rights concerning the extent to which the data are made available to third parties, including the public. Data should not be made public without the consent of all individuals who provided significant intellectual input. The choice of media for presentation (e.g., journals, scientific meetings) ultimately resides with the principal investigator, but this choice should be made in consultation with collaborators.
  • UTMB has a right of access to data.  This right of access resides specifically with the Dean of the faculty member’s school and such individuals as he or she may duly designate to investigate allegations of impropriety or mismanagement of data. The exercise of UTMB's right of access is restricted to circumstances in which there is probable cause to suspect mismanagement or misconduct.
  • The National Institutes of Health (and all other federal funding agencies) claims right of access to all data that were collected during the course of any project that received partial or total support from that agency.
  • The public has a right of access to the data. This right derives from the fact that UTMB is a public institution and all research carried out at UTMB is directly or indirectly supported by public funds. The principal investigator, the individuals who collected the data, and even the University may impose certain restrictions upon public access to the data. Such restrictions are necessary to prevent dissemination of the results of incomplete studies, which might mislead the public or discredit UTMB or the researchers who generated the data. The principal investigator and his or her agents also have the right to limit access to data that are being prepared for publication. However, after some period of time, all data should ordinarily become freely available to anyone who requests them. Exceptions to this general policy may arise when patient data or other data on human subjects are involved, as discussed below.

In special cases, access to data relating to human subjects may be limited by Public Health Service confidentiality agreements. Clinical data also represent a special circumstance. If clinical data are collected explicitly for a research study, then the policies described above apply (subject to PHS and IRB oversight of the use of clinical data). If clinical data are collected as part of a patient's care, the data belong to UTMB and should be available to any researcher at UTMB (subject to IRB considerations and such restrictions as may be necessary to ensure confidentiality). This policy should hold for charts, lab data, pathology, radiologic images, etc.

As custodian of the data, the principal investigator has a compelling obligation to ensure that adequate data maintenance and storage procedures are developed for the laboratory. The principal investigator is responsible for ensuring that all who are involved in data collection are familiar with these procedures. Finally, the principal investigator must ascertain that those who collect the data adhere to the established laboratory procedures for data maintenance and storage. These obligations go beyond any concern for accountability or issues of data mismanagement or misrepresentation. All researchers have an unavoidable professional responsibility to inculcate in their staff, students, and trainees the highest standards of data recording. Indeed, the standards for recording of data should be recognized as equal in importance to the standards that apply to acquisition of data.

The removal of original data to another institution could, in principle, infringe upon the rights of individuals other than the principal investigator who may have vested interests in such data (e.g., the principal investigator’s collaborators). Nor does the removal of original data relieve UTMB of its responsibilities to federal granting agencies, which hold UTMB responsible for maintenance and storage of data that were obtained as a result of federally funded experiments carried out by its faculty. Consequently, the removal of original data must be undertaken with a view toward protecting the rights of all individuals concerned, including the principal investigator, his or her collaborators, and UTMB.

A principal investigator who intends to remove original data should inform the Dean of his or her school of this intent. The principal investigator would normally be expected to retain custody of the data on behalf of UTMB. As custodian, the principal investigator will remain bound by UTMB policies for storage and maintenance of data generated at UTMB, even though the investigator is no longer affiliated with this institution. Under certain circumstances, UTMB may exercise the right to retain the original data or to transfer custodianship of the original data to one or more of the principal investigator’s collaborators who remain affiliated with UTMB. Under such circumstances, the principal investigator may remove a copy of the original data. If the original data consist of specimens, tissue samples, histological slides, cell products, or other material that cannot be duplicated, then the disposition of such material will be determined by negotiation. The negotiation should involve the Dean of the faculty member’s school, the principal investigator, his or her collaborators, and any other individuals who may have legitimate rights that pertain to the data. Under some circumstances, it may be appropriate to involve the chairs of concerned departments in the negotiations. As stated above, the objective of any such negotiation should be to ensure that the rights of all interested parties are protected.

The final outcome of the negotiation must be approved by the Dean of the school involved.  Under certain circumstances the collection, maintenance, and disposition of data are defined by contractual agreement. The terms and conditions of such contracts will remain in force if the principal investigator moves, unless the terms of the contract are altered by negotiation. For example, confidential data that are removed from UTMB remain bound by the IRB protocol under which the data were collected.

The maintenance of a complete record of every experiment undertaken is central to the experimental process. This record should be maintained in the form of a laboratory notebook (or its equivalent). The notebook should include the objectives, experimental components, original data, and such conclusions as pertain to each experiment. Each experiment should be identifiable by the date on which it was conducted (or begun, as the case may be), and the individual who was primarily responsible for data collection should be clearly identified. Any extraordinary circumstances that may bear upon interpretation of the data should be noted. In those cases in which the results are derived by manipulation of the original data, the rationale for that manipulation (e.g., the mathematical equation used, or the basis for selecting or excluding certain data from the analysis) should be clearly recorded.

The experimental records should be kept in a form that will allow interpretation of the results by a specialist in the field of study to which the data pertain. There are many different kinds of data, and the precise form of preservation must be appropriate to the manner in which the results were generated.

Histological data, for example, and many other diverse types of data will not conform to a single standard for recording and preservation. The standard to which the investigator should conform is that, given the data and the written record of the experiment, a knowledgeable colleague could reconstruct the process by which the experiment was carried out, the data collected, and the results interpreted without any additional information. This requires that the primary data be retained, labeled, and cross-referenced to the written experimental record in such a manner as to permit unassisted analysis of the results.

Manipulation of data is inherent in experimentation. Almost all data are manipulated as a part of the process that gives rise to the experimental results. A simple example of data manipulation is calculation of the mean of a number of quantitative observations. In expressing one's results as a mean, one is reporting a derivative that was obtained by manipulation of the original data. Any process that intervenes between the original data and the conclusions is a form of data manipulation. In this sense, any manipulation of the original data is allowable as long as that manipulation is clearly described in the experimental record (laboratory notebook or its equivalent). Stated in other terms, an investigator may do anything with the data as long as the original data are preserved and the experimental record clearly documents what was done by way of manipulating those data. Equally stringent criteria for describing the way in which data were manipulated apply to public dissemination of experimental results.

The length of storage of laboratory notebooks or their equivalent depends upon many factors including whether a) there are ongoing projects that pertain to the data, b) the data involve patients or other human subjects, c) the data pertain to sensitive issues such as sexual behavior, and d) the results pertain to current controversial research issues. Data that pertain to non‑sensitive, non‑controversial issues and not to ongoing research should be preserved for a minimum of five years after the date of collection if the data are not published. Published data should be maintained for five years after the date of publication. Other types of data may require storage for longer periods of time.  For example, there are special requirements for storage of data derived from clinical trials of drugs and devices.
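For laboratories that track retention dates in a spreadsheet or database, the five-year minimum for non-sensitive, non-controversial data not tied to ongoing research can be expressed as a simple date rule. The sketch below is illustrative only: the function names are hypothetical, it covers only that one category, and it deliberately ignores data involving human subjects, sensitive or controversial issues, ongoing projects, and clinical trials, all of which may require longer storage. It also assumes that a February 29 anniversary falls back to February 28.

```python
from datetime import date
from typing import Optional

# Minimum retention period stated above for non-sensitive, non-controversial
# data that do not pertain to ongoing research.
RETENTION_YEARS = 5


def add_years(d: date, years: int) -> date:
    """Return the same calendar date `years` later (Feb 29 -> Feb 28, an assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_discard_date(collected: date, published: Optional[date] = None) -> date:
    """Hypothetical helper: earliest date the data may be discarded under the five-year rule.

    Unpublished data are kept five years after collection;
    published data are kept five years after publication.
    """
    if published is None:
        return add_years(collected, RETENTION_YEARS)
    return add_years(published, RETENTION_YEARS)
```

For example, data collected on 15 March 2020 and never published could not be discarded before 15 March 2025; if the same data were published on 1 June 2022, the minimum would extend to 1 June 2027.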