PRINCIPLES OF DATA
MANAGEMENT AND STORAGE
Data management and storage
are integral and essential aspects of scientific enterprise. The principal
investigator is responsible for the development and implementation of
acceptable principles of data management and storage. The principal
investigator is likewise responsible for assuring that all individuals
associated with the research project adhere scrupulously to these principles.
Acceptable principles of data management and storage must ensure preservation of
original data and must provide for the generation and maintenance of a clear,
unambiguous, and comprehensible trail describing the scientific database, its
construction, and such manipulations as may have been performed.
Traditionally, the matter of
data storage and management has been considered to fall entirely within the
province of the individual investigator. According to this view, good record
keeping was a matter of personal concern for the principles of rigorous
experimentation. However, changes in the policies of federal granting
institutions have emphasized an institutional responsibility for data that were
generated from government‑funded projects. Federal funding agencies now
assert that since the grant was made to UTMB, UTMB is responsible for the data.
One effect of this policy is that the entire community of scholars may be
compromised if one of its members, through ignorance, carelessness, or
malfeasance, fails to adhere to sound principles of data management.
The Committee on Data
Management and Storage, serving ad hoc, has attempted to summarize a number of
critical questions that pertain to acceptable principles of data management and
storage. The questions are addressed in
the body of this report, given below.
The Committee has attempted to reconcile the traditional prerogatives of
the principal investigator, the reasonable rights of collaborators and co‑investigators,
and UTMB’s responsibility to conform to the policies of the granting agencies.
What are data?
The Federal Acquisition
Regulation, which pertains to contracts and grants awarded by the federal
government, defines data as recorded information regardless of form or the
media on which it may be recorded. This definition applies to “any laboratory worksheets,
memoranda, notes or exact copies thereof, that are the result(s) of original
observations and activities of a study and are necessary for the reconstruction
and evaluation of the report of that study.” The Public Health Service broadens
this definition to include “Unique Research Resources such as synthetic
compounds, organisms, cell lines, viruses, cell products, cloned DNA, as well
as DNA sequence, mapping information, crystallographic coordinates, plants,
animals and spectroscopic data.”
Who owns the data?
It is the policy of the
federal funding agencies that ownership of the data generated from
federally-funded research resides with the university to which the grant was
made. Some private agencies have
similar policies, whereas other granting agencies may stipulate ownership of
data in the contracts that govern the relevant research activities (e.g.
contracts from drug companies). In
those cases in which UTMB is the contractual owner of the data, UTMB designates
the principal investigator as custodian of those data. This assignment is made with the
understanding that the principal investigator, acting in the capacity of
custodian, is responsible for providing access to the data and for adequate
data storage and maintenance. The principal investigator maintains the data in
trust and must do so in compliance with UTMB policies on data management and
storage, as described below.
Who controls access to the data?
Custody of the data and
other unique research resources (as defined above) resides with the principal
investigator who is primarily responsible for assuring access to the data and
for determining how the data are manipulated, how the results are recorded, how
these records are maintained and preserved, and ultimately how the results are
interpreted. Other individuals and
institutions also have rights of access to data that pertain to specific
experiments or projects.
A. The individual who collected the data (if
other than the principal investigator) has right of access to the extent that
the person who collected the data also provided significant intellectual input
into experimental design, execution, and interpretation of the results
(i.e., individuals who author publication of the data). Furthermore, this individual has
certain rights that apply to the extent to which the data are made available to
third parties, including the public. Data should not be made public without the
consent of all individuals who provided significant intellectual input. The choice of media for presentation (e.g.,
journals, scientific meetings) ultimately resides with the principal
investigator, but this choice should be made in consultation with
collaborators.
B. UTMB has a right of access to data. This right of access resides specifically
with the Dean of the faculty member’s school and such individuals as he or she
may duly designate to investigate allegations of impropriety or mismanagement
of data. The exercise of UTMB's right of access is restricted to circumstances
in which there is probable cause to suspect mismanagement or misconduct.
C. The National Institutes
of Health (and all other federal funding agencies) claims right of access to
all data that were collected during the course of any project that received
partial or total support from that agency.
D. The public has rights of access to the data. This right derives from the fact that UTMB is a public institution and all research carried out at UTMB is directly or indirectly supported by public funds. The principal investigator, the individuals who collected the data, and even the University may impose certain restrictions upon public access to the data. Such restrictions are necessary to prevent dissemination of the results of incomplete studies, which might mislead the public or discredit UTMB or the researchers who generated the data. The principal investigator and his or her agents also have the right to limit access to data that are being prepared for publication. However, after some period of time, all data should ordinarily become freely available to anyone who requests them. Exceptions to this general policy may arise when patient data, or other data on human subjects, are involved, as discussed below.
In special cases, access to
data relating to human subjects may be limited by Public Health Service
confidentiality agreements. Clinical data also represent a special
circumstance. If clinical data are collected explicitly for a research study,
then the policies described above apply (subject to PHS and IRB oversight of
the use of clinical data). If clinical data are collected as part of a
patient's care, the data belong to UTMB and should be available to any
researcher at UTMB (subject to IRB considerations and such restrictions as may
be necessary to ensure confidentiality). This policy should hold for charts,
lab data, pathology, radiologic images, etc.
Who is responsible for maintenance and storage of
the data?
As custodian of the data,
the principal investigator has a compelling obligation to ensure that adequate
data maintenance and storage procedures are developed for the laboratory. The
principal investigator is responsible for ascertaining that all who are involved in
data collection are familiar with these procedures.
Finally, the principal investigator must ascertain that those who
collect the data adhere to the established laboratory procedures for data
maintenance and storage. These
obligations go beyond any concern for accountability or issues of data
mismanagement or misrepresentation. All researchers have an unavoidable
professional responsibility to inculcate their staff, students, and trainees
with the highest standards of data recording. Indeed, the standards for
recording of data should be recognized as equal in importance to the standards
that apply to acquisition of data.
What becomes of data when a principal investigator moves to another
institution?
The removal of original data to another institution
could, in principle, infringe upon the rights of individuals, other than the
principal investigator, who may have vested interests in such data (e.g. the
principal investigator’s collaborators). Neither does the removal of original
data abrogate UTMB’s responsibilities to federal granting agencies, which hold
UTMB responsible for maintenance and storage of data that were obtained as a
result of federally‑funded experiments carried out by its faculty.
Consequently, the removal of original data must be undertaken with a view
toward protecting the rights of all individuals concerned, including the
principal investigator, his or her collaborators, and UTMB.
A principal investigator who
intends to remove original data should inform the Dean of their school of this
intent. The principal investigator
would normally be expected to retain custody of the data, on behalf of UTMB. As
custodian, the principal investigator will remain bound by UTMB policies for
storage and maintenance of data generated at UTMB, even though the investigator
is no longer affiliated with this institution. Under certain circumstances,
UTMB may exercise the right to retain the original data or to transfer
custodianship of the original data to one or more of the principal
investigator’s collaborators who remain affiliated with UTMB. Under such circumstances, the principal
investigator may remove a copy of the original data. If the original data
consist of specimens, tissue samples, histological slides, cell products, or
other material that cannot be duplicated, then the disposition of such material
will be determined by negotiation. The negotiation should involve the Dean of
the faculty member’s school, the principal investigator, his or her
collaborators, and any other individuals who may have legitimate rights that
pertain to the data. Under some
circumstances, it may be appropriate to involve the chairmen of concerned
departments in the negotiations. As stated above, the objective of any such
negotiation should be to ensure that the rights of all interested parties are
protected.
The final
outcome of the negotiation must be approved by the Dean of the school involved.
Under certain circumstances the
collection, maintenance, and disposition of data are defined by contractual
agreement. The terms and conditions of such contracts will remain in force if
the principal investigator moves, unless the terms of the contract are altered
by negotiation. For example, confidential data that are removed from UTMB
remain bound by the IRB protocol under which the data were collected.
What principles apply to preservation of original data?
The maintenance of a complete record of every experiment undertaken is central to the experimental process. This record should be maintained in the form of a laboratory notebook (or its equivalent). The notebook should include the objectives, experimental components, original data, and such conclusions as pertain to each experiment. Each experiment should be identifiable by the date on which it was conducted (or begun, as the case may be), and the individual who was primarily responsible for data collection should be clearly identified. Any extraordinary circumstances that may bear upon interpretation of the data should be noted. In those cases in which the results may be derived by manipulation of the original data, the rationale for that manipulation (e.g., mathematical equation, the basis for selecting or excluding certain data from the analysis) should be clearly recorded.
The experimental records should be kept in a form that will allow interpretation of the results by a specialist in the field of study to which the data pertain. There are many different kinds of data, and the precise form of preservation must be appropriate to the manner in which the results were generated. The results derived from counting cells in a hemocytometer, radiographic
results, histological data, and many diverse types of
data will not conform to a single standard for recording and preservation. The
standard to which the investigator should conform is that, given the data and
the written record of the experiment, a knowledgeable colleague could
reconstruct the process by which the experiment was carried out, the data
collected, and the results interpreted without any additional information. This
requires that the primary data should be retained, labeled, and cross
referenced to the written experimental record in such manner as to permit
unassisted analysis of the results.
What general guidelines apply to manipulation of original data?
Manipulation of data is inherent in experimentation.
Almost all data are manipulated as a part of the process that gives rise to the
experimental results. A simple example of data manipulation is calculation of a
mean of a number of quantitative observations.
In expressing one's results as a mean, one is reporting a derivative
that was obtained by manipulation of the original data. Any process that
intervenes between the original data and the conclusions is a form of data
manipulation. In this sense, any
manipulation of the original data is allowable as long as that manipulation is
clearly described in the experimental record (laboratory notebook or its
equivalent). Stated in other terms, an investigator may do anything with the
data as long as the original data are preserved and the experimental record
clearly documents what was done by way of manipulating those data. Equally
stringent criteria for describing the way in which data were manipulated apply
to public dissemination of experimental results.
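As an illustration only, the principle that a derived result must remain traceable to preserved original data might be sketched in Python as follows. The function name, the record fields, and the sample observations are hypothetical and are not part of this policy; the sketch simply computes a mean while leaving the original observations untouched and recording what was excluded and why.

```python
import json
from statistics import mean


def derive_mean(original, excluded_indices=(), rationale=""):
    """Compute a mean and return a record documenting how it was derived.

    The original observations are copied into the record unchanged, so the
    derived value can always be traced back to the data it came from.
    """
    excluded = set(excluded_indices)
    retained = [x for i, x in enumerate(original) if i not in excluded]
    return {
        "original_data": list(original),           # preserved, never altered
        "manipulation": "arithmetic mean of retained observations",
        "excluded_indices": sorted(excluded),      # which observations were left out
        "exclusion_rationale": rationale,          # why they were left out
        "result": mean(retained),
    }


# Hypothetical observations; the fourth value is excluded for a stated reason.
observations = [10.0, 12.0, 11.0, 50.0]
record = derive_mean(
    observations,
    excluded_indices=[3],
    rationale="hypothetical example: handling error noted at time of collection",
)
print(json.dumps(record, indent=2))
```

A record of this kind would serve the same purpose as a laboratory notebook entry: the original data, the manipulation, and the basis for any exclusion are kept together.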
How long should laboratory
records be stored?
The length of storage of
laboratory notebooks or their equivalent depends upon many factors including
whether a) there are ongoing projects that pertain to the data, b) the data
involve patients or other human subjects, c) the data pertain to sensitive
issues such as sexual behavior, and d) the results pertain to current controversial
research issues. Data that pertain to non‑sensitive, non‑controversial
issues and not to ongoing research should be preserved for a minimum of five
years after the date of collection if the data are not published. Published
data should be maintained for five years after the date of publication. Other
types of data may require storage for longer periods of time. For example, there are special requirements
for storage of data derived from clinical trials of drugs and devices.
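For a laboratory that tracks its records electronically, the minimum retention rule stated above might be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the calculation covers only non-sensitive, non-controversial data that do not pertain to ongoing research. Data in the other categories listed above may require longer storage, which this sketch does not attempt to determine.

```python
from datetime import date

MIN_RETENTION_YEARS = 5  # minimum stated in this policy


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only raised for Feb 29 when the target year is not a leap year.
        return d.replace(year=d.year + years, day=28)


def earliest_disposal_date(collected, published=None):
    """Earliest date on which the minimum retention period has elapsed.

    Unpublished data: five years after the date of collection.
    Published data: five years after the date of publication.
    """
    start = published if published is not None else collected
    return add_years(start, MIN_RETENTION_YEARS)


# Hypothetical dates for illustration.
print(earliest_disposal_date(date(2020, 3, 1)))
print(earliest_disposal_date(date(2020, 3, 1), published=date(2021, 6, 15)))
```

The date returned is a lower bound only; the principal investigator would still need to confirm that no longer retention requirement applies.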
Ad Hoc Committee to
Establish Guidelines for Data Management
E. Aubrey Thompson, PhD,
Chair
Robert Brecht, PhD
Howard Foyt, MD, PhD
Daniel Freeman, Jr., PhD
Armond Goldman, MD
Karen Westlund-High, PhD
Gilbert Hillman, PhD
Scott Pencil, MD, PhD
J. Thomas Molina, Student Representative