docs/index.html (7 additions & 7 deletions)
@@ -2527,7 +2527,7 @@ <h2 id="overview-of-the-data-infrastructure-framework">Overview of the data infr
 <p>For this project, the data infrastructure <em>framework</em> is defined as 1) a set of software programs, 2) a defined and fixed set of conventions on the structure and format of the filesystem and URL paths, and 3) a defined structure to the data and associated documentation, all of which are linked together as modular components. The framework will serve as an open source starting template for setting up data infrastructures that make use of modern tools and processes.</p>
 <p>This framework encompasses four target users and three layers, with a complete schematic shown in Figure 2. The three layers are the web portal frontend, the database and documentation backend, and the API (Application Programming Interface) that interacts with both. The four users and their associated use cases are:</p>
 <ol type="1">
-<li><strong>User 1</strong>: Those _inputting data, _e.g., authorized centers and researchers. The use cases are:
+<li><strong>User 1</strong>: Those <em>inputting data</em>, e.g., authorized centers and researchers. The use cases are:
 <ol type="1">
 <li>Authorized centers (GPs and hospitals) upload standardized and routine data through the data entry web portal.</li>
 <li>Authorized researchers upload generated data from completed projects. Non-standardized data is manually processed and cleaned before entering into the database.</li>
@@ -2537,7 +2537,7 @@ <h2 id="overview-of-the-data-infrastructure-framework">Overview of the data infr
 <li>Interested researchers browse the catalogue of available data and the data dictionary.</li>
 <li>Researchers request access to data by submitting a description of their proposed project and selecting the relevant data from the catalogue. This request is sent to a list of projects to await approval from the data controllers (User 4).</li>
 </ol></li>
-<li><strong>User 3</strong>: Those _viewing updates on findings and results _such as aggregate statistics, e.g., policymakers, healthcare workers, journalists, researchers, and the general public. Use cases are:
+<li><strong>User 3</strong>: Those <em>viewing updates on findings and results</em> such as aggregate statistics, e.g., policymakers, healthcare workers, journalists, researchers, and the general public. Use cases are:
 <ol start="5" type="1">
 <li>Users view and read through the list of completed, ongoing, and proposed projects that use the database.</li>
 <li>Users access and view aggregate statistics and the latest published findings that are relevant to them/their practice.</li>
@@ -2664,13 +2664,13 @@ <h2 id="applying-the-framework-to-dd2">Applying the framework to DD2</h2>
 <p>The biggest potential challenge to applying the framework to DD2 is getting the database backend into the appropriate structure to fit within the framework. With the current state of the DD2 data, considerable time and effort is needed to organize it. Our initial steps will be to:</p>
 <ol type="1">
 <li><em>Survey and map out all data, documentation, and processing steps</em>. Currently, the original enrollment data are stored in a secure server at the DD2 headquarters at Odense, while subsets of data collected for specific research projects are spread across several research institutions.</li>
-<li><em>Map the data input sources and formats from the various collection centers.</em> GP clinics have an existing pipeline for sending data to DD2 through their system, “sundhedsdatanettet” (national healthcare portal), following a patient visit. Other collection centers like hospitals have custom, but not standardized, approaches to sending data to DD2. No formal approach is available for returning data from completed projects. The core DD2 data is sent to “forskermaskine” and gets merged there with the registry data, which is in a standard DST format.<br/>
+<li><em>Map the data input sources and formats from the various collection centers</em>. GP clinics have an existing pipeline for sending data to DD2 through their system, “sundhedsdatanettet” (national healthcare portal), following a patient visit. Other collection centers like hospitals have custom, but not standardized, approaches to sending data to DD2. No formal approach is available for returning data from completed projects. The core DD2 data is sent to “forskermaskine” and gets merged there with the registry data, which is in a standard DST format.<br/>
 </li>
-<li><em>Move as much data as possible to a central location.</em> All data will be stored at the DD2 headquarters in Odense, except large-scale data that will be either stored or transferred as needed to a high-performance computing (HPC) platform. For linkage with registry data, we will acquire a dedicated DD2 “forskermaskine” server.</li>
+<li><em>Move as much data as possible to a central location</em>. All data will be stored at the DD2 headquarters in Odense, except large-scale data that will be either stored or transferred as needed to a high-performance computing (HPC) platform. For linkage with registry data, we will acquire a dedicated DD2 “forskermaskine” server.</li>
 <li><em>Re-structure current data into the framework’s CDM</em>.</li>
-<li><em>Build software to automate the cleaning, processing, and merging of the existing and established data input pipelines into the framework’s required backend format.</em></li>
-<li><em>Establish automated processes for linkages between the data storage servers.</em></li>
-<li><em>Implement framework’s remaining modules, starting with User 4 and then moving from User 1 to 3, in that order.</em></li>
+<li><em>Build software to automate the cleaning, processing, and merging of the existing and established data input pipelines into the framework’s required backend format</em>.</li>
+<li><em>Establish automated processes for linkages between the data storage servers</em>.</li>
+<li><em>Implement framework’s remaining modules, starting with User 4 and then moving from User 1 to 3, in that order</em>.</li>
 </ol>
 <p>Currently, User 3 can request data by filling out a Word <a href="https://dd2.dk/media/1410/standard-dd2-protocol_final.doc">application form</a> and emailing it to the chair of the advisory board Kurt Højlund and programme leader Jens Steen Nielsen. Applications are reviewed by the steering committee and, once approved, the data manager at the Department of Clinical Epidemiology (KEA) in Aarhus University Hospital then manually extracts the requested data and transfers the data subset to the applicant’s secure server and does this for each individual research project. If requested, KEA may also perform analyses on the data. Researchers must already have valid authorized access to the secure servers on an existing “forskermaskine” or an <a href="https://www.deic.dk/en/supercomputing/national-hpc-facilities">HPC facility</a> for the large-scale data, such as <a href="https://www.computerome.dk/">Computerome 2</a> and <a href="https://genome.au.dk/">GenomeDK</a>.</p>
 <p>The costs of storing the original data are covered by DD2, while applicants cover the costs related to storing the transferred data. We will not charge for data access. As per legal requirements, researchers can only use the data for the intended purposes listed in the application. After project completion, the researchers must delete or close access to the data and inform DD2 as legally required. Any newly generated data must be returned to DD2 by uploading via the User 1 portal.</p>
index.Rmd (9 additions & 9 deletions)
@@ -124,8 +124,8 @@ researchers and is a role model to building a modern research infrastructure.
 
 While the UK Biobank is a source of inspiration on the state-of-the-art, the
 underlying infrastructure itself is not openly accessible and reusable. The
-same applies to a similar Danish initiative, the “_Single path to access Danish
-health data_” project [@sundhedsdata], where
+same applies to a similar Danish initiative, the "_Single path to access Danish
+health data_" project [@sundhedsdata], where
 the Danish government and individual regions are collaborating to map out all
 Danish health data. Another state-of-the-art initiative led by the University
 of Chicago, USA is Gen3 [@gen3],
@@ -168,15 +168,15 @@ database and documentation backend, and the API (Application Programming
 Interface) that interacts with both. The four users and their associated use
 cases are:
 
-1. **User 1**: Those _inputting data, _e.g., authorized centers and
+1. **User 1**: Those _inputting data_, e.g., authorized centers and
 researchers. The use cases are:
 1. Authorized centers (GPs and hospitals) upload standardized and routine data through the data entry web portal.
 2. Authorized researchers upload generated data from completed projects. Non-standardized data is manually processed and cleaned before entering into the database.
 2. **User 2**: Those _requesting access_, e.g., researchers and clinicians. Use
 cases are:
 3. Interested researchers browse the catalogue of available data and the data dictionary.
 4. Researchers request access to data by submitting a description of their proposed project and selecting the relevant data from the catalogue. This request is sent to a list of projects to await approval from the data controllers (User 4).
-3. **User 3**: Those _viewing updates on findings and results _such as
+3. **User 3**: Those _viewing updates on findings and results_ such as