13 changes: 11 additions & 2 deletions .devcontainer/docker-compose.yml
@@ -1,6 +1,8 @@
services:
couchdb:
image: couchdb:3.1.1
build:
context: ..
dockerfile: Dockerfile
environment:
- COUCHDB_USER=admin
- COUCHDB_PASSWORD=secret
@@ -9,6 +11,12 @@ services:
- ../seed:/opt/couchdb/seed:ro
ports:
- "5984:5984"
healthcheck:
test: ["CMD", "curl", "-sf", "http://localhost:5984/_up"]
interval: 10s
timeout: 5s
retries: 10
start_period: 15s
networks:
- statusdb-network

@@ -20,7 +28,8 @@ services:
networks:
- statusdb-network
depends_on:
- couchdb
couchdb:
condition: service_healthy

volumes:
couchdb-data:
220 changes: 220 additions & 0 deletions DATABASES.md
@@ -0,0 +1,220 @@
# Database Documentation

This file documents the purpose and contents of each CouchDB database in the seed data.

---

## `agreement_templates`

Stores the templates used to create user agreements, one document per template. The database seems to contain both versions (separate documents) and editions (updates of a document), though this is not fully confirmed.

---

## `agreements`

Stores user agreements created in Genomics Status, connected to the cost calculator implementation in Genomics Status.

---

## [DEPRECATED] `analysis`

Summary statistics from various analyses, likely once updated by `Piper` or a similar tool, since it is not updated anymore. It appears to have stopped being updated around 2020.

---

## `application_categories`

Lists our application categories and their abbreviations, with one document per application. Used by the statistics API of Genomics Status, which in turn feeds the dashboard.

---

## `bioinfo_analysis`

Lists status flags for the bioinfo tab on Genomics Status. One document per sample-flowcell-lane combination, which becomes problematic when setting a status at the flowcell level with many samples across many lanes.
Documents are created by ngi-pipeline or TACA, and statuses can be set from the Genomics Status UI.

---

## `biomek_logs`

Short log messages from the Biomek runs.

---

## `charon`

A huge database tracking sample status on the bioinfo side. It backs the Charon web application, which up until 2025 was also used by the Uppsala node of NGI. Contains many different document types with varying content and structure.

---

## `cost_calculator`

Stores the prices for our different offerings, one document per version of the cost calculator. Updated from Genomics Status and used by Genomics Status to display costs and generate agreements.

---

## `cronjobs`

Stores the output of `crontab -l` for different users and servers, updated by TACA in a cronjob and displayed on Genomics Status. One document per server, with the different users appended to the document.

---

## `element_runs`

The flowcells database for the Element Biosciences sequencing instrument. Created by TACA and displayed on Genomics Status.

---

## [DEPRECATED] `expected_yields`

One document per run mode and sequencer. Not sure where this is used.

---

## [DEPRECATED] `flowcells`

Illumina flowcells database used before the HiSeq X era. Not updated after 2016.

---

## `gs_configs`

Miscellaneous config documents used for Genomics Status.

---

## `gs_links`

Tracks the links added to projects on Genomics Status, one project link per document.

---

## `gs_users`

Genomics Status users, one document per user. Keeps track of initials, roles and user presets.

---

## [DEPRECATED] `instrument_logs`

Instrument logs very similar to the `biomek_logs` database, but seemingly used by the Bravo computers. The last update was in August 2025, and the Bravo computers are believed to have been taken offline after that.

---

## `instruments`

A database listing all sequencing instrument IDs and their corresponding names, used by the fc_trends plot to name the instruments. Might be out of date, and could be replaced with a `gs_configs` document.

---

## `nanopore_runs`

The flowcells database for ONT. One document per flowcell and run. A reused flowcell will generate a new document.

---


## `people_assignments`

Keeps track of the assignment of people to projects. Used in the Genomics Status project cards and the project page introduced around 2025.

---

## [DEPRECATED] `pricing_components`

Was used for the cost calculator on Genomics Status before it was actually launched. The information is now baked into `cost_calculator` documents instead.

---

## `pricing_exchange_rates`

Tracks the USD-to-SEK and EUR-to-SEK exchange rates. A historical record is kept so that we can rewind time when changing agreements that were generated some time ago and we want to keep the same exchange rate. One document per update, updated weekly by a cronjob.

---

## [DEPRECATED] `pricing_products`

Was used for the cost calculator on Genomics Status before it was actually launched. The information is now baked into `cost_calculator` documents instead.

---

## `projects`

One of the major databases, around since the start. One document per project. Since it has existed for so long, there may be differences in format or missing fields between documents from different years.

---

## [Might be DEPRECATED] `reference`

Seems to be a mapping between the fields we use in Genomics Status (or elsewhere) and how they are fetched from LIMS. Not sure if it's used.

---


## `running_notes`

Keeps all the running notes, one document per note. This is a partitioned database, which is a bit different: the partition key is the `parent` of the running note, commonly the specific project, workset or flowcell.
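
In a CouchDB partitioned database, every document `_id` is prefixed with its partition key followed by a colon. As a purely hypothetical illustration (the field names and values below are assumptions, not the actual running-note schema), a note whose parent is project `P12345` could look like:

```json
{
  "_id": "P12345:2f6e0c1a9b",
  "note": "Example note text",
  "user": "someuser",
  "created_at_utc": "2025-01-01T12:00:00Z"
}
```

This layout lets CouchDB serve all notes for one parent from a single partition, which keeps per-project, per-workset or per-flowcell queries cheap.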

---

## [WARNING] `sample_requirements`

Never made it into use; it was supposed to keep track of sample requirements for different preps. Still visible on Genomics Status.

---

## `sensorpush`

Holds time series data for our freezers and fridges that are connected to a SensorPush sensor. One document per sensor per 24 hours. Data is uploaded by a script running on ngi_internal that polls the SensorPush API. We suspect the API isn't great and sometimes misses data, but it could also be sensors dropping out.

Data is visualized on Genomics Status.

---

## `server_status`

Tracks the storage of server file systems, used to display status on the Genomics Status index page. One document per server/system per day. Updated by a TACA command run as a cronjob on ngi-preproc. It's a bit broken, since the script uses `df -h` and the network-mounted storage systems do not always report correct stats for that command.

---

## `suggestion_box`

Tracks which items have been added to the suggestion box, with a link to the Jira item and whether it has been archived.

---

## [DEPRECATED] `taca_flowcells`

An abandoned project, seemingly intended for TACA to upload flowcells. Contains only one document, uploaded in 2015.

---

## `userman`

Database to keep track of the Charon users.

---

## `worksets`

The worksets database populated by LIMS2DB. One document per workset.

---

## `x_flowcells`

The Illumina flowcells database, updated by TACA and visualised on Genomics Status. It was named `x` since it was introduced with the HiSeq X machines, which had a very different metadata output from the previous generation.

---

## `y_flowcells`

A database to track the transfer status of all flowcells handled by the transfer script introduced with Project Helix. Updated every hour by a cronjob on ngi-preproc. The name was chosen because it comes after `x_flowcells`.

---

## `yggdrasil`

Internal database used by yggdrasil for high level logging.

---
15 changes: 8 additions & 7 deletions README.md
@@ -52,23 +52,24 @@ The easiest way to develop seed data is using the VS Code Dev Container:

## Seed Data Structure

The `seed/` directory contains JSON documents that are loaded into CouchDB on startup.
The `seed/data/` directory contains JSON documents that are loaded into CouchDB on startup.

### Directory Structure

```
seed/
├── <database_name>/ # Creates a database and loads all JSON files into it
│ ├── doc1.json
│ └── doc2.json
└── *.json # Top-level JSON files are loaded into 'statusdb' database
└── data/
├── <database_name>/ # Creates a database and loads all JSON files into it
│ ├── doc1.json
│ └── doc2.json
└── *.json # Top-level JSON files are loaded into 'statusdb' database
```

### Document Format

Each JSON file should contain a single CouchDB document. If the document has an `_id` field, it will be used as the document ID. Otherwise, CouchDB will auto-generate an ID.

Example document (`seed/example_project.json`):
Example document (`seed/data/example_project.json`):

```json
{
@@ -89,7 +90,7 @@ docker run -p 5984:5984 -e COUCHDB_USER=admin -e COUCHDB_PASSWORD=admin statusdb

## Adding New Seed Data

1. Add JSON files to the `seed/` directory (or subdirectories for specific databases)
1. Add JSON files to the `seed/data/` directory (or subdirectories for specific databases)
2. Commit and push to `main` branch
3. GitHub Actions will automatically build and publish a new image

2 changes: 1 addition & 1 deletion scripts/init-seed-data.sh
@@ -6,7 +6,7 @@ COUCHDB_HOST="${COUCHDB_HOST:-localhost}"
COUCHDB_PORT="${COUCHDB_PORT:-5984}"
COUCHDB_USER="${COUCHDB_USER:-admin}"
COUCHDB_PASSWORD="${COUCHDB_PASSWORD:-admin}"
SEED_DIR="${SEED_DIR:-/opt/couchdb/seed}"
SEED_DIR="${SEED_DIR:-/opt/couchdb/seed/data}"
INIT_MARKER="/opt/couchdb/data/.seed_initialized"

COUCHDB_URL="http://${COUCHDB_USER}:${COUCHDB_PASSWORD}@${COUCHDB_HOST}:${COUCHDB_PORT}"
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -0,0 +1,3 @@
function (doc) {
emit(doc["edition"], doc);
}
3 changes: 3 additions & 0 deletions seed/views/agreements/entire_document/project_id.map.js
@@ -0,0 +1,3 @@
function (doc) {
emit(doc.project_id, doc);
}
3 changes: 3 additions & 0 deletions seed/views/agreements/project/project_id.map.js
@@ -0,0 +1,3 @@
function (doc) {
emit(doc.project_id, doc._id);
}
19 changes: 19 additions & 0 deletions seed/views/analysis/RNA_report.map.js
@@ -0,0 +1,19 @@
function (doc) {
  // The project ID is the prefix of the first sample name, e.g. "P123_456" -> "P123"
  var project_id = Object.keys(doc.samples)[0].split('_')[0];
  var data = {};
  for (var s in doc.samples) {
    data[s] = {};
    data[s]['total_reads'] = parseInt(doc.samples[s].mapping_statistics.Total_No_reads, 10);
    data[s]['unique_reads_mapped'] = doc.samples[s].mapping_statistics.bef_dup_rem['%uniq_mapped'];
    data[s]['unique_reads_mapped_nodup'] = doc.samples[s].mapping_statistics.aft_dup_rem['%uniq_mapped'];
    data[s]['CDS_per_kb'] = parseFloat(doc.samples[s].read_distribution.CDS_Exons['Tags/Kb']);
    data[s]['5UTR_per_kb'] = parseFloat(doc.samples[s].read_distribution["5'UTR_Exons"]['Tags/Kb']);
    data[s]['3UTR'] = parseFloat(doc.samples[s].read_distribution["3'UTR_Exons"]['Tags/Kb']);
    data[s]['Introns'] = parseFloat(doc.samples[s].read_distribution.Introns['Tags/Kb']);
    data[s]['TSS'] = parseFloat(doc.samples[s].read_distribution.TSS_up_1kb['Tags/Kb']);
    data[s]['TES'] = parseFloat(doc.samples[s].read_distribution.TES_down_1kb['Tags/Kb']);
    data[s]['%mRNA'] = parseFloat(doc.samples[s].read_distribution.mRNA_frac);
    data[s]['%rRNA'] = parseFloat(doc.samples[s].percent_rRNA);
  }
  emit(project_id, data);
}
5 changes: 5 additions & 0 deletions seed/views/analysis/multiqc/full_doc.map.js
@@ -0,0 +1,5 @@
function (doc) {
  if (doc.entity_type == 'MultiQC_data') {
    emit(doc.project_id, doc);
  }
}
15 changes: 15 additions & 0 deletions seed/views/analysis/multiqc/modules.map.js
@@ -0,0 +1,15 @@
function (doc) {
  if (doc.entity_type == 'MultiQC_data') {
    // Collect the distinct MultiQC module names across all samples
    var mqc_modules = [];
    for (var s in doc.samples) {
      var mods = Object.keys(doc.samples[s]);
      for (var i = 0; i < mods.length; i++) {
        if (mqc_modules.indexOf(mods[i]) > -1) {
          continue;
        }
        mqc_modules.push(mods[i]);
      }
    }
    emit(doc.project_id, mqc_modules);
  }
}
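
Map functions like the ones above can be sanity-checked outside CouchDB by stubbing `emit` and feeding in a fabricated document. The sketch below does this for the modules view; the sample document's shape is an assumption inferred from the fields the view reads, not real seed data.

```javascript
// Collected (key, value) pairs from emit()
const rows = [];
function emit(key, value) {
  rows.push([key, value]);
}

// Copy of the modules map function above
function mapModules(doc) {
  if (doc.entity_type == 'MultiQC_data') {
    var mqc_modules = [];
    for (var s in doc.samples) {
      var mods = Object.keys(doc.samples[s]);
      for (var i = 0; i < mods.length; i++) {
        if (mqc_modules.indexOf(mods[i]) === -1) {
          mqc_modules.push(mods[i]);
        }
      }
    }
    emit(doc.project_id, mqc_modules);
  }
}

// Fabricated document; field names are assumptions based on the view's accesses
mapModules({
  entity_type: 'MultiQC_data',
  project_id: 'P999',
  samples: {
    sampleA: { fastqc: {}, star: {} },
    sampleB: { fastqc: {} }
  }
});

console.log(JSON.stringify(rows)); // [["P999",["fastqc","star"]]]
```

The same trick works for any of the view functions under `seed/views/`, and is a cheap way to catch typos before pushing a design document to the dev CouchDB.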
1 change: 1 addition & 0 deletions seed/views/analysis/names/id_to_name.map.js
@@ -0,0 +1 @@
function(doc) {emit(doc["_id"], doc["project_name"]);}