The DACC has been working with IGVF members to prepare for assisting with the submission of your data to the IGVF Portal. If you have data ready to submit, please contact the DACC at igvf-portal-help@lists.stanford.edu to start the submission process. The DACC wrangling team will then reach out to set up your submission by providing each submitter with an API access key pair for the Portal, instructions for collecting metadata, and tools that help with data submission.
API access key pairs (provided by the wrangling team) are used to authenticate a user before granting access to data submission. Each submitter is given a unique pair consisting of an access key ID and a secret access key. Please record both, as they are shared only once; if a pair is lost, a new one can be requested.
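The sketch below shows one way a script might attach the key pair to a Portal request. It assumes the Portal accepts the access key ID and secret as HTTP Basic credentials (username and password), and `PORTAL_URL` is an assumption: use whichever server the wrangling team points you to. The request is only built here, never sent.

```python
import base64
import urllib.request

# Assumed server URL; replace with the host the wrangling team gives you.
PORTAL_URL = "https://data.igvf.org"


def build_authenticated_request(path: str, key_id: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the key pair as HTTP Basic auth.

    Assumption: the key ID acts as the username and the secret as the password.
    """
    token = base64.b64encode(f"{key_id}:{secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        PORTAL_URL + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
```

Keep the secret out of source files, for example by reading it from an environment variable at run time.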
Providing rich, reliable metadata is essential for maintaining the high standards set by the IGVF consortium and for making the Portal a valuable resource for the scientific community. Our current data model includes multiple components (for example tissue, primary_cell, and human_donor; see https://data.igvf.org/profiles). Each component has its own set of metadata properties, specifically designed to capture how components relate to each other. All prepared metadata will be reviewed by the DACC wrangling team. Any data submitted to the Portal becomes accessible to internal IGVF consortium members; however, it will not become publicly available (“released”) until the DACC has finished reviewing the submitted data and has received approval from the submitting lab.
The data model supports the submission of objects classified under the following general categories: samples, donors, file sets, files, ontology terms, and other.
Samples | Donors | File Sets | Files | Ontology Terms | Other Types |
---|---|---|---|---|---|
in_vitro_system, primary_cell, tissue, whole_organism, technical_sample | human_donor, rodent_donor | analysis_set, curated_set, measurement_set | reference_data, sequence_data | assay_term, phenotype_term, sample_term | award, biomarker, document, gene, image, lab, page, phenotypic_feature, publication, software, software_version, source, treatment |
*Note: The data model is under active development; see the schemas on GitHub for further detail.
The IGVF data model includes schemas, organized by object type, that list the properties (metadata) describing the associated experimental artifact. These pages are key resources to refer to while preparing spreadsheets for submission.
There are two tools available to submitters:
- Google Sheets (Apps Script): a Google spreadsheet in the web browser with an embedded script that facilitates submission.
- igvf_utils: prepackaged Python scripts that take tab-separated (TSV) files as input.
Each object type (also known as a profile) needs its own spreadsheet, because it has its own set of metadata properties. Note that although primary_cell, tissue, etc. are all categorized as biosamples, they are different object types in our system; the same applies to human_donor and rodent_donor. You will therefore need to prepare one sheet per object type being submitted.
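If your lab keeps all records in one list, a small helper can split them into one batch per object type before writing the sheets. This is a minimal sketch: the `object_type` key is a hypothetical bookkeeping field of your own, not a Portal property, so it is removed from each row.

```python
from collections import defaultdict


def group_by_object_type(records: list) -> dict:
    """Split mixed records into one list per object type (profile).

    Each record is assumed to carry a hypothetical "object_type" key such as
    "human_donor" or "tissue"; it is dropped because it is not a Portal property.
    """
    groups = defaultdict(list)
    for record in records:
        row = dict(record)  # copy so the caller's records are left untouched
        object_type = row.pop("object_type")
        groups[object_type].append(row)
    return dict(groups)
```

Each resulting list then becomes its own sheet or TSV file.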
Let’s walk through the HumanDonor schema and the Tissue biosample schema, noting which properties are needed, the type of each property, and an example value for each.
*Please remember that any property that links to another object must reference an object that already exists on the Portal, using that object's identifier. If you are unsure which identifier to use, please contact the wrangling team.
Descriptions of both the required and optional properties of Human Donor are available in JSON format on the schema page. Required properties must be provided to submit an object record successfully; optional properties are recommended whenever they are available and applicable.
```json
{
  "title": "Human Donor",
  "$id": "/profiles/human_donor.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Derived schema submitting human donors.",
  "type": "object",
  "required": [
    "award",
    "lab",
    "taxa"
  ]
}
```
(Excerpt; see the schema page for the complete definition.)
Required Property | Type | Description | Example Value |
---|---|---|---|
award | string | Link to an associated award or grant object. | /awards/HG012012 |
lab | string | Link to an associated lab. | /labs/john-doe |
taxa | string (enum) | Donor’s taxa. | Homo sapiens |
Optional Property | Type | Description | Example Value |
---|---|---|---|
phenotypic_features | array of strings | List of links to the associated phenotypic features of the donor. | ["HP:0000726", "MONDO:0004975"] |
ethnicities | array of strings (enums) | Terms from the HANCESTRO ontology (http://bioportal.bioontology.org/ontologies/HANCESTRO) are used. | ["Hispanic", "Arab"] |
*Note: Not all optional properties are listed in this example; see the schema page for the full list.
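Before uploading, it can help to catch rows that are missing a required property. A minimal sketch, with the required list copied from the Human Donor schema excerpt above; it treats `None`, whitespace-only strings, and empty lists as missing, which is our own assumption about what counts as "blank".

```python
# Required properties from the Human Donor schema excerpt above.
HUMAN_DONOR_REQUIRED = ["award", "lab", "taxa"]


def _is_blank(value) -> bool:
    """Treat None, whitespace-only strings and empty lists as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_required(record: dict, required: list) -> list:
    """Return the required properties that are absent or blank in a record."""
    return [prop for prop in required if _is_blank(record.get(prop))]
```

The same function works for Tissue if you pass that schema's required list instead.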
```json
{
  "title": "Tissue",
  "$id": "/profiles/tissue.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Schema for submitting a tissue sample",
  "type": "object",
  "required": [
    "award",
    "lab",
    "source",
    "donors",
    "taxa",
    "biosample_term"
  ]
}
```
(Excerpt; see the schema page for the complete definition.)
Required Property | Type | Description | Example Value |
---|---|---|---|
award | string | Grant associated with the submission. | /awards/HG012012 |
lab | string | Lab associated with the submission. | /labs/john-doe |
source | string | Sample provider lab or a vendor. | /sources/atcc |
donors | array of strings | Donor(s) the sample was derived from. | ["IGVFDO1645ZWSY", "IGVFDO2416VXNA"] |
taxa | string (enum) | Organism of the sample. Enum options include: ‘Homo sapiens’, ‘Mus musculus’, ‘Saccharomyces’. | Homo sapiens |
biosample_term | string | Ontology term identifying a biosample. Links to Sample Term object (unique identifier). | /sample-terms/UBERON_0000955 |
Optional Property | Type | Description | Example Value |
---|---|---|---|
pmi | integer | Post-mortem interval (PMI): the amount of time that elapsed between the donor's death and collection of the tissue. | 3 |
pmi_units | string | The unit in which the PMI time was reported. Enum list includes: second, minute, hour, day, week. | day |
preservation_method | string | The method by which the tissue was preserved. Enum list includes cryopreservation, flash-freezing. | flash-freezing |
date_obtained | string | Date harvested. Date should be submitted as YYYY-MM-DD. | 2022-04-02 |
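The optional Tissue values above have simple formats that can be checked locally. This sketch only knows the enum values listed in the table, which the schema may extend, so treat an unknown value as a prompt to check the schema rather than a definite error.

```python
import re
from datetime import datetime

# Only the enum values listed in the table above; the schema may allow others.
PMI_UNITS = {"second", "minute", "hour", "day", "week"}
PRESERVATION_METHODS = {"cryopreservation", "flash-freezing"}


def check_tissue_optional(record: dict) -> list:
    """Return a list of problems found in a tissue record's optional properties."""
    problems = []
    pmi = record.get("pmi")
    if pmi is not None:
        try:
            int(str(pmi))  # accepts 3 or "3" (as read from a TSV); rejects "3.5"
        except ValueError:
            problems.append(f"pmi {pmi!r} is not an integer")
    units = record.get("pmi_units")
    if units is not None and units not in PMI_UNITS:
        problems.append(f"pmi_units {units!r} is not one of the listed units")
    method = record.get("preservation_method")
    if method is not None and method not in PRESERVATION_METHODS:
        problems.append(f"preservation_method {method!r} is not one of the listed methods")
    date = record.get("date_obtained")
    if date is not None:
        text = str(date)
        # The regex enforces zero-padded YYYY-MM-DD; strptime rejects impossible dates.
        try:
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
                raise ValueError(text)
            datetime.strptime(text, "%Y-%m-%d")
        except ValueError:
            problems.append(f"date_obtained {text!r} is not a valid YYYY-MM-DD date")
    return problems
```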
In the sheets or tab-separated files, the first row is designated as the header containing the names of each property to be submitted. Following rows are designated for object records. Multiple records can be submitted at once.
aliases | award | lab | taxa |
---|---|---|---|
john-doe:donor_01 | /awards/HG012012 | /labs/john-doe | Homo sapiens |
john-doe:donor_02 | /awards/HG012012 | /labs/john-doe | Homo sapiens |
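For the igvf_utils route, the same layout can be produced from Python with the standard `csv` module. A minimal sketch: the header row holds the property names, and each following row is one record, exactly as in the table above.

```python
import csv
import io


def to_tsv(records: list, columns: list) -> str:
    """Render records as TSV text: a header row of property names, then one row per record.

    Properties missing from a record are written as empty cells; unexpected
    properties raise ValueError (csv.DictWriter's default behaviour).
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

To save the result, write the returned text to a file opened with `newline=""`.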
For every object submitted to the Portal, the system automatically generates a unique identifier (UUID). For a subset of objects, an accession is also generated in addition to the UUID, following the format IGVF(SM|DO)[0-9]{4}[A-Z]{4}, where SM or DO indicates the object type (sample or donor). The Human Donor and Tissue examples will therefore receive automatically generated accessions of the form IGVFDO[0-9]{4}[A-Z]{4} and IGVFSM[0-9]{4}[A-Z]{4}, respectively.
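The accession format above translates directly into a regular expression. This sketch covers only the SM and DO prefixes named here; other object types that receive accessions may use other codes, which it would not recognize.

```python
import re

# Format from the text above; only donor (DO) and sample (SM) codes are covered.
ACCESSION_RE = re.compile(r"IGVF(SM|DO)[0-9]{4}[A-Z]{4}")


def accession_kind(identifier: str):
    """Return "DO" or "SM" if identifier is a donor or sample accession, else None."""
    match = ACCESSION_RE.fullmatch(identifier)
    return match.group(1) if match else None
```

For example, `accession_kind("IGVFDO1645ZWSY")` gives `"DO"`, while an alias such as `john-doe:donor_01` gives `None`.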
IMPORTANT: While accessions and UUIDs are generated automatically and can be used to find your object of interest, we highly encourage using the aliases property, another form of unique identifier. Aliases are not assigned by the system; they let submitters assign an identifier that is meaningful for internal records, such as an identifier from the lab's LIMS.
Aliases are to be formatted in the following way: ‘[lab name]:[chosen identifier]’ (e.g. john-doe:experiment_01).
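A pair of small helpers can keep aliases consistent across sheets. This is a sketch under two assumptions of ours: the lab name is the lab's identifier (e.g. `john-doe`, as in `/labs/john-doe`), and the first colon separates lab from identifier, so the lab name itself must not contain one.

```python
def make_alias(lab: str, identifier: str) -> str:
    """Combine a lab name and an internal identifier into "lab:identifier"."""
    if not lab or not identifier:
        raise ValueError("both lab and identifier are required")
    if ":" in lab:
        raise ValueError("lab name must not contain ':'")
    return f"{lab}:{identifier}"


def split_alias(alias: str) -> tuple:
    """Split an alias at its first ':' into (lab, identifier)."""
    lab, sep, identifier = alias.partition(":")
    if not sep or not lab or not identifier:
        raise ValueError(f"{alias!r} is not in 'lab:identifier' form")
    return lab, identifier
```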
*Note: These three types of IDs (uuid, accession, and aliases) can be used interchangeably to refer to an object in the spreadsheets used for object submission or modification.
After a successful submission, you can view your object by appending its object type and an identifier (uuid, accession, or alias) to the server URL.
Examples:
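A minimal sketch of composing such a URL. The server address and the collection segment for each object type (e.g. `human-donors`, by analogy with paths such as `/sample-terms/` shown earlier) are assumptions; check the actual paths on the Portal.

```python
from urllib.parse import quote

# Assumed server; submitters may be pointed to a different host.
SERVER = "https://data.igvf.org"


def object_url(collection: str, identifier: str, server: str = SERVER) -> str:
    """Join server, collection path segment and identifier into an object URL.

    The ':' in aliases is kept as-is; other unsafe characters are percent-encoded.
    """
    return f"{server.rstrip('/')}/{collection.strip('/')}/{quote(identifier, safe=':')}/"
```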
If your objects have metadata errors that need fixing, you can patch their property values. The first column header in your spreadsheet should be either accession (for the Google Sheets submitter) or record_id (for igvf_utils), and the properties to be updated go in the following columns.
Example: both tissue records initially specified pmi and pmi_units as 3 days; the patch below changes them to 5 weeks.
accession | pmi | pmi_units |
---|---|---|
john-doe:tissue_01 | 5 | week |
john-doe:tissue_02 | 5 | week |
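A patch sheet like the one above can also be generated from a mapping of corrections. This sketch defaults the first column to `record_id` (igvf_utils) and accepts `accession` for the Google Sheets submitter; requiring every object to update the same properties is our own simplification, so that no cell is left ambiguously empty.

```python
import csv
import io


def patch_tsv(corrections: dict, id_column: str = "record_id") -> str:
    """Build patch-sheet text: the identifier column first, then the properties to update.

    corrections maps an identifier (uuid, accession, or alias) to {property: new value}.
    Every object must update the same set of properties.
    """
    if not corrections:
        raise ValueError("no corrections given")
    properties = sorted(next(iter(corrections.values())))
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, *properties])
    for identifier, update in corrections.items():
        if sorted(update) != properties:
            raise ValueError(f"{identifier} updates a different set of properties")
        writer.writerow([identifier, *(update[prop] for prop in properties)])
    return buffer.getvalue()
```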
Order of Submission Matters

IMPORTANT: The order of submission by object type matters! Objects can be linked to each other, and creating these links depends on submitting objects in the proper order. For example, a tissue object links to specific donor objects through its donors property (a unique identifier must be specified; see the example above). The donor(s) must therefore be submitted first; otherwise you will not be able to reference them when submitting the tissue, which causes an error because donors is a required property.
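Working out a safe order is a topological sort over "links to" relationships. The sketch below uses Python's `graphlib` (3.9+) with a hypothetical dependency map for this guide's Tissue example; in practice award and lab, and often sources and sample terms, already exist on the Portal, and the official order linked below takes precedence.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each object type -> the object types it links to.
DEPENDS_ON = {
    "human_donor": set(),
    "sample_term": set(),
    "source": set(),
    "tissue": {"human_donor", "source", "sample_term"},
}


def submission_order(depends_on: dict) -> list:
    """Order object types so each comes after every type it links to.

    Raises graphlib.CycleError if the links form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())
```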
Current order of object types for submission: link