Introduction

The DACC has been working with IGVF members to prepare for submission of your data to the IGVF Portal. If you have data ready to be submitted, please contact the DACC at igvf-portal-help@lists.stanford.edu to get your submission process started. Once notified, the wrangling team at DACC will reach out and help set up the submission process by providing each submitter with an API access key to the portal, instructions on collecting metadata, and tools available to help with data submission.

API access key pairs

API access key pairs (provided by the wrangling team) are used to authenticate a user before giving access to data submission. Each submitter will be given a unique access key ID and secret access key pair. Please make a note of them as they are shared only once; however, new key pairs can be requested if the previous pair is lost.
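
Because a key pair is shared only once, it helps to store it somewhere other than the submission spreadsheets or scripts. The following is a minimal sketch of reading the pair from environment variables. The variable names are hypothetical and chosen only for this example; they are not names defined by the Portal or by the submission tools.

```python
import os

# Hypothetical variable names chosen for this sketch; they are not
# defined by the Portal or by the submission tools.
KEY_ID_VAR = "IGVF_ACCESS_KEY_ID"
SECRET_VAR = "IGVF_SECRET_ACCESS_KEY"


def load_key_pair(env=None):
    """Return (access key ID, secret access key) read from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in (KEY_ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return env[KEY_ID_VAR], env[SECRET_VAR]
```

Failing early with a clear message makes a lost or unset key pair obvious before any submission is attempted.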

Collecting Metadata for submission

Providing rich, reliable metadata is essential for maintaining the high standards set by the IGVF consortium and making the Portal a valuable resource for the scientific community. Our current data model includes multiple components (for example: tissue, primary_cell, human_donor, etc.; see https://data.igvf.org/profiles). Each component has its own set of metadata properties specifically designed to capture the relations of components to each other. All metadata prepared will be reviewed by the wrangling team at DACC. Any data submitted to the portal becomes accessible to internal IGVF consortium members. However, it will not become publicly available (“released”) until the DACC has finished reviewing the submitted data and has received approval from the submitting lab.

The data model supports the submission of objects classified under the following general categories: samples, donors, file sets, files, ontology terms, and other.

Samples: in_vitro_system, primary_cell, tissue, whole_organism, technical_sample
Donors: human_donor, rodent_donor
File Sets: analysis_set, curated_set, measurement_set
Files: reference_data, sequence_data
Ontology Terms: assay_term, phenotype_term, sample_term
Other types: award, biomarker, document, gene, image, lab, page, phenotypic_feature, publication, technical_sample, software, software_version, source, treatment

*Note: The data model is being actively developed; see the GitHub schemas for further detail.

IGVF Schemas

The IGVF data model includes schemas organized by object type that list the different properties (metadata) describing the associated experimental artifact. These pages are key resources to refer to while preparing spreadsheets for submission.

Data Submission Tools

There are two tools available for submitters to use:

- GoogleSheet (AppScript) - a web-browser Google spreadsheet utilizing an embedded script to facilitate submission.
- igvf_utils - prepackaged Python scripts that take tab-separated files (tsv) as input.

Each object type, also known as a profile, needs its own spreadsheet because it has its own set of metadata properties. Please note that although primary_cell, tissue, etc. are categorized as biosamples, they are considered different object types in our system. The same applies to human_donor and rodent_donor. For that reason, multiple sheets will have to be prepared, one for each object type being submitted.

Submission Examples

Let’s go through the Human Donor schema and the Tissue biosample schema, assess which properties are needed and what type each property is, and assign an example value to each.

*Please remember that for any property that links to another object, an identifier of an existing object on the Portal will have to be provided for reference. If you are unsure of what identifier to use, please contact the wrangling team.

-Human Donor Example-

Descriptions of both required and optional properties for Human Donor can be found here in JSON format. Required properties must be provided to successfully submit an object record. Optional properties should be provided when they are available and applicable.

{
  "title": "Human Donor",
  "$id": "/profiles/human_donor.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Derived schema submitting human donors.",
  "type": "object",
  "required": [
    "award",
    "lab",
    "taxa"
  ]
}...
Required Property | Type | Comments | Example Value
award | string | Link to an associated award or grant object. | /awards/HG012012
lab | string | Link to an associated lab. | /labs/john-doe
taxa | string (enum) | Donor’s taxa. |

Optional Property | Type | Comments | Example Value
phenotypic_features | array of strings | List of links to the associated phenotypic features of the donor. | ["HP:0000726", "MONDO:0004975"]
ethnicities | array of strings (enums) | http://bioportal.bioontology.org/ontologies/HANCESTRO terms are used. | ["Hispanic", "Arab"]

*Note: Not all optional properties are listed in this example; for more properties, see the schema page.
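
Before filling in a sheet, it can be useful to confirm that every record carries the required properties. The sketch below checks a record against the "required" list shown in the Human Donor schema excerpt. The helper name `missing_required` is made up for this illustration and is not part of the Portal or the submission tools.

```python
# Required properties copied from the Human Donor schema excerpt above.
HUMAN_DONOR_REQUIRED = ["award", "lab", "taxa"]


def missing_required(record, required):
    """Return the required property names that are absent or empty in record."""
    return [name for name in required if record.get(name) in (None, "", [])]


# Example values taken from the table above; taxa is left out on purpose.
donor = {"award": "/awards/HG012012", "lab": "/labs/john-doe"}
print(missing_required(donor, HUMAN_DONOR_REQUIRED))
```

An empty result means the record has a value for every required property; it does not check that the values themselves are valid.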

-Tissue Example-

{
  "title": "Tissue",
  "$id": "/profiles/tissue.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "Schema for submitting a tissue sample",
  "type": "object",
  "required": [
    "award",
    "lab",
    "source",
    "donors",
    "taxa",
    "biosample_term"
  ]
}...
Required Property | Type | Description | Example Value
award | string | Grant associated with the submission. | /awards/HG012012
lab | string | Lab associated with the submission. | /labs/john-doe
source | string | Sample provider lab or a vendor. | /sources/atcc
donors | array of strings | Donor(s) the sample was derived from. | ["IGVFDO1645ZWSY", "IGVFDO2416VXNA"]
taxa | string | Organism of the sample. Enum options include: ‘Homo sapiens’, ‘Mus musculus’, ‘Saccharomyces’. | Homo sapiens
biosample_term | string | Ontology term identifying a biosample. Links to Sample Term object (unique identifier). | /sample-terms/UBERON_0000955

Optional Property | Type | Description | Example Value
pmi | integer | Post-mortem interval, the amount of time that elapsed since the death of the donor. | 3
pmi_units | string | The unit in which the PMI time was reported. Enum list includes: second, minute, hour, day, week. | day
preservation_method | string | The method by which the tissue was preserved. Enum list includes cryopreservation, flash-freezing. | flash-freezing
date_obtained | string | Date harvested. Date should be submitted as YYYY-MM-DD. | 2022-04-02
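
The type, enum, and date-format constraints in the optional Tissue properties can be checked locally before submission. The following is a rough sketch that covers only pmi, pmi_units, and date_obtained as described in the table above. The function names are hypothetical, and the Portal's own validation may be stricter or differ in detail.

```python
import re
from datetime import datetime

# Enum values copied from the pmi_units row of the table above.
PMI_UNITS = ("second", "minute", "hour", "day", "week")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True if value is a string holding a real calendar date as YYYY-MM-DD."""
    if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def check_tissue_optional(record):
    """Return a list of problems found in the optional tissue properties."""
    problems = []
    pmi = record.get("pmi")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if pmi is not None and (isinstance(pmi, bool) or not isinstance(pmi, int)):
        problems.append("pmi must be an integer")
    units = record.get("pmi_units")
    if units is not None and units not in PMI_UNITS:
        problems.append("pmi_units must be one of: " + ", ".join(PMI_UNITS))
    date = record.get("date_obtained")
    if date is not None and not is_iso_date(date):
        problems.append("date_obtained must be formatted as YYYY-MM-DD")
    return problems
```

Properties that are absent are skipped, since all three are optional.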

Submitting Objects

In the sheets or tab-separated files, the first row is designated as the header containing the names of each property to be submitted. Following rows are designated for object records. Multiple records can be submitted at once.

-Human Donor Example Input Sheet-

aliases | award | lab | taxa
john-doe:donor_01 | /awards/HG012012 | /labs/john-doe | Homo sapiens
john-doe:donor_02 | /awards/HG012012 | /labs/john-doe | Homo sapiens
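
For the tab-separated route, a sheet like the Human Donor example can be generated with Python's standard csv module. This is a minimal sketch: the function name `records_to_tsv` is made up for illustration, and any further formatting expectations of igvf_utils are not covered here.

```python
import csv
import io


def records_to_tsv(records, columns):
    """Serialize records as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()        # first row: property names
    writer.writerows(records)   # following rows: one object record each
    return buffer.getvalue()


donors = [
    {"aliases": "john-doe:donor_01", "award": "/awards/HG012012",
     "lab": "/labs/john-doe", "taxa": "Homo sapiens"},
    {"aliases": "john-doe:donor_02", "award": "/awards/HG012012",
     "lab": "/labs/john-doe", "taxa": "Homo sapiens"},
]
print(records_to_tsv(donors, ["aliases", "award", "lab", "taxa"]))
```

The returned text can be written to a .tsv file as is.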

Understanding Identifiers and the Importance of the Alias Identifier

For every object submitted to the portal, the system automatically generates a unique identifier (uuid). For a subset of objects, an accession is generated in addition to the uuid, following the format IGVF(SM|DO)[0-9]{4}[A-Z]{4}, where SM or DO indicates the object type. The Human Donor and Tissue examples will have accessions generated automatically, of the form IGVFDO[0-9]{4}[A-Z]{4} and IGVFSM[0-9]{4}[A-Z]{4}, respectively.

IMPORTANT: While accessions and unique identifiers (UUIDs) are automatically generated and can be used to find your object of interest, we highly encourage the use of the aliases property, another form of unique identifier. Aliases are not assigned by the system; they give submitters the opportunity to assign an identifier that makes sense for internal records, such as the identifier from the lab's LIMS.

Aliases are to be formatted in the following way: ‘[lab name]:[chosen identifier]’ (e.g. john-doe:experiment_01).

*Note: These three types of IDs (uuid, accession, and aliases) can be used interchangeably to refer to an object in the spreadsheets used for object submission or modification.
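
To illustrate the three identifier forms, the sketch below classifies a string as an accession, a uuid, or an alias. The accession pattern covers only the SM and DO prefixes described above, and the alias pattern is an assumption based on the ‘[lab name]:[chosen identifier]’ format; the Portal may accept or reject values that this sketch does not.

```python
import re
import uuid

# Accession format described above, limited to the SM and DO prefixes.
ACCESSION_RE = re.compile(r"IGVF(SM|DO)[0-9]{4}[A-Z]{4}")
# Assumed alias shape: text without spaces on both sides of one colon separator.
ALIAS_RE = re.compile(r"[^:\s]+:\S+")


def identifier_kind(value):
    """Classify an identifier as 'accession', 'uuid', 'alias', or 'unknown'."""
    if ACCESSION_RE.fullmatch(value):
        return "accession"
    try:
        uuid.UUID(value)
        return "uuid"
    except ValueError:
        pass
    if ALIAS_RE.fullmatch(value):
        return "alias"
    return "unknown"


print(identifier_kind("IGVFDO1645ZWSY"))
print(identifier_kind("john-doe:experiment_01"))
```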

Reviewing Submissions

After a successful submission, you can view your object by appending the object type, followed by an identifier of the object (uuid, accession, or alias), to the URL of the server.

Examples: appending identifier to url
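
As a rough sketch of this URL pattern, the helper below joins a server URL, an object-type path segment, and an identifier. The server URL is the one given earlier for the profiles page, and the path segments reuse the link examples from the property tables (/labs/john-doe, /sample-terms/UBERON_0000955). The function name is hypothetical, and the exact path segment for each object type should be confirmed with the wrangling team.

```python
from urllib.parse import quote


def object_url(server, object_type_path, identifier):
    """Build <server>/<object type>/<identifier> for viewing an object."""
    base = server.rstrip("/")
    segment = object_type_path.strip("/")
    # Keep ':' readable because aliases use it as a separator.
    return f"{base}/{segment}/{quote(identifier, safe=':')}"


print(object_url("https://data.igvf.org", "sample-terms", "UBERON_0000955"))
print(object_url("https://data.igvf.org", "labs", "john-doe"))
```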

Updating Submitted Objects

If your objects have metadata errors that you need to fix, you can easily patch the affected property values. The first column header in your spreadsheet should be either accession (for the Google Sheets submitter) or record_id (for igvf_utils). The properties to be updated should be specified in the following columns.

Example: for the tissue pmi and pmi_units properties, both records initially specified as 3 days will be changed to 5 weeks.

accession | pmi | pmi_units
john-doe:tissue_01 | 5 | week
john-doe:tissue_02 | 5 | week
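
The patch sheet differs between the two tools only in its first column header. The sketch below builds the rows for either tool from a list of identifiers and a set of property updates. The function name and the tool labels are made up for this example.

```python
# First-column header expected by each tool, as described above.
ID_COLUMN = {"google_sheets": "accession", "igvf_utils": "record_id"}


def patch_rows(identifiers, updates, tool="google_sheets"):
    """Return a header row plus one row per object applying the same updates."""
    header = [ID_COLUMN[tool]] + list(updates)
    values = [str(value) for value in updates.values()]
    return [header] + [[identifier] + values for identifier in identifiers]


rows = patch_rows(
    ["john-doe:tissue_01", "john-doe:tissue_02"],
    {"pmi": 5, "pmi_units": "week"},
)
for row in rows:
    print("\t".join(row))
```

Only the identifier column and the properties being changed are included; other properties are left out of the sheet.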

Order of Submission Matters

IMPORTANT: The order of submission by object type matters! Objects can be related or linked to each other, and creating these relationships depends on submitting objects in the proper order. For example, a tissue object relates to a specific donor object (a unique identifier must be specified; see the example above). Therefore, the donor(s) must be submitted first; otherwise, you will not be able to reference them upon submission, which causes an error if the donors property is required.
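
One way to reason about submission order is to treat the links between object types as dependencies and sort them so that each type comes after everything it links to. The sketch below does this with Python's standard graphlib module (Python 3.9 or later). The dependency map is only an illustration derived from the required link properties in the Human Donor and Tissue examples; it is not the DACC's official order.

```python
from graphlib import TopologicalSorter

# Illustrative map: object type -> object types it links to, taken from
# the required properties of the two examples in this document.
LINKS_TO = {
    "human_donor": {"award", "lab"},
    "tissue": {"award", "lab", "source", "human_donor", "sample_term"},
}


def submission_order(links_to):
    """Return object types ordered so that linked-to types come first."""
    return list(TopologicalSorter(links_to).static_order())


order = submission_order(LINKS_TO)
print(order)
```

Types that do not depend on each other may appear in any relative order, so the printed list is one valid order rather than the only one.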

Current order of object types for submissions link