FireCloud, developed and maintained by the Broad Institute, is one of the National Cancer Institute’s three Cancer Genomics Cloud Pilots (https://software.broadinstitute.org/firecloud/). This document provides a step-by-step description of creating a project on FireCloud and uploading RNA-seq (BAM) data. Before completing this tutorial, please set up a user account and a billing account by following the instructions on the FireCloud website.
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-registration
- Go to <Portal.FireCloud.org>
- Click "Register"
- Sign In with a Gmail or Google Apps Account (e.g., FireCloudUser@gmail.com). When asked to allow FireCloud to view your email address and basic profile info, etc., please click Allow. An explanation of these requests is posted in the Help Forum. You will be prompted to enter New User Registration information.
- Please enter a Contact Email if it differs from your Gmail or Google Apps Account.
- Click Register
(After registering, you will see a message indicating your FireCloud account is inactive. A FireCloud administrator will activate your account within 24 hours.)
https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-google
Here is some information on estimating costs in FireCloud from the development team.
FireCloud runs on the Google Cloud Platform (GCP). All FireCloud costs (cloud storage, compute, data egress etc.) are ultimately billed via Google Billing Accounts. In order to be compatible with multiple cloud environments, institutional payment systems, and security requirements, the FireCloud interface does not directly display any part of the Google Billing Account interface.
Instead, FireCloud will connect a Google Billing Account to an associated FireCloud Billing Project. When you are using FireCloud, it is these FireCloud Billing Projects that you will see in the interface, and to which FireCloud will charge your usage costs.
In order to create or clone a new workspace in FireCloud, you must have access to at least one FireCloud Billing Project. To get started with a FireCloud Billing Project or Google Billing Account, you can check out this [Help Topic] https://software.broadinstitute.org/firecloud/guide/topic?name=firecloud-google in the FireCloud User Guide.
Google Cloud Platform (GCP) Pricing Structure
Example: FireCloud testers ran a MuTect analysis on 1089 tumor/normal pairs.
- Compute: At $0.05/core hour and 23,305.4 core hours, the cost was $1,165.27 ($1.07/run).
- Storage: There were 1089 output files with a total size of 0.001141 TB. At $26/TB/month, the storage cost was $0.03/month. Note: In this example, the pairs referred to BAMs that resided in the TCGA Open Access bucket, so no storage costs were incurred by the testers.
- Download: To download these files, at $120/TB, the cost is $0.14.
- Note: If the testers wanted to estimate costs in advance, they could first run the analysis on a few pairs and calculate the core hours per run.
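The arithmetic in this example can be reproduced with a short script. This is only a sketch: the function name `estimate_costs` and its parameter names are made up for illustration, and the default rates are the figures quoted in the example, not current Google Cloud prices.

```python
def estimate_costs(core_hours, n_runs, output_tb,
                   compute_rate=0.05, storage_rate=26.0, egress_rate=120.0):
    """Return estimated costs in US dollars, rounded to cents.

    compute_rate is $/core hour, storage_rate is $/TB/month and
    egress_rate is $/TB downloaded. The defaults are the rates quoted
    in the example above (assumption: they may not match current prices).
    """
    compute = core_hours * compute_rate
    return {
        "compute_total": round(compute, 2),
        "compute_per_run": round(compute / n_runs, 2),
        "storage_per_month": round(output_tb * storage_rate, 2),
        "download": round(output_tb * egress_rate, 2),
    }


# Figures from the MuTect example: 1089 pairs, 23,305.4 core hours, 0.001141 TB of output.
costs = estimate_costs(core_hours=23305.4, n_runs=1089, output_tb=0.001141)
print(costs)
```

To estimate a larger job from a pilot run, measure the core hours for a few pairs, scale `core_hours` by the number of pairs, and call the function again.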
For more information, you can check out these links:
- Google Compute Pricing
- Google Storage Pricing
- FireCloud Projects and Billing Accounts
https://portal.firecloud.org/
This tutorial was written using Google Chrome as the browser.
Click "Sign In"
Sign in with your Google credentials
(If you have multiple Google accounts, sign in from an incognito window, opened by clicking File -> New Incognito Window)
After you log in, your profile is displayed.
If you have an NIH account, link your NIH account.
Click "Log-In to NIH to re-link your account"
Log in with your NIH credentials
(You will need to renew/re-link your account periodically)
To upload files:
Click on the link under "Google Bucket", which opens the Google Cloud Storage browser for this workspace
You will be taken to the Google Cloud Platform
Click "Upload Files"
Select all the .bam and .bai files
Click "Open"
(A multiple-file upload tends to handle only 3 files simultaneously; repeat this process as necessary until all .bam/.bai files are uploaded and can be seen in the Google Console)
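Because the upload may need to be repeated in several rounds, it can help to check that every .bam file has its index before moving on. The sketch below works on a plain list of file names (for example, copied from the bucket listing). The function name `bams_missing_index` is hypothetical, and it assumes the index for `sample.bam` is named either `sample.bam.bai` or `sample.bai`.

```python
def bams_missing_index(file_names):
    """Return the .bam files that have no matching index in file_names.

    Assumption: the index for sample.bam is named sample.bam.bai or sample.bai.
    """
    names = set(file_names)
    missing = []
    for name in sorted(names):
        if not name.endswith(".bam"):
            continue
        stem = name[: -len(".bam")]
        if f"{name}.bai" not in names and f"{stem}.bai" not in names:
            missing.append(name)
    return missing


# Hypothetical file names, for illustration only.
selected = ["A.bam", "A.bam.bai", "B.bam", "B.bai", "C.bam"]
print(bams_missing_index(selected))  # C.bam has no index in this list
```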
Once the files are uploaded to the Google console, you need to add them to the workspace.
To do this, you need two TSV files that describe the bam/bai files.
The first one, called “participant.tsv”, has information about the participants and disease type.
The participant_id and sample_id values can be chosen arbitrarily, but each must be unique.
participant.tsv
entity:participant_id disease
G28029 BRCA
G41676 BRCA
G41659 LCLL
G41707 LCLL
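A file like this can be typed by hand in a text editor, or generated with a few lines of Python. The following is a minimal sketch using the standard `csv` module; the function name `participant_tsv_text` is made up for illustration, and the header and rows are the ones shown above.

```python
import csv
import io


def participant_tsv_text(rows):
    """Render (participant_id, disease) pairs as tab-separated participant.tsv content."""
    ids = [participant_id for participant_id, _ in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("participant_id values must be unique")
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:participant_id", "disease"])
    writer.writerows(rows)
    return buffer.getvalue()


participants = [
    ("G28029", "BRCA"),
    ("G41676", "BRCA"),
    ("G41659", "LCLL"),
    ("G41707", "LCLL"),
]
participant_text = participant_tsv_text(participants)
print(participant_text, end="")
```

Writing `participant_text` to a file named participant.tsv gives the file used in the import step.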
The second one, called “sample.tsv”, has information about the bam/bai files. It has four columns: “sample_id”, “BAM_index”, “participant_id”, and “BAM”.
“sample_id” is arbitrarily set and should be unique.
“participant_id” should match the participant_id in “participant.tsv”.
“BAM_index”/“BAM” are the addresses of the bai/bam files in Google Cloud Storage.
The address should have the format “gs://[Google_Bucket]/[file_name]”
If your Google Bucket location is “fc-974b342c-b6f2-45e7-b901-xxxxxxxxxxxx” and the file name is “NGS.bam”, the address in that column should be “gs://fc-974b342c-b6f2-45e7-b901-xxxxxxxxxxxx/NGS.bam”.
For clarity, the full file addresses are replaced by placeholders below.
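The address format can be expressed as a small helper. This is an illustrative sketch only: `gs_address` is a hypothetical name, and the bucket and file name are the ones from the example in the text.

```python
def gs_address(bucket, file_name):
    """Build a gs://[Google_Bucket]/[file_name] address.

    Accepts the bucket with or without a leading "gs://" so that a value
    copied from the browser does not end up with a doubled prefix.
    """
    if bucket.startswith("gs://"):
        bucket = bucket[len("gs://"):]
    bucket = bucket.strip("/")
    file_name = file_name.lstrip("/")
    if not bucket or not file_name:
        raise ValueError("both bucket and file name are required")
    return f"gs://{bucket}/{file_name}"


address = gs_address("fc-974b342c-b6f2-45e7-b901-xxxxxxxxxxxx", "NGS.bam")
print(address)
```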
sample.tsv
entity:sample_id BAM_index participant_id BAM
G28029_sample bai_address1 G28029 bam_address1
G41676_sample bai_address2 G41676 bam_address2
G41659_sample bai_address3 G41659 bam_address3
G41707_sample bai_address4 G41707 bam_address4
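As a sketch of how sample.tsv could be generated and checked against the participant list before import, the snippet below uses the standard `csv` module. The function name `sample_tsv_text` is hypothetical, and the addresses are the same placeholders used in the table above rather than real gs:// paths.

```python
import csv
import io


def sample_tsv_text(samples, participant_ids):
    """Render sample.tsv content from (sample_id, bai, participant_id, bam) tuples.

    Raises ValueError if a sample_id repeats or a participant_id is not
    among participant_ids (the ids listed in participant.tsv).
    """
    known = set(participant_ids)
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:sample_id", "BAM_index", "participant_id", "BAM"])
    for sample_id, bai, participant_id, bam in samples:
        if sample_id in seen:
            raise ValueError(f"duplicate sample_id: {sample_id}")
        if participant_id not in known:
            raise ValueError(f"unknown participant_id: {participant_id}")
        seen.add(sample_id)
        writer.writerow([sample_id, bai, participant_id, bam])
    return buffer.getvalue()


participant_ids = ["G28029", "G41676", "G41659", "G41707"]
# Placeholder addresses, as in the table above; real rows would hold gs:// addresses.
samples = [
    (f"{pid}_sample", f"bai_address{i}", pid, f"bam_address{i}")
    for i, pid in enumerate(participant_ids, start=1)
]
sample_text = sample_tsv_text(samples, participant_ids)
print(sample_text, end="")
```

Checking the ids up front catches a mistyped participant_id before the file is uploaded, rather than at import time.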
Once you have created the two TSV files, go back to the workspace.
Click on the "Data" tab
Click "Import Data..."
Click "Import from file"
Click "Choose file..."
Select "participant.tsv" that you created above
Click "Open"
Click "Upload" and you should see an "Upload Successful" message
Click "Import Data..."
Click "Import from file"
Click "Choose file..."
Select "sample.tsv" that you created above
Click "Open"
Click "Upload" and you should see an "Upload Successful" message
That’s it. You have uploaded the files to the workspace.
When you click on the address for one of the bam files, you will see:
Object: G28029_pe.Aligned.sortedByCoord.out.sorted.bam
File size: 974.40 MB
Open (right-click to download)
Warning: Downloading this file may incur a large data egress charge
Please note that uploading to FireCloud is free, but you will be charged for downloading files.
More to come later on running analyses with these uploaded files.