#!/bin/bash
##################################################################################
#Andy Rampersaud, 01.27.16
##################################################################################
#Assumptions for this job to run correctly:
#1. You have already run the TopHat_Single_End job
#2. Your data is organized in the following way:
#You have a data set dir such as:
#/projectnb/wax-es/aramp10/G83_Samples
#Within this dir you have sample specific folders such as:
#G83_M1
#G83_M2
#G83_M3
#G83_M4
#Within each sample specific folder you have a "fastq" folder with:
#Files: *_R1_*.fastq.gz and *_R2_*.fastq.gz such as:
#Waxman-TP17_CGATGT_L007_R1_001.fastq.gz
#Waxman-TP17_CGATGT_L007_R2_001.fastq.gz
#Within each sample specific folder you have a "tophat2" folder containing output files
##################################################################################
#Fill in the following information:
##################################################################################
#Information about your data set
#As mentioned above, you should have a data set dir containing your sample specific folders:
#Dataset_DIR=/projectnb/wax-es/aramp10/G83_Samples
##################################################################################
#Samples to process
#To facilitate processing of samples in parallel we can use a text file that lists the samples to analyze
#Note: this text file is still valid even if there is only one sample to process
#You need to have a "Sample_Labels" dir within your Dataset_DIR
#Within the Sample_Labels dir have a Sample_Labels.txt such that:
################################################
#The text file is formatted like the following:
#----------------------------------------------
#Sample_DIR	Sample_ID	Description
#Sample_Waxman-TP17	G83_M1	Male 8wk-pool 1
#Sample_Waxman-TP18	G83_M2	Male 8wk-pool 2
#Sample_Waxman-TP19	G83_M3	Female 8wk-pool 1
#Sample_Waxman-TP20	G83_M4	Female 8wk-pool 2	
#----------------------------------------------
#The 1st column: The Sample_DIR name
#The 2nd column: Waxman Lab Sample_ID 
#The 3rd column: Sample's description 
################################################
#Sample_Labels_DIR=${Dataset_DIR}/Sample_Labels
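#---------------------------------------------------------------------------------
#Sketch (not part of the original pipeline): one possible way to loop over the
#samples listed in Sample_Labels.txt.
#Assumptions: the file is tab-delimited with a single header line, as shown above.
#The function name list_sample_labels is hypothetical.
list_sample_labels() {
    #$1: path to Sample_Labels.txt
    local labels_file="$1"
    local sample_dir sample_id description
    #Skip the header line, then read the three tab-separated columns
    #(the "|| [ -n ... ]" part keeps a last line that has no trailing newline)
    tail -n +2 "${labels_file}" | while IFS=$'\t' read -r sample_dir sample_id description || [ -n "${sample_dir}" ]
    do
        #Ignore blank lines
        [ -z "${sample_dir}" ] && continue
        #Print Sample_DIR and Sample_ID, separated by a tab
        printf '%s\t%s\n' "${sample_dir}" "${sample_id}"
    done
}
#Example usage (after setting Sample_Labels_DIR):
#list_sample_labels "${Sample_Labels_DIR}/Sample_Labels.txt"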
##################################################################################
#GTF_Files_DIR
#This dir must contain the GTF files used for counting
#Feel free to use my dir but it's better practice to have a copy in your own Dataset_DIR
#GTF_Files_DIR=/projectnb/wax-es/aramp10/GTF_Files
##################################################################################
#Unlike HTSeq, the featureCounts program does not have different counting modes
#Here's a description of how the program works:
#http://bioinf.wehi.edu.au/featureCounts/
#---------------------------------------------------------------------------------
# Overlap between reads and features
#A read is said to overlap a feature if at least one read base is found to overlap the feature. For paired-end data, a fragment (or template) is said to overlap a feature if any of the two reads from that fragment is found to overlap the feature.

#By default, featureCounts does not count reads overlapping with more than one feature (or more than one meta-feature when summarizing at meta-feature level). Users can use the -O option to instruct featureCounts to count such reads (they will be assigned to all their overlapping features or meta-features).

#Note that, when counting at the meta-feature level, reads that overlap multiple features of the same meta-feature are always counted exactly once for that meta-feature, provided there is no overlap with any other meta-feature. For example, an exon-spanning read will be counted only once for the corresponding gene even if it overlaps with more than one exon. 
#---------------------------------------------------------------------------------
##################################################################################
#Type of sequencing:
#-s <int>  	Indicate if strand-specific read counting should be performed.
#              	It has three possible values:  0 (unstranded), 1 (stranded) and
#              	2 (reversely stranded). 0 by default.
#What is the strandedness of your RNA-seq dataset?
#0 (unstranded):		the dataset is unstranded
#1 (stranded):			the dataset is stranded; reads map to the same strand as the transcript
#2 (reversely stranded):	the dataset is stranded; reads map to the strand opposite the transcript
#---------------------------------------------------------------------------------
#Emailed note regarding the "NEBNext Ultra Directional kit":
#The RNA molecules from our samples are complementary to the template strand of our gene.
#So first-strand DNA synthesis actually re-creates the template strand of our gene.
#After sequencing this fragment and mapping to the reference genome, the RNA-Seq reads
#map to the template strand of the reference genome, whereas we want them assigned to
#the coding strand. For this reason the "reverse" option makes sense: the data are
#single-end reads mapping to the opposite strand from the feature, which satisfies the
#definition of "reverse" in the HTSeq documentation (the featureCounts equivalent is -s 2).
#---------------------------------------------------------------------------------
#Choose one:
#STRANDEDNESS="0"
#STRANDEDNESS="1"
#STRANDEDNESS="2"
#---------------------------------------------------------------------------------
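#Sketch (not part of the original pipeline): translate an HTSeq-style strand
#setting (no/yes/reverse) into the matching featureCounts -s value (0/1/2).
#The function name htseq_to_featurecounts_strand is hypothetical.
htseq_to_featurecounts_strand() {
    #$1: HTSeq-style strand setting
    case "$1" in
        no)      echo "0" ;;
        yes)     echo "1" ;;
        reverse) echo "2" ;;
        *)       echo "Unknown strand setting: $1" >&2; return 1 ;;
    esac
}
#Example usage:
#STRANDEDNESS=$(htseq_to_featurecounts_strand reverse)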
##################################################################################
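#FEATURE_ID: the GTF attribute used to group features (exons) into meta-features (genes)
#For gene-level counts this will be "gene_id"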
#FEATURE_ID="gene_id"
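#---------------------------------------------------------------------------------
#Sketch (not part of the original pipeline): how STRANDEDNESS and FEATURE_ID could
#be passed to featureCounts. The function name and the example file names are
#hypothetical; "-t exon" is an assumption (it is the featureCounts default feature type).
build_featurecounts_cmd() {
    #$1: strandedness (0, 1 or 2)  $2: GTF attribute  $3: GTF file
    #$4: output counts file        $5: input BAM file
    local cmd=(featureCounts -s "$1" -t exon -g "$2" -a "$3" -o "$4" "$5")
    #Print the command instead of running it, so it can be inspected first
    echo "${cmd[*]}"
}
#Example usage:
#build_featurecounts_cmd "${STRANDEDNESS}" "${FEATURE_ID}" genes.gtf counts.txt accepted_hits.bam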
##################################################################################
#Store the current working directory (the dir this script is run from)
#SCRIPT_DIR=$(pwd)
##################################################################################
#Time hour limit
#On SCC, a 12-hour runtime limit is enforced on every job unless a different limit is specified explicitly.
#A runtime limit can be specified in the format "hh:mm:ss"
#Don't change the following time limit unless you know that your job will run longer than 12 hours
#TIME_LIMIT="12:00:00"
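#---------------------------------------------------------------------------------
#Sketch (not part of the original pipeline): check that a time limit is in
#"hh:mm:ss" form before handing it to the scheduler.
#The function name is_valid_time_limit is hypothetical.
is_valid_time_limit() {
    #$1: time limit string; hours may have two or more digits
    [[ "$1" =~ ^[0-9]{2,}:[0-5][0-9]:[0-5][0-9]$ ]]
}
#Example usage:
#is_valid_time_limit "${TIME_LIMIT}" || echo "TIME_LIMIT must look like 12:00:00" >&2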
##################################################################################