MIGEC log structure
Below is the description of log files produced by various MIGEC routines.
Checkout
De-multiplexing, barcode extraction and overlapping:
INPUT_FILE_1
first input file containing R1 reads
INPUT_FILE_2
second input file containing R2 reads
SAMPLE
sample name
MASTER
number of reads where primary (master) barcode was detected
SLAVE
number of reads where secondary (slave) barcode was detected
MASTER+SLAVE
number of reads where both barcodes were detected
OVERLAPPED
number of successfully overlapped reads
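As a minimal sketch of how these counters might be used, the snippet below reads a checkout log and reports, per sample, the fraction of master-barcoded reads that also carry the slave barcode. It assumes the log is a tab-delimited table whose header row uses the field names listed above. The sample row, the numbers and the helper name checkout_rates are invented for illustration.

```python
import csv
import io

# Hypothetical log content: tab-delimited with a header row (an assumption
# about the on-disk layout; the numbers are invented for illustration).
EXAMPLE_LOG = (
    "INPUT_FILE_1\tINPUT_FILE_2\tSAMPLE\tMASTER\tSLAVE\tMASTER+SLAVE\tOVERLAPPED\n"
    "R1.fastq.gz\tR2.fastq.gz\tS1\t1000\t900\t800\t400\n"
)


def checkout_rates(log_text):
    """Return, per sample, the fraction of master-barcoded reads
    in which the slave barcode was detected as well."""
    rates = {}
    for row in csv.DictReader(io.StringIO(log_text), delimiter="\t"):
        master = int(row["MASTER"])
        both = int(row["MASTER+SLAVE"])
        # Guard against samples with no master-barcoded reads.
        rates[row["SAMPLE"]] = both / master if master else 0.0
    return rates


print(checkout_rates(EXAMPLE_LOG))
```

A low fraction for a sample would suggest that the slave barcode is often missed, which may be worth checking before assembly.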
Histogram
The routine produces a number of histograms for UMI coverage, i.e. statistics of the number of reads tagged with a given UMI:
overseq.txt
contains sample id and sample type (single/paired/overlapped) in the header, followed by UMI coverage (MIG size). Each row has total read counts for UMIs corresponding to a given UMI coverage
overseq-units.txt
same as overseq.txt, but lists numbers of unique UMIs, not total read counts
estimates.txt
contains sample id, sample type, total number of reads (TOTAL_READS) and UMIs (TOTAL_MIGS) in the sample and selected thresholds:
OVERSEQ_THRESHOLD - UMI coverage threshold,
COLLISION_THRESHOLD - if greater than or equal to OVERSEQ_THRESHOLD, the routine will search for UMIs that differ by a single mismatch and have a large difference in counts, and treat them as being the same UMI,
UMI_QUAL_THRESHOLD - threshold for min UMI sequence quality,
UMI_LEN - UMI length
collision1.txt
same as overseq.txt, but lists only UMIs that are likely to be erroneous (i.e. have a 1-mismatch UMI neighbour with a substantially higher count)
collision1-units.txt
same as collision1.txt, but lists numbers of unique UMIs, not total read counts
pwm.txt and pwm-units.txt
a position weight matrix (PWM) representation of all UMI sequences
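The notion of an erroneous UMI used in collision1.txt can be illustrated with a small sketch. It flags a UMI when some neighbour at exactly one mismatch has a much higher read count. This is not MIGEC's own implementation: the function names and the ratio cutoff of 20 are assumptions chosen for illustration, and the all-pairs comparison is quadratic, so it only suits small inputs.

```python
def one_mismatch(a, b):
    """True if two equal-length UMIs differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def find_collisions(umi_counts, ratio=20.0):
    """Map each likely-erroneous UMI to its most abundant 1-mismatch parent.

    `ratio` is a hypothetical cutoff: a UMI is flagged when a 1-mismatch
    neighbour has at least `ratio` times more reads.
    """
    collisions = {}
    for umi, count in umi_counts.items():
        for other, other_count in umi_counts.items():
            if one_mismatch(umi, other) and other_count >= ratio * count:
                best = collisions.get(umi)
                # Keep the neighbour with the highest read count as parent.
                if best is None or other_count > umi_counts[best]:
                    collisions[umi] = other
    return collisions


# Invented counts: ACGA looks like an error of ACGT, while TTTT and TTTA
# have similar counts and are therefore both kept.
counts = {"ACGT": 100, "ACGA": 2, "TTTT": 50, "TTTA": 40}
print(find_collisions(counts))
```

Treating a flagged UMI as the same UMI as its parent then amounts to adding its reads to the parent's group.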
Assemble
Statistics of MIG (group of reads tagged with the same UMI) consensus sequence assembly. Note that the log also contains a summary of pre-filtering steps, e.g. UMIs with low coverage are filtered at this stage:
SAMPLE_ID
sample name
SAMPLE_TYPE
sample type (single/paired/overlapped)
INPUT_FASTQ1
first input file containing R1 reads
INPUT_FASTQ2
second input file containing R2 reads
OUTPUT_ASSEMBLY1
first output file containing R1 consensuses
OUTPUT_ASSEMBLY2
second output file containing R2 consensuses
MIG_COUNT_THRESHOLD
UMI coverage threshold used in assemble procedure
MIGS_GOOD_FASTQ1
number of successfully assembled consensuses from R1
MIGS_GOOD_FASTQ2
same for R2
MIGS_GOOD_TOTAL
number of successfully assembled consensuses that have both R1 and R2 parts
MIGS_TOTAL
total number of input UMIs prior to coverage filtering
READS_GOOD_FASTQ1
number of reads in successfully assembled consensuses from R1
READS_GOOD_FASTQ2
same for R2
READS_GOOD_TOTAL
number of paired reads in successfully assembled consensuses that have both R1 and R2 parts. If a given assembled consensus contains an unequal number of reads in R1 and R2, their average is added to this statistic
READS_TOTAL
total number of input reads prior to coverage filtering
READS_DROPPED_WITHIN_MIG_1
number of reads dropped during consensus assembly because they had a high number of mismatches to the consensus in R1
READS_DROPPED_WITHIN_MIG_2
same for R2
MIGS_DROPPED_OVERSEQ_1
number of UMIs dropped due to insufficient coverage in R1
MIGS_DROPPED_OVERSEQ_2
same for R2
READS_DROPPED_OVERSEQ_1
number of reads in UMIs dropped due to insufficient coverage in R1
READS_DROPPED_OVERSEQ_2
same for R2
MIGS_DROPPED_COLLISION_1
number of UMIs dropped due to being an erroneous (1-mismatch) variant of some UMI with higher count in R1
MIGS_DROPPED_COLLISION_2
same for R2
READS_DROPPED_COLLISION_1
number of reads in UMIs dropped due to being an erroneous (1-mismatch) variant of some UMI with higher count in R1
READS_DROPPED_COLLISION_2
same for R2
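The averaging rule described for READS_GOOD_TOTAL can be sketched as follows. The function name and the input layout (one pair of R1 and R2 read counts per assembled consensus) are hypothetical, and the sketch assumes a plain arithmetic mean without rounding, which may differ from how MIGEC itself handles odd sums.

```python
def reads_good_total(migs):
    """Sum paired-read counts over assembled consensuses.

    `migs` is a list of (r1_reads, r2_reads) tuples, one per consensus
    that has both R1 and R2 parts. When the two counts differ, their
    average is added, as described for READS_GOOD_TOTAL.
    """
    total = 0.0
    for r1_reads, r2_reads in migs:
        total += (r1_reads + r2_reads) / 2
    return total


# Invented example: the first consensus has 10 reads on each side, the
# second has 8 reads in R1 and 6 in R2, contributing their mean of 7.
print(reads_good_total([(10, 10), (8, 6)]))
```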
CdrBlast
Statistics of V(D)J mapping with BLAST algorithm:
SAMPLE_ID
sample name
DATA_TYPE
raw reads (raw) or assembled consensuses (asm)
OUTPUT_FILE
output file name
INPUT_FILES
list of input files
EVENTS_GOOD
number of MIGs (groups of reads tagged with the same UMI; equal to the number of reads for raw data) that were V(D)J mapped and passed the quality threshold
EVENTS_MAPPED
number of MIGs that were V(D)J mapped
EVENTS_TOTAL
number of input MIGs
READS_GOOD
number of reads that were V(D)J mapped and passed the quality threshold
READS_MAPPED
number of reads that were V(D)J mapped
READS_TOTAL
number of input reads
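By definition the three event counters are nested: good events are a subset of mapped events, which are a subset of all input events. The sketch below uses that relation as a sanity check and derives two summary fractions. The function name, the returned keys and the example numbers are made up for illustration, and the same logic would apply to the READS_* counters.

```python
def cdrblast_rates(good, mapped, total):
    """Return the mapped and quality-passing fractions of a CdrBlast run.

    Raises ValueError if the counters violate good <= mapped <= total,
    which follows from the field definitions above.
    """
    if not 0 <= good <= mapped <= total:
        raise ValueError("expected 0 <= good <= mapped <= total")
    return {
        # Share of input events that were V(D)J mapped.
        "mapped_fraction": mapped / total if total else 0.0,
        # Share of mapped events that passed the quality threshold.
        "good_fraction": good / mapped if mapped else 0.0,
    }


print(cdrblast_rates(good=600, mapped=750, total=1000))
```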
FilterCdrBlastResults
Statistics of the second round of TCR/Ig clonotype filtering that considers the number of supporting reads before and after consensus assembly:
SAMPLE_ID
sample name
OUTPUT_FILE
output file name
INPUT_RAW
input file containing CdrBlast results for raw reads
INPUT_ASM
input file containing CdrBlast results for assembled consensuses
CLONOTYPES_FILTERED
number of clonotypes (unique TCR/Ig V+CDR3 nucleotide+J combinations) that were filtered
CLONOTYPES_TOTAL
number of input clonotypes
EVENTS_FILTERED
number of MIGs in filtered clonotypes
EVENTS_TOTAL
number of input MIGs
READS_FILTERED
number of reads in filtered clonotypes
READS_TOTAL
number of input reads
NON_FUNCTIONAL_CLONOTYPES
number of non-functional clonotypes that contain a stop codon or frameshift in CDR3
NON_FUNCTIONAL_EVENTS
number of MIGs in non-functional clonotypes
NON_FUNCTIONAL_READS
number of reads in non-functional clonotypes