
9.6 Summary TSVs

*_breaksite_summary.tsv and *_breaksite_summary.condensed.tsv summarise all non-singleton break sites for each condition.
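As a minimal sketch of reading one of these files, the example below assumes the file is tab-separated with a header row that uses the column names documented in this section. The inline excerpt, its values, and the `read_summary` helper are hypothetical and only illustrate the shape of the data.

```python
import csv
import io

# Hypothetical excerpt: a real summary file carries many more columns.
EXAMPLE_TSV = (
    "chr\tstart\tend\tcount\tplus_count\tminus_count\n"
    "chr1\t1000\t1004\t12\t7\t5\n"
    "chr2\t5000\t5003\t3\t3\t0\n"
)


def read_summary(handle):
    """Return one dict per break site, keyed by column name (assumed header row)."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_summary(io.StringIO(EXAMPLE_TSV))
for row in rows:
    # Every value is read as a string, so counts are converted explicitly.
    total = int(row["plus_count"]) + int(row["minus_count"])
    print(row["chr"], row["start"], row["end"], total)
```

For a file on disk, the same helper can be given a handle from `open(path, newline="")`, where `path` points at a file matching `*_breaksite_summary.tsv`.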

Columns

Column Description
chr Chromosome of the break site.
start Genomic start coordinate.
end Genomic end coordinate.
count Total breaks at the site (sum of plus_count and minus_count).
plus_count Breaks on the positive (+) strand.
minus_count Breaks on the negative (–) strand.
count_ratio Ratio of strands: max(plus_count, minus_count) / min(...). Value is always ≥ 1 (exactly 1 when both strands have equal counts).
width Width of the recurrent break site.
site_to_context_relative_density Break site density (count / width) divided by context density (breaks in flanking ±50 bp region divided by 100).
percent_id_to_guide_name Percent identity score from semi-global alignment of the guide sequence against the break site ±25 bp.
guide_name_match_seq Guide-like target sequence at the site (if the site intersects an in silico prediction).
guide_name_mismatches Number of mismatches to the guide sequence (if intersecting a prediction).
reproducibility_count Number of replicates in which the break site is observed, out of the total number of replicates (n/m).
count_(sample_name) Break count at the site in each individual replicate.
intersect_to_repeat_mask Intersection with repeat annotations.
intersect_to_gene Intersection with reference genome gene annotations.
control_count Breaks observed at the same interval in the control sample.
normalized_sample_count Breaks at the site in the treated sample normalised per million total breaks.
normalized_control_count Breaks at the site in the control normalised per million total breaks.
normalized_sample_to_control_ratio Ratio of normalised treated to normalised control break counts at the site.
normalized_sample_to_control_context_ratio Ratio of normalised treated to normalised control break counts within the context region (site ±50 bp).
sample_context_count Breaks in the treated sample within the context region (site ±50 bp).
control_context_count Breaks in the control within the context region (site ±50 bp).
context_width Width of the context region (site width + 100 bp).
rationale Reason the site was nominated: frequency-based, homology-based, or both (frequency-based,homology-based).
guide_name Name of the matched guide sequence (if the site intersects an in silico prediction).
break_site_probability_score Percentage probability that the site resulted from the treatment rather than an endogenous process. A cutoff of ≥ 80% is suggested to select sites more likely to be true positives.
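
The derived columns above can be sketched in Python directly from their descriptions. This is an illustration under stated assumptions, not the pipeline's own code: the function names are hypothetical, "total breaks" is taken to mean all breaks in the respective sample, the probability score is assumed to be stored as a number on a 0 to 100 scale, and returning `None` for a zero denominator is a choice made here, since that case is not described.

```python
def count_ratio(plus_count, minus_count):
    """Larger strand count divided by the smaller one."""
    low, high = sorted((plus_count, minus_count))
    if low == 0:
        return None  # assumption: undefined when one strand has no breaks
    return high / low


def relative_density(count, width, flank_breaks, flank_width=100):
    """Site density (count / width) over flanking density (flank_breaks / 100)."""
    context_density = flank_breaks / flank_width
    if context_density == 0:
        return None  # assumption: undefined when the flanks contain no breaks
    return (count / width) / context_density


def per_million(site_breaks, total_breaks):
    """Breaks at the site scaled to one million total breaks."""
    return site_breaks * 1_000_000 / total_breaks


def passes_cutoff(row, cutoff=80.0):
    """Apply the suggested cutoff to a row read as a dict of strings."""
    return float(row["break_site_probability_score"]) >= cutoff


# Made-up numbers for a single site.
ratio = count_ratio(6, 3)
density = relative_density(count=12, width=4, flank_breaks=25)
norm_sample = per_million(12, 2_000_000)
norm_control = per_million(3, 1_500_000)
sample_to_control = norm_sample / norm_control
keep = passes_cutoff({"break_site_probability_score": "85.2"})
```

With these inputs the strand ratio is 2.0, the relative density is 12.0, the normalised counts are 6.0 and 2.0 breaks per million, and the site passes the suggested 80% cutoff. Equal strand counts give a ratio of exactly 1.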