Bioinformatics Perl Workshop

course 0.1.0.1

Level: all
0.1.0.1.1
Two real-life problems are presented to show how Perl is used. In Part I, we analyze C. elegans data to address a biological question: we read sequence from a FASTA file and perform in silico digests to analyze SAGE data. In Part II, we cover fetching, munging, and outputting data, a common cycle. We show how LWP::Simple and HTML::TreeBuilder can be used to fetch sample data from the web. Next, we examine how grep/map/sort can be used to manipulate hashes and arrays, and we make some graphs using Graph::Undirected and GraphViz. Finally, we dump the munged data to a file and use grep/sort/uniq on it in bash.
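
The FASTA reading and in silico digest mentioned for Part I can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the sequences are made up, the helper name sage_tag is hypothetical, and it assumes the tag is the 3'-most NlaIII site (CATG) plus the 10 bases that follow it.

```perl
use strict;
use warnings;

# Hypothetical FASTA records; real input would be read from a file.
my $fasta = <<'END';
>gene1
AAACATGTTTGGGCCCAAATTT
CATGACGTACGTAACC
>gene2
GGGGCCCCAAAATTTT
END

# Parse FASTA into a hash: sequence ID => sequence with line breaks removed.
my ( %seq, $id );
for my $line ( split /\n/, $fasta ) {
    if ( $line =~ /^>(\S+)/ ) {
        $id = $1;
        $seq{$id} = '';
    }
    elsif ( defined $id ) {
        $seq{$id} .= uc $line;
    }
}

# Return the 3'-most CATG site plus the next 10 bases (assumed tag length),
# or undef if there is no site or too little sequence after it.
sub sage_tag {
    my ($dna) = @_;
    my $pos = rindex $dna, 'CATG';
    return undef if $pos < 0;
    my $tag = substr $dna, $pos, 14;
    return length($tag) == 14 ? $tag : undef;
}

for my $gene ( sort keys %seq ) {
    my $tag = sage_tag( $seq{$gene} );
    print "$gene\t", ( defined $tag ? $tag : 'no tag' ), "\n";
}
```

Using rindex finds the last site directly, which avoids collecting every CATG position when only the 3'-most one is needed.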

legend

course code

cat.course.level.sessions.session

e.g. 1.0.1.8

categories

0 | introduction and orientation

1 | perl fundamentals

2 | shell and prompt tools

3 | web development

4 | CPAN Modules

5 | Ruby

levels

0 | all

1 | beginner

2 | intermediate

3 | advanced

lecture code viewer
First Look at Perl
Sheldon McKay
#!/usr/local/bin/perl -w
#
# file altsplice.pl -- count the number of tags/gene and relate
# to the tag/transcript abundance
#
use strict;

my $file = shift or die "Usage: ./altsplice.pl input_file\n";
die "File $file not found\n" unless -e $file;

my $count = 0;

# initialize a hash reference
my $tags = {};

# load data from external files
my @file    = `grep 'coding_RNA' $file`;
my $partial = `cat NlaIII_partial`;
my $polyA   = `cat possible_polyA`;
my @introns = `cat intron_hits`;

# add introns to the other tags
push @file, @introns;

# iterate through the list
for (@file) {
    # split the line into 'words'
    my @line = split;

    # this is what a line would typically look like:
    # count tag source pos'n strand gene locus description

    # assign some variables
    my $freq = $line[0];
    my $tag  = $line[1];
    my $pos  = $line[3];
    my $gene = $line[5];

    # if no gene, it's an intron hit -- find the gene name
    # at the end of the line
    if ( !$gene ) {
        ($gene) = /(\S+)"$/;
    }

    # remove alternative splice suffix
    $gene =~ s/[a-z]$//;

    # initialize an array reference for the gene if we do not have one
    $tags->{$gene} ||= [];

    # skip tags that may be due to internal polyA's or
    # NlaIII partial digestion
    next if reject( $gene, $tag );

    # add the tag's count to the array
    push @{$tags->{$gene}}, $freq;
}

# initialize some hashes we will need
my (%count, %total);

for ( sort keys %{$tags} ) {
    # get the list of tag counts for this gene
    my @tags = @{$tags->{$_}};

    # set the counter to zero
    my $sum = 0;

    # add up all the tag counts
    for my $f (@tags) {
        $sum += $f;
    }

    # count the number of genes in each (number of tags) category;
    # the two updates are separate statements because
    # "$count{$num}++ and ..." would skip the first gene in each
    # category (post-increment returns the old, false value)
    for my $num ( 1..10 ) {
        next unless @tags == $num;
        $count{$num}++;
        $total{$num} += $sum;
    }
    if ( @tags > 10 ) {
        $count{11}++;
        $total{11} += $sum;
    }
}

print "\nLibrary $file: ", scalar( keys %{$tags} ), " genes detected\n";
print "\nTranscripts\tGenes\tAbundance (average)\n";

for ( 1..11 ) {
    next unless $total{$_};
    my $average     = int( $total{$_}/$count{$_} + 0.5 );
    my $transcripts = $_ == 11 ? '>10' : $_;
    print "$transcripts\t\t$count{$_}\t\t$average\n";
}

# this subroutine evaluates each gene/tag pair to see if
# they were previously identified as potential artifacts
sub reject {
    my ($gene, $tag) = @_;

    # are tag and gene in the partial digest file?
    # (\Q...\E keeps the '.' in gene names from acting as a wildcard)
    return 1 if $partial =~ /^\Q$gene\E[a-z]?\s+$tag/m;

    # are tag and gene in the internal polyA file?
    return 1 if $polyA =~ /^\Q$gene\E[a-z]?\s+$tag/m;

    return 0;
}
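
The tallying step of the script (group genes by how many distinct tags they have, then average the summed tag counts per group) can be tried in isolation with a small sketch. The gene names and counts below are hypothetical, and the category is computed directly rather than by looping over 1..10. One general Perl point to keep in mind here: `$count{$num}++` returns the value before the increment, so it is false the first time a category is seen, and chaining it to another update with `and` would silently skip that first gene.

```perl
use strict;
use warnings;

# Hypothetical data: gene => list of tag frequencies.
my %tags = (
    geneA => [5],
    geneB => [3],
    geneC => [ 2, 8 ],
    geneD => [],          # all tags rejected; not counted in any category
);

my ( %count, %total );
for my $gene ( sort keys %tags ) {
    my @freqs = @{ $tags{$gene} };

    # add up all the tag counts for this gene
    my $sum = 0;
    $sum += $_ for @freqs;

    # categories 1..10 by number of tags; 11 stands for "more than 10"
    my $num = @freqs > 10 ? 11 : scalar @freqs;
    next unless $num;

    # update count and total as separate statements
    $count{$num}++;
    $total{$num} += $sum;
}

for my $num ( sort { $a <=> $b } keys %count ) {
    my $average = int( $total{$num} / $count{$num} + 0.5 );
    print "$num\t$count{$num}\t$average\n";
}
```

With this data, category 1 holds two genes with an average abundance of 4, and category 2 holds one gene with an abundance of 10.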

1 | First Look at Perl | 0.1.0.1.1

0.1.0.1.1a.p1 | Processing C Elegans Data | Sheldon McKay | ppt
0.1.0.1.1a.a1 | Processing C Elegans Data | Sheldon McKay | pdf
0.1.0.1.1b.p2 | Fetching Web Data and Making Graphs | Martin Krzywinski | ppt
0.1.0.1.1b.a2 | Fetching Web Data and Making Graphs | Martin Krzywinski | pdf
0.1.0.1.1.c1 | altsplice.pl | Sheldon McKay | code
0.1.0.1.1.c2 | grabdata | Sheldon McKay | code
0.1.0.1.1.c3 | partial.pl | Sheldon McKay | code
0.1.0.1.1.c4 | tagger.pl | Sheldon McKay | code
0.1.0.1.1a.d1 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d2 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d3 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d4 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d5 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d6 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d7 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.d8 | First Look at Perl | Sheldon McKay | data
0.1.0.1.1.i1 | First Look at Perl | Sheldon McKay | images
0.1.0.1.1.i2 | First Look at Perl | Sheldon McKay | images
0.1.0.1.1.i3 | First Look at Perl | Sheldon McKay | images
0.1.0.1.1.i4 | First Look at Perl | Sheldon McKay | images
0.1.0.1.1.i5 | First Look at Perl | Sheldon McKay | images
0.1.0.1.1a.s1 | Processing C Elegans Data | Sheldon McKay | slides
0.1.0.1.1b.s1 | Fetching Web Data and Making Graphs | Martin Krzywinski | slides