File format conversion/line counting/counting number of files etc.
1. $ wc –l
2. $ ls | wc –l : count number of files in a directory.
3. $ tac
4. $ rev
5. $ sed 's/.$//' or sed 's/^M$//' or sed 's/\x0D$//' : converts a dos file into unix mode.
6. $sed "s/$/`echo -e \\\r`/" or sed 's/$/\r/' or sed "s/$//": converts a unix newline into a DOS newline.
7. $ awk '1; { print "" }' : Double space a file.
8. $ awk '{ total = total + NF }; END { print total+0 }' : prints the number of words in a file.
9. $sed '/^$/d' or [grep ‘.’] : Delete all blank lines in a file.
10. $sed '/./,$!d' : Delete all blank lines in the beginning of the file.
11. $sed -e :a -e '/^\n*$/{$d;N;ba' -e '}': Delete all blank lines at the end of the file.
12. $sed -e :a -e 's/<[^>]*>//g;/
13. $sed 's/^[ \t]*//' : deleting all leading white space tabs in a file.
14. $ sed 's/[ \t]*$//' : Delete all trailing white space and tab in a file.
15. $ sed 's/^[ \t]*//;s/[ \t]*$//' : Delete both leading and trailing white space and tab in a file.
2.2 Working with Patterns/numbers in a sequence file
16. $awk '/Pattern/ { n++ }; END { print n+0 }' : print the total number of lines containing the word pattern.17. $sed 10q : print first 10 lines.
18. $sed -n '/regexp/p' : Print the line that matches the pattern.
19. $sed '/regexp/d' : Deletes the lines that matches the regexp.
20. $sed -n '/regexp/!p' : Print the lines that does not match the pattern.
21. $sed '/regexp/!d' : Deletes the lines that does NOT match the regular expression.
22. $sed -n '/^.\{65\}/p' : print lines that are longer than 65 characters.
23. $sed -n '/^.\{65\}/!p' : print lines that are lesser than 65 characters.
24. $sed -n '/regexp/{g;1!p;};h' : print one line before the pattern match.
25. $sed -n '/regexp/{n;p;}' : print one line after the pattern match.
26. $sed -n '/^.\{65\}/ {g;1!p;};h' < sojae_seq > tmp : print the names of the sequences that are larger than 65 nucleotide long.
27. $sed -n '/regexp/,$p' : Print regular expression to the end of file.
28. $sed -n '8,12p' : print line 8 to 12(inclusive)
29. $sed -n '52p' : print only line number 52.
30. $seq ‘/pattern1/,/pattern2/d’ < inputfile > outfile : will delete all the lines between pattern1 and pattern2.
31. $sed ‘/20,30/d’ < inputfile > outfile : will delete all lines between 20 and 30. OR sed ‘/20,30/d’ < input > output will delete lines between 20 and 30.
32. awk '/baz/ { gsub(/foo/, "bar") }; { print }' : Substitute foo with bar in lines that contains ‘baz’.
33. awk '!/baz/ { gsub(/foo/, "bar") }; { print }' : Substitute foo with bar in lines that does not contain ‘baz’.
34. grep –i –B 1 ‘pattern’ filename > out : Will print the name of the sequence and the sequence having the pattern in a case insensitive way(make sure the sequence name and the sequence each occupy a single line).
35. grep –i –A 1 ‘seqname’ filename > out : will print the sequence name as well as the sequence into file ‘out’.
2.3 Inserting Data into a file:
36. gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'
37. gawk --re-interval 'BEGIN{ s="YourName" }; { sub(/^.{6}/,"&" s) }; 1' : Insert your name at the 6 th position in every line.
3. Working with Data Files[Tab delimited files]:
3.1 Error Checking and data handling:
38. awk '{ print NF ":" $0 } ' : print the number of fields of each line followed by the line.
39. awk '{ print $NF }' : print the last field of each line.
40. awk 'NF > n' : print every line with more than ‘n’ fields.
41. awk '$NF > n' : print every line where the last field is greater than n.
42. awk '{ print $2, $1 }'
43. awk '{ temp = $1; $1 = $2; $2 = temp; print }' : prints all the fields in the correct order except the first 2 fields.
44. awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }' : prints all the fields in reverse order.
45. awk '{ $2 = ""; print }' : deletes the 2nd field in each line.
46. awk '$5 == "abc123"' : print each line where the 5th field is equal to ‘abc123’.
47. awk '$5 != "abc123"' : print each line where 5th field is NOT equal to abc123.
48. awk '$7 ~ /^[a-f]/' : Print each line whose 7th field matches the regular expression.
49. awk '$7 !~ /^[a-f]/' : print each line whose 7th field does NOT match the regular expression.
50. cut –f n1,n2,n3..
51. sort –n –k 2,2 –k 4,4 file > fileout : Will conduct a numerical sort of column 2, and then column 4. If –n is not specified, then, sort will do a lexicographical sort(of the ascii value).
4. Miscellaneous:
52. uniq –u inputfile > out : will print only the uniq lines present in the sorted input file.53. uniq –d inputfile > out : will print only the lines that are in doubles from the sorted input file.
54. cat file1 file2 file3 … fileN > outfile : Will concatenate files back to back in outfile.
55. paste file1 file2 > outfile : will merge two files horizontally. This function is good for merging with same number of rows but different column width.
56. !
57. !! : repeat the last command entered at the shell.
58. ~ : Go back to home directory
59. echo {a,t,g,c}{a,t,g,c}{a,t,g,c}{a,t,g,c} : will generate all tetramers using ‘atgc’. If you want pentamers/hexamers etc. then just increase the number of bracketed entities.NOTE: This is not a efficient sequence shuffler. If you wish to generate longer sequences then use other means.
60. kill -HUP ` ps -aef | grep -i firefox | sort -k 2 -r | sed 1d | awk ' { print $2 } ' ` : Kills a hanging firefox process.
61. csplit -n 7 input.fasta '/>/' '{*}' : will split the file ‘input.fasta’ wherever it encounters delimiter ‘>’. The file names will appear as 7 digit long strings.
62. find . -name data.txt –print: finds and prints the path for file data.txt.
Sample Script to make set operations on sequence files:
63. grep ‘>’ filenameA > list1 # Will list just the sequence names in a file names.
grep ‘>’ filenameB > list2 # Will list names for file 2
cat list1 list2 > tmp # concatenates list1 and list2 into tmp
sort tmp > tmp1 # File sorted
uniq –u tmp1 > uniq # AUB – A ∩ B (OR (A-B) U (B-A))
uniq –d tmp1 > double # Is the intersection (A ∩ B)
cat uniq double > Union # AUB
cat list1 double > tmp
sort tmp | uniq –u > list1uniq # A - B
cat list2 double > tmp
sort tmp | uniq –u > list2uniq # B - A
PERL ONELINERS:
1. perl -pe '$\="\n"'
2. perl -pe '$_ .= "\n" unless /^$/' : double space a file except blank lines
3. perl -pe '$_.="\n"x7' : 7 space in a line.
4. perl -ne 'print unless /^$/' : remove all blank lines
5. perl -lne 'print if length($_) < 20' : print all lines with length less than 20.
6. perl -00 -pe '' : If there are multiple spaces, delete all leaving one(make the file a single spaced file).
7. perl -00 -pe '$_.="\n"x4' : Expand single blank lines into 4 consecutive blank lines
8. perl -pe '$_ = "$. $_"': Number all lines in a file
9. perl -pe '$_ = ++$a." $_" if /./' : Number only non-empty lines in a file
10. perl -ne 'print ++$a." $_" if /./' : Number and print only non-empty lines in a file
11. perl -pe '$_ = ++$a." $_" if /regex/' ; Number only lines that match a pattern
12. perl -ne 'print ++$a." $_" if /regex/' : Number and print only lines that match a pattern
13. perl -ne 'printf "%-5d %s", $., $_ if /regex/' : Left align lines with 5 white spaces if matches a pattern (perl -ne 'printf "%-5d %s", $., $_' : for all the lines)
14. perl -le 'print scalar(grep{/./}<>)' : prints the total number of non-empty lines in a file
15. perl -lne '$a++ if /regex/; END {print $a+0}' : print the total number of lines that matches the pattern
16. perl -alne 'print scalar @F' : print the total number fields(words) in each line.
17. perl -alne '$t += @F; END { print $t}' : Find total number of words in the file
18. perl -alne 'map { /regex/ && $t++ } @F; END { print $t }' : find total number of fields that match the pattern
19. perl -lne '/regex/ && $t++; END { print $t }' : Find total number of lines that match a pattern
20. perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m' : will calculate the GCD of two numbers.
21. perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m' : will calculate lcd of 20 and 35.
22. perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n' : Generates 10 random numbers between 5 and 15.
23. perl -le 'print map { ("a".."z",”0”..”9”)[rand 36] } 1..8': Generates a 8 character password from a to z and number 0 – 9.
24. perl -le 'print map { ("a",”t”,”g”,”c”)[rand 4] } 1..20': Generates a 20 nucleotide long random residue.
25. perl -le 'print "a"x50': generate a string of ‘x’ 50 character long
26. perl -le 'print join ", ", map { ord } split //, "hello world"': Will print the ascii value of the string hello world.
27. perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)': converts ascii values into character strings.
28. perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"': Generates an array of odd numbers.
29. perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"': Generate an array of even numbers
30. perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file: Convert the entire file into 13 characters offset(ROT13)
31. perl -nle 'print uc' : Convert all text to uppercase:
32. perl -nle 'print lc' : Convert text to lowercase:
33. perl -nle 'print ucfirst lc' : Convert only first letter of first word to uppercas
34. perl -ple 'y/A-Za-z/a-zA-Z/' : Convert upper case to lower case and vice versa
35. perl -ple 's/(\w+)/\u$1/g' : Camel Casing
36. perl -pe 's|\n|\r\n|' : Convert unix new lines into DOS new lines:
37. perl -pe 's|\r\n|\n|' : Convert DOS newlines into unix new line
38. perl -pe 's|\n|\r|' : Convert unix newlines into MAC newlines:
39. perl -pe '/regexp/ && s/foo/bar/' : Substitute a foo with a bar in a line with a regexp.
Some other Perl Tricks
Want to display some progress bars while perl does your job:
For this perl provides a nice utility called "pipe opens" ('perldoc -f open' will provide more info)
open(my $file, '-|', 'command','option', 'option', ...) or die "Could not run tar ... - $!";
while (<$file>) {
print "-";
}
print "\n";
close($file);
Will print - on the screen till the process is completed
No Response to "Important "How to" commands for Bioinformatics (Perl)"
Post a Comment