Biotechnosium...

DNA Editing Tool Flips Its Target

Left to right: The side chains of V451 of the base flipping loop and R496 of the CpG recognition loop are in direct van der Waals contact. Next, the two loops—CpG recognition and base flipping—penetrate into the DNA helix from opposite directions. Finally, the 5mC flips out and binds in a cage-like pocket.

DNA Sequencing

The DNA tool applied to the DNA sequence in this figure is a restriction enzyme.

DNA Nanotechnology

Around 2000, Andrew Turberfield (Oxford University's Department of Physics) used DNA to make tweezers, with arms 7 nanometers long. "Of course it's all very speculative," said Dr Turberfield, "but you can imagine, for instance, little factories on chips doing chemistry or simple assembly. You can think of production lines made up of little motors with different reactants being passed from one place to the next."

Tuesday, April 2, 2013 Tags: Bioinformatics, Coding, Perl

Important "How to" commands for Bioinformatics (Perl)

2.1 File format conversion, line counting, counting number of files, etc.

1.    $ wc -l : count the number of lines in a file.
2.    $ ls | wc -l : count the number of files in a directory.
3.    $ tac : print the file in reverse line order, i.e. last line first, first line last.
4.    $ rev : reverse the characters within each line of the file.
5.    $ sed 's/.$//' or sed 's/^M$//' or sed 's/\x0D$//' : convert a DOS file to Unix mode. The first form assumes every line ends in CR/LF; in the second, ^M is a literal carriage return (typed as Ctrl-V Ctrl-M); the third needs GNU sed.
6.    $ sed "s/$/`echo -e \\\r`/" or sed 's/$/\r/' : convert Unix newlines to DOS newlines (the second form needs GNU sed).
7.    $ awk '1; { print "" }' : double-space a file.
8.    $ awk '{ total = total + NF }; END { print total+0 }' : print the number of words in a file.
9.    $ sed '/^$/d' or grep '.' : delete all blank lines in a file.
10.    $ sed '/./,$!d' : delete all blank lines at the beginning of the file.
11.    $ sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' : delete all blank lines at the end of the file.
12.    $ sed -e :a -e 's/<[^>]*>//g;/
13.    $ sed 's/^[ \t]*//' : delete all leading spaces and tabs from each line.
14.    $ sed 's/[ \t]*$//' : delete all trailing spaces and tabs from each line.
15.    $ sed 's/^[ \t]*//;s/[ \t]*$//' : delete both leading and trailing spaces and tabs from each line.
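
As a rough illustration of how a few of the commands above combine, the sketch below converts DOS line endings, strips trailing spaces and tabs, and drops blank lines in a single sed call. The function name clean_text and the sample input are made up for this example; it reads standard input and avoids the GNU-only \t and \x0D escapes by building the literal characters with printf.

```shell
# clean_text is a hypothetical helper name, not a standard command.
# It reads text on stdin and writes the cleaned text to stdout.
clean_text() {
    cr=$(printf '\r')    # literal carriage return
    tab=$(printf '\t')   # literal tab
    # 1) drop a trailing CR, 2) strip trailing spaces/tabs, 3) delete empty lines
    sed -e "s/${cr}\$//" \
        -e "s/[ ${tab}]*\$//" \
        -e '/^$/d'
}

# Made-up sample: DOS line endings, trailing blanks and an empty line.
printf 'ACGT  \r\n\r\n  TTGA\t\r\n' | clean_text
```

Leading whitespace is left alone here on purpose; adding the expression from item 13 would remove it as well.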

2.2 Working with Patterns/numbers in a sequence file

16.    $ awk '/Pattern/ { n++ }; END { print n+0 }' : print the total number of lines that contain Pattern.
17.    $ sed 10q : print the first 10 lines.
18.    $ sed -n '/regexp/p' : print the lines that match the pattern.
19.    $ sed '/regexp/d' : delete the lines that match the regexp.
20.    $ sed -n '/regexp/!p' : print the lines that do not match the pattern.
21.    $ sed '/regexp/!d' : delete the lines that do NOT match the regular expression.
22.    $ sed -n '/^.\{65\}/p' : print lines that are 65 characters or longer.
23.    $ sed -n '/^.\{65\}/!p' : print lines that are shorter than 65 characters.
24.    $ sed -n '/regexp/{g;1!p;};h' : print the line before each pattern match.
25.    $ sed -n '/regexp/{n;p;}' : print the line after each pattern match.
26.    $ sed -n '/^.\{65\}/ {g;1!p;};h' < sojae_seq > tmp : print the names of the sequences that are 65 nucleotides or longer (assumes each name line is directly followed by its sequence on a single line).
27.    $ sed -n '/regexp/,$p' : print from the first line matching the regexp to the end of the file.
28.    $ sed -n '8,12p' : print lines 8 to 12 (inclusive).
29.    $ sed -n '52p' : print only line number 52.
30.    $ sed '/pattern1/,/pattern2/d' < inputfile > outfile : delete every line from a line matching pattern1 through the next line matching pattern2 (both inclusive).
31.    $ sed '20,30d' < inputfile > outfile : delete lines 20 through 30 (inclusive).
32.    awk '/baz/ { gsub(/foo/, "bar") }; { print }' : substitute foo with bar in lines that contain 'baz'.
33.    awk '!/baz/ { gsub(/foo/, "bar") }; { print }' : substitute foo with bar in lines that do not contain 'baz'.
34.    grep -i -B 1 'pattern' filename > out : print, case-insensitively, each sequence containing the pattern together with the sequence name on the line before it (make sure the sequence name and the sequence each occupy a single line).
35.    grep -i -A 1 'seqname' filename > out : print the sequence name together with the sequence on the following line into the file 'out'.
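
The "line before the match" commands above are easiest to see on a tiny FASTA file. The sketch below is only an illustration: the helper names, the temporary file and its three records are invented, and it assumes a strict two-line layout (one '>' name line, then the whole sequence on one line). The sed variant also excludes name lines from the length test, which the plain '/^.\{65\}/' form does not.

```shell
# names_with_motif MOTIF FILE (hypothetical helper): print the names of the
# sequences that contain MOTIF, ignoring case.
names_with_motif() {
    grep -i -B 1 "$1" "$2" | grep '^>'
}

# names_of_long_seqs MINLEN FILE (hypothetical helper): print the names of
# sequences with at least MINLEN characters, using the hold-space idiom
# from item 24 (hold each line, print the held line when the next one matches).
names_of_long_seqs() {
    sed -n '/^[^>].\{'"$(( $1 - 1 ))"'\}/{g;1!p;};h' "$2"
}

# Made-up sample data.
fasta=$(mktemp)
printf '>seq1\nACGTACGT\n>seq2\nTTTT\n>seq3\nccacgtaa\n' > "$fasta"

names_with_motif acgt "$fasta"    # prints >seq1 and >seq3
names_of_long_seqs 8 "$fasta"     # prints >seq1 and >seq3
rm -f "$fasta"
```

If the motif could also occur inside a sequence name, grep -B 1 would report the wrong record, so this approach is best kept for simple identifiers.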

2.3 Inserting Data into a file:

36. gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1' > fileout : inserts 49 'x' characters after the sixth character of every line (lines shorter than six characters are left unchanged).

37. gawk --re-interval 'BEGIN{ s="YourName" }; { sub(/^.{6}/,"&" s) }; 1' : inserts your name after the sixth character of every line.
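
Where gawk is not installed, the same insertion can be approximated with substr() in any POSIX awk. This is a swapped-in technique, not the gawk regex approach above; the helper name insert_after_col and the sample lines are hypothetical.

```shell
# insert_after_col N TEXT (hypothetical helper): insert TEXT after the first
# N characters of every stdin line that has at least N characters.
# Shorter lines pass through unchanged, matching the gawk sub() behaviour.
insert_after_col() {
    awk -v n="$1" -v s="$2" '
        length($0) >= n { $0 = substr($0, 1, n) s substr($0, n + 1) }
        { print }
    '
}

printf 'ATGCATGCAT\nATG\n' | insert_after_col 6 'NNN'
# prints:
# ATGCATNNNGCAT
# ATG
```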

3. Working with Data Files[Tab delimited files]:

3.1    Error Checking and data handling:
38.    awk '{ print NF ":" $0 } ' : print the number of fields of each line followed by the line.
39.    awk '{ print $NF }' : print the last field of each line.
40.    awk 'NF > n' : print every line with more than 'n' fields.
41.    awk '$NF > n' : print every line where the last field is greater than n.
42.    awk '{ print $2, $1 }' : print only the first 2 fields of each line, in reverse order.
43.    awk '{ temp = $1; $1 = $2; $2 = temp; print }' : print every line with the first 2 fields swapped and the remaining fields in their original order.
44.    awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }' : print all the fields of each line in reverse order.
45.    awk '{ $2 = ""; print }' : blank out the 2nd field in each line (the separators around it remain).
46.    awk '$5 == "abc123"' : print each line where the 5th field is equal to 'abc123'.
47.    awk '$5 != "abc123"' : print each line where the 5th field is NOT equal to 'abc123'.
48.    awk '$7 ~ /^[a-f]/' : print each line whose 7th field matches the regular expression (here, begins with a letter from a to f).
49.    awk '$7 !~ /^[a-f]/' : print each line whose 7th field does NOT match the regular expression.
50.    cut -f n1,n2,n3 inputfile > outfile : extract columns (fields) n1, n2, n3 from the input file and write them to the output file. If the delimiter is not TAB, pass it with -d, e.g. cut -d ',' -f n1,n2 inputfile > out
51.    sort -n -k 2,2 -k 4,4 file > fileout : sort numerically on column 2, then on column 4. If -n is not specified, sort compares lexicographically (by character value).
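
A small worked example may help tie the field checks, cut and sort together. Everything in this sketch is illustrative: the helper names and the three-row table are invented, and the tab delimiter is passed explicitly (awk -F and sort -t) because the commands above otherwise split on any whitespace.

```shell
# bad_rows N FILE (hypothetical helper): report every line of a tab-delimited
# FILE whose field count differs from N.
bad_rows() {
    awk -F '\t' -v n="$1" 'NF != n { print NR ": " NF " fields" }' "$2"
}

# sort_by_cols_2_4 FILE (hypothetical helper): numeric sort on column 2,
# ties broken by a numeric sort on column 4.
sort_by_cols_2_4() {
    sort -t "$(printf '\t')" -n -k 2,2 -k 4,4 "$1"
}

# Made-up sample table: name, count, chromosome, length.
tsv=$(mktemp)
printf 'geneA\t12\tchr1\t300\ngeneB\t5\tchr2\t150\ngeneC\t12\tchr1\t100\n' > "$tsv"

bad_rows 4 "$tsv"                      # prints nothing: every row has 4 fields
sort_by_cols_2_4 "$tsv" | cut -f 1,4   # geneB 150, geneC 100, geneA 300
rm -f "$tsv"
```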

4. Miscellaneous:

52.    uniq -u inputfile > out : print only the lines that occur exactly once in the sorted input file.
53.    uniq -d inputfile > out : print one copy of each line that occurs more than once in the sorted input file.
54.    cat file1 file2 file3 … fileN > outfile : concatenate the files one after another into outfile.
55.    paste file1 file2 > outfile : merge two files horizontally, line by line. This is good for merging files with the same number of rows but different column widths.
56.    !pattern:p : print, without running it, the most recent command that starts with 'pattern'.
57.    !! : repeat the last command entered at the shell.
58.    cd ~ : go back to the home directory.
59.    echo {a,t,g,c}{a,t,g,c}{a,t,g,c}{a,t,g,c} : generate all tetramers over 'atgc' (needs a shell with brace expansion, such as bash). For pentamers, hexamers etc., just add more brace groups. NOTE: this is not an efficient sequence shuffler. If you wish to generate longer sequences, use other means.
60.    kill -HUP `ps -aef | grep -i firefox | sort -k 2 -r | sed 1d | awk '{ print $2 }'` : kill a hanging firefox process.
61.    csplit -n 7 input.fasta '/>/' '{*}' : split the file 'input.fasta' at every line containing the delimiter '>'. The output files are named xx followed by a 7-digit number.
62.    find . -name data.txt -print : find and print the path of the file data.txt.
Sample script to perform set operations on sequence files (assumes no sequence name is repeated within a single file):
63.    grep '>' filenameA > list1 # list just the sequence names in file A
grep '>' filenameB > list2 # list the sequence names in file B
cat list1 list2 > tmp # concatenate list1 and list2 into tmp
sort tmp > tmp1 # sort the combined list
uniq -u tmp1 > uniq    # (A ∪ B) - (A ∩ B), i.e. (A - B) ∪ (B - A)
uniq -d tmp1 > double # the intersection (A ∩ B)
cat uniq double > Union # A ∪ B
cat list1 double > tmp
sort tmp | uniq -u > list1uniq # A - B
cat list2 double > tmp
sort tmp | uniq -u > list2uniq # B - A
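
The same sort/uniq idea can be wrapped in small functions so no intermediate files are needed. This is only a sketch: the function names and the two sample lists are hypothetical, and set_difference uses a slightly different trick from the script above (listing B twice so that every name in B occurs at least twice and is dropped by uniq -u). As before, names must not repeat within a single file.

```shell
# Each helper takes two files of names, one name per line.
# LC_ALL=C keeps the sort order independent of the user's locale.
set_intersection() { LC_ALL=C sort "$1" "$2" | uniq -d; }        # A ∩ B
set_symdiff()      { LC_ALL=C sort "$1" "$2" | uniq -u; }        # (A - B) ∪ (B - A)
set_difference()   { LC_ALL=C sort "$1" "$2" "$2" | uniq -u; }   # A - B

# Made-up name lists standing in for list1 and list2.
a=$(mktemp); b=$(mktemp)
printf '>seq1\n>seq2\n>seq3\n' > "$a"
printf '>seq2\n>seq3\n>seq4\n' > "$b"

set_intersection "$a" "$b"   # >seq2 and >seq3
set_symdiff "$a" "$b"        # >seq1 and >seq4
set_difference "$a" "$b"     # >seq1
rm -f "$a" "$b"
```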

PERL ONELINERS:

1.    perl -pe '$\="\n"' : double-space a file.
2.    perl -pe '$_ .= "\n" unless /^$/' : double-space a file, except for blank lines.
3.    perl -pe '$_.="\n"x7' : add 7 blank lines after every line.
4.    perl -ne 'print unless /^$/' : remove all blank lines.
5.    perl -lne 'print if length($_) < 20' : print all lines shorter than 20 characters.
6.    perl -00 -pe '' : collapse runs of consecutive blank lines into a single blank line (make the file single-spaced).
7.    perl -00 -pe '$_.="\n"x4' : expand single blank lines into 4 consecutive blank lines.
8.    perl -pe '$_ = "$. $_"' : number all lines in a file.
9.    perl -pe '$_ = ++$a." $_" if /./' : number only the non-empty lines in a file.
10.    perl -ne 'print ++$a." $_" if /./' : number and print only the non-empty lines in a file.
11.    perl -pe '$_ = ++$a." $_" if /regex/' : number only the lines that match a pattern.
12.    perl -ne 'print ++$a." $_" if /regex/' : number and print only the lines that match a pattern.
13.    perl -ne 'printf "%-5d %s", $., $_ if /regex/' : print each line that matches a pattern, preceded by its line number left-aligned in a 5-character column (perl -ne 'printf "%-5d %s", $., $_' does the same for all lines).
14.    perl -le 'print scalar(grep{/./}<>)' : print the total number of non-empty lines in a file.
15.    perl -lne '$a++ if /regex/; END {print $a+0}' : print the total number of lines that match the pattern.
16.    perl -alne 'print scalar @F' : print the number of fields (words) in each line.
17.    perl -alne '$t += @F; END { print $t}' : find the total number of words in the file.
18.    perl -alne 'map { /regex/ && $t++ } @F; END { print $t }' : find the total number of fields that match the pattern.
19.    perl -lne '/regex/ && $t++; END { print $t }' : find the total number of lines that match a pattern.
20.    perl -le '$n = 20; $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $m' : calculate the GCD (greatest common divisor) of two numbers, here 20 and 35.
21.    perl -le '$a = $n = 20; $b = $m = 35; ($m,$n) = ($n,$m%$n) while $n; print $a*$b/$m' : calculate the LCM (least common multiple) of 20 and 35.
22.    perl -le '$n=10; $min=5; $max=15; $, = " "; print map { int(rand($max-$min))+$min } 1..$n' : generate 10 random integers from 5 to 14 (15 itself is never produced).
23.    perl -le 'print map { ("a".."z","0".."9")[rand 36] } 1..8' : generate an 8-character password from the letters a to z and the digits 0 to 9.
24.    perl -le 'print map { ("a","t","g","c")[rand 4] } 1..20' : generate a random sequence 20 nucleotides long.
25.    perl -le 'print "a"x50' : generate a string of 50 'a' characters.
26.    perl -le 'print join ", ", map { ord } split //, "hello world"' : print the ASCII value of each character in the string hello world.
27.    perl -le '@ascii = (99, 111, 100, 105, 110, 103); print pack("C*", @ascii)' : convert ASCII values into a character string.
28.    perl -le '@odd = grep {$_ % 2 == 1} 1..100; print "@odd"' : generate an array of the odd numbers from 1 to 100.
29.    perl -le '@even = grep {$_ % 2 == 0} 1..100; print "@even"' : generate an array of the even numbers from 1 to 100.
30.    perl -lpe 'y/A-Za-z/N-ZA-Mn-za-m/' file : ROT13-encode the entire file (shift every letter by 13 positions).
31.    perl -nle 'print uc' : convert all text to uppercase.
32.    perl -nle 'print lc' : convert all text to lowercase.
33.    perl -nle 'print ucfirst lc' : lowercase each line, then capitalize its first letter.
34.    perl -ple 'y/A-Za-z/a-zA-Z/' : swap case (upper case to lower case and vice versa).
35.    perl -ple 's/(\w+)/\u$1/g' : capitalize the first letter of every word.
36.    perl -pe 's|\n|\r\n|' : convert Unix newlines into DOS newlines.
37.    perl -pe 's|\r\n|\n|' : convert DOS newlines into Unix newlines.
38.    perl -pe 's|\n|\r|' : convert Unix newlines into (classic) Mac newlines.
39.    perl -pe '/regexp/ && s/foo/bar/' : substitute the first foo with bar in lines that match regexp.

Some other Perl Tricks

Want to display a progress bar while Perl does your job?

For this, Perl provides a nice feature called "pipe opens" ('perldoc -f open' gives more information):

$| = 1;   # unbuffer STDOUT so that each "-" appears immediately
open(my $file, '-|', 'command','option', 'option', ...) or die "Could not run tar ... - $!";
  while (<$file>) {
       print "-";   # one dash per line of output read from the command
  }
  print "\n";
  close($file);

This prints one "-" on the screen for every line of output the command produces, until the process is completed.
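
For comparison, a rough shell analogue of the same idea reads the command's output through a pipe and prints one dash per line. This is not the Perl pipe-open mechanism itself; progress_dashes is a hypothetical name, and the printf pipeline merely stands in for a long-running command.

```shell
# progress_dashes (hypothetical helper): print one "-" for every line read
# from stdin, then a final newline once the input ends.
progress_dashes() {
    while IFS= read -r line; do
        printf '-'
    done
    printf '\n'
}

# Stand-in for a long-running command that reports one line per step.
printf 'step 1\nstep 2\nstep 3\n' | progress_dashes
# prints: ---
```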

Thursday, February 28, 2013 Tags: Bioinformatics, Bioinformatics Database, DNA, Docking, Proteomics

How to Use ADT (AutoDockTools): Docking Manual

A full explanation of docking with ADT (AutoDockTools).

Thursday, June 14, 2012 Tags: Bioinformatics, Bioinformatics Database, Comparative genomics, DNA, DNA Sequencing, Gene structure, Nucleotide Sequenc, RNA sequence

Bioinformatics Database

1. Nucleotide Sequence Databases

1.1 International Nucleotide Sequence Database Collaboration

Database name	Full name and/or description	URL
DDBJ-DNA Data Bank of Japan	All known nucleotide and protein sequences	http://www.ddbj.nig.ac.jp
EMBL-Nucleotide Sequence Database	All known nucleotide and protein sequences	http://www.ebi.ac.uk/embl.html
GenBank	All known nucleotide and protein sequences	http://www.ncbi.nlm.nih.gov/Entrez

1.2. DNA sequences: genes, motifs and regulatory sites

1.2.1. Coding and non-coding DNA

Database name	Full name and/or description	URL
ACLAME	A classification of genetic mobile elements	http://aclame.ulb.ac.be/
CUTG	Codon usage tabulated from GenBank	http://www.kazusa.or.jp/codon/
Genetic Codes	Genetic codes in various organisms and organelles	http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
Entrez Gene	Gene-centered information at NCBI	http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
HERVd	Human endogenous retrovirus database	http://herv.img.cas.cz
Hoppsigen	Human and mouse homologous processed pseudogenes	http://pbil.univ-lyon1.fr/databases/hoppsigen.html
Imprinted Gene Catalogue	Imprinted genes and parent-of-origin effects in animals	http://www.otago.ac.nz/IGC
Islander	Pathogenicity islands and prophages in bacterial genomes	http://www.indiana.edu/~islander
MICdb	Prokaryotic microsatellites	http://www.cdfd.org.in/micas
NPRD	Nucleosome positioning region database	http://srs6.bionet.nsc.ru/srs6/
STRBase	Short tandem DNA repeats database	http://www.cstl.nist.gov/div831/strbase/
TIGR Gene Indices	Organism-specific databases of EST and gene sequences	http://www.tigr.org/tdb/tgi.shtml
Transterm	Codon usage, start and stop signals	http://uther.otago.ac.nz/Transterm.html
UniGene	Non-redundant set of eukaryotic gene-oriented clusters	http://www.ncbi.nlm.nih.gov/UniGene/
UniVec	Vector sequences, adapters, linkers and primers used in DNA cloning, can be used to check for vector contamination	http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
VectorDB	Characterization and classification of nucleic acid vectors	http://genome-www2.stanford.edu/vectordb/
Xpro	Eukaryotic protein-encoding DNA sequences, both intron-containing and intron-less genes	http://origin.bic.nus.edu.sg/xpro/

1.2.2. Gene structure, introns and exons, splice sites

Database name	Full name and/or description	URL
ASAP	Alternative spliced isoforms	http://www.bioinformatics.ucla.edu/ASAP
ASD	Alternative splicing database at EBI, includes three databases AltSplice, AltExtron and AEdb	http://www.ebi.ac.uk/asd
ASDB	Alternative splicing database: protein products and expression patterns of alternatively spliced genes	http://hazelton.lbl.gov/~teplitski/alt
ASHESdb	Alternatively spliced human genes by exon skipping database	http://sege.ntu.edu.sg/wester/ashes/
EASED	Extended alternatively spliced EST database	http://eased.bioinf.mdc-berlin.de/
ECgene	Genome annotation for alternative splicing	http://genome.ewha.ac.kr/ECgene/
EDAS	EST-derived alternative splicing database	http://www.ig-msk.ru:8005/EDAS/
ExInt	Exon intron structure of eukaryotic genes	http://sege.ntu.edu.sg/wester/exint/
HS3D	Homo sapiens splice sites dataset	http://www.sci.unisannio.it/docenti/rampone/
Intronerator	Alternative splicing in C.elegans and C.briggsae	http://www.cse.ucsc.edu/~kent/intronerator/
SpliceDB	Canonical and non-canonical mammalian splice sites	http://www.softberry.com/berry.phtml?topic=splicedb&group=data&subgroup=spldb
SpliceInfo	Modes of alternative splicing in human genome	http://140.115.50.96/SpliceInfo/
SpliceNest	A tool for visualizing splicing of genes from EST data	http://splicenest.molgen.mpg.de/

1.2.3. Transcriptional regulator sites and transcription factors
