Python 3: Biopython: Bio.SeqIO.parse

Parse DNA Sequence Records in FASTA or GenBank Formats

nick3499
1 min read · Dec 24, 2017

FASTA Format

First, navigate to the working directory. Then download a FASTA-formatted data file containing DNA sequence records by entering the following in a Unix-like CLI:

wget https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta

Next, to parse the records, run the script that calls Bio.SeqIO.parse on the downloaded file:

python3 ls_orchid_fasta.py

The script prints each record's description line, followed by its sequence:

gi|2765658|emb|Z78533.1|CIZ78533 C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
gi|2765657|emb|Z78532.1|CCZ78532 C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAACAG...GGC', SingleLetterAlphabet())
gi|2765656|emb|Z78531.1|CFZ78531 C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DNA
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAGCAG...TAA', SingleLetterAlphabet())
gi|2765655|emb|Z78530.1|CMZ78530 C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DNA
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAAACAACAT...CAT', SingleLetterAlphabet())
gi|2765654|emb|Z78529.1|CLZ78529 C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DNA
. . .
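
The contents of ls_orchid_fasta.py are not shown above. A minimal sketch that would print output of this shape might look like the following, assuming the script simply loops over Bio.SeqIO.parse and prints each record's description and the repr of its sequence. The helper name iter_records and the FASTA_FILE constant are made up for illustration. Note that Biopython 1.78 and later removed Bio.Alphabet, so on a current install the second line of each pair reads Seq('...') without the SingleLetterAlphabet() argument.

```python
import os

from Bio import SeqIO

# Assumed file name: the file fetched with wget in the step above.
FASTA_FILE = "ls_orchid.fasta"


def iter_records(source, fmt):
    """Yield (description, sequence) pairs from a file path or open handle."""
    for record in SeqIO.parse(source, fmt):
        yield record.description, record.seq


if __name__ == "__main__":
    if os.path.exists(FASTA_FILE):
        for description, seq in iter_records(FASTA_FILE, "fasta"):
            print(description)  # the full FASTA title line, minus the ">"
            print(repr(seq))    # truncated Seq(...) representation
    else:
        print(f"{FASTA_FILE} not found; download it first")
```

Because SeqIO.parse returns an iterator, records are read one at a time, so the same loop should also work on files too large to hold in memory.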

GenBank Format

First, navigate to the working directory. Then download a GenBank-formatted data file containing DNA sequence records by entering the following in a Unix-like CLI:

wget https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.gbk

Next, to parse the records, run the script that calls Bio.SeqIO.parse on the downloaded file:

python3 ls_orchid_gbk.py

The script prints each record's ID and description, followed by its sequence:

Z78533.1 | C.irapeanum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', IUPACAmbiguousDNA())
Z78532.1 | C.californicum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAACAG...GGC', IUPACAmbiguousDNA())
Z78531.1 | C.fasciculatum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGACAGCAG...TAA', IUPACAmbiguousDNA())
Z78530.1 | C.margaritaceum 5.8S rRNA gene and ITS1 and ITS2 DNA.
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAAACAACAT...CAT', IUPACAmbiguousDNA())
Z78529.1 | C.lichiangense 5.8S rRNA gene and ITS1 and ITS2 DNA.
Seq('ACGGCGAGCTGCCGAAGGACATTGTTGAGACAGCAGAATATACGATTGAGTGAA...AAA', IUPACAmbiguousDNA())
. . .
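
The contents of ls_orchid_gbk.py are not shown above either. Judging from the output, it probably joins each record's id and description with " | " and then prints the repr of the sequence. A minimal sketch under that assumption follows; the helper name iter_labelled and the GENBANK_FILE constant are hypothetical. As with FASTA, Biopython 1.78 and later no longer print an alphabet such as IUPACAmbiguousDNA() inside Seq(...).

```python
import os

from Bio import SeqIO

# Assumed file name: the file fetched with wget in the step above.
GENBANK_FILE = "ls_orchid.gbk"


def iter_labelled(source, fmt="genbank"):
    """Yield ("<id> | <description>", sequence) pairs for each parsed record."""
    for record in SeqIO.parse(source, fmt):
        yield f"{record.id} | {record.description}", record.seq


if __name__ == "__main__":
    if os.path.exists(GENBANK_FILE):
        for label, seq in iter_labelled(GENBANK_FILE):
            print(label)      # e.g. versioned accession, then description
            print(repr(seq))  # truncated Seq(...) representation
    else:
        print(f"{GENBANK_FILE} not found; download it first")
```

The only differences from the FASTA case are the format string passed to SeqIO.parse and the fields chosen for printing; the SeqRecord objects expose the same id, description, and seq attributes in both formats.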
