
REDIdb Flatfile Structure

The REDIdb flatfile is the unit of information in the REDIdb database. Its format and representation are similar to those of a GenBank flatfile. Each REDIdb flatfile is divided into the three parts described below:


HEADER

The header contains the main descriptors of the record, including the accession number, organism name, cellular location, type and name of the gene, and the GenBank and PubMed accession numbers.

EDITING

The EDITING section describes all the editing features annotated on the record. It contains specific fields for the editing type and its details, plus a further field listing the positions of every editing event.

SEQUENCES

The last part of the REDIdb flatfile contains the genomic nucleotide sequence and the corresponding transcript, annotated as cDNA.


The following is a sample REDIdb flatfile record. Each element or field is described in the FIELD COMMENTS section below.


ACCESSION   EDI_000000281               2006-03-31
ORGANISM    Anthoceros formosae
LOCATION    chloroplast
SOURCE      gene    rpl21    complete
GENBANK     NC_004543
PUBMED      12527781
EDITING TYPE      Substitution
  DETAILS   C-->U    Experimental evidence
  POSITIONS    n. 5
    178
    181
    311
    322
    326
EDITING TYPE      Substitution
  DETAILS   U-->C    Experimental evidence
  POSITIONS    n. 5
    43
    158
    179
    274
    307
ORIGIN
  GENOMIC
  LENGTH    357
  BASE COUNT  130a  51c  63g  113t
        1 atgaatacat atgcaataat tgataccgga ggtgagcaac tctgagttga accaggaaga
       61 ttttatgata tgcgtcactt tactttacta aatccgagta ttttagattc aaatactaaa
      121 gtattaatct atcgagtatt aatgattcat catgaattga atatcgctct tggagatctc
      181 cggttagaag atgcaacaat taaaggaaga gttttacatt ctcactttaa agacaaaatt
      241 acaatttaca aaatgcgttc taagaaaaag atgtgacgta aattaggata tcgattaaat
      301 ttagcctgat ctgtggtgga tcccacttgt ttcgatggaa aagaattcta caaataa
  cDNA
  LENGTH    357
  BASE COUNT  130a  51c  63g  113t
        1 atgaatacat atgcaataat tgataccgga ggtgagcaac tccgagttga accaggaaga
       61 ttttatgata tgcgtcactt tactttacta aatccgagta ttttagattc aaatactaaa
      121 gtattaatct atcgagtatt aatgattcat catgaatcga atatcgctct tggagattcc
      181 tggttagaag atgcaacaat taaaggaaga gttttacatt ctcactttaa agacaaaatt
      241 acaatttaca aaatgcgttc taagaaaaag atgcgacgta aattaggata tcgattaaat
      301 ttagcccgat ttgtggtgga ttccatttgt ttcgatggaa aagaattcta caaataa
//Stop
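The three-part layout can be illustrated with a minimal Python sketch. The function `split_sections` is hypothetical (it is not part of REDIdb) and assumes, as in the sample above, that the editing part starts at the first `EDITING TYPE` line, the sequence part starts at `ORIGIN`, and a line beginning with `//` closes the record.

```python
from __future__ import annotations


def split_sections(record: str) -> dict[str, list[str]]:
    """Split one REDIdb-style record into header, editing and sequence lines.

    Layout assumed from the sample record: editing blocks begin at the first
    'EDITING TYPE' line, sequences begin at 'ORIGIN', and a line starting
    with '//' ends the record.
    """
    sections: dict[str, list[str]] = {"header": [], "editing": [], "sequences": []}
    current = "header"
    for line in record.splitlines():
        if line.startswith("//"):
            break  # end-of-record marker; ignore anything after it
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("EDITING TYPE"):
            current = "editing"
        elif line.startswith("ORIGIN"):
            current = "sequences"
        sections[current].append(line)
    return sections


sample = """ACCESSION   EDI_000000281               2006-03-31
ORGANISM    Anthoceros formosae
EDITING TYPE      Substitution
  DETAILS   C-->U    Experimental evidence
ORIGIN
  GENOMIC
//"""
for name, lines in split_sections(sample).items():
    print(name, len(lines))  # header 2, editing 2, sequences 2
```

Every repeated `EDITING TYPE` block stays in the editing part, so a record with several editing processes (like the sample, with its C-->U and U-->C blocks) is handled the same way.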


FIELD COMMENTS

ACCESSION

The ACCESSION number is a unique identifier for a sequence record. Each REDIdb accession number consists of the three-letter prefix EDI and a nine-digit number, separated by an underscore, such as EDI_000000281 or EDI_000000253. The "EDI" prefix stands for "EDIting". Accession numbers are permanent: they do not change even when the information in the record is updated.

The date in the ACCESSION field is the date of the last modification; in the sample record above, the record was last modified on 2006-03-31. Dates in REDIdb use the format YYYY-MM-DD, where YYYY is the year, MM the month as a number and DD the day. In many cases, the modification date coincides with the release date.
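The accession format and date format described above can be checked mechanically. The sketch below is a hypothetical helper (`parse_accession_line` is not a REDIdb function) that assumes the ACCESSION line holds exactly three whitespace-separated fields, as in the sample.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# EDI prefix, underscore, nine digits (e.g. EDI_000000281).
ACCESSION_RE = re.compile(r"EDI_[0-9]{9}")


def parse_accession_line(line: str) -> tuple[str, date]:
    """Return (accession, last-modification date) from an ACCESSION line."""
    fields = line.split()
    if len(fields) != 3 or fields[0] != "ACCESSION":
        raise ValueError(f"not an ACCESSION line: {line!r}")
    accession, raw_date = fields[1], fields[2]
    if ACCESSION_RE.fullmatch(accession) is None:
        raise ValueError(f"malformed accession number: {accession!r}")
    # Parse the YYYY-MM-DD date; strptime rejects impossible dates like 2006-02-30.
    modified = datetime.strptime(raw_date, "%Y-%m-%d").date()
    return accession, modified


acc, modified = parse_accession_line("ACCESSION   EDI_000000281               2006-03-31")
print(acc, modified.isoformat())  # EDI_000000281 2006-03-31
```

Returning a `date` object rather than the raw string makes it straightforward to compare modification dates across records.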


ORGANISM

This field contains the name of the source organism (genus and species), based on the nomenclature adopted by GenBank.


LOCATION

LOCATION indicates the cellular compartment in which editing occurs. At present, it is limited to the organelles (mitochondrion and chloroplast), although nuclear and viral editing events will be added to REDIdb in a future release.


SOURCE

The SOURCE field describes the type of molecule the record refers to: a transfer RNA ("tRNA"), a ribosomal RNA ("rRNA"), an intron ("intron") or a coding sequence ("gene"). The molecular type is followed by the current name of the molecule and by the completeness of the annotation ("complete" for a complete sequence, "partial" for a partial sequence). Note that some genes are known by more than one name; before searching REDIdb, check the gene aliases in the table below.

GENE       ALIAS
ccb6c      ccb452, ccmFc
ccb6-nA    ccb382, ccmFN1
ccb6-nB    ccb203, ccmFN2
ccb3       ccb256, ccmC
ccb2       ccmB
ccb6n      ccmFn

When the molecular source of the record is an intron, its name is formed from the name of the gene it belongs to and the intron's index within that gene. For example, "nad7i4" means "intron number 4 of the nad7 gene". For genes containing only one intron, the index is always 1, as in "rps10i1".
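The naming rule can be inverted with a regular expression. The sketch below (`parse_intron_name` is hypothetical) assumes an intron name is always the gene name, the letter "i", then a decimal index, with the last such "i" marking the index.

```python
from __future__ import annotations

import re

# Gene name, then the letter 'i', then the intron index (e.g. nad7i4).
# The greedy (.+) makes the last 'i' followed by digits the index.
INTRON_NAME_RE = re.compile(r"(?P<gene>.+)i(?P<index>[0-9]+)")


def parse_intron_name(name: str) -> tuple[str, int]:
    """Split an intron name into (gene, intron index)."""
    match = INTRON_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not an intron name: {name!r}")
    return match["gene"], int(match["index"])


print(parse_intron_name("nad7i4"))   # ('nad7', 4)
print(parse_intron_name("rps10i1"))  # ('rps10', 1)
```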


GENBANK

This field reports the GenBank accession number of the entry from which the annotated sequence was extracted. When the annotated sequence combines exons stored in separate GenBank entries, more than one accession number may appear.


PUBMED

Like the GENBANK field, PUBMED lists the PubMed accession numbers of published articles related to the annotation.


EDITING TYPE

This field marks the start of the description of an editing process and gives its type, which can be "Substitution", "Insertion" or "Deletion". The field can appear several times in a REDIdb flatfile, once for each distinct editing process occurring along the annotated sequence. In fact, a given sequence can undergo RNA editing by insertions or deletions and by substitutions at the same time.


DETAILS

DETAILS contains the specific details of each editing type. For substitutions, they are written as "genomic nucleotide-->edited cDNA nucleotide"; for example, C-to-U and U-to-C substitutions are shown as "C-->U" and "U-->C", respectively. When a sequence is edited by insertions or deletions, each inserted or deleted mononucleotide or dinucleotide is enclosed in arrows built from the greater-than (>) and less-than (<) symbols: insertions are written "->inserted nucleotide<-" and deletions "<-deleted nucleotide->". For example, a C insertion and a C deletion are shown as "->C<-" and "<-C->", respectively. The DETAILS field also states the evidence supporting the annotated editing, which may be experimental or based on computational prediction.
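The three notations can be told apart by their arrow shapes. The sketch below is a hypothetical parser, not part of REDIdb; it assumes the notation is the first token after the DETAILS keyword, that the rest of the line is the free-text evidence, and that nucleotides are written with the letters A, C, G, T or U.

```python
from __future__ import annotations

import re

# Notations from the DETAILS description; the nucleotide alphabet is assumed.
SUBSTITUTION_RE = re.compile(r"([ACGTU])-->([ACGTU])")
INSERTION_RE = re.compile(r"->([ACGTU]{1,2})<-")
DELETION_RE = re.compile(r"<-([ACGTU]{1,2})->")


def parse_details(line: str) -> dict[str, str]:
    """Parse a DETAILS line such as '  DETAILS   C-->U    Experimental evidence'."""
    body = line.strip()
    if body.startswith("DETAILS"):
        body = body[len("DETAILS"):]
    # First token is the notation; the remainder is the evidence text.
    notation, _, evidence = body.strip().partition(" ")
    result = {"evidence": evidence.strip()}
    if m := SUBSTITUTION_RE.fullmatch(notation):
        result.update(kind="substitution", genomic=m[1], edited=m[2])
    elif m := INSERTION_RE.fullmatch(notation):
        result.update(kind="insertion", nucleotides=m[1])
    elif m := DELETION_RE.fullmatch(notation):
        result.update(kind="deletion", nucleotides=m[1])
    else:
        raise ValueError(f"unrecognised DETAILS notation: {notation!r}")
    return result


print(parse_details("  DETAILS   ->C<-    Experimental evidence")["kind"])  # insertion
```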


POSITIONS

The POSITIONS field gives the total number of editing events for each editing type, followed by every position at which RNA editing has been observed. When the exact position of an editing event is ambiguous, a star (*) is added to it. See accession EDI_000000632 for an example.
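A POSITIONS block can be read into (position, ambiguous) pairs and checked against its announced count. In this hypothetical sketch the header shape "POSITIONS n. N" is taken from the sample; where exactly the star sits relative to the number is an assumption, so any "*" on a position line is treated as the ambiguity mark. The block in the usage line is a toy example, not real REDIdb data.

```python
from __future__ import annotations


def parse_positions(lines: list[str]) -> list[tuple[int, bool]]:
    """Parse a POSITIONS block into (position, ambiguous) pairs.

    lines[0] is the header ('POSITIONS    n. 5'); each following line holds
    one position. A '*' anywhere on a position line marks it as ambiguous
    (the star's placement is an assumption of this sketch).
    """
    header = lines[0].split()
    if len(header) != 3 or header[:2] != ["POSITIONS", "n."]:
        raise ValueError(f"unexpected POSITIONS header: {lines[0]!r}")
    expected = int(header[2])
    positions = []
    for raw in lines[1:]:
        token = raw.strip()
        ambiguous = "*" in token
        positions.append((int(token.replace("*", "")), ambiguous))
    if len(positions) != expected:
        raise ValueError(f"header announces {expected} positions, found {len(positions)}")
    return positions


toy_block = ["  POSITIONS    n. 3", "    178", "    181*", "    311"]
print(parse_positions(toy_block))  # [(178, False), (181, True), (311, False)]
```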


ORIGIN

ORIGIN marks the start of the nucleotide sequences.


GENOMIC

GENOMIC introduces the genomic nucleotide sequence.
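In the sample, each sequence line starts with the 1-based position of its first base, followed by blocks of ten bases, as in GenBank. The hypothetical `read_sequence` below is a sketch that joins such lines into one string and uses the leading numbers as a consistency check; the two input lines in the usage example are shortened toy lines, not a real record.

```python
from __future__ import annotations


def read_sequence(lines: list[str]) -> str:
    """Join GenBank-style sequence lines into one continuous sequence.

    Each line starts with the 1-based position of its first base, followed by
    blocks of up to ten bases; that position is checked against the bases read.
    """
    parts = []
    length = 0
    for line in lines:
        start, *blocks = line.split()
        if int(start) != length + 1:
            raise ValueError(f"line should start at {length + 1}, not {start}")
        chunk = "".join(blocks)
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts)


seq = read_sequence([
    "        1 atgaatacat atgcaataat",
    "       21 tgataccgga",
])
print(len(seq), seq)  # 30 atgaatacatatgcaataattgataccgga
```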


LENGTH

LENGTH is the length, in nucleotides, of the sequence that follows (genomic or cDNA).


BASE COUNT

BASE COUNT gives the number of occurrences of each nucleotide (a, c, g, t) in the sequence that follows. The genomic and cDNA base counts can differ; in the case of extensive insertions or deletions, they may differ substantially.
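LENGTH and BASE COUNT can be recomputed from the sequence to check a record. The sketch below uses hypothetical helper names and assumes the BASE COUNT layout of the sample ("130a  51c  63g  113t"); only a, c, g and t are counted, so any other symbols in a sequence are ignored.

```python
from collections import Counter


def base_count_line(sequence: str) -> str:
    """Format a BASE COUNT value in the sample's style, e.g. '130a  51c  63g  113t'."""
    counts = Counter(sequence.lower())
    return "  ".join(f"{counts[base]}{base}" for base in "acgt")


def header_matches(sequence: str, length_field: str, base_count_field: str) -> bool:
    """Check LENGTH and BASE COUNT values against the sequence they describe."""
    if len(sequence) != int(length_field):
        return False
    # Compare whitespace-separated tokens so spacing differences do not matter.
    return base_count_line(sequence).split() == base_count_field.split()


print(base_count_line("atgaatacat"))  # 5a  1c  1g  3t
```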


cDNA

cDNA introduces the edited transcript, written as complementary DNA, that corresponds to the genomic sequence.
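For records edited only by substitution, the genomic and cDNA sequences have the same length, so the edited positions can be recovered by comparing them base by base. The hypothetical `substitution_sites` below is a minimal sketch of that comparison, assuming both sequences are already plain strings; applied to the sample record, it reproduces the POSITIONS lists (C-->U at 178, 181, 311, 322, 326 and U-->C at 43, 158, 179, 274, 307). Records with insertions or deletions would need a proper alignment instead.

```python
from __future__ import annotations


def substitution_sites(genomic: str, cdna: str) -> dict[str, list[int]]:
    """Group the 1-based positions where genomic and cDNA bases differ.

    Only meaningful when editing is purely by substitution, so that the two
    sequences have the same length and align base by base. Keys use the
    DETAILS notation, with T written as U.
    """
    if len(genomic) != len(cdna):
        raise ValueError("lengths differ: insertions/deletions need a real alignment")
    sites: dict[str, list[int]] = {}
    for pos, (g, c) in enumerate(zip(genomic.upper(), cdna.upper()), start=1):
        if g != c:
            key = f"{g}-->{c}".replace("T", "U")
            sites.setdefault(key, []).append(pos)
    return sites


print(substitution_sites("acgtca", "acgcta"))  # {'U-->C': [4], 'C-->U': [5]}
```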


STOP

STOP marks the end of the flatfile; it is always indicated by the symbols //.
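Because every record ends with //, a file holding several concatenated flatfiles can be split into records on that marker. The generator below is a hypothetical sketch; it assumes each terminator line begins with // (the sample shows "//Stop"), drops blank lines, and ignores anything after the last terminator.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each record in a stream of concatenated flatfiles.

    Assumes each record is terminated by a line beginning with '//'; blank
    lines are dropped, and lines after the last terminator are ignored.
    """
    buffer: list[str] = []
    for line in lines:
        if line.startswith("//"):
            if buffer:
                yield "\n".join(buffer)
            buffer = []
        elif line.strip():
            buffer.append(line.rstrip("\n"))
```

Since iterating over an open text file yields its lines, a file object can be passed to `iter_records` directly.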



If you need more details about REDIdb flatfile fields, please contact us.


Copyright ©2006-2007 Ernesto Picardi