REDIdb Flatfile Structure
|
The REDIdb flatfile is the unit of information in the REDIdb database. Its format and representation are similar to those of a GenBank flatfile.
Each REDIdb flatfile is divided into the three parts described below:
|
|
HEADER |
The header contains the main descriptors of the record including accession number, organism name, cellular location, type and name of the gene, GenBank and PubMed accessions.
|
|
EDITING |
The EDITING part describes all editing features annotated on the record. It has specific fields for the editing type and details, plus an additional field listing all positions of each editing event.
|
|
SEQUENCES |
The last part of a REDIdb flatfile contains the genomic nucleotide sequence and the corresponding transcript, annotated as cDNA.
|
|
|
The following is a sample REDIdb flatfile record. Each element or field is described after the record.
|
|
ACCESSION EDI_000000281 2006-03-31
ORGANISM Anthoceros formosae
LOCATION chloroplast
SOURCE gene rpl21 complete
GENBANK NC_004543
PUBMED 12527781
EDITING TYPE Substitution
DETAILS C-->U Experimental evidence
POSITIONS n. 5
178
181
311
322
326
EDITING TYPE Substitution
DETAILS U-->C Experimental evidence
POSITIONS n. 5
43
158
179
274
307
ORIGIN
GENOMIC
LENGTH 357
BASE COUNT 130a 51c 63g 113t
1 atgaatacat atgcaataat tgataccgga ggtgagcaac tctgagttga accaggaaga
61 ttttatgata tgcgtcactt tactttacta aatccgagta ttttagattc aaatactaaa
121 gtattaatct atcgagtatt aatgattcat catgaattga atatcgctct tggagatctc
181 cggttagaag atgcaacaat taaaggaaga gttttacatt ctcactttaa agacaaaatt
241 acaatttaca aaatgcgttc taagaaaaag atgtgacgta aattaggata tcgattaaat
301 ttagcctgat ctgtggtgga tcccacttgt ttcgatggaa aagaattcta caaataa
cDNA
LENGTH 357
BASE COUNT 130a 51c 63g 113t
1 atgaatacat atgcaataat tgataccgga ggtgagcaac tccgagttga accaggaaga
61 ttttatgata tgcgtcactt tactttacta aatccgagta ttttagattc aaatactaaa
121 gtattaatct atcgagtatt aatgattcat catgaatcga atatcgctct tggagattcc
181 tggttagaag atgcaacaat taaaggaaga gttttacatt ctcactttaa agacaaaatt
241 acaatttaca aaatgcgttc taagaaaaag atgcgacgta aattaggata tcgattaaat
301 ttagcccgat ttgtggtgga ttccatttgt ttcgatggaa aagaattcta caaataa
//Stop
|
| ACCESSION |
The ACCESSION number is a unique identifier for a sequence record. Each REDIdb accession number consists of the three letters "EDI" and nine digits, separated by an underscore, such as EDI_000000281 or EDI_000000253. The "EDI" prefix stands for "EDIting".
Accession numbers are stable: they do not change even if the information in the record changes.
|
|
|
The date in the ACCESSION field is the date of last modification. In the sample record shown above, the last modification occurred on 2006-03-31. The modification date format used in REDIdb is YYYY-MM-DD, where YYYY is the year, MM is the month as a number and DD is the day.
The modification date often corresponds to the release date.
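
The accession and date layout described above can be checked mechanically. The following is a minimal sketch, assuming the ACCESSION line has exactly the shape shown in the sample record (keyword, accession, date, separated by whitespace). The function name parse_accession_line is hypothetical and is not part of REDIdb.

```python
import re
from datetime import date

# Pattern derived from the description above: "EDI", an underscore,
# nine digits, then a YYYY-MM-DD modification date.
ACCESSION_LINE = re.compile(
    r"^ACCESSION\s+(EDI_\d{9})\s+(\d{4})-(\d{2})-(\d{2})$"
)


def parse_accession_line(line):
    """Return (accession, modification_date) or raise ValueError."""
    match = ACCESSION_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a valid ACCESSION line: {line!r}")
    accession, year, month, day = match.groups()
    # date() itself rejects impossible dates such as month 13.
    return accession, date(int(year), int(month), int(day))


accession, modified = parse_accession_line("ACCESSION EDI_000000281 2006-03-31")
print(accession, modified.isoformat())
```

Converting the date to a date object rather than keeping the string makes it easy to sort records by last modification.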
|
|
| ORGANISM |
This field contains the name of the source organism (genus and species), based on the nomenclature adopted by GenBank.
|
|
| LOCATION |
The LOCATION field indicates the cellular localization of the editing process. At the moment the localization is limited to two organelles, mitochondrion and chloroplast; nuclear and viral editing events will be added to REDIdb in a future release.
|
|
| SOURCE |
The SOURCE field describes the molecular type of the record. It can be a transfer RNA ("tRNA"), a ribosomal RNA ("rRNA"), an intron ("intron") or a coding sequence ("gene"). The molecular type is followed by its current name and an indication of the completeness of the annotation ("complete" for a complete sequence, "partial" for a partial sequence). Please note that some genes are known by different names, so before a REDIdb search, check for gene aliases in the table below.
GENE | ALIAS
ccb6c | ccb452, ccmFc
ccb6-nA | ccb382, ccmFN1
ccb6-nB | ccb203, ccmFN2
ccb3 | ccb256, ccmC
ccb2 | ccmB
ccb6n | ccmFn
When the molecular source of the record is an intron, its name combines the name of the related gene with the intron index. For example, "nad7i4" means "intron number 4 of the nad7 gene". For genes containing only one intron, the index is fixed to 1, as in "rps10i1".
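
As an illustration of the SOURCE conventions, the sketch below splits a SOURCE line into its three values, decodes intron names, and looks up aliases. It assumes that names contain no spaces and that the GENE column of the table is the name to search for; both are assumptions, and the helper names (parse_source_line, table_gene_name) are hypothetical.

```python
import re

# Alias -> entry in the GENE column, copied from the alias table.
GENE_ALIASES = {
    "ccb452": "ccb6c", "ccmFc": "ccb6c",
    "ccb382": "ccb6-nA", "ccmFN1": "ccb6-nA",
    "ccb203": "ccb6-nB", "ccmFN2": "ccb6-nB",
    "ccb256": "ccb3", "ccmC": "ccb3",
    "ccmB": "ccb2",
    "ccmFn": "ccb6n",
}

SOURCE_TYPES = {"tRNA", "rRNA", "intron", "gene"}
COMPLETENESS = {"complete", "partial"}

# Intron names are "<gene>i<index>", e.g. "nad7i4" or "rps10i1".
INTRON_NAME = re.compile(r"^(?P<gene>.+)i(?P<index>\d+)$")


def parse_source_line(line):
    """Split 'SOURCE <type> <name> <completeness>' into a dict."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "SOURCE":
        raise ValueError(f"unexpected SOURCE line: {line!r}")
    _, mol_type, name, completeness = parts
    if mol_type not in SOURCE_TYPES or completeness not in COMPLETENESS:
        raise ValueError(f"unexpected SOURCE values: {line!r}")
    result = {"type": mol_type, "name": name,
              "complete": completeness == "complete"}
    if mol_type == "intron":
        match = INTRON_NAME.match(name)
        if match:
            result["gene"] = match["gene"]
            result["intron_index"] = int(match["index"])
    return result


def table_gene_name(name):
    """Return the GENE-column name for an alias, or the name unchanged."""
    return GENE_ALIASES.get(name, name)


print(parse_source_line("SOURCE gene rpl21 complete"))
print(table_gene_name("ccmFc"))
```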
|
|
| GENBANK |
This field reports the GenBank accession number from which the annotated sequence was extracted. Sometimes the annotated sequence is a combination of exons indexed independently in GenBank, so more than one accession number might appear.
|
|
| PUBMED |
Like the GENBANK field, PUBMED lists the PubMed accessions of published articles related to the annotation.
|
|
| EDITING TYPE |
This field marks the start of the description of an editing process. It shows the type of editing, which can take one of the following values: "Substitution", "Insertion" or "Deletion".
This field can appear several times in a REDIdb flatfile, once for each different editing process occurring along the annotation. Indeed, a given sequence might be subject to RNA editing by insertions or deletions and by substitutions at the same time.
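
Because EDITING TYPE can repeat, a reader of the flatfile has to start a new block each time the keyword appears. The sketch below shows one way to do that; it assumes each field starts its own line, as in the sample record, and the function name group_editing_blocks is hypothetical.

```python
EDITING_TYPES = {"Substitution", "Insertion", "Deletion"}
KEYWORD = "EDITING TYPE"


def group_editing_blocks(lines):
    """Group the lines of the EDITING part into one block per EDITING TYPE."""
    blocks = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith(KEYWORD):
            kind = stripped[len(KEYWORD):].strip()
            if kind not in EDITING_TYPES:
                raise ValueError(f"unknown editing type: {kind!r}")
            blocks.append({"type": kind, "lines": []})
        elif blocks:
            # DETAILS, POSITIONS and position lines belong to the open block.
            blocks[-1]["lines"].append(stripped)
        else:
            raise ValueError("editing line found before any EDITING TYPE")
    return blocks


# Lines taken from the EDITING part of the sample record (positions shortened).
editing_part = [
    "EDITING TYPE Substitution",
    "DETAILS C-->U Experimental evidence",
    "POSITIONS n. 5",
    "178",
    "EDITING TYPE Substitution",
    "DETAILS U-->C Experimental evidence",
    "POSITIONS n. 5",
    "43",
]
for block in group_editing_blocks(editing_part):
    print(block["type"], block["lines"][0])
```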
|
|
| DETAILS |
DETAILS contains the specific details of each editing type. Substitutions are shown as "genomic nucleotide-->modified cDNA nucleotide". For example, C to U and U to C substitutions are represented as "C-->U" and "U-->C", respectively. By contrast, when a sequence is edited by insertions or deletions, each inserted or deleted nucleotide or dinucleotide is enclosed between arrows built from the greater-than (>) and less-than (<) symbols. Insertions are indicated as "->inserted nucleotide<-", whereas deletions are indicated as "<-deleted nucleotide->". As an example, C insertions and deletions are shown as "->C<-" and "<-C->", respectively.
The DETAILS field also reports the evidence for the annotated editing, which may be experimental or detected by computational tools.
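
The three notations can be told apart with simple patterns. This is a minimal sketch, assuming the notation is the first whitespace-separated token of the DETAILS value and the rest of the line is the evidence text, as in the sample record. The function name parse_details and the returned dictionary keys are hypothetical.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Za-z]+)-->([A-Za-z]+)$")  # e.g. C-->U
INSERTION = re.compile(r"^->([A-Za-z]+)<-$")               # e.g. ->C<-
DELETION = re.compile(r"^<-([A-Za-z]+)->$")                # e.g. <-C->


def parse_details(value):
    """Parse the text that follows the DETAILS keyword."""
    parts = value.split(None, 1)
    if not parts:
        raise ValueError("empty DETAILS value")
    notation = parts[0]
    evidence = parts[1].strip() if len(parts) > 1 else ""

    match = SUBSTITUTION.match(notation)
    if match:
        return {"kind": "substitution", "genomic": match.group(1),
                "edited": match.group(2), "evidence": evidence}
    match = INSERTION.match(notation)
    if match:
        return {"kind": "insertion", "nucleotides": match.group(1),
                "evidence": evidence}
    match = DELETION.match(notation)
    if match:
        return {"kind": "deletion", "nucleotides": match.group(1),
                "evidence": evidence}
    raise ValueError(f"unrecognized DETAILS notation: {notation!r}")


print(parse_details("C-->U Experimental evidence"))
print(parse_details("->C<-"))
```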
|
|
| POSITIONS |
The POSITIONS field contains the total number of editing events for each editing type (after "n.") and all positions at which RNA editing has been observed. When the exact RNA editing position is ambiguous, an asterisk "*" is added. See accession EDI_000000632 for an explanatory example.
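
A reader can use the declared total as a consistency check on the listed positions. The sketch below assumes the layout of the sample record ("POSITIONS n. 5" followed by the positions) and, as a further assumption not confirmed by this page, that the asterisk is attached directly to the ambiguous position. The function name parse_positions is hypothetical.

```python
import re

POSITIONS_HEADER = re.compile(r"^POSITIONS\s+n\.\s*(\d+)$")


def parse_positions(lines):
    """Return a list of (position, is_ambiguous) tuples.

    `lines` is the POSITIONS line followed by the position lines.
    Raises ValueError if the declared total does not match.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    if not cleaned:
        raise ValueError("no POSITIONS line given")
    header = POSITIONS_HEADER.match(cleaned[0])
    if header is None:
        raise ValueError(f"unexpected POSITIONS line: {cleaned[0]!r}")
    declared = int(header.group(1))

    positions = []
    # Accept one position per line or several per line.
    for token in " ".join(cleaned[1:]).split():
        ambiguous = "*" in token          # assumed marker placement
        positions.append((int(token.replace("*", "")), ambiguous))

    if len(positions) != declared:
        raise ValueError(
            f"declared {declared} events, found {len(positions)}"
        )
    return positions


sample = ["POSITIONS n. 5", "178", "181", "311", "322", "326"]
print(parse_positions(sample))
```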
|
|
| ORIGIN |
ORIGIN indicates the nucleotide sequence start.
|
|
| GENOMIC |
GENOMIC is the annotation of the genomic sequence.
|
|
| LENGTH |
LENGTH is the length of the following sequence (genomic or cDNA).
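
The declared LENGTH can be compared with the sequence that follows it. The sketch below assumes the GenBank-like layout of the sample record, where each sequence line starts with the coordinate of its first base followed by blocks of bases; the helper names read_sequence and length_matches are hypothetical.

```python
def read_sequence(lines):
    """Join sequence lines into one string, dropping coordinates and spaces."""
    chunks = []
    for line in lines:
        fields = line.split()
        # The first field is the 1-based coordinate of the line's first base.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        chunks.append("".join(fields))
    return "".join(chunks)


def length_matches(length_line, sequence):
    """Check a 'LENGTH <n>' line against the parsed sequence."""
    fields = length_line.split()
    if len(fields) != 2 or fields[0] != "LENGTH":
        raise ValueError(f"unexpected LENGTH line: {length_line!r}")
    return int(fields[1]) == len(sequence)


# Toy input in the same layout as the sample record.
toy_lines = ["1 atgaatacat atgcaataat", "21 tgat"]
toy_sequence = read_sequence(toy_lines)
print(toy_sequence, length_matches("LENGTH 24", toy_sequence))
```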
|
|
| BASE COUNT |
BASE COUNT is the count of each nucleotide in the sequence. In case of extensive insertions or deletions, the base frequencies of the genomic and cDNA sequences might be very different.
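
A BASE COUNT line can be recomputed from a sequence for comparison. This is a minimal sketch, assuming the order and lowercase style seen in the sample record (a, c, g, t); any other symbol is ignored here, which is an assumption. The function name base_count is hypothetical.

```python
from collections import Counter


def base_count(sequence):
    """Format nucleotide counts in the style '130a 51c 63g 113t'."""
    counts = Counter(sequence.lower())
    # Counter returns 0 for bases that do not occur.
    return " ".join(f"{counts[base]}{base}" for base in "acgt")


print(base_count("atgaatacat"))
```

In the sample record the genomic and cDNA counts are identical because the five C-->U and the five U-->C substitutions cancel each other out.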
|
|
| cDNA |
cDNA is the complementary DNA corresponding to the genomic annotation.
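
For records that contain only substitutions, the cDNA can be derived from the genomic sequence and the EDITING part. The sketch below assumes that positions are 1-based coordinates on the genomic sequence (consistent with the sample record) and that U in the DETAILS field is written as t in the sequence blocks. Insertions and deletions are not handled, and the function names are hypothetical.

```python
def to_sequence_letter(base):
    """Map a DETAILS nucleotide (RNA alphabet) to the lowercase letter
    used in the sequence blocks, where U is written as t."""
    base = base.lower()
    return "t" if base == "u" else base


def apply_substitutions(genomic, edits):
    """Apply substitution edits to a genomic sequence.

    `edits` is an iterable of (genomic_base, edited_base, positions),
    with 1-based positions.
    """
    sequence = list(genomic.lower())
    for genomic_base, edited_base, positions in edits:
        source = to_sequence_letter(genomic_base)
        target = to_sequence_letter(edited_base)
        for position in positions:
            if not 1 <= position <= len(sequence):
                raise ValueError(f"position {position} is out of range")
            if sequence[position - 1] != source:
                raise ValueError(
                    f"expected {source!r} at position {position}, "
                    f"found {sequence[position - 1]!r}"
                )
            sequence[position - 1] = target
    return "".join(sequence)


# First 60 bases of the sample record; only the U-->C edit at
# position 43 falls inside this window.
genomic_start = "atgaatacatatgcaataattgataccggaggtgagcaactctgagttgaaccaggaaga"
cdna_start = "atgaatacatatgcaataattgataccggaggtgagcaactccgagttgaaccaggaaga"
print(apply_substitutions(genomic_start, [("U", "C", [43])]) == cdna_start)
```

Checking the genomic base before replacing it catches off-by-one errors in the coordinates early.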
|
|
| STOP |
STOP indicates the end of the flatfile. It is always preceded by the symbols //.
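
Putting the markers together, a record can be split into its three parts: the header runs up to the first EDITING TYPE, the editing part up to ORIGIN, and the sequences up to the // terminator. The sketch below assumes one field per line as in the sample record; the function name split_record is hypothetical.

```python
def split_record(text):
    """Split one flatfile record into (header, editing, sequences) line lists."""
    header, editing, sequences = [], [], []
    current = header
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("//"):
            break                      # end of the record
        if stripped.startswith("EDITING TYPE"):
            current = editing
        elif stripped == "ORIGIN":
            current = sequences
            continue                   # ORIGIN itself carries no data
        current.append(stripped)
    else:
        # The loop ended without meeting the terminator.
        raise ValueError("record is missing the // terminator")
    return header, editing, sequences


# Shortened, illustrative record in the layout of the sample above.
record = """ACCESSION EDI_000000281 2006-03-31
ORGANISM Anthoceros formosae
EDITING TYPE Substitution
DETAILS C-->U Experimental evidence
ORIGIN
GENOMIC
LENGTH 10
1 atgaatacat
//Stop
"""
header, editing, sequences = split_record(record)
print(len(header), len(editing), len(sequences))
```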
|
|
|
If you need more details about REDIdb flatfile fields, please contact us.