
cDNA CLONES
What is the definition of a full-length clone?
How do I search for a specific cDNA clone?
What if I do not have a specific gene in mind?
Can I search for a gene by its sites of expression?
How do I use the cDNA clone?
What is the definition of a full-length clone?
The NCBI RefSeq human mRNA database represents the best effort to define the most complete and authentic mRNA sequences encoded by the human genome. It currently contains approximately 17,500 "NM" sequences, which have at least some cDNA sequence support, and about 10,000 "XM" sequences, the majority of which are generated by computational prediction. These reference sequences are continuously being revised, because some may have been derived from aberrantly spliced transcripts or generated by incorrect in silico prediction of intron-exon junctions. Even so, the NCBI RefSeq human mRNA database remains the most thoroughly attested collection of gene sequences.
For this reason, OriGene chooses to provide clones of mRNAs that have nucleotide sequences already in the public domain. These sequences are used only as a "reference" and not as a "standard". Of the 24,000 clones in the TrueClone Collection, approximately 67 percent match NM sequences, 23 percent match XM sequences, and the remaining 10 percent match mRNA sequences that have not yet been accorded an NCBI RefSeq accession number, such as those predicted by Ensembl or those with homology to mouse orthologs.
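As a quick check on the proportions quoted above, the short Python sketch below converts the approximate percentages into clone counts. The total and the shares are the rounded figures from the text, so the results are estimates only; the function name `clone_counts` is introduced here purely for illustration.

```python
# Rounded figures quoted in the text; treat the results as estimates.
TOTAL_CLONES = 24_000
SHARES = {"NM": 67, "XM": 23, "other": 10}  # percent of the collection


def clone_counts(total, shares):
    """Return the approximate number of clones in each reference category."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100 percent")
    return {name: total * pct // 100 for name, pct in shares.items()}


counts = clone_counts(TOTAL_CLONES, SHARES)
for name, n in counts.items():
    print(f"{name}: about {n:,} clones")
```

On these figures, roughly 16,080 clones match NM sequences, 5,520 match XM sequences, and 2,400 match references without a RefSeq accession number.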
Each of OriGene's 'full-length' clones has been sequenced at its 5' end, and the resulting sequence has been matched to a corresponding known reference so that the match includes the start site of the longest ORF. On rare occasions, where a transcript has more than one putative ORF or the ORF is located far downstream from the 5' end of the transcript, the match was made to the extreme 5' end of the reference. The matching clone was then sequenced at its 3' end, and that sequence was matched to the same reference. In cases where the reference has a truncated 3' UTR, or lacks any 3' UTR sequence because its ORF was generated by prediction, attempts were made either to demonstrate that the 3' read is derived from genomic sequences located downstream from the putative ORF, or to sequence around the end of the putative ORF using a primer located immediately upstream of the termination codon. Note that many predicted reference sequences are only partially correct, so the availability of physical clones for these predicted sequences will serve to confirm their existence and to either validate or disprove their sequence authenticity. Apart from clones whose 5' and 3' reads overlap, OriGene does not sequence clones in their entirety. This gives the end-user the opportunity to use these cDNA clones to discover incorrectly predicted transcripts, possible splice variants, and functional polymorphisms, which can be of immense interest and importance. While the internal sequences were not always analyzed, the lengths of the complete cDNA inserts have frequently been confirmed to be similar to the expected sizes by releasing the inserts from the cloning vector with an appropriate restriction enzyme.
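The 5' criterion described above (the end read must include the start site of the longest ORF in the reference) can be illustrated with a minimal Python sketch. It is not OriGene's pipeline. It assumes a simple forward-strand ATG-to-stop ORF definition, and it assumes the 5' read has already been aligned to the reference and is represented only by its 0-based, end-exclusive coordinates. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None if no complete ORF is found.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


def five_prime_read_covers_start(read_span, orf):
    """True if the aligned 5' read spans the whole start codon of the ORF."""
    read_start, read_end = read_span
    return read_start <= orf[0] and read_end >= orf[0] + 3


# Toy reference sequence, invented for illustration only.
reference = "CCATGAAATTTGGGTAACC"
orf = longest_orf(reference)
print(orf)
print(five_prime_read_covers_start((0, 10), orf))
```

A read that begins downstream of the start codon, for example one aligned to positions 3 through 10 of this toy reference, would fail the check and the clone would not be called full-length under this simplified rule.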
How do I search for a specific cDNA clone?
You may search for a specific gene using the following information:
[1] If you have the mRNA sequence, you can select "Nucleotide Sequence", paste the copied sequence into the box provided, and run a BLAST search. The search returns the accession number and description of each reference sequence that matches your query, and indicates that we have one or more matching clones for that reference. You can view the homology score or the alignment of each, and select the accession number that best matches your requirements. The search then returns the name of the clone that best matches that reference sequence.
[2] If you have the NCBI RefSeq accession number, either NM or XM, you can select "NM or XM Number" and enter the accession number in the space provided. The search returns either the name of the clone that matches that reference or a notice that we do not have a matching clone. Because many 'known' genes have numerous redundant and overlapping cDNA sequences in the public domain, each with a different accession number, restricting the search to NM and XM numbers avoids a missed match caused by an inappropriate accession number entered by the end-user. The NM/XM accession number can be obtained easily at NCBI by entering the accession number you are familiar with and asking for a return of 'related sequences'. It should be noted, however, that about 10 percent of our reference sequences have not yet been accorded a RefSeq NM/XM number, and this type of search would miss them.
[3] If you have none of the above information, select "Words in Description" and enter the key word(s) that you think should be in the NCBI description of the gene you are trying to find. This is the least efficient method and frequently fails to find the gene of interest. In addition, this type of search may return numerous entries that are irrelevant but contain the key word. We do not recommend this search method if you can avoid it.
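The choice among the three search options can be sketched in Python as a small routing function. This is an illustration only, not the site's implementation. The accession pattern assumes the usual RefSeq mRNA form of an NM_ or XM_ prefix, digits, and an optional version suffix, and the 20-base minimum for treating input as a nucleotide sequence is an arbitrary assumption.

```python
import re

# Assumed RefSeq mRNA accession shape: NM_ or XM_, digits, optional ".version".
REFSEQ_MRNA = re.compile(r"^(NM|XM)_\d+(\.\d+)?$", re.IGNORECASE)
# Plain nucleotide letters only (N allowed for unknown bases).
NUCLEOTIDE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)


def choose_search_mode(query, min_seq_len=20):
    """Suggest which search option fits the query text."""
    q = "".join(query.split())  # drop spaces and line breaks from pasted input
    if REFSEQ_MRNA.match(q):
        return "NM or XM Number"
    if len(q) >= min_seq_len and NUCLEOTIDE.match(q):
        return "Nucleotide Sequence"
    return "Words in Description"


for example in ["NM_000546", "ATGGCC" * 5, "protein kinase"]:
    print(example, "->", choose_search_mode(example))
```

Any accession that is not in NM/XM form falls through to the key word option in this sketch. In practice it would first be converted to its NM/XM equivalent at NCBI, as described in [2].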
What if I do not have a specific gene in mind?
Increasingly, investigators will seek to work with newly discovered genes that share sequence homology with genes they already know and have the technical expertise to pursue. Alternatively, one may wish to work on multiple genes with similar functions or structures simultaneously, rather than analyze a single gene, in order to increase the throughput of the discovery process. As many as 40 percent of the RefSeq mRNA sequences were discovered by brute-force cDNA sequencing. These have not been given a proper gene name and are identified only as encoding a 'hypothetical' protein.
OriGene has annotated each of the 24,000 gene sequences and has searched for conserved protein domains in each of their protein products. While only about 50 percent of the proteins have sequences that match previously described functional or structural domains, this effort will help assign at least some of the 'new' gene products to specific gene families. By going to "Search" and clicking on "Domain", you can enter either a single domain name or a combination of up to three domain names. For example, you may search for genes that encode proteins with a 7-transmembrane receptor domain of the rhodopsin family (7tm_1), a C2H2-type zinc finger domain (zf-C2H2), an Src homology 2 domain (SH2), a protein kinase domain (pkinase), or a combination of a homeobox domain (homeobox) and a paired box domain (PAX). Each search returns a list of accession numbers whose reference sequences contain the requested domain(s). Each returned accession number is hyperlinked to an OriGene clone and to NCBI, which gives a full description of the respective reference. A list of all InterPro domains can be found at the InterPro web site, http://www.ebi.ac.uk/interpro/.
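The domain search described above, one to three domain names combined so that every named domain must be present, can be sketched in Python as a set-containment filter. The index layout, the function name, and the placeholder accessions (REF_1 and so on) are assumptions made for illustration and do not reflect the actual database schema or real clone annotations.

```python
MAX_DOMAINS = 3  # the search form accepts up to three domain names


def search_by_domains(index, domains):
    """Return accessions whose annotated domains include every requested name."""
    wanted = set(domains)
    if not 1 <= len(wanted) <= MAX_DOMAINS:
        raise ValueError(f"enter between 1 and {MAX_DOMAINS} domain names")
    return sorted(acc for acc, found in index.items() if wanted <= set(found))


# Hypothetical annotation index: placeholder accession -> domain names.
EXAMPLE_INDEX = {
    "REF_1": {"homeobox", "PAX"},
    "REF_2": {"homeobox"},
    "REF_3": {"SH2", "pkinase"},
    "REF_4": {"zf-C2H2"},
}

print(search_by_domains(EXAMPLE_INDEX, ["homeobox", "PAX"]))
print(search_by_domains(EXAMPLE_INDEX, ["homeobox"]))
```

Requiring all named domains, rather than any one of them, is what narrows a combined query such as homeobox plus PAX to a specific gene family.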
Can I search for a gene by its sites of expression?
Discovery of a gene's function begins with knowing where that gene is expressed. A gene may also be selected for analysis by its sites of expression, perhaps in conjunction with its sequence characteristics. A significant percentage of human genes are expressed in a restricted fashion and function selectively in those specific tissues. Altered expression of these genes may well result in the development of disease.
OriGene has defined the sites of expression of about 15,000 human genes. The resulting gene expression database will be made available to institutional users through a separate licensing arrangement. This database will allow users, for example, to search for brain-specific (perhaps even hippocampus-specific) G protein-coupled receptors or thymus-specific (or even T cell-specific) leucine zipper-containing transcription factors. Alternatively, one may look for genes with dual expression specificity, such as expression in both the brain and the pancreas, as candidates that drive the commitment of neural stem cells to differentiate into insulin-producing islet cells. Such expression information will greatly facilitate functional genomics.
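A query for single or dual expression specificity can be illustrated with a minimal Python sketch. The schema of the licensed expression database is not described in the text, so the data layout here is an assumption: each gene maps to the set of tissues in which it is expressed, and "specific" is taken to mean expressed in exactly the listed tissues and nowhere else. Gene names are placeholders.

```python
def tissue_restricted(expression, tissues):
    """Return genes expressed in every listed tissue and in no other tissue."""
    wanted = set(tissues)
    if not wanted:
        raise ValueError("list at least one tissue")
    return sorted(
        gene for gene, sites in expression.items() if set(sites) == wanted
    )


# Hypothetical expression table: placeholder gene -> sites of expression.
EXAMPLE_EXPRESSION = {
    "GENE_A": {"brain"},
    "GENE_B": {"brain", "pancreas"},
    "GENE_C": {"brain", "pancreas", "liver"},
    "GENE_D": {"thymus"},
}

print(tissue_restricted(EXAMPLE_EXPRESSION, ["brain"]))
print(tissue_restricted(EXAMPLE_EXPRESSION, ["brain", "pancreas"]))
```

A real query would likely combine this filter with a sequence feature, such as a protein domain, to find for example brain-specific G protein-coupled receptors.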
How do I use the cDNA clone?
Plasmid DNA containing an insert of the appropriate cDNA fragment is provided in a complex environment on a dry filter, ready for elution and transformation into a competent bacterial host.
[1] The OriGene full-length cDNA fragment is housed in an expression vector, with the open reading frame located downstream from a eukaryotic transcriptional promoter. This promoter is capable of driving heterologous gene expression in a variety of mammalian cell lines in culture and of supporting heterologous gene expression in a variety of tissues in transgenic mice. This feature facilitates the investigation of gene functions and the development of transfected or transgenic cells for drug screens. It should be noted that there are examples in the literature of post-transcriptional and/or translational regulation that may affect gene expression and that cannot be controlled by either the strength or the specificity of the transcriptional promoter used. Some of these effects are due to the presence of the 5' or 3' untranslated regions of the respective mRNA.
[2] The OriGene expression vector also contains a prokaryotic transcriptional promoter, which supports coupled transcription-translation of the cDNA sequence in an appropriate cell-free system. This approach may be used to generate recombinant proteins for testing activities in vitro and for target identification. Success in such applications depends on a variety of factors, including the length of the 5' noncoding sequence, the strength of the translation initiation site, and the properties of the gene product that the cDNA encodes.
[3] OriGene cDNA clones may also be used to generate hybridization probes, for DNA immunization to generate antibodies, and to search for polymorphisms and alternatively spliced forms.