cDNA Library


The cDNA fragments generated by reverse transcription of all the mRNA transcribed from an organism's genome are recombined with cloning vectors and introduced into a suitable host cell (generally E. coli) for propagation and amplification. In theory, the resulting population of clones contains all the mRNA information of the source material and is called the cDNA library of that organism. cDNA libraries are tissue-specific or cell-specific because they reflect the protein-coding genes expressed in particular tissues or cells at specific developmental stages.

cDNA libraries are much smaller than genomic DNA libraries, so it is easier to select clones from them to obtain cell-specific genes. For eukaryotic cells in particular, the two kinds of library yield different forms of a gene: clones from a genomic DNA library contain the genomic gene with both introns and exons, whereas clones from a cDNA library are copies of spliced mRNA from which the introns have been removed.
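The difference between a genomic clone and a cDNA clone can be illustrated with a toy sketch. The sequence and exon coordinates below are invented for illustration; the function simply joins the exon segments, which is what a cDNA clone represents relative to the genomic gene.

```python
def splice(genomic_seq, exons):
    """Join exon segments in order.

    exons: list of (start, end) pairs, 0-based and end-exclusive.
    """
    return "".join(genomic_seq[start:end] for start, end in exons)


# Hypothetical toy gene: uppercase = exon, lowercase = intron.
genomic = "ATGGCC" + "gtaagtttcag" + "AAGTGA"
exons = [(0, 6), (17, 23)]  # assumed exon coordinates for this toy gene

cdna_like = splice(genomic, exons)

print("genomic clone insert:", genomic)
print("cDNA clone insert:   ", cdna_like)
```

The genomic insert keeps the lowercase intron, while the cDNA-like sequence contains only the joined exons.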

Eukaryotic genomic DNA is huge, about 100 times more complex than the protein and mRNA populations, and contains a large number of repeated sequences. This makes it difficult to isolate a target gene directly by electrophoresis and hybridization, which is the major obstacle to cloning target genes with chromosomal DNA as the starting material.

Higher organisms generally have about 10^5 different genes, but at any given time only about 15% of them are expressed in a single cell or individual, producing about 15,000 different mRNA molecules. Cloning cDNA from mRNA is therefore a much less complex task than cloning directly from the genome.
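The arithmetic behind these figures can be checked in a few lines. This is only a sketch that uses the approximate values quoted in the text (10^5 genes, 15% expressed); the variable names are illustrative.

```python
total_genes = 10**5          # approximate gene count quoted in the text
expressed_percent = 15       # approximate share expressed at a given time

# Integer arithmetic avoids floating-point rounding noise.
expressed_genes = total_genes * expressed_percent // 100
print(expressed_genes)  # 15000

# How many times fewer distinct sequences an mRNA-derived library must cover.
fold_reduction = total_genes / expressed_genes
print(round(fold_reduction, 1))  # 6.7
```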


Classification of cDNA Libraries:


There are many classification criteria for cDNA libraries:


According to whether the initial mRNA has been normalized (standardized), cDNA libraries can be divided into non-normalized and normalized cDNA libraries.


The former reflects the expression levels of all genes in the tissue and is suitable for gene expression profiling, but the library is highly redundant, so the efficiency of discovering new genes is low. The latter is usually produced by normalization (homogenizing hybridization), subtractive hybridization, or suppression subtractive hybridization. It no longer reflects the expression levels of genes in the source material and therefore cannot be used to construct expression profiles, but the efficiency of new gene discovery is improved.
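The trade-off between the two library types can be illustrated with a toy calculation. The copy numbers below are invented, and the normalization step is an idealized assumption (every transcript species equally represented), not a model of any real protocol.

```python
def clone_fractions(abundance):
    """Fraction of library clones expected for each transcript species."""
    total = sum(abundance.values())
    return {gene: count / total for gene, count in abundance.items()}


def idealized_normalization(abundance):
    """Idealized assumption: every species ends up equally represented."""
    return {gene: 1 for gene in abundance}


# Hypothetical mRNA copy numbers per cell (illustrative only).
mrna_copies = {"abundant_gene": 9000, "moderate_gene": 999, "rare_gene": 1}

before = clone_fractions(mrna_copies)
after = clone_fractions(idealized_normalization(mrna_copies))

for gene in mrna_copies:
    print(f"{gene}: {before[gene]:.4f} -> {after[gene]:.4f}")
```

In the first column the fractions mirror expression levels, which is what expression profiling needs. In the second column the rare transcript is far easier to pick, but the abundance information is gone.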

In the study of gene function and expression regulation, constructing a subtraction library is a good strategy. Such a library is built by repeatedly hybridizing wild-type DNA with deletion-mutant DNA, or two cDNA samples taken at different times, locations, or environmental conditions, then removing the hybridized molecules and cloning the remaining DNA or cDNA fragments.
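Conceptually, subtraction behaves like a set difference: sequences shared by both samples hybridize and are removed, and what remains is cloned. The sketch below uses invented transcript names and ignores hybridization kinetics entirely; "tester" and "driver" are the conventional labels for the sample of interest and the reference sample.

```python
def subtract(tester, driver):
    """Return tester transcripts that have no counterpart in the driver."""
    driver_set = set(driver)
    return [t for t in tester if t not in driver_set]


# Hypothetical transcript populations from two conditions.
tester = ["actin", "gapdh", "geneX", "geneY"]   # e.g. treated sample
driver = ["actin", "gapdh", "tubulin"]          # e.g. untreated sample

remaining = subtract(tester, driver)
print(remaining)  # ['geneX', 'geneY']
```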


According to their function, cDNA libraries can be divided into clone libraries and expression libraries.


A clone library is constructed with a cloning vector, which carries a replicon, a multiple cloning site (polylinker), and selectable markers, and which allows the cloned fragments to be multiplied by bacterial culture. Genes are mainly isolated from such a library by screening with nucleic acid probes, or on the basis of known protein sequences or homologous sequences.

An expression library is constructed with an expression vector. In addition to the elements of a cloning vector, it carries sequences that control gene expression, such as a promoter, SD sequence, ATG, terminator, ..., so that the coding sequence of the cloned fragment can be expressed in the host cell. Expression vectors can be divided into fusion protein expression vectors and native protein expression vectors. Because the protein product of the cloned fragment has antigenicity and biological activity, the library can be screened with immunological probes and by biological function in addition to nucleic acid probes. Expression libraries are suitable for isolating target genes whose amino acid sequence is unknown and which therefore cannot be screened with nucleic acid probes.


According to the vector type, cDNA libraries can be divided into plasmid libraries, phage libraries, cosmid libraries, bacterial artificial chromosome (BAC) libraries, and yeast artificial chromosome (YAC) libraries.


Researchers need to choose a vector appropriate to the purpose of the study. Different vectors have different requirements for cDNA length, so the cDNA length must be matched to the vector when constructing a cDNA library.



Table: carrying capacity of the lambda phage library and the cosmid library

According to whether full-length clones were selected during library construction, cDNA libraries can be divided into ordinary cDNA libraries and full-length cDNA libraries.


A full-length cDNA library is the population of DNA molecules obtained by reverse transcription of the complete set of mRNA molecules in vivo; it is a complete copy of the mRNA population.

A full-length cDNA library provides complete mRNA information. It can be used to obtain mRNA splicing information through sequence alignment, to predict protein sequences and express them in vitro, and to study gene function through reverse genetics.


According to the primer used for first-strand synthesis in reverse transcription, cDNA libraries can be divided into random primer cDNA libraries and oligo(dT) cDNA libraries.


Construction of cDNA Libraries:


The basic principle of classical cDNA library construction: oligo(dT) or random primers are used to prime reverse transcription, appropriate linkers are added to the synthesized cDNA, and the cDNA is ligated into a suitable vector to obtain the cDNA library. The basic steps include:

(1) Purification of mRNA. Obtaining high-quality mRNA is one of the key steps in constructing a high-quality cDNA library.

(2) Synthesis of the first strand of cDNA.

(3) Synthesis of the second strand of cDNA.


Fig 1 cDNA production

(4) Modification of double-stranded cDNA.

(5) Molecular cloning of double-stranded cDNA: insertion of the cDNA into a vector.

(6) Amplification of cDNA library.

(7) Identification and evaluation of cDNA library.


Fig 2 flow chart of cDNA library construction
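Steps (2) and (3) above can be sketched at the sequence level. This is a minimal illustration that assumes an invented transcript with a short poly(A) tail and treats strand synthesis as plain base pairing; it does not model the enzymes, primers, or reaction conditions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna):
    """Reverse complement of a DNA string, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def first_strand(mrna):
    """First-strand cDNA: DNA complementary to the mRNA, written 5'->3'."""
    return reverse_complement(mrna.upper().replace("U", "T"))


def second_strand(first):
    """Second strand: complementary to the first, same sense as the mRNA."""
    return reverse_complement(first)


# Hypothetical transcript: short coding region plus an 8-nt poly(A) tail.
mrna = "AUGGCCAAGUGA" + "A" * 8

fs = first_strand(mrna)
ss = second_strand(fs)

print(fs)  # begins with the T run laid down opposite the poly(A) tail
print(ss)  # DNA version of the original mRNA sequence
```

With oligo(dT) priming, the first strand starts opposite the poly(A) tail, which is why its 5' end is a run of T residues in this sketch.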

Many details need attention when constructing cDNA libraries. Take ligation as an example:

The ds-cDNA can be made blunt-ended by trimming it with S1 nuclease, tailed with C residues using terminal transferase, and then ligated into a vector. Alternatively, because blunt-end ligation is inefficient, short restriction-site linkers are first ligated to both ends. Phage insertion vectors are frequently used to clone cDNAs. Compared to plasmid vectors, bacteriophage vectors have the following advantages:

Because recombinant phages are produced by in vitro packaging, they are preferable when a large number of recombinants is needed to clone low-abundance mRNAs.

Unlike bacterial colonies harboring plasmids, large numbers of phage clones can be handled and stored with ease.
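To see why low-abundance mRNAs call for a large number of recombinants, one can use the commonly cited estimate N = ln(1 - P) / ln(1 - f), where P is the desired probability of recovering at least one clone and f is the fraction of the mRNA population made up by the target. This formula is not given in the text above; it is brought in here as an assumption, and the example values are illustrative.

```python
import math


def clones_needed(probability, fraction):
    """Estimate clones required: N = ln(1 - P) / ln(1 - f), rounded up.

    probability: desired chance of finding at least one target clone (0 < P < 1)
    fraction:    share of the mRNA population that is the target (0 < f < 1)
    """
    # log1p(-x) computes ln(1 - x) accurately for small x.
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


# Illustrative case: a transcript that is 1 in 100,000 mRNAs, 99% confidence.
n = clones_needed(0.99, 1e-5)
print(n)
```

Under these assumptions the estimate is several hundred thousand clones, a scale that is easier to reach and handle with phage vectors.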


Use of cDNA Libraries:


cDNA libraries are frequently used when working with eukaryotic genomes because most non-coding regions are absent from the library. They also allow eukaryotic genes to be expressed in prokaryotes: prokaryotic genes do not contain introns, so prokaryotes lack the machinery needed to remove introns from transcripts, but cDNA has no introns and can therefore be expressed in prokaryotic cells. cDNA libraries are most useful in reverse genetics, where the additional genomic information is less relevant. Functional cloning, which identifies genes on the basis of the function of the encoded protein, typically makes use of cDNA libraries. When eukaryotic DNA is investigated, expression libraries are built from complementary DNA (cDNA) to help ensure that the insert is truly a gene.

A cDNA library has particular advantages for studying the expression state of the genome in a specific cell type and for conveniently identifying the function of expressed genes. It therefore has broad application value in the study of life phenomena such as ontogenetic development, cell differentiation, cell cycle regulation, and the regulation of cell aging and death, and it is the most commonly used gene library in research.