GENETIC ENGINEERING: FUNDAMENTALS AND APPLICATIONS
II. cDNA Cloning and Protein/Variant Production

David C. Tiemeier, Senior Fellow
Biological Sciences Department
Monsanto Company
St. Louis, Missouri

(NOTE: Edited for CompuServe by C. E. Styron. Comments on this article can be forwarded through CompuServe E-Mail to 76054,1666 or through SourceMail BBH329.)

INTRODUCTION

In an earlier article in this series, I discussed some of the basic concepts and techniques associated with recombinant DNA technology. These techniques can be used to isolate and amplify, that is, to clone, a piece of DNA from the total set of DNA of an organism. Depending on the amount of DNA required to encode the particular gene(s) one wishes to study, the cloned DNA fragment might encode a portion of one gene or up to several complete genes. This procedure also yields DNA sequences which flank the gene and are critical for the normal physiological control of protein production.

Recombinant DNA technology permits one to make not only large amounts of the DNA corresponding to a particular gene but also large amounts of the protein encoded by that gene. This can, in turn, permit detailed studies of the protein's structure and its interaction with substrates and inhibitors. Moreover, this can be the basis for a production process if the protein itself is judged to be a product candidate for animal or human health care applications.

PROTEIN PRODUCTION

The vector used to produce the protein encoded by the cloned gene differs from that described for the simple amplification of the hybrid DNA. DNA elements are incorporated on either side of the foreign gene which direct the host cell to produce an mRNA transcript and subsequently translate the mRNA into protein. By using regulatory elements known to be very efficient in the host cell, high-level production of the protein can be achieved.
(Figure 1)

FIGURE 1: EXPRESSION VECTOR

[Diagram: a vector carrying a REPLICATION ELEMENT and a SELECTABLE MARKER, with a PROMOTER and TRANSLATION SIGNALS on one side of the FOREIGN DNA INSERTION SITE and a TERMINATOR on the other. The FOREIGN DNA is inserted at that site.]

In some cases, proteins have been produced at high levels in E. coli. Human and bovine somatotropin are examples of this. The yeast Saccharomyces, better known to the fermentation industry, has also been used for the production of some proteins. Proteins are more readily secreted from yeast, and that feature may facilitate protein isolation in some instances.

In other cases, animal cells have been used as the expression host. Tissue plasminogen activator (tPA) is an example of this. Animal cells may prove particularly useful for producing proteins whose function is affected by post-translational modifications. Many of these modifications are not performed by E. coli. Some are performed by yeast, but in a way different from animal cells.

cDNA CLONING

Gene-containing DNA fragments, isolated directly from the cell's DNA as described in the first article, may be used for the protein production I have just described, but two reasons have prompted scientists to use an alternative cloning approach. First, most genes occur only as single copies in the DNA. Since the DNA in higher eukaryotes such as soybeans, cows, and humans has enough information for one to ten million genes, one must mount a non-trivial cloning project to pull out the full gene copy. Second, as mentioned in the first article, many genes have been found to be interrupted by DNA segments called intervening sequences, or introns.
These are good for the gene, apparently facilitating the stable accumulation of the gene's mRNA, but rough on the molecular biologist who would like to over-produce the encoded protein. Introns can make the stretch of DNA encoding the gene so big that it does not fit into most vectors. Moreover, some of the key host:vector systems for protein production, principally those involving E. coli, do not remove the introns from the mRNA. As a result, even though mRNA might be made within the host, it cannot be properly translated into protein.

To understand the alternative cloning approach, it is useful to recall the "central dogma" of molecular biology. An organism's traits are encoded in DNA. The information for a specific protein, associated with a particular trait or characteristic, is converted into a second polynucleotide termed messenger RNA (mRNA) by a process called transcription. This specific bit of information is then translated into the particular protein. (Figure 2)

FIGURE 2: "CENTRAL DOGMA" OF MOLECULAR BIOLOGY

    DNA      =============================
                       ||
                       ||  TRANSCRIPTION
                      \||/
                       \/
    RNA      -----------------------------
                       ||
                       ||  TRANSLATION
                      \||/
                       \/
                    PROTEIN

Different cells in an organism are specialized to produce only a portion of the proteins encoded in the total DNA complement. Some cells produce as much as 1 to 2 percent of their total protein as a single species. Generally, mRNA levels reflect this same bias. Hence, if one can start with a population of cells that are preferentially producing the protein of interest, and one can produce a double-stranded DNA copy from the mRNA, the gene cloning will have the advantage of starting with a DNA population highly enriched in the desired gene.

Moreover, the mRNA that accumulates in the cell and that is subsequently translated into protein is a mature form lacking the introns. Hence, the double-stranded DNA that results also lacks the introns and so contains the amino acid information in an uninterrupted form.
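
The intron problem described above can be illustrated with a small sketch. This is a toy model under stated assumptions, not a description of any laboratory procedure: the 24-base "gene", the position of its intron, and the function names (transcribe, splice, translate) are all invented for illustration. The only real-world data used is the standard genetic code. The sketch compares the protein read from a mature, intron-free message with the protein read from an unspliced one, as would happen in a host that does not remove introns.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def transcribe(coding_strand):
    """Write the mRNA corresponding to a DNA coding strand (T becomes U)."""
    return coding_strand.replace("T", "U")


def splice(pre_mrna, introns):
    """Remove each intron, given as a (start, end) pair of 0-based, half-open indices."""
    pieces, pos = [], 0
    for start, end in sorted(introns):
        pieces.append(pre_mrna[pos:start])
        pos = end
    pieces.append(pre_mrna[pos:])
    return "".join(pieces)


def translate(mrna):
    """Translate from the first base, three bases at a time, until a stop codon."""
    dna = mrna.replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene: exon 1, a 12-base intron, exon 2 (all invented for illustration).
gene = "ATGGCT" + "GTAAGTTTTCAG" + "AAATGA"
pre_mrna = transcribe(gene)
mature_mrna = splice(pre_mrna, [(6, 18)])

print(mature_mrna)              # AUGGCUAAAUGA
print(translate(mature_mrna))   # MAK      (the intended protein)
print(translate(pre_mrna))      # MAVSFQK  (the intron is read as if it were coding)
```

In this toy case the intron happens to keep the reading frame, so the unspliced message merely gains extra amino acids. A real intron would more often shift the frame or introduce a premature stop, which is why a cDNA copy of the mature mRNA is the more convenient starting point.
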
Fortunately for the molecular biologist, an enzyme associated with certain animal viruses is capable of using mRNA as a template to generate a complementary piece of DNA. Eukaryotic mRNA typically has a run of adenylic acid at its 3' terminus (FIGURE 3a), so that a short oligomer of deoxythymidylate (oligo dT) can be used as a primer or starter for the enzymatic synthesis (FIGURE 3b). Because this enzymatic process is the opposite of transcription, the enzyme has been named reverse transcriptase. The resulting complementary DNA is referred to as cDNA, and the cloning approach based on this initial conversion of mRNA to cDNA is termed cDNA cloning. (Figure 3)

FIGURE 3: cDNA CLONING

    a.  ----------------------------AAAA
                   ||  Reverse Transcriptase
                   ||  oligo dT primer
                  \||/
                   \/
    b.  ----------------------------AAAA
                             -------TTT
                   ||
                  \||/
                   \/
    c.  ----------------------------AAAA
        ----------------------------TTT
                   ||  RNAase H
                   ||  DNA polymerase
                  \||/
                   \/
    d.  ----------
        ----------------------------TTT
                   ||
                  \||/
                   \/
    e.  ----------------------------
        ----------------------------
                   ||  DNA ligase
                   ||  linkers
                  \||/
                   \/
    f.  |---|--------------------|---|
        |---|--------------------|---|
                   ||  Restriction Enzyme
                  \||/
                   \/
    g.  ---|--------------------|-
          -|--------------------|---

The single-stranded cDNA can be converted into a double-stranded DNA using ribonuclease H and DNA polymerase. The former chews away the mRNA in the hybrid, leaving short RNA pieces which can act as primers to start synthesis of the second DNA strand (FIGURE 3d). This process results in a double-stranded cDNA (FIGURE 3e). Short, chemically synthesized oligonucleotides containing desired restriction enzyme sites can then be attached with DNA ligase (FIGURE 3f). Subsequent cutting with the appropriate restriction enzyme then generates the single-stranded termini or ends (FIGURE 3g) described last time, which can mediate recombination with the vector.
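
The sequence relationships in Figure 3 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: the mRNA is invented, the linker GGAATTCC and the enzyme EcoRI (which recognizes GAATTC and cuts after the first G) are chosen here only as a familiar example and are not named in this article, and the ribonuclease H priming step is not modeled. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary DNA strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def first_strand_cdna(mrna, primer_len=3):
    """Copy an mRNA into complementary DNA, as reverse transcriptase would.

    The oligo dT primer needs a run of A at the 3' end of the mRNA to anneal to.
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("no poly(A) tail for the oligo dT primer")
    return reverse_complement(mrna.replace("U", "T"))


def second_strand(first_strand):
    """Copy the first cDNA strand; the result has the same sense as the mRNA."""
    return reverse_complement(first_strand)


def add_linkers(top_strand, linker="GGAATTCC"):
    """Attach the same (assumed) linker to both ends of the top strand."""
    return linker + top_strand + linker


def cut_top_strand(top_strand, site="GAATTC", cut_offset=1):
    """Cut the top strand at every occurrence of the site, cut_offset bases in."""
    fragments, start = [], 0
    pos = top_strand.find(site)
    while pos != -1:
        fragments.append(top_strand[start:pos + cut_offset])
        start = pos + cut_offset
        pos = top_strand.find(site, pos + 1)
    fragments.append(top_strand[start:])
    return fragments


# Invented mRNA: a short coding region followed by a poly(A) tail.
mrna = "AUGGCUAAAUGA" + "AAAA"
first = first_strand_cdna(mrna)
second = second_strand(first)
fragments = cut_top_strand(add_linkers(second))

print(first)         # TTTTTCATTTAGCCAT  (starts with the extended oligo dT primer)
print(second)        # ATGGCTAAATGAAAAA
print(fragments[1])  # AATTCCATGGCTAAATGAAAAAGG  (insert with a 5' AATT end)
```

Only the top strand is tracked here. Because the linker is palindromic, the bottom strand is cut at the mirror-image position, which is what leaves the short single-stranded ends shown in FIGURE 3g.
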
There are many variations on the cDNA cloning scheme outlined here. All start, however, with mRNA enriched for the particular sequence of interest and take advantage of enzymatic tools for converting the mRNA into a double-stranded cDNA.

As with the cloning of a specific piece of genomic DNA, the identification of the desired cDNA clone typically depends on hybridization with labeled oligonucleotides specific for the desired gene, or on screening of the cells transformed by the hybrids with antibodies specific for the desired protein. Basically, if one can purify small amounts of the desired protein, its gene can be cloned. Protein microsequencing on as little as 10 to 100 micrograms of protein can provide a partial amino acid sequence; because the genetic code is known, gene-specific synthetic oligonucleotide probes can be based on that sequence. Alternatively, specific antibodies raised against similar quantities of protein can be used to screen cDNA libraries if they are constructed so as to produce the encoded protein.

VARIANT PROTEIN PRODUCTION

In addition to permitting the production of naturally occurring proteins, the expression technology can be adapted to the production of variants of naturally occurring proteins. This can be useful for defining the relationships between protein structure and function and for providing novel proprietary compositions for product applications. There are two basic approaches to the construction of variants: site-specific mutagenesis and random mutagenesis.

Site-specific or site-directed mutagenesis is a very precise means of generating proteins with specific alterations in their structure. One needs to have already cloned the desired gene and to know its DNA sequence. The cloned gene is then introduced into a bacterial virus system known as M13. This has the unique characteristic of yielding either double-stranded or single-stranded forms of the hybrid DNA molecule.
One then synthesizes an oligodeoxynucleotide whose internal sequence encodes the new amino acids one wishes to introduce into the protein. The ends of the oligonucleotide are made such that they match the already known sequence of the gene. (Figure 4)

FIGURE 4: ENZYMATIC CONSTRUCTION OF HETERODUPLEX ENCODING BASE CHANGES

[Diagram in three stages: (1) the single-stranded circular hybrid with the oligonucleotide annealed by its ends, its mismatched center marked x; (2) the second strand being extended from the oligonucleotide; (3) the completed double-stranded heteroduplex circle carrying the mismatch x.]

When the oligonucleotide is added to the single-stranded hybrid, it hybridizes by virtue of its complementary ends. DNA polymerase, using deoxynucleoside triphosphates as building blocks, the oligonucleotide as primer, and the gene-containing hybrid as template, proceeds to complete the second strand. A heteroduplex circle results (FIGURE 4). The two circular strands are mismatched in the region where the first strand contains the old sequence and the second strand contains the new sequence representing the amino acid alterations one desires to make.

When the heteroduplex is introduced into E. coli, one of the two strands is selected for viral replication and production. Approximately half the time the viral DNA obtained has the old sequence fixed in both strands; the other half has the new sequence fixed in both strands. One can then re-introduce this altered DNA into the expression vector and obtain the desired protein variant. In addition to amino acid replacements, one can similarly produce amino acid additions and deletions. This has been an important program in our studies of bovine somatotropin.

It is obvious that one must have some idea of what amino acids ought to be varied and what specific variations should be made.
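
The layout of a mutagenic oligonucleotide, with a changed center and matching ends, can be sketched as follows. This is an illustration only: the gene sequence, the choice of codon, the flank length, and the function names are all invented, and the oligonucleotide is written in the same sense as the gene for readability (the strand it actually anneals to is the complement).

```python
def design_mutagenic_oligo(gene, codon_index, new_codon, flank=6):
    """Build an oligonucleotide: matching flank + new codon + matching flank.

    codon_index counts codons from 0, starting at the first base of the gene.
    """
    start = 3 * codon_index
    if len(new_codon) != 3 or start + 3 > len(gene):
        raise ValueError("codon_index or new_codon does not fit the gene")
    left = gene[max(0, start - flank):start]
    right = gene[start + 3:start + 3 + flank]
    return left + new_codon + right


def apply_mutation(gene, codon_index, new_codon):
    """Return the gene sequence once the new codon is fixed in both strands."""
    start = 3 * codon_index
    return gene[:start] + new_codon + gene[start + 3:]


def count_mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


# Invented gene; codon 3 (GGT, glycine) is to be changed to GAT (aspartic acid).
gene = "ATGGCTAAAGGTCTGGAATGA"
oligo = design_mutagenic_oligo(gene, codon_index=3, new_codon="GAT")
wild_type_window = gene[3:18]
mutant_gene = apply_mutation(gene, codon_index=3, new_codon="GAT")

print(oligo)                                      # GCTAAAGATCTGGAA
print(wild_type_window)                           # GCTAAAGGTCTGGAA
print(count_mismatches(oligo, wild_type_window))  # 1
print(mutant_gene)                                # ATGGCTAAAGATCTGGAATGA
```

The flanks are kept deliberately short here so the example is easy to read; how long they need to be in practice to hybridize reliably is outside the scope of this sketch.
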
If the gene is in hand and already sequenced, if synthetic oligonucleotides are available, and if the expression system and protein purification scheme are in place, then one person can hope to generate a half dozen variant proteins in a three to six month period. However, when you consider that any one of twenty amino acids could be put into any one of a typical protein's one hundred amino acid positions, and that you could delete, add, or rearrange protein segments as well, it is clear that one cannot realistically expect to construct all possible structural variants. Information from the first set of variants constructed in this way and knowledge of peptide chemistry are critical elements in advancing a site-specific mutagenesis program in a productive fashion.

The second approach to variant construction, random mutagenesis, can be a powerful adjunct. This depends on generating mutations randomly throughout the gene or a region of the gene, enzymatically or chemically. Its advantage is that it can be applied to proteins for which we have little structure:function information. What is critical in this approach is that a rapid, functional assay be available. When the randomly mutated gene is re-inserted into the expression vector and returned to the host cell, each cell now produces a different variant protein. With E. coli or yeast cell systems, one can reasonably plate and screen thousands of variant proteins on a nutrient agar plate. With mammalian cells, hundreds can potentially be screened. The plates are subjected to the particular assay and the rare variants of interest are identified.

CONCLUSION

The technologies to produce large amounts of specific proteins and variants of those proteins represent powerful tools for the synthesis of protein products and the identification of non-protein products.
A very fertile area of research will be the combination of these technologies with techniques for protein crystallization, X-ray crystallography, and computer-assisted protein structure analysis.

ACKNOWLEDGEMENT

I thank Gwen Krivi and Roger Wiegand for their advice, Vicki Grant for her tireless assistance, and Clarence Styron for his encouragement and editorial assistance in the completion of this article.