Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT)

Animals and Alternatives in Testing: History, Science, and Ethics

Joanne Zurlo, Deborah Rudacille, and Alan M. Goldberg

Chapter 4

Science In Vitro

When the profession can no longer evade anomalies that subvert the existing tradition of scientific practice - then begin the extraordinary investigations that lead the profession at last to a new set of commitments, a new basis for the practice of science. The extraordinary episodes in which the shift of professional commitments occurs are the ones known as scientific revolutions.

-The Structure of Scientific Revolutions
Thomas Kuhn (1962)

For many people, the Draize test for ocular irritation represents the whole of toxicity testing. Although this perception is not factually correct, it is true that the Draize test provides a good illustration of the inherent strengths and weaknesses of the whole-animal approach to toxicity testing. The search for an in vitro replacement for the Draize test also provides an exemplar of the practical difficulties involved in implementing the new methodologies.

The test is named for the late Dr. John Draize, a U.S. government scientist who standardized the scoring system of a preexisting test for ocular irritation in 1944. In the standard Draize test, a liquid or solid substance is placed in one of the rabbit's eyes, and changes in the cornea, conjunctiva, and iris are observed and scored against the untreated eye, with 73% of the score weighted to corneal changes, 18% to conjunctival changes, and 9% to changes in the iris (Fig. 6). The rabbit's eyes are inspected at 24, 48, and 72 hours, and at four and seven days. Different regulatory agencies and different nations use modifications of the standard Draize test in which the number of animals, use of topical anesthetics, rinsing of the eye after application of test material, and methods for interpreting the final outcome may vary.

Figure 6. Cross-section of the Eye


Reprinted from Bruner (1992).
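The 73/18/9 weighting falls directly out of the classic grading arithmetic: the maximum total score is 110 points, of which the cornea can contribute 80 (about 73%), the conjunctiva 20 (about 18%), and the iris 10 (about 9%). A minimal sketch, using the commonly published grading ranges; the function and its inputs are illustrative, not a regulatory protocol:

```python
# Sketch of the classic Draize scoring arithmetic. Grading ranges follow
# the commonly published scheme (corneal opacity 0-4, area of corneal
# involvement 0-4, iris 0-2, conjunctival redness 0-3, chemosis 0-4,
# discharge 0-3); the function itself is illustrative.

def draize_score(opacity, area, iris, redness, chemosis, discharge):
    """Total Draize-style score (0-110) from graded observations."""
    cornea = opacity * area * 5                         # up to 80 points
    iris_points = iris * 5                              # up to 10 points
    conjunctiva = (redness + chemosis + discharge) * 2  # up to 20 points
    return cornea + iris_points + conjunctiva

max_total = draize_score(4, 4, 2, 3, 4, 3)
print(max_total)                    # 110
print(round(80 / max_total * 100))  # cornea's share: 73 (percent)
```

Because the corneal term is a product of two grades multiplied by five, small corneal changes dominate the total score, which is the weighting the text describes.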

The rabbit eye, although similar in size to the human eye, differs from it in a number of important respects. These differences in physiology, together with the subjectivity of the scoring system and the variability of results produced by different laboratories (Rowan, 1983), form the basis of the scientific objections; the humane objections are obvious. Nonetheless, the Draize test, when performed by trained personnel, has proven quite accurate in predicting human eye irritants, particularly slightly to moderately irritating substances, which are difficult to identify using other methods.

The lack of scientific elegance in its methodology has not prevented the Draize test from performing its primary function of assessing both the damage and potential for recovery after exposure to irritants. For this reason, many toxicologists and ophthalmologists are reluctant to repudiate the Draize test for ocular irritation, although they recognize its shortcomings. Most experts believe that a battery of in vitro alternatives will be necessary to replace the Draize test, since no one test can measure all the necessary variables (Rougier, Cottin, de Silva, et al., 1992; Balls, 1992; Goldberg and Silber, 1992).

Current tests are not yet developed and validated to the point that they constitute an appropriate battery for all chemical substances. Draize tests are performed in many different industries for many different reasons. The risk assessment needs of the chemical industry, for example, differ greatly from those of the cosmetics industry, and both of these differ from the needs of companies that manufacture medicines for the eye. Each of these users is, therefore, likely to create a different role for in vitro tests in its risk assessment procedures.

Many in vitro tests are being used as reduction and refinement alternatives while undergoing validation, the process by which the suitability of a particular test is assessed for a specific purpose, with its reliability and reproducibility verified (Frazier, 1990). At the Center for Alternatives to Animal Testing's 10th anniversary symposium, the Cosmetic, Toiletry, and Fragrance Association's Director for Toxicology, Stephen Gettings, announced that from 1980 to 1989, the number of rabbits used in irritancy evaluations had been reduced by 87% in the cosmetics industry.

The Interagency Regulatory Alternatives Group, IRAG, composed of representatives from the Consumer Products Safety Commission, Food and Drug Administration, and Environmental Protection Agency, has recommended that the number of animals used per eye irritation test be decreased from six to three. They have also proposed the use of a tiered assessment process (Fig. 7) to determine the irritancy potential of a test substance before it is placed in the rabbit eye. Although these initiatives do not signify the elimination of the Draize test for ocular irritation, they do indicate a departure from sole reliance on the test and a theoretical and practical acceptance of the scientific validity of developing in vitro methodology and implementation of the three Rs.

Figure 7. Tier-testing approach to assessment of ocular irritation


Reprinted from Frazier, Gad, Goldberg, and McCulley (1987).
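A tiered scheme of this kind can be sketched as a simple decision flow: cheaper, animal-free tiers run first, and a reduced animal test is reached only as a last resort. The tier order follows the text, but the thresholds, dictionary keys, and returned labels below are invented assumptions, not the IRAG protocol itself:

```python
# Hypothetical sketch of a tiered assessment flow in the spirit of the
# IRAG proposal. Thresholds, keys, and labels are invented for illustration.

def assess_irritancy(substance):
    # Tier 1: physicochemical screen; extreme pH predicts severe irritation.
    if substance["pH"] <= 2.0 or substance["pH"] >= 11.5:
        return "presumed severe irritant: no animal test needed"
    # Tier 2: in vitro battery; strong cytotoxicity flags a probable irritant.
    if substance["in_vitro_ec50_mM"] < 1.0:
        return "probable irritant: no animal test needed"
    # Tier 3: only unresolved substances proceed to a reduced animal test.
    return "unresolved: reduced (3-rabbit) in vivo test"

print(assess_irritancy({"pH": 1.5, "in_vitro_ec50_mM": 5.0}))
print(assess_irritancy({"pH": 7.0, "in_vitro_ec50_mM": 12.0}))
```

The point of the structure is that each tier can terminate the assessment early, so animals are used only for substances the earlier tiers cannot classify.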

Endpoint Assays

In vitro assays consist of three components -- the biological model, the endpoint measurement, and the test protocol (Table 2). The biological model is the system used for evaluation, for example, hepatocytes (liver cells). The greater the ability of the biological model to represent in vivo structure and function, the more valuable the data. An endpoint measurement is the yardstick used to predict toxicity (e.g., cell death). The test protocol is the schedule of events defining the test -- for example, exposing hepatocytes to a test chemical for a specific period of time and measuring the defined endpoint at various times after rinsing the chemical from the dish of cells.

Table 2: Existing In Vitro Toxicity Tests for Acute Dermal Irritation Testing

| Name | Biological Component | Type of Testing (a) | Level of Testing (b) | References (d) |
| --- | --- | --- | --- | --- |
| A. Cell Culture Assays | | | | |
| 1. Neutral red uptake | (1) BALB/c 3T3 cells | S | B | (1, 3, 4) |
| | (2) NHEK | S | B | (1, 17) |
| | (3) 3T3 Swiss mouse fibroblast | S | B | (7) |
| 2. Uridine incorporation | BALB/c 3T3 cells | S | B | (16) |
| 3. Total cellular protein | (1) BALB/c 3T3 cells | S | B | (15) |
| | (2) NHEK | S | B | (7) |
| 4. Keratinization | XB-2 cells | S | B | (7) |
| 5. MTT assay | 3T3 Swiss mouse fibroblast | S | B | (7, 11) |
| 6. Enzyme leakage | 3T3 Swiss mouse fibroblast | S | B | (6) |
| 7. Arachidonic acid metabolism | NHEK | S | B | (6, 10) |
| B. Skin Culture Assays | | | | |
| 1. Protein synthesis assay | Human skin | S | B | (12) |
| | Rabbit skin | S | B | (12) |
| | Guinea pig skin | S | B | (12) |
| 2. Nuclear vacuole formation | Human skin | S | B | (12) |
| | Rabbit skin | S | B | (12) |
| | Guinea pig skin | S | B | (12) |
| 3. MTT assay | (1) TESTSKIN (Organogenesis) | S | B | (2, 5) |
| | (2) Human skin model (Marrow-Tech) | S | B | (5) |
| 4. Release of inflammatory mediators | TESTSKIN (Organogenesis) | S | B | (13) |
| 5. Neutral red uptake | Human skin model (Marrow-Tech) | S | B | (2, 5) |
| 6. Electrical conductivity assay | Epidermal slice | S | B | (14) |
| C. Other | | | | |
| 2. Computer-based structure-activity relationship | none (c) | S | A | (8) |

(a) S = screening, A = adjunct, R = replacement.
(b) A = toxic potential, B = potency, C = hazard/risk.
(c) No living biological component.
(d) References: 1. Babich et al. (1989); 2. Bell et al. (1988); 3. Borenfreund and Puerner (1984); 4. Borenfreund and Puerner (1985); 5. Center for Animals and Public Policy (1989); 6. DeLeo et al. (1987); 7. Duffy et al. (1986); 8. Enslein et al. (1987); 9. Gordon et al. (1989); 10. Lamont et al. (1989); 11. Mol et al. (1986); 12. More et al. (1986); 13. Naughton et al. (1989); 14. Oliver and Pemberton (1985); 15. Shopsis and Eng (1985); 16. Shopsis (1984); 17. Triglia et al. (1989).

Reprinted from Dermatoxicology, by F. Marzulli and H. Maibach, Hemisphere Publishing, Washington, DC, 1991.

The neutral red assay is an example of an in vitro test designed to provide an indication of cell membrane integrity as an endpoint. In this test, cells are cultured in plastic petri dishes and treated with various concentrations of a test chemical. The neutral red dye, which is added to the cell culture after the test chemical is rinsed out, is accumulated and stored by living cells. The amount of dye retained by cells thus indicates the number of living cells in the dish (Fig. 8).

Figure 8. Microtiter plate used in in vitro studies


Reprinted by permission from Clonetics, Inc.

The cellular protein assay, which assesses the amount of protein present, proceeds in a similar manner, except that an analytical reagent called kenacid blue, which reacts with protein in the cells, is added to the cell culture after the test chemical is rinsed away. A dish full of healthy, rapidly growing cells will stain dark blue, while dishes in which cell death has occurred will be lighter in tone, depending upon the extent of the damage. These endpoints may be quantified by sensitive instruments that measure the amount of light absorbed by the dye at specific wavelengths.
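The final quantification step can be sketched simply: each treated well's absorbance is expressed as a percentage of the mean absorbance of untreated control wells. The readings below are hypothetical; real plates are read at a dye-specific wavelength.

```python
# Sketch: converting dye-uptake absorbance readings from a microtiter
# plate into percent viability. Absorbance values are hypothetical.

def percent_viability(treated, controls):
    """Each treated well as a percentage of the mean untreated control."""
    control_mean = sum(controls) / len(controls)
    return [100.0 * a / control_mean for a in treated]

controls = [0.80, 0.82, 0.78]   # untreated wells (mean 0.80)
treated = [0.60, 0.40, 0.20]    # wells at increasing chemical concentration
print([round(v) for v in percent_viability(treated, controls)])  # [75, 50, 25]
```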

Other processes that can be used as endpoint assays are the MTT (3[4,5-dimethylthiazol-2-yl]-2,5-diphenyltetrazolium bromide) reduction assay, a test that assesses energy production by the mitochondria, and the LDH (lactate dehydrogenase) assay, which measures the amount of this enzyme leaking out of dead or damaged cells. The introduction of molecules that sense the intracellular environment can be used to detect other toxicological endpoints, such as the elevation of intracellular calcium and changes in relative acidity (pH), by fluorescence (the emission of light). All of these very general assays provide measures of cellular responses to chemicals that can, in turn, be interpreted as indications of acute toxicity, particularly if a ranking of known toxic substances deduced from such tests agrees with appropriate in vivo data.

A series of chemicals can be tested with a particular in vitro method, such as the MTT assay, and the concentration at which each chemical affects the endpoint response by 50% (designated the EC50) can then be used to rank the chemicals. This ranking reflects the relative toxicity of the series of chemicals in vitro. Whether or not this in vitro rank ordering of chemicals will correspond to the relative potency of the chemicals in vivo will depend on many factors, for example, in vivo kinetics, metabolism, repair, and defense mechanisms. At this time, the techniques needed to extrapolate in vitro data to the in vivo situation are just developing. However, chemicals that demonstrate high toxicity during in vitro cytotoxicity tests relative to well-understood toxins should be flagged for further investigation.
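As a sketch of the EC50 idea, the value can be estimated from concentration-response data by interpolating on a log-concentration axis between the two measured points that bracket the 50% response. The chemicals and response values below are invented for illustration.

```python
# Sketch: estimating an EC50 by linear interpolation on a log-concentration
# axis, then ranking chemicals by potency. Data are invented.
import math

def ec50(concentrations, responses):
    """Interpolate the concentration giving a 50% response.

    `concentrations` must be ascending; `responses` are percent of the
    maximal effect observed at each concentration.
    """
    for i in range(len(responses) - 1):
        lo, hi = responses[i], responses[i + 1]
        if lo <= 50.0 <= hi:
            frac = (50.0 - lo) / (hi - lo)
            log_c = math.log10(concentrations[i]) + frac * (
                math.log10(concentrations[i + 1]) - math.log10(concentrations[i]))
            return 10 ** log_c
    raise ValueError("50% response not bracketed by the data")

chem_a = ec50([0.1, 1.0, 10.0], [10.0, 40.0, 90.0])
chem_b = ec50([0.1, 1.0, 10.0], [30.0, 60.0, 95.0])
# A lower EC50 means a lower concentration suffices: more potent in vitro.
ranking = sorted([("A", chem_a), ("B", chem_b)], key=lambda pair: pair[1])
print(ranking[0][0])  # B
```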

A great deal of in vivo data, both human and animal, has been accumulating over the past 50 years, although much of it is not currently available to the public. The compilation of this vast multitude of data into an easily accessible and comprehensive database would be a tremendous boon to alternatives researchers. Problems associated with the compilation of this database include the inability to access all of the data due to proprietary restrictions imposed by corporations, the lack of identified sources of funding for such a massive undertaking, and the lack of a broad-based unified effort to support such a project (see Sidebar, The LD50 Test).

The LD50 Test

The LD50 test was used worldwide for over 50 years as a means of assessing the acute toxicity of chemicals. The test was introduced in 1927 by a British biologist, Trevan, to assess and standardize the potency of batches of therapeutic substances such as digitalis, insulin, and diphtheria toxin. It eventually came to be used as a standard measure of acute toxicity and a key element in the toxicological profile of a chemical or chemical mixture.

The goal of the LD50 test is to determine the amount of a test substance required to kill half of the test animals, hence the name: Lethal Dose 50%, or LD50. Like the Draize test for ocular irritation, the LD50 has been criticized by scientists as well as animal protectionists, and an International Coalition to Abolish the LD50 was established in 1980. In 1991, representatives of regulatory agencies in Japan, Europe, and the United States agreed to drop the classic LD50 as a required measure of acute toxicity.

New approaches to acute toxicity testing include the determination of the Approximate Lethal Dose, the Up and Down Procedure, and acute toxicity testing in the nonlethal dose range. The Approximate Lethal Dose is derived by administering graduated doses of the test substance to individual animals, rather than cohorts, and can usually be determined using four to ten animals. The Up and Down Procedure also involves administering a substance to animals one at a time, with the dose increased by a factor of 1.3 after each survival and decreased by the same factor after each death. In such tests, fewer animals are needed than in the classic LD50.
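The up-and-down dosing logic can be sketched as a short simulation. The starting dose, the "true" LD50, and the deterministic outcome rule below are illustrative assumptions (real dose-response is stochastic, and the real procedure also has stopping rules):

```python
# Toy simulation of up-and-down dosing: one animal at a time, dose
# multiplied or divided by 1.3 depending on the outcome. The parameters
# and the deterministic "dies if dose >= true LD50" rule are invented.

def up_and_down(start_dose, true_ld50, n_animals=6, factor=1.3):
    """Return the (dose, died) sequence for n_animals dosed in turn."""
    dose, sequence = start_dose, []
    for _ in range(n_animals):
        died = dose >= true_ld50
        sequence.append((round(dose, 2), died))
        dose = dose / factor if died else dose * factor
    return sequence

seq = up_and_down(start_dose=100.0, true_ld50=150.0)
print(seq)
```

Because each dose depends on the previous animal's outcome, the sequence quickly oscillates around the lethal threshold, which is why far fewer animals are needed than in a classic cohort-based LD50 study.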

In 1989, a working group of the British Toxicology Society developed a new approach to acute testing which, rather than using death as the only endpoint, also accepts "signs of toxicity of sufficient severity that treatment at the next higher dose level is likely to lead to mortality." According to the authors of a 1990 paper, "the advantage of our approach is that the whole range of intoxication is shifted to the lower toxicity range and the average pain is much lower, while at the same time the scientific value of the results is much higher" (Taborini, Sigg, and Zbinden, 1990).

Tissue Culture

Although attempted on a primitive scale as far back as 1885, tissue culture was not a practical methodology until the discovery of antibiotics, which inhibit the growth of bacteria (see Appendix C: Timeline of Tissue Culture). It is possible to culture both normal and abnormal tissue and to harvest both cells and tissues from humans and animals.

Cell culture is complicated by the tendency of isolated cells to "dedifferentiate" in culture, taking on the qualities of unspecialized cells instead of keeping the characteristics that define them as cells from a specific organ, such as the liver. For that reason, the easiest cells to maintain in culture are less differentiated ones, such as fibroblasts. Cell lines can be developed from these proliferating cells and maintained continuously, eliminating the need for further animal or human donors, but due to their relative lack of specialized functions, they are of limited use for many types of toxicity testing.

Primary cells, taken from a specific organ in an animal or human donor, are very useful but maintain their differentiated functions for only a few days, or at most a few weeks. The challenge is finding a way to keep primary cells, with all their special structures and functions, from dedifferentiating. One possibility is to co-culture them with different types of cells from the same organ. Studies have shown that in co-cultures, the cells survive longer and are better able to maintain their differentiated functions. Another potentially useful technique for long-term culture of differentiated cells is immortalization, in which DNA encoding viral genes is transfected into primary cells. Ideally, the gene can be turned on and off, causing the cell to replicate indefinitely when the gene is turned on and to redifferentiate when it is turned off.

Tissue slices are another in vitro option (one gaining in popularity), as the cellular architecture of the organ is preserved, with all of the organ's cell types present in the tissue sample used for testing. However, given the status of current culture techniques, a slice remains viable for only a relatively brief period (hours to days). An additional technique involves the use of isolated organs. At present, certain organs, such as the liver, can be maintained outside the animal for several hours via perfusion, the pumping of blood or artificial media through the organ to nourish it. It is then possible to infuse chemicals into the organ and examine their effects.

Each of the above methodologies can be used to measure toxicity in a specific organ. However, to better mimic the effect of a toxic substance in a whole animal, researchers are beginning to co-culture cells from multiple organs. Many toxic substances are metabolized by the liver. Sometimes the metabolism of a nontoxic substance by liver enzymes will result in the formation of a toxic metabolite, which will then affect another organ in the body, for example the kidney. By co-culturing liver and kidney cells and then introducing such a chemical, the investigator can observe the process whereby the liver cells metabolize the chemical and the kidney cells respond to the toxic effects of the metabolite.

While cell culture techniques have become increasingly sophisticated over the past decade, a great deal of work remains to be done to define the optimum culture conditions for different types of cells, so that they may propagate indefinitely and still function as they do in vivo.

Physiologically-Based Toxicokinetic Modeling

Tissue and cell culture alone cannot predict the effects of toxic substances in an animal or human. The cells and tissues must be placed in context within the organism. Physiologically-based toxicokinetic modeling attempts to do just that. Using computer simulation, researchers are attempting to relate the concentration of a chemical that causes toxicity at the cellular level as measured with an in vitro test to the corresponding in vivo exposure level that might have produced that effect. The simulations are performed using a computer to solve the differential equations that describe the model (Fig. 9).

Figure 9. Toxicokinetics (absorption, distribution, metabolism, storage, and excretion of a chemical) and toxicodynamics (effects of the chemical and its metabolites on an organism)


Reprinted from Goldberg & Frazier (1989).

In this type of modeling, the toxicokinetics of various drugs and chemicals are determined by their tissue solubility characteristics, metabolic rates, and the physiology of the test species. The information required to construct the model can be gained through literature searches, in vitro studies, and in vivo experiments, which are limited to those designed to obtain very specific information relevant to the model. Due to the systemic nature of the model, much of the necessary information retrieved through literature search and databases is the result of past whole-animal studies.

The construction of a model for the first chemical in a class demands a great deal of time and effort, but once the prototype is created and validated, it can be extrapolated and applied to other related chemicals under various exposure conditions. Mathematical modeling offers a tool to predict the concentration of the active form of the toxicant at the site where it generates its effects. However, as researchers working in the field admit, physiologically-based toxicokinetic modeling is, like much of in vitro toxicology, in its infancy with an enormous array of practical and theoretical challenges to be overcome before it is able to serve as a general basis for risk management.
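As a minimal sketch of the kind of differential equation such simulations solve, consider a single well-stirred compartment with first-order elimination, integrated by Euler's method. Real physiologically-based models use many coupled compartments; the dose, volume, and elimination rate below are invented for illustration.

```python
# Minimal one-compartment toxicokinetic sketch: dC/dt = -k * C,
# starting from C(0) = dose / volume, integrated by Euler's method.
# All parameter values are invented.

def simulate_concentration(dose_mg, volume_l, k_elim_per_h, hours, dt=0.01):
    """Integrate dC/dt = -k * C and return the concentration at `hours`."""
    conc = dose_mg / volume_l
    t = 0.0
    while t < hours:
        conc += dt * (-k_elim_per_h * conc)   # Euler step
        t += dt
    return conc

c4 = simulate_concentration(dose_mg=100.0, volume_l=5.0,
                            k_elim_per_h=0.5, hours=4.0)
# Should land near the analytic value 20 * exp(-0.5 * 4), about 2.7 mg/L.
print(round(c4, 2))
```

A full model stacks many such equations, one per tissue compartment, linked by blood flow; the computer's job is just to solve the resulting system numerically.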

Structure-Activity Relationships and Databases

The hypothesis on which the concept of structure-activity relationships is based states that the structure of a chemical inherently possesses all of the information necessary to predict its toxicity, including the manner in which both the parent chemical and its metabolites will interact with macromolecules in the cell. This principle has been successfully applied to certain classes of carcinogens; however, its broader applications to general toxicity have not yet been established.

In applying structure-activity relationships, biological effects are expressed in quantitative terms. A mathematical equation is derived to correlate the toxicant's chemical properties with the biological effect, and the relationship derived from the equation is used to make predictions about the toxicity of a chemical. Computers are used to establish these relationships. Advances in computer technology over the past 20 years have contributed greatly to the development of both structure-activity relationships and physiologically-based toxicokinetic modeling.
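A toy example of the quantitative step: fit a least-squares line relating a single structural descriptor (here, a hypothetical logP value) to measured toxicity, then use the fitted relationship to predict an untested chemical. All numbers are invented; real structure-activity models use many descriptors and far more data.

```python
# Toy structure-activity sketch: ordinary least-squares fit of
# log(toxicity) against one structural descriptor. Data are invented.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

log_p = [1.0, 2.0, 3.0, 4.0]         # hypothetical structural descriptor
log_toxicity = [0.5, 1.0, 1.5, 2.0]  # invented measured effect
a, b = fit_line(log_p, log_toxicity)
predicted = a * 2.5 + b              # predict an untested chemical
print(round(a, 2), round(b, 2), round(predicted, 2))
```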

The construction and use of comprehensive databases will also help reduce the number of animals used in testing and refine the testing process. John Frazier of the Johns Hopkins Center for Alternatives to Animal Testing and Mildred Green of Technical Database Services are currently compiling a database that will include both descriptions of in vitro methodology and specific results obtained by testing chemicals with these methodologies. A collection of information about methods, together with a compendium of data produced by these methods, should prove useful for both validation and interpretation of in vitro testing data.

In Germany, ZEBET (the Center for Documentation and Evaluation of Alternative Methods to Animal Experiments) was established at the Institute of Veterinary Medicine of the Federal Health Institute in 1989. The ZEBET Data Bank offers a compilation of the literature on alternative methods in relation to the three Rs (replacement, reduction, refinement); descriptions of the animal experiments that can be replaced, reduced, or refined; the names of scientists experienced in the area; and references on each alternative method and on the animal experiment it is designed to replace, reduce, or refine. Most of the information in the ZEBET Data Bank is available only in German; however, both the summary and the list of references are available in English. FRAME (the Fund for the Replacement of Animals in Medical Experiments) in England and Professor Nicola Loprieno and colleagues at the University of Pisa, Italy, are also constructing databases that should prove useful to researchers implementing the three Rs.

In the United States, the Toxicology Data Bank, developed in 1978, lists over 4,000 chemicals and includes data on the production and use of each chemical, a description of its physical properties, and the results of in vivo pharmacological and biochemical experiments. As the Toxicology Data Bank and similar in vitro databases grow, researchers will be able to design and plan better experiments, based upon the knowledge gained from newly accessible data. Conversely, as understanding of toxicological mechanisms increases, the data bank grows larger and more useful.

Validation and the Future of In Vitro Toxicology

The future of alternatives research lies in validation. Validation is the door through which every alternative method must pass before entering the armamentarium of toxicity testing. In validation, a particular test is defined for a specific purpose. The test's relevance and reliability are established through intralaboratory and interlaboratory assessment, test database development, and evaluation.

In September 1992, the Validation and Technology Transfer Committee of the Johns Hopkins Center for Alternatives to Animal Testing completed a draft framework for validation and implementation of new in vitro toxicity tests. Noting that "continuing advancements in both cellular and molecular biology and bioanalytical and computer techniques" had resulted in a proliferation of in vitro alternatives without a "formal administrative process to organize, coordinate or evaluate validation and implementation of these advancements," the committee proposed a validation plan aimed at increasing scientific and regulatory acceptance of alternative technologies (Journal of American College of Toxicology, In Vitro Toxicology, Xenobiotica, In Vitro: Cellular and Developmental Biology, 1993).

Key elements in the framework for validation include reference laboratories that could evaluate in vitro methods using sets of chemicals provided by chemical banks. Information could then be placed in a continuously updated database, with publication in appropriate peer-review journals. The committee has also proposed the establishment of Scientific Advisory Board review panels. These panels would provide advice and information to researchers interested in developing and validating new procedures, review and recommend the scientific criteria for validation of new testing methods, identify tests ready for validation, and recommend tests as being validated for specific purposes (Fig. 10).

Figure 10. Framework for validation proposed by the Center for Alternatives to Animal Testing validation committee.


Reprinted from Goldberg, Frazier, Brusick, et al. (1993).