Johns Hopkins Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT)

Introduction for TestSmart--A Humane and Efficient Approach to Screening Information Data Sets (SIDS) Data

April 26-27, 1999
Hyatt Fair Lakes
12777 Fair Lakes Circle
Fairfax, VA 22033

A workshop of The Johns Hopkins Center for Alternatives to Animal Testing
 TestSmart is a program of the Vision 20/20 forum
 This workshop is partially funded through a grant by the Vira I. Heinz Endowment


TestSmart is an efficient and humane approach to collecting Screening Information Data Sets (SIDS) data for the HPV Chemical Challenge Program. The first public meeting was held April 26-27, 1999, to bring together representatives of organizations committed to finding alternatives to the high number of animal tests called for by the HPV program.

The Johns Hopkins Center for Alternatives to Animal Testing established the Vision 20/20 program in 1998 to create a forum of industry leaders, scientists, and representatives of government and animal welfare organizations who could identify opportunities for making rapid progress in the alternatives field through collaboration.

TestSmart is the first Vision 20/20 project. It is a collaborative project with the Environmental Defense Fund, Carnegie Mellon University, and the University of Pittsburgh. TestSmart is funded, in part, by a grant from the Heinz Foundation.

Workshop Summary

The EPA announced that it encourages use of the fixed dose procedure for acute toxicity testing, two in vitro assays for genetic toxicity, and combination protocols to obtain reproductive, developmental, and repeat-dose toxicity data. These changes result in an 80% reduction in the number of animals originally predicted. Further reductions will be possible following collection and examination of existing data, categorization of chemicals, and SAR analysis.

A number of promising in vitro methods require additional testing and validation. It was predicted that the TestSmart approach created for the HPV program will set the standard for future in vitro work.

A summary of individual workshop sessions follows, with recommendations from each group.


Ecotoxicity

Chairperson: Phil Sayre
Speakers: Phil Sayre, Kurt Enslein, Mark Cronin, Terry Schultz, John Bantle

The focus of this session was on alternatives to acute toxicity testing in fish. The goal is to decrease the number of fish used and, if possible, replace the fish assay with other available assays. Currently, the EPA requires SIDS endpoints for ecotoxicity, which include chronic aquatic toxicity studies in invertebrates and acute toxicity tests in fish, invertebrates, and plants. At this time, no terrestrial toxicity tests are required, which eliminates the need to test in earthworms, terrestrial plants, and avian species. EPA is considering accepting only chronic aquatic invertebrates for high log KOW chemicals, thus obviating the need for any acute toxicity testing for these compounds.

Presentations in this session compared results from five alternative tests with existing acute toxicity data from the fathead minnow. These five tests are SAR, Tetratox, Microtox, FETAX and ECOSARs. Specific recommendations were made for each method.


SAR

The status of SAR for predicting ecotoxicity was discussed. Current results show that SAR can predict toxicity of acyclic compounds in fathead minnow from Daphnia data, but for other classes of compounds, there are insufficient Daphnia data to evaluate the usefulness of the method.

Recommendation: More Daphnia testing should be done to allow a comparison of data with the large amount of available data from the fathead minnow.


Tetratox

For the Tetratox assay, which uses the protozoan Tetrahymena, there is a high correlation with fathead minnow data for neutral organic compounds. However, this assay has some serious drawbacks due to the differences in bioreactive mechanisms and basic physiology between the two organisms. There is an ongoing study of Tetratox sponsored by the Danish and German EPAs to further evaluate its usefulness.

Recommendation: More data using Tetrahymena should be generated for comparison with available data on fathead minnow.


Microtox

A similar recommendation was made for the Microtox assay, a bioluminescence assay that uses the bacterium Vibrio fischeri. There is a need to further evaluate this test in comparison with available fathead minnow data. The existing Microtox results show a good correlation for non-polar narcosis and for esters. It was suggested that this assay might ultimately be a good screening tool to assess fish acute toxicity, but not a good candidate as a complete replacement.


FETAX

It was pointed out that most data generated with the FETAX assay have been collected for pharmaceutical chemicals and not industrial chemicals. FETAX currently is in peer review as an alternative developmental toxicity assay.

Recommendation: More studies should be done to assess its usefulness as an alternative to acute fish toxicity tests.


ECOSAR

Finally, the ECOSAR system was presented as an ongoing effort by EPA to replace acute fish toxicity tests. ECOSAR is a program that uses chemical structure to predict the LC50 in fish. It also correlates log KOW with fish LC50. In a recent collaborative study by the European Union and the EPA, ECOSAR was demonstrated to have a high validation score in its correlation with fish acute toxicity data.

Recommendation: More ECOSARs should be built or the 48 existing ECOSARs should be strengthened. The HPV Challenge Program may provide an opportunity to generate chemical classes that can be tested in systems with a demonstrated high level of predictability.
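As a rough illustration of the kind of relationship a structure-based model of this type relies on, the sketch below fits a straight line between log KOW and log LC50 for one chemical class and uses it to estimate the LC50 of an untested chemical. The training values, the function names, and the simple least-squares form are all assumptions made for illustration; they are not ECOSAR's actual equations, coefficients, or data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_lc50(log_kow, slope, intercept):
    """Estimate LC50 (in the units of the training data) from log KOW."""
    return 10 ** (slope * log_kow + intercept)

# Hypothetical training set for a single chemical class:
# (log KOW, fish LC50 in mmol/L). Invented numbers, not measurements.
training = [(1.0, 10.0), (2.0, 1.0), (3.0, 0.1), (4.0, 0.01)]

log_kow = [k for k, _ in training]
log_lc50 = [math.log10(lc50) for _, lc50 in training]

# Fit log10(LC50) = slope * log KOW + intercept for this class.
slope, intercept = fit_line(log_kow, log_lc50)

# Estimate the LC50 of an untested chemical in the same class.
estimate = predict_lc50(2.5, slope, intercept)
```

A separate line would be fitted for each chemical class, which is one reason the categorization of chemicals matters for this kind of prediction.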

Reproductive Toxicology

Chairperson: Bernard Robaire
Speakers: Gerard Cooke, Kim Boekelheide, Bernard Robaire, Sally Perreault

Participants agreed that the reproductive system is so complex that no single in vitro test can replace the two-generation study, or the 28- or 90-day studies. However, different components of the system can be developed for study in vitro. For example, in Leydig cells, there is a lack of data on the effect of toxicants on steroidogenesis, and there are several cell lines now available that make steroids.

In order to study seminiferous tubules and spermatogenesis, Sertoli cells must interact and interdigitate with germ cells. Currently, the assay with the most promise in this area is the Popov assay, which uses cocultures of germ cells and Sertoli cells and can measure the level of cell death of germ cells in response to chemicals.

While there has been some promise in developing cell cultures of seminiferous tubules, no one is yet able to culture germ cells and get them to undergo spermatogenesis. Similarly, there are no Sertoli cell lines that maintain differentiation, nor immortalized cell lines from the epididymis.

It also was pointed out that spermatozoa are good markers and that genotoxicity tests have been developed using spermatozoa. Although these cells alone are not adequate for assessing reproductive toxicity, they are useful for assessing certain parameters including chemical effects on sperm motility, DNA damage via COMET assays, capacitation, sperm/egg interactions and penetration of the sperm into the egg. Moreover, in vitro fertilization can be used to study the effects of toxicants on early development.

In summary, while the current in vitro systems are not sufficient to completely assess reproductive toxicity, they can provide insight into mechanisms of toxicity and allow for screening of certain classes of drugs. More research is necessary to understand the mechanism of each step.

Recommendations included:

  1. Cell lines that produce steroids, in particular the K-9 cell line, should be further developed.
  2. The Popov assay should continue to be developed and validated.
  3. The vast amount of clinical experience and data gained from in vitro fertilization should be tapped to increase knowledge of the process of fertilization in humans.
  4. Further research into Sertoli cell lines should be supported.

Thoughtful Testing Approaches I – Combining Protocols

Chairperson: Katherine Stitzel
Speakers: John Moore, Thomas Re, Rajendra Chhabra, Michael Holsapple

In this session, it was determined that combining protocols:

  1. is feasible and has been done successfully;
  2. saves animals, time, and money; and
  3. is advantageous because all the endpoints are obtained in the same animals, at the same time, with the same treatment.

Because current OECD guidelines question the acceptance of these protocols, it was cautioned that laboratories using combined protocols need to have the expertise to conduct all parts of combined protocols well. To meet OECD guidelines, it also is necessary to use doses that would satisfy all endpoints. However, for HPV screening purposes, this is not viewed as critical.

Recommendations included:

  1. Studies should start at the upper limit of dose.
  2. International agreements should be established on these protocols to avoid duplication of tests. Differences in the required number of animals currently exist, and a compromise should be reached so that one study is sufficient. A question was raised about whether 28 days was an adequate exposure time in males to detect reproductive toxicity, or whether studies should last 54 or 69 days. In order to be certain the test is adequate, either the exposure time should be increased or the endpoints should be changed.
  3. Tissue samples being kept by the National Toxicology Program (NTP) for molecular biological studies should be considered for comparison studies with in vivo endpoints.

(In this session, it was proposed that an immunotoxicity endpoint also could be added to a combined protocol. It was emphasized that immunotoxicity is not required by the SIDS battery. However, in other test batteries, it would be possible to immunize animals with an antigen (e.g. sheep red blood cells) early in the study and then use an ELISA assay to look for antibody production rather than removing the entire spleen. If the animals are immunized early in the study, the reaction will have subsided by the end of the study, minimizing the effect on histopathology.)

Finally, the question was raised about whether acute toxicity data are really needed to classify chemicals. The NTP does not require acute toxicity for dose setting, but instead uses information obtained from the literature and/or SAR data. The response to this is that the European Union has an issue with dose setting and needs acute toxicity data for assigning chemicals in their classification system.

Thoughtful Testing Approaches II – All Available Data – A Weight of Evidence Approach

Chairperson: Katherine Stitzel
Speakers: Ian Munro, Lois Lehman-McKeeman, Michael Holsapple

The major focus of this session was to identify all of the possibilities for obtaining and using existing data and other information in screening chemicals in the HPV Program. A first step in accomplishing this is to place chemicals into classes. It is also necessary to determine whether a chemical is absorbed, and if so, what form of the chemical is absorbed. Questions that need to be addressed: If a substance is metabolized, should the metabolite or the parent compound be tested? If the toxicity of the parent compound is known, is it necessary to test the metabolite, and vice versa?

Recommendations included:

  1. Existing knowledge on toxic mechanisms should not be ignored in decision making.
  2. If human data are available, further testing should not be required.
  3. Further workshops should address the issue of categories and determine the mechanism for conducting public reviews of data.
  4. The "checking the box" approach to SIDS data should be avoided if other endpoints are available.
  5. Any test material should be kept for in vitro tests for comparison with in vivo data, especially complex mixtures.
  6. Companies should be encouraged to submit any in vitro data they have.
  7. A "safe harbor" should be provided for these data, to assure companies they will not be penalized for negative or conflicting data.

In Vitro Developmental Toxicity

Chairperson: Thomas Flynn
Speakers: John Bantle, Elaine Faustman, Manfred Liebsch

This session produced the following recommendations:

  1. There must be a consensus on the gold standard against which in vitro teratology assays are evaluated. This is a problem that remains unresolved after 18 years. A case in point is thalidomide, which is positive in the FETAX assay but negative in rodents, the classical gold standard.
  2. New molecular techniques should be incorporated into test protocols as they become available. The FETAX, mouse embryo stem cell, and Micromass assays are constantly evolving in the laboratory, even though they are being validated. As the regulation of specific development genes is uncovered, this information should be incorporated into the assays.
  3. Lower species such as zebrafish and roundworms can and should be used to assay for developmental toxicity because many of the developmental processes are highly conserved throughout the phylogenetic scale.
  4. Any assay for developmental toxicity must be able to distinguish between basal cytotoxicity and specific toxic effects on development.
  5. Statisticians should be involved in the planning of experiments/protocols to assure the best data will be obtained.

In summary, positive and negative results in in vivo tests must be established as the gold standard, and these must be sensitive to false negatives. It also was noted that in order to determine the validity of using lower species for developmental toxicity assessment, their similarities to and differences from human development must first be identified.

Specific Use of Human Cells in Culture

Chairperson: Charlene McQueen
Speakers: Charlene McQueen, Charles Crespi, Paul Silber, Stephen Strom

Although participants in this session concluded that no perfect in vitro human system is currently available, a number of systems show promise for development and could be used in combination to provide valuable information. (Participants also concluded that the HPV Challenge Program should take advantage of the wealth of information and experience with in vitro human systems shared by pharmaceutical companies.) The evaluation of available human systems was organized into four areas of consideration: suitability, availability, variability, and sensitivity.


Suitability

  1. Engineered cells expressing human genes, e.g. cytochrome P450 isoforms, are suitable models.
  2. Primary cells and tissue slices from various human organs, particularly hepatocytes, are also good models, but there are no suitable differentiated cell lines available.
  3. Standard conditions for cell cultures need to be established for the various models. These conditions should include media, additives, and extracellular matrix. One example of the necessity for standardization of conditions brought out by one of the speakers was the effect of bovine serum albumin in the medium on P450 induction.
  4. There is a need to look at dose/response relationships in these cellular systems.


Availability

While the availability of human tissue for experimental purposes used to be very limited, there are now a number of commercial and non-profit groups that provide human tissue. Thus availability is not as serious a concern as it once was.


Variability

Humans are a heterogeneous, outbred population. Variation can exist, but the level depends upon the parameter being measured. For example, there is wide variability among individuals for receptor levels and biotransformation enzymes, but for other markers, there is less variability. Thus, it is necessary to look at multiple donor samples and have a bank of reference samples available when conducting studies on human tissue. Maintaining a reference bank would necessitate characterizing a large number of cells from a single donor, emphasizing the importance of the development of good cryopreservation techniques for human cells.

It was noted, however, that there is less variability among individuals than once thought, and thus this should not be an obstacle to using human tissue.


Sensitivity

Current indicators of toxicity are cytotoxic (e.g. membrane integrity, cell function) and genotoxic endpoints. More sensitive indicators of toxicity are needed. Ideally these should be markers of early events in the process that can be predictive of toxicity.

Non-Invasive Techniques

Chairperson: Martin Stephens
Speakers: Pamela Reilly Contag, Bernie Doerning, Ginger Moser, Raymond Poon

The four methods discussed in this session were telemetry, bioluminescent markers, a neurobehavioral screening battery, and urinary biomarkers. These methods have several advantages in common:

  1. The animal serves as its own control, thus reducing the number of animals;
  2. All are highly sensitive tests requiring lower doses of test compound, thus refining the procedure and yielding better data;
  3. All result in less pain and distress to the animals;
  4. The data produced are of high quality and are reproducible; and
  5. They provide a safe harbor for storage of additional data.

With additional development, these methods offer an opportunity to provide earlier, more humane endpoints for toxicity tests, representing another facet of "thoughtful toxicology."

Specific recommendations were made for each of the methods discussed.


Telemetry

  1. Parallel studies should compare results in telemetrized animals vs. animals in traditional studies.
  2. Telemetric methods should be used to develop protocols with less distress and pain, i.e. to establish more humane endpoints.
  3. Animals should be used as their own controls.

Telemetric methods give an unfiltered readout of the physiological state, uncompromised by handling, etc., and can give insights into normal physiological processes such as diurnal cycling of various parameters.
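The statistical benefit of using each animal as its own control can be sketched with a paired comparison: each animal's baseline reading is subtracted from its post-dose reading, so stable differences between animals cancel out. The heart-rate values and the helper name below are invented for illustration; this is a minimal sketch of a paired t statistic, not a protocol presented at the workshop.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(baseline, treated):
    """t statistic for the mean within-animal change (paired design)."""
    diffs = [t - b for b, t in zip(baseline, treated)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))

# Invented telemetry readings (beats per minute) for four animals,
# before and after dosing. Not data from any study.
baseline = [300.0, 350.0, 400.0, 450.0]
treated = [310.0, 362.0, 408.0, 462.0]

# Spread between animals is large, spread of within-animal changes is small.
between_animal_sd = stdev(baseline)
within_animal_sd = stdev([t - b for b, t in zip(baseline, treated)])

t_paired = paired_t_statistic(baseline, treated)
```

In this made-up example the animals differ from one another far more than any one animal changes after dosing, so a comparison between separate treated and control groups would need more animals to detect the same shift.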

Bioluminescent Markers

  1. Toxic markers, e.g. specific gene expression, should be validated against known compounds to quantitate standard responses.
  2. Early markers of toxicity that would reduce pain and stress should be validated.
  3. Non-invasive techniques for repeat measurements in the same animal (animal serves as its own control) should be used.

Bioluminescent markers offer researchers the ability to generate high-quality data, improve statistical analysis, provide models with clear relevance, and develop methods that result in no pain or distress and/or a reduction in the use of animals.

Neurobehavioral Screening Batteries

  1. These methods should be used routinely to obtain an expanded set of clinical observations in standard toxicity tests as well as in developmental and reproductive studies.
  2. A functional observational battery is already incorporated into some OECD protocols and should be more widely used.

Neurobehavioral screening batteries provide sensitive indicators of neurological effects and general health. They are non-invasive, non-stressful and result in the collection of a wealth of data from every subject. Also, these methods force scientists and technicians to look at animals to observe their responses to toxic insult.

Urinary Biomarkers

  1. The incorporation of routine urinalysis into relevant testing guidelines should be promoted.
  2. An ongoing databank for various parameters should be established to facilitate and encourage the use of urinary-based tests.
  3. More research is needed to identify new markers, particularly those related to pain and distress.
  4. There is a broad spectrum of fully automated urinalysis techniques, e.g. NMR, which should be explored as useful additions to chronic studies.

Cytotoxicity Assays

Chairperson: Rodger Curren
Speakers: Erik Walum, Bjorn Ekwall, John Frazier

Much of this session focused on the Multicentre Evaluation of In Vitro Cytotoxicity (MEIC) Program, a worldwide study to evaluate in vitro test results of 50 materials for predicting acute lethal concentration and dose in humans. The premise of this program is that chemicals kill an organism by affecting individual cells, and that toxicity from cells can be extrapolated to the death of the whole organism. In comparing the efficiency of contemporary toxicologic models to predict human toxicity, it was noted that for the LD50 of rat or mouse vs. human for the 50 chemicals, r² = 0.607. The weakest correlation in the group was off by 2.5 logs. For human cells as predictors of acute lethal concentration (concentration in the growth medium vs. concentration in serum), the estimated r² = 0.69, with the weakest correlation being off by 2.5 logs. An important distinction to note is that the cell cultures do not predict a dose, but rather a lethal concentration. If kinetic information is incorporated into the prediction, better data result. There is an ongoing European effort to extend the findings of the MEIC study using an integrated approach from several assays, including those using neuronal components.
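The two kinds of summary figure quoted above, an r² value and a worst-case error expressed in logs, can be computed as in the following sketch. The paired values are invented placeholders, not MEIC data, and the function names are hypothetical. The sketch assumes that r² is the squared Pearson correlation of log10-transformed values and that "off by N logs" means the absolute difference between predicted and observed values on a log10 scale; the source does not spell out either definition.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy ** 2) / (sxx * syy)

def max_log_error(predicted, observed):
    """Largest absolute prediction error in log10 units."""
    return max(
        abs(math.log10(p) - math.log10(o))
        for p, o in zip(predicted, observed)
    )

# Invented placeholder values (e.g. lethal concentrations in arbitrary units):
# model predictions and the corresponding human values.
predicted = [1.0, 10.0, 100.0, 1000.0]
observed = [2.0, 5.0, 300.0, 1000.0]

log_pred = [math.log10(v) for v in predicted]
log_obs = [math.log10(v) for v in observed]

r2 = r_squared(log_pred, log_obs)
worst = max_log_error(predicted, observed)
```

Working on a log scale is what makes an error of 2.5 logs meaningful: it corresponds to a prediction that differs from the observed value by a factor of roughly 300.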

The need to develop accessory in vitro models to give additional toxicokinetic information was emphasized as a mechanism to increase the accuracy of the predictive power of existing in vitro models. Examples included models of gut absorption to assess bioavailability and correcting predictions based on whether the material can cross the blood-brain barrier. With additional endpoints such as these, it is hoped that dosage rather than lethal concentration can be predicted in the future.

Recommendations included:

  1. Cytotoxicity assays using human cells can give a tremendous amount of information about toxicity and should be conducted at the beginning of any toxicological evaluation. At the very least they provide supplementary information such as handling precautions and dose setting.
  2. An effort should be made to collect better human in vivo data to improve evaluation of in vitro assays.
  3. These assays should be considered for assessment under the HPV Challenge Program, because of their potential and advanced development.
  4. Cytotoxicity measurements should be used to establish or confirm categories.

High Throughput Screening (HTS) Assays

Chairperson: Neil Wilcox
Speakers: Sandra Steiner, Jeff Paslay, Frank Sistare, Pamela Reilly Contag, J. Christopher Corton, Oliver Flint

HTS includes genomics, proteomics, in vitro models, non-invasive imaging, transgenic models, microarrays, SAR and combinatorial synthesis, data mining, and predictive modeling.

These assays share a number of common denominators:

  1. They are rapid screening tests for toxicity;
  2. They use small sample sizes and reagent volumes;
  3. They sample large numbers simultaneously;
  4. They use robotics;
  5. They are real-time analyses; and
  6. They result in the genesis of large databases.

The benefits of HTS methods are:

  1. Multi-cellular events are measured simultaneously.
  2. They can measure gene expression and have the potential to measure and identify mechanisms of toxicity and efficacy.
  3. They can predict drug interactions without animal testing.
  4. Non-invasive imaging techniques track biological events both spatially and temporally in living animals.
  5. They facilitate drug discovery and development.
  6. They provide cell-based assays and physiological environments to measure and cross-correlate multiple parameters in the same cell.
  7. For drug testing, they provide an analysis of specific cell subpopulations or cell types in mixed populations, allowing the prediction of untoward reactions in specific subpopulations. This is important to prospectively determine the potential of a subpopulation to react negatively to a new drug.
  8. They allow for more efficient use of animal tissues.
  9. They allow for simultaneous analysis, e.g. histopathology and toxicogenomic testing.
  10. Since tissues behave differently ex vivo, imaging techniques and HTS technologies can be used to better evaluate data.
  11. They have high potential from a regulatory perspective.
  12. New technical approaches improve the accuracy and speed of predicting human toxic response to xenobiotics.

The limitations of HTS technology relate to the potential difficulty in establishing a hierarchy of toxicology study questions. An additional limitation is that these novel methods, integrated early in the evaluation stage, are considered supportive of animal tests rather than replacements for them.

Recommendations included:

  1. An array of useful HTS techniques for toxicity testing should be developed. Also, a rubric should be developed that describes the various methods, endpoints, and relevance of these endpoints in toxicity studies.
  2. A workshop to explore potential test batteries that would compare in vivo data, in vitro data and up/down regulation of genes should be organized. (At the present time, many questions are generated by HTS techniques with regard to their uses and benefits in toxicity testing. For example, how can gene expression studies by HTS be used in conjunction with in vitro and in vivo toxicity screenings?) Another goal of such a workshop would be to identify endpoints and group specific responses into classes.