Review Articles

Investigations on learning and memory function in extended one-generation reproductive toxicity studies – when considered needed and based on what?

Pages 372-384 | Received 20 Apr 2023, Accepted 07 Jul 2023, Published online: 04 Aug 2023

Abstract

To justify investigations on learning and memory (L&M) function in extended one-generation reproductive toxicity studies (EOGRTS; Organization for Economic Co-operation and Development (OECD) test guideline (TG) 443) for registration under Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH), the European Chemicals Agency has referred to three publications based on which the Agency concluded that “perturbation of thyroid hormone signaling in offspring affects spatial cognitive abilities (learning and memory)” and “Therefore, it is necessary to conduct spatial learning and memory tests for F1 animals”. In this paper, the inclusion of the requested L&M tests in an EOGRTS is challenged. In addition to the question of the validity of rodent models in general for testing thyroid hormone-dependent perturbations in brain development, the reliability of the publications specifically relied upon by the Agency is questioned, as these contain numerous fundamental errors in study methodology, design, and data reporting, provide contradicting results, lack crucial information to validate the results and exclude confounding factors, and finally show no causal relationship. Therefore, in our opinion, these publications cannot be used to substantiate, support, or conclude that decreases in blood thyroid hormone (T4) levels on their own would result in impaired L&M in rats, and they are thus not an adequate basis for requesting L&M testing as part of an EOGRTS.

1. Introduction

For the investigation of neurodevelopmental toxicity in extended one-generation reproductive toxicity studies (EOGRTS; Organization for Economic Co-operation and Development (OECD) test guideline (TG) 443), the European Chemicals Agency (ECHA) recently started to require additional testing of spatial learning and memory (L&M) when it sees a concern for thyroid toxicity. The requested testing comprises the Morris water maze (Morris) test or radial arm maze (RAM) test at one time point, and the Cincinnati water maze (Cincinnati) test at the other time point, in F1 animals of cohort 2, as these tests “appear to be the most sensitive tests”. To justify thyroid toxicity-related effects on such L&M function in test animals, the Agency referred to three publications (Axelstad et al. Citation2008; van Wijk et al. Citation2008; Amano et al. Citation2018) based on which it concluded that the “perturbation of thyroid hormone signalling in offspring affects spatial cognitive abilities (learning and memory)” and “Therefore, it is necessary to conduct spatial learning and memory tests for F1 animals”.

The main objective of the EOGRTS is “to evaluate specific life stages not covered by other types of toxicity studies and test for effects that may occur as a result of pre- and postnatal chemical exposure. The EOGRTS then serves as a test for reproductive endpoints that require the interaction of males with females, females with conceptus, and females with offspring and the F1 generation until after sexual maturity” (OECD Citation2018a, section 4). The EOGRTS is a standard Annex X testing requirement under EU Registration, Evaluation, Authorization, and Restriction of Chemicals (REACH), and a requirement under Annex IX “in case of adverse effects on reproductive organs or tissues or other concerns in relation with reproductive toxicity” (OECD Citation2018a).

In the EOGRTS, under the heading “Assessment of potential developmental neurotoxicity”, only the following is stated regarding cognitive testing (OECD Citation2018a, section 51): “If existing information indicates the need for other functional testing (e.g. sensory, social, cognitive), these should be integrated without compromising the integrity of the other evaluations conducted in the study…”.

It is, however, not stated how this should be done, and the paragraph is far too vague to establish a standard information requirement. The paragraph only refers to “other functional testing” without any indication of what this testing might entail. The specific TG for neurodevelopmental toxicity testing (OECD TG 426) provides the following information regarding cognitive testing: “A test of associative learning and memory should be conducted post-weaning (e.g. 25 ± 2 days) and for young adults (Post-natal day (PND 60) and older)… The same or separate test(s) may be used at these two stages of development… If the test(s) of learning and memory reveal(s) an effect of the test substance, additional tests to rule out alternative interpretations based on alterations in sensory, motivational, and/or motor capacities may be considered. In addition to the above two criteria, it is recommended that the test of learning and memory be chosen on the basis of its demonstrated sensitivity to the class of compound under investigation, if such information is available in the literature. In the absence of such information, examples of tests that could be made to meet the above criteria include: passive avoidance, delayed-matching-to-position for the adult rat and for the infant rat, olfactory conditioning, Morris water maze, Biel or Cincinnati maze, radial arm maze, T-maze, and acquisition and retention of schedule-controlled behaviour. Additional tests are described in the literature for weanling and adult rats” (OECD Citation2007, section 37).

The OECD TG 426 is not a standard testing requirement under EU REACH, but this study can be requested in case of a substance evaluation or may be proposed by the registrant. It is designed “to provide data, including dose-response characterizations, on the potential functional and morphological effects on the developing nervous system of the offspring that may arise from exposure in utero and during early life. A developmental neurotoxicity study can be conducted as a separate study, incorporated into a reproductive toxicity and/or adult neurotoxicity study or added onto a prenatal developmental toxicity study. When the developmental neurotoxicity study is incorporated within or attached to another study, it is imperative to preserve the integrity of both study types” (OECD Citation2007).

This review paper addresses issues relating to the inclusion and performance of the requested L&M tests as part of an EOGRTS, the validity of rodent models in general for testing thyroid hormone-dependent perturbations in brain development and includes a thorough evaluation of the three publications (i.e. Axelstad et al. Citation2008; van Wijk et al. Citation2008; Amano et al. Citation2018) specifically relied upon by the Agency for inclusion of L&M testing as part of an EOGRTS.

2. Review of the studies used to request learning and memory testing

2.1. Short description of the experimental set-up of the three publications

Table 1 provides a short description of the set-up of the studies in the three publications (i.e. Axelstad et al. Citation2008; van Wijk et al. Citation2008; Amano et al. Citation2018). In two of the three studies, propylthiouracil (PTU) was used (Axelstad et al. Citation2008; Amano et al. Citation2018). PTU is a well-known anti-thyroid drug that inhibits both the synthesis of thyroid hormones in the thyroid gland and the conversion of T4 to its active form T3 in peripheral tissues. In the van Wijk et al. (Citation2008) study, hypothyroidism was induced by feeding both dams and offspring an iodide-poor diet and drinking water with 0.75% sodium perchlorate. In two of the three studies, rats were used, whereas in the third study mice were tested. Regarding L&M testing, the Morris water maze test was used in two studies, the RAM test in one study, and the Object Recognition Test (ORT) and Object in-Location Test (OLT) in the third study.

Table 1. Summary of the three publications in which L&M was investigated.

2.2. Deficiencies in the three studies in which L&M was investigated, including deviations from OECD TG 426

Several deficiencies were identified in each of the papers related to the inclusion and performance of the requested L&M tests as part of an EOGRTS. These are summarized in Table 2 and include, among others: lack of information on test substance intake and/or TH deficiency status, required dosing period, recommended number of test animals and gender, randomization and blinding, time of blood sampling and method of TH measurements, and age of the animals at the time L&M was investigated.

Table 2. Table of deficiencies of the three publications in which L&M was investigated, including deviations from OECD TG 426.

According to OECD TG 426 – in which L&M testing is included as a standard requirement – dosing of the dams should take place as a minimum from the time of implantation (gestation day (GD) 6) throughout lactation (post-natal day (PND) 21) so that the pups are exposed to the test substance during pre- and postnatal neurological development; dosing may begin from the initiation of pregnancy (GD 0) although consideration should be given to the potential of the test substance to cause pre-implantation loss. None of the three studies was in line with this requirement, as shown in Table 2.

Also, according to OECD TG 426, only dams should be dosed to assess pre- and postnatal neurological development; direct dosing of pups – during lactation – should be considered based on exposure and pharmacokinetic information but careful consideration of benefits and disadvantages should be made prior to conducting direct dosing studies (Arts and Beekhuijzen Citation2020). However, in the van Wijk et al. (Citation2008) study, weanling pups were also exposed.

In an OECD TG 426 study, each test and control group should contain a sufficient number of pregnant females to be exposed to the test substance to ensure that adequate numbers of offspring are produced for neurotoxicity evaluation; a total of 20 litters are recommended at each dose level. At least 20 pregnant females per group should be used in an OECD TG 443. In all three studies, a lower number of animals were used.

In the Amano et al. (Citation2018) study, 1–3 males per litter were used per experiment. However, littermates share genetic, epigenetic, and experimental influences that make them more similar to one another than to offspring from another litter. The problem of oversampling multiple offspring per litter and treating them as if they were independent has been well described (Holson and Pearce Citation1992; Lazic and Essioux Citation2013). As such, only one male pup and/or one female pup per litter should have been used (Vorhees and Williams Citation2014), which implies that a substantially higher number of dams should have been used, as required in the OECD TGs indicated above.
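
The point about the litter being the statistical unit can be illustrated with a minimal sketch. The scores and litter names below are hypothetical and are not taken from any of the three publications; the sketch only shows how pup-level values would be collapsed to one value per litter, so that the sample size equals the number of litters rather than the number of pups.

```python
from statistics import mean

# Hypothetical pup-level scores keyed by litter ID (illustrative values only,
# not data from any of the cited studies).
pup_scores = {
    "litter_A": [12.0, 14.0, 13.0],
    "litter_B": [20.0],
    "litter_C": [15.0, 17.0],
}


def litter_means(scores_by_litter):
    """Collapse pup-level scores to a single mean value per litter."""
    return {litter: mean(values) for litter, values in scores_by_litter.items()}


means = litter_means(pup_scores)
n_pups = sum(len(values) for values in pup_scores.values())
n_litters = len(means)

# The number of independent observations is the number of litters,
# not the number of pups that were measured.
print(f"pups measured: {n_pups}, independent units (litters): {n_litters}")
print(means)
```

Treating the six hypothetical pups as six independent observations would overstate the sample size; on a litter basis only three observations are available.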

Also, in OECD TG 426, it is indicated that selection of pups should be performed so that, to the extent possible, both sexes from each litter in each dose group are equally represented in all tests. In the Axelstad et al. (Citation2008) study, only male rats were used in the RAM test. For both studies (Axelstad et al. Citation2008; Amano et al. Citation2018), no explanation was given why only one sex was used.

In case of OECD guideline testing (see e.g. OECD TG 408, 90-day study), total T4, T3 and TSH (thyroid-stimulating hormone) should be measured in serum samples (OECD Citation2018b, section 34). Hormones may be measured in plasma if appropriate validation and historical control data are available. In the van Wijk et al. (Citation2008) paper, in which plasma was collected, no information was provided on validation or historical control data. In addition, the collected plasma was stored at −20 °C; whereas in OECD TG 408 it is indicated that stability of T3, T4, and TSH under selected storage conditions should be tested as part of the hormonal assay validation (OECD Citation2018b, section 36).

In the Amano et al. (Citation2018) study using mice, PTU treatment induced contradicting findings in dams and pups: thyroid hormone status (measured by TSH, free T3 (fT3), and free T4 (fT4) in serum) was statistically significantly different. In dams of the high dose group, TSH was higher and fT3 and fT4 lower than control; in dams of the low dose group only fT4 was lower than control. However, in pups of the high dose group, fT4 was lower than control, whereas in the low dose group fT4 was higher than control. The fact that thyroid hormone levels were normal in adult offspring is in line with the results of van Wijk et al. (Citation2008), such that within 14 days plasma hormone levels were comparable to those of the control animals.

In the Axelstad et al. (Citation2008) study, only total serum T4 was measured (no TSH and T3). In addition, on PND 16, blood for T4 analysis was pooled for all male and all female pups within each litter, thereby not taking individual variations into account. It should also be noted that the pregnant F0 rats were set on a reverse light–dark cycle (light from 9 PM, dark from 9 AM) at their arrival on GD 4. As the secretion profile of TSH, T4, and T3 is highly correlated with circadian rhythm, this adds another confounder, and this practice is not in line with the OECD testing guidelines.

In addition, in none of the three papers, a description was provided on the time of blood sampling. Considering the circadian variability of thyroid hormones, a prolonged sampling will contribute to the variability of measured values.

OECD TG 426 requires testing of functional and behavioral endpoints at two time points, viz. at PND 25 ± 2 for young pups and at PND 60–70 for young adults (OECD Citation2007, p. 7). In all three studies, investigations were not done in young pups but only at or around the second time point, and in the Axelstad et al. (Citation2008) study also at 5–6 months.
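
As a rough illustration of this comparison, the sketch below checks the reported test ages against the two windows named above. The study ages are those mentioned in this review; converting "8–9 weeks" to PND 56–63 is an assumption made here for illustration, and the dictionary and function names are hypothetical.

```python
# Testing windows as summarized in the text above (postnatal days, inclusive).
TG426_WINDOWS = {
    "weanling": (23, 27),     # PND 25 +/- 2
    "young_adult": (60, 70),  # PND 60-70
}

# Test ages as described in this review. The conversion of "8-9 weeks"
# to PND 56-63 is an assumption made for illustration only.
study_test_days = {
    "Axelstad 2008 (Morris)": (56, 63),
    "van Wijk 2008 (Morris)": (61, 65),
    "Amano 2018 (ORT/OLT)": (56, 56),
}


def overlaps(range_a, range_b):
    """Return True if two inclusive (start, end) day ranges overlap."""
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


def windows_covered(test_range):
    """List the TG 426 windows that a study's test period overlaps."""
    return [name for name, window in TG426_WINDOWS.items()
            if overlaps(test_range, window)]


coverage = {study: windows_covered(days)
            for study, days in study_test_days.items()}

for study, covered in coverage.items():
    print(f"{study}: {covered if covered else 'no window covered'}")
```

Under these assumptions, none of the listed tests falls in the weanling window, which is the deviation described in the paragraph above.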

2.3. Contradicting findings in cognitive testing within and between the three publications

Regarding the L&M testing performed, contradicting findings were noted within and between the three studies; these are summarized in Table 3. In the Axelstad et al. (Citation2008) study, no changes were noted in the Morris test (with males and females) performed at 8–9 weeks of age whereas statistically significant changes were observed in the RAM test (only males tested) at 5–6 months of age.

Table 3. Summary of cognitive findings in the three publications.

The conflicting outcome is odd, both in terms of timing and because both tests are supposed to address spatial L&M and are highly dependent on hippocampal function. Also, the description of how the RAM test was conducted was limited. Standard RAM testing starts with acclimating the rats to the maze for 1–3 days where the baits are scattered across the arms and the animals are given the chance to explore the entire maze. Next, for 3–5 days, the baits are placed at the distal ends of the arms and the animals learn to traverse the entire length of the arm and get the bait. Then, the actual test starts where the animals are forced to search for the bait arm by arm by closing the door to the center once the animal has chosen which arm it wants to go into. If the center is kept open all the time, as described in the paper, rats quickly learn a shortcut: they go from adjacent arm to adjacent arm until all baits are found. So, it is “always turn left” or “always turn right” that they learn, which is not spatial (allocentric) learning but to a certain extent route-based egocentric learning (dependent on frontal cortex function). This deviation from standard procedure makes the results hard to compare with results of other RAM tests. It is also remarkable that the treated animals would have had so many more errors compared to control without any change in latency. As the test was finished when all baits were taken, it could be expected that re-visiting already empty arms several times would take treated animals a lot more time, particularly if they were as poor at figuring out the adjacent arm method as shown in the Axelstad et al. (Citation2008) study.

In the RAM test, there was a significant increase in errors at the mid and high dose. It was indicated that each point represents a litter, which shows the number of errors made by males in that litter during 3 weeks of testing. Data are presented on a litter basis – not on individual males – so there is no information on individual variation. The authors concluded that the total number of errors was significantly elevated for both the 1.6 mg/kg bw (p = 0.0006) and the 2.4 mg/kg bw males (p < 0.0001) and that it correlated very well with maternal levels of T4; the lower the levels of T4, the more errors were made (Axelstad et al. Citation2008). Both the correlation with maternal T4 (GD 15) (p < 0.0001; R² = 0.240) and that with pup T4 (p < 0.0001; R² = 0.275) were statistically significant and thus, according to the authors, between 24 and 28% of the variation in the number of errors could be explained by T4 levels (Axelstad et al. Citation2008).

It should be noted that one week before testing, the animals were housed one per cage and given a restricted amount of food (15 g/day); the food restriction continued throughout the 3-week testing period (five trials per week). This restriction was, according to the authors, expected to lead to a decrease in body weight of approximately 10–15%, at the end of the testing period (based on previous studies); however, data for this 4-week restriction period were not reported. Such a significant decrease in body weight is not in line with criteria indicated in OECD Guidance Document 19 (OECD Citation2000). Second, in the text, it was mentioned that for the males in the RAM test the correlation coefficient of 0.28 was linked to pup T4 levels (data not shown). However, it is not clear in which subset pup T4 was measured; because T4 data were provided for males and females (see Axelstad et al. Citation2008) it suggests that T4 was only measured in the animals of the other subgroup, and not in the males used in the RAM test. In that case, a direct correlation between RAM test results and T4 cannot be made.

Linear regression was used in this study to investigate the possible relationship between T4 and L&M. It must be questioned, however, whether simple linear regression is sufficient for the complex process of neurodevelopment. To reduce spurious correlations when analyzing observational data, usually several other variables are included in regression models in addition to the variable of primary interest. It is, however, never possible to include all possible confounding variables in an empirical analysis. For this reason, randomized controlled trials are often able to generate more compelling evidence of causal relationships than can be obtained using regression analyses of observational data. Regression analysis is in principle a reliable method of identifying which variables have impact on a topic of interest. The process of performing a regression allows one to confidently determine which factors matter most, which factors can be ignored, and how these factors influence each other. In the paper of Axelstad et al. (Citation2008), however, all investigations on correlation were linked to only one variable, via simple regression analysis (viz. thyroid hormone T4 as the variable of primary (and only) interest), thereby neglecting all other confounding independent variables that could have played a role.
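
The concern about unmeasured confounders can be illustrated with a small simulation. This is a minimal sketch with entirely hypothetical variables and simulated values, not a re-analysis of the Axelstad et al. data: a confounder `z` drives both the predictor `x` and the outcome `y`, while `x` has no direct effect on `y`. A simple correlation between `x` and `y` is then clearly non-zero, whereas the partial correlation that adjusts for `z` is close to zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, adjusting for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Hypothetical confounder (e.g. a general condition of the animal).
z = [rng.gauss(0, 1) for _ in range(n)]
# Predictor and outcome both depend on z; x has no direct effect on y.
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_simple = pearson(x, y)
r_partial = partial_corr(x, y, z)

print(f"simple correlation x~y:         {r_simple:.2f}")
print(f"partial correlation x~y given z: {r_partial:.2f}")
```

A regression on a single variable cannot distinguish this situation from a genuine effect of `x` on `y`, which is why additional variables are normally included or ruled out experimentally.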

According to Vorhees and Williams (Citation2014), careful control procedures are needed to ensure that differences are what they appear to be and not secondary to confounding factors. Also, according to OECD TG 426 (OECD Citation2007, section 37), if the test(s) for L&M would reveal an effect of the test substance, additional tests may be considered to rule out alternative interpretations based on alterations in sensory, motivational, and/or motor capacities. Overall, the authors did not look at any other independent variable or investigate confounding factors, and have, therefore, not tried to find any other explanation for the apparent positive result in the RAM test.

Moreover, the coefficients of determination (R²) for the relationship between T4 and the dependent variable for L&M in the Axelstad et al. (Citation2008) paper were quite low; viz. the highest value was 0.28. R² is a goodness-of-fit measure for linear regression models that expresses, on a 0–100% scale, how much of the variation in the dependent variable is accounted for by the model. The authors concluded that 24% of the variation in the number of errors could be explained by maternal T4 levels. In other words, 76% of the variation in the errors cannot be explained by maternal T4 and must be attributed to factors other than maternal T4. Regarding the reported statistically significant p values: as can be expected with these data, the sample data provide enough evidence to reject the null hypothesis, indicating there is a non-zero correlation, but correlation does not imply causation.
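
The arithmetic behind these percentages can be written out as a short sketch. The two R² values are those quoted above for the RAM-test regressions; the function name is hypothetical, and the conversion to a correlation magnitude assumes simple linear regression with a single predictor.

```python
import math


def variance_split(r_squared):
    """Return (% explained, % unexplained, |r|) for a simple linear regression."""
    explained = r_squared * 100
    unexplained = (1 - r_squared) * 100
    abs_r = math.sqrt(r_squared)  # magnitude only; the sign is not recoverable
    return explained, unexplained, abs_r


# R-squared values as quoted in the text for the RAM-test regressions.
reported = {"maternal T4 (GD 15)": 0.240, "pup T4": 0.275}

for label, r2 in reported.items():
    explained, unexplained, abs_r = variance_split(r2)
    print(f"{label}: {explained:.1f}% explained, "
          f"{unexplained:.1f}% unexplained, |r| = {abs_r:.2f}")
```

For both regressions, roughly three quarters of the variation in the number of errors remains unaccounted for by T4.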

The authors explained the “higher sensitivity” of the RAM test than the Morris test in their study by the fact that the subset of animals used for the RAM test had not been handled very much for a long period of time in contrast to the animals used in the Morris test. This is surprising as in general the RAM test is not a sensitive test because scoring is rather simple, viz. animals go or do not go to the feed. In contrast, in a water maze, the animals are forced to swim. Also, the way the RAM test was used in this study, drawing on a mixture of egocentric and allocentric navigational abilities, makes it incomparable to the RAM test as conducted by Vorhees and Williams (Citation2014). In addition, non-handled rats learned less rapidly than handled rats (Holscher Citation1999) and according to Hodges (Citation1996) learning is much slower in the RAM test than in the Morris test. This was interpreted as an advantage in one sense because slower learning results in a more protracted learning curve, which in turn makes deviations in the slope of the curve more apparent. However, the disadvantage of longer learning times and non-spatial learning components is that interpretation of RAM test data can be challenging (Hodges Citation1996).

In the van Wijk et al. (Citation2008) study, the Morris test was performed (this test was negative in the Axelstad et al. (Citation2008) study). The Morris test was carried out on PND 61–65. According to the authors (van Wijk et al. Citation2008, p. 1204), “All rats, from all experimental groups and of both sexes, learned to find the platform, but all hypothyroid rats and the hypo-normal female group showed a significantly increased cumulative search error, indicating a decreased performance in the training sessions. Moreover, the rate of learning was found to be different between the female experimental groups, with hypothyroid female rats underperforming compared with their peers”.

The underperforming of the “H” group in the van Wijk et al. (Citation2008) study, compared to control, does not really seem surprising when looking at the body weights in this group compared to controls. At PND 61–65, the body weights of the rats of this “H” group were substantially lower (∼33% of control weight only); also, these hypothyroid animals were dwarfed and showed a disturbed spastic behavior (hopping and darting). It is known that chronic food-restriction can result in decreased memory function (Carlini et al. Citation2008; Hemb et al. Citation2010; Ilochi et al. Citation2019). Also, according to OECD guidance document no. 19 (OECD Citation2000, section 36) in young animals that have not reached their adult body weight, an abnormal condition may be indicated by a reduced rate of weight gain when compared to the appropriately matched control animal, rather than an actual weight loss. The strongly reduced weight gain in these young animals of this group is obvious. In addition, as the animals were dwarfed and showed a disturbed spastic behavior, it could even be questioned whether these animals should have been humanely killed for animal welfare reasons according to the criteria indicated in OECD guidance document no. 19 (OECD Citation2000).

The difference between the male and female “H-N” group – i.e. no differences in males in cumulative search error – was explained by the authors as being “inconsistent with other studies” as if cognitive performance in the male “H-N” group should have been different from controls rather than that the findings in females would be the exception. One could suggest that female body weight on PND 65 (Morris test and probe measurements were conducted on PND 61–66) cannot be a confounding factor because body weight was not statistically significantly different from controls on PND 65; however, body weight was statistically significantly lower than control on all measurement days between PND 20–56. Moreover, the lower body weight may even have been the cause for underperforming (see the references indicated above; Carlini et al. Citation2008; Hemb et al. Citation2010; Ilochi et al. Citation2019), especially in the “H” group. Also, the learning curve for the “H-N” females was quite divergent, not only from the other two female groups but also from all male groups.

There were no differences noted with control in T4 of males and females of the “H-N” group on PND 76, and it is highly unlikely there would have been a significant difference in T4 between males and females of this group on PND 61–65 (with their respective control) as they were already at least 45 days on a normal diet. Indeed, the authors indicated that in a concurrent experiment employing the same experimental set-up and method to induce hypothyroidism, blood was obtained from pups and the plasma TSH, T4, and T3 levels determined regularly from postnatal day 14 to 76. Within 14 days, plasma hormone levels in the “H-N” group were similar to those of the control animals.

In the Amano et al. (Citation2018) study, cognitive performance was investigated in adult offspring mice on PND 56 by an ORT and an OLT. The ORT and OLT were used to evaluate visual recognition and spatial memory. Such tests are memory tests and not learning tests as there is no learning curve possible with a single assessment. Also, the tests are described as “visual recognition and spatial memory”. OLT performance was statistically significantly lower in both PTU-treated groups compared to controls; however, a dose–response relationship was absent despite the 10-fold difference in PTU concentration between the two dose groups. No differences in ORT performance were seen in the PTU-treated groups compared to controls.

OLT and ORT were also performed on day 84 in combination with in vivo microdialysis (modified test). This time there was only a statistically significant difference at 50 ppm PTU in the OLT, and again there was no significant difference in the ORT. The difference of the results between the normal and the modified tests, according to the authors, may have been caused by the difficulty in moving around with dialysis probes even though the mice were allowed to move freely to search objects in both tests. Concentrations of neurotransmitters were lower than control, but a dose–response relationship was absent; also, neurotransmitter levels did not change among the phases (viz. home cage, sample-exposure phase, OLT, and ORT).

In addition, a visual discrimination test was conducted in the period between PND 56 and 84 with a touch-screen operant test system for mice. The body weight of mice was kept at 85% of normal weight during the test by food deprivation. Mice were tested for 50 trials/day and this was continued for seven consecutive days. PTU treatment altered the number of sessions required to reach 70% correction; the acquisition of visual discrimination was slightly, statistically significantly delayed in the 50-ppm group on measurement days 3 and 4 but not on days 2 or 5–7. In addition, on day 1 all groups showed discrimination around chance level. Thus, learning curves were quite similar.

However, because food restriction was used, one should ensure that animals in experimental and control groups are equally hungry and hence equally motivated to perform for food, which is an important issue in neurotoxicological experiments. This can be problematic if the treatment reduces body weight or suppresses appetite or palatability of food (Vorhees and Williams Citation2014). Whether the incentive value of the reinforcement was equal across groups was not tested (or at least not reported), leaving questions about how well matched the groups may have been. Indeed, in the paper, it was stated that in the 50 ppm PTU group, body weight gain (p < 0.0001) and body weight (p < 0.05; data not shown) were significantly lower than control on PND 21, which persisted up to PND 60 (p < 0.05; data not shown).

2.4. Weak correlation between findings on thyroid histopathology and weight, and cognitive testing

In the Axelstad et al. (Citation2008) study, thyroid toxicity was clearly present upon histopathological examination. In all test groups, and at different ages, the thyroid histopathological findings were generally described as moderate to marked and absolute thyroid weights were strongly increased, up to more than two times the control value, but findings on L&M in offspring were limited (Table 4). More specifically, even in the presence of marked thyroid histopathological changes and strong increases in thyroid weight with the known thyroid toxicant PTU, L&M testing in the offspring showed inconsistent findings, viz. a negative Morris test and a RAM test with only weak associations to T4 levels (R² values of 0.24 and 0.28).

Table 4. Correlation between findings on thyroid histopathology and weight, and cognitive testing in the three publications.

In the van Wijk et al. (Citation2008) and Amano et al. (Citation2018) studies, no histopathological examination of the thyroid was performed, and neither was thyroid weight determined. Effects on thyroid weight and histopathology in repeated dose studies are generally used as trigger to require neurodevelopmental toxicity testing. This means that these two papers cannot be used to determine how severe thyroid toxicity should be to be able to detect neurodevelopmental toxicity, including cognitive effects, of chemicals.

2.5. Overall evaluation of the three publications

In summary, all three publications contain numerous fundamental errors and/or lack crucial information to validate the results and exclude confounding factors. Even with a well-known thyroid toxicant such as PTU, resulting in marked thyroid toxicity as indicated by thyroid histopathology and weight data, conflicting results were obtained in the two L&M tests, one being negative and the other, “positive” test showing only a weak outcome (Axelstad et al. Citation2008). In the van Wijk et al. (Citation2008) study, regarding L&M testing via the Morris test (which was inconsistent with the Axelstad et al. (Citation2008) study with a negative Morris test), under the “H-N” condition, conflicting results were obtained between males and females. It is also clear from this study that under the “H” condition, animals were too small, even dwarfed, to be able to perform in the same way as controls which had ∼3-times higher body weights, highly suggestive of alterations in motor capacities in animals of the “H” group. Finally, in the Amano et al. (Citation2018) study, despite the contradicting results between ORT and OLT and the absence of a dose–response relationship over a 10-fold difference in dose, the authors concluded, based on the modified ORT/OLT, “it is clear that the spatial memory is impaired in the 50-ppm group”. This is a far-fetched over-interpretation of the results.

3. Improved design of OECD TG 443 in case of inclusion of developmental neurotoxicity investigations including L&M testing

Based on all deficiencies and other issues described in section 2, the three publications (Axelstad et al. Citation2008; van Wijk et al. Citation2008; Amano et al. Citation2018) should not be considered pivotal for inclusion of L&M in an OECD TG 443.

The question thus is: what is the purpose of including L&M testing in an OECD TG 443 study if this is prone to provide false or erroneous results, and thus misleading information, due to faulty design and wrong assumptions regarding thyroid hormone (TH) levels and neurodevelopment?

According to the agency, in an OECD TG 443 priority should not be given to developmental neurotoxicity (DNT) but to fertility effects, indicating that the study should be designed in such a way as to ensure adequate assessment of sexual function and fertility is possible, and therefore, the dose levels should not be reduced to get enough offspring for the assessment of developmental toxicity (ECHA Citation2022). This is an intrinsic conflict of aims that cannot be solved without a compromise. However, according to the ECHA/European Food Safety Authority (EFSA) guidance on endocrine disruption (ECHA and EFSA Citation2018, p. 21): "Where potentially endocrine-related adverse effects are only observed at excessive toxic dose/concentration (i.e. only observed above the MTD or MTC) they should not be considered indicative of endocrine disruption. Justification of this excessive toxicity should be provided". Thus, (too) high dose testing is in clear conflict with the initial requirement of (also) capturing (subtle) neurobehavioral changes to identify a DNT hazard.

Also, the EOGRTS was never intended to be a final study providing answers to all possible toxicological issues. According to Cooper et al. (Citation2006, p. 77), when the EOGRTS was developed: "It is not the intent of the proposed life stages testing paradigm to necessarily identify every potential effect and fully characterize it but to generate interpretable data that may trigger further evaluation in Tier 2". According to them, the sheer complexity and timing of the study would require making compromises, e.g. in terms of dose level setting, animal numbers in specific investigations and the number of investigations, which are feasible to run in the frame of this study without compromising its quality. In OECD guidance document no. 151, the following is stated on the evaluation of DNT: “The neurotoxicity testing in TG 443 aims to provide an initial assessment of neurotoxicity potential but does not include all facets of a complete Developmental Neurotoxicity (DNT) study. Thus, it is not intended as a replacement for a DNT study, nor is it appropriate to interpret the results from the TG 443 DNT assessment as a replacement for conducting a DNT study where one may be required. Interpretation of TG 443 DNT test results should take into account available information on mechanisms of action, toxicokinetics, maternal toxicity and potential indirect effects on offspring, as well as any available data on neurotoxic effects of the specific test chemical. Such results may indicate additional targeted DNT testing may be required” (OECD Citation2013, section 100). Moreover, the EOGRTS guideline only refers to “other functional testing (e.g. sensory, social, cognitive)” without any further indication and guidance other than that “these should be integrated without compromising the integrity of the other evaluations conducted in the study…” (OECD Citation2018a, section 51).

At present in the EU, the EOGRTS unfortunately seems to be seen as an "all-in-one study suitable for every purpose", to be expanded at regulators’ wishes and which would provide all the final answers regulators seek. The request to add specific L&M testing to an already full-blown EOGRTS is in our view the living proof.

As indicated above, the EOGRTS would require high parental dosing to assess fertility (ECHA Citation2022); the same high dose (in mg/kg bw/day) would need to be provided also to the offspring from weaning (if offspring is still available). However, from neurodevelopmental test results (including L&M) obtained in an EOGRTS, it will not be possible to distinguish between DNT potentially developed during gestation and/or lactation or caused by direct exposure of the pups after PND 21. In the context of OECD TG 426, animals selected for L&M testing are not exposed post-weaning (after PND 21) to ensure that effects can clearly be attributed to exposure of offspring during major nervous system development during gestation and lactation.

Thus, the current OECD TG 443 DNT cohort is very poorly suited to identify a potential DNT hazard and any additional L&M testing currently requested will suffer from the same shortcomings. Moreover, the inclusion of L&M testing into an OECD TG 443 study design has not at all been validated, CROs have not yet shown proficiency for this kind of testing in the context of an OECD TG 443, and historical controls for these investigations in an OECD TG 443 are lacking. In general, prior to including additional parameters into existing guideline studies, their suitability and effectiveness should be evaluated before broadly requesting them. In analogy, the inclusion of parameters suitable to detect endocrine disruption, such as hormone measurements into existing regulatory studies including OECD TG 443, was preceded by a thorough validation process. During validation, it was established which parameters were useful and which were not, also in view of practical performance and background variability.

Also, as mentioned earlier, the OECD TG 426, which has incorporated L&M testing of non-treated pups in its fundamental design, indicates that cognitive tests are to be “chosen on the basis of its demonstrated sensitivity to the class of compound under investigation” indicating that at least some validation is required. Because the OECD TG 443 design fundamentally differs from the OECD TG 426, as explained above, it would be essential to validate cognitive testing within the EOGRTS study design first, if at all possible, in view of the currently required dosing beyond weaning and the required high dose levels to start with. OECD TG 426 indicates that when the DNT study is incorporated within or attached to another study, it is imperative to preserve the integrity of both study types (OECD Citation2007), but that seems impossible if OECD TG 443 requires pup dosing and OECD TG 426 does not.

To avoid having to carry out both an OECD TG 443 study (to assess fertility) and an OECD TG 426 study (to assess DNT), at the expense of several thousand animals, it might be better, in case inclusion of DNT cohorts 2A/B is needed in an EOGRTS, not to dose these pups after weaning. This would not only be in line with OECD TG 426 to investigate perinatal neurodevelopment, it would also prevent differences in outcome between the pups examined at the young age (PND 25 ± 2), which would only have been directly dosed for a few days after weaning, and the pups examined at the older age, which will have been dosed for at least 6 weeks at the same very high dose as required for the high dose parental animals. Animals of cohorts 1A/B, in contrast, which will be continuously dosed from weaning, can then be used for the toxicity and the fertility part of the study, also in case a second generation needs to be generated. In addition, a subgroup C, exposed from weaning, can be used together with cohorts 1A/B to maintain 3 pups/sex/litter/dose (or 60 pups/sex/dose) until sexual maturation. Likewise, cohort 3 should not be dosed if such a cohort needs to be included in the study to investigate developmental immunotoxicity.

Overall, to accommodate both fertility and DNT aspects when required based on available data, the following possibilities should be provided in OECD TG 443: (a) not to dose DNT cohorts 2A/B, and (b) to reduce the dose levels for pups of cohorts 1A/1B (and C) when needed, e.g. in case a dose-range finding study has shown higher sensitivity of pups compared to (young) adults.

4. Discussion and conclusions

All three studies (Axelstad et al. Citation2008; van Wijk et al. Citation2008; Amano et al. Citation2018) contain numerous fundamental errors in study methodology, design and data reporting, provide contradicting results and lack crucial information to validate the results and to exclude confounding factors.

Regarding the Amano et al. (Citation2018) study, it can be questioned why this paper was used as a reference by the Agency at all, because a different animal model was used (mice and not rats) and other L&M tests were performed (viz. OLT/ORT and visual discrimination test) than those required. This renders the study irrelevant as a basis for including the RAM test or the Morris and Cincinnati water maze tests as cognitive testing in an EOGRTS.

The Axelstad et al. (Citation2008) study showed that exposure to PTU, as can be expected from a thyroid hormone synthesis inhibitor, resulted in lower T4 levels. Remarkably, however, even with a well-known thyroid toxicant such as PTU, which resulted in marked thyroid histopathology and strongly increased thyroid weight, conflicting results were obtained in the two L&M tests used, one being negative and the “positive” test showing only a weak correlation with T4.

The van Wijk et al. (Citation2008) study illustrated that in case of induced hypothyroid conditions, as can be expected, lower T4 levels were noted. The authors, however, failed to show a consistent finding on cognitive performance as conflicting results were obtained between males and females and confounding factors, such as significantly reduced body weight, were not considered adequately.

Regarding the number of animals used, Vorhees and Williams (Citation2014) noted the following: According to their experience with the Morris test and other water mazes, group sizes of fewer than 10 can be unreliable; it is not a justification to underpower experiments and run the risk of false positives, which, in the long run, cost more time, more animals, and more money to prove or disprove. In the study by van Wijk et al. (Citation2008), group size was limited to 8 pups/sex/group. In addition, according to Vorhees and Williams (Citation2014), littermates have overlapping genetic, epigenetic, and experimental influences that make littermates more similar to one another than to offspring from another litter. Because in prenatal studies it is the pregnant female animal that is randomly assigned to experimental groups, the offspring are yoked to the maternal treatment, and hence the pups within a litter are not orthogonal to one another. Treating them as if they were orthogonal violates one of the basic assumptions of inferential statistics, namely that subjects be independent. The problem of oversampling multiple offspring per litter and treating them as if independent has been well described (Holson and Pearce Citation1992; Lazic and Essioux Citation2013). A solution would be to sample only one pup per litter, or only one male and one female pup per litter (Vorhees and Williams Citation2014). It is not clear in this study how many pups from one original litter were in each group.
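The consequence of treating littermates as independent can be illustrated with a minimal sketch. It uses the standard design-effect approximation for clustered data with equal cluster sizes, 1 + (m − 1) × ICC, where m is the number of pups sampled per litter and ICC is the intraclass (within-litter) correlation. The function names and the ICC value are hypothetical and chosen for illustration only; none of the three publications reports an ICC, and only the group size of 8 is taken from the text above.

```python
def design_effect(pups_per_litter: float, icc: float) -> float:
    """Variance inflation from sampling several pups per litter.

    Standard approximation for equal litter contributions:
    1 + (m - 1) * ICC. Equals 1 when one pup per litter is used.
    """
    if pups_per_litter < 1:
        raise ValueError("pups_per_litter must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (pups_per_litter - 1.0) * icc


def effective_group_size(n_pups: int, pups_per_litter: float, icc: float) -> float:
    """Number of independent observations that n_pups are 'worth'."""
    return n_pups / design_effect(pups_per_litter, icc)


if __name__ == "__main__":
    group_size = 8       # pups/sex/group, as stated for van Wijk et al. (2008)
    assumed_icc = 0.5    # hypothetical within-litter correlation, not from any study
    for m in (1, 2, 4):  # hypothetical numbers of pups taken from each litter
        n_eff = effective_group_size(group_size, m, assumed_icc)
        print(f"{m} pup(s) per litter: effective group size = {n_eff:.2f}")
```

Under these assumed values, a group of 8 pups drawn four per litter carries the information of about 3.2 independent animals, whereas sampling one pup per litter keeps the effective group size at 8. This is why the number of pups per original litter in each group matters for interpreting the reported statistics.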

Careful control procedures are needed to ensure that differences are what they appear to be and not secondary to confounding factors (Vorhees and Williams Citation2014). Also, according to OECD TG 426 (OECD Citation2007, section 37), if the test(s) for L&M would reveal an effect of the test substance, additional tests may be considered to rule out alternative interpretations based on alterations in sensory, motivational, and/or motor capacities. In addition, many users of the maze fail to include all of the procedures that permit interpretation of differences and account for potential performance factors (Vorhees and Williams Citation2006). Control procedures are lacking and only one L&M test period was investigated (viz. not at the young age).

Measurement of TSH level is the preferred test for initial evaluation of suspected primary hypothyroidism. If TSH is abnormal, an fT4 level will further narrow the diagnosis. In current guideline testing such as the 90-day study (OECD TG 408), it is required to measure not only total T4, but also total TSH and T3. TSH is a pituitary hormone that regulates thyroid hormone production and release from the thyroid gland. T4 is the main circulating thyroid gland product, which is converted to T3, the active form of thyroid hormone. fT4 travels into body tissues that use T4. Bound T4 attaches to proteins that prevent it from entering these tissues. More than 99% of T4 is bound; free T4 would therefore be the more important hormone to measure, and total T4 levels in serum are not predictive of T4 levels in the relevant tissues.

The adverse outcome pathway (AOP) for reduced T4 in serum potentially leading to L&M impairment includes several steps (key events) (see AOP-Wiki Citation2023a, Citation2023b, among others https://aopwiki.org/aops/54 and https://aopwiki.org/relationships/312). The biological relationship between the key event “T4 decreased in serum” and the next key event “T4 in neuronal tissue decreased” is a well-accepted dogma within the scientific community; there is no doubt that decreased circulating T4 can lead to declines in tissue concentrations of T4 and T3 in a variety of tissues, including brain. However, there are also compensatory mechanisms (e.g. upregulation of deiodinases and transporters) that may alter the relationship between hormones in the periphery and hormone concentrations in the brain. In addition, there is limited information available on the quantitative relationship between circulating levels of thyroid hormones, these compensatory processes, and neuronal T4 concentrations, especially in different brain regions, and especially during development. Similarly, the degree to which serum thyroid hormones must drop to overwhelm these compensatory responses has not been established. There are likely different quantitative relationships between these two key events depending on the compensatory ability based on both developmental stage and specific brain region (AOP-Wiki Citation2023a, Citation2023b).

No (OECD TG) tests are available for any of the steps (key events) between reduced serum T4 and L&M impairment (the adverse outcome). The EU funded FP7 Athena project (https://cordis.europa.eu/project/id/313220) aimed to identify appropriate method(s) to investigate the effect of low T4. In contrast, for e.g. skin sensitization testing, OECD TGs have been developed for several steps in the AOP (see OECD TGs 442C, D and E), and it is required to perform at least two of these three tests to arrive at a conclusion about potential skin sensitization without testing the adverse outcome itself. Thus, requiring L&M testing (adverse outcome) based on only one key event (reduced serum T4), without any information on the key events in between, can only be considered too big a step; it is nothing more than using “sledgehammers to crack nuts”. Even worse would be to classify for DNT right away, without any further testing, “based solely on the effect on thyroid hormones” as suggested by Axelstad et al. (Citation2008).

Moreover, based on all the observations indicated above, the studies by Axelstad et al. (Citation2008) and van Wijk et al. (Citation2008) in rats cannot even be used to substantiate, support or conclude that decreases in serum (or plasma) T4 in rats would result in impaired L&M, and they are not suitable or appropriate as a fundament to ask for L&M testing as part of an EOGRTS. It rather looks as if a few articles were found that appeared to fit the claim, instead of looking at how the claim could be substantiated. Moreover, in the process of drawing a conclusion and proposing extensive additional animal testing, the agency should have performed a scientific evaluation of all data applying a weight of evidence approach, and the Bradford Hill criteria should have been used to assess (1) whether there is an association, (2) if so, how strong this association is, and (3) whether this association is causal (Hill Citation1965). Finally, there may be a “negative bias” because studies without findings are not readily accepted for publication.

In a more recent study of the Axelstad group (Ramhøj et al. Citation2020), the effects of perfluorohexane sulfonate (PFHxS) (a widespread environmental contaminant found in human serum, breastmilk, and other tissues, and capable of lowering serum T4 in rats (Ramhøj et al. Citation2018)) on the thyroid system and neurodevelopment were investigated following maternal exposure from early gestation through lactation (0.05, 5, or 25 mg/kg/day PFHxS), alone or in combination with a mixture of 12 environmentally relevant endocrine disrupting compounds (EDC-mix). PFHxS lowered T4 levels in both dams and offspring in a dose-dependent manner but did not change TSH levels, weight, histology, or expression of marker genes of the thyroid gland. In addition, no evidence of thyroid hormone-mediated neurobehavioral disruption was observed in offspring. From these data, the authors concluded that, since human brain development appears very sensitive to low T4 levels, PFHxS is of potential concern to human health, but that current rodent models are not sufficiently sensitive to detect adverse neurodevelopmental effects of maternal and perinatal hypothyroxinemia and that there is a need to develop more sensitive brain-based markers or measurable metrics of thyroid hormone-dependent perturbations in brain development. Thus, remarkably, in their discussion, the Axelstad research group even questions the validity of the use of rodent models in testing thyroid hormone-dependent perturbations in brain development (Ramhøj et al. Citation2020).

From a technical point of view, regarding the requested L&M testing, the Morris and Cincinnati tests cannot be used for young pups. The Morris test does not work with a pool size of 210–240 cm, as used for adult rats, because this is too large for small pups (M. Beekhuijzen, personal communication). In theory, young pups can be tested in a Morris test using a smaller pool (180 cm), but only with specific adaptations for weanlings, such as warmer water and some extra distal cues, and after full validation (with a positive control). Regarding the Cincinnati test, conducted according to Vorhees and Williams (Citation2014), it does not address spatial L&M but route-based (egocentric) L&M. Also, Jablonski et al. (Citation2019) tried testing untreated rats in the Cincinnati test at PND 30, but the full, 10-unit multiple T maze was too difficult for these rats and none of them found the goal. The RAM test could, also in theory, be used at young(er) age, but testing is required at PND 25 ± 2, animals would have to be trained after weaning (which is PND 21), and starting training two days after weaning is not recommended. In addition, the animals should be food deprived in the RAM test and the feed should be used to train and test the animals. Thus, in case of a dietary EOGRT study, feed restriction or regular fasting is not even possible. Also at the older age, it should be noted that pups still need to be dosed in an EOGRTS as this is primarily a toxicity study (in contrast to an OECD TG 426 study in which only dams are dosed, and dosing only takes place during gestation and lactation). Thus, the RAM test is appropriate neither so shortly after weaning, at this very young age, nor in a continuous dosing toxicity study at the older age. In addition, as indicated by Vorhees and Williams (Citation2014), because food restriction is necessary in the RAM test, one must ensure that animals in experimental and control groups are equally hungry and hence equally motivated to search for food.

OECD TG 426 (OECD Citation2007, section 37) indicates that if L&M tests would reveal an effect of the test substance, additional tests may be considered to rule out alternative interpretations based on alterations in sensory, motivational, and/or motor capacities. It is thus clearly indicated that any positive results in such tests could be based on alterations in sensory, motivational, and/or motor capacities. According to Vorhees and Williams (Citation2014), in appetitive tasks (such as the RAM test), this means ensuring that animals are matched for the incentive value of the reinforcement. For swimming tasks (such as the Morris test), this means ensuring that groups are equal in swimming ability, such as swim speed, in learning that the platform is the goal/escape, and in learning that there is no escape other than climbing on the platform and waiting to be removed. Alterations in motor capacities can clearly be expected to have been present in the van Wijk et al. (Citation2008) study because of a ∼3× lower body weight in animals of the “H” group. Moreover, none of the three papers included any investigation, discussion, or evaluation of possible alterations in sensory, motivational, and/or motor capacities.

In OECD TG 426 (OECD Citation2007, section 37), it is also recommended that the test of L&M should be chosen based on its demonstrated sensitivity to the class of compound under investigation. However, the Morris test or the RAM test at one time point and the Cincinnati test at the other time point were requested by the agency without any argumentation or clarification. It is also not clear why two different tests should be performed, because section 37 states that the same or separate test(s) may be used at the two stages of development, and Contract Research Organizations (CROs) normally offer the same test for weanling and young rats in case of an OECD TG 426 study. In addition, section 37 does not rank the L&M tests it mentions by sensitivity. And finally, it is unclear why either a RAM or Morris test would be required in view of the inconsistent results of these tests observed in the Axelstad et al. (Citation2008) and van Wijk et al. (Citation2008) studies, the very studies that would be the basis to ask for this additional testing. Moreover, in the Amano et al. (Citation2018) paper, no RAM or Morris tests were used, so already for this reason this paper cannot be used as a relevant reference. The Cincinnati test was not used in any of the papers.

In the Axelstad et al. (Citation2008) study, with the known thyroid toxicant PTU, strongly increased thyroid weights and marked thyroid histopathology were noted; yet the L&M test results were negative (Morris test) or only weakly positive (RAM test). So, how strong would effects on thyroid weight and histopathology need to be in order to request L&M testing in an EOGRTS with rats and to be able to observe significant and consistent impairment of L&M? In the other two papers, thyroid histopathology and weight were not even determined.

L&M testing was originally not a requirement in the initial setup of the OECD TG 443 (EOGRTS); this kind of testing originates from the OECD TG 426 (DNT) study design. In an extensive retrospective review by Raffaele et al. (Citation2010) of a dataset of 69 DNT studies, L&M testing provided DNT findings at lower levels than the other findings for only four chemicals, and it was never the sole finding. The conclusion of this review, supported by an analysis of Piersma et al. (Citation2012), was influential in the deliberations by OECD to eliminate cognitive testing from the original design of the OECD TG 443 (Makris and Vorhees Citation2015). Remarkably, at a recent event hosted by EFSA in 2021 (EFSA Citation2021a), it was indicated that as of 2014, out of 101 Guideline DNT submissions in North America, only two Cincinnati tests (ca. 2%) and only nine Morris tests (ca. 9%) had been performed; the RAM test was not used at all (EFSA Citation2021b). This also indicates that the availability of CROs with sufficient experience to perform the requested testing is rather limited.

Another critical issue is the timing of treatment. According to OECD TG 426 (OECD Citation2007, section 2), "Developmental neurotoxicity studies are designed to provide data, including dose-response characterizations, on the potential functional and morphological effects on the developing nervous system of the offspring that may arise from exposure in utero and during early life". Therefore, the offspring in a DNT study are no longer exposed to the test chemical after weaning, and all neurobehavioral testing in adolescents and adults is conducted in animals that are untreated at that time. In two of the three studies referenced by the agency, there was no direct treatment of the pups with PTU after weaning up to testing at the older age (NB no L&M testing was performed in young pups at PND 25 ± 2). In contrast, in an EOGRTS, the treatment is continued beyond weaning into adolescence and adulthood, and data from an OECD TG 426 study will, therefore, not be comparable to those from an OECD TG 443 DNT cohort if both studies would be performed.

Overall, as explained above, because the OECD TG 443 design fundamentally differs from the OECD TG 426, it would be essential, when there is a real need to include L&M testing (not just based on a lower serum T4 level), either to validate cognitive testing within the EOGRTS study design first, if at all possible in view of the currently required dosing beyond weaning and the required high parental dose levels to start with, or to perform an OECD TG 426 study after a substance evaluation under REACH has identified a concern that would need clarification. We, therefore, would like to refer to our suggestions in Section 3 to change the OECD TG 443 by (a) not dosing DNT cohorts 2A/B and (b) reducing the dose levels for pups of cohorts 1A/1B (and C) when needed.

Abbreviations
AOP = adverse outcome pathway
Cincinnati test = Cincinnati water maze test
CRO = Contract Research Organization
DNT = developmental neurotoxicity
ECHA = European Chemicals Agency
EDC = endocrine disrupting chemicals
EFSA = European Food Safety Authority
EOGRT(S) = Extended One-Generation Reproductive Toxicity (Study)
GD = gestation day
(f)T3 = (free) thyroid hormone (triiodothyronine)
(f)T4 = (free) thyroid hormone (thyroxine)
L&M = learning and memory
Morris test = Morris water maze test
OECD = Organization for Economic Co-operation and Development
OLT = Object-in-Location Test
ORT = Object Recognition Test
PFHxS = perfluorohexane sulfonate
PND = post-natal day
PTU = propylthiouracil
RAM test = radial arm maze test
REACH = Registration, Evaluation, Authorization, and Restriction of Chemicals
TG = test guideline
TSH = thyroid-stimulating hormone

Acknowledgements

The authors would like to thank Manon Beekhuijzen from Charles River in Den Bosch, the Netherlands for her valuable suggestions while drafting this article. The authors acknowledge the reviewers and the editor for their time and dedication in reviewing this work. The authors would also like to thank the Aromatics Producers Association (APA) Sector Group of Cefic and the Organic Peroxide Consortium for funding open access.

Declaration of interest

All authors of this manuscript are working in the chemical industry. This manuscript is important as a contribution to ongoing discussions on the validity and value of the inclusion and performance of learning and memory tests as part of an EOGRTS. The authors had sole responsibility for the manuscript. Opinions and conclusions expressed within this manuscript are those of the authors.

References

  • Amano I, Takatsuru Y, Khairinisa MA, Kokubo M, Haijima A, Koibuchi N. 2018. Effects of mild perinatal hypothyroidism on cognitive function of adult male offspring. Endocrinology. 159(4):1910–1921. doi: 10.1210/en.2017-03125.
  • AOP-Wiki. 2023a. Inhibition of Na+/I- symporter (NIS) leads to learning and memory impairment; [accessed 2023 Mar 31]. https://aopwiki.org/aops/54.
  • AOP-Wiki. 2023b. T4 in serum, decreased leads to T4 in neuronal tissue, decreased; [accessed 2023 Mar 31]. https://aopwiki.org/relationships/312.
  • Arts J, Beekhuijzen M. 2020. Is there a rationale for direct dosing of chemicals to nursing pups in the EOGRTS (OECD 443)? Regul Toxicol Pharmacol. 113:104641. doi: 10.1016/j.yrtph.2020.104641.
  • Axelstad M, Hansen PR, Boberg J, Bonnichsen M, Nellemann C, Lund SP, Hougaard KS, Hass U. 2008. Developmental neurotoxicity of propylthiouracil (PTU) in rats: relationship between transient hypothyroxinemia during development and long-lasting behavioural and functional changes. Toxicol Appl Pharmacol. 232(1):1–13. doi: 10.1016/j.taap.2008.05.020.
  • Carlini VP, Martini AC, Schiöth HB, Ruiz RD, Fiol de Cuneo M, de Barioglio SR. 2008. Decreased memory for novel object recognition in chronically food-restricted mice is reversed by acute ghrelin administration. Neuroscience. 153(4):929–934. doi: 10.1016/j.neuroscience.2008.03.015.
  • Cooper RL, Lamb JC, Barlow SM, Bentley K, Brady AM, Doerr N, Eisenbrandt DL, Fenner-Crisp PA, Hines RN, Irvine LFH, et al. 2006. A tiered approach to life stages testing for agricultural chemical safety assessment. Crit Rev Toxicol. 36(1):69–98. doi: 10.1080/10408440500541367.
  • ECHA and EFSA. 2018. Guidance for the identification of endocrine disruptors in the context of regulations (EU) no. 528/2012 and (EC) no. 1107/2009; [accessed 2023 Apr 3]. Microsoft Word - ED_Guidance_06_06_2018_for-publication.docx (europa.eu).
  • ECHA. 2022. Advice on dose-level selection for the conduct of reproductive toxicity studies (OECD TGs 414, 421/422 and 443) under REACH; [accessed 2023 Apr 3]. 27159fb1-c31c-78a2-bdef-8f423f2b6568 (europa.eu).
  • EFSA. 2021a. Developmental neurotoxicity: in vivo testing and interpretation; [accessed 2023 Mar 31]. https://www.efsa.europa.eu/en/events/developmental-neurotoxicity-vivo-testing-and-interpretation.
  • EFSA. 2021b. DNT lecture series, NAFTA developmental neurotoxicity guidance document. Tests of learning and memory; [accessed 2023 Mar 31]. https://www.efsa.europa.eu/sites/default/files/2022-05/EFSA%20%20DNT%20LM_April%202021_e.pdf.
  • Hemb M, Cammarota M, Nunes ML. 2010. Effects of early malnutrition, isolation and seizures on memory and spatial learning in the developing rat. Int J Dev Neurosci. 28(4):303–307. doi: 10.1016/j.ijdevneu.2010.03.001.
  • Hill AB. 1965. The environment and disease: association or causation? Proc R Soc Med. 58(5):295–300. doi: 10.1177/003591576505800503.
  • Hodges H. 1996. Maze procedures: the radial-arm and water maze compared. Brain Res Cogn Brain Res. 3(3–4):167–181. doi: 10.1016/0926-6410(96)00004-3.
  • Holscher C. 1999. Stress impairs performance in spatial water maze learning tasks. Behav Brain Res. 100(1–2):225–235. doi: 10.1016/S0166-4328(98)00134-X.
  • Holson RR, Pearce B. 1992. Principles and pitfalls in the analysis of prenatal treatment effects in multiparous species. Neurotoxicol Teratol. 14(3):221–228. doi: 10.1016/0892-0362(92)90020-b.
  • Ilochi ON, Kolawole TA, Oluwatayo BO, Chuemere AN. 2019. Starvation-induced changes in memory sensitization, habituation and psychosomatic responses. Int J Trop Dis Health. 35:1–7. doi: 10.9734/ijtdh/2019/v35i330126.
  • Jablonski SA, Williams MT, Vorhees CV. 2019. Learning and memory effects of neonatal methamphetamine exposure in Sprague-Dawley rats: test of the role of dopamine D1 receptors in mediating the long-term effects. Dev Neurosci. 41(1–2):44–55. doi: 10.1159/000498884.
  • Lazic SE, Essioux L. 2013. Improving basic and translational science by accounting for litter-to-litter variation in animal models. BMC Neurosci. 14:37. doi: 10.1186/1471-2202-14-37.
  • Makris SL, Vorhees CV. 2015. Assessment of learning, memory, and attention in developmental neurotoxicity regulatory studies: synthesis, commentary, and recommendations. Neurotoxicol Teratol. 52(Pt A):109–115. doi: 10.1016/j.ntt.2015.10.004.
  • OECD. 2000. Guidance document on the recognition, assessment, and use of clinical signs as humane endpoints for experimental animals used in safety evaluation no. 19. ENV/JM/MONO(2000)7. https://one.oecd.org/document/env/jm/mono(2000)7/en/pdf.
  • OECD. 2007. Test no. 426: developmental neurotoxicity study, OECD guidelines for the testing of chemicals, section 4. Paris: OECD Publishing.
  • OECD. 2013. Guidance document supporting OECD test guideline 443 on the extended one generation reproductive toxicity test. Series on testing and assessment no. 151. ENV/JM/MONO(2013)10. https://one.oecd.org/document/env/jm/mono(2013)10/en/pdf.
  • OECD. 2018a. Test no. 443: extended one-generation reproductive toxicity study, OECD guidelines for the testing of chemicals, section 4. Paris: OECD Publishing.
  • OECD. 2018b. Test no. 408: repeated dose 90-day oral toxicity study in rodents, OECD guidelines for the testing of chemicals, section 4. Paris: OECD Publishing.
  • Piersma AH, Tonk EC, Makris SL, Crofton KM, Dietert RR, Van Loveren H. 2012. Juvenile toxicity testing protocols for chemicals. Reprod Toxicol. 34(3):482–486. doi: 10.1016/j.reprotox.2012.04.010.
  • Raffaele KC, Rowland J, May B, Makris SL, Schumacher K, Scarano LJ. 2010. The use of developmental neurotoxicity data in pesticide risk assessments. Neurotoxicol Teratol. 32(5):563–572. doi: 10.1016/j.ntt.2010.04.053.
  • Ramhøj L, Hass U, Boberg J, Scholze M, Christiansen S, Nielsen F, Axelstad M. 2018. Perfluorohexane sulfonate (PFHxS) and a mixture of endocrine disrupters reduce thyroxine levels and cause antiandrogenic effects in rats. Toxicol Sci. 163(2):579–591. doi: 10.1093/toxsci/kfy055.
  • Ramhøj L, Hass U, Gilbert ME, Wood C, Svingen T, Usai D, Vinggaard AM, Mandrup K, Axelstad M. 2020. Evaluating thyroid hormone disruption: investigations of long-term neurodevelopmental effects in rats after perinatal exposure to perfluorohexane sulfonate (PFHxS). Sci Rep. 10(1):2672. doi: 10.1038/s41598-020-59354-z.
  • Vorhees CV, Williams MT. 2006. Morris water maze: procedures for assessing spatial and related forms of learning and memory. Nat Protoc. 1(2):848–858. doi: 10.1038/nprot.2006.116.
  • Vorhees CV, Williams MT. 2014. Assessing spatial learning and memory in rodents. ILAR J. 55(2):310–332. doi: 10.1093/ilar/ilu013.
  • van Wijk N, Rijntjes E, van de Heijning BJ. 2008. Perinatal and chronic hypothyroidism impair behavioural development in male and female rats. Exp Physiol. 93(11):1199–1209. doi: 10.1113/expphysiol.2008.042416.