Grouping transcripts with similar expression levels safeguarded against preferential selection of lowly expressed transcripts, which showed greater variation in their signal log ratios. As expected, the signal log ratios of all transcripts were distributed symmetrically around the x axis. Then, starting with the most abundantly expressed transcripts, we identified the transcripts whose average signal log ratios were most extreme within each group of 500 transcripts. Specifically, we selected the transcripts whose average signal log ratios were located more than 3 standard deviations (SDs) away from the mean of the group and whose fold changes were located more than 2 interquartile ranges (IQRs) away from the first and third quartiles of the group. We defined these transcripts as outliers.
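
The grouping and double-threshold rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `find_outliers`, the tuple layout of the input, and the choice to apply both the SD and the IQR criterion to the same signal log ratio values (the log-scale fold change) are assumptions made for the example.

```python
import statistics


def find_outliers(transcripts, group_size=500, sd_limit=3.0, iqr_limit=2.0):
    """Return ids of transcripts flagged as outliers within their expression group.

    transcripts: list of (transcript_id, mean_expression, mean_signal_log_ratio).
    The tuple layout is hypothetical and chosen only for this sketch.
    """
    # Rank from most to least abundantly expressed, then cut into fixed-size groups.
    ranked = sorted(transcripts, key=lambda t: t[1], reverse=True)
    outliers = []
    for start in range(0, len(ranked), group_size):
        group = ranked[start:start + group_size]
        if len(group) < 3:
            continue  # too few values to estimate dispersion
        values = [t[2] for t in group]
        mean = statistics.fmean(values)
        sd = statistics.stdev(values)
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        low_fence = q1 - iqr_limit * iqr
        high_fence = q3 + iqr_limit * iqr
        for transcript_id, _, value in group:
            beyond_sd = abs(value - mean) > sd_limit * sd
            beyond_iqr = value < low_fence or value > high_fence
            # Both criteria must hold for a transcript to count as an outlier.
            if beyond_sd and beyond_iqr:
                outliers.append(transcript_id)
    return outliers
```

Requiring both criteria makes the selection no less stringent than either rule alone, which fits the stated aim of guarding against noisy, lowly expressed transcripts.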

In a normal distribution, approximately 99.7% of the data points fall within three SDs of the mean, whereas the 2 IQR limits around the median delimit moderate outliers in any distribution. These two measures of dispersion were used jointly to detect the outlier transcripts at a moderate stringency level, even if the signal log ratios within a group were not normally distributed. The outlier transcripts were then submitted to the Affymetrix web site to identify the biological pathways and clusters listed in the Gene Ontology database that were associated with them. Under the null hypothesis, no particular pathway or cluster was expected to be significantly enriched by the outlier transcripts. To test this hypothesis, we conducted a one-sided two-by-two Fisher's exact test for enrichment of each pathway or cluster by the outliers.
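
As a quick numerical check of the two dispersion limits, the sketch below computes how much of a standard normal distribution each limit covers. It assumes the 2 IQR limit is placed two interquartile ranges beyond the first and third quartiles; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Share of a normal distribution lying within 3 SDs of the mean.
coverage_3sd = std_normal.cdf(3.0) - std_normal.cdf(-3.0)

# Third quartile and IQR of the standard normal, expressed in SD units.
q3 = std_normal.inv_cdf(0.75)
iqr = 2.0 * q3  # the distribution is symmetric, so Q1 = -Q3

# Fence placed 2 IQRs beyond the quartiles (assumed reading of the 2 IQR limit).
fence_2iqr = q3 + 2.0 * iqr
coverage_2iqr = std_normal.cdf(fence_2iqr) - std_normal.cdf(-fence_2iqr)

print(f"within 3 SD:        {coverage_3sd:.4%}")
print(f"2 IQR fence (SDs):  {fence_2iqr:.3f}")
print(f"within 2 IQR fence: {coverage_2iqr:.4%}")
```

Under exact normality the 2 IQR fence sits at roughly 3.4 SDs, so it is slightly stricter than the 3 SD rule. Unlike the SD rule, it is computed from quartiles and is therefore less sensitive to the extreme values it is meant to detect.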

The total number of annotated outliers and the total number of annotated non-outliers were constant entries in each Fisher's exact test, whereas the numbers of outlier and non-outlier transcripts in a specific biological pathway or cluster were entered as variable values. We applied two conservative measures to the serial significance testing of pathways for enrichment. First, the number of outlier transcripts belonging to a pathway was reduced by 1, because the first outlier transcript hit to a pathway was designated as the ascertaining transcript, which could be a chance ascertainment. This step eliminated from further testing all pathways represented by only one transcript.
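
The enrichment test with the first conservative measure can be sketched as below, using only the standard library. The exact layout of the authors' two-by-two table is not given in the text, so the arrangement here (in-pathway versus not-in-pathway rows, with the ascertaining transcript dropped from the table) is an assumption, as are the function names `fisher_one_sided` and `enrichment_p_value`.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed.
    """
    row1 = a + b           # transcripts in the pathway
    col1 = a + c           # outlier transcripts
    total = a + b + c + d
    upper = min(row1, col1)
    denominator = comb(total, row1)
    tail = sum(
        comb(col1, x) * comb(total - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return tail / denominator


def enrichment_p_value(pathway_outliers, pathway_non_outliers,
                       total_outliers, total_non_outliers):
    """Nominal p value for enrichment of one pathway by outliers.

    The in-pathway outlier count is reduced by 1 for the ascertaining
    transcript; pathways left with no outlier are not tested (returns None).
    """
    adjusted = pathway_outliers - 1
    if adjusted < 1:
        return None
    a = adjusted
    b = pathway_non_outliers
    c = total_outliers - pathway_outliers          # outliers outside the pathway
    d = total_non_outliers - pathway_non_outliers  # non-outliers outside the pathway
    return fisher_one_sided(a, b, c, d)
```

For example, `enrichment_p_value(4, 1, 5, 4)` tests the table [[3, 1], [1, 3]], while a pathway hit by a single outlier returns `None` and is excluded from further testing.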

Second, because some pathways were expected to show significant enrichment by chance as a result of multiple testing, we performed a Bonferroni correction and multiplied the nominal p values by the total number of pathways ascertained by the outliers before declaring significance.

Results

Analysis of gene expression changes in OSE cells upon P4 exposure

The OSE cultures derived from six women were exposed to P4 for five days, and gene expression changes were profiled in P4-exposed and baseline control cultures using U133A Affymetrix chips.