Holding: DoJ claimed that the city’s use of a physical agility test to screen police officer candidates violated Title VII because the test had a disparate impact on female applicants and was neither job related nor consistent with business necessity.

 

The court found that the city failed to establish that either the sit-ups or push-ups components had criterion-related validity. Moreover, the city failed to prove that the passing standard it used corresponded to the minimum level of physical ability necessary to successfully perform the police officer job.

 

The court found that the United States was entitled to a judgment in its favor with respect to the liability phase of the case.

 


 

Case 1:04-cv-00004-SJM Document 86 Filed 12/13/2005 Pages 1-73

 

UNITED STATES DISTRICT COURT

FOR THE WESTERN DISTRICT OF PENNSYLVANIA

 

United States of America,

Plaintiff,

v.

City of Erie, Pennsylvania,

Defendant.

 

Civil Action No. 04-4 Erie

 

411 F. Supp. 2d 524

2005 U.S. Dist. LEXIS 33397

 

December 13, 2005, Decided 

December 13, 2005, Filed

 

   Sean J. McLaughlin

   United States District Judge

 

 FINDINGS OF FACT AND CONCLUSIONS OF LAW

 

   This case was commenced on January 8, 2004 by the United States of America against the City of Erie based on the United States’ allegation that the City’s use of a physical agility test from 1996 to 2002 as a device to screen police officer candidates violated Title VII of the Civil Rights Act of 1964, 42 U.S.C. § 2000e et seq., as amended. The United States alleges that the City’s use of the test had a disparate impact on female applicants and was neither job related for the position of entry-level police officer nor consistent with business necessity. A non-jury trial was held from March 7 through March 10, 2005. Set forth below are this Court’s findings of fact and conclusions of law:

 

   I. FINDINGS OF FACT

 

A. Background Facts

 

   1. For a number of years prior to 2004, the City of Erie administered a physical agility test (“PAT”) as part of its process for hiring new police officers. The test was administered biennially in even-numbered years and each PAT remained in effect for a two-year period. (Tr. Vol. 1, pp. 159-63.) n1

 

   2. In 1992, prior to the time period at issue in this litigation, the City of Erie utilized a PAT that incorporated pull-ups, push-ups, a vertical leap and a broad jump. (Tr. Vol. I, p. 161.) The pull-up component required male candidates to perform 4 pull-ups within 50 seconds and females to perform 1 pull-up within 50 seconds. (S 4.)

 

   3. Due to a typographical error, the 1992 PAT was incorrectly administered, such that male candidates were erroneously required to do 6 pull-ups and female candidates were required to do 2. (S 5.) Because of this confusion, the City of Erie Civil Service Commission n2 decided to allow candidates who failed the 1992 PAT to take the written portion of the exam and, if they passed, to then retake the pull-up component of the test as originally designed. (S 6.)

 

   4. The Civil Service Commission’s decision to allow candidates to retake the pull-up component of the 1992 test engendered some controversy. In the course of the controversy, the Civil Service Commission became aware of complaints that the pull-up component of the 1992 PAT was unfair. (S 7.)

 

   5. As a result of the controversy arising out of the 1992 PAT, the Civil Service Commission requested that the Erie Bureau of Police develop a new physical agility test. (S 8.)

 

    6. Then Erie Police Chief Paul DeDionisio, Jr. assigned Stephen Kovacs the task of working with the Civil Service Commission to develop and document a new PAT that would be fair and more “state-of-the-art.” At the time, Kovacs was Captain of the Bureau’s Support Division, which oversaw training, research and planning for the Bureau. (S 9-11.)

 

   7. Captain Kovacs had no prior experience developing physical agility tests; however, he and Chief DeDionisio reviewed tests used by other law enforcement agencies such as the Pennsylvania State Police and the Pittsburgh Police Department. Capt. Kovacs also contacted the Pennsylvania Municipal Police Officer Education and Training Commission and was informed that the Commission had not established a standard for physical agility testing and had chosen, instead, to leave the determination of those standards to the municipalities. (S 12-14.)

 

   8. Captain Kovacs instructed Charles Bowers, then a Lieutenant in the Traffic/Patrol Division (now the City’s present Chief of Police), to use his experience as a police officer to develop a test that would simulate physical tasks that city police officers regularly encounter. (S 15.) At that time, Lt. Bowers had been a police officer for more than 20 years and had extensive experience in patrol duties, including the pursuit and arrest of suspects or other persons who need to be restrained. (D 118.) n3

 

   9. Lt. Bowers had no education in the area of industrial/organizational psychology, exercise physiology, test development or test validation. (S 16.) Nevertheless, Bowers, like Kovacs and DeDionisio, understood that the City’s reason for developing and administering the new physical agility test was to ensure that candidates possessed the physical ability necessary to do the job. (S 17.)

 

   10. Newly-hired City of Erie police officers are assigned to patrol duties, typically for a period of at least 5 years, and must work for promotions to non-patrol duties. (Tr. Vol. I, p. 202; D 90.)

 

   11. In developing the new PAT, Lt. Bowers examined the physical tests utilized by law enforcement in various different cities, but he could find no national or uniform standard to work from. (Tr. Vol. I, p. 172.)

 

   12. Based largely on his own extensive experience working the streets, Lt. Bowers attempted to construct a physical agility test which would simulate a foot pursuit containing common obstacles, followed by a demonstration that the individual taking the test had sufficient strength to apprehend and physically restrain a subject. (S 18; Tr. Vol. I, pp. 163-65.) In developing the PAT, Lt. Bowers did not rely on expert opinions or studies specific to the Erie police, nor did he conduct any formal study himself. (S 19.)

 

   13. Lt. Bowers ultimately designed the PAT to consist of a 220-yard run, during which each applicant was required to negotiate four obstacles, followed by a push-ups component (to be completed immediately after the obstacle course/run) and a sit-ups component (to be completed immediately after the push-ups). The four obstacles included in the 220-yard obstacle course/run were, in order: (1) a six-foot-high wall, which applicants were required to climb over; (2) a window opening three feet above the ground, which applicants were required to climb through; (3) a platform two feet off the ground and eight feet long, which applicants were required to crawl under; and (4) a four-foot wall, which applicants were required to climb over. n4 (Tr. Ex. AA (Requests for Admission) and BB (Responses), Nos. 47, 48, 50, 53.) n5

 

   14. In March of 1994, based on his own experience as a patrol officer, Captain Kovacs recommended that the City use the PAT developed by Lt. Bowers. (S 21, 23.) In making this recommendation, Captain Kovacs did not rely on external academic or professional opinions or studies specific to the Erie Police Bureau, nor did he conduct any formal study himself. (S 24.)

 

   15. Each of the obstacles in the obstacle course/run portion of the PAT developed by Lt. Bowers and recommended by Captain Kovacs was included because, based on the experience of Bowers and Kovacs, as well as informal discussions with other incumbent officers, each of the obstacles simulated something patrol officers would be required to do on the job. (S 26.) In developing and recommending the PAT, Lt. Bowers and Capt. Kovacs attempted to limit the components of the test to ones that simulated specific tasks that a police officer might encounter in a foot pursuit; for example, they did not include pull-ups in the test because they did not believe pull-ups related to specific police officer tasks. (S 27.)

 

   16. Captain Kovacs and Lt. Bowers included push-ups and sit-ups components in the version of the PAT recommended to the Civil Service Commission because they believed that push-ups and sit-ups, respectively, measure upper and lower body strength and, following completion of the obstacle course/run, would indicate whether a candidate had sufficient strength or endurance to struggle with and apprehend a subject after a foot pursuit. (S 28.)

 

   17. The PAT recommended by Bowers and Kovacs in 1994 initially did not specify the number of push-ups or sit-ups that would be included in the test or any duration for the push-ups or sit-ups components; instead Capt. Kovacs recommended that a “pre-test” of incumbent officers be conducted before the test was implemented in order to set the “exercise repetition standard” (i.e., number of push-ups and sit-ups included in the test). (S 29.)

 

   18. Capt. Kovacs recommended that the number of push-ups and sit-ups required be set by “pre-testing” incumbent officers because he believed that: (1) the more physical ability (as measured by the push-ups and sit-ups) a candidate possessed, the better able that candidate would be to apprehend and subdue a subject; but (2) the City could not “require a higher physical ability level [than] that already demanded from active officers.” (S 30.)

 

   19. Lt. Bowers chose 15 seconds as the time period to be used for each of the push-ups and sit-ups components of the standard-setting exercise because he thought that a total of 30 seconds was about the duration of an average physical struggle. (S 31, Tr. Vol. 1, p. 172.)

 

   20. In order to conduct the pre-testing or “standard-setting” exercise, Capt. Kovacs requested that Sgt. James Beskid, who was then an officer under his supervision, provide a list constituting a “random selection” of officers. (S 32.) Sgt. Beskid, who is not an expert in statistics or sampling, consulted a textbook and determined that a sample of at least 30 officers should be used for the standard-setting exercise. (S 33.) Sgt. Beskid produced a random list of 50 Erie police officers so that, in the event some officers were excused from participation in the standard-setting exercise, a sample of 30 would be available. (S 34.)

 

   21. The Bureau of Police ordered the first 30 officers on the random list prepared by Sgt. Beskid to participate in the standard-setting exercise. (S 35.) However, the union representing the City’s police officers took exception to the City ordering officers to participate without first negotiating with the union. (S 36.) Accordingly, the City issued a memo requesting volunteers from among the 50 officers on the list prepared by Sgt. Beskid, as well as other incumbent officers not on the list, to take the standard-setting exercise. (S 37.)

 

   22. Ultimately, 19 incumbent officers volunteered to participate in the standard-setting exercise, which was conducted on May 19, 1994. (Tr. Vol. 1, pp. 168-70, Ex. 11.) The 19 incumbent officers represented approximately 14% of the 134 Erie police officers assigned at that time to the Patrol/Traffic Division of the Erie Police Bureau. (D 89.) Of the incumbent volunteers, 3 were females and 8 were members of the SWAT team. n6 (D 91, 95.) The average age of the incumbents was 35.8 years. (Ex. 11.)

 

   23. In 1994, each of the 19 incumbent officers who participated in the standard-setting exercise was performing his/her job adequately. (S 39.)

 

   24. The test was administered to the incumbents in such a fashion that each officer ran the 220-yard course, negotiating the four obstacles, immediately thereafter performing as many push-ups as possible in a 15-second span and immediately thereafter performing as many sit-ups as possible in a 15-second span. The thirty-second allotment for combined push-ups and sit-ups was added to the officer’s time on the obstacle course, to produce a final time. (Ex. 11.)

 

   25. The average number of push-ups performed by the incumbents in 15 seconds was 17; the average number of sit-ups performed in 15 seconds was 9, and the average total time to complete the test was 87 seconds. (Ex. 11.)

 

   26. Three female incumbents participated in the standard-setting exercise. Their respective scores are as follows. Candice McGahen: 15 push-ups/13 sit-ups and total time of 92 seconds; Tracy Stucke: 16 push-ups/8 sit-ups and total time of 86 seconds; Julie Kemling: 18 push-ups/11 sit-ups and total time of 80 seconds. (Ex. 11, D 93.)

 

   27. The number of push-ups and sit-ups to be required in the PAT (as well as the cutoff time used as the passing standard) ultimately was the decision of the Civil Service Commission. (S 40.) However, Lt. Bowers made recommendations as to the passing standards for the PAT.

 

   28. Lt. Bowers recommended that the City use the average numbers of push-ups and sit-ups performed by the 19 incumbents in the standard-setting exercise, rather than their low scores, because he felt it would be fair. (S 41.) Lt. Bowers based this recommendation on the fact that the incumbent test-takers were, on average, older than most police officer applicants, and the incumbents (unlike new applicants) had not had time to prepare for the PAT. Lt. Bowers also believed that an officer’s physical conditioning usually deteriorates once he or she is on the job, so he felt it was appropriate to hold new applicants to a standard higher than the lowest scores of the incumbent test-takers. (Tr. Vol. 1, pp. 170-71.)

 

   29. Lt. Bowers felt that using lower scores would not have met the needs of the Bureau. (Tr. Vol. 1, p. 171.)

 

   30. In setting the passing cut-off score for the PAT, Lt. Bowers advocated raising the time limit from 87 seconds (the average total time of the incumbents) to 90 seconds. This was not done for any particular reason, but simply to produce an even number. (Tr. Vol. 1, p. 171.)

 

   31. Capt. Kovacs consulted with Marcia Haller, an attorney who was then a member of the Civil Service Commission, regarding the number of push-ups and sit-ups that would be required (as well as the cutoff score or passing standard). Ms. Haller indicated to Capt. Kovacs that the City was not required to use the minimum performance by the incumbents and stated that they instead would use the averages. Ms. Haller believed the push-ups and sit-ups included in the PAT were a measure of “basic physical fitness.” (S 42, 43.)

 

   32. The representatives of the City involved in setting the PAT’s passing standards decided to use the average numbers of push-ups and sit-ups completed by the 19 incumbents in the standard-setting exercise and to use the average time it took the 19 incumbents to complete the standard-setting exercise as the mandatory standard for new officer applicants. The City representatives chose those averages as a “medium” or “average” level of physical ability, believing that would be “fair” and “the best way to go.” (S 44.)

 

   33. Thus, as adopted and initially administered in 1994, the PAT required applicants to complete the 220-yard obstacle course, perform 17 push-ups and then perform 9 sit-ups, all within a 90-second period. Unlike the version administered to the 19 incumbents (which consisted of three separately timed components), the PAT as adopted utilized a single passing standard of 90 seconds. (Tr. Ex. AA and BB, No. 59.)
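   As a rough illustration of the arithmetic in paragraphs 24 through 33, the sketch below recomputes the implied average obstacle-course time and compares the three female incumbents' scores from paragraph 26 against the numbers the City adopted. It is an illustration only: the incumbents performed three separately timed components, whereas applicants faced a single 90-second limit, so the comparison does not show that any officer would have passed or failed the PAT as actually administered. All function and variable names are hypothetical.

```python
# Figures taken from paragraphs 25, 26, 30 and 33; names are illustrative only.
AVG_PUSH_UPS = 17          # incumbents' average in 15 seconds
AVG_SIT_UPS = 9            # incumbents' average in 15 seconds
AVG_TOTAL_SECONDS = 87     # average course time plus the fixed 30-second allotment
FIXED_ALLOTMENT = 30       # 15 s for push-ups + 15 s for sit-ups
ADOPTED_LIMIT = 90         # 87 seconds raised to 90 (paragraph 30)

# Implied average time spent on the obstacle course alone.
avg_course_seconds = AVG_TOTAL_SECONDS - FIXED_ALLOTMENT

# (push-ups, sit-ups, total seconds) for the three female incumbents.
incumbents = {
    "McGahen": (15, 13, 92),
    "Stucke": (16, 8, 86),
    "Kemling": (18, 11, 80),
}


def meets_adopted_numbers(push_ups, sit_ups, total_seconds):
    """Compare pre-test scores with the adopted numbers (an assumed, simplified comparison)."""
    return (
        push_ups >= AVG_PUSH_UPS
        and sit_ups >= AVG_SIT_UPS
        and total_seconds <= ADOPTED_LIMIT
    )


results = {name: meets_adopted_numbers(*scores) for name, scores in incumbents.items()}

print("Implied average course time:", avg_course_seconds, "seconds")
for name, ok in results.items():
    print(name, "meets adopted numbers" if ok else "falls short on at least one number")
```

   Because the standard was set at the incumbents' averages rather than their minimums, some adequately performing incumbents necessarily fall below at least one of the adopted numbers under this simplified comparison.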

 

   34. Through the years the PAT underwent various permutations in an effort to maintain its relevance to the police officer job and to respond to complaints from the community regarding the number of women who failed. (D 98.)

 

   35. In 1998, a 5-second grace period was added such that candidates who completed the test within 95 seconds were permitted one opportunity to take a re-test. (Tr. Vol. I, pp. 174-76, Ex. 4.)

 

   36. During the administration of the 1998 PAT, an 11-year-old girl, described as “petite,” “wiry,” “very active” and a “gymnast,” observed the test and requested to take it unofficially at the end of the applicant testing. She passed all elements within the allotted 90 seconds. (Vol. I, pp. 75-77, D 117.)

 

   37. Nevertheless, following administration of the 1998 PAT, local citizens representing the Erie County Human Relations Commission and women’s advocacy organizations complained about the disparate passing rates and focused on the six-foot solid wall portion of the test as being excessively difficult for women and not adequately job-related to justify its use. (D 99.)

 

   38. Accordingly, in 2000, pursuant to a recommendation by Chief DeDionisio and upon the Civil Service Commission’s approval, the PAT was further modified such that applicants were given the option of climbing over either the six-foot wooden wall or a six-foot-high chain link fence. Additionally, candidates had the option of using a 12-inch-high wooden box for assistance. These changes were introduced in order to make the obstacles more appropriate and “practical” (i.e., like what an officer is likely to encounter in the City of Erie) and also to make the test more fair to women. (S 45, Tr. Vol. I, pp. 175, 177, Ex. 9.)

 

   39. Lt. Joseph Kress videotaped portions of the 2000 PAT administration and the tape was accepted into evidence as Defense Exhibit 1. (D 102.)

 

   40. In 2002, the City introduced further changes to the PAT in an effort to increase the passing rate for females. (Tr. Vol. I, pp. 178-80.) First, the PAT was administered only after applicants had first passed the written exam. Second, the push-up/sit-up components of the PAT were moved to the beginning of the test, such that they preceded the obstacle course. Third, the number of push-ups was decreased from 17 to 13, while the number of sit-ups was increased from 9 to 13. n7 Finally, training sessions were scheduled and publicized for applicants. (Tr. Vol. I, pp. 178-80, D 105-106.)

 

   41. Extremely wet weather on the date of the 2002 PAT resulted in a 5-second extension of the cut-off time to 95 seconds, with a further 5-second grace period being allowed on top of that such that candidates completing the PAT within 100 seconds could re-take the test once. (D 107.)

 

   42. The changes in the 2002 PAT caused the rate of women passing to rise to 30% (7 of 23), so that those seven women, having passed the written test and the PAT, would be placed on the Civil Service list and ranked in order of their written scores, accounting for veterans’ preference of 10 points as mandated by Pennsylvania law. (D 108.)

 

   43. The City administered the PAT as part of each of the entry-level police officer selection processes between 1994 and 2002, the period under challenge here. Applicants had to pass the PAT in order to remain eligible for hire and continue on in the selection process. (Tr. Ex. AA (Requests for Admissions) and BB (Responses), Nos. 14, 21, and 22.) As noted, until 2002, the PAT was administered prior to the written exam.

 

   44. The City of Erie Bureau of Police developed, and the Civil Service Commission approved and adopted, the PAT without consulting any expert(s) in the areas of physical abilities, job analysis, physical or other job requirements, employment testing or test validation. (S 46, 47.) No exercise physiologists or industrial/organizational psychologists were involved or consulted in the development of the PAT. (S 25.)

 

   45. The following table represents the results of the City’s use of the PAT between 1996 and 2002 in terms of male and female passing rates:

 

Year                  Female Passing Rate    Male Passing Rate
1996                   4.3%                  53.7%
1998                  14.3%                  72.2%
2000                  11.8%                  77.3%
2002                  30.4%                  84.7%
1996-2002 Combined    12.9%                  71.0%

 

(Tr. Ex. AA and BB, Nos. 27-31.)
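
   The gap shown in the table can be expressed as a ratio of the female passing rate to the male passing rate. The sketch below computes that ratio for each row and compares it with the 0.80 (“four-fifths”) rule of thumb found in the EEOC Uniform Guidelines on Employee Selection Procedures. That benchmark is not cited in the findings above and is used here only as an assumed point of reference; the names in the code are hypothetical.

```python
# Passing rates (percent) copied from the table in paragraph 45: (female, male).
pass_rates = {
    "1996": (4.3, 53.7),
    "1998": (14.3, 72.2),
    "2000": (11.8, 77.3),
    "2002": (30.4, 84.7),
    "1996-2002 combined": (12.9, 71.0),
}

# Assumed benchmark for illustration; it does not come from the findings of fact.
FOUR_FIFTHS = 0.80


def impact_ratio(female_rate, male_rate):
    """Ratio of the female passing rate to the male passing rate."""
    return female_rate / male_rate


ratios = {
    period: round(impact_ratio(female, male), 3)
    for period, (female, male) in pass_rates.items()
}

for period, ratio in ratios.items():
    position = "below" if ratio < FOUR_FIFTHS else "at or above"
    print(f"{period}: ratio {ratio:.3f} ({position} {FOUR_FIFTHS:.2f})")
```

   On these figures the ratio is below the assumed benchmark in every administration, including 2002, when the female passing rate was at its highest.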

 

   46. As of the initiation of this lawsuit, the sworn workforce in the Erie Bureau of Police consisted of 193 men and nine women. At the time of trial, there were only eight female sworn officers (about 4% of the total sworn workforce) and only three female officers working in the patrol division. (See Complaint [Doc. # 1] at P14; Answer [Doc. # 3] at P1; Tr. Vol. 1, pp. 197-98; see also Bowers Depo. Tr. (1/12/05) at pp. 21-22.)

 

   47. In contrast, the police departments of other Pennsylvania jurisdictions (i.e., Harrisburg, Pittsburgh and Philadelphia) have reported a percentage of female police officers earning at least $25,000 per year in the range of 20-27%. (S 88.)

 

   48. On August 5, 2004, the United States moved for summary judgment on, inter alia, the issue of whether the City’s use of the PAT caused a disparate impact against female candidates. The City conceded the point and, on October 8, 2004, the Court found that the City’s use of the PAT caused a disparate impact against female applicants for the entry-level police officer position between 1996 and 2002. n8 (See 10/8/04 Hrg. Tr. [Doc. # 41] at pp. 16, 26.)

 

   49. Between March 7 and 10, 2005, a bench trial was held on the remaining liability issues, to wit, whether the City’s use of the PAT was job related for the position of entry-level police officers and, if so, whether the passing standard used by the City was consistent with business necessity.

 

B. The Evidence Presented at Trial

 

   50. The City presented the testimony of one expert witness, Paul Davis, Ph.D., and several lay witnesses, including current Chief of Police Charles Bowers, seven of the City’s eight female police officers, and an instructor from the Mercyhurst College Municipal Police Training Academy.

 

   51. The United States presented the testimony of three expert witnesses, David Jones, Ph.D., William McArdle, Ph.D., and Bernard Siskin, Ph.D.

 

Definitions/Background Relating to Expert Testimony

 

   52. Industrial/organizational psychology (“I/O psychology”) involves the application of principles of psychology and scientific research in the workplace, commonly in areas such as job analysis, job requirements, employment testing and selection, and performance measurement, among others. I/O psychology is recognized as a specific practice under Division 14 of the American Psychological Association. (S 55.)

 

   53. In the field of I/O Psychology, there are published standards and principles that guide professionals in developing, using, and evaluating tests. Specifically, such standards and principles are stated in:

 

Standards for Educational and Psychological Testing (the “Standards”) (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999); and

Principles for the Validation and Use of Personnel Selection Procedures (the “Principles”) (Society for Industrial and Organizational Psychology (“SIOP”), 2003).

 

(Tr. Ex. P at P16; Tr. Vol. 2, pp. 111-12; Tr. Vol. 2, pp. 71-72.)

 

   54. The Principles focus specifically on the development, use and evaluation of personnel selection/employment tests. (Tr. Vol. 2, pp. 71-72.) The Principles and Standards apply to the development and validation of physical tests. (Tr. Vol. 2, pp. 110-12.)

 

   55. The term “validity” describes the extent to which a candidate’s performance on a test relates to his or her performance on the job. A test is “valid” if a candidate’s test performance can be used to make a better prediction about how well he or she will perform on the job than might be possible without the test. (S 58.) Thus, professionals in the field of employment testing and selection often use the terms “validity” and “job-relatedness” (or “valid” and “job-related”) synonymously. (S 57.) n9

 

   56. Validity is not an “all or nothing” concept; various tests may have different degrees of validity, and the same test may have different levels of validity for different jobs or when used in different manners. (S 59.)

 

   57. Three types of validation strategies -- content, criterion-related and construct -- are described in the standards and guidelines followed by the employment testing profession. (S 60.)

 

   58. A content validity study must present data showing that the content of the selection test represents important aspects of performance on the job for which the test is being used. Thus, a content validation strategy requires an evaluation of the extent to which the content of the test is adequately matched to the “content of the job” to determine whether the test measures what is important to or included within the job. A preemployment test can be judged to be content valid to the extent that it represents the contents of the job. Information about the content of the job is obtained from a job analysis study. A content validation strategy usually results in a work-sample test or some type of test that simulates the important aspects of job performance. (S 61.)

 

   59. In a criterion-related study, performance on the test is compared with job effectiveness. A criterion-related validity study should consist of empirical data showing that the test is predictive of, or significantly correlated with, important elements of job performance. A test has criterion-related validity to the extent that performance on the test is statistically related to performance on the job. Criterion-related validity is established when an employer shows that scores on its test (even scores as basic as “pass v. fail”) relate in a meaningful way to some measure of job performance (i.e., a criterion). (S 62.)
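
   A minimal sketch of the kind of statistical relationship paragraph 59 describes is a Pearson correlation between test scores and a job-performance criterion. The data below are invented solely to make the example run; no such study appears in the record, and an actual criterion-related study would also require an adequate sample and a test of statistical significance. The function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sample has no variance")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical data: test completion times (seconds) and supervisor ratings (1-5).
test_seconds = [78, 82, 85, 88, 91, 95]
performance_rating = [4.5, 4.1, 4.2, 3.6, 3.8, 3.1]

r = pearson_r(test_seconds, performance_rating)
print(f"r = {r:.2f}")  # negative here: slower invented times pair with lower invented ratings
```

   A coefficient near zero would indicate that test scores carry little information about the criterion, which is the situation a criterion-related study is designed to rule out.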

 

   60. Finally, the construct validity approach is more theoretical than content or criterion-related validity because it is necessary to establish that a construct is required for job success and that the selection device measures that same construct. The data from a construct validation study should show that the test measures the degree to which candidates have identifiable characteristics (i.e., “constructs”) that are important for successful job performance. The user should show empirically that the test validly relates the constructs to the performance of critical or important work behavior(s). This often requires a criterion-related study to show that the construct is related to job performance. Thus, a construct validity approach requires both: 1) a showing that the test being validated measures a particular construct; and 2) a showing that the construct is related to job performance. (S 63.) The construct validity approach is most frequently used when it is well-established that a particular test of a given construct (e.g., reading ability) has criterion-related validity, and the researcher establishes that a different test (e.g., a new reading test) also is valid by showing that the two tests measure the same construct (reading). (S 64.)

 

   61. In his July 2004 report, Dr. Davis describes a “research design” validation method, which involves quantifying and comparing the metabolic (or energy) costs of job tasks on the one hand and of performance on the test on the other hand. (S 65.)

 

   62. In his July 2004 report, Dr. Davis also refers to a “threat” or “perpetrator” analysis validation method, which involves building a model linking the demands of the test to the make-up of the City’s perpetrator population. (S 66.)

 

   63. Whether a test is valid and whether a particular cutoff score or passing standard on the test corresponds to the minimum level of skills necessary to perform the job at issue successfully are two separate, though related, issues. (S 67.)

 

   64. In the field of I/O psychology, professionally recognized methods by which cutoff scores are set appropriately include norm-referenced methods, content-related methods and criterion-related methods. (S 68.)

 

   65. In his July 2004 report, Dr. Paul O. Davis, the City’s expert, also describes a “pacing” (or “concordant”), a “research design” and a “perpetrator” (or “threat analysis”) method for setting or validating cutoff scores. (S 69.)

 

   66. The “pacing” method of setting a cutoff score, as described by Dr. Davis, involves showing a group of subject matter experts (“SMEs”) -- i.e., individuals who are familiar with the job at issue -- a videotape of incumbents or actors performing the test at various paces and then using a technique such as the Delphi method to determine the pace at which there is adequate agreement (or “concordance”) by the SMEs that the pace represents successful performance. (S 70.)

 

   67. The “research design” method of setting a cutoff score, as described by Dr. Davis, involves selecting the level of performance on the test that requires a metabolic (or energy) cost that corresponds to the metabolic cost of performing job tasks. (S 71.)

 

   68. The “perpetrator” method of setting a cutoff score, as described by Dr. Davis, involves determining the age and gender characteristics of the “perpetrator” population and selecting a level of performance on the test that corresponds to that required to apprehend some selected percentage of the perpetrators who flee/resist such that a foot pursuit and physical struggle is necessary to apprehend them. (S 72.)

 

Paul O. Davis, Ph.D.

 

   69. Dr. Davis is a founder of Applied Research Associates, Inc., a research and consulting firm established in 1976. He has a doctoral degree from the University of Maryland, College of Health and Human Performance, Department of Kinesiological Sciences, where he placed major emphasis on the study of occupational fitness requirements and the quantification of work physiology. He has participated in over 60 legal proceedings as an expert witness, is certified by the American College of Sports Medicine, and has authored over 100 technical reports, manuals and articles dealing with his research on the relationship between human physical performance factors and health. (Ex. 17, p. 3.)

 

   70. Dr. Davis was qualified at trial as an expert in the fields of exercise physiology, physical performance, physical requirements, and physical testing. Per the parties’ stipulation, Dr. Davis is not qualified to provide opinion testimony in the fields of statistics, industrial/organizational psychology, testing other than physical testing, job requirements or job performance other than physical, and job analyses. (S 53.)

 

   71. In connection with his services for the City, Dr. Davis reviewed a description of the PAT and viewed a videotape of portions of the 2000 PAT administration. On May 6, 2004 he visited the City of Erie, during which time he rode along with various officers on patrol and, in addition, interviewed the police chief and a number of officers. He was able to confirm through interviews that City of Erie officers routinely engage in strenuous physical activity in furtherance of their public safety mission. (Ex. 17, p. 5.)

 

   72. Dr. Davis produced a July 14, 2004 “validation report,” which he characterizes as “validating the obvious.” It is the result of his dissemination of a questionnaire to all 200+ Erie police officers, 114 of whom returned it completed. The answers on the questionnaire indicate that nearly all of the responding officers had engaged in a struggle with a suspect or other person while on duty. (D 136.)

 

   73. Dr. Davis testified at trial and his two reports (Ex. 16 and 17), as well as numerous excerpts from his deposition testimony, were admitted into evidence.

 

   74. Boiled to its essence, Dr. Davis’ opinion encompasses several propositions: (i) that the PAT which is the subject of this litigation is a valid test of the physical demands of police work; (ii) that the PAT is not stringent enough to measure the amount of agility, strength and endurance necessary to match the criminals who would be pursued in real-life scenarios; and (iii) that it is both unprincipled and dangerous to impose gender-normed standards for the physical screening of police officer candidates solely for the sake of achieving gender parity in the work force.

 

(i) Dr. Davis’ Theory that the PAT is Valid

 

   75. Dr. Davis opines that the challenged PAT used by the City of Erie is a valid test in that it measures physical abilities relevant to the successful performance of police work.

 

   76. Specifically, Dr. Davis opines that the obstacle course portion of the PAT is content valid in that the obstacle course barriers are “exact replications” of the sorts of obstacles a police officer would commonly encounter in a foot chase. (Ex. 16, p. 3.) He therefore considers these general foot pursuit tasks as “self-evident and face valid.” (Id.; Ex. 17, p. 6.)

 

   77. Dr. Davis further opines that the push-up and sit-up portions of the PAT have construct and/or criterion-related validity. (Ex. 17, p. 4; Tr. Vol. 2, pp. 78-79.) He believes that push-ups and sit-ups as performed on the PAT test for muscular endurance and are a “crude approximation of the energy costs of a struggle.” (Ex. 17, p. 4. See also id. at p. 6; Tr. Vol. 2, pp. 49-50, 84; Tr. Vol. 2, p. 78 (opining that the push-up and sit-up components have construct validity in that the metabolic demands of the muscle in performance of these evolutions involve the same muscle groups that are involved in altercations and struggles; thus they are an important construct of successful job performance).)

 

   78. He feels that the PAT appropriately utilized push-ups and sit-ups by placing them at the end of the test, following the obstacle course. (“Since this item is administered at the conclusion of the test battery, a fatigue effect will clearly degrade maximum performance. For this reason, this is an appropriate time to interject the test item, since it reasonably reflects the struggle at the conclusion of a short sprint.” (Ex. 17, p. 6.)) n10

 

   79. Dr. Davis’ theory presumes, as a fundamental proposition, that the PAT’s validity can be determined by examining the validity of each component part. While he agrees that one cannot validate the PAT as a whole simply by validating one component part alone, he believes one can validate the test as a whole by separately validating each of its component parts. (Tr. Vol. 2, pp. 80-81.)

 

   80. Dr. Davis did not perform or analyze any criterion-related validity study for the PAT as a unitary test because, he says, this would have been too costly. (Tr. Vol. 2, p. 79.) In fact, he never performed any content, criterion, or construct validity study relative to the PAT as one whole test. (Id.)

 

   81. Dr. Davis testified that there was no real need for a full-blown validation methodology such as was used in Lanning v. SEPTA, as the PAT is a much simpler, albeit gender-disparate test. Although there was no full-blown perpetrator analysis done here as there was in Lanning, Dr. Davis feels this is not a problem. Such a study would have been prohibitively costly in his view and, he claims, would not have added any particular insight. (Tr. Vol. 2, pp. 63-64.)

 

   82. Dr. Davis concludes that the PAT is valid “by reason of its relationship to the essential functions of the job.” (Ex. 16, p. 7.) The test, he believes, “contains relevant obstacles and an approximation of a minor scuffle that are as basic as one can get.” (Id.) (See Ex. 17, p. 10 (“The current and past physical ability tests are valid by virtue of their relationship to the occupational requirements of the job.”)) (See also Tr. Vol. 1, pp. 43-44.)

 

(ii) Dr. Davis’ Opinion that the PAT Is Too Easy

 

   83. Dr. Davis opines that the PAT is too easy a test in that it does not adequately measure the amount of agility, strength and endurance necessary to match the criminals who would be pursued in the real-life scenario proposed by the PAT. (Ex. 17, pp. 7, 10; Tr. Vol. 2, pp. 45-46.)

 

   84. Among other things, Dr. Davis criticizes the decision to modify the PAT such that candidates were given the option of scaling a chain-link fence in lieu of the 6-foot wall and the option of using a crate or other scaling device. Dr. Davis views both of these modifications as an “erosion of the rigger of a selection instrument.” (Ex. 16, p. 3.) In Dr. Davis’ opinion, while technique and training both play a role in one’s ability to scale a wall, it is “far better to take individuals who present with no difficulty in performing this task rather than those who are struggling.” (Ex. 16, p. 3.) Dr. Davis believes the wall-scale is “powerful in its predictive value for the selection of applicants.” (Id.)

 

   85. Thus, Dr. Davis believes that “more is better” (i.e., a higher cut-off is preferable) when screening for physical attributes of police work, and he recommends the City restore its use of earlier versions of the PAT. (Ex. 16, p. 3.) (See also Ex. 17, p. 7 (“By manipulating the pass rates, the City may have unknowingly set in motion that trip down the proverbial slippery slope. Short answer, there is no precise point whereby failure is guaranteed. However, we do know that there is less risk (and no additional personnel costs) in increasing standards, particularly standards that reflect the profile of the perpetrators.”)) On the other hand, Dr. Davis concedes that this “more is better” approach is confined solely to the realm of physical job requirements, abilities and performance. (Tr. Vol. 2, p. 77.)

 

   86. Dr. Davis refers to the anecdote of the 11-year-old girl passing the PAT and states this “would certainly give any citizen pause as to the requisite rigor of this screening tool.” (Ex. 16, p. 3.) He believes that, if an 11-year-old girl can pass the test, then the test needs to be harder. (Tr. Vol. 2, pp. 91-92.)

 

   87. He estimates, based on prior experience as a gym teacher, that 60-70% of his students could have passed the PAT, at the same time acknowledging that kids are clearly unfit to be police officers. This indicates to him that the PAT is missing a vital ingredient: the strength component that is clearly demonstrated in terms of grappling with, engaging and subduing a perpetrator. (Tr. Vol. 2, pp. 67-68.)

 

   88. Thus, Dr. Davis opines that the PAT is too easy because it doesn’t test some physical abilities or attributes (i.e., muscular strength) that are more important to police officer performance than the skills the PAT does measure. (Tr. Vol. 2, p. 92.)

 

   89. Dr. Davis also criticizes the PAT for its use of incumbents in setting the passing score. First, he believes that, because incumbents have tenure, they lack any incentive to excel. Consequently, he believes the data derived from this approach is “contaminated with the ambiguities of motivation.” (Ex. 16, p. 5.) Second, it is erroneous, Davis believes, to assume that all incumbents perform their jobs satisfactorily. (Id. at pp. 5-6.)

 

   90. In addition, Dr. Davis believes that officers inevitably decline in fitness and agility from the moment of their hire. “Since advancing age will inevitably have a negative impact on performance,” he writes, “hiring those who are on the cusp virtually guarantees obsolescence in a fairly short period of time.” (Ex. 17, p. 7.)

 

   91. Thus, while Dr. Davis criticizes the City’s use of incumbent officers in structuring the 1994 standard-setting exercise, he supports the City’s decision to utilize the incumbents’ mean scores -- rather than the lowest scores -- as the relevant testing standard. (See Tr. Vol. 2, pp. 54-55 (given age-associated decline in performance, the use of incumbents’ mean score as a cut-off does represent the minimum qualifications necessary to do the job).) Setting standards based on the poorest performing incumbents, he believes, “is a strategy for disaster” and will never allow for improvement in the workforce. (Ex. 16, p. 6; Ex. 17, p. 7.)

 

   92. Dr. Davis believes the better practice, rather than using incumbents, is to set physical standards based on the profile of criminals who will likely be the target of police action. (Ex. 16, p. 1 (“It is the criminal element that defines the mission of the law enforcement officer.”); id. at p. 5 (“Better yet, the test should be an expression of success in the apprehension of criminals. The data on the criminal population supports the notion that they are younger than police officers.”); Ex. 17, p. 8 (“We are not arresting each other; we are supposed to arrest the bad guys. ‘What do they look like?’ is the more apt question.”); id. at p. 9 (“Fitness standards need to be modeled on the basis of the threat.”); see also Tr. Vol. at p. 49.)

 

   93. In this regard, Dr. Davis analyzed the booking information for over 2,300 Erie male arrestees, n11 and noted that 71% of those arrested during the period covered by the bookings (a year or more) are males under age 40, which he states is the population most likely to flee the scene or resist arrest. (D 136.)

 

   94. He posits that the typical patrol officer is already giving the perpetrator the advantage of 10 years and 15 pounds of gear in a foot pursuit. “This fact alone,” he writes, “speaks to the need for above average levels of fitness.” (Ex. 17, p. 9.)

 

   95. Dr. Davis opines that, while the PAT is valid, “the test only modestly approaches what an officer may reasonably be expected to do and should be revisited with the eye towards providing a more reasonable representation of police work.” (Ex. 17, p. 10.)

 

   96. For example, Dr. Davis does not believe the push-up/sit-up element of the PAT can be validly challenged as not job-related: “The expectation to perform a few push-ups and sit-ups at the conclusion of a brief sprint does not begin to approximate the physical demands of a real struggle in effecting an arrest.” (Ex. 16, p. 4; see also Ex. 17, p. 6 (“The minimum number of push-ups and sit-ups hardly approaches thresholds that would have some predictive power.”).)

 

   97. In this regard, Dr. Davis likens the PAT to the “big E” on the eye chart -- “necessary but insufficient to establish visual acuity.” (D 136.) In other words, Dr. Davis believes the PAT as administered between 1996 and 2002 was so easy that, if a candidate could not pass it, this would necessarily indicate that the candidate was not fit enough to perform the job of police officer and the City would be at risk in hiring that individual. (Tr. Vol. 2, p. 86.)

 

   98. He opines that the PAT “is better than no test.” However, he believes that the City’s use of the PAT placed it “at great risk of accepting individuals who cannot perform the rigors of the job since the test’s action limits are well below those required on the job.” (Ex. 17, p. 7.)

 

   99. Nevertheless, despite his view that the PAT is too easy, Dr. Davis believes it succeeded in at least screening for the minimum qualifications necessary to perform the job of police officer. (See Tr. Vol. 2, p. 89 (“I would hate to hire without the use of at least this by way of an instrument. So that to that extent this is a business necessity to have a test at least of this difficulty.”); Tr. Vol. 2, pp. 45-46 (“If I were to take fault ... I do not believe that this test is stringent enough. I do believe that it meets categorically the expectation of the minimum, minimum, minimum physical standard.”); Tr. Vol. 2, pp. 57-59 (opining that the PAT has “statistical noise” and “is not a very good test; it minimally, minimally meets the requirements to be a police officer,” but does not make him “warm and fuzzy to believe that a person who passes is always capable of handling every event.”).)

 

   100. Dr. Davis specifically opined that the PAT’s use of a 90-second time limit comports with the minimum qualifications necessary to do the job. (Tr. Vol. 2, p. 53.)

 

   101. He feels that the videotape is “very compelling visually” as to the reasonableness of using the 90-second cut-off. He feels that the possibility of false negatives is very remote. (Tr. Vol. 2, p. 60.)

 

   102. In essence, Dr. Davis’ opinion seems to be that the PAT could create false positives (i.e., individuals who are actually incapable of meeting the physical demands of police officer work might pass the PAT), but it probably does not create false negatives (i.e., any individual who failed the PAT ipso facto could not meet the physical demands of police work).

 

   103. He believes that any reasonably motivated person of either gender should be able to pass the PAT. (Tr. Vol. 2, p. 68 (The test is “not out of reach of the physically inclined.”).)

 

   104. He further opines that there is little likelihood for the substitution of an alternative approach that would have a lesser adverse impact. (Ex. 16, p. 7.) “The City has attempted to ‘dummy down’ this test and should return to its original version of the PAT. This slippery slope of tweaking the results to meet some social agenda benefits no one.” (Id.) (“Arguing in favor of reducing standards to achieve some agenda is contrary to the needs and interests of the citizens of Erie.” (Ex. 17, p. 9.))

 

(iii) Dr. Davis’ Opinion That the Test Should Not Be Gender Normed

 

   105. Dr. Davis believes that if different testing standards are employed for different people (e.g., gender-norming tests), then by definition the testing standards cannot be job-related, although they might possess differing degrees of job-relatedness. He believes that “you’re undermining your position with regards to a defensible standard when in fact you have multiple standards for the same job.” (Tr. Vol. 2, p. 83.)

 

   106. While there may be widespread use of gender-norming standards in law enforcement with respect to calisthenics, etc., Dr. Davis believes that the more defensible approach is to identify what the “action limits” or requirements of the job are, independent of the officer’s age and gender. He claims that the current trend is moving toward single standards. (Tr. Vol. 2, pp. 96-97.)

 

   107. Furthermore, Dr. Davis takes strong exception to the idea of coercing law enforcement entities to adopt double standards for men and women in the area of physical fitness and/or elevating characteristics such as “integrity” and “sensitivity” over the critical fitness and agility necessary to perform police officer work. To do so, he believes, compromises law enforcement’s standards and endangers the safety of police officers and the general public. Dr. Davis opines that “to force the city to accept sub-par employees benefits no one. It is only the most misguided that somehow believes that placing unqualified people in harm’s way to be [sic] a societal victory.” (Ex. 16, p. 6.)

 

   108. He writes in his May 10, 2004 report that “the wide range of physical tasks to be performed within law enforcement have no age or gender bias; that is, the job is not defined by the individual performing the job.” (Tr. Ex. 16, p. 1.) He further writes that, “Whenever there is a physical ability test that has any degree of sensitivity and specificity, the pass rates will be in favor of males,” and “attempting to adjust these rates to approximate an abstract definition of equality is fallacious.” (Ex. 16, p. 4.) He believes that “the job is the job” and does not change as a consequence of who is performing it. (Ex. 17, p. 10; see also Tr. Vol. 2, p. 66.)

 

David Jones, Ph.D.

 

   109. David Jones, Ph.D., is President of Growth Ventures, Inc., a human resources consulting firm with expertise in the design, validation and administration of employee assessment and selection systems. He holds a Ph.D. in the field of industrial/organizational psychology and has practiced in that field for nearly 30 years. He has executed projects to design employee assessment and selection systems for both public and private sector organizations. (Ex. O, P1.)

 

   110. Dr. Jones was qualified at trial as an expert in the field of industrial/organizational psychology, including employment testing and selection, job analysis, physical and non-physical job requirements, and employment test validation. (Tr. Vol. 2, p. 180.)

 

   111. Dr. Jones testified at trial and his expert report was admitted into evidence as Ex. O. In addition, a rebuttal report co-authored by Dr. Jones and Leaetta Hough, Ph.D., was admitted into evidence as Ex. P.

 

   112. Dr. Jones rendered two central opinions: first, that the PAT does not show evidence of content, criterion-related, or construct validity sufficient to justify its use in screening entry-level police officer candidates; second, that the cut-off score utilized by the City is well above that which would be defined as the minimum standard of performance. (Ex. O, PP8, 32-33; Tr. Vol. 2, p. 109.)

 

   113. Dr. Jones testified that the Principles and Standards apply to the development and validation of physical tests. (Tr. Vol. 2, pp. 110-12.) Thus, exercise physiologists typically draw upon APA Standards and SIOP Principles in determining whether a physical test is valid. (Tr. Vol. 2, p. 112.)

 

   114. Dr. Jones opines that Dr. Davis’ reports bear no resemblance to validity studies described in APA Standards and SIOP Principles. In essence, Dr. Jones views Dr. Davis’ reports as an expression of professional opinion unaccompanied by any type of validity study. (Tr. Vol. 2, pp. 112-13.)

 

   115. According to Dr. Jones, “content validity” is an appropriate strategy when the content of the test truly represents the relevant job functions. The term “face validity” as used by Dr. Davis is not a term recognized by APA Standards or SIOP Principles. (Tr. Vol. 2, pp. 127-28.)

 

   116. Dr. Jones feels that it is not possible under professionally recognized standards to have criterion-related validity in the absence of empirical data demonstrating the relationship between test performance and job performance. As Dr. Jones explains, if one does not possess such data, one does not have a criterion-related study, and it would not be acceptable under professional standards to opine on the criterion-validity of a test based solely on professional judgment without supporting data. (Tr. Vol. 2, p. 123.)

 

   117. According to Dr. Jones, construct validity studies are a bigger and more theoretical undertaking; one must look at how a whole set of tests or a whole series of performance information fit together. (Tr. Vol. 2, p. 124.)

 

   118. In his preliminary report, Dr. Jones writes:

 

17. None of the information provided to me indicates that the Bureau’s physical agility test has been held to content, criterion-related, or construct validation standards. While various changes have been made over successive administrations of the test, none of the work underlying these changes has been executed or documented in a manner consistent with professional practice.

 

18. Other than personal anecdote from the test’s designers, there is no foundation on which to demonstrate the validity of the City’s physical agility test, its individual components, its scoring and qualifying standards, or its use in producing overall applicant eligibility lists. There is no information to show that individuals who perform well on the test perform similarly on the job.

 

19. In fact, no professionally sound job analysis has been conducted. There is no information to document whether each aspect of the job the City apparently attempts to simulate with the test are actually performed on the job. Nor is any information available regarding how frequently they might be performed, the specific physical capabilities that underlie their performance, or the degree to which new recruits can be trained to perform such activities once hired.

 

20. The City has made no attempt to assess the reliability with which the test is scored. For example, no test-retest analysis has been undertaken to determine the likelihood that candidates scoring within a given range of the cut-off score might succeed on a re-administration of the test, after a brief rest period. Given the test’s degree of adverse impact, it would be reasonable for the City to determine the tool’s test-retest reliability, to identify its standard error of measurement, and to offer retest opportunities to candidates scoring within a range of one to two standard errors of measurement.

 

21. No information has been provided to indicate that the City has sought out other, equally valid selection procedures that might produce less adverse impact on the employment opportunities of female applicants.

(Ex. O, PP17-21.)

 

   119. Dr. Jones thus opines that neither the process of initially designing the PAT nor the steps taken to modify it over the years were in conformity with professional standards. (Ex. O, PP23-27, 28-31.)

 

   120. Fundamentally, Dr. Jones opines that, because the PAT was administered as a unitary test (i.e., the pass/fail decision was made based on the candidate’s total performance of the test as a whole), it needs to be validated as a unitary test. (Tr. Vol. 2, pp. 128-29.)

 

   121. He explains that the PAT, as administered, is really a different test for every person who takes it because the amount of time required to complete each sequential segment in the test affects the amount of time the candidate has left to complete the remaining segments. (Tr. Vol. 2, p. 130.)

 

   122. Dr. Jones views this as significant because, in his words, “you have to know what you’re measuring in order to do a thorough study on whether a test has any validity or not. ... If I run the obstacle course in 30 seconds, I have 60 seconds left [to complete the required push-ups and sit-ups.] If you run it in 60 seconds, you have only 30 [seconds]. At that point we’re now taking two different tests. And potentially measuring two different things from that point forward.” (Tr. Vol. 2, pp. 130-31.)

 

   123. One critical problem, in Dr. Jones’ opinion, is that the PAT standards were established in one fashion (i.e., as separately timed component parts) but administered in a different fashion (i.e., as a unitary test): “It’s putting together something that has the chance of being correct when it’s designed, but then putting it into place and making decisions in a completely different fashion.” These differences in administrative format can make a significant difference in the case of test measurement, according to Dr. Jones. (Tr. Vol. 2, p. 131.)

 

   124. Dr. Jones further opines that, even assuming it would be acceptable to demonstrate validity of the PAT by validating each component part separately, such validity has not been demonstrated here. (Tr. Vol. 2, p. 133.)

 

   125. Dr. Jones concedes that the obstacle course component of the PAT appears to have a degree of content validity; however, he finds no evidence of validity for the push-up and sit-up components. (Tr. Vol. 2, pp. 133-37.)

 

   126. As to the sit-ups component, Dr. Jones testified that he had never in his career seen evidence or studies showing that sit-ups relate to job performance for police officers. (Tr. Vol. 2, p. 138.)

 

   127. He believes it may be possible to construct a study which would evaluate the criterion-related validity vel non of sit-ups and push-ups; however, no such evidence has been gathered in this case. Moreover, two factors -- the fact that an 11-year-old girl passed the PAT while incumbent SWAT team members ostensibly failed it -- lower Dr. Jones’ expectation that the sit-ups and push-ups components of the PAT would have criterion-related validity under a proper analysis. (Tr. Vol. 2, pp. 158-60.)

 

   128. Dr. Jones also disagrees with Dr. Davis’ opinion that the PAT’s passing standards were set too low. In fact, Dr. Jones opines that the standard was set far above that which would represent the minimum level of acceptable performance. (Ex. O, PP32-33.)

 

   129. Dr. Jones observes in his initial report:

 

32. The only systematic study of how current, effectively performing police officers might score on the test was undertaken in 1994. In that study, the performance of 19 current police volunteer officers was used to set a qualifying score on the test: the average score of the 19 officers on each component. While helpful in linking the qualifying score required of applicants to the performance of actual police officers, the “average officer” standard selected would have resulted in “disqualifying” approximately one-half of the current police participants.

 

33. Further, the “random sample” of officers on [sic] whom the passing requirement was established was not random. Originally, fifty Traffic Division officers were randomly sampled from a total list of 134 officers to participate in the process, with the objective of obtaining cooperation from at least 30. Only 19 officers volunteered and participated. This sample is not “random.” There is no basis for determining whether it reflected physical capabilities near the top, bottom or middle of the full officer force. It should be noted that some of the participants were actually members of the Bureaus [sic] SWAT team, a unit that has elevated physical ability requirements. ...

(Ex. O, PP32-33 (internal citation omitted).)

 

   130. Dr. Jones further points out that, under the PAT standards as originally implemented in 1994, only four of the 19 incumbent test-takers -- including only 2 of 8 SWAT team members -- would have met all three criteria for a passing score (i.e., total time of 90 seconds or less on the PAT, 17 push-ups and 9 sit-ups). Under the standards as implemented in 2002, when the requisite number of push-ups was lowered from 17 to 13 and the requisite number of sit-ups was raised from 9 to 13, only 2 of 19 incumbents would have passed the PAT. (Tr. Vol. 2, pp. 151-52.)

 

   131. Dr. Jones considers this percentage passing rate “awfully low” considering that the City has represented that all 19 of the incumbent test-takers were adequately performing their jobs. (Tr. Vol. 2, p. 152.)

 

   132. Dr. Jones asserts this is all the more true in light of the fact that the incumbent test-takers were strictly volunteers who, in his opinion, tend to be more motivated and typically better performers. (Tr. Vol. 2, pp. 152-53.)

 

   133. Dr. Jones also observed that, with regard to the three female police officers who took and passed the PAT during its administration, their test scores reflect that they either passed by one second, failed by one second, or passed by virtue of a 5-second grace period. (Tr. Vol. 2, pp. 154-55.)

 

   134. Based on this evidence, Dr. Jones rejects Dr. Davis’ “Big E” theory that the PAT is too easy. Dr. Jones concludes that this theory is belied by the fact that only three female officers (all admittedly competent) only barely passed the PAT, 30% of the males who took it failed, and a significant portion of the incumbents who set the test-taking standards would have received failing scores. (Tr. Vol. 2, p. 157.)

 

   135. Dr. Jones opines that, when conducting employment testing, there is a danger in setting the cut-off score too high on a single component, such as physical ability: namely, an employer risks eliminating those officers who might have an overall better mix of qualities and skills. Otherwise stated, Dr. Jones believes that the “job” is bigger than its physical requirements and, if the physical requirements are set too high, the employer misses its opportunity to add that blend of other relevant skills and abilities (such as cognitive abilities, job knowledge, personal character, etc.) into the mix. (Tr. Vol. 2, pp. 155-56.)

 

   136. With regard to the evidence of the 11-year-old girl who passed the PAT, Dr. Jones believes this factor does not necessarily mean that the PAT as such is too easy; in his opinion it more likely demonstrates that the PAT simply does not measure what the City intends it to measure. He agrees with Dr. Davis that the test probably should incorporate other components that an 11-year-old likely would not pass. (Tr. Vol. 2, p. 156.)

 

   137. In sum, Dr. Jones believes that the PAT is an invalid test because some of its constituent parts (i.e., the sit-ups and push-ups) lack validity. (Tr. Vol. 2, pp. 157-58.) In fact, Dr. Jones concludes that the City’s methodology in designing, modifying, and scoring the PAT over time “fail[s] to incorporate even the most basic principles of content, criterion-related, or construct validation research” and that “technical documentation associated with the test design effort is nonexistent.” (Ex. O, P35.)

 

   138. Dr. Jones considers the procedures used by the City in scoring the PAT and in reaching its ultimate conclusions about applicants’ qualifications to be “seriously flawed” in that they “bear no demonstrable relationship to the level of performance required for successful performance on the job.” He further asserts that the scoring system “gives no consideration to the reliability of the test scores or to the degree that changes in the test’s procedures over time have made the original standards-setting exercise obsolete.” (Ex. O, P35.)

 

   139. In Dr. Jones’ opinion, an invalid test like the PAT should not be used because it is ultimately a waste of time and money and because it may produce unintended consequences, such as inadvertently screening out one group of job applicants over another. (Tr. Vol. 2, pp. 141-42.)

 

   140. Finally, Dr. Jones notes that, “while the City’s attempts to reduce the level of adverse impact associated with the test are clear, they have been unrelated to any systematic analysis of the entry-level job, its physical requirements, or the existence of potentially equally valid, but less adverse alternative screening procedures.” (Ex. O, P35.)

 

William McArdle, Ph.D.

 

   141. William McArdle has a Ph.D. in work physiology and is Professor Emeritus in the Department of Family, Nutrition, and Exercise Science at Queens College in New York City, where he has taught since 1965. He has authored or co-authored more than 50 research articles dealing with environmental physiology, exercise training, and the assessment of diverse components of physical fitness as well as several textbooks relating to exercise physiology. (Ex. V, p. 1.)

 

   142. Dr. McArdle was qualified at trial as an expert in the field of exercise physiology. The parties have stipulated that Dr. McArdle is qualified to provide opinion testimony in the fields of exercise physiology, physical performance, physical requirements, and physical testing. (S 54.)

 

   143. At trial, Dr. McArdle provided opinion testimony and his two reports were admitted into evidence as Exhibits V and W.

 

   144. He was retained by the United States to assess the PAT’s potential, because of physiological differences between the sexes, to cause an adverse impact on female candidates. (Ex. V, p. 1.)

 

   145. Dr. McArdle opines that the agility run portion of the PAT represents a predominantly sprint-power or anaerobic form of exercise, as opposed to a more prolonged effort requiring energy from aerobic metabolism. Dr. McArdle notes that, while women generally perform less well than men in both anaerobic and aerobic physical tasks, the gender differences are “disproportionately magnified” in anaerobic tasks of relatively short duration. Thus, according to Dr. McArdle, “the overall test duration [of the PAT] with its high demand on anaerobic power capacity exacerbates the adverse impact on women.” (Ex. V, p. 2.)

 

   146. Dr. McArdle notes that sit-ups and push-ups, to the extent they are used at all in the field of exercise physiology, are typically used to measure aspects of an individual’s overall physical fitness. Moreover, tests of physical fitness use gender-specific standards in order to account for well-established physiological differences between men and women. (Ex. V, pp. 3-4.) Dr. McArdle therefore concludes that, to the extent the sit-up/push-up portion of the PAT is used to measure candidates’ physical fitness levels, it is inappropriate to use the same absolute performance requirements for both males and females. (Ex. V, p. 4.)

 

   147. Due to physiological differences in upper-body strength, Dr. McArdle opines that the PAT’s push-up component “would very likely disproportionately eliminate female candidates, even if the male and female candidates possessed the same general fitness level.” (Ex. V, p. 5.)

 

   148. With regard to sit-ups, Dr. McArdle states that sit-ups do not reflect lower body muscular strength as the designers of the PAT believed. According to Dr. McArdle, “the sit-up only assesses the endurance of the abdominal musculature. Considerable data exist to indicate gender differences, on average, [favoring males] in sit-up performances.” Thus, Dr. McArdle concludes, the PAT’s requirement that male and female candidates perform the same number of sit-ups would “likely disadvantage female candidates.” (Ex. V, p. 5.)

 

   149. Dr. McArdle concludes in his original report:

 

   Based on my professional opinion as an exercise physiologist, the PAT’s overall demands on sprint-power (anaerobic) physical capacity, its reliance on upper-body muscle strength, and its failure to recognize well-established gender differences in measures of physical fitness magnify performance differences between men and women. The sit-up, wall climb, and push-up components have the greatest negative impact on a woman’s chances for selection. 

(Ex. V, p. 6.)

 

   150. In his rebuttal report, Dr. McArdle disagrees with Dr. Davis’ conclusion that the PAT is like the “Big E” on an eye chart -- so easy that any candidate who fails the test necessarily lacks the physical qualifications to be a police officer. (Ex. W, p. 3.) On the contrary, Dr. McArdle opines that the PAT’s format -- particularly where the sit-up/push-up component was placed at the end of the testing sequence -- requires an “all-out physical effort” that represents a “predominantly sprint-power or anaerobic form of exercise, precisely the type used to produce exhaustion in exercise physiology laboratories.” (Ex. W, pp. 3-4.) (See also Tr. Vol. 3, pp. 89-90.) He points out that almost 90% of the women taking the PAT failed it; 25-30% of the males failed; Officer Raszkowski (a