In 2010, Strunk and Reardon introduced a potentially transformative method for analyzing teacher collective bargaining agreements (CBAs). We extend Strunk and Reardon’s work by assessing whether the Partial Independence Item Response (PIIR) approach can be applied to subsets of provisions from CBAs, data that may be more feasible for researchers to collect. Utilizing a new data set derived from all provisions in all active CBAs in Washington state, we find that estimates calculated from a subset of high-profile provisions are moderately highly correlated with estimates calculated from the full range of provisions, as are estimates calculated from several categories of provisions. This suggests that researchers can still draw important conclusions by applying the PIIR method to readily available data on teacher CBAs.
- collective bargaining
- teacher labor markets
- item response theory
New Focus on Collective Bargaining
Increasingly, policy makers aiming to raise student achievement have turned their attention to issues of teacher quality. The focus on teachers—and in particular on the variation in effectiveness of the teacher workforce—is driven by a growing body of research that shows teacher quality to be the most important schooling factor in students’ academic success (Darling-Hammond, 2000; Rivkin, Hanushek, & Kain, 2005; Rockoff, 2004). Yet while teacher quality is prominent in policy debates, less empirical attention has been paid to the governing mechanisms that may influence the quality and distribution of teachers within school districts.1 Chief among these mechanisms are collective bargaining agreements (CBAs).
It is surprising that few empirical studies have focused on CBAs given that it is quite common for policy makers and pundits alike to point to CBAs, and some CBA provisions in particular (e.g., seniority-based job protections), as key inhibitors to effective school district operation and student achievement (Cohen, Walsh, & Biddle, 2008; Hess & Loup, 2008). A focus on CBAs is also timely given that the federal government’s Race to the Top grant competition incents states to make dramatic changes in teacher policies, many of which must be negotiated as part of the collective bargaining process.
In this article, we introduce a unique new data set derived from all provisions included in all CBAs in effect in Washington state in the 2010-2011 school year, and report the findings from an analysis using Partial Independence Item Response (PIIR) models of contract restrictiveness (Reardon & Raudenbush, 2006; Strunk & Reardon, 2010). We use this measure to calculate the restrictiveness of every CBA in the state, and then test the internal validity of this measure on various subsets of provisions: an objectively derived “restricted” subset of provisions (Strunk & Reardon, 2010), a subjectively derived subset of high-profile provisions, and subsets of data corresponding to eight categories of provisions. This analysis is important because prior work on collective bargaining has generally focused either on high-profile provisions or a particular category of provisions (Koski & Horng, 2007; Moe, 2009), and while the PIIR approach is a promising new method for analyzing CBAs, it is not clear whether future studies that utilize this new methodology will be sensitive to the subset of provisions they consider. To our knowledge, our data set is the first to include the full universe of CBAs in a state, and this study is the first to assess the internal validity of the PIIR measure of contract restrictiveness.
We find generally high correlations between restrictiveness estimates calculated from different subsets of data. Importantly, restrictiveness estimates calculated using only high-profile provisions are highly correlated with restrictiveness estimates based on all provisions, suggesting that researchers can still draw important conclusions by applying the PIIR method to readily available data on teacher CBAs. However, estimates from certain subsections of the contract—grievance and layoff—do not correlate highly with estimates calculated from the full data, and work relying only on layoff provisions may in fact lead to conclusions opposite those drawn from research informed by all provisions.
A large literature on bargaining in the private sector suggests that competition between firms in a given industry limits private-sector unions from demanding inflated benefits and wages (Clark, Delaney, & Frost, 2002). But public-sector unions’ viability depends on members’ ability to persuade the public and elected officials that contracts and bargaining demands are instrumental to positive policy outcomes and not exclusively devoted to members’ more narrow economic concerns (Klingner, 1994).2 Scholars have recently begun to explore the connections between collective bargaining and teacher workforce outcomes in education (Koski & Horng, 2007; Levin, Mulhern, & Schunck, 2005; Moe, 2005, 2009; Strunk, 2011).
Detailed studies of bargaining in the education context focus on the provisions driving union “strength” or “power” and the influence of collective bargaining on outcomes like wages and student achievement. Most of these studies rely on simple indicators from one section of CBAs to capture a union’s strength in the bargaining process. For example, studies by Moe (2005) and Koski and Horng (2007) rely on measures of seniority-based transfer rights to assess the relationship between union strength and important teacher workforce outcomes.3 Moe’s work on the relationship between union power and student achievement relies on a similar unifaceted measure (Moe, 2009). Carefully chosen CBA provisions can inform our understanding of how these provisions influence specified outcomes. However, in highlighting particular cherry-picked provisions, this work may overlook important trade-offs in the negotiation process and, in doing so, provide a misleading picture of union strength and the relationship between union demands and other important outcomes (e.g., student achievement).4
We argue that most existing studies of the influence of collective bargaining on teacher distribution and student outcomes do not go far enough in addressing sustained critiques of the bargaining literature. Kochan and Wheeler (1975) argued that, to successfully advance the state of collective bargaining theories that utilize outcomes as the dependent variable:
1) outcomes should be conceptualized in a way that includes all (or a representative sample) of the relevant items of interest that form the content of negotiations; 2) a concept of union power should be developed that reflects the underlying complexity of forces affecting a bargaining relationship and that is susceptible to measurement; 3) the model should be tested empirically in order to assess its validity; and 4) the test should take place at the level at which bargaining actually takes place.
Existing studies that focus on particular subsections or provisions of CBAs to the exclusion of others (Koski & Horng, 2007; Moe, 2005, 2009) may ignore relevant items of interest (Criterion 1) and therefore may not capture the complexity of forces driving contract negotiations (Criterion 2). For instance, Koski and Horng (2007) and Moe (2005, 2009) focused on seniority-based transfer rights without regard to other potentially important contract provisions.
Recent work by Strunk and Reardon (2010), however, seeks to quantify the latent restrictiveness of a teacher contract using a data set of CBAs from a large, representative sample of California school districts that includes the full range of provisions mentioned in contracts across California.5 Specifically, they cleverly adapt Reardon and Raudenbush’s (2006) PIIR model to teacher CBAs by coding provisions in each CBA as “responses” to a conditionally structured survey that addresses nearly every provision that could appear in a CBA. Their data set and methods of analysis address Kochan and Wheeler’s first two concerns, and research utilizing this measure of contract restrictiveness (Strunk, 2011; Strunk & Grissom, 2010) can therefore draw more robust conclusions because the measure is a function of all bargained provisions.
Strunk and colleagues have done further research to investigate the external validity of the PIIR restrictiveness measure. For example, Strunk and Grissom (2010) compared PIIR restrictiveness measures with a statewide survey of school board members in California and found that contracts in districts with stronger unions (measured by school board members’ evaluations of union power and union support of school board members in recent elections) allow school district administrators less flexibility than do contracts in districts with weaker, less active unions. This begins to address Kochan and Wheeler’s third criterion (that all measures must be tested empirically), and we contribute to this effort in two important ways. First, we report the results of applying the PIIR methodology to our data set of CBAs in Washington state. To our knowledge, this is only the second data set analyzed in this manner, and the first that includes every CBA in a state. Second, we assess the internal validity of the PIIR measure by estimating restrictiveness using various subsets of provisions: an objectively derived “restricted” subset of provisions (Strunk & Reardon, 2010), a subjectively derived subset of high-profile provisions, and subsets of data corresponding to eight categories of provisions. The results of this analysis should be of interest to researchers who are drawn to the PIIR approach but do not have access to comprehensive data on all provisions in teacher CBAs.
Data Collection, Coding, and Categories of Restrictiveness
CBAs from Washington state inform our analysis. Washington has 295 school districts, but 25 of these districts are not governed by a CBA. We collected the active CBA for each of the 270 districts that had a CBA operating in the 2010-2011 school year.6
CBAs are legal documents, and the length and detail of these documents preclude simple evaluation and comparison. To understand how CBAs and the provisions they contain relate to one another and other outcomes of interest, it is necessary to encapsulate each agreement’s contents in a concise, logical, and consistent manner. To do this, we follow a rubric adapted from that developed by Strunk (Strunk & Reardon, 2010).7 Strunk’s rubric attempts to address all of the provisions that could appear in a CBA so that resulting data, like the CBAs themselves, capture information on the host of provisions included in the following subsections: association rights, evaluation, grievance procedures, layoffs, hiring procedures and transfers, benefits and leaves, and workload.
Undergraduate students at the University of Washington were responsible for the coding of CBAs. All students were split into pairs. Students independently coded each CBA then met with a partner to resolve coding discrepancies and provide a consolidated, agreed-upon record for each CBA. We use this consolidated coding in subsequent analyses.8
Our primary goal is to explore the extent to which different measures of contract restrictiveness agree with each other in providing a similar picture of the overall CBA. In calculating these measures of contract restrictiveness (which also may be judged as a measure of union power), we seek to capture key issues driving the outcome of management-union negotiations in each district. Moreover, we want the measure to reflect the underlying complexity of CBAs. Following Strunk and Reardon (2010), we code CBAs in a manner that treats each provision in a CBA as a “response” to a survey that includes all contract provisions covered in CBAs.
Designing a measure of restrictiveness that adequately captures the complexity of contracts is not trivial. For instance, many important provisions in CBAs—such as the length of the school day, the negotiated class size in each grade, and the number of leave days teachers receive—require a numerical response. Others—such as “Does this CBA include a no-strike clause?” and “Are tenured teachers evaluated differently than nontenured teachers?”—invite dichotomous categorization. And many “responses” in a CBA are conditional on responses earlier in the CBA—for example, the response to “is seniority the only factor in selecting a teacher to voluntarily transfer?” is conditional on the response to “does seniority play any role in selecting a teacher to voluntarily transfer?”
Strunk and Reardon (2010) utilized a PIIR model to overcome these data challenges and obtain a measure of CBA “restrictiveness.” PIIR models require a dichotomous response to each provision, and binary responses can be structured to account for the conditional nature of provision data. We use actual data from three districts in our sample to illustrate how our initial observed data, like Strunk and Reardon’s, are transformed into a binary, conditional structure in preparation for PIIR analysis.
For each district contract, CBA coders “respond” to a series of questions regarding the important provisions noted above. The CBA provision rubric “asks” two types of questions: gateway questions (GQs) and subquestions (SQs).9 GQs (always answered with a 1 or 0) ask a coder whether a particular provision or topic is considered in the contract. These questions are followed by a series of additional questions that provide more detail on the structure of a particular provision. For example, the rubric might ask the GQ “Does the CBA specify any factors that determine the order of layoffs in the event of a tie (in seniority)?” If the answer to this question is no (0), then the coder moves on to the next question and ignores the SQs following that particular GQ. However, if the answer is yes (1), the coder goes on to answer additional questions that show what specific factors (education, performance, administrator discretion) or how many factors (4, 5, 6) determine the order of layoffs in that district.10
The SQs noted above illustrate one of the challenges posed by observed data. To get the most information out of each provision and each contract, coders may initially record a qualitative or numerical response to particular questions such as the length of a school day, the number of students in a class, the timelines used to file grievances, and so on. Table 1 gives two example questions from the grievance section of the coding rubric that we will use to illustrate the coding process.
Responses to these questions appear in what we term an observed response matrix. The observed response matrix for the entire data set is 270 districts by 766 individual provision items. Table 2 gives an example of an observed response matrix for the two example questions and three districts in our sample.
Because the PIIR model requires a binary response, once all contracts are coded and combined, we analyze the distribution of numerical responses with an eye toward cutoff points that preserve variation in the information but allow a binary structure. Each numerical question is recoded as a series of increasingly restrictive questions that lend themselves to binary response.11 The resulting response categories might be thought of as “bins.” Each bin contains information from a minimum of 10 CBAs; any question with fewer than 20 responses—too few to form two bins—was dropped from analysis, in which case the gate question is the most restrictive information available on that particular provision. We allow up to four bins per question. Question 1b in Table 1 originally read “How long do members have to report a grievance?” This question has been recoded in Table 3 to elicit a binary response. Table 3 reports actual frequencies from our observed data to illustrate how questions are recoded and “cut” to form bins, whereas Table 4 shows an example of the resulting binary response matrix for our example questions and districts (the full matrix has 270 districts by 633 binary provisions once we drop any provisions that applied to fewer than 10 districts).
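The binning step can be sketched in code. The function below is a minimal illustration, not the authors’ actual procedure: it assumes one item’s numeric responses arrive as a pandas Series (missing where a district never reached the question), places cutoffs at quantiles so each bin retains a minimum number of CBAs, and emits one binary indicator per cutoff. The names `bin_numeric_item`, `max_bins`, and `min_per_bin` are hypothetical.

```python
import pandas as pd

def bin_numeric_item(values, max_bins=4, min_per_bin=10):
    """Recode a numeric provision item into binary indicators ("bins").

    values : pd.Series with one district's response per row (NaN = item
    not answered). Items with fewer than 2 * min_per_bin responses are
    dropped (returns None), since they cannot support even two bins.
    """
    answered = values.dropna()
    if len(answered) < 2 * min_per_bin:
        return None  # fall back to the gate question alone

    # Choose up to max_bins - 1 interior cutoffs at evenly spaced
    # quantiles, so each bin keeps at least min_per_bin CBAs.
    n_bins = min(max_bins, len(answered) // min_per_bin)
    cutoffs = sorted({answered.quantile(i / n_bins)
                      for i in range(1, n_bins)})

    # Each indicator asks an increasingly restrictive threshold question,
    # e.g. "Do members have more than X days to report a grievance?"
    out = pd.DataFrame(index=values.index)
    for j, c in enumerate(cutoffs, start=1):
        out[f"bin_{j}"] = (values > c).astype(float).where(values.notna())
    return out
```

In this sketch a district that never reached the question stays missing in every bin, mirroring the way subquestions are skipped when a gate question is answered 0.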
The PIIR Model
We now redirect attention to the PIIR model. As noted above, the PIIR model treats each provision in a CBA as a binary “response” to a survey that includes all contract provisions covered in CBAs. And because many “responses” in a CBA are conditional on responses earlier in the CBA—for example, the response to “is seniority the only factor in selecting a teacher to voluntarily transfer?” is conditional on the response to “does seniority play any role in selecting a teacher to voluntarily transfer?”—the PIIR model uses as the dependent variable the conditional probability that a provision appears in a CBA given that it is in the “risk set” for that CBA (i.e., the item in question could have appeared in the CBA given responses to previous questions). Specifically, if Yik represents the outcome of provision k in contract i, and hik represents whether this provision is in the “risk set” for contract i, we can let Y*ik denote Yik conditional on hik = 1. The model is then

logit[Pr(Yik = 1 | hik = 1)] = θi + Σj γjDij.  (Model 1)
In Model 1, the conditional probability of provision k appearing in contract i is a function of the latent restrictiveness of CBA i (θi) and the conditional “severity” of provision (γj).12 Dij is simply a dummy variable indicating which provision is considered. Thus, Model 1 allows simultaneous calculation of the restrictiveness of each contract as a whole as well as the severity of each individual provision.
The dependent variable in Equation 1 is conditional on each item being in the “risk set” for a particular CBA. Provision k is in the risk set for CBA i if it is a gate question or if it is a SQ for which the gate question has been coded a 1. To build the “risk set” and further ready data for analysis, we follow the methodology of Reardon and Raudenbush (2006). We create a “gate matrix” that indicates whether an item is conditional on another (Gate) item (633 × 633). In this gate matrix, a 0 is recorded each time an item refers to itself (all 0s on the diagonal) and a 1 is recorded each time an item references (is conditional upon) the other item in question. Zeros are recorded throughout the rest of the data set. Table 5 gives an example gate matrix for our example questions and districts.
We use this “gate matrix” to form a “risk matrix” that indicates whether a provision is in the risk set for each CBA. CBAs that responded affirmatively to Question 1 above could have responded affirmatively to 1b whether or not they actually did. Therefore, Question 1b is in the risk set for that CBA. SQs are not in the risk set of any CBA that has a zero for any of its gate questions. Table 6 gives an example risk matrix for our example questions and districts.
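The gate-matrix-to-risk-matrix step can be sketched with a couple of matrix operations. This is an illustrative reconstruction, not the authors’ code; `Y` and `G` follow the matrix layouts described in the text, and the function name `build_risk_matrix` is hypothetical.

```python
import numpy as np

def build_risk_matrix(Y, G):
    """Form the risk matrix from binary responses and the gate matrix.

    Y : (districts x items) binary response matrix (0/1; may contain
        NaN where an item was never reached).
    G : (items x items) gate matrix; G[k, g] = 1 if item k is
        conditional on gate item g (zeros on the diagonal).

    Item k is in the risk set for CBA i iff every gate item it depends
    on was answered affirmatively in that CBA. Gate questions (all-zero
    rows of G) are therefore in every CBA's risk set.
    """
    Y0 = np.nan_to_num(Y, nan=0.0)    # unreached items count as "no"
    n_parents = G.sum(axis=1)         # gates each item depends on
    yes_parents = Y0 @ G.T            # affirmative gates per CBA/item
    return (yes_parents >= n_parents).astype(int)
```

A subquestion chained behind several gates enters the risk set only when all of its gates are answered 1, which matches the rule that SQs are not in the risk set of any CBA with a zero on any of its gate questions.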
Once we have this “risk matrix,” we can limit the binary response matrix to only those observations that correspond to items in the risk set for a particular CBA. The resulting matrix is called the CBA-Item matrix and is a record of actual responses to each item considered in a particular CBA. Table 7 contains the example CBA-Item matrix (note that the “response” column comes directly from the binary response matrix in Table 4). In the CBA-Item matrix, each response Yik to a provision k that is not in the risk set for CBA i (hik = 0) has been removed. So, we can run Model 1 as an unconditional logistic regression on the CBA-Item matrix, as the dependent variable is now simply the response Yik for the observations with hik = 1.
The PIIR approach described above allows us to consider each CBA as a comprehensive document rather than subjectively pulling out specific CBA provisions that we (or others) may believe should have more or less influence on student and teacher outcomes. Each CBA can then be compared with every other CBA in the state, and by rubric design, the most restrictive district in the state should give management the least flexibility. However, two contracts by this measure may be considered equally restrictive if they have the same number of provisions (0s and 1s) even if they are “restrictive” in very different ways.13 And it is quite likely that union and district representatives “trade” restrictiveness in one area of the contract for “leniency” in another. Therefore, in addition to obtaining an objective measure of CBA restrictiveness informed by all provisions within the “risk set” for each CBA, we also perform similar analyses on different subsets and categories of provisions.
Restricted Subset of Provisions
The measure of contract “restrictiveness” based on all provisions is objective and detailed, but a measure relying on 633 contract provisions is not portable or easily replicated. Moreover, we use these restrictiveness estimates as the dependent variable in future analyses, so we want to reduce the noise in this measure as much as possible. Therefore, like Strunk and Reardon, we assess the 633 contract items used in our full model to ensure that they all contribute to the measurement of the underlying “restrictiveness” trait. Identifying misfitting items allows those items adding more noise than signal to be removed from our scale. The resulting scale should be more reliable and user-friendly (as it is composed of fewer items).14 We begin with a relatively high contract reliability of .67 (compared with Strunk and Reardon’s .572).
Like Strunk and Reardon (2010), we base our item reduction on the unbiased statistical methods used in test construction. We run exploratory Cronbach’s alpha analysis on all 633 items included in our initial model. We examine the item-total correlations produced for each of the 633 items. A low item-total correlation statistic for a specific item tells us that item fails to measure the concept captured by the other items. We follow a generally accepted standard used by test makers and Strunk and Reardon and objectively discard items with item-total correlations lower than .25 (Strunk & Reardon, 2010). After an initial round of item reduction, we reassess our data and remove any further items that have item-total correlations below .25 based on the new scale with fewer items. After three iterations of this process, no items with item-total correlation below this threshold remain. We are left with an instrument of 218 items that span the breadth of the contract. The reliability of this measure increased slightly to .72, which indicates that the 415 discarded items were in fact capturing more noise than the underlying trait.15 Unfortunately this “reduced” set is still not nearly as “user-friendly” as Strunk and Reardon’s 39. This suggests that unlike California, in Washington, one must consider a larger number of provisions to get a good gauge on the restrictiveness of a particular CBA.
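The iterative reduction can be sketched as follows. This is a schematic reconstruction under stated assumptions (binary item scores in a DataFrame, corrected item-total correlations, a .25 cutoff), not the authors’ code; a Cronbach’s alpha helper is included since the scale’s reliability is reassessed after each pass. The names `reduce_items` and `cronbach_alpha` are hypothetical.

```python
import pandas as pd

def cronbach_alpha(X):
    """Cronbach's alpha for a (districts x items) score matrix."""
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

def reduce_items(X, threshold=0.25):
    """Iteratively drop items whose corrected item-total correlation
    (correlation with the total score of the *other* items) falls below
    `threshold`, recomputing on the reduced scale until all items clear it.
    """
    items = list(X.columns)
    while True:
        total = X[items].sum(axis=1)
        r = {k: X[k].corr(total - X[k]) for k in items}
        keep = [k for k in items if r[k] >= threshold]  # NaN drops too
        if len(keep) == len(items):
            return items
        items = keep
```

Using the corrected item-total correlation (item versus the sum of the remaining items) avoids inflating each item’s correlation with a total that includes itself.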
Categories of Provisions
CBAs often follow a similar layout or formula. Association rights, evaluation, grievance procedures, layoffs, hiring procedures and transfers, benefits and leaves, and workload are discussed in specified contract subsections. The Strunk coding rubric used to create the data used in these analyses also categorizes provisions in this manner. And previous work has focused on particular provisions that may fall under the umbrella of one of these subcategories (workload, layoffs, hiring, and transfers; Koski & Horng, 2007; Moe, 2005, 2009; Moe & Anzia, 2011). Discussions with teachers and district administrators lead us to believe that unions and district managers may bargain “trade-offs” between categories to come to a final, mutually beneficial agreement. Therefore, in addition to running PIIR analysis on our full and restricted data sets to obtain district restrictiveness estimates, we also run PIIR analyses of the categories reported in Table 8 to determine whether districts that are “highly restrictive” in one category appear to be more or less restrictive in related categories. When we run the PIIR model on a category of provisions, we only consider provisions that fall within that category.
High-Profile Provisions
Our final data set comprises high-profile provisions: those discussed in the popular press and cited in prior subjectively focused academic research (Koski & Horng, 2007; Moe, 2005, 2009). Table 9 lists the “cherry-picked” provisions included in our final analyses. These provisions should adequately capture a district’s “visible” restrictiveness.
One of the strengths of the PIIR method is that it is highly objective; that is, researchers do not determine a priori which provisions should receive the most weight in the analysis. Thus, it may seem counterintuitive to apply the PIIR method to a subjectively chosen subset of provisions. However, many researchers only have access to data on high-profile provisions—for example, the National Council on Teacher Quality (2009) maintained a publicly available database of high-profile provisions for 150 large districts across the country—and the PIIR method can still generate an objective measure of CBA restrictiveness given the subset of provisions considered. The question we investigate, then, is whether high-profile provisions contribute to the same latent restrictiveness as the full range of provisions.
Restrictiveness Estimates and Internal Validity Assessment
We have described our data and a method of analysis (PIIR) that yields a measure of restrictiveness based on all provisions. This measure should capture the content of negotiations and reflect the underlying complexity of forces affecting a bargaining relationship. How restrictive are the 270 teacher contracts in the state of Washington, and does the PIIR estimate produced from all provisions correlate with PIIR estimates that rely on a reduced set of provisions, particular subsets of provisions, or particular cherry-picked provisions utilized in prior research? In this section, we present restrictiveness estimates for every CBA in our data set and discuss the relationship between measures of restrictiveness relying on various data subsets.
We use our item response data to obtain a “restrictiveness” measure for each contract and each provision in all 270 of Washington’s CBAs. Restrictiveness estimates obtained via fixed effects logit PIIR are presented in Column 2 of Table 10. All results have been standardized to have mean 0 and standard deviation 1 within each model.16 Therefore, the magnitude of each coefficient should be interpreted in standard deviations of restrictiveness; for example, the CBA in Aberdeen School District is 0.24 standard deviations less restrictive than the average CBA in the state when we use the full range of provisions in our data set (Column 2, Table 10). Column 3 of Table 10 displays each district’s restrictiveness estimate based on the objectively reduced data set described above. Column 4 of Table 10 provides district restrictiveness estimates based on the “cherry-picked” set of provisions identified in Table 9. Columns 5 to 12 of Table 10 present results by subsection of the CBA (the categories corresponding to each column are listed at the end of Table 10).
Table 11 displays the correlations between the PIIR estimates calculated from each subset of data. This presentation should be considered a first attempt at assessing the internal validity of the PIIR measure. Comparisons highlight similarities and key differences between estimates based on different subsets of data. The correlations are generally high, suggesting that latent restrictiveness in one category is predictive of latent restrictiveness in another category or in the contract as a whole.
The exceptions are restrictiveness in grievance policies (which is only weakly correlated with other subsets) and layoff policies (which is negatively correlated with estimates from other categories). Researchers who rely on grievance and layoff policies as a proxy for “union power” should take note, as these results suggest that provisions from these contract subsections may capture another dimension of bargaining and lead to misleading results. That said, the restrictiveness estimates using only hiring and transfer policies are moderately highly correlated (r = .59) with estimates based on the full sample. This suggests that prior work focusing only on these provisions (Koski & Horng, 2007) may capture a measure of restrictiveness similar to one relying on the full range of provisions.
Also of particular interest is the moderately high correlation between the restrictiveness estimates using the cherry-picked provisions and using the entire contract (.75). This suggests that—although our item reduction demonstrates that a large number of provisions are necessary to make conclusive inferences about contract restrictiveness—it is still possible to infer a great deal about the restrictiveness of a contract from a small subset of subjectively chosen provisions. Thus, future research relying on highly contested provisions across contract subsections may yield results similar to research relying on exhaustive, detailed coding of a near-complete universe of provisions.
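For readers reproducing this kind of internal-validity check, the comparison in Table 11 reduces to a Pearson correlation matrix over the per-district estimates from each PIIR run. The frame below uses invented toy numbers purely for illustration (they are not the article’s estimates), with one standardized estimate per district from three hypothetical runs.

```python
import pandas as pd

# Hypothetical standardized restrictiveness estimates, one row per
# district, one column per PIIR run (full contract, high-profile
# subset, layoff subsection). Values are invented for illustration.
est = pd.DataFrame({
    "full":         [0.8, -0.2, 1.1, -1.7],
    "high_profile": [0.6, -0.5, 1.2, -1.3],
    "layoff":       [-0.9, 0.4, -1.2, 1.7],
})

# Pairwise Pearson correlations between the runs, as in Table 11.
corr = est.corr(method="pearson")
```

In this toy example the high-profile estimates track the full-contract estimates closely while the layoff estimates run in the opposite direction, mirroring the pattern reported above.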
Our results suggest that while the PIIR method is an important development in the analysis of collective bargaining outcomes, researchers do not necessarily need to code every provision in CBAs to utilize this methodology and draw meaningful conclusions from these agreements. Specifically, analyses that calculate PIIR estimates using a subset of high-profile provisions across the contract or a category of provisions that appears to contribute to the latent restrictiveness of the contract—such as association rights, evaluation procedures, teacher benefit and leave policies, hiring and transfer provisions, and teacher workload agreements—may capture a measure of latent restrictiveness similar to one that utilizes the full range of provisions. This is good news for researchers who are drawn to the utility of the PIIR methodology but do not have access to exhaustive data sets of CBA provisions.
Further research investigating the external validity of PIIR measures informed by various data subsets will add confidence to these findings. In future research, we plan to explore potential determinants of contract provisions—districts’ demographic, social, political, and economic characteristics, and the corresponding characteristics of proximate districts. The findings reported here will be bolstered if similar factors correlate with contract restrictiveness regardless of the category or subset of data considered. We also plan to investigate the relationship between contract restrictiveness and the quality and distribution of the teacher workforce; if our results are robust to measures that utilize only high-profile provisions, this will lend additional support to our finding that these high-profile provisions contribute to the same latent restrictiveness as the entire contract.
We thank the collective bargaining agreement (CBA) coders without whom this project would not be possible: Rahn Amitai, Shijia Bian, Scott Bohmke, Stephanie Burns, Jonathan Humphrey, Angela Kim, Gregory Johnsen, Eric Lei, Hanqiao Li, Yi Li, Wanyu Liu, Xijia Lu, Alex McKay, Courtney Polich, Leah Staub, Annie Saurwein, Bifei Xie, Nancy Xu, Youngzee Yi, and Wenjun Yu.
Protocol for Collecting and Coding Collective Bargaining Agreements (CBAs)
To assess the relationship between CBAs and the quality and distribution of the teacher workforce, we rely on contract data from all school districts in Washington state. Transforming contract legalese into quantitative data requires a detailed coding strategy. Recently, Katharine Strunk developed a rubric designed to capture all provisions contained in teachers union contracts (Strunk & Reardon, 2010). The rubric allows one to reduce long, detailed documents to a series of binary responses. We use Strunk’s rubric, modified to suit the Washington state context, to assess the relationship between CBA provisions and the quality and distribution of the teacher workforce.
Prior to analyzing the relationship between CBA provisions and the quality and distribution of the teacher workforce in Washington state, it was necessary for us to (a) obtain a data collection instrument, (b) collect CBAs for all districts in Washington state that had such an agreement, (c) train a team of individuals to read, assess, and code the CBAs per the data collection instrument, and (d) consolidate the data generated from such coding for subsequent analysis. We review each of these processes below.
Obtain a Data Collection Instrument
We use data captured by Katharine Strunk’s CBA coding rubric in all of our analyses. Before using Strunk’s rubric, we modified several questions to reflect Washington state law and context. While much of the instrument could be used without modification, we replaced references to specific California state law with the comparable Washington state law, added questions to capture issues (such as layoff policies) covered by state law in California but left to district discretion in Washington, and made several minor changes to increase accessibility for our coders.17
Collect CBAs for All Districts in Washington State
A CBA, as a contract agreement between a public entity (the school district) and a legal entity (the collective bargaining unit), is a public document and falls under Washington State’s Public Records Act. Contracts should be “publicly accessible” and subject to review on request of any person; however, there is no publicly accessible cache of CBAs for certificated employees (teachers) in Washington state. As such, each CBA must be requested from an originating school district.
The state of Washington has 295 school districts. We requested a hard or electronic copy of all available CBAs between the school district and certificated employees (teachers) from each district. We collected many CBAs from school websites (111 districts had CBAs on their district website or on the teacher union’s website). After this initial round of online data collection, we began contacting individual districts by phone and email. These methods led to the collection of CBAs from an additional 80 districts. Although many districts were extremely responsive and helpful, others were reluctant to comply with our informal public records requests (PRRs). When districts were not responsive to repeated phone, voicemail, or email requests, we faxed them a formal PRR and followed up again via phone and email. This method led to the eventual collection of all remaining CBAs.
We collected 447 CBAs in total from 270 of the 295 school districts in the state. Many districts were able to provide multiple CBAs spanning several years; we refer to these additional CBAs from previous years as “sister” CBAs. The 25 remaining districts do not have union arrangements and therefore had no CBA.
CBAs vary in their legal duration. The majority of CBAs from the 270 districts in our sample (81%) had a legal span of 2 or 3 years. About 12% of contracts covered only 1 year and were renegotiated annually. The remaining contracts spanned 4 or more years. The average legal span across all CBAs was 2.5 years.18
Train a Team to Code CBAs
Undergraduate statistics, sociology, political science, and economics students from the University of Washington coded the majority of CBAs for this project. The Center for Education Data and Research advertised through departmental internship coordinators and data-related courses within each department. Nineteen students began coding in the spring quarter of 2011. Most of these students took a directed research course or received internship credit through the University of Washington, with the understanding that they would spend the bulk of the term coding CBAs in exchange for access to the data for their own analyses at the end of the quarter. In addition to the normal internship and directed research work, we held intensive week-long coding sessions between quarters. These sessions were intended to make maximum use of coders’ training and experience and to maintain their skills over school breaks.
The training process for each student was similar. All students read several background documents detailing project goals and operations and their role in the project. Each student then took home a CBA and used the data collection instrument to code the entire document. In 1 week’s time, all the students met again to compare their coding and produce a unified, agreed-upon coding for the CBA in question. This process was facilitated by Research Assistants Lesley Lavery and Roddy Theobald.
Once the initial training was complete, the students broke into groups of two or three (based on the number of credits they were taking) and were assigned CBAs to code for the following week. Students independently coded the assigned CBAs (1-7 per week, depending on coding speed and credit load) at home or school, and then met with their partner to resolve coding discrepancies and provide a consolidated, agreed-upon data set for all assigned CBAs. This process was repeated each week until the conclusion of the quarter.
Consolidate the Data
The coding rubric (our instrument) and all student coding data were maintained in Excel spreadsheets. All coders were instructed to use headings in a specific format so that data consolidation was a simple process of cutting and pasting from individual CBA coding spreadsheets into a combined spreadsheet.
Coders sent a final, agreed-upon coding for each CBA to a Center for Education Data and Research (CEDR) staff member each week. This staff member then compiled multiple groups’ coding in a “Meta” data set that contained the individual and consolidated coding for each individual and group for each CBA. Cohen’s kappa scores were calculated using this data set to determine intercoder reliability. Finally, from the “Meta” data set, we distilled the consolidated coding (the agreed-upon coding for every CBA coded) for final analysis. A resulting “Master” data set was then divided into two additional data sets: one with only the most recent CBA for each district—the data used in the bulk of our analyses—and another with the coding of sister CBAs.
Our protocol for collecting and analyzing CBAs for this research was characterized by four distinct steps: (a) obtain a data collection instrument, (b) collect CBAs for all districts in Washington state, (c) code the CBAs, and (d) consolidate the data. The most labor-intensive portion of this process involved coding the CBAs. Undergraduate students from the University of Washington, supervised by Center for Education Data and Research (CEDR) researchers, coded all 270 CBAs used in subsequent analyses. Coders answered an exhaustive list of questions about each CBA. The analytical methods chosen for this project require that each response be in a binary format. Therefore, after all CBAs were coded, research assistants transformed the data via the process explained in the body of this article. The final, transformed coding rubric contained 764 total questions.
The intensive data collection and coding process described here allows us to explore the relationship between CBAs and the quality and distribution of the teacher workforce. To the best of our knowledge, this is the first time such analysis has been attempted in Washington state.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This research was made possible in part by grants from an anonymous foundation and the Bill and Melinda Gates Foundation, and has benefited from helpful comments from Katharine Strunk and Sean Reardon.
↵1. The prominence of collective bargaining agreements (CBAs) in policy debates was illustrated by recent events in Ohio and Wisconsin, where Republican governors, after significant political battles, rolled back union power to negotiate key collective bargaining provisions. Ohio voters later rejected these cutbacks via referendum.
↵2. Union election structures forge a positive connection between what teachers want and what their leaders actually do. When union leaders step out of line, collaborating when members seek to contest district reforms or supporting changes that members disavow, their tenure is short-lived (Moe, 2011).
↵3. Moe relies on a transfer rights scale, developed based on factor analysis of several seniority rights CBA provisions. Koski and Horng rely on six transfer rights provisions.
↵4. For example, unions may agree to stricter evaluation standards in exchange for seniority-based transfer rights.
↵5. Only districts with at least four schools are included in Strunk and Reardon’s analyses.
↵7. We made several modifications to Strunk’s original coding scheme to reflect the Washington state context. We replaced several references to specific California education code with comparable Washington state law where applicable. We also added an entire section on layoff policies, as layoff policies are collectively bargained in Washington state (whereas they are addressed by state law in California). Finally, after coding a representative sample of 75 CBAs, we added additional provisions—mainly to the layoff and evaluation sections—to capture the full range of provisions in Washington state. Our rubric considers 766 individual provisions across the subtopics noted above. For more details on the coding rubric and revisions to the rubric, see Appendix A.
↵8. We use each student’s original coding to calculate a Cohen’s kappa score for each pair as a measure of intercoder reliability. Scores range from .43 to .95 with an average score of .62. A kappa score of 1 implies perfect agreement, which is rare. Scores ranging from .40 to .60 imply moderate agreement, .60 to .80 good agreement, and .80 to 1.0 a very high degree of agreement (Altman, 1991). Final coding reliability should improve on these kappa scores, as the final coding reflects improvements made through careful joint review of each individual’s original coding.
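To make the statistic concrete, Cohen’s kappa corrects the observed agreement rate between two coders for the agreement expected by chance given each coder’s marginal code frequencies. The sketch below (the function name and sample codings are illustrative, not project data) computes kappa for two coders’ binary provision codings:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    n = len(coder_a)
    # Observed agreement: proportion of provisions coded identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected chance agreement from each coder's marginal code frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary codings of ten provisions by two coders:
coder_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
# p_o = .80, p_e = .52, so kappa = .28/.48 ≈ .58 — "moderate" on Altman's scale.
```

Raw agreement here is 80%, but kappa is only about .58 because both coders mark most provisions present, so much of that agreement could arise by chance.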
↵9. Several gateway questions (GQs) are followed by sub-GQs, additional questions that must be answered in the affirmative for coders to proceed to the next item.
↵10. A list of all contract provisions is available from the authors on request.
↵11. As one of the goals of the Partial Independence Item Response (PIIR) model is to measure the “restrictiveness” of each CBA, we recode each numerical question so that each successive question represents a greater “restriction” to the district. For example, lower mandated class sizes are more restrictive to a district, so we recode class sizes as “Is the negotiated class size in Grade 4 no more than 27? No more than 25? and so on.” However, more teacher leave days are more restrictive to a district, so we recode leave days as “Do teachers get more than 3 bereavement days? More than 5 bereavement days? and so on.”
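The recoding described in this note can be sketched as a small transformation (the function name and thresholds are our own illustrations, not the project’s actual code): each numerical provision becomes a set of cumulative binary items, ordered so that each successive item represents a greater restriction on the district.

```python
def recode_restrictiveness(value, thresholds, more_is_restrictive):
    """Recode one numerical provision into cumulative binary items,
    each successive item representing a greater district restriction."""
    if more_is_restrictive:
        # e.g., leave days: "more than 3? more than 5? ..."
        return [int(value > t) for t in thresholds]
    # e.g., class size: "no more than 27? no more than 25? ..."
    return [int(value <= t) for t in thresholds]

# A negotiated Grade 4 class size of 26: no more than 27? yes; no more than 25? no.
class_size_items = recode_restrictiveness(26, [27, 25], more_is_restrictive=False)
# Four bereavement days: more than 3? yes; more than 5? no.
leave_items = recode_restrictiveness(4, [3, 5], more_is_restrictive=True)
```

Note the direction flip: lower class sizes and higher leave allowances both restrict the district, so the two provisions are thresholded in opposite directions to put them on a common restrictiveness scale.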
↵12. This approach is conceptually similar to a Rasch (1960) model that calculates the probability of a student answering a question correctly on a test as a function of his or her latent ability and the latent difficulty of the question.
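For reference, the standard Rasch model writes the probability that student $i$ answers item $j$ correctly as a logistic function of latent ability $\theta_i$ and item difficulty $b_j$:

```latex
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
```

In the analogy above, $\theta_i$ corresponds to a district’s latent contract restrictiveness and $b_j$ to the latent restrictiveness of a given provision.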
↵13. And we cannot say whether this measure of contract “restrictiveness” is related to outcomes. We believe that a restrictive contract will restrict management practices in some sense. But a restrictive contract does not necessarily restrict management in ways that would be expected to lead to any particular outcome. To determine the relationship between a particular kind of restrictiveness and a particular outcome of interest, we would still need to look at on-the-ground practices related to particular provisions. For example, a CBA may mandate that novice teachers are evaluated annually and that each evaluation consist of three classroom visits. This CBA would be seen as more restrictive than that of another district that did not mandate anything about the evaluation of novice teachers, but we have no idea what evaluation practices look like in either the district with the “more restrictive” contract language or the district with no evaluation-related provisions.
↵14. This does not mean the measure is more accurate. A measure that will yield the same response in repeated trials is “reliable” but may not be the best, or most complete, measure of a concept of interest.
↵15. The results of fixed effects PIIR models run on the full and reduced data set are highly correlated (.88).
↵16. The measure of contract restrictiveness obtained from a mixed effects model treating districts as fixed effects and provisions as random effects yields highly correlated (r > .99) estimates, suggesting that the restrictiveness estimates are robust to our specification of the provision effects. For simplicity, then, we only present results of the fixed effects model.
↵17. We clarified some language and terms, expanded some non-GQ, and added administrative detail to ease navigation and reference.
↵18. Though contracts are negotiated and legally binding for a specified period, some districts appear to rely on expired contracts and renegotiate infrequently. For example, Queets-Clearwater, a very small district of approximately 30 students, negotiated a contract to span 1995-1997. It was amended in 2000 (leaving a gap between 1997 and 2000) but has remained otherwise untouched since then. As of 2012, Queets-Clearwater operates under this agreement. This is an extreme case, but it was not uncommon for us to find large gaps between legal spans of CBAs. Therefore, though these instances may reflect districts’ compliance with our efforts to collect CBAs, they may also provide a signal of contract strength or restrictiveness.
- © The Author(s) 2013
Dan Goldhaber is the director of the Center for Education Data & Research, Bothell, Washington. His significant research on various aspects of teacher quality and labor markets has been published in a number of peer-reviewed journals, and he has been involved with numerous technical advisory panels on teacher quality and the teacher labor market.
Lesley Lavery is an assistant professor of political science at Macalester College, Saint Paul, Minnesota. She is also an affiliate of the Center for Education Data and Research. Her research focuses on education policy, public and social policy, and political behavior.
Roddy Theobald is a PhD student in the Department of Statistics at the University of Washington, Seattle. His research focuses on the application of new statistical methodology to problems in educational evaluation and policy.
Dylan D’Entremont is a research assistant at the Center for Education Data and Research, Seattle, Washington. His research focuses on district administration, hiring, and efficiency.
Yangru Fang is a research assistant at the Center for Education Data and Research, Seattle, Washington. Her research focuses on the application of probability theory and stochastic processes to real-life problems.