A cleaned subset of the Project STAR dataset. This processed version focuses on third-grade test scores (math and reading) and includes key variables for mixed-effects modeling.

Usage

star

Format

A data frame with 4,192 rows and 13 columns:

school_id

Factor indicating unique school ID.

system_id

Factor indicating school system ID.

sctype

Factor indicating school type: "inner-city", "suburban", "rural", or "urban".

gender

Factor indicating student's gender: "female" or "male".

ethnicity

Factor indicating student's ethnicity: "cauc" (Caucasian), "afam" (African-American), "asian" (Asian), "hispanic" (Hispanic), "amindian" (American-Indian), or "other".

cltype

Factor indicating student's class type in 3rd grade: "small", "regular", or "regular-with-aide".

tdegree

Factor indicating highest degree of 3rd grade class teacher: "bachelor", "master", or "specialist".

tyear

Integer giving the total years of teaching experience of the student's 3rd grade teacher.

lunch

Factor indicating whether the student qualified for free lunch in 3rd grade: "free" or "non-free".

read_old

Total reading scaled score in 2nd grade.

read

Total reading scaled score in 3rd grade.

math_old

Total math scaled score in 2nd grade.

math

Total math scaled score in 3rd grade.

Details

Project STAR was a large-scale experiment conducted in Tennessee in the 1980s to study the effect of class size on student test performance. The original dataset tracked over 7,000 students across 79 schools from kindergarten to third grade. Students were randomly assigned to one of three interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), or regular-with-aide class (22 to 25 students with a full-time teacher's aide). In Stock and Watson (2007), the test scores analyzed are the sum of the scores on the math and reading portions of the Stanford Achievement Test; in this dataset, the math and reading scores are stored separately.

This processed version focuses on student performance in third grade and has a hierarchical structure in which students are nested within schools. It includes a subset of key variables covering student demographics, prior-year (2nd grade) and current-year (3rd grade) test scores, class assignment, teacher qualifications, and school-level identifiers. The dataset is restricted to students who attended the same school in both 2nd and 3rd grade. This structure makes it well suited to mixed-effects modeling of school effects and treatment impacts.
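
To make the nesting concrete, one possible model is a random-intercept specification for student i in school j (school_id). This is a minimal sketch only: the choice of covariates, the coefficient symbols, and the normality assumptions are illustrative and are not prescribed by the dataset or its source.

```latex
% Illustrative random-intercept model (assumed, not taken from the source).
% small_{ij} and aide_{ij} are indicator variables derived from cltype,
% with "regular" as the reference level.
\begin{aligned}
\mathrm{math}_{ij} &= \beta_0
  + \beta_1\,\mathrm{small}_{ij}
  + \beta_2\,\mathrm{aide}_{ij}
  + \beta_3\,\mathrm{math\_old}_{ij}
  + u_j + \varepsilon_{ij}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Here u_j is the school-level deviation shared by all students in school j, and the coefficients on the class-type indicators capture the treatment contrasts against a regular class. In the formula syntax used by the lme4 package (an assumption here; the source names no modeling package), a model of this shape would be written roughly as math ~ cltype + math_old + (1 | school_id).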

References

Stock, J.H. and Watson, M.W. (2007). Introduction to Econometrics, 2nd ed. Boston: Addison Wesley.

Data sourced from the AER package.