550,000 students, 391 school districts.
- Starting in 2003, DESE started collecting student-level data from all districts. At that time, 32 elements were collected about each student.
- The assessment unit was collecting data on MCAS test booklets, including a student ID. Bob Lee started merging the test scores with the enrollment data using student IDs into a flat file for accountability measures, and then later into even bigger flat files to keep a record of graduating students. Between 2003 and 2005 they developed systems to automatically merge and create data based on a series of business rules. All the data is stored in an Oracle system, which is tapped into in various ways (this was Generation 1). Data curation is done by statisticians with help from data entry people.
- Now in Phase II: a Cognos-centric data warehouse (cubes and Cognos), built as an enterprise education data warehouse. It works well in a medium-sized school district (size???), but not as well state-wide - hence they are re-engineering the whole thing.
- In the data warehouse, data collections that happen periodically are combined: student information is collected 3 times/year; educator info once/year, with a plan in place to collect it 3 times/year; teacher assignment-to-class data; and they are at the cusp of collecting student assignments to classes - a join could then be done between the two. Also, every response to every question by every student on every MCAS they ever took (not just the scaled score and achievement level). This info is coded against the standards covered by the tests.
- Until now, most of the analysis has been at the school and grade level. With the new data, class-level analyses can be made. In the past, everything was done through the lens of achievement - scaled scores, levels, etc. - and how a student compared to the rest of the school, to other population subgroups, and from school to school.

**Growth Model**
The growth model is a student growth percentile model: a regression model that creates a cohort of like-performing students - how does a student grow from one year's test to the next compared to her peer group, at the same rate as her peers, or at a faster or slower rate? The peers are selected statewide, not confined to schools. The baseline is 3rd grade. Every student has a unique ID. The cohort is floating: the regressions are done after every year's test, and the cohort is based on the past 2 years plus current-year performance - it goes beyond raw scores and levels. (A simplified sketch of the percentile idea follows these notes.)
**Performance is a combination of high achievement and growth.** It is normative: a per-student measure that is aggregated over a school.
- Schools get labels from DESE on the test materials for each student. The label contains the unique ID - makes life easier for schools.
- Eventually all this will be online, but right now different schools are at different levels of readiness.
- Right now their database ends at 12th grade. Wants to create a P20 database?? with Sharon Wright - early childhood to higher ed and eventually to employment and wage data. Florida has done this already (it has one executive in charge of all execution and training, who has the right to data across the spectrum), but there are a series of privacy concerns because education in MA is organized differently. DESE only has access to K-12 data because of the interpretation of FERPA.
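The actual growth model described above is a regression run statewide over the past two years plus the current year; the sketch below only illustrates the percentile-within-peer-cohort idea, using a crude binning of prior-year scores in place of the regression. All data, table names, and column names here are invented.

```python
import pandas as pd

# Invented example data: one row per student, with last year's and this
# year's MCAS scaled scores (the real model uses up to two prior years).
scores = pd.DataFrame({
    "sasid":         ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"],
    "prior_score":   [220, 222, 221, 240, 241, 239, 260, 261],
    "current_score": [230, 225, 240, 245, 250, 238, 258, 270],
})

# Crude stand-in for the regression: bin students into statewide cohorts
# of "like-performing" peers based on last year's score.
scores["peer_group"] = pd.cut(scores["prior_score"], bins=[200, 230, 250, 280])

# Student growth percentile: where this year's score falls relative to
# the peers who started from a similar prior score.
scores["sgp"] = (scores.groupby("peer_group")["current_score"]
                       .rank(pct=True) * 100).round()

print(scores[["sasid", "peer_group", "sgp"]])
```

Aggregated over a school or district, this per-student measure gives the growth numbers on the reports; because it is normative, the statewide value is always 50.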
Data analysis:
- First, Bob's team ensures that the data is correct, so that they can reliably associate students with their test results.
- Once the data is clean, they just match the current data on SASID (state-assigned student ID) with the past data.
- Then they generate reports for districts/schools. (A sketch of this clean/match/report pipeline follows these notes.)
From an institutional perspective, three groups look at the data: the MCAS group (headed by Bob), a data analysis group that supports all departments, and a third group that focuses on underperforming (turnaround) schools. At the school and district level, some have analysts and some don't. Of the 391 districts, most are single-school districts. Other than a few exceptions, it is novices trying to make sense of the data at the district and school levels. DESE provides training and tools to help them out: tutorials on the web about the growth model; 6 trainings developed so far, 10 more to come.
The report that parents get has the student's growth percentile, the school's, and the district's; the state's is always 50 (normative!!). The test is only a sampling - it is not testing all the standards. Teachers get item-analysis reports. It is difficult to get past descriptive statistics. Conclusions may be drawn and the data is powerful, but one has to be really careful in drawing them.

Data Collection:
- Has to collect data periodically, for two reasons: one, to distribute money - they need to know how many students there are; two, to report to the students. Now moving on to data to guide policy and decisions, for continuous improvement. Sometimes data is only good for a post-mortem rather than something that is actionable at the time. ??make it more realtime??
- Plan to move to a new standards-driven protocol, SIF (Schools Interoperability Framework). Many vendors are moving to this system for student information and HR systems. DESE has gotten money from different grants, Race to the Top among them, for adopting SIF. Real-time information updates via zone servers.
- Use this infrastructure for formative assessment (MCAS is summative) - tests that are scored online and returned to the teacher within a day so that she can act on them.
- Information entered at the district level: student profile, teacher profile, student-teacher assignment to class. Plans for a teacher ID assigned as soon as a teacher takes a teaching test/preparation program, so that the effectiveness of preparation programs can be evaluated empirically.
- Student information: 52 elements at the student level now; in addition, course data is being added. Educator info: demographics, educational history; a separate licensure database of 400,000 people, with plans to link it to the main data warehouse.
- Problem of multiplication of records: 12 million records in the EPIMS(?) collection, but only about 75,000 educators, because each data collection transaction exists on its own and isn't always appended to the educator info already on file... (See the consolidation sketch below.)
- The 52 student profile elements: over the last 10 years, some have been added, some deleted. Because local districts procure their own systems from disparate vendors, DESE has to give them two or three years' notice to modify their data structures (solution: owl:sameAs).
- The data is organized in very inefficient ways --> it takes forever to run reports. "They make a cup of tea, maybe they should go out and have dinner, maybe they should come back the next day, maybe they give up," while they wait for the reports.
- RDF for individual education plans, student pictures, rules for teachers to get certified across different states.
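A minimal sketch of the clean-then-match-then-report pipeline in the data analysis notes above, assuming invented CSV extracts and column names (the real data sits in the Oracle system / Cognos warehouse):

```python
import pandas as pd

# Hypothetical extracts; layouts are assumptions, not DESE's actual schema.
enrollment = pd.read_csv("enrollment_current.csv")  # sasid, district, school, grade, ...
mcas       = pd.read_csv("mcas_results.csv")        # sasid, subject, scaled_score, ...
history    = pd.read_csv("student_history.csv")     # sasid, prior-year records

# Step 1: basic cleaning so students can be reliably matched to results.
for df in (enrollment, mcas, history):
    df["sasid"] = df["sasid"].astype(str).str.strip()
enrollment = enrollment.drop_duplicates(subset="sasid")

# Step 2: match current data with past data on SASID.
flat = (enrollment
        .merge(mcas, on="sasid", how="left")      # attach this year's results
        .merge(history, on="sasid", how="left"))  # attach prior years

# Step 3: roll up into a district/school report.
report = (flat.groupby(["district", "school", "subject"])["scaled_score"]
              .agg(["count", "mean"])
              .reset_index())
report.to_csv("district_school_report.csv", index=False)
```

The same SASID join is what lets the flat files and the growth model tie a student's current results back to prior years.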
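The record-multiplication note above (12 million collection rows, only about 75,000 actual educators) is a one-transaction-per-collection problem: transactions are never folded back into a single profile. A hedged sketch of the obvious consolidation, with invented file and column names:

```python
import pandas as pd

# Hypothetical transaction-per-collection table: many rows per educator.
transactions = pd.read_csv("educator_collections.csv")  # educator_id, collected_on, school, role, ...
transactions["collected_on"] = pd.to_datetime(transactions["collected_on"])

# Collapse to one current profile per educator: keep only the latest
# transaction, so millions of rows reduce to roughly one per educator.
profiles = (transactions
            .sort_values("collected_on")
            .groupby("educator_id", as_index=False)
            .tail(1))

print(len(transactions), "transactions ->", len(profiles), "educator profiles")
```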
Two key questions: how does RDF make data structure modification easier, and how can information from disparate sources be brought together for integration? There is a risk of widows and orphans when changing the 52 elements. Once students move to higher ed, DESE won't have access to the student data. It is not easy to identify causes - the cause could be something that is not a characteristic amassed by DESE.
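One reading of the owl:sameAs idea in these notes: if the data elements and the people are identified by URIs, a district that renames or restructures a field can publish a single mapping statement instead of waiting two or three years for a coordinated schema change, and data from disparate sources can be joined on the shared URIs. A small rdflib sketch with made-up namespaces (strictly, owl:equivalentProperty is the usual vocabulary for mapping properties; owl:sameAs is used here because that is what the notes mention):

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL

DESE  = Namespace("http://example.org/dese/")        # hypothetical state vocabulary
DISTR = Namespace("http://example.org/district42/")  # hypothetical district vocabulary

g = Graph()
g.bind("owl", OWL)

# The district publishes its data with its own local property names...
student = DESE["student/1234567890"]  # URI minted from the SASID
g.add((student, DISTR.gradeLevel, Literal("07")))

# ...and a single mapping triple declares the local property equivalent
# to the state element, instead of a multi-year schema migration.
g.add((DISTR.gradeLevel, OWL.sameAs, DESE.gradeElement))

# Integration side: follow the mapping link when querying.
results = g.query("""
    SELECT ?grade WHERE {
      ?localProp <http://www.w3.org/2002/07/owl#sameAs> ?stateProp .
      <http://example.org/dese/student/1234567890> ?localProp ?grade .
    }""")
for row in results:
    print(row.grade)
```

This is the sense in which an RDF layer could ease structure modification and cross-source integration; it does not by itself remove the widows-and-orphans risk when elements are retired.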