White Papers


Observation Checklists vs. Observation Data


By Dr. John L. Tenny, developer of the Data-Based Observation Method and the ClassGather Classroom Observation Software, January, 2010

I have looked at a couple dozen books on how to conduct classroom/teacher observations and have downloaded another 30 or so observation forms used by various districts across the country. I am struck by the general nature of most of them, and by the lack of specifics in the descriptors. The rating scales used run from observed/not observed to met/not met to a 5- to 7-point Likert scale. Some of the forms used by districts are designed to record anecdotal notes without a rating scale and/or with the evaluation of the performance on another summary page.

An example: This ‘standard’ came from a school district set of guidelines for conducting observations: “Establishes and maintains an orderly and supportive environment for students”, and is, in one form or another, common in standards.

I just cannot see how a checklist or scale is helpful, let alone accurate, as a record of what happened in the class. If the observer checks ‘observed’, does that mean the class was at one point orderly and supportive? Alternatively, does it mean that when things started to get disorderly, the teacher responded and brought the class back in focus? On the other hand, since the phrase ‘supportive environment’ is also included, does that ‘observed’ indicate that the teacher made positive, encouraging statements? To all the students? Could the observer be satisfied with student work being posted on the walls, and a ‘student of the week’ bulletin board being present — that is certainly supportive?

In addition, if the class was orderly some of the time and not others, and the observer checks ‘not observed’, would not the teacher respond with “Are you saying my class was never orderly? Or that I was never supportive? Or both?”

Observed/not observed does not work. It does not convey any helpful information and will lead to conflict between the observer and observee.

So what about a scale? Scales are typically designed to be either a 1 to 5 (poor to great) or a rubric of ‘unsatisfactory, basic, emerging, competent, distinguished’ type. Some of them will have descriptors for each of the levels, the worst being the range from ‘did not observe/ observed some of the time/ observed most of the time/ observed all of the time’. These descriptors are worthless as they are nearly impossible to mark in a way that conveys what happened. For example, if the class were orderly for the first 3 minutes and then in chaos the rest of the time, the orderly standard would actually be met ‘some of the time’. Actually, if the class were orderly up to 49% of the time, the same checkbox would apply; and if things were orderly 51% to 99% of the time, it would be ‘most of the time’.
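
The arithmetic behind this objection can be sketched in a few lines of Python. This is only an illustration: the function name scale_label, the 50-minute class length, and the treatment of exactly 50% are assumptions, while the descriptor wording comes from the scale quoted above.

```python
def scale_label(percent_orderly: float) -> str:
    """Map a percent of class time onto the four descriptors quoted above.

    Assumption: the text gives 'up to 49%' and '51% to 99%', so exactly
    50% is arbitrarily treated here as 'most of the time'.
    """
    if percent_orderly <= 0:
        return "did not observe"
    if percent_orderly < 50:
        return "observed some of the time"
    if percent_orderly < 100:
        return "observed most of the time"
    return "observed all of the time"


# Hypothetical 50-minute class that was orderly for only the first 3 minutes
# (about 6 percent of the period) ...
three_minutes = 3 / 50 * 100
# ... and a class that was orderly for just under half of the period.
almost_half = 49.0

# Both classes receive the same checkbox, although they looked very different.
print(scale_label(three_minutes))
print(scale_label(almost_half))
```

Two very different classes collapse into one label, which is the information loss described above.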

There are other descriptors or indicators for each of the levels that seem more specific. For example, Charlotte Danielson in her Framework for Teaching has a standard for Management of Instructional Groups with a proficient level of competence described as “Tasks for group work are organized, and groups are managed so most students are engaged at all times.” In the many districts that use some variation of Danielson’s work as their standards, the observer would be asked to judge if the teacher was at this level during the current observation.

Skip the fact that there are two behaviors in this indicator – organizing tasks and managing groups – which also confuses the issue, and just look at the act of making that determination of worth based on the second behavior. When you watch a typical classroom, deciding whether ‘most’ (is that 51% or is it really a higher target than that?) ‘are engaged’ (physically or mentally? engaged in low level or high level work?) ‘at all times’ (what would be the determination if the full class went off task for 3 minutes?) is so complex, and the answers vary so widely across classrooms and observers, that the validity of the rating is suspect.

In discussions with administrators, what I find very often happens is that the observer adds additional criteria to the specific situation. ‘Most students’ sometimes turns into almost everyone in the class (a higher standard than stated); ‘engaged’ equates to looking and acting busy without regard for the level or quality of engagement; and ‘at all times’ is ignored in favor of an unspoken criterion of ‘most of the time’. If the class is known to have kids with behavior problems or the number of students is high, the criteria are functionally lowered.

What is really happening is that the observer has internally defined a level that is satisfactory to him or her based on personal experiences, the makeup of the class, and the relationship with the teacher, and that definition is applied unevenly across classrooms. That inconsistency is confusing to everyone, and the results of classroom observations cannot be compiled across the building, let alone the district, as a basis for broad decisions. The system has a built-in subjectivity and personal interpretation of the standards that makes it difficult for any observer to be consistent and fair.

Rating scales do not work, especially when extremely little effort is put into rater reliability and clear statement of the objectives and indicators.

Data-based observations can make a significant difference. Some of the texts on classroom observations provide steps on how to turn a judgment on a rating scale into numbers and then process those numbers as if they were data, but given all the issues with rating scales, I believe this to be a false path.

Instead, I recommend using the Data-Based Observation Method, a 5-step process that includes the actual collection of observable behavior data. The steps are:

1. Identify the standards. Be sure that they are worded so that observable behaviors demonstrating those standards can be clearly identified.
Good standard: Students will be engaged in learning activities. Bad standard: Teachers will act in an ethical and professional manner at all times (what does ‘ethical’ look like?).

2. Create indicators. Be sure that they describe the observable behavior identified in the standard.
Good indicator: Students will listen attentively to the teacher, be productively engaged in individual work, or contribute to the work of a small group. Bad indicator: Students will follow teacher instructions as given (too vague and general).

3. Set criteria. As a profession, we have not engaged in setting criteria for ourselves in concrete terms, so this part is a new conversation.

What are the criteria for engaging students in learning? Should they be engaged 25% of the time? No, that is clearly too low. How about 50% of the time — still sounds low. What about 95% of the time? Too high for real classes? The answer here is not to set the criteria arbitrarily, but to turn to (or conduct) research to establish criteria in which we can have confidence.

If the standard is an important one, and the indicators are valid, there will be a correlation between the behavior observed (such as student engagement) and the final desired outcome (student learning). We need to find those connections and use them as guides for improving teaching. We actually have research that identifies a significant number of them, but we are not applying that research at the classroom level.

4. Design data collection tools. I developed the ClassGather Classroom Observation Software as an easy and efficient way to collect the objective data, but you can use pencil and paper, a stopwatch, the wall clock in the classroom, etc. to collect the data once you have carefully identified what data is important to collect.
Good tool: A counter tracking on/off task behavior and using the time sample data collection method to record the percent of time engaged and the percent of time not engaged for the entire class. By using the time sample data collection approach and repeated sweeps of the class to record the on/off task behavior of each student, a quite accurate data-based, objective picture of the class behavior is produced. This becomes a factual basis for making decisions. Useful tools include Class Learning Time, Level of Questions, Teacher Talk/Student Talk, and other tools reflecting research on best teaching practices.

5. Analyze and interpret the data. Did it meet the criteria? Is there a need or desire for a change?
Given the context (number and diversity of the students, physical space, materials at hand, etc.), what is most likely to bring a positive change? When and where will the new approach be initiated? When will the next set of data be collected, analyzed, and interpreted?
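
The time sample approach in step 4 and the criteria check in step 5 can be sketched in Python. This is a minimal illustration, not a description of how ClassGather computes its reports. The Sweep record, the function names, the example counts, and the 80% criterion are all hypothetical placeholders; as step 3 argues, a real criterion should come from research rather than be set arbitrarily.

```python
from dataclasses import dataclass


@dataclass
class Sweep:
    """One pass over the class at a sampled moment (hypothetical record)."""
    minute: float   # when the sweep was taken, in minutes from the start
    on_task: int    # students observed on task during this sweep
    off_task: int   # students observed off task during this sweep


def percent_engaged(sweeps: list[Sweep]) -> float:
    """Percent of all student observations that were on task."""
    total_on = sum(s.on_task for s in sweeps)
    total = sum(s.on_task + s.off_task for s in sweeps)
    if total == 0:
        raise ValueError("no student observations recorded")
    return 100.0 * total_on / total


def meets_criterion(sweeps: list[Sweep], criterion_percent: float) -> bool:
    """Step 5: compare the collected data with the agreed criterion."""
    return percent_engaged(sweeps) >= criterion_percent


# Hypothetical data: four sweeps of a 25-student class, five minutes apart.
example_sweeps = [
    Sweep(minute=0, on_task=22, off_task=3),
    Sweep(minute=5, on_task=20, off_task=5),
    Sweep(minute=10, on_task=18, off_task=7),
    Sweep(minute=15, on_task=24, off_task=1),
]

print(percent_engaged(example_sweeps))        # 84.0
print(meets_criterion(example_sweeps, 80.0))  # True (80.0 is a placeholder)
```

Keeping each sweep as its own record, rather than only a running total, also lets the observer show the teacher when during the lesson engagement dipped.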

For the greatest success, it is critical to operate with the belief that every vested interest should be involved in this process. Administrators, teachers, parents, aides, students, counselors, etc. all have an important contribution to make where the purpose of the observation is the improvement of teaching and learning.


I am coming to realize what a big shift this is in the education field. We have tried to cite ‘professional judgment’ when the inconsistency in the process and the unreliability of the results support neither the process nor the conclusions. Serious collaborative discussions are needed to move to a more concrete basis for judging what we say we value, and how to use the specifics to guide the improvement of teaching.


Teacher Pay and Test Scores


By Dr. John L. Tenny, developer of the Data-Based Observation Method and the ClassGather Classroom Observation Software, January 2010

A while ago I read an article in Ed Week about the Houston and Denver districts’ efforts in teacher pay for performance. Both programs are broad implementations of the pay-for-performance system and are struggling with enrollment and acceptance. The most interesting quotes were by Gayle Fallon, President of the Houston Federation of Teachers. Both quotes, “It’s better than last year. Still, they are handing out money and getting nothing in return.” and “What we hear from teachers consistently is that they have no clue what they did to get the money”, point to the black-box nature of using student test scores as a primary determiner in awarding pay or other rewards. While student learning is the primary goal, the teacher’s direct influence on student scores (as they indicate learning) is very difficult to determine. I can understand the teachers not ‘having a clue’ when the results of their efforts (the test scores) are calculated and revealed sometime in the future and those results also include the influence of a large number of other variables.

A medium-sized school district in Oregon has recently received a large grant from the Chalkboard Foundation to improve student learning. Part of their effort includes a bonus pay system based on a teacher portfolio of evidence, which can include student scores as well as other strong evidence of exemplary teaching and professional conduct. They contacted me to discuss the use of data-based observation data on best practices as a part of that process.

There is credible research about teaching practices that result in increased student learning. ClassGather Software will track the implementation of those practices in an individual classroom. Now we have the opportunity to reward teachers who are implementing those researched best practices. The process is not difficult to manage – identify the behaviors that everyone is confident in as directly influencing student learning (Class Learning Time, Time on Task, Wait Time, Level of Questions {as answered by students, not just asked by the teacher}, etc.), train observers (teachers, aides, paid data gatherers, administrators) to competently use the data collection tools, and determine the appropriate data collection procedures (number of data points, length of individual data collection events, etc.).

The result of this, I predict, will be interesting and engaging. Not only will teachers know immediately that they are using the researched best practices in their classroom, but they will have a running record of that. It is that running record that is the greatest benefit – it can provide feedback in a timely and useful manner to the teacher who has a goal of becoming an exemplary teacher. They can immediately see if they are moving toward a higher level of proficiency instead of waiting for months to find out if they ‘won’. As I have said before, teachers are deeply dedicated to effectively teaching their students in the best manner possible. Bringing the objective feedback to the classroom level in real time will build more effective teachers; then we’ll know why those scores went up as well as have the ‘clues’ we need.

As a side note, I’m a bit concerned that the student performance/more money link is a strong extrinsic motivator and will shift the focus of why one becomes/continues to be a teacher. I think the immediate, objective, and over-time feedback that ClassGather provides will not only reinforce the skills of teaching but will also reinforce the teacher’s perception of their skills. Since we love doing what we do well, the data and teacher reflection become the intrinsic motivator. As teachers become/continue to be successful in their craft, and are clearly aware of their successes, they will keep doing what they love – helping kids.


Developing Self-Directed Professional Growth


By Dr. John L. Tenny, developer of the Data-Based Observation Method and the ClassGather Classroom Observation Software, November, 2009

Quality staff development efforts are directed at improving teaching and learning in ways that will result in long-term change. The topics included in staff development come from national, state, district, and building standards; from benchmark test scores; from school board and superintendent directives; from research and scholarly journals; and from the teachers themselves. Programs include training on goal setting, action research, specific curriculum or behavior techniques, communication approaches, brain research, child psychology, and a nearly unlimited list of other topics; all of which have value and work to some degree.

However, there is a persistent level of frustration among staff developers around the resistance to new ideas, the difficulties in getting teacher buy-in and implementation, and the limited impact of staff development efforts. It is the premise of this paper that a more productive perspective is to focus on a more fundamental skill needed by professional educators — the skill of reflection.

Every educator has been involved in ‘reflection’ exercises, from college assignments to responding to the annual evaluation. While one would think that teachers are therefore skilled at reflecting on their teaching and their students’ behavior and learning, there has been a significant missing link. The focus of reflections has, to date, been nearly always on something either abstract, outside the control or influence of the teacher, or in response to a judgment or opinion of someone else.

For example, reflecting on the drop in 5th grade reading scores can only be done as an abstract exercise as the exact causes are not known. The number of variables affecting a change in scores is extensive, and the teacher does not have the data/information needed to ‘reflect’, let alone explain or develop an effective plan of action. Similarly, asking a teacher to reflect on observation reports with a list of met/not met or observed/not observed items, or worse yet, a low ranking on a Likert scale, will nearly always result in a defensive or deflective response, accompanied frequently by anger, resentment, and hostility. A judgment of one’s worth is always subject to suspicion of bias, and challenging that judgment is nearly always an adversarial exercise.

If reflecting, under current practices, is so difficult to accomplish in a meaningful way, how can staff development efforts increase that foundational skill? The answer lies in conducting objective, data-based observations on the behaviors of teacher and students. Providing the data to the teacher without judgment, praise, or criticism and asking a simple question, “Is this what you thought was happening in your classroom?”, will result in the beginning of reflection on the activities within the classroom, the teacher’s plans and goals, and other variables that would have an impact on the data. When a teacher is presented with the actual duration and/or frequency data rather than the observer’s subjective evaluation, there is a shift in the dynamic from defense and deflection to an empowering professional engagement with the results of the observation.

Most often when a person is engaged in teaching and classroom management, it’s not possible for them to see clearly the interaction between the lesson delivery, classroom materials, and student behaviors. When an observer gathers data on focused behaviors such as level of questions, teacher talk time, teacher response to misbehavior, etc., the teacher can become engaged in a non-defensive manner and move from pleasing the observer to objectively devising and testing research-based approaches to classroom activities.

Until recently, the process of gathering this type of data has been daunting, involving pencil, paper, and stopwatches, followed by time spent doing the calculations. An innovative program, ClassGather Software, has eliminated nearly all of the time-consuming mechanics and has enhanced both the data collection and reporting process. ClassGather includes 40 specific data collection tools, and runs on both Macintosh and Windows computers. An observer can easily gather data by operating the timer and counter floating tools, and produce reports on both individual observations and/or behavior over time. Tools for tracking additional behaviors can be collaboratively developed using the tool creation templates.

When the data is shared in a non-judgmental manner, the teacher has a sound basis for reflection. Working through determining the meaning of the data and the possible need for change is an invigorating professional discussion. Following this process with tracking the implementation of a plan of action and the outcome in student behaviors through additional data collection further enriches the reflection process. The result is self-directed professional growth by the teacher and a collaborative relationship between the observer and teacher.


Plan of Assistance and Data-Based Observations


By Dr. John L. Tenny, developer of the Data-Based Observation Method and the ClassGather Classroom Observation Software, January, 2010

While at the National Association of Elementary School Principals’ convention, I spent some time talking to Steve C., a retired principal who now contracts with districts to work with teachers on a plan of assistance. The teachers he works with are typically having some serious difficulties and are at the intervention stage of help. He has worked with teachers for many years, and talked about how using ClassGather Software and data-based observations has changed the entire playing field.

In many/most instances across the country, the plan of assistance is a combined process of making what is expected of the teacher very clear and specific, while gathering anecdotal notes to support a decision to fire/non-renew the teacher. The teacher is in a last-ditch effort to demonstrate a satisfactory level of teaching or classroom management with the only thing different being the administrator in the classroom taking notes. This must rank among the most stressful conditions under which anyone could ever work.

When I think about these teachers, I put them into three categories. The first is the teacher who is in a truly overwhelming situation with out-of-control students, lack of materials, lack of support, etc. This is the kind of environment where all but the very, very best would struggle and fail. Putting the teacher on a plan of assistance that is focused on the teacher changing is unfair, but commonly done. Data-based observations in this case can be useful if data is gathered not only on teacher behavior but also on student behavior, outside interruptions, and other systemic influences with an honest effort to determine what is going awry and what are the causes of the problems. It may be that the teacher does need additional skills, along with a change in the classroom conditions that are not within the power of the teacher to enact. The data will help determine the difference.

The second category is the teacher who is having difficulty with a reasonably normal class of students, but cannot get a handle on what to do about it. These are often new teachers or teachers inexperienced with the particular group of students. They generally are making consistent but ineffectual efforts to do a good job and are frequently a contributing factor in the non-productive classroom. This potentially good teacher has been buried under the problems, and is at the burnout, give-up, and quit stage. However, these are teachers who are worth the effort to support, and data-based observations can really help.

By providing non-judgmental, objective data on the teacher’s actions and the students’ responses to those actions, the teacher, with guidance and support, can determine the cause and effects related to the issues, and design changes. The effectiveness of those changes can be tracked and the data will determine the value of the outcomes. Providing the data in a supportive atmosphere can empower the teacher to see ways to make changes and determine the efficacy of those changes. At the same time that the specific issues are being attended to, the teacher is building a life-long skill of reflection and growth.

The third group is those teachers who, sad to say, do not have the skills, knowledge, or capacity to change their behaviors. It can be a lack of interest, a lack of effort, or a basic lack of the personality and skills of a teacher. In any case, these people, nice though they may be, should not be functioning as teachers, and need to be counseled and/or forced out of the classroom.

The very best outcome of this plan of assistance is for the teacher to take a careful, reflective look at their own skills and values, and conclude that they are better suited for another career path. However, anyone approached by an evaluator who has the power to fire him or her, and who is in the room taking notes and making judgments, will feel the stress level rise and engage every defense mechanism available. That can be shifting the blame, bringing in an attorney, building support among the staff and community regarding this ‘unfair’ treatment, etc. In the end, this is a lose-lose situation for everyone, including the kids.

By approaching the issue through data-based observations, some significant things change. An objective picture of the actions in the classroom is presented to the teacher, which should be coupled with a clear description of the requirements for satisfactory performance. Subsequent efforts on the part of the teacher, if truly inept, will not show significant changes, and this non-judgmental picture is often the key to a person’s levelheaded decision to change professions – that best-of-all-worlds decision.

Nevertheless, there will be conditions where the individual will continue to deflect responsibility that is appropriately theirs, and in spite of no data to show satisfactory performance, will continue to resist leaving the classroom. I feel for these folks, as it is often a move that will cause embarrassment and economic hardships — but for the sake of the children, it is necessary to remove this person from the classroom. If the process has included clear statements of expectations, and an ongoing record of objective data collection of relevant behaviors showing that they do not meet the expectations, the decision to remove that person is much, much more defensible. Judgments about a person’s ability can easily be subject to bias; you will strengthen the entire structure when you add objective data collection.

Steve’s experience with the people he has worked with on plans of assistance is that approximately 60% decide to change occupations or are removed by the district. However, a full 40% turn from struggling teachers to competent professionals. And as it is with teachers – when you can step in and help a struggling individual overcome the obstacles, you get to experience the joy that’s the special gift to educators – the joy of helping another human rise to their potential.


Data-Based Observations and the Teaching Profession


By Dr. John L. Tenny, developer of the Data-Based Observation Method and the ClassGather Classroom Observation Software, January, 2010

When I first wrote the ClassGather Software I was focused on giving helpful feedback to student teachers. From years of working with student teachers and new teachers I knew that they needed help thinking through the problems that came up in their classrooms. Providing them with ‘my’ answers and ideas was of much less benefit than getting them to think through things and devise their own solutions.

I also knew, again from personal experience working with them, that giving them data (pencil and paper before ClassGather) helped them honestly reflect on their own actions and outcomes, and greatly diminished the fear factor that came with the ‘evaluator’ role of a supervisor.

When I first started working with administrators and ClassGather Software I was totally focused on changing their role from judge to support and staff development. I preached hard that working collaboratively would have great effects and would/could create a staff of self-directed professionals. I still strongly believe that, and have enough feedback to feel confirmed.

However, a recent conversation with an ex-student, now an administrator, has added to my perspective. He likes ClassGather and would love to use it except that his district has a 20-page (gulp!) evaluation system that he needs to complete while observing – so he does not have the time to work with teachers. We agree that it is a waste of time, and corrupts the opportunity for collaborative professionalism.

As I thought about his situation and the hours of development time that went into the creation and adoption of that ‘evaluation guide’, I realized that my approach to observation as staff development had ignored the reality of the required and necessary role of administrator as evaluator. The guide that he’s stuck with seems to me to be the main flaw in the process, and what I believe is wrong with it (and the thousands like it in use across the country) is that it asks the observer to make a series of poorly defined judgments based on a vaguely defined set of ‘standards’. It is an impossible task and is functionally a terrible and ineffectual burden on both administrators and teachers.

When I thought about how a standards-based system might be improved, I came to this conclusion: Standards should be based on research; the implementation of the standard should be in some way observable, if not directly then by keystone indicators; the criteria for an acceptable level of performance should be concrete and collaboratively determined. I say collaboratively since I believe administrators, teachers, parents, and the public all have value to add to the process of educating our youth. Setting those criteria levels in terms of observable behavior data should, again, be based on research, and confirmed by localized action research efforts. That is not as difficult as it sounds when the systematic process already includes data collection.

For the last couple of years, whenever I presented ClassGather I made a big point of saying that I was against set data targets for all teachers, that the context played such a big part in it all that only the teacher could interpret the data. I think now that I was wrong about that, partially at least. A simple example might be wait time – the time between a question and calling on a student for an answer. There is extensive research that shows a wait time of 3 seconds has consistent positive benefits. While I am sure it is not the exact time of 3 seconds that is critical, the researched recommendation is a useful concrete measure. If a teacher waits less than one second (the figure reported in the research on new teachers), the children are robbed of the opportunity to think, and that is not OK.

An important facet of the process I am proposing has to do with how the data is presented and used. My experience has been that the first approach to a teacher should be “Is this what you thought was happening?” This question, honestly asked, will empower the teacher and engage him or her in the process of reflection, interpretation, and problem solving. During the ensuing professional level discussion, the criterion for the acceptable level, a 3 second wait period, should be included, and that is the measure to be used in the final evaluation. For, in the end, a judgment does have to be made, but it should be based not on the observer’s opinion or value system but on set, measurable criteria that are established and confirmed by sound research.
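
A wait time record of this kind could be summarized as in the following Python sketch. The 3 second criterion is the figure cited above; the event format (the second at which a question ended and the second at which a student was called on), the function names, and the example timings are assumptions made for illustration only.

```python
from statistics import mean

# Criterion taken from the research recommendation cited in the text.
WAIT_TIME_CRITERION_S = 3.0


def wait_times(events: list[tuple[float, float]]) -> list[float]:
    """Seconds between the end of each question and calling on a student."""
    return [called_at - asked_at for asked_at, called_at in events]


def summarize(events: list[tuple[float, float]],
              criterion_s: float = WAIT_TIME_CRITERION_S) -> dict:
    """Summarize one observation without judging it: counts and averages."""
    waits = wait_times(events)
    if not waits:
        raise ValueError("no questions recorded")
    met = sum(1 for w in waits if w >= criterion_s)
    return {
        "questions": len(waits),
        "mean_wait_s": mean(waits),
        "percent_meeting_criterion": 100.0 * met / len(waits),
    }


# Hypothetical observation: (question ended, student called on), in seconds
# from the start of the lesson.
example_events = [(10.0, 10.5), (45.0, 48.5), (90.0, 91.0), (130.0, 134.0)]

print(summarize(example_events))
# {'questions': 4, 'mean_wait_s': 2.25, 'percent_meeting_criterion': 50.0}
```

A summary like this is what the teacher would see first, alongside the question “Is this what you thought was happening?”; the comparison with the criterion comes later in the discussion.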

A more complex example – class learning time. The standard ‘students should be engaged in learning’ is included in most standards systems. There is credible research that indicates that the more time a student is engaged in learning activities, the greater will be the learning. While the research does not propose a specific percent of learning time as a recommended criterion, I believe we as a profession can at least identify the ranges for unsatisfactory, satisfactory, and exceptional. I think we would all agree that if a class period had only 25% of the time organized for teaching and/or student engagement in learning activities, it would be absolutely unsatisfactory. But is that number 35%? 45%? 60%? What educator would be comfortable with a class where 40% of the time lacked any opportunity for students to learn? I do not know what the right number is, but I am confident that it is possible to come to a consensus over a minimum level. Class Learning Time is a good example of a keystone data set – something that underlies the basic concept in the standard ‘engaged students’.

Then my personal experience as a teacher comes into focus, and the objection “How can you evaluate me on something I don’t have full control over?” pops up. I remember my lesson plans not working out when the principal took 10 minutes with a PA announcement and there were four interruptions from people with important messages, or requests for information or students. How could it be fair to be concerned about my 50% learning time when there were all these outside influences?

That would be a valid concern where the evaluation system is based on the observer’s perception and judgment, but less so when based on data collection. It is an easy task to set up the data collection to identify the non-learning time by subcategories – time under the teacher’s control and time when an outside event took the control away from the teacher. The time under the teacher’s control should meet the criteria for acceptable performance; the total time should be examined for needed systematic changes to provide the teacher with the full allotment of teaching/learning time. Basing the inspection of school functioning on observable behavior data will reveal many possible solutions for problems currently included in the observer’s impression of teaching effectiveness.
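
The split between time under the teacher’s control and time taken by outside events can be sketched in Python as follows. The category names, the function names, and the example minutes are hypothetical; the example loosely mirrors the 50% learning time and the 10 minute PA announcement described above, with the length of the other interruptions assumed.

```python
from collections import defaultdict

# Hypothetical category labels for timed segments of a class period.
LEARNING = "learning"
TEACHER_NON_LEARNING = "non-learning (teacher control)"
OUTSIDE_EVENT = "non-learning (outside event)"
CATEGORIES = {LEARNING, TEACHER_NON_LEARNING, OUTSIDE_EVENT}


def minutes_by_category(segments: list[tuple[str, float]]) -> dict:
    """Total the timed segments, given as (category, minutes) pairs."""
    totals = defaultdict(float)
    for category, minutes in segments:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        totals[category] += minutes
    return dict(totals)


def learning_percentages(segments: list[tuple[str, float]]) -> dict:
    """Learning time as a percent of the whole period and of the time
    that was actually under the teacher's control."""
    totals = minutes_by_category(segments)
    learning = totals.get(LEARNING, 0.0)
    teacher_lost = totals.get(TEACHER_NON_LEARNING, 0.0)
    outside = totals.get(OUTSIDE_EVENT, 0.0)
    controlled = learning + teacher_lost
    total = controlled + outside
    if controlled == 0:
        raise ValueError("no time under the teacher's control was recorded")
    return {
        "of_total_period": 100.0 * learning / total,
        "of_teacher_controlled_time": 100.0 * learning / controlled,
    }


# Hypothetical 50-minute period with a PA announcement and interruptions.
example_segments = [
    (LEARNING, 15.0),
    (OUTSIDE_EVENT, 10.0),        # PA announcement
    (LEARNING, 10.0),
    (TEACHER_NON_LEARNING, 5.0),  # e.g. slow transition between activities
    (OUTSIDE_EVENT, 10.0),        # assumed total for the four interruptions
]

result = learning_percentages(example_segments)
print(round(result["of_total_period"], 1))             # 50.0
print(round(result["of_teacher_controlled_time"], 1))  # 83.3
```

The first figure points at systemic changes the building needs to make; the second is the one that would be compared with the criterion for the teacher’s performance.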

It is reasonable to be suspicious of data collected and used as an external weapon, and for that reason I believe it to be critical that the identification of the keystone research and indicators, and the setting of the target level be a collaborative process. Add to that the realization that good research continues to give us new knowledge about teaching and learning, and with that, the process should be in a constant state of discussion and revision. That is my vision of how a profession works – critical self-examination and improvement.

So now, my thinking has come to a point where I believe (tentatively, at least) that we have sufficient research to develop standards, or to better focus the standards we do have. We can identify keystone indicators for those standards, and we can use our collective wisdom to determine concrete levels for acceptability in those keystone indicators. We can train observers to accurately observe and gather data. That data can be used to both further the teacher’s self-directed professional growth and to ensure that the levels of effective performance as indicated by sound research are met.