Deming and Peters, and teacher evaluations

October 12, 2013

Before I was a teacher, I led a tough band of people at the Department of Education, and I plied corporate America (among other jobs).  I spent a couple of years in American Airlines’s corporate change project, facilitating leadership courses for more than 10,000 leaders in the company, as one of a team of about 20 inside consultants.  I had a fine time in management consulting with Ernst & Young LLP (now EY).

W. Edwards Deming, Wikipedia image

Back then “quality” was a watchword.  Tom Peters’s and Robert H. Waterman, Jr.’s book, In Search of Excellence, showed up in everybody’s briefcase.  If your company wasn’t working with Philip Crosby (Quality Is Free), you were working with Joseph Juran, or the master himself, W. Edwards Deming.  If your business was highly technical, you learned more mathematics and statistics than you’d ever hoped to need, so you could understand what Six Sigma meant and figure out how to get there.

Joseph Juran. Another exemplar of the mode of leadership that takes lawyers out of law, putting them to good work in fields not thought to be related.

For a few organizations, those were heady times.  Management and leadership research of the previous 50 years seemed finally to have valid applications that gave hope for a sea change in leadership in corporations and other organizations.  In graduate school I’d been fascinated and encouraged by the work of Chris Argyris and Douglas McGregor.  “Theory X and Theory Y” came alive for me (I’m much more a Theory Y person).

Deming’s 14 Points could be a harsh checklist, a harsh master to march to, but one with the promise of great results down the line.

A lot of the work to build high-quality, high-performance organizations depended on recruiting the best work from each individual.  Doing that — that is, leading people instead of bossing them around — was and is one of the toughest corners to turn.  Tough management isn’t always intuitive.

For the salient example here: Deming’s tough statistical work panics workers who fear they will be held accountable for minor errors not of their own doing.  In a traditional organization, errors get people fired.

Deming’s frequent point was that errors are not the worker’s doing, but are caused by managers, or by managerial failure to support the worker in producing quality work.  In any case, Deming came down hard against firing people as a path to quality.  One of his 14 Points is, “Drive out fear.”  In his seminars and speeches, he explained that point with, among other things, a drive to do away with annual performance reviews (wow, did that cause angst and cognitive dissonance at Ernst & Young!).  Performance reviews rarely touch on what a person needs to do to create quality, and the review session generally becomes a nit-picking exercise that leaves ratees angry, and less capable of and less willing to do quality work.  So Deming was against them as usually practiced.

Fast forward to today.

American schools are under fire — much of that fire unjustified, but that’s just one problem to be solved.  Evaluation of teachers is a big deal, because many people believe they can fire their way to good schools.  ‘Just fire the bad teachers, and the good ones will pull things out.’

Yes, that’s muddled thinking; and contrary to the No Child Left Behind Act’s own requirement that practices be research-based, there is no research to support the general idea, let alone specific applications.

Most often, education leaders are trained in pedagogy, not in management skills — especially not in people-leadership skills.  Teacher evaluations?  Oh, good lord, are they terrible.

Business adviser and healer, Tom Peters (from his website, photo by Allison Shirreffs)

In some search or other today I skimmed over to Tom Peters’s blog — and found this short essay, below.  Every school principal in America should take the three minutes required to read it — it will be a solid investment.

dispatches from the new world of work

Deming & Me

W. Edwards Deming, the quality guru-of-gurus, called the standard evaluation process the worst of management de-motivators. I don’t disagree. For some reason or other, I launched several tweets on the subject a couple of days ago. Here are a few of them:

  • Do football coaches or theater directors use a standard evaluation form to assess their players/actors? Stupid question, eh?
  • Does the CEO use a standard evaluation form for her VPs? If not, then why use one for front line employees?
  • Evaluating someone is a conversation/several conversations/a dialogue/ongoing, not filling out a form once every 6 months or year.
  • If you (boss/leader) are not exhausted after an evaluation conversation, then it wasn’t a serious conversation.
  • I am not keen on formal high-potential employee I.D. programs. As manager, I will treat all team members as potential “high potentials.”
  • Each of my eight “direct reports” has an utterly unique professional trajectory. How could a standardized evaluation form serve any useful purpose?
  • Standardized evaluation forms are as stupid for assessing the 10 baristas at a Starbucks shop as for assessing Starbucks’ 10 senior vice presidents.
  • Evaluation: No problem with a shared checklist to guide part of the conversation. But the “off list” discussion will by far be the most important element.
  • How do you “identify” “high potentials”? You don’t! They identify themselves—that’s the whole point.
  • “High potentials” will take care of themselves. The great productivity “secret” is improving the performance of the 60% in the middle of the distribution.

Tom Peters posted this on 10/09/13.

I doubt that any teacher in a public elementary or secondary school will recognize teacher evaluations in that piece.

And that, my friends, is just the tip of the problem iceberg.

An enormous chasm separates this nation’s school managers from good management theory, training and practice.  Walk into almost any meeting of school administrators, talk about Deming, Juran and Crosby, and you’re introducing a new topic (not oddly, Stephen Covey’s book, The 7 Habits of Highly Effective People, sits on the shelf of many principals — probably unread, but certainly unpracticed).

Texas is working to create one standardized evaluation form for every teacher in every grade, in every subject, in every school.  Do you see anything in Peters’s advice to recommend that?  In many systems, teachers may choose whether evaluators will make surprise visits to the classroom, or only scheduled visits.  In either case, visits are limited; generally fewer than a dozen are made to a teacher’s classroom in a year.  The forms get filled out every three months, or every six weeks.  Take each of Tom’s aphorisms; each one runs contrary to the way teacher evaluations usually work.

Principals, superintendents, you don’t have to take this as gospel.  It’s only great advice from a guy whom the greatest corporate leaders in the world pay tens of thousands of dollars to tell them the same thing.

It’s not like you want to create a high-performing organization in your school, is it?



Holding teachers accountable, in reality

June 5, 2013

Scott McLeod at Dangerously Irrelevant put together a video, with computer voices to protect the innocent (or rather, the naive, the genuinely ignorant, and the proudly stupid).

Teachers who watch this may cry as they watch America’s future slip away into the Tide of Mediocrity™ we were warned about, which NCLB mistook for high water.  Turn it up so you can hear the full sound effects.  That’s the level of mediocrity rising as the “official” fiddles.

W. Edwards Deming researched and wrote a lot about organization managers who don’t really have a clue what is going on in their organizations, and who lack tools to measure employee work, because they lack an understanding of just what the products are, what resources are required to make the desired product, and how the processes that produce those products actually work, or could work better.

That’s education, today.

Should teachers be “held accountable”?  Depends.  Effective organizations understand that accountability is the flip side of the coin of authority.  Anyone held accountable must have the authority to change the things that affect the product for which that person is held accountable.  Texas schools lose up to 45 days a year to testing — that may drop as the TAKS test is phased out, but it won’t drop enough.  Forty-five days is, effectively, 25% of a 180-day school year.  If time-on-task is important to education, as Checker Finn used to badger us at the Department of Education, then testing is sucking valuable resources from education, far above and beyond any benefits testing may offer.

Today, bills sit on Texas Governor Rick Perry’s desk that would greatly pare back unnecessary testing.  A coalition of businessmen (no women I can discern), operating under a deceptively named organization, urges Perry to veto the bills because, they claim, rigor in education can only be demonstrated by a tsunami of tests.

What’s that, you ask?  Where is the person concerned about the student?  She’s the woman with the leaky classroom, who is being shown the door.

Why is it those with authority to change things for the better in Texas schools, and many other school systems throughout the U.S., are not being held accountable? If they won’t use their authority to make things better, why not give that authority to the teachers?

Check out McLeod’s blog — good comments on his video there.

Fitzsimmons in the Arizona Daily Star


Test “priming”: Malcolm Gladwell on how to push test results, and why tests might not work

July 31, 2012

Who is the interviewer?  Allan Gregg?

From the YouTube site:

Malcolm Gladwell in an interview about Blink explains priming, and re-states some of the examples of priming from Blink with CC (closed captions)

Here’s a longer excerpt of the interview; from TVO (TVOntario)?

Discussion:  Gladwell appears to confirm, for testing results, the old aphorism attributed to Henry Ford:  “If you think you can, or if you think you can’t, you’re right.”  Gladwell seems to be saying that the student’s view of his or her abilities at the moment the test starts governs in a significant way how the student performs — worse, for teachers, it’s the student’s unconscious view of his or her abilities.  As a final shot in class, I have often had students predict their performance on state tests.  I have them write down what they think they will score.  Then I ask them to predict what they would have scored had they applied themselves seriously to the study of history — and of course, the students almost always have a fit of honesty and predict their scores would have been higher.  Then I ask them to pretend they had studied, and to cross out the lower predicted score and replace it with the higher one.  At the schools where I’ve taught, we do not administer the tests to our own students, and such exercises are prohibited on the day of the test.  Too bad, you think?

Another exercise I’ve found useful for boosting scores is to give the students one class period, just over an hour, to take the entire day-long TAKS social studies test, in the online version offered by the Texas Education Agency.  Originally I wanted students to get scared about what they didn’t know, and to get attuned to the questions they had no clue about, so they’d pick up that material in class.  What I discovered was that, in an hour, with the pressure clearly off (we weren’t taking it all that seriously, after all, allowing just an hour), students performed better than they expected.  So I ask them to pass judgment on how difficult the test is, and what they should be scoring — almost unanimously they say they find the test not too difficult on the whole, and definitely conquerable.

What else could we do with students, if we knew how to prime them for tests, or for writing papers, or for any other piece of performance on which they would be graded?

With one exception, my administrators in Dallas ISD have been wholly uninterested in such ideas, and such results — there is no checkbox on the teacher evaluation form for using online learning tools to advance test scores, and administrators do not regard that as teaching.  The one exception was Dorothy Gomez, our principal for two years, who had what I regarded as a bad habit of getting on the intercom almost every morning to cheer on students for learning what they would be tested on.  My post-test surveys of students showed those pep talks had been taken to heart, and we got much better performance out of lower-performing groups and entire classes during Gomez’s tenure (she has since left the district).

Also, if psychological tricks can significantly affect test scores, surely that undermines the idea that we can use any test score to evaluate teacher effectiveness, unless immediate testing results are all we want teachers to achieve.  Gladwell said in this clip:

To me that completely undermines this notion, this naive notion that many educators have that you can reduce someone’s intelligence to a score on a test.  You can’t.



Teacher ratings can’t tell good teachers from bad ones – back to the drawing board?

March 4, 2012

Corporate and business people who have lived through serious quality-improvement programs, especially those based on hard statistical analysis of procedures and products in a manufacturing plant, know the great truth drilled in by such gurus of statistical quality as W. Edwards Deming:  The fault, dear Brutus, is not in the teacher, but in the processes generally beyond the teacher’s control.

Here’s the shortest video I could find on Deming’s 14 Points for Management — see especially point 12, about eliminating annual “performance reviews,” because, as Dr. Deming frequently demonstrated, the problems that prevent outstanding success are problems of the system, and are beyond the control of frontline employees (teachers, in this case).  I offer it here only for the record, since it’s a rather dull presentation.  I find, however, especially among education administrators, that these well-established methods for creating champion performance in an organization are foreign to most Americans.  Santayana’s Ghost is constantly amazed at what we refuse to learn.

Wise words from the saviors of business did not give even a moment’s pause to those who think we could improve education if only we could root out those conniving bad teachers who block our children’s learning.  Since the early Bush administration and the passage of the nefarious, so-called No Child Left Behind Act, politicians have pushed for new measures to catch teachers “failing,” and so to thin the ranks of teachers.  Bill Gates, the great philanthropist, put millions of dollars into projects in Washington, D.C., Dallas, and other districts, to come up with a way to measure statistically who the good teachers are — the ones who “add value” to a kid’s education year over year.

It was a massive experiment, running in fits and starts for more than a decade.  We have the details from two of America’s most vaunted and haunted school districts, Washington, D.C., and New York City, plus Los Angeles and other sites, in projects funded by Bill Gates and others — and we can now pass judgment on the idea of identifying the bad-apple teachers and getting rid of them to improve education.

As an experiment, it failed.  After a decade and more of measuring teachers eight ways from Sunday, W. Edwards Deming was proved correct:  management cannot tell the bad actors from the good ones.

Most of the time the bad teachers this year were good teachers last year, and vice versa, according to the measures used.

Firing this year’s bad ones only means next year’s good teachers are gone from the scene.

Data have been published in a few places, generally over the complaints of teachers who don’t want to be labeled “failures” when they know better.  Curiously, some of the promoters of the scheme also came out against publication.

A statistician could tell why.  When graphed, the data points do not reveal good teachers who consistently add value to their students year after year, nor do they put the limelight on bad teachers who fail to achieve goals year after year.  Instead, they reveal that what we think is a good teacher this year, on the basis of test scores, may well have been a bad teacher on the same measures last year.  Worse, many of the “bad teachers” from previous years had scores that rocketed up.  The data don’t show any consistency beyond chance.

So the post over at the blog of G. F. Brandenburg really caught my eye.  His calculations, graphed, show that these performance-evaluation systems themselves do not perform as expected.  Here it is, “Now I understand why Bill Gates didn’t want the value-added data made public”:

It all makes sense now.

At first I was a bit surprised that Bill Gates and Michelle Rhee were opposed to publicizing the value-added data from New York City, Los Angeles, and other cities.

Could they be experiencing twinges of a bad conscience?

No way.

That’s not it. Nor do these educational Deformers think that value-added mysticism is nonsense. They think it’s wonderful and that teachers’ ability to retain their jobs and earn bonuses or warnings should largely depend on it.

The problem, for them, is that they don’t want the public to see for themselves that it’s a complete and utter crock. Nor to see the little man behind the curtain.

I present evidence of the fallacy of depending on “value-added” measurements in yet another graph — this time using what NYCPS says is the actual value-added scores of all of the many thousands of elementary school teachers for whom they have such value-added scores in the school years that ended in 2006 and in 2007.

I was afraid that by using the percentile ranks as I did in my previous post, I might have exaggerated or distorted how bad “value added” really was.

No worries, mate – it’s even more embarrassing for the educational deformers this way.

In any introductory statistics course, you learn that a graph like the one below is a textbook case of “no correlation”. I had Excel draw a line of best fit anyway, and calculate an r-squared correlation coefficient. Its value? 0.057 — once again, just about as close to zero correlation as you are ever going to find in the real world.

In plain English, what that means is that there is essentially no such thing as a teacher who is consistently wonderful (or awful) on this extremely complicated measurement scheme. How teacher X does one year in “value-added” in no way allows anybody to predict how teacher X will do the next year. They could do much worse, they could do much better, they could do about the same.

Even I find this to be an amazing revelation. What about you?

And to think that I’m not making any of this up. (unlike Michelle Rhee, who loves to invent statistics and “facts”.)
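If you want to see for yourself how an r-squared that low behaves, here is a minimal sketch in Python.  To be clear, this is my own toy simulation with made-up numbers, not Brandenburg’s code and not the NYC data: it assumes each teacher’s measured score mixes a modest persistent effect with a heavy dose of single-year noise, then asks how well year 1 predicts year 2.

```python
# Toy simulation (NOT Brandenburg's code or the NYC data): each
# teacher's "value-added" score is a modest persistent effect plus
# a large dose of single-year noise. All numbers are illustrative
# assumptions only.
import random

random.seed(42)
N = 10_000  # on the order of "many thousands" of teachers

true_effect = [random.gauss(0, 1) for _ in range(N)]
# The 0.56 weight makes the persistent effect ~24% of measured variance.
year1 = [0.56 * t + random.gauss(0, 1) for t in true_effect]
year2 = [0.56 * t + random.gauss(0, 1) for t in true_effect]

def r_squared(xs, ys):
    """Squared Pearson correlation between two lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov * cov / (vx * vy)

print(f"year-over-year r^2: {r_squared(year1, year2):.3f}")
# Prints a value near 0.06: even with a real, persistent teacher
# effect in the mix, one year's score barely predicts the next.
```

Run it and you get an r-squared near 0.06, right in the neighborhood Brandenburg reports, even though the simulation includes a real, persistent teacher effect.  That is Deming’s point in statistical dress: the year-to-year churn is a property of the measurement system, not of the teachers.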

You should also see his earlier posts, “Gary Rubenstein is right, no correlation on value-added scores in New York city,” and “Gary Rubenstein demonstrates that the NYC ‘value-added’ measurements are insane.”

In summary, many of our largest school systems have spent millions of dollars on tools to help them find the “bad teachers” to fire, and the tools not only do not work, but may lead to the firing of good teachers, cutting the legs out from under the campaign for better education.

It’s a scandal, really, or an unfolding series of scandals.  Just try to find someone reporting it that way.  Is anyone?
