Ratios, Schedules -- Why and When
One of the most common questions and discussion subjects on our e-mail in the past 60 days or so has been the problem of ratios -- schedules of reinforcement, variable ratios (VR), variable schedules of reinforcement (VSR) -- vs. continuous reinforcement (CRF). This subject has been a recurring problem for as long as we have been receiving and sending e-mail, and farther back than that, even to the "good old" days when we were training animals and educating our own trainers at Animal Behavior Enterprises. So let's try to make clear, first, what we are all talking about, and second, what our own (the Baileys') philosophy, practices, and advice about schedules of reinforcement are.
By the way, it seems to me that most of the correspondents are using VSR and VR interchangeably, both meaning VARIABLE RATIO. This little article is a distillation of some recent e-mail discussions, and the correspondence we are referring to has to do with SCHEDULES OF REINFORCEMENT. Simply put, a schedule of reinforcement is ANY plan or system for presenting a reinforcer for a given response, either according to ANY time interval, such as reinforcing a response every two minutes (an INTERVAL schedule), or according to ANY position of a response in a series: reinforcing every second response, a "two-fer," expressed as FR 2:1 (a FIXED RATIO of two responses for each reinforcer); every third response, a "three-fer," or FR 3:1; every tenth response (FR 10:1); every hundredth (FR 100:1); and so on. If you VARY the number of responses required from one reinforcer to the next, you have created a VARIABLE RATIO, the schedule most commonly used in training almost any behavior, and one of the most useful. (Varying the time instead gives a VARIABLE INTERVAL schedule.)
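To make the counting concrete, here is a minimal Python sketch of the two ratio schedules just described. The class names `FixedRatio` and `VariableRatio` and the `respond()` method are hypothetical, made up for illustration: each call to `respond()` stands for one correct response, and a return value of True stands for the click and treat. CRF is simply `FixedRatio(1)`. The variable ratio here draws each new requirement uniformly from a range, which is only one of many possible ways to vary a ratio.

```python
import random
from typing import Optional


class FixedRatio:
    """FR n: every n-th correct response earns the reinforcer (FR 1 is CRF)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("a ratio needs at least one response per reinforcer")
        self.n = n
        self.count = 0  # correct responses since the last reinforcer

    def respond(self) -> bool:
        """Record one correct response; return True if it earns the click."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """VR: the number of responses required is redrawn after every reinforcer."""

    def __init__(self, low: int, high: int, rng: Optional[random.Random] = None) -> None:
        if not 1 <= low <= high:
            raise ValueError("need 1 <= low <= high")
        self.low, self.high = low, high
        self.rng = rng or random.Random()
        self.count = 0
        self.required = self.rng.randint(low, high)

    def respond(self) -> bool:
        """Record one correct response; return True if it earns the click."""
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            # The next requirement is unpredictable to the animal.
            self.required = self.rng.randint(self.low, self.high)
            return True
        return False


two_fer = FixedRatio(2)
print([two_fer.respond() for _ in range(6)])  # [False, True, False, True, False, True]
```

A `VariableRatio(3, 7)` might click after five responses, then three, then seven. Note that the sketch only counts responses; it says nothing about whether each one met your criteria, which is exactly the problem with early ratios discussed in this article.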
The simplest schedule, and the one all trainers should begin with when training ANY response, is reinforcing EVERY desired response. This is a ratio of 1 response for 1 reinforcer, 1:1, or CONTINUOUS REINFORCEMENT, abbreviated CRF (to prevent confusion with CR, which is an abbreviation for CONDITIONED RESPONSE, or CONDITIONED REFLEX).
Any schedule other than CRF calls for what we call DIFFERENTIAL REINFORCEMENT, reinforcing some responses and not others. DIFFERENTIAL REINFORCEMENT also is used in forming DISCRIMINATIONS, as in scent discriminations, where the trainer reinforces the response to the scented article and not to the others; it is also a part of SHAPING, where a trainer reinforces the responses that meet his or her criteria -- that is, the response is straight enough, fast enough, properly executed in every way -- and extinguishes the responses that do not meet the criteria.
There is also a third group of schedules, those that specifically involve time. They are used less frequently but are useful in their place. One is what we call a FIXED or VARIABLE DURATION schedule, in which the trainer asks for a response to hold or continue for a certain period of time: for example, asking a dog to hold a point, or a prone position ("stay"), for 30 seconds. There are also FIXED and VARIABLE INTERVAL schedules (FI and VI).
We will not say much about time schedules; they can be tough to implement. Introducing time as a variable gives the animal an opportunity to do things OTHER than what you want it to do, yet still respond according to specifications in time. Suppose you have put the dog on an FI 5-min schedule for jumping up to a spot on the wall. On this schedule you reinforce the first correct response after the five-minute interval is up. Now, just think of all the mayhem the dog can create in those five minutes! When the interval is over, the dog must still jump up correctly to get its reinforcement, but it may have made many other responses along the way, all of which will gain SOME strength from that reinforcement for the jump. Well, enough of interval schedules. They have little place in most training programs.
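A small Python sketch shows why an interval schedule leaves room for that mayhem. The function `fixed_interval` and the event format (seconds since the session began, behavior name) are hypothetical, and the sketch assumes the interval is timed from the start of the session or from the previous reinforcer. It returns each reinforcer together with everything the dog did since the last one, all of which came right before the click.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (seconds since the session began, behavior)


def fixed_interval(events: List[Event], interval: float,
                   target: str) -> List[Tuple[float, List[str]]]:
    """FI schedule: reinforce the first `target` response made at least `interval`
    seconds after the previous reinforcer (or after the session began).

    Returns (time of reinforcer, behaviors emitted since the previous reinforcer)."""
    reinforcers: List[Tuple[float, List[str]]] = []
    last = 0.0
    since_last: List[str] = []
    for t, behavior in sorted(events):
        since_last.append(behavior)
        if behavior == target and t - last >= interval:
            reinforcers.append((t, since_last))
            last = t
            since_last = []
    return reinforcers


# FI 5 min (300 s) for jumping up to the spot on the wall.
session = [(30, "bark"), (60, "jump"), (120, "chew rug"),
           (250, "dig"), (310, "jump"), (400, "jump")]
for t, preceding in fixed_interval(session, 300, "jump"):
    print(t, preceding)  # 310 ['bark', 'jump', 'chew rug', 'dig', 'jump']
```

The jump at 60 s comes too early and earns nothing; the jump at 400 s comes too soon after the reinforcer at 310 s. The one reinforcer that is delivered follows barking, rug chewing, and digging as well as jumping.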
The list of schedules can go on and on. For instance, you can also use DIFFERENTIAL REINFORCEMENT OF FAST behavior, abbreviated DRF, in which the trainer reinforces only the responses that are rapidly executed. You would be right in thinking that a schedule like this is really a SHAPING schedule. Scientists have invested entire careers in exploring schedules and their effects on learning and behavior.
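As a minimal sketch, DRF reduces to a speed criterion applied to each response. The function name `drf`, the use of latency (seconds from cue to completed response) as the measure of speed, and the numbers below are all illustrative assumptions.

```python
from typing import List


def drf(latencies: List[float], max_latency: float) -> List[bool]:
    """Differential reinforcement of fast behavior: click a response only if it was
    completed within `max_latency` seconds of the cue; slower ones go unreinforced."""
    return [latency <= max_latency for latency in latencies]


# Made-up latencies, in seconds, for five sits.
sits = [1.8, 0.9, 1.2, 2.5, 0.7]
print(drf(sits, max_latency=1.0))  # [False, True, False, False, True]
```

Lowering `max_latency` step by step as the dog gets faster would turn this into shaping in the sense described earlier: reinforcing the responses that meet the current criterion and extinguishing the rest.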
Let us begin by clearly stating our own philosophy about continuous vs. ratio schedules. It can be put quite simply: IF YOU DO NOT NEED A RATIO, DO NOT USE A RATIO. In other words, stick to continuous reinforcement unless there is a good reason to go to a ratio.
We think most of you will accept that we have been involved in shaping a LOT of behavior. Much of that behavior we obtained, and used, without resorting to ratios. Many of our animals worked for a decade or more ON CONTINUOUS REINFORCEMENT, and we benefited from not losing time establishing a ratio when it was not necessary. We recommend that you consider giving it a try.
Well, what are the relative advantages of continuous reinforcement (CRF), ratio (FR or VR), and interval (FI or VI) schedules? Why and when would we use them? What are the advantages of CRF? When and why should we reinforce every response of a certain type, say, a proper SIT? First of all, the only way you can be sure that each response will be "proper," that is, that it meets your criteria, is to reinforce EACH AND EVERY RESPONSE that is correct according to those criteria. If you do NOT reinforce each correct response, and instead start with a ratio, even a "two-fer," you are apt to let less-than-perfect responses acquire strength from the reinforcer that follows the second response.
Let's say you decide to try for two-fers. You tell the dog SIT; the first response is a bit sloppy, the second one is OK. You click and treat. What have you reinforced? A sloppy response, chained to a good response. The sloppy one automatically acquires some strength from the final reinforcer. Hence our rule No. 1: IF YOU DON'T NEED A RATIO, DON'T USE A RATIO. If you decide you NEED a ratio, then our rule No. 2 is: keep the response on CRF until it is just what you want, on cue, with a good fast reaction (low latency), and until you have given MANY (not just five or ten) reinforcers for the perfect sit, dozens and dozens of times. DO NOT BE IN A HURRY TO GO TO A RATIO. You should also "proof" the behavior in many different circumstances (different locations, different audiences, many distracting conditions), ALL on CRF. Then you can say your SIT is as good as you want it to be: the dog knows when to do it and how fast to do it. The behavior is now strong and reliable.
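The two-fer example can be sketched in a few lines of Python. The function `reinforced_chunks` is hypothetical and, as in the example, it assumes the trainer counts every sit toward the two-fer, sloppy or not; each returned chunk is the run of responses that the click follows.

```python
from typing import List


def reinforced_chunks(responses: List[str], ratio: int) -> List[List[str]]:
    """Group responses the way an FR `ratio` schedule pays them off: every
    `ratio`-th response is clicked, and the responses since the previous
    click are chained to it. Leftover responses at the end earn nothing."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for response in responses:
        current.append(response)
        if len(current) == ratio:
            chunks.append(current)
            current = []
    return chunks


sits = ["sloppy", "good", "good", "good", "sloppy", "good"]

# Two-fer: three clicks, two of which follow a chain containing a sloppy sit.
two_fer = reinforced_chunks(sits, 2)
print(sum("sloppy" in chunk for chunk in two_fer))  # 2

# CRF on criterion: click each good sit and never a sloppy one.
crf_clicks = [sit for sit in sits if sit == "good"]
print(len(crf_clicks))  # 4
```

Under CRF the same six sits produce four clicks, none of them following a sloppy sit.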
In what follows about "two-fers" and other ratios, we do NOT wish to appear to be downgrading the advice of experienced trainers, and we certainly do not want our comments to be taken personally. There are, as we will note below, reasons this practice has made its way so easily into training advice and handbooks. We simply wish to give readers the benefit of our own experience, which runs as follows: in most situations where dogs are being trained as pets, there would almost never be a strong need for ratios. However, as far as we can tell from the advice given to newbies, ratios have become de rigueur in training.
As nearly as we can tell, the "ritual" of the two-fer is widespread. It has found its way into the practices and the literature of many good trainers, because it was believed to be a necessary step for building resistance to extinction and rapid performance. For example, in a recent Clicker Journal, a highly respected trainer recommends getting the behavior and then, before moving to new locations and other fluency-building exercises, starting with TWO-FERS! In videotapes, in recent manuals, almost everywhere: TWO-FERS! The early use of ratios verges on dogma. There may occasionally be a need to give such blanket advice to clicker NEOPHYTES, who might be prone not to reinforce a behavior enough times to make it strong. However, if you are an experienced trainer, LOOK CAREFULLY AT WHAT YOU ARE DOING, and weigh the disadvantages: lost precision and lost time. As always, the choice is yours. Just be sure that you know you have a choice.
Now, when SHOULD we use a ratio schedule? Remember our RULE NUMBER ONE -- RATIOS ONLY WHEN NECESSARY. Once you have decided that you need a ratio, then the answer is
There is no question that a variable ratio is the best one to use if you need, or want, a VERY persistent behavior without reinforcement. Just look at the number of times a really "hooked" fisherman will cast out a bait without being reinforced. And, as one of our e-mail correspondents has noted, "according to Skinner, compulsive gambling occurs partly because people become hooked by the variable ratio. The very next response may pay off regardless of how long it has been since the last response paid off, so the gambler keeps responding."
Quite true. But how many times in your life with your dog do you run into the conditions of a), b), c) or d)? You probably do want reasonable resistance to extinction, and certainly a reasonable rate of response. And, indeed, "this schedule (a variable ratio) provides greater incentive to resume responding right after receiving a reinforcer than does the fixed-ratio schedule."
Probably one of the best examples of when to use a VSR is the case of Ham, the chimpanzee astronaut trained by Joe Brady's group for NASA. Ham was sent into space in the early 1960s, before the Mercury astronauts. Ham was taught to make discriminations and complex responses to certain stimuli, such as flashing lights and special sounds. There was concern that the food dispensing equipment might not work too well in weightless space. For that reason, and for other good reasons, it was decided to build up Ham's responses such that he could work the entire mission ON EXTINCTION. Ham's responses were built up to THOUSANDS of responses per reinforcement. One time Ham might be reinforced after a hundred responses, the next time it might be a thousand. Now that, my friends, is a RATIO!
If you are preparing to blast your doggie into space, and you want to make sure that it keeps on working, VSR is definitely the way to go.
We used a VDS (Variable Duration Schedule) with our automated dancing chicken unit. When a person dropped a quarter (a nickel in the early '50s) into a coin box, a door opened and released the chicken into the performing area. The chicken walked over to a simulated jukebox and pulled a loop, which started the music playing, and then stepped onto a platform. In the center of the platform was a photocell. When the chicken broke the light beam hitting the photocell, it started a timing mechanism (we used a dipper circuit that charged a capacitor, for the electronically literate). Now, because a chicken is a chicken, it had to do something other than stand still, and most chickens scratched, which looked like a dance. As it scratched about, it moved into and out of the light beam in a rather unpredictable fashion. This varied the amount of time before the equipment said "enough" and fired the electric feeder. In addition, just in case, we placed a device in the circuitry (a variable tap on the capacitor, for you electronic types) that more or less randomly changed the criterion for firing the feeder. So we had two ways of varying the duration: one depended on the behavior of the chicken and one was independent of it. The upshot of this system was a chicken that danced for anywhere from 8 to 22 seconds.
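A rough Python model of the dancing-chicken timer may help show how the two sources of variation combine. Everything in it is an assumption, not a description of the actual circuit: the function name `dance_duration`, the half-second tick, the chance that the chicken is in the beam on any tick, the rule that the timer advances only while the beam is broken, and the range for the randomly set firing criterion. The numbers are not tuned to reproduce the 8 to 22 seconds the real unit produced.

```python
import random
from typing import List, Tuple


def dance_duration(rng: random.Random, tick: float = 0.5, p_in_beam: float = 0.6,
                   criterion_range: Tuple[float, float] = (5.0, 12.0)) -> float:
    """Hypothetical model of a variable duration schedule (VDS).

    Chicken-independent variation: the firing criterion (the 'variable tap')
    is redrawn at random for every performance.
    Chicken-dependent variation: the timer (the 'capacitor') only advances on
    ticks when the scratching chicken happens to be breaking the beam.
    Returns the seconds spent on the platform before the feeder fires."""
    if not 0.0 < p_in_beam <= 1.0:
        raise ValueError("p_in_beam must be in (0, 1]")
    criterion = rng.uniform(*criterion_range)
    charge = 0.0
    elapsed = 0.0
    while charge < criterion:
        elapsed += tick
        if rng.random() < p_in_beam:  # chicken is in the beam this tick
            charge += tick
    return elapsed


rng = random.Random(1)
durations: List[float] = [dance_duration(rng) for _ in range(5)]
print(durations)  # seconds of dancing in each of five performances
```

Because time spent out of the beam adds to the dance without charging the timer, a performance can never be shorter than the criterion drawn for it, but it can run well past it.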
As you can see, when we say VARIABLE, we mean just that. Our piano-playing duck (and its variant, the PICKIN PEKIN guitar-playing duck) were based on VSR. As the duck played the keys up and down, "hot" keys triggered microswitches. In the old days we used stepping switches, and later solid-state decade counters, to keep track of how many keys had been struck. By various means (usually a ring counter, or a variant of one) we then more or less randomly selected the number of keys that had to be struck to fire the feeder. The duck ended up striking somewhere between 13 and 25 "hot" keys. What the patron heard was TWINKLE, TWINKLE, LITTLE STAR, because we also programmed the output into a recognizable tune. Some people actually thought the duck was playing the tune. No wonder they can sell so much oceanfront property in Arizona.
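Here is a similar hypothetical sketch of the piano-playing duck's logic, in which the count of "hot" key strikes, not the particular keys struck, drives both the tune and the feeder. The function `duck_performance`, the eight-key keyboard, the choice of hot keys, and the way the duck's strikes are simulated are all assumptions; only the 13-to-25 range and the tune come from the description above. The note names are the familiar opening of TWINKLE, TWINKLE, LITTLE STAR.

```python
import random
from typing import List, Sequence

# Opening phrase of "Twinkle, Twinkle, Little Star" as note names.
TUNE = ["C", "C", "G", "G", "A", "A", "G",
        "F", "F", "E", "E", "D", "D", "C"]


def duck_performance(rng: random.Random, keys: Sequence[str], hot_keys: Sequence[str],
                     low: int = 13, high: int = 25) -> List[str]:
    """Hypothetical VSR logic: the number of hot-key strikes needed to fire the
    feeder is drawn at random for each performance. Every hot-key strike closes a
    microswitch, advances the counter, and plays the next note of the tune,
    whichever hot key the duck actually hit. Returns the notes the patron hears;
    the feeder fires after the last one."""
    if not set(hot_keys) & set(keys):
        raise ValueError("at least one key must be hot")
    required = rng.randint(low, high)
    notes: List[str] = []
    while len(notes) < required:
        key = rng.choice(keys)       # the duck pecks up and down the keyboard
        if key in hot_keys:          # only hot keys are wired to microswitches
            notes.append(TUNE[len(notes) % len(TUNE)])
    return notes


keys = ["k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8"]
heard = duck_performance(random.Random(7), keys, hot_keys=["k2", "k4", "k5", "k7"])
print(len(heard), " ".join(heard))
```

Whatever the duck does with the cold keys, the patron always hears the melody in order, and the length of the performance varies from one quarter to the next.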
How important are schedules of reinforcement? Most of the time, especially in pet training, not terribly. That does not mean they are insignificant in animal training. We had a coin-operated unit (probably our most famous) called BIRD BRAIN, which played tic-tac-toe. A person had the opportunity to test his or her skill against the chicken (the chicken did get a little help). When we designed the control circuitry for BIRD BRAIN, we provided for reinforcement at the end of the game, meaning that the chicken would usually make three, four, or five moves before the feeder fired. We knew that the chicken's first move would ordinarily never be reinforced. We also knew from experience that a certain percentage of the birds (we guessed about 25 percent, or one bird in four) would have problems starting the game because THE FIRST PECK, OR MOVE, NEVER PAID OFF. In anticipation of this problem, we incorporated what we called a FEED FIRST CYCLE switch that reinforced the birds after the first, or starting, peck. Well, we were almost right: it was one bird in three, or 33 percent. Those afflicted birds would simply pace back and forth in front of the cage, approach the switch panel and lights, and then back away and pace some more. A bird might repeat this several times before finally giving the proper response. When we turned on the FEED FIRST CYCLE switch, that delaying behavior (which, of course, also delayed reinforcement) would suddenly disappear after a few starting pecks had been reinforced at the beginning of the performance. Sounds strange, doesn't it?
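The difference the FEED FIRST CYCLE switch made can be sketched as feeder logic in Python. The function `feeder_schedule` and its arguments are hypothetical; the sketch only models which of the chicken's moves in a game are followed by the feeder, not the game of tic-tac-toe itself.

```python
from typing import List


def feeder_schedule(moves_per_game: List[int], feed_first: bool) -> List[List[bool]]:
    """For each game, one flag per chicken move: did the feeder fire after it?
    Normally only the chicken's last move of the game pays off. With the
    FEED FIRST CYCLE switch on, the starting peck is reinforced as well."""
    games: List[List[bool]] = []
    for moves in moves_per_game:
        if moves < 1:
            raise ValueError("each game needs at least one move")
        fired = [False] * moves
        fired[-1] = True          # reinforcement at the end of the game
        if feed_first:
            fired[0] = True       # the first peck now pays off too
        games.append(fired)
    return games


games = [3, 4, 5]
print([g[0] for g in feeder_schedule(games, feed_first=False)])  # [False, False, False]
print([g[0] for g in feeder_schedule(games, feed_first=True)])   # [True, True, True]
```

Without the switch, the opening move of a three-to-five-move game is never followed by food, which is the condition that stalled one bird in three.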
When Skinner played our little game for the first time (at a scientific conference in the late '70s), he was intrigued by it, and very much impressed that the technology had come so far that we could PREDICT from the reinforcement schedule how certain birds would respond. We told him we knew it so well because we had to make a buck at it. He enjoyed the joke, but he understood that it was only partly in jest.
I have not discussed any of our free-environment work: seagulls, dolphins, dogs, cats, ravens, vultures, etc. Most of that work combines desensitization (the really tough part) with some rather exotic VDS and VSR schedules. Some of the seagull and dolphin excursions lasted hours, and their length varied: some trips might last only half an hour, while others went on much longer. Some of the dolphin excursions lasted the entire day, which meant only ONE trial.
As shown in PATIENT LIKE THE CHIPMUNKS, the animals did things once they got to the target area, but, in my opinion, the getting there was always the hard part. The animals rather quickly mastered most of the terminal maneuvers, even the tough ones. By the way, as difficult as the terminal behaviors were, they were almost always on a CRF schedule: continuous reinforcement, even though that reinforcement might be many minutes or even hours away.
I hope I have made our position on schedules of reinforcement clear. We use the simplest schedule that works. There are those who say a CRF schedule cannot build ANY strength. Well, our answer is that WE found it good enough for some excellent behavior over the years. Besides, in our contacts with both experienced and neophyte dog trainers, we found that most were in such a hurry that they seldom gave enough reinforcements on a CRF schedule to both sharpen and then strengthen behavior.
Some say they NEVER reinforce the same behavior more than a few times, and that's it. A direct quote from a forum post: "I have never asked for a behavior with no changes 20 times in a row, is there a point to doing that?" (Others in this journal have talked about fluency, so we won't go into that here.) Then, with a partially trained behavior, they go to a ratio of some kind (for the sake of this discussion, it is irrelevant whether it is VR, FR, or something else). Usually some reason is given: boredom, or the dog quits doing the behavior, or the like.
First, in our collective experience (and this is essentially 100 years), neither of us has encountered a bored dog, dolphin, gull, raven, elephant, aardvark, pangolin, lion, bear, squid, fish, or any other creature in our training programs. Next, we have had dogs performing the exact same behavior over 800 times in one day, and repeating that for more than a week. We ran similar tests with dozens of other kinds of animals. NONE OF THESE ANIMALS WERE ON A RATIO! Well, they were on a 1:1 ratio, if you want to split hairs. We did NOT find the behavior of these animals to be frail. The behaviors did not evaporate when an animal was asked to perform several trials with no reinforcement.
Were the behaviors as persistent as they would have been under a VR training program? No, of course not. But if the animal will perform the behavior very well 10 times without reinforcement, is that not sufficient for most tasks? How often do you need an animal to perform a behavior 100 or 1,000 times without reinforcement of any kind: food, social contact, the opportunity to perform another behavior, or whatever?
How did the myth of frail CRF behavior find its way into the fabric of dog training? There are many possibilities. Perhaps, in the last 10 to 15 years, prominent teachers of clicker training found that many trainers were working with behavior so weak that it fell apart under the least amount of stress, or whenever the trainer failed to maintain some reinforcement. The teachers may have, quite logically, solved that problem by concentrating on STRENGTH OF BEHAVIOR EARLY. They accepted some diminution in the power to shape, which is a consequence of training on a ratio. WE ARE NOT QUARRELING WITH SUCH A COMPROMISE. The teachers deserve the credit for introducing the man or woman on the street to the technology. We are just pointing out that many dog trainers are accepting on blind faith that TWO-FERS ARE THE AUTOMATIC AND ONLY WAY TO GO. THIS IS NOT TRUE. We do not want this myth woven into most dog trainers' understanding of operant technology.
Look at it this way: life is complicated enough without our making it more complicated. CRF is simpler than VSR. CRF works. We like simple.
Bob and Marian
List and Site Owner: Melissa Alexander, mca @ clickersolutions.com