An in-depth look at using the 4 quadrants of operant conditioning and how it applies to clicker training
Katie Bartlett, March 2009
This is the first in a series of articles on clicker training and the four quadrants of operant conditioning. It started out as one article specifically about how to use negative reinforcement as a clicker trainer, but then it grew so long that I had to divide it up. I decided to write this part first, to make sure everyone understood the four quadrants of operant conditioning. The next article in the series is on using negative reinforcement with clicker training, and the last article is on ways to shape behaviors using only positive reinforcement. Hopefully, the three articles together will give people information and new ideas to improve their training programs and develop their own training philosophy. I do want to state that I have no formal training in learning and behavior theory, but I am very interested, so I have been reading, watching relevant DVDs, attending seminars and doing some thinking on the subject. This is a long article. If you want to print it out and are having trouble, email me and I will send you a PDF or Word file.
I have tried to present the information on operant conditioning as factually as possible, but the rest is based on my own observations and experiences. I hope if anyone finds an error, they will let me know, and if anyone wants to contribute any thoughts based on their own experience that would be great too. One reason I wanted to write this article is to put down on paper where I am now in my development as a clicker trainer. I know I have changed over the past few years and I am constantly re-evaluating how I do things. So this article is a snapshot of my current thinking, and in a few years I might have to update it to show what has changed and where I think I am going. If nothing else, I hope this article makes you think a bit more carefully about what you are doing, and gives you some insight into how other people do things.
When I first started my website, I included a definition of clicker training and a small section on how to use clicker training with horses. These are in the sidebar menu on the main page and I have left them there (under what is clicker training?) so that people can get a quick idea of what clicker training for horses is all about. Since then I have learned more about the science behind clicker training and this has improved my training in many ways, including helping me to analyze and troubleshoot training issues. But this is a complicated subject and I realize it might be more information than some people are ready to absorb right now. So if this article makes your head spin, don't despair. It took me a few years to understand and remember the four quadrants and apply that knowledge to what I was doing on anything more than a superficial level. But that was ok. Every time I read about it, it made more sense and eventually I got to the point where not only could I understand it, but it seemed useful and relevant.
A lot of the information in this article comes from studying how clicker training is used in other species (dogs and zoo animals) and from animal behavior textbooks. I do want to acknowledge two people. Alexandra Kurland is the person who taught me about the many ways to use clicker training with horses and also helped me switch over to the "clicker mindset." I also have to mention Kathy Sdao because her lectures and DVDs have been a great resource about the science behind it all. I have also benefitted tremendously from attending Clicker Expo (a 3-day conference sponsored by Karen Pryor) and getting a chance to see a wide variety of training strategies that all fall under the umbrella of clicker training.
In a way, I have been working on this article for a long time because the first part of it is based on the information I present when I am asked to do an "introduction to clicker training" talk. One of the goals of these talks is to explain clicker training so that people see that all forms of animal training (including clicker training) are based on operant conditioning. A lot of people seem to think clicker training is some new, separate gimmick and they don't realize that it is part of a bigger picture that includes traditional training methods and clicker training. I also want to start exposing people to some of the terminology of operant conditioning. Understanding the terminology makes it possible to converse with other trainers and understand what types of training strategies they are using.
At these talks, I often have a mixed audience and there are people with many different reasons for wanting to learn more about clicker training. There are traditional horse people who just want to improve what they are doing and are looking for a way to increase the horse's motivation. There are traditional horse people who are fine with what they do, but just want to train some behaviors that are trained more easily through free shaping (tricks, liberty work, object discrimination, etc.). I also meet a lot of people who have been unsuccessful using traditional horsemanship with a difficult horse and they are looking for an alternate method. In addition, there might be some people who have clicker trained other species and now want to apply it to horses. With this type of mixed audience, I find it works best to just start at the beginning. For me, that means starting by defining clicker training and operant conditioning. Once everyone has a basic understanding of the science behind clicker training, we can look at more details of how clicker trainers actually train and maintain behaviors.
This article is long and packed with information and some mental meandering on my part. I did not want it to be dry, textbook reading so I have tried to add details that make understanding the science easier for horse people. And I have included some of my own thoughts and ideas with which you may or may not agree. That's ok. We are all going to take different approaches and my idea is to share what I know and what I am thinking about, in hopes it will generate some thinking and new ideas on the reader's part. To make it a bit easier to navigate around or find relevant sections, I have divided it up into sections. Here are links to each individual section. The links are provided so that you can find a certain topic after you have read the whole thing. They are not intended as "stand alone" discussions. The sections are:
What is Clicker Training?
The Link Between Positive Punishment and Negative Reinforcement
Does Negative Reinforcement create animals that just follow directions?
Does Negative Reinforcement become less effective over time?
Advantages to using negative reinforcement with horses
Translating Theory to Real life: Sources of confusion in determining what quadrant you are using
Remember it is what happens AFTER the behavior that matters
Is it positive or negative reinforcement?
Is it positive punishment or negative reinforcement?
Some Real life Training Examples
A cheat sheet for the Grid
A Few Last Words
What is Clicker Training?
"Clicker Training is an animal training method based on behavioral psychology that relies on marking desirable behavior and rewarding it."
This is Karen Pryor's definition of clicker training and I think it is a good place to start. If you are not familiar with Karen Pryor, visit her website at www.clickertraining.com. Karen has been (and continues to be) a leader in educating the public about clicker training, supporting research on clicker training, and finding new ways to apply clicker training to all kinds of training situations.
There are two parts to her definition. The first part states that clicker training is based on "behavioral psychology", and here she is referring to operant conditioning. The second part is that you mark and reinforce the behavior. Please note that she does not say how you get the behavior, just that you mark it and reinforce it. There are other definitions of clicker training out there, but I like Karen's because it is broad enough to cover many applications of clicker training.
But what is operant conditioning? The simplest definition I could find (with the least technical jargon) is, "Operant conditioning is the use of consequences to modify the occurrence and form of behavior." This means that whether or not you are likely to repeat a behavior is determined by what happens immediately after you do the behavior. If the consequence (what happens after) is something you like, you are more likely to repeat the behavior than if the consequence is something you do not like. Operant conditioning is about consequences affecting future behavior. This is important to remember.
There are four types of operant conditioning and they are usually presented as a grid, sometimes called the training grid. I like to show the grid when I introduce clicker training because I think it puts clicker training in context and provides a framework for understanding more about learning and behavior. I think it also helps new trainers (especially those with previous animal training experience) to recognize that they are already using operant conditioning in their training. They just did not know what it was called. I find that being able to relate something new to something you already do is often helpful.
I want to point out that operant conditioning is not something that psychologists invented: it is a way of describing what happens in real life. Most learning happens through operant conditioning. If I think of behaviors I have learned, many of them were learned because of the consequences of my behavior. If I touch a hot stove and I burn my hand, that behavior (touching a hot stove) decreases. The consequence (pain from the burn) changed my behavior (touching the stove.) The more closely the consequence follows the behavior, the more chance it will increase or decrease it. If I touch the stove and I don't notice I have burned myself until later, I might not realize where the burn came from and the behavior of touching the stove might not be affected.
Here is the Operant Conditioning Grid, showing the 4 quadrants, which are +R, -R, +P, and -P. Extinction is not included in the grid. In extinction, a previously reinforced behavior is no longer reinforced and the behavior disappears. Extinction can be a useful tool and it is worth learning about, but I am not going to cover it in this article. There is information available on extinction from other resources (internet, books, etc.).
REINFORCEMENT (increase in behavior) R
PUNISHMENT (decrease in behavior) P
POSITIVE (+) add something
positive reinforcement (+R)
addition of something increases the target behavior
example: horse stands on mat, I feed a carrot, horse is more likely to stand on mat again. Horse is also more likely to go to the mat on his own.
positive punishment (+P)
addition of something decreases a behavior
example: horse paws at mat, I yell at him, he stops pawing
NEGATIVE (-) take away something
negative reinforcement (-R)
removal of something increases the target behavior
example: I lead horse to mat with pressure on the line, I put slack in the line when the horse stands on the mat, horse is more likely to go to the mat. Does this make the horse more likely to go to the mat on his own? Not necessarily, but his leading to the mat might improve as that is the behavior being reinforced.
negative punishment (-P)
removal of something decreases an unwanted behavior
example: horse paws at mat, I leave (taking reinforcement with me), horse stops pawing
The positive/negative and reinforcement/punishment terminology comes from B.F. Skinner. It is confusing at first, but since it is the accepted terminology, I think it is important to understand the terms.
Negative and positive only refer to adding and removing something (think math). They have nothing to do with whether or not we are being nice. I can add something the animal wants, or something the animal doesn't want. I can remove something the animal wants, or something the animal doesn't want. In addition, I need to be aware that the effect of adding "something" is going to depend upon each specific situation. A wise trainer is constantly evaluating what is reinforcing to an animal and what is aversive and while she might start with some assumptions, being observant and flexible is important.
Here is a simple example. I like to eat chocolate and if you offer me a chocolate when I do something you like, I am likely to offer the same behavior again. But what if I have just had a lot of chocolate, or feel sick or for some reason the idea of eating chocolate is not appealing? If you offer me chocolate and I don't want it, then I am going to be less likely to offer the same behavior again. If I had previously eaten too much chocolate, the sight of it might be so nauseating that I might be unwilling to offer the same behavior again. So the same "something" can create different outcomes in different situations.
On the other hand, what if you paid me $100.00 every time I ate a lima bean (I hate lima beans)? Over time, I might learn to like lima beans and instead of viewing them with disgust, I would look eagerly for them. My reaction to lima beans has changed. Something that would previously have made me less likely to offer a behavior is now making me more likely to offer it. If you want to read more about how clicker trainers take advantage of this to change the value of reinforcers, or create reinforcers, look for information on "The Premack Principle." The point I am making here is that if I am adding something with the intention of increasing or decreasing behavior, I must remember that the subject is the one who determines what is reinforcing and what is punishing.
Reinforcement and punishment only refer to whether or not behavior increases or decreases. As with negative and positive, the terms have nothing to do with being nice or whether or not the behavior is "good" or "bad." They just indicate whether we see more or less of it. Operant conditioning is about consequences, so while we can use prompts, cues or some other antecedent before a behavior occurs, they do not count as adding something. It is what happens after the behavior occurs that is important.
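For readers who find it easier to think in code, the two questions behind the grid (was something added or removed, and did the behavior increase or decrease afterwards) can be sketched as a tiny lookup. This is only an illustration of the labels, not a training tool; the names `Change`, `Effect`, `quadrant`, and `MAT_EXAMPLES` are made up for this sketch, and the four entries are the mat examples from the grid above.

```python
from enum import Enum


class Change(Enum):
    """What happened right AFTER the behavior (hypothetical names)."""
    ADDED = "+"     # something was added (positive)
    REMOVED = "-"   # something was taken away (negative)


class Effect(Enum):
    """What the behavior does over time, as observed by the trainer."""
    INCREASES = "R"  # reinforcement
    DECREASES = "P"  # punishment


def quadrant(change, effect):
    """Combine the two observations into a quadrant label such as '+R'."""
    return change.value + effect.value


# The four mat examples from the grid, expressed as observations.
MAT_EXAMPLES = [
    ("horse stands on mat, I feed a carrot", Change.ADDED, Effect.INCREASES),
    ("horse paws at mat, I yell at him", Change.ADDED, Effect.DECREASES),
    ("horse stands on mat, I put slack in the line", Change.REMOVED, Effect.INCREASES),
    ("horse paws at mat, I leave with the reinforcement", Change.REMOVED, Effect.DECREASES),
]

for description, change, effect in MAT_EXAMPLES:
    print(f"{quadrant(change, effect)}: {description}")
```

Notice that the sketch needs the observed effect on the behavior as an input. The trainer's intention does not appear anywhere, which is the same point made above: the label depends on what actually happens to the behavior, and the animal decides that.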
I want to go through the training grid one quadrant at a time, explain each one, and then present a bit of background about its common use and how it relates to clicker training. I also want to point out that while the quadrants make it look as if I am working in one at a time, when I am actually training, it is more common to be using more than one quadrant, or at least moving from one to another. If one behavior is increasing or decreasing, it is also affecting other behaviors. I will explain more about that later. Finally, clicker trainers should be aware of classical conditioning, which is also at work when we are training animals. More information on classical conditioning is readily available from other resources.
Positive Reinforcement
This is the quadrant that most people associate with clicker training. Positive reinforcement means we reinforce (increase) behaviors by adding something. I think positive reinforcement is the easiest quadrant to describe and use. Most people understand the concept because we can find examples of it all over the place in everyday life. We are more likely to repeat behavior that is followed by positive consequences. If I eat something that tastes good, I am more likely to eat it again. If I open the door for someone who says "thank you" and smiles at me, I am more likely to open the door again. By definition, the act of clicking and reinforcing is positive reinforcement. If I click and deliver a reinforcer, the animal is going to repeat the clicked behavior so it can earn more reinforcement.
But using positive reinforcement is not always easy. One challenge with using positive reinforcement in training situations is that there may be practical limitations as to how quickly the trainer can deliver reinforcers for desired behavior. Another difficulty is that reinforcement that is very motivating can also be very distracting. When operant conditioning was first applied to animal training, behavior was captured and a reward (usually food) was delivered immediately. It had to be delivered before the animal could perform another behavior in order for the desired behavior to be reinforced. This in itself made it difficult to use outside the laboratory until Keller Breland discovered you could use a marker signal.
The use of a marker signal gives the trainer at least three advantages over positive reinforcement alone. There is more flexibility in how to train behaviors because the marker signal can be used to mark behaviors that could not be positively reinforced in a timely manner. It became possible to train a whole class of behaviors effectively once the correct behavior could be marked and the reinforcer delivered after a small delay. This class of behaviors includes training without any physical contact (at a distance, behind protective contact, etc.) and training behaviors that could not be trained without the precision of the marker signal. In addition, the marker signal provides greater precision in marking the exact behavior that is being reinforced, so behavior can be fine-tuned beyond what was possible before. The marker signal also solves the problem of the animal being distracted by the reinforcer. The animal learns reinforcement is not coming from the handler until it hears the marker signal.
In addition, I think the use of the marker signal changes the very nature of training with positive reinforcement because it makes the training process clearer to both the trainer and the animal being trained. Clicker training opens up many possibilities that do not exist if the trainer is using positive reinforcement alone because clicker training has more precision, power and flexibility. I should point out that the use of a "clicker" is not mandatory in clicker training, which is why I referred to a marker signal. Trainers use a variety of different markers to identify the "clickable moment," including, but not limited to, whistles, verbal markers, lights, arm gestures, and various kinds of clickers. I have met people who are training with positive reinforcement and don't identify themselves as clicker trainers, but they actually do use a marker signal; they just don't realize the significance of it.
A lot of early work on clicker training was done with lab animals and marine mammals. These were training situations where the trainer either had close control of the environment (as in the lab) or the trainer was working with the animal at liberty, or at least with no physical contact. Using positive reinforcement through clicker training allowed the trainer to shape behavior without physically manipulating the animal. Early work with dogs used the marine mammal approach and behaviors were shaped by clicking and rewarding each tiny behavior that could lead to the finished product. This is called shaping. Zoo animals are also trained this way.
One reason this works so well is that one aspect of operant conditioning is the ability for the learner to become "operant." The word "operant" used in this sense means "operating to produce effects." In real life, it means that the trainee can learn to manipulate the environment to produce more favorable outcomes. So, if a dog learns that sitting produces dog biscuits, the dog will learn to sit when asked, and will also experiment with sitting at other times, to see if he can make the environment produce a dog biscuit. This makes it possible to train some behaviors very rapidly and efficiently because the animal is driving the training process by offering behaviors as he tries to figure out how to create more positive consequences (reinforcement). It also means that the trainer can train animals in situations where the trainer's only tool is marking correct behavior, because the animal is in a tank or cage or otherwise separated.
Here are some examples of positive reinforcement:
1. If I go into the kitchen and there is a large chocolate cake with a sign that says "eat me," the behavior of going to the kitchen is going to increase, at least while the cake is there.
2. I am positively reinforced for going to the bank by getting money.
3. My horse breaks the fence and goes and eats grass. He has been positively reinforced for breaking the fence.
4. I reinforce my horse for touching the target by clicking and treating him.
5. I let my dogs out when they sit at the door (going out is the reinforcement).
Since the act of clicking and reinforcing is the basis of clicker training, I could just stop here and say that clicker training is using positive reinforcement with a marker signal. And there are some trainers who believe that "pure" clicker training means the trainer is only using positive reinforcement. But the operant conditioning grid is showing us that there are 4 ways that animals learn. Are we limiting ourselves if we only work within the positive reinforcement quadrant? Is clicker training just positive reinforcement with a marker signal?
There are huge advantages to sticking with only positive reinforcement as animals trained with positive reinforcement are bright, enthusiastic, creative and love to play. Clicker trainers want animals that know how to learn, think and enjoy the training process. Training with positive reinforcement is very forgiving and the training process itself creates a great relationship with your animal. Using positive reinforcement only can be very clean and efficient. The trainer focuses on desirable behavior and does not get caught up in the downward spiral that can happen when she focuses on decreasing behavior. And there is lots of room for different training styles, preferences or approaches under the umbrella of positive reinforcement. If I gather together a number of clicker trainers, I am going to find a wide range of training strategies and styles based on past experience, personality, training goals and philosophies, but despite any individual differences, I think most clicker trainers are committed to using as much positive reinforcement as possible.
However, all those advantages do not mean that learning to train using positive reinforcement as your main tool is easy. It can be challenging for some people and it can create frustration on the part of both the trainer and the trainee. There are some common trainer problems. One difficulty some people have is that in order to shape behavior well, you have to be very observant of the animal and have a good sense of what steps lead to the goal behavior. If you are a novice trainer, you might get stuck trying to figure out how to get the animal to do something that you can click. If you are more experienced, but training a new behavior, you might get stuck because you don't know what to reinforce to get to the next step.
Sometimes it can be difficult because of the animal you are training. Animals that have had coercive training in the past, are shy, or have fear issues are going to be reluctant to offer any behavior, which doesn't give the trainer much to work with. Novice trainers can give up because they don't know how to get the animal started. There are a lot of strategies that experienced clicker trainers have learned that can help them get past these hurdles or that they use to help new trainers develop better shaping skills, but this can take time. The last article in this series is going to be on ways to train behaviors using only positive reinforcement. It will include suggestions on how to use targeting and other +R tools to help get behaviors started. I find that sometimes it takes a while to develop the kind of creativity and mental flexibility to see how to get started and then shape each little piece of a behavior into the end behavior.
So while we all want to get to the point where we are using mostly positive reinforcement, I think of this as a goal, not a starting point. Each person is going to take a slightly different route toward this goal, based upon where they are starting and where they want to go. For some of these people, learning to use the other quadrants wisely can be helpful. It makes it easier for them to end up as successful clicker trainers because these additional tools can get them past some of the common hurdles. In addition, learning to use the other quadrants well improves their basic understanding of the advantages and disadvantages of each quadrant of the training grid. Being knowledgeable about the whole training grid makes it easier to make educated choices.
I think this goes to the heart of clicker training. In my view, clicker training is not just about clicking and treating. It is also about having a philosophy and commitment to a way of working with animals where the needs of the animal are recognized and the relationship is valued as much as any training or performance goals. Clicker training is about creating happy, confident and creative animals while teaching them new skills.
If you want to explore more about how to use positive reinforcement only with horses, visit the +R training page, which is a collection of ideas for how to shape behaviors using only positive reinforcement. I am going to move on to positive punishment next. Going from positive reinforcement to positive punishment is going from one end of the spectrum to the other, but remember that part of the point of this article is learning to recognize what you are doing.
Positive Punishment
I have yet to meet any clicker trainers who recommend the use of positive punishment in training situations. The emphasis in clicker training is on reinforcing the behavior you do want, so time spent focused on undesirable behavior is not productive training time and can be damaging to the relationship between the animal and its trainer. Good training focuses on teaching the animal what we want it to do. So any time I find myself focusing on decreasing behavior, I need to go back and think about what behavior I do want. That is the kind of training that will lead to long term changes. One way to think about this is to realize that animals are always doing something. If they are doing behaviors I don't like, I might be able to use punishment to decrease an individual behavior, but since the animal has to be doing something, a new behavior is going to take the place of the one I just punished. Unless I fill that void with a desirable behavior, the animal is likely to replace the previous behavior with a new, equally undesirable (or worse) behavior. This is especially true if I don't change anything about the environment or set-up.
I am sure we all can think of examples of positive punishment, but just to be clear, here are a few:
1. A horse bites me: I hit it and it stops biting me.
2. A horse kicks out at the whip: I hit him with it and he stops kicking it.
3. A horse kicks the wall: I yell at him and he stops kicking.
In addition to the fact that punishment can focus the trainer on undesirable behavior, there are a lot of other reasons that clicker trainers do not like using punishment. I think most of us have a philosophical or emotional dislike of using punishment and it can become a matter of avoiding punishment because it doesn't feel right to us. But beyond that, there are a lot of good scientific and practical reasons to avoid punishment. Punishment is hard to apply correctly. Often it either has no effect at all, other than a brief interruption, or it creates a vacuum into which another undesirable behavior can come. In addition, punishment tends to suppress all behaviors in general and can have unpredictable effects. Often the trainer ends up punishing the wrong behavior or multiple behaviors. In general, for punishment to be effective it has to happen the first time the behavior occurs, and be strong enough to stop the behavior (Karen Pryor).
I also want to point out that when you are using punishment, you are not training in a proactive way. You are responding to something you do not like, after it has occurred. Most successful trainers realize that preventing undesirable behavior is more effective than reacting after the fact. And punishment is tricky. As I said earlier, it is not enough to get the animal to stop doing something when we punish it. If we are looking for long term behavior changes and want to use punishment as a training tool, we have to look at whether or not the behavior decreases over time. I think one of the problems with using punishment is that while it initially seems to work (which makes the trainer more likely to do it again), it can create such varied and unpredictable side effects that a novice trainer does not realize they are related to the use of punishment. The novice trainer doesn't realize their "solution" has made things worse instead of better, and the whole training situation goes downhill from there.
Can punishment ever be used successfully? Yes, in some situations. Steve White, a police dog trainer, has the following 8 contingencies for when you can use punishment. Steve has generously given me permission to put his list here. He specializes in training police dogs and policemen and his web site is www.i2ik9.com.
This list is from his Trainer's Pocket Reference which has the 10 laws of shaping, 8 ways of getting rid of unwanted behavior, 4 conditions of stimulus control, and notes about punishment. I am quoting him directly so the reference is to dogs, but this applies to all animals.
"1. It must be something the dog dislikes and does not expect;
2. It must suppress behavior, otherwise it's just plain abuse;
3. It must be of the perfect intensity. Too much and the dog will shut down. Too little and the dog develops resistance to punishment;
4. It must happen immediately after the behavior;
5. It must be associated with the behavior, not you! Otherwise your presence is a signal that punishment may occur, and your absence is one that it will not. The result? A "sneaky" dog;
6. It must happen every time the behavior occurs. Otherwise, you may put the undesirable behavior on a variable schedule and make it even tougher to break;
7. There must be an alternative for the dog. Give him an opportunity to perform an acceptable behavior in order to escape or avoid punishment.
8. It must never be used to the extent that punishment outweighs reinforcement...from the dog's perspective.
If you can't follow all eight of these rules, avoid punishment. Otherwise you'll end up with unintended and undesirable side effects."
When Steve presented this list at Clicker Expo, he emphasized that there were very few times in his training when he could meet all 8 rules. The safest thing is to just avoid positive punishment completely. I think most clicker trainers would agree that punishment is not a desirable way to change behavior and should not be considered as part of any training plan. Bob Bailey, another important and well respected animal trainer, trained thousands of animals for Animal Behavior Enterprises and he says they might have used punishment a handful of times, if that.
There are, of course, a lot of traditional trainers out there who do use positive punishment and it seems to work effectively for them in some cases, at least if you measure success by whether they end up with more compliant animals. These types of trainers are often unsuccessful with a lot of animals too, but that is rarely seen as a failure of the punishment to be effective. And operant conditioning says that positive punishment does work. It is one way to decrease behavior. I think if you showed them Steve's list, they would say that they successfully use positive punishment all the time without undesirable side effects, or they might not recognize that they are using punishment at all and think Steve's list doesn't apply to what they do. I think that, particularly in the horse world, there is a bit of a "get the job done" attitude and sometimes trainers don't realize just how much this is affecting their relationship with the animal. Positive punishment may be one way to get the job done, but I think its drawbacks outweigh its advantages.
Because punishment is so readily used in other types of animal training and in other parts of our lives, it can be hard to understand why punishment is such a problem for clicker trainers, especially for new trainers when they are just starting. I remember back to the first year that I was clicker training. One day Rosie started banging on her stall door and I yelled at her. Then a few minutes later I took her out to do some free shaping. We had been working on standing on her box and some other tricks. She was very subdued and would not offer anything. While it was clear to me that the yelling was directed at the door banging, it affected a number of other behaviors too. And to be honest, the yelling was not even a very efficient use of punishment for the door banging. It decreased the door banging for the rest of that day, but not on later days.
Would I have noticed she was subdued if I was not looking for her to offer behaviors? Maybe not. I might even have thought her attitude was a sign that my use of punishment had been effective. I think this points out a key difference between clicker training and other forms of animal training or situations in everyday life. Clicker training works because we create an environment where the animal feels safe enough to explore and try new things. As clicker trainers, we need our animals to be willing to offer behaviors because this is the raw material we use to build new behaviors. Clicker trainers are not interested in making animals do things through force or coercion. We are totally dependent upon the animal's desire to play this game with us. That means we need to set up situations where the animal wants to play the game and the use of positive punishment undermines this goal.
I also have to point out that for a lot of people, it is hard to use positive punishment without tapping into their emotions, especially as the punishment becomes stronger. I think it can be easy for people to fall into the habit of using positive punishment and they do not pay attention to how and when they use it, or if it is effective. Once they start to pay attention to these details, they start to realize how much it is affecting them and their relationship with their horses. There are other people who have been taught to use punishment but never get comfortable with it. In both cases, I think that once you start to see how damaging punishment can be and realize that there are other options, giving up positive punishment as a training option is actually a big relief. Realizing that there are positive solutions to many training issues can be very liberating.
Unfortunately, excluding punishment from our training plans doesn't mean it can't sneak in when we are dealing with our animals in real life. I find that despite my best intentions, I might end up using punishment to get through an awkward situation when I am just handling the horses in our daily routine. It could just be a matter of yelling at the horses when they are crowding the gate or waving a lead at a horse that is getting too close when leading, but I do sometimes find myself reacting to a situation without thinking it through.
I think some of this is a "knee jerk" reaction because in the past I have worked with trainers who viewed punishment as part of horse training. When I was first taught to work with horses, I was taught to "get after" a horse that was doing something undesirable. There is still a lot of emphasis in horse training on "making sure the horse knows who is boss" and "getting your horse's respect" and being the "alpha" in the herd. I used to feel I would never be able to override some of these automatic reactions, but I have found they fade over time and as I pay more attention to what I am doing at all times when I am working with my horses. In some cases, it is just a matter of noticing it and making a conscious decision to do something else next time. In other cases, it is about using management to avoid a difficult situation. And some of it is just getting more experienced at seeing how I can reinforce little changes in the right direction to change something over time.
One thing I have found helpful is to ask myself some questions after I use punishment. They are:
1. Was it effective? Did it stop the behavior at the time? And more important, did it make my horse less likely to repeat the behavior in the future? If I find myself using punishment for the same behavior more than once or twice, it is not being effective.
2. Could I have avoided it? Is there a training hole or management solution so I can avoid using punishment? Am I pushing the horse too fast? Asking for something he can't do?
3. Am I punishing a behavior that I might ever want the horse to offer me in some other context?
4. Is the punishment having an adverse effect on the horse? One of the problems with punishment is that it has side effects that are easy to miss, but if you are looking for them, you can often spot them.
5. Is the punishment having an adverse effect on me? Is it making me feel adversarial or upset at the horse?
I think that if you do end up using positive punishment in this way, there is still a way to make it a less aversive training experience. Positive punishment often creates a little "void" when the animal stops doing the undesirable behavior and this creates an opportunity to either reinforce the animal for standing quietly or to ask for another behavior which you then click and reinforce. In some cases, I might just click once and move on or in other cases, I might spend a few minutes working on an alternative behavior instead of the one that triggered the punishment. The one caution about doing this is that I can inadvertently chain the undesirable behavior into the sequence with this approach. To avoid this, I have to keep track of their behavior in that same situation a number of times to make sure that I am not inadvertently reinforcing the undesired behavior with an impromptu training session. In general it is better to make a mental note and set aside some other training time to work on the issue. Then go back and use the new behavior before the horse can do the undesirable one.
As I learned more about clicker training, I had to train myself to look for other options when I found myself tempted to react to undesirable behavior by using punishment. I think for most people this is a process that takes time. Progress comes from learning to use other tools and also from being less emotional when a horse is doing something wrong. The horses also showed me how much punishment affected them and this helped make me more committed to using as little punishment as I could. I often think of those programs that have 10 step processes for helping people deal with unwanted behavior (addiction, anger management, etc.). I don't know what all the steps are but I am pretty sure the first step is awareness.
I think some people struggle with clicker training horses because clicker training is presented as a totally positive approach and a lot of the books and other material make it seem as if everything can be solved by just reinforcing what you like. If your horse has a lot of undesirable behavior that you want to change, it can be hard to see how to use positive reinforcement to solve all your training issues. And some horses have undesirable behaviors that have strong reinforcement histories and they are committed to these strategies that have worked for them in the past. This can be very discouraging for new clicker trainers because in order to clicker train, they now have to give up a tool that, while it might not have worked all the time, has sometimes helped in the past.
Luckily, operant conditioning provides us with other tools. While clicker trainers do not use positive punishment, they do use negative punishment and negative reinforcement. I am going to cover negative punishment next and then go on to negative reinforcement. Negative reinforcement and positive punishment are very closely linked, so it is going to be useful to keep in mind the information I have presented about positive punishment. I am going to go into detail about the connection and differences between positive punishment and negative reinforcement later on, but a good understanding of positive punishment is going to be helpful.
Negative punishment is the other quadrant of operant conditioning that is about decreasing behavior. In negative punishment, something is taken away in order to decrease behavior. The idea is that when the animal performs an undesirable behavior, I take away something the animal wants. The removal of "something" interrupts the behavior, and makes the behavior less likely to occur again in the future. The "something" that is removed could be a physical change (removing an object) or less tangible, such as removing attention.
The most common use of negative punishment is to remove attention or the chance to earn reinforcement. The simplest version of this is giving the animal a "time-out." A short time-out could mean just stopping for a brief time and not interacting with the animal. Trainers who use this technique often have a particular stance or posture that indicates to the animal that no reinforcement is coming for a moment (Ken Ramirez calls this the Least Reinforcing Scenario, or LRS). I might just turn slightly away and disengage from the horse for a moment. I would be removing attention and the possibility of reinforcement.
I can do the same thing with an object. If I am training with an object (a toy or prop) and the undesirable behavior is centered around that object, I could just take it away when I didn't like the behavior. If I am teaching my horse to play the piano and it keeps biting the keys, I could just put the piano behind my back for a minute if the horse bites it. I might choose to do this instead of ignoring the behavior if I thought the horse might damage the piano. If I want to give a longer time-out, I might pick up my training equipment and leave, or I can put the animal back in his living space, if I had him out in a training area.
For negative punishment to be effective, it is important to make sure that the animal does not want you to remove whatever it is that you are removing. For example, if I am training my horse and it pins its ears at me and threatens to bite me, I could give it a time-out and leave. If the animal did want me to stay and play and was just feeling frustrated, then leaving would be an effective use of negative punishment. Because the animal did not want me to leave, the time-out might be an effective way to decrease the undesirable behavior. On the other hand, if the animal was frustrated, feeling defensive or angry and wanted me to leave, then by leaving I have reinforced the behavior I am trying to decrease.
This means I have to be careful about evaluating what is reinforcing the horse for an undesirable behavior and make sure I am not inadvertently rewarding it. I find I usually have to use negative punishment a few times before I know if it is working well for me. Yes, leaving does usually get the animal to stop doing an undesirable behavior, especially if it is directed at the trainer, but I am looking for a long term change. It is not enough that the animal stops when I leave. That could be happening just because I am no longer available to be the object of the unwanted behavior. What I want is for the animal to start to make choices about its behavior so that I don't leave. I want leaving to decrease the behavior in the future. These are long term changes and I might need to use negative punishment a few times before I can see if it is decreasing the behavior or not.
Of the two forms of punishment, negative punishment is the one that is used by clicker trainers in training situations. It generally has fewer side effects than positive punishment. But some animals are very sensitive to negative punishment and if I use it, I am still very careful to use it minimally and with caution. Negative punishment can be very aversive to some sensitive animals and it can cause them to shut down or act out more. An animal that is showing frustration during training is going to be helped more by rethinking the training plan than by punishing the unwanted behavior. As with positive punishment, negative punishment is reactive, not pro-active, and a better training option is always to focus on the behavior I want and set up the training so the animal is less likely to offer the unwanted behavior.
In order for negative punishment to be effective, the animal has to be on a high enough rate of reinforcement that the removal of the opportunity for reinforcement is significant. If I am working on long duration behaviors and my rate of reinforcement is very low, negative punishment will be less effective than if I am working on a behavior where the animal is getting a steady stream of reinforcement. Keep in mind that the reinforcement does not have to be food so when you are observing someone else train, you need to know what they are using for reinforcers.
I sometimes use negative punishment during the course of training to interrupt unwanted behaviors but these are usually small things and it is just a minor pause in the training. Most of my horses are very clickerwise so I am not dealing with a lot of unwanted behavior. But I did have one recurring behavior that I had been unable to completely eliminate and this was a bit of pawing that Rosie was doing in the wash stall. When I got her at 9 months and taught her to stand in the wash stall, she would paw almost non-stop. She was not good at standing still in general and when I put her in the wash stall all that anxiety had to go somewhere so she pawed.
Reinforcing her for standing quietly (4 on the floor) was one of the first behaviors I worked on with her and over the years she improved, except I could never quite get rid of the last little bit of pawing. She would paw a few times with her left front foot when I first brushed her and while it was better than she had been, the pawing never completely disappeared. She seemed to chain together all kinds of behaviors and include pawing in it. She would just paw four or five times in a session, but I couldn't figure out why the behavior did not disappear entirely. I suspect that it had somehow been reinforced early on as I was just learning clicker training when I worked on this and pawing can be self-reinforcing too.
So finally, as an experiment, I decided to try a big time-out. I had used little time-outs in the past where I would stop, wait and then brush when she was still, but while this might stop the pawing in that session, it did not decrease pawing in future sessions. She seemed to think it was a little game or exercise we did. I was not sure if putting her back in her stall for a big time-out would work as she has always been sensitive about being groomed. I thought it was possible she would prefer to be in her stall instead of being groomed. But it turned out that she did not.
On the first day after deciding to try this, I put her in the wash stall and started grooming her. The first time she pawed I unclipped her, took her back down to her stall, and put her in. I did that a few times the first day whenever she pawed and after a few of these time-outs, she stopped pawing. I did that a few times the next day and on the third day, she did not paw at all. I did increase the reinforcement rate for standing quietly in these same sessions. I wanted to make it very clear that standing quietly in the wash stall meant lots of reinforcement and pawing meant none. I have to say that I was amazed at how successful this was and it taught me a lot about how negative punishment can be used successfully to solve even persistent training problems.
In the sections on the other quadrants, I defined them, gave examples and described a bit about how (and if) clicker trainers used them. I started this section with the same format and it kept getting longer and longer. I do not want to swamp new people with too much information so I decided that a more detailed description of how to apply negative reinforcement as a clicker trainer needed its own article. For that reason, I pared this section down a lot to keep it simple. I am going to define and give examples of negative reinforcement, give a brief description of how to use it with clicker training and explain why it is so controversial. More details on specific methods of using it and combining it with positive reinforcement will be covered in the next article, which is called How to use Negative Reinforcement as a Clicker Trainer.
If you are a horse person, then you are already familiar with negative reinforcement, although you might not know it by that name. Most traditional horse training uses negative reinforcement to prompt, train and cue behaviors. Remember, this does not mean it is negative in the sense of being unpleasant or mean, it just means that we use the removal of a stimulus to train behavior. Pressure and release works because we are using negative reinforcement. A lot of leg and rein cues are taught using pressure and release. I apply a leg aid or take the slack out of the rein and release when the horse does what I want. The horse learns that by moving off, or turning or doing a certain behavior, he can get me to remove the pressure.
The fact that training with pressure and release is based on negative reinforcement is something I have heard and read since I started clicker training. I have always accepted it as a typical example of negative reinforcement and I have used it to explain how negative reinforcement works. But when I sat down to write this article, I realized that I didn't know enough about how negative reinforcement was used outside of horse training. I wanted to understand more about how negative reinforcement is used in other types of training or behavior modification so I did some additional reading and I looked on the internet for other examples. I have to say that I was amazed at the variety of ways in which negative reinforcement is applied, or occurs naturally.
I am going to list some of the examples I found as I think it is helpful to read them. These examples come from a few different web sites (http://www.princeton.edu/~yael/LearningCourse/Notes/Examples.doc, http://www.utexas.edu/courses/svinicki/ald320/negrnf.html) and from my notes from various speakers and books.
If you are having trouble identifying the stimulus and what behavior is being reinforced, click here to get the answers. I suggest you take time to try and figure it out before looking, as this is a good mental exercise. For another example, an interesting story about applying negative reinforcement to a work situation is at http://www.intropsych.com/ch05_conditioning/using_negative_reinforcement.html.
Even though there is a huge amount of variation in these examples, the list above shows that all applications of negative reinforcement have one thing in common: a behavior increases if it is followed by the removal of a stimulus, typically an annoying or aversive one. The intensity of the stimulus can vary. It could be as mild as having the sun in your eyes, more unpleasant such as having a headache, or even more unpleasant as in the loud sound of a fire alarm. The list shows how often negative reinforcement is influencing many of the behaviors in our lives and how it can occur in many situations.
The variety on this list also shows why negative reinforcement is such a complicated topic and needs careful explanation. As I noted above, I will be following up this article with another one specifically on some ways to combine positive and negative reinforcement in horse training. This article is more about theory and philosophy. The next one will be more about putting all this in practice. Here, I want to briefly write about why using negative reinforcement can be a problem for clicker trainers and also about why it is worth exploring anyway. I am hoping that by presenting this information, it will explain both why some clicker trainers choose not to use negative reinforcement, and why some are able to use it effectively.
Most of this discussion on using negative reinforcement is going to focus on the deliberate use of negative reinforcement to get a certain behavior. I do want to point out that sometimes the use of negative reinforcement comes into a training situation not because the trainer chooses to use a negative reinforcer, but because there is something in the training environment that is already affecting behavior. An astute trainer can recognize the reinforcement value of removing the aversive and use this to change behavior, either by itself or in combination with positive reinforcement. With horses, we often see this with desensitization. If my horse is scared of the clippers, I can reinforce the horse for standing while I approach with the clippers by clicking and treating (positive reinforcement) and by taking the clippers farther away (negative reinforcement).
With the clippers, I am deliberately introducing something that could be an aversive stimulus, but this situation can also occur when there is some kind of environmental change that creates an aversive. If I am working my horse in the ring and it is reluctant to go in one corner, either because of a sudden aversive event such as a trash can blowing over, or because it has concerns about something in the corner, I can reinforce the horse for a behavior I like (going toward the corner, standing in the corner, head down in the corner, etc.) with the chance to leave the corner. In both cases, the clippers and the scary corner, I am not choosing to introduce an aversive to get a specific behavior, but I am recognizing that there is an aversive in my training environment and using negative reinforcement to work through the situation. This is a bit different than how I use negative reinforcement as a specific tool to train new behaviors which is what I am going to talk about next.
I learned to use negative reinforcement with horses as part of my traditional horse training and I learned clicker training from Alexandra Kurland who combines negative reinforcement with positive reinforcement in her training program. She has put a lot of time and effort into coming up with a training system that combines and takes advantage of both the usefulness of negative reinforcement and the power of positive reinforcement. In her system, if she uses negative reinforcement, she uses a mild stimulus and she rewards the horse for the right answer by both removing the stimulus and offering some other reinforcement. This is the most common way that I see negative reinforcement used by clicker trainers. It is not usually used alone, where the only reinforcement is the removal of the stimulus. More often it is combined with a click and reward so that the animal is motivated to find the right answer quickly and the use of the stimulus is kept to a minimum.
Because I came from a training system that used negative reinforcement without positive reinforcement, this seemed like a significant improvement over what I had been doing. And combining negative and positive reinforcement in this way can give very good results. But as I learned more about clicker training, I found that a lot of clicker trainers didn't like negative reinforcement and did not want to use it at all. So I had to start re-evaluating what I was doing. I wanted to learn more about why negative reinforcement was controversial and look at my own training to see if I was just unaware of unwanted side effects of my current training methods.
I think there are a few reasons that some clicker trainers avoid using negative reinforcement. I am listing them here as "possible problems" because I think they are aspects of negative reinforcement that need to be taken into account when putting together a training plan. They don't automatically happen when you use negative reinforcement, but they can if you are not careful.
Possible problems with using Negative Reinforcement:
1. Negative reinforcement is closely linked to positive punishment.
2. Negative reinforcement does not create the same kind of thinking and operant animal that positive reinforcement alone does because there is a "pressuring" stimulus that precedes the behavior.
3. Combining negative reinforcement with positive reinforcement can lead to poisoned cues.
4. Animals can become desensitized over time so that negative reinforcement is less effective and this can lead to it becoming more aversive.
The Link Between Positive Punishment and Negative Reinforcement
Looking at each item individually, I want to start with the fact that often negative reinforcement is closely linked to positive punishment. If you go back to the list of examples again, you can see that in some cases, negative reinforcement is occurring as a product of the environment or situation (the bright light, the headache, the full bladder, the itchy eyes). But in other cases, a stimulus or aversive event has been added to prompt a specific behavior. This stimulus is removed when the desired behavior occurs. Some of the examples that show this are the seat belt buzzer in the car which prompts you to buckle your seat belt, the fire alarm which prompts you to move, and the addition of pressure which prompts the horse to make a change. These examples show that in order to use negative reinforcement, and especially if you want to do so in a training situation, the trainer has to control both the addition and the removal of the negative reinforcer. And this is where it gets a bit difficult for clicker trainers.
It gets difficult because when I choose something to add, I have to choose something that the animal will want me to remove, at a level that will prompt the animal to change its behavior. This means that the addition of that stimulus can act as a positive punisher by decreasing the behavior the animal is doing at the time I apply it. In my reading I saw the terms positive punisher and negative reinforcer used to describe the same stimulus at different times. I think one stimulus can be either or both, and the terminology depends upon how the target behavior is affected. This will get clearer as I continue. In order to know what is happening, I have to evaluate what behaviors are increasing and decreasing starting from the time the stimulus is applied. That includes looking at the behavior that was happening when I applied the stimulus as well as what behavior is reinforced by the removal of the stimulus. And I have to evaluate the emotional state of the animal as well.
What really matters is that I have to realize that in order to make the animal try something different, I have to discourage the animal from continuing to do what it is already doing. And if the behavior that is occurring when the stimulus is applied decreases, then I am using punishment. This connection between negative reinforcement and punishment is a real problem for most clicker trainers. Clicker trainers know that using punishment is unpredictable and can suppress all kinds of behaviors, not just those at which it was directed. If I am trying to create a training environment where an animal feels free to offer things and experiment, then I don't want to be suppressing behavior.
There is a whole section in this article on why positive punishment is not recommended, so it would make sense if I just said negative reinforcement had the same problems as positive punishment and should not be used. And, in some ways, that is the safe approach to take. If I feel an animal I am training will be adversely affected by any form of punishment, then I have to think carefully about using negative reinforcement too. The first time I heard about the connection between negative reinforcement and positive punishment was in a lecture by Kathy Sdao and it seemed so horrible, to think I was punishing my horse all the time. It also made me feel a bit defeated because if negative reinforcement was so bad, then I was stuck because I didn't know how to do enough with only positive reinforcement. Since then I have had more time to think about it and I have realized that while we talk about training using one quadrant or the other, training in real life is not so clear cut. There is a lot of "gray area" and many variables that can be adjusted to allow us to use the different quadrants of operant conditioning in ways that still maintain a positive training environment.
But I did have to ask myself why it was worth continuing to use negative reinforcement if it involved punishment. I think there are a lot of reasons and here are a few of them. One is that negative reinforcement is so much a part of horse training, that it is hard to escape it unless I throw away everything I know and start over from scratch. And I also think that if I am going to be physically connected to my horse in some way, by a lead rope or because I am sitting on him, I want that horse to understand about pressure and release. Understanding pressure and release makes it easier to teach tactile cues and lets me tap into the horse's awareness of body language and the horse's natural tendency to adjust to changes in my position when I am riding. Additionally, I think that it is possible to use negative reinforcement while avoiding, or at least minimizing, the drawbacks of using punishment. Even now writing this, I am uncomfortable with the idea of using punishment on my horses. But I have to remind myself that punishment is just decreasing behavior and that negative reinforcement done well is more about reinforcement than punishment.
The problem, as I see it, is that it takes a lot of skill to find the right balance so that when I use negative reinforcement, I am using the minimal amount of positive punishment to generate a change in behavior, but not so much that all behavior decreases. I want to use just enough to ask the horse to look for other options without shutting it down or making it frightened or defensive. This also explains why there is such a range of training that falls under the category of negative reinforcement. Each trainer and animal is going to have to find their own blend that works for them. Very skilled traditional horse people are often those that can find this balance. People who are ineffective or end up being abusive often end up heavy on the positive punishment side because they are paying too much attention to suppressing behavior and not enough attention to what they do want.
It would be nice if there was some reliable way to teach how to find the right way to use negative reinforcement with every horse, but there are so many factors that each situation is different. Luckily the horses give us good feedback if we let them and can help us find the right stimulus to generate change, but keep the horse thinking and looking for the right answer. In the next article I am going to share some ideas about choosing the right stimulus, evaluating your use of negative reinforcement, and coming up with new training plans that don't rely on aversives. I think that clicker trainers can successfully use negative reinforcement if they use it carefully and their focus is on helping the horse find the right answer.
Does Negative Reinforcement create animals that just follow directions?
The second possible problem with using negative reinforcement also comes from the fact that in order to use negative reinforcement, the trainer often starts by applying an undesirable stimulus. I have already written about concerns because the stimulus can be an aversive, but even if the stimulus is not aversive, it changes the dynamics of the training. Instead of the animal offering behavior and trying to figure out what the trainer wants through experimentation on its own (with the click as feedback), the animal is now depending entirely upon the trainer for direction.
This argument is similar to the argument against luring. If dogs are trained exclusively with luring, they can become dependent upon the lure for information and never learn how to use the click for information. Any kind of dependence upon prompting to get the behavior has the same problem and could create animals that are not good problem solvers or creative thinkers. In part my answer to this comes down to saying that an animal that follows directions might be preferable for some people and that by balancing the use of negative reinforcement with other exercises that do encourage creativity and attention to the click, you can end up with animals that can be trained using both methods.
Alexandra Kurland refers to guiding the horse through negative reinforcement as "directed learning" and it means that the animal is being guided through the process with physical help from the trainer. This is not necessarily a bad thing and in some cases, I find it preferable to making the animal guess. I think that using negative reinforcement to guide an animal through learning a behavior can provide some structure and direction so that neither trainer nor trainee get frustrated by taking too many wrong turns. I find that using a lot of negative reinforcement (combined with positive reinforcement) makes horses good at following directions and they can still be happy in their work. But it does require a different set of skills on the part of trainer and trainee, creates a different relationship, and your animal might not end up with the same level of problem solving and thinking skills that it might have if more free shaping was done.
I mentioned that most clicker trainers who use negative reinforcement use it in combination with positive reinforcement. There are a lot of advantages to this approach and just the addition of positive reinforcement through clicking and reinforcing can change the dynamics of negative reinforcement so that it is more clicker friendly. Done well, the animals learn to accept the use of negative reinforcement as another way to generate behavior, similar to using a prompt such as a target stick or other object and it can look very much like training with positive reinforcement alone. But done poorly, the punishment aspect of negative reinforcement will overshadow the addition of positive reinforcement and there can be a lot of emotional fallout.
Avoiding this emotional fallout is one reason that some trainers don't like to use negative reinforcement and there is now some research being done on "poisoned cues." A poisoned cue is one that has been trained with a combination of positive and negative reinforcement and it was first described by Jesus Rosales-Ruiz at the University of North Texas. He found that when a dog was trained with an aversive (through negative reinforcement) it showed different emotions during those training sessions than when it was trained with only positive reinforcement. This has led to more evaluation of the use of negative reinforcement in clicker training and he found that a lot of animals seem to have poisoned cues that can be identified by the animal's unhappy or reluctant response to the cue.
The work on poisoned cues has made clicker trainers think even more carefully about using negative reinforcement. There is more information on poisoned cues available on the internet where you can read about the study and its conclusions. I am also going to write about it more in the next article when I cover cues and negative reinforcement. This is an area that needs to be studied more (the original study involved one dog) and I think it is too broad to say that any combination of positive and negative reinforcement leads to poisoned cues. The poisoned cue study used a particular application of negative reinforcement where it was applied more as a punisher. I think one has to look carefully at how the trainer uses negative reinforcement and that it is possible to use negative reinforcement and keep a positive training environment. I know from using Alexandra Kurland's work, that it is definitely possible to train using a combination of positive and negative reinforcement and still end up with happy and operant animals. I just think you have to be careful about how you do it.
Does Negative Reinforcement become less effective over time?
The last thing I want to mention is an aspect of negative reinforcement that can lead to problems for any trainer. The reason negative reinforcement works is because the animal is willing to change its behavior to remove a stimulus. The challenge for the trainer is to find the right stimulus that motivates the animal to change, but keeps it in thinking mode so that the animal is learning and not just responding automatically. A good trainer has to determine what each animal needs in any given situation and adjust accordingly if things are not going well. The trainer has to be aware of what is effective in each training session, and also be aware of any changes that occur over time.
This is important because over time the animal can get desensitized to the stimulus if negative reinforcement is not used correctly. If I use my leg to ask my horse to go forward and I keep using my leg after the horse goes forward, my horse might eventually learn to ignore my leg. Through bad timing or overuse, the leg no longer has any meaning and the horse either learns to ignore it or starts to act out against it. In both cases, the tendency is to make the leg cue stronger so the horse responds to it again and this starts the trainer down the slippery slope of escalating pressure. We have all seen this in some riding programs where the rider and trainer keep going to more and/or stronger equipment. They may have started out with something quite mild, but over time they end up using negative reinforcement in a way that more closely resembles punishment.
The solution, of course, is to go back to the basics of educating the horse so that the milder stimulus now has meaning for him. This is the same way one would avoid the problem in the first place and one reason I think clicker training combines well with negative reinforcement. By using clicker training from the beginning, the trainer can make a very mild stimulus meaningful to the horse and avoid the need to escalate. Not only does the actual use of the click and reward help to motivate the horse, but because clicker trainers are taught to watch for small changes, they are going to recognize any little step in the right direction and this means the stimulus is going to be applied for less time and have less chance to escalate.
I also think that experienced clicker trainers recognize when an application of negative reinforcement is not the best training tool for a situation before they get into trouble. Clicker training gives me lots of options for ways to build behavior and if I am training using negative reinforcement and it is not working, I am more likely to completely change my strategy. If I were limited to using negative reinforcement, I might not realize that I had other options and might conclude that escalating was my only choice. Traditional trainers sometimes fall into this trap. There are always other ways to get behavior and a skilled user of negative reinforcement knows when to use it and when to do something else.
Because of the association between negative reinforcement and positive punishment and these other concerns, there are some clicker trainers who use negative reinforcement and others who do not. It seems to be quite controversial and I have met trainers who feel any use of negative reinforcement is contrary to the philosophy of clicker training and others who have modified it so that they get the benefits of negative reinforcement while minimizing the punishment aspect. This question about whether or not clicker trainers use negative reinforcement is part of the reason I wanted to write this article.
I wanted to gather together the information I had (and could find) and present it in such a way that people could make their own choices about using negative reinforcement. I am trying to present both sides, but since I do use negative reinforcement myself, I am understandably biased in that direction. I do think that people who choose not to use it have valid reasons and there are situations in which I would choose not to use it too. We are all influenced by our past training experiences and I think it is important to recognize that different things are going to work for different people. Since I have spent time on the possible problems with negative reinforcement, I do want to explain some of the advantages to using negative reinforcement with horses.
Advantages to using negative reinforcement with horses
I think most of the downsides to training with negative reinforcement can be managed or minimized by adding positive reinforcement (via clicker training) to the training program. By adding positive reinforcement, a good trainer can keep the stimulus below the aversive level, keep the animal motivated so that the trainer does not have to escalate, and clicker trainers can learn to use negative reinforcement as a constructive tool to help the animal find the right answers. I like to think of using it for physical guidance and feedback in a way that complements positive reinforcement.
Physical guidance and feedback are important for horse trainers because in riding and groundwork, we are often connected to the horse through our bodies or our equipment. Unless I am only working with horses behind barriers or when they are loose, I am going to spend some time physically connected to my horse. Because of this, I am often put in situations where negative reinforcement is the best tool available to me. I am going to end up using pressure and release, weight shifts or some other kind of body language to communicate with my horse.
For this reason alone, I think anyone working with horses benefits from being skilled at using negative reinforcement. The more familiar my horse is with negative reinforcement, the easier it is going to be to use it as a training tool or as a management tool when I need it. I think a big part of using negative reinforcement with clicker training is learning the subtleties of using negative reinforcement so that I can become an educated user. This might mean changing how I apply it or learning some new strategies that are compatible with clicker training but it doesn't mean giving it up altogether. I am not sure that we can, even if we wanted to, so we might as well learn to use it.
Learning to use negative reinforcement well has other advantages. As I stated earlier, a lot of traditional horse training is based on negative reinforcement. Horses are a bit different than some pets in that they tend to have more owners over the course of their lives. Since horses are often bought and sold, there are advantages to training a horse so that it understands pressure and release and can be ridden in a more traditional way. The same goes for some owners and trainers. Many people don't want to throw away everything they know and start all over. Using negative reinforcement allows the trainers to feel like they can keep using the skills they know and just add something new to make things better. This makes the owners and trainers feel like they are more compatible with the rest of the horse world too.
Using negative reinforcement also allows the trainer to tap into some of the natural qualities of horses. Horses are very sensitive to pressure, changes in body language and feel. I can take advantage of this by using negative reinforcement to prompt the first building blocks of a new behavior. Negative reinforcement seems to be a very natural way to teach a horse about the physical connection between horse and rider including tactile and pressure cues.
I also think that using negative reinforcement is easier for some people as it allows the trainer to teach behaviors using cues right away, which seems to be more natural for many people. In most traditional training, some kind of cue is used from the beginning, whether it is through luring or physically manipulating the animal. People like to think the cue makes the animal do the behavior and people often ask me what cue I used to teach a new behavior. But in clicker training, we teach the cue last and this is very confusing for some people. They don't see how I can train the behavior without a cue. I think people also get confused or are not sure they like the idea that we also allow animals to offer behaviors without being cued at certain stages during the training process. Some people are uncomfortable with that too. But when I train with negative reinforcement, I can set it up so that I have a working cue right away that is similar to the final cue. Some people are more comfortable with that.
For example, a lot of cues used by horse people will develop naturally through the training process if the behaviors are trained with negative reinforcement. It is common to teach a horse to back from slight pressure on its chest. I can teach this by putting my hand there and waiting for the horse to shift its weight back. As the horse shifts back, I remove my hand, click, and reinforce. I often click and remove my hand at the same time, but I can change that depending upon what I am trying to accomplish. For the horse, this is not an aversive process (unless I attempt to push it back or poke it or do something else that is unpleasant). But over time the horse learns to back from a touch to the chest. When the behavior is done, I now have the completed behavior and it is on cue. This is easier for some people than the idea of training the behavior and then adding the cue. For people who are crossing over to clicker training from traditional training, this makes more sense to them.
This ends the discussion of the four quadrants of the training grid as separate training tools. I hope that the terminology is becoming more familiar and you can start to see when you are using each quadrant in your training. I tried to use simple examples so that the difference between the quadrants would be clear, but in real life, it is not always so simple. So I want to take a look at a few possible areas of confusion and work through some examples. The examples are going to show how to identify the quadrants being used based on whether or not I add something and if behavior increases or decreases. However, I am also going to point out that it is important to look at it from the horse's point of view. What quadrant you are using is important, but it is not everything. When I asked around for tips about helping people learn to use different quadrants appropriately, the most common response I got was that good trainers learn to read their animals and let their animals tell them when something is or is not working.
Translating Theory to Real life: Sources of confusion in determining what quadrant you are using
Determining which quadrant you are using is not always straightforward. There are some ways that people tend to get confused. I think a lot of it is the terminology, and I have to remind myself that positive and negative are just adding and subtracting. But it also gets complicated because in real life, there tend to be many things happening at once and while a trainer might be focusing on one behavior and deliberately using one quadrant, operant conditioning might be happening in other ways too.
I spent a long time on this section trying to think of areas of confusion and how to make things clearer to new clicker trainers. As I stated in the beginning, I am not an expert in this field. I am just someone who has put some time and energy into learning about it and since I have found it helpful, I wanted to share. I have been on some lists where members get into very technical discussions about which quadrant they are using. These are trained professionals and they do not always agree. This could make me wonder about spending time on the whole thing, but instead it made me realize that this is just not an absolute science. Maybe it can be in the lab, but out in the real world, there are too many variables. All the quadrants can be used on so many different levels. Just think of the many ways punishment can be used ranging from a minor annoyance to extreme physical harm. On one of the lists, someone wrote that which quadrant you use is not a question of morality and I thought that was a good point to remember.
The point of this section is not to over-analyze every little training decision and give us all headaches. It is really to help people start thinking about how to look at training situations from different points of view or to see the larger picture. Sometimes I find that clicker trainers are so sure that their click and reward is driving the behavior, that they don't look at all the other stuff that is happening. We like to think that we are the most important part of the training picture, but sometimes we are not. And we like to think that we are choosing the reinforcement, but sometimes we are not. I think there is value in being able to analyze a training situation in operant conditioning terms because that can be helpful for troubleshooting and it is a good mental exercise.
Do I analyze every little training choice I make when I am out there training my horses? No, of course not. I try to focus on using positive reinforcement, observe my animal for feedback and adjust as needed. Clicker training can be as simple as rewarding what you like. But for those times when it is not, it helps to have thought about a lot of different ways to increase and decrease behavior, the possible consequences and how to put all the aspects of operant conditioning to work for you.
Some of the most common areas of confusion are getting distracted by what happens before the behavior, the difference between positive and negative reinforcement, and the difference between positive punishment and negative reinforcement. I am going to discuss each one in detail and with some examples.
Remember it is what happens AFTER the behavior that matters
Prompts and cues are important and in most cases the trainer has some kind of training plan that includes a way to start the animal on the right path toward the behavior. This is fine, but we don't want to start thinking of cues and prompts as being what increases, decreases, or maintains behavior over time. There can be confusion over cues because if you ask someone why their dog sits, they are likely to say the dog sits because they told it to. Yes, that is a learned cue, but the reason the dog sits is because it has learned that sitting when you say "sit" is a good idea because it can either earn a reward or escape a punishment. If you discontinued either of those, the behavior of sitting on cue would decrease over time. The cue is not maintaining the behavior, it is the consequence that is maintaining the behavior. Since operant conditioning is about consequences, the use of the cue is irrelevant.
This shows that when evaluating my use of operant conditioning, I have to remember that operant conditioning is about consequences, so the addition and subtraction of stimuli happens AFTER the behavior. The traditional view has the trainer making things happen, and there is a lot of emphasis on prompting, luring, or somehow generating behavior, all things that happen BEFORE the behavior occurs. I have to be able to set that aside and realize that I only want to look at what happens after the behavior.
In the same way that a cue does not reinforce the behavior that follows, a prompt does not either. If I am training a dog to fetch, I might use an object (such as a ball or toy) to get the animal to offer some behavior. The ball is a prompt and while it has been "added," that has nothing to do with what kind of operant conditioning I am using. If I click and reinforce the dog for touching the ball, I am using positive reinforcement. If I yell at the dog for sitting on the ball, I am using positive punishment. If I remove the ball and leave because the dog is chewing on my pant leg, I am using negative punishment. And if I hold the dog in place until it tries to go toward the ball, I am using negative reinforcement. These are possible scenarios based on the assumption the dog wants to be trained and is interested in interacting with the ball. They are not necessarily examples of good training.
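For readers who like to see the rule spelled out, the four ball scenarios above can be sketched as a small lookup in Python. This is only an illustration: the names `Quadrant` and `classify` are made up for this sketch, and it assumes we already know what the animal experienced after the behavior and whether that behavior became more or less frequent afterwards, which in real training is the hard part.

```python
from enum import Enum


class Quadrant(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_added: bool, behavior_increases: bool) -> Quadrant:
    """Name the quadrant from what happens AFTER the behavior.

    stimulus_added: True if something was added after the behavior,
        False if something was taken away.
    behavior_increases: True if the behavior becomes more frequent
        over time, False if it becomes less frequent.
    Prompts and cues (what happens BEFORE the behavior) are ignored.
    """
    if behavior_increases:
        return (Quadrant.POSITIVE_REINFORCEMENT if stimulus_added
                else Quadrant.NEGATIVE_REINFORCEMENT)
    return (Quadrant.POSITIVE_PUNISHMENT if stimulus_added
            else Quadrant.NEGATIVE_PUNISHMENT)


# The four ball scenarios from the text (hypothetical outcomes assumed).
scenarios = {
    "click and treat for touching the ball": (True, True),
    "yell at the dog for sitting on the ball": (True, False),
    "take the ball away for chewing my pant leg": (False, False),
    "release my hold when the dog goes toward the ball": (False, True),
}

for description, (added, increases) in scenarios.items():
    print(f"{description}: {classify(added, increases).value}")
```

Notice that the ball itself never appears as an input. It is a prompt, so it plays no part in deciding which quadrant is in use.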
When I am training one behavior, this is pretty simple. My dog sits and I give it a dog biscuit. This is positive reinforcement. But what if I ask my dog to sit, then I ask it to lie down and then I give it a dog biscuit? This would be a sequence of behaviors. The lying down is positively reinforced by the dog biscuit. Is the sit reinforced? I have to look at what follows the sit. In this case, the sit is followed by the cue "down." If the dog has been positively reinforced for lying down on cue, then the cue "down" will actually reinforce the sit. This is because the dog's perception of the cue "down" is that it is a good thing since it could lead to reinforcement. If you are new to clicker training, this might seem confusing but I bring it up here because I want to make it clear that many different things can act as reinforcers and giving a new cue can reinforce the previous behavior if the dog has a positive response to hearing the cue.
Is it positive or negative reinforcement?
This seems like it could not possibly be a source of confusion at first, but there are situations where it can be hard to figure it out and I find that people do get confused about whether they are adding something "good" or taking away something "bad." This is particularly true if both are coming into play. Sometimes the trainer is focused on using positive reinforcement and negative reinforcement is actually what is driving the behavior and vice versa. This is one of those times where I think you have to look at the situation and consider both the technical definitions and the animal's point of view. Does the animal think you are removing something or adding something?
If I am using food rewards, it is easy to identify when I am using positive reinforcement. But if I am using a different kind of reinforcement, it can get more complicated because I have to figure out if I am adding something good or taking away something bad, and this is from the point of view of the horse. It sounds simple, but it is not always. In some ways, it doesn't matter as both are reinforcement, but it can make a difference in troubleshooting if the behavior deteriorates. If you think back to some of the examples of negative reinforcement, there were some where one could look at the reinforcer as being the removal of something or the addition of something.
Did any of you read the link to the car assembly line story? A company had an assembly line that was too slow and they could not add reinforcement by providing extra pay or some other incentive. So they told the workers that if they completed enough units in less than an hour, they could take the rest of the hour as a break. The site lists this as an example of negative reinforcement because by working harder, they removed the requirement to work the rest of the hour. When I first read this, it seemed like it was just as possible that the workers were positively reinforced for working harder by getting a break.
But I think the reason this is negative reinforcement comes down to the fact that an aversive was added (nagging to work harder) and it was reinforced by the removal of some work time. The reason this works is because the workers had an expectation of how long they had to work in the first place. The boss removed work time from the hour, he did not add time by giving a break. Maybe this is a matter of perspective, but I think it is more than that. What if the boss had said that for every hour they completed "x" number of units, they could take a break but they still had to work a full 8 hour day so the break time ended up increasing the length of their workday. That would not be removing anything in the sense of work to be done. It would just be adding something positive (a break) so it would be positive reinforcement. Would it have been as successful? Somehow I don't think so.
I think there probably was some positive reinforcement going on too, because if the workers did things they liked (ate, sat down, etc...) during the break, then it is not just the removal of work, it is also adding positive things. But I would guess that was not what was driving the behavior. It was the removal of work that was most important. I started to think about if this applied to horses, but it is harder with horses because we cannot make verbal agreements ahead of time. The best I could come up with was if I trained a horse to expect to have to complete a certain behavior or pattern, letting the horse stop early would be negative reinforcement. The thing is, I don't see how this would work over time because as soon as I changed the pattern, the horse's expectations would change.
Most of the time when we use negative reinforcement with horses, we are thinking of removing something in order to reinforce the behavior we do want. Recently someone pointed out to me that any time we work with our horses, we are using negative reinforcement, and this could be true, but it is on a different level than the car example above. When I ride or ask a horse to do something that requires energy, allowing the horse to stop can be reinforcing. This works because stopping removes any discomfort associated with moving. It is not because the horse is reinforced by not having to continue moving. In this scenario I don't think the horse can be reinforced by removing something that hasn't happened yet.
And I am not talking about pain or saying horses don't want to move, but rather that any kind of exertion beyond the horse's fitness level is going to cause some discomfort that is removed by allowing the horse to stop. Of course, we have to be careful about making assumptions here because horses don't always find stopping reinforcing. There are going to be some days when they would prefer to move. And stopping might not just be about negative reinforcement. I have a horse that loves to look around. When she gets to stop, she gets to look around. So if she does something I like and I let her stop, there are times when she is being negatively reinforced for stopping (by the removal of any aversive associated with moving) and times when she is being positively reinforced by permission to look around. I think being aware that both things are happening at once and being able to evaluate which is driving the behavior can be an important part of being an effective trainer.
Perhaps this is all making it seem too complicated? Do you really need to know if you are using positive reinforcement or negative reinforcement? I think most of the time when we train, there are multiple reinforcers at work and it is not worth overanalyzing every situation, but I do sometimes find it useful to do so when my training plan is not producing the desired result. It helps me to be able to tease apart the various reinforcers in any one situation so that if I hit a training bump, I can work through it.
Is it positive punishment or negative reinforcement?
Positive punishment and negative reinforcement are the two quadrants where I add an aversive stimulus. In one case I am applying the stimulus to decrease an unwanted behavior and in the other case, I am using the stimulus to get the horse to do a different behavior. I could just say the difference is a matter of timing, intent and feel, but that is assuming that the horse can correctly read my intention and I think it is incorrect to assume that just because I am not focused on punishment, it is not happening. I have already explained that the use of the negative reinforcer can also be a positive punisher so when I want to distinguish between positive punishment and negative reinforcement, I really mean learning to tell the difference between positive punishment, the positive punishment/negative reinforcement combination and negative reinforcement without positive punishment.
Most of the time when I use negative reinforcement, it falls into one of two general categories. I am either using it to teach a new behavior or I am using it to interrupt or redirect a horse that is performing an undesirable behavior. In the first case, I am using negative reinforcement as a teaching tool and I don't want the horse to feel punished. I don't necessarily dislike the behavior the horse is currently doing. I just want to teach it something new and my aim is to use negative reinforcement without positive punishment. In the second case, I don't want the current behavior and I want to use negative reinforcement to ask the horse to do something else. It is in this situation that it gets tricky separating out negative reinforcement from positive punishment because I probably do want to decrease the behavior the horse is doing. This could happen through punishment or that behavior could decrease because it is not possible for the horse to do both behaviors at once.
The question is what is happening when I use an aversive to stop a horse from doing something. In general, when I respond with an aversive after the behavior occurs, I am using punishment. When I use an aversive before the target behavior occurs and remove it when the horse changes its behavior in a way we like, I am using negative reinforcement. But what if I apply the aversive as the behavior is still occurring? I think this is where the choice of positive punishment vs. negative reinforcement can make a difference. In both cases, I am adding a positive punisher to interrupt the unwanted behavior. But I could be using either positive punishment or negative reinforcement depending upon what I use as an aversive and how I apply it. I think it is important to know which I am using because it is much easier to get long term changes in behavior by using negative reinforcement than it is by using positive punishment.
So how do I know which I am using? I thought it would be helpful to come up with some guidelines, but every time I wrote something down, I could think of an exception. I would like to help people identify when they are using punishment vs. negative reinforcement because I think using negative reinforcement makes the trainer focus on what the trainer wants the horse to do. I can think of lots of examples of ways to use positive punishment that are less aversive than some applications of negative reinforcement so I am not saying that negative reinforcement is always the best choice. I just think that starting to address problem behaviors in terms of redirecting the horse is a change in the right direction. I also think that if I am thinking in terms of using negative reinforcement going into a situation, I can use a mild aversive with negative reinforcement and avoid putting myself in a situation where I end up using punishment.
I did come up with the following questions that I ask myself (these are guidelines, not rules):
1. Am I using the aversive to make the horse stop doing something, or am I asking the horse to do something else?
2. What is my timing? Do I apply the aversive once and expect the horse to change, or do I maintain the aversive until the horse does change? This can be a tricky one because stopping an unwanted behavior is a change, but it is not the horse choosing another behavior. I find that when people use punishment it is usually quick and over with, and then they wait to see if the horse does it again. Or they continue using punishment after the horse has stopped doing the undesirable behavior. With negative reinforcement, the aversive is applied until the horse chooses to do something else. This gets into a fundamental problem with punishment which is that often by the time people respond with punishment, the horse is already doing a different behavior and the punishment is ineffective.
3. How strong is my aversive? Usually if I am redirecting, I can use a milder aversive than if I am using punishment. I think a common pattern with punishment is that the stimulus has to get more and more aversive over time to be effective. This happens when the stimulus is used ineffectively and inconsistently. With negative reinforcement, the stimulus gets less aversive over time, but still remains effective.
The above questions are useful for analysis after the fact, but I don't usually have time to think about them when I am responding to something my horse is doing. And since a lot of unwanted behavior happens outside of formal training time, there can be a lot of other variables. But I think starting to separate out ways to use negative reinforcement instead of positive punishment, even if it is the positive punishment/negative reinforcement combination, is a useful first step away from just using punishment. There is a lot of gray area here, at least for me. Depending upon how I look at it, I seem to be able to analyze it in different ways. The only way to really know which I was using would be to have a good way of monitoring future behavior to see if I am getting a decrease in the unwanted behavior and/or an increase in another behavior. I could do this in the controlled environment of a lab, but I don't think I can do it in real life, other than to look at general trends. Even if I am not sure what I am using, using the aversive to redirect the horse is usually going to be a better training solution than just telling it to stop.
Another way to think about it would be to compare the following example. I am training my horse to back up when I take the slack out of the lead. I am in my ring calmly working away and things are going well. My horse is interested in working with me and I am using just enough feel in the line to get his attention. I am using negative reinforcement but the aversive quality of it is pretty low. The chances of seeing a decrease in the starting behavior (standing still with nose forward) are pretty small and if it does decrease, it is not because it is being punished, but because I am reinforcing an incompatible behavior. I would say that for all practical purposes, I am using negative reinforcement without punishment.
But what if the training environment changes because the wind comes up, or the horses in the field next door start running? My horse is no longer happy to stand and work with me. He wants to move. Now the change in the feel of the line becomes aversive and because of the horse's energy level, I might have to take a stronger feel. I am still using negative reinforcement but the pressure on the line might now be acting as a positive punisher too. This is because my focus might have changed and the horse's perception of the stimulus might have changed too. When I started, the feel on the line was just me making a request. Now it is preventing him from doing what he wants. So the balance between the positive punishment and the negative reinforcement has changed because of changes in my application of the stimulus and in the horse's mental state, which affects his perception of the stimulus. Now instead of using negative reinforcement to keep working on backing, I might find I am using the lead to stop the horse from barging forward or running around me. If this happens, I am now using punishment too because I actually do want the new behavior (barging or nervous activity) to decrease.
I still think this is different than if my horse starts barging and running around so I jerk on the lead rope, or yell at him, or smack him. Even though my goal with the upset horse is now to decrease the agitated behavior, I am still using negative reinforcement as a tool to get a behavior I want. It just has a component of punishment included in it. Do I need to worry about punishing the horse? I think it depends. If the horse is new to learning this behavior, I would be better off to ask for something he does know instead of continuing the lesson. If I continue, there is the possibility that the amount of aversive that I have to use is going to create long term effects in his perception of the rein cue, or that particular exercise, or even what happens when he gets excited.
You might wonder what is the point of teaching with negative reinforcement if I can't use it when I need it. But that is not the point I am making. The point I am making is that when the horse is learning a cue with negative reinforcement, I want to be careful about using it in certain situations. This is why it is worth practicing some of these cues that are based on negative reinforcement over and over again. I want the behavior to be so automatic when I ask, that I can use the cue without having to make it more aversive. I think this goes back to the value of practicing basic skills until they are really solid. If I take the time to train a behavior with negative reinforcement until the behavior is really solid, then the behavior will be there when I need it and I can ask for that behavior instead of using punishment. In most cases, I find it is less stressful on everyone to ask a horse for an already trained behavior than it is to just try and stop an unwanted behavior.
I want to mention one other difference between positive punishment and negative reinforcement, and I think it helps explain why clicker trainers can use negative reinforcement, but don't like to use positive punishment. Besides positive reinforcement, negative reinforcement is the other quadrant of the operant training grid where the animal has some kind of choice. When I use positive reinforcement, the training is driven by the animal wanting to repeat the behavior. It has the choice of whether or not to engage with us. In negative reinforcement, the animal also has some level of choice, meaning it has some control over how the trainer uses negative reinforcement. This is different than positive and negative punishment where the animal does not have any choice. If it is behaving in a way we don't like, we either add or remove something as punishment, often in a predetermined way.
Some people might argue with the use of the word "choice" because the trainer chooses the stimulus. But in good training with negative reinforcement, the animal can change some aspects of the application of the stimulus through its own response. For example, if I put a rat in a box and a shock is applied, the rat can learn that jumping across the barrier removes the shock. Depending upon the intensity of the shock, the rat's tolerance, and any other relevant factors, the rat can control how long it is shocked. If the rat hates being shocked and the barrier is easy to jump, it is going to jump as soon as it feels the shock. It might even find a way to avoid the shock entirely if there is some way to predict when the shock is coming. This is what I mean by choice because the rat makes a choice about what to do based on the intensity of the stimulus, the presence of a warning (or predictor of the stimulus), and the difficulty of the behavior that would remove the stimulus.
The fact that the animal can learn to escape from or avoid the stimulus is one of the things that can happen when using negative reinforcement. This is called the escape-avoidance aspect of negative reinforcement and I am going to talk about it in more detail in the next article because I think it has implications for choosing aversives and the use of cues. The rat that learns to jump over the barrier as soon as the shock is applied has learned to get away from the stimulus once it has started, and this is called "escape." If the rat takes it one step further and identifies a way to predict the shock, and jumps before the shock happens at all, this is called "avoidance."
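The difference between escape and avoidance comes down to timing, and a small sketch may make that concrete. The function below is only an illustration: the name classify_trial and the idea of measuring times in seconds from a warning signal are my own assumptions, not part of any standard protocol.

```python
from typing import Optional


def classify_trial(shock_onset: float, jump_time: Optional[float]) -> str:
    """Label one trial in the rat-and-barrier example.

    Times are in seconds from the start of the trial (hypothetical units,
    for example counted from a warning signal). jump_time is None if the
    rat never jumped.
    """
    if jump_time is None:
        # The rat did nothing that would remove or prevent the shock.
        return "no response"
    if jump_time < shock_onset:
        # The rat jumped before the shock started, so it never felt it.
        return "avoidance"
    # The rat jumped after the shock started, which ended its exposure.
    return "escape"


print(classify_trial(shock_onset=5.0, jump_time=6.5))
print(classify_trial(shock_onset=5.0, jump_time=2.0))
```

The first call describes a rat that jumps a second and a half into the shock (escape), and the second a rat that has learned to read the warning and jumps early (avoidance).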
So in some applications of negative reinforcement, the animal can make choices about its own exposure to the aversive stimulus. This is not true with punishment. Punishment is often used in such a way that the animal cannot change the nature of the punishment once the punisher decides to deliver it.
Some Real Life Training Examples
Let's look at a couple of scenarios and see how to recognize what quadrants are being used and how a trainer can adjust for that to get the training result she wants. I want to point out that there are some gray areas here. I still get a bit tangled up sometimes thinking about what is happening, but I find it a useful mental exercise and it does get me thinking about all the possible things that could be going on. I think that using the technical definitions is helpful for analyzing what part of operant conditioning is being used, but that is not the only thing that determines how a training strategy affects the horse. It is not just what quadrants you use: it is how you use them.
I am going into quite a bit of detail here, which may seem unnecessary, and for a lot of training situations this kind of over-analysis is not necessary. But if you clicker train enough horses, eventually you will meet one for whom all these little details matter. Being able to look at the situation from all sides and considering all the possible quadrants in use is going to be important in working through training challenges with some horses.
Case 1: Using positive reinforcement to increase one behavior so that an unwanted behavior disappears.
I want my horse to stand quietly by the corner of the gate so I can open it without him pushing or barging to get out. The normal situation is that I walk out to the field and the horse is hanging over the gate so I cannot even open it. As soon as I get it open a bit, he tries to squeeze through the opening and is generally rude about the whole thing. I want to use positive reinforcement to teach the horse an acceptable behavior, so I put a target on the fence near the corner of the gate and reinforce the horse for standing there. I do this by walking the horse out to the field and spending time reinforcing him for standing at the target. I do this as a separate exercise, so I am not doing it in response to the unwanted behavior and I am doing it at a time when the unwanted behavior is not likely to occur.
When the horse has learned the new behavior and I have put it on cue, I ask the horse to do it as part of our daily routine. I walk out to the field and the horse is hanging over the fence. I ask him to go to the target and reinforce it. Over time the horse will probably start going to the target when he sees me coming and the gate hanging behavior will have decreased. In this scenario, I have increased the behavior of standing quietly at the target through positive reinforcement by clicking and treating the horse for being at the target. I have decreased the behavior of hanging over and barging at the gate.
Have I punished the behavior of hanging over and barging at the gate? Well, it has decreased because the horse cannot be at the target and hang over the gate at the same time. This was the goal of teaching the horse to go to the target, so it is a desirable outcome. But has it been punished?
Because the horse had to choose between hanging over the gate and going to the target, punishment is not necessarily the reason for the decrease in the gate hanging behavior. Animals make choices all the time about which behavior to do and this can be as simple as choosing the behavior that is more likely to be reinforced. This is called the Matching Law and states that if people or animals are given a choice between several behaviors with different reinforcement rates, they will divide their behavior among them roughly in proportion to the reinforcement each one earns, so the behaviors with the higher reinforcement rate get more of their time. Even when behaviors are reinforced at different intervals, people and animals will find a way to maximize their reinforcement through some combination of behaviors.
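For readers who like to see it written out, the Matching Law for two behaviors is often given in the simple proportional form below. The symbols are labels I have chosen for this example: B is how often each behavior occurs and R is how often each behavior is reinforced. Real horses will only follow it approximately.

```latex
\frac{B_{\text{target}}}{B_{\text{target}} + B_{\text{gate}}}
  = \frac{R_{\text{target}}}{R_{\text{target}} + R_{\text{gate}}}
```

Read this way, if standing at the target earns most of the reinforcement available at the gate, the horse should spend most of its time at the target, and gate hanging drops without anything being added to suppress it.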
Gate hanging was not punished here because I taught an incompatible behavior with a higher reinforcement rate, and I did this as a separate exercise. I did not redirect the horse from the gate to the target until he knew the new behavior. If I go out and the horse is hanging over the gate and I cue the horse to go to the target, I still don't think I am punishing the behavior of hanging over the gate because in the horse's mind, this is all about going to the target, not about being over the gate.
I could have decreased gate hanging by using an aversive and it might have initially seemed simpler and quicker. I could have shooed the horse away from the gate with my arms or a whip. I could have made the horse back up when it crowded me or hung over the gate. In those cases I would have used positive punishment or positive punishment/negative reinforcement. Even training the horse to go to a target did not guarantee I would not be using aversives because what if I taught the horse that being at the target was a good thing, but then used pressure to send the horse to the target? That would have been better than just chasing the horse away from the gate, but it would have changed the meaning of the target. Instead of the target just being a place to go and earn reinforcement, the target would be a way to avoid punishment. Interestingly enough, I don't think that makes the targeting behavior stronger. I think it adds tension and anxiety and might make the targeting behavior weaker. This goes back to poisoned cues and what happens when you combine positive and negative reinforcement.
Case 2: Using negative reinforcement combined with positive reinforcement to increase a behavior
A rider is training a horse to walk forward from a halt when she uses her leg. This is a standard example of how negative reinforcement is used in horse training. She applies her leg aid which is some kind of pressure (tap, squeeze, kick) and when the horse moves off, she releases the pressure. The horse is able to remove the aversive (leg) by walking off. Walking increases and standing at a halt decreases. To keep this simple, let's start by looking at what happens when she is using negative reinforcement alone, and then add in positive reinforcement later.
What happens to the behavior of standing at a halt? I already said it decreased. This could be for two reasons. It could decrease because the horse cannot walk and halt at the same time. Or it could decrease because the aversive has made the horse actively avoid halting. In both cases, it meets the technical definition for punishment (adding something decreases behavior) but to the rider, they are going to have different feels and outcomes and they are going to create different emotional responses in the horse.
Looking at what happens to other behaviors when you increase one behavior is important. The rider needs to recognize that training the horse to walk is going to make the horse want to spend less time standing at a halt. She needs to understand that punishment could be happening whether she intends it to happen or not. Is the horse spending less time standing because it cannot walk and halt at the same time and it is choosing the behavior with the higher reinforcement rate? Or is something else going on? She needs to ask herself two questions. Does the horse feel punished for standing? And how do I increase one behavior without losing another?
So is she using punishment or does the horse "feel" punished? I think this depends upon how well she has prepared her horse. If the aid is applied in a mild way and she removes it as soon as the horse walks off, then any use of punishment is probably minimal. But if she hasn't prepared the horse and keeps adding leg until the horse wants to get away from it and moves off, then there is probably more punishment involved (in conjunction with the negative reinforcement). If she is using only negative reinforcement, the horse will tell her how aversive the leg cue is and if he is being punished for standing still. A horse that no longer wanted to halt would be indicating that he was trying to avoid the leg cue entirely. A horse that became very light to a leg cue might also be trying to avoid the aversive. Both of these would result in a reduction in the behavior of standing still.
Stimulus control comes into play here so I am not suggesting that a horse that is eagerly walking off is avoiding punishment, but if I have a horse that consistently avoids halting, even if it has had reinforcement for that behavior, I want to look more carefully at what is going on. In this example, the rider doesn't want the horse to stop standing at a halt because she wants to have both behaviors. This is why when I teach one exercise, I often need to balance it out by teaching another exercise so that the horse does not replace one behavior with another, but instead learns that both behaviors are desirable, just at different times.
She can get a lot of feedback from her horse on how he perceives the stimulus by how he changes over time. But what happens if she adds positive reinforcement? If she is using positive reinforcement, then it gets a bit more complicated because now the rider has to decide if the horse is moving off to avoid the aversive or is motivated by the possibility of positive reinforcement. The rider would need to be able to carefully evaluate her horse's response to the leg aid to see if punishment is an important factor. Either way, the behavior of halting is still going to decrease so she is going to have to look at the horse's attitude to see how aversive he is finding the leg cues.
If I am only concerned with getting behaviors, I might not care if the horse was working to remove an aversive vs. working to gain positive reinforcement. But good trainers are aware that the mental state of the animal is an important consideration in training. Adding positive reinforcement changes an animal's mental state and also changes the animal's perception of the stimulus. Even though the behavior of "halting" has been decreased by reinforcing going forward, I don't want it to be punished. I want it to decrease just because of the fact that a horse cannot be walking and halting at the same time and since I am reinforcing walking, the horse is choosing to spend more time walking.
Adding positive reinforcement can make it harder to identify aversives because the horse might tolerate the aversive to earn the reinforcement. But it has some significant advantages too. Positive reinforcement can make the aversive milder or non-aversive by adding a positive association. And it is easy to avoid punishing one behavior when that behavior is decreasing because of the application of an aversive. All I have to do is start reinforcing halting as well as going forward and start putting the two behaviors on cue. The horse learns that it is not that one is right and one is wrong, but that both are right at different times.
This was just a little peek into how to combine positive and negative reinforcement without the horse feeling punished. Combining positive and negative reinforcement is the topic of the next article on this subject so I am not going to go into any more detail here. What I want to emphasize here is that punishment can happen intentionally or as a by-product of reinforcing another behavior.
Case 3: I am training my horse one behavior and he keeps offering another behavior.
I want to use positive reinforcement to reinforce a new behavior, but he keeps offering a previously learned behavior. This happens a lot with horses new to clicker training or if I am training two behaviors with similar cues. Let's say I have taught my horse to drop his head from a lead cue. It doesn't matter if I trained it with positive reinforcement, or a combination of positive and negative reinforcement. The horse thinks a lift of the lead means drop his head.
Now I want to teach the horse to bring his nose to the side from a lead cue. I ask and he keeps offering to drop his head. I do not reward dropping the head and positively reinforce any movement of the nose to the side. Over time he offers bringing his nose to the side and stops offering dropping his head. This is great, but what is going to happen the next time I ask him to drop his head? I am probably not going to get it because I have been reinforcing the nose to the side and I have essentially extinguished head lowering in the session (by not reinforcing it).
Have I punished it? I think the behavior has decreased, but I have to ask myself "did I add or remove something after the behavior occurred?" If I am training with positive reinforcement alone and I just did not reinforce head dropping, then the behavior has not been punished. But if I responded with any kind of aversive, verbal or use of the rope, then the behavior has been punished. I do want to point out that when I am using a rope, the horse can make it more aversive than I intended by pulling harder against me. So when I ask did I add or remove something, I have to also consider if the horse added or removed something by how he interacted with the rope. I also have to take into account extinction which is when a previously reinforced behavior disappears because it is no longer being reinforced.
The good news is that usually it is easy to just get the behavior back by reinforcing it a few times. However, I think that this process can allow some frustration to creep in and you can end up with a horse that has some anxiety about the cues used to ask for the behaviors. For this reason, if I am training a new behavior and my horse keeps offering the same previously trained behavior over and over again, I will go back and change something about the presentation or the set-up so that I am not punishing a behavior I want at the same time I am training a new behavior.
A cheat sheet for the Grid
This is not intended as anything other than a reminder of some of the topics I have discussed. There are no absolutes about which quadrant is going to be the most effective and stress-free for the animal. In general, positive reinforcement is the most forgiving and flexible quadrant, but some trainers and animals benefit by adding in negative reinforcement and/or negative punishment.
Positive reinforcement (+R): addition of something increases behavior
advantages: no aversives, animal has free choice, very forgiving, good for animals with emotional issues
disadvantages: dependent upon animal offering something to shape, requires trainer learn new skills such as timing, observation, and how to shape behaviors (these are good skills to learn so learning them is an advantage, but they can take some time to acquire), poorly timed or unpredictable reinforcement can create stress, frustration and aggression.
main tool of clicker trainers, click and reinforce is positive reinforcement and shaping is based on using positive reinforcement
Positive punishment: addition of something decreases behavior
advantages: if done correctly, it can suppress unwanted behavior
disadvantages: hard to do correctly, unwanted side effects such as frustration, aggression, fear, or animal can shut down and disengage from the trainer
this is an advanced training tool and is not recommended
Negative punishment: removal of something decreases behavior
advantages: can be very effective at decreasing unwanted behavior if used in conjunction with positive reinforcement. A mild way to interrupt unwanted behavior or unwanted behaviors that have crept into behavior chains
this is also an advanced training tool, but is usually safer to use than positive punishment
Negative reinforcement: removal of something increases behavior
advantages: can be used to prompt and shape behavior, it is easy to teach tactile and pressure cues using it
disadvantages: doesn't encourage the same kind of free thinking as +R, easy to escalate and/or get into punishment
used by some clicker trainers more than others
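The four definitions in this cheat sheet come down to two questions: was something added or removed, and did the behavior increase or decrease afterwards? As a memory aid, here is a minimal sketch of that lookup. The function name and the input words are my own choices for illustration, and as discussed above, real situations often involve more than one quadrant at once.

```python
def quadrant(stimulus_change: str, behavior_change: str) -> str:
    """Name the quadrant from what was done and what the behavior did.

    stimulus_change: "added" or "removed" (what happened after the behavior)
    behavior_change: "increases" or "decreases" (what the behavior does over time)
    """
    names = {
        ("added", "increases"): "positive reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
        ("removed", "increases"): "negative reinforcement",
    }
    key = (stimulus_change, behavior_change)
    if key not in names:
        raise ValueError(f"expected added/removed and increases/decreases, got {key}")
    return names[key]


# The leg aid in Case 2: pressure is removed when the horse walks, and walking increases.
print(quadrant("removed", "increases"))
```

Note that the second input can only be filled in by watching future behavior, which is why it is hard to be sure which quadrant was in play in the moment.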
A Few Last Words
I think a lot of clicker trainers get along quite well without being well versed in all the quadrants of operant conditioning. If you just stick to positive reinforcement, you can do a lot of training without needing to know anything about how to use the other quadrants and this keeps things simple. I encourage anyone new to clicker training to spend as much time as they can learning to use positive reinforcement to build behaviors.
Does that mean your time reading this was wasted? I hope not <smile>. Even as I was writing this, I kept thinking "is this going to be useful to people?" and "am I making it too complicated?" But I kept going back and revising and adding bits and pieces. I did this partly because I found it a useful way to organize my own thoughts and because it made me really think about things. But it was also because I really do believe that learning more about operant conditioning has improved my training.
I think many of us drift into the other quadrants without realizing it or really thinking about what we are doing. It might be because we are not paying attention to what we do when we are not actively training so we don't see the long term effects of our behavior. Or it could be that we are acting out of habit and assuming that what we are doing is just part of horse training. My goal is to make people more aware of what they are doing and start to see the connections between what happened this morning in the field and what happened this afternoon during training. I find that horse people do not necessarily look upon themselves as animal trainers and information about operant conditioning is out there, but not in horse training books. I wanted to put information about operant conditioning out where horse people might find it.
People are drawn to clicker training for different reasons and I think this is one of the strengths of clicker training and one reason it is so interesting to try and teach it. As a teacher, I might spend time on the specifics of teaching people how to train different behaviors, but it is also important for me to teach people about the process and philosophy of training. I want them to learn how to come up with their own training plans and solutions so that they can work on their own too. Learning about clicker training has changed so many things about my life that it is hard to identify them all, but one that is obvious to me is that clicker training gave me a sense of empowerment. I learned that I had the ability to solve my own horses' training challenges. That doesn't mean I might not need some help now and then, but it means that I could systematically work through a lot of issues that I might otherwise have had to accept and manage or hire someone else to fix.
This is one reason I think understanding more about the science behind clicker training is worthwhile. Regardless of where you are coming from, knowing more about operant conditioning just gives you more options and makes it easier to understand how to make good training choices. I hope learning more about operant conditioning helps trainers make better choices because understanding the terminology makes it easier to apply each quadrant effectively and is a good way to avoid getting caught up in the moral questions of whether some quadrants are better than others. We all need to listen to our gut feelings and conscience when we are training, but we also need to make informed choices about what is the best training tool to use at any given time.
Since I started clicker training, I have met and continue to meet a lot of other horse people who are also learning clicker training. I meet people who found clicker training because they had difficult horses that were not successful under other training methods. They often achieve great success by switching to clicker training and focusing on using positive reinforcement only. For a horse that has a lot of baggage, this is a huge change in the training environment and some of them really blossom. The absence of any kind of coercion gives these horses the chance to express themselves and make their own choices. People with these horses become big proponents of clicker training using only positive reinforcement because for them, it was so clear that nothing else worked and that the horses love that kind of training.
I meet other people who are happy with what they do and just want to add some refinement or increase their horse's motivation. Some of them do just add the click and treat to their current program (which is probably using other quadrants) and this is enough for them. Some of them go a little deeper and start to look at using more shaping and changing the way they use pressure and release and other traditional training methods. And some of them eventually get to the point where they are using as much positive reinforcement as possible. And, of course, there are all levels in between. It depends so much upon where the person is at that moment. Are they actively competing or do they have riding goals? Do they have their own horse? Multiple horses? Are they teaching? How much time do they have to invest in learning something completely new? Changing everything over?
I also meet people who want to clicker train horses because they have clicker trained other species. These people also love and promote using only positive reinforcement because it is what they know and they enjoy the challenge of figuring out how to use the same tools with a new animal. These people are often very interested in teaching animals how to learn and want to use the animal's intelligence to create a great team. For them, using positive reinforcement only is important as they want very operant and creative animals.
All these people are out there looking for information on how to use clicker training with their horse. They have different levels of horse related skills. They have different goals (have a nice pet, train a horse to a specific activity, rehab a horse with physical issues, etc.). And they have different goals for what kind of relationship they want to have with their horses. My goal in this article was to provide enough information so that anyone interested in clicker training horses could find something useful, maybe by seeing other training options that are available to them, maybe just by seeing why trainers make different choices. I also think that understanding the science behind clicker training helps trainers analyze why some trainers are successful and others are not. There are many qualities that make a great trainer and being able to identify what makes a trainer great is one of the first steps to becoming one.
Katie Bartlett (Feb 2009)
Thank you for reading. If you have any comments, suggestions or corrections, please let me know (email me at firstname.lastname@example.org). At some point I will update this article to reflect any changes in my training methods or philosophy and make any necessary revisions. This article started off as a little idea and grew and grew. I did a lot of reading and because it was for my own education, I did not take note of where I got different pieces of information. Although some of the content of this article is my own thinking, I read so much that I cannot entirely separate out where I got my ideas and some ideas that came to me when I was writing might have been based on information I had read in the past.
I guess this is my long-winded way of saying that I am not attempting to take credit for anyone else's work and this article is a synthesis of a lot of different material combined with my own experimentation and work. Some of the resources I used that I do want to acknowledge are:
Alexandra Kurland's books, DVD's and many conversations with her at clinics.
Kathy Sdao's lectures at Clicker Expo and her DVD sets "Advanced Clicker Training" and "Know way, Know how"
Ken Ramirez's lectures at Clicker Expo and the book "Animal Training"
Kay Laurence and her microshaping lecture at Clicker Expo
Paul Chance's book "Learning and Behavior"
Jesus Rosales-Ruiz's lectures at Clicker Expo and some conversations with him at those events
Discussions on various lists including Clickryder, Beyond Basics Clicker Training, Clicker Solutions and The_click_that_teaches
I also want to thank Lore Haug and Jane Jackson for reading it and offering suggestions, corrections and other feedback.