06/07/2020
In positive reinforcement, is negative punishment the other side of the coin? If you train with positive reinforcement, is it unavoidable that you will be using negative punishment in tandem?
This argument is one I have heard often. People love to say that if you don’t reward an incorrect response then you are “punishing” the learner. Therefore, you DO use punishment in your training. CHECKMATE.
Wrong.
No reinforcement is no reinforcement. It does not mean that the learner is being punished. A behavior that is not being reinforced will not continue. That does not necessarily mean that punishment is the explanation for its decline; rather, the behavior simply does not work. It does not result in reinforcement.
There are two types of punishment. We have talked a lot here about positive punishment, which is the addition of an aversive, something the learner does not like, following an undesired response to decrease it. An example of positive punishment is smacking the horse on the shoulder for pawing. An aversive, smacking the horse, is added following the response of pawing to decrease its likelihood. Pawing=smack.
Negative punishment is the less talked about form of punishment here. It is a bit more difficult to put into practice with horses and is therefore not commonly used. Negative punishment is removing an appetitive, something the learner likes, following an undesired response to decrease it. A human example is taking a child’s video games away after they talk back. Talking back=video games taken away. A horse example is to stop scratching a horse when he steps on your foot. An appetitive, scratching, is being subtracted or removed following the response of stepping on your foot. Stepping on your foot=scratching taken away.
Some would have you believe that the absence of reinforcement is synonymous with negative punishment, and this is just not true. Not adding an appetitive/no positive reinforcement does not equal negative punishment. In order for it to be negative punishment, an appetitive must be subtracted/removed/taken away. You cannot take something away that was never given in the first place. So, not giving an appetitive is not really the same as taking one away.
Back to our human example. Let’s say I have bought my child a new video game. When I get home from the store, my child talks back to me. So, I stow the video game in my closet and decide I won’t be giving it to him today. Will this affect the child? No. He hasn’t lost anything as a result of his behavior because he never had it to begin with. That would be a rather ineffective way of trying to punish the behavior of talking back, so why then do people want to count it as such in positive reinforcement training?
Now let’s say we are training a default behavior with positive reinforcement. We want to reward the horse for standing calmly with his nose to himself. We are setting our horse up for success and we have him in protected contact with a fence between us. The horse tries reaching over the fence for us, but we are standing safely out of reach and we do not bridge and reinforce the horse for reaching. Is this negative punishment? No. The absence of reinforcement is not negative punishment. Reaching=nothing/no reinforcement. Nothing happens. Good or bad. Reaching does not result in reinforcement, and so the behavior serves no purpose and it will decline. Soon the horse stands quietly, keeping his nose more to himself, so we bridge and reinforce! A behavior that works! The horse offers that behavior more and the reaching stops.
This does not mean that we are using negative punishment. There was nothing being taken away. I am not taking food away from the horse because it has not been given to him to be taken away. If I wanted to negatively punish the horse for reaching, I could leave. By leaving, I would be removing an appetitive: my presence/attention and the opportunity for reinforcement. Soon the horse would learn that reaching=human taken away.
When someone insinuates that not reinforcing an incorrect response is negative punishment, the claim fails by definition. Something good must be removed following the undesired behavior. When reinforcement is withheld, you are not removing an appetitive as you would in negative punishment. Nothing is being taken away. It's just that nothing is being added.
For negative punishment to be perceived as such, the learner must be impacted by the loss of the appetitive so that they can make the connection that its removal was a result of a particular response. You cannot take away something they don’t already have. So, how can this connection be made that something they liked was taken away as a result of an incorrect response? It can’t. All they know is that their response did not result in reinforcement. It didn’t work. They should try something else to see if it will work.
I would be more comfortable calling this process extinction than negative punishment. For example, if the horse that reaches had learned previously that reaching resulted in reinforcement from owners who fed the horse a treat every time he shoved his nose in their pockets, then by not reinforcing that behavior anymore, it would stop through the process of extinction. Extinction is when a behavior fades or disappears because it no longer results in reinforcement. However, in some cases a behavior that is tried in an effort to earn reinforcement does not have to go through an extinction process because it has no reinforcement history. It is a novel behavior. The animal tries it and, when it doesn’t work, moves on to the next one that might.
I have known a few nifty horses who could untie themselves and open stall doors. One horse I knew would not stop at just letting himself out of the stall; he would release all his barn mates too! This was problematic for the barn owners, and this particular horse soon found himself bolted in his stall with a special lock he could not unlatch himself. He quickly realized no amount of fiddling with the latch would open the door and the behavior diminished. This is another example of extinction. The reinforcement, escaping/freedom, was no longer the outcome, which is why the behavior stopped, not because the behavior was punished. There was nothing to take away as a consequence for fiddling with the latch. Freedom could not be taken away as punishment for his crime of trying to open the stall door, because he was already locked in his stall! Trying to open the latch=nothing/no reinforcement. The behavior did not work. So, he quit.
If I don't go to work, I don't get a paycheck. Reinforcement, in the form of my paycheck, is contingent on me showing up to work and doing my job. If my boss does not give me a paycheck because I did not come to work all week, I missed an opportunity for reinforcement. My boss did not take my paycheck away as a consequence for not showing up; I never got a check. Reinforcement is not being added. Nothing is being taken from me. Staying at home=no reinforcement.
Now, say I show up to work and put in my time. I should earn a check that reflects the hours I worked. But my production is low, so my boss docks my pay. Now he is taking something away from me, and this is negative punishment. Low production=money taken away.
Reward-based training does focus on giving or adding a reward/appetitive for the correct response. But that is an overly simplified explanation. The biggest part of reward-based training is setting the learner up to succeed, so they don’t experience a lot of failure that results in confusion and frustration. Basically, this means we try to limit incorrect responses in the first place. This involves things like environmental and antecedent arrangement, shaping and thin slicing, and using targets and mats to explain what we want so we can initiate those reinforceable moments.
Basically, no reinforcement is no reinforcement. You cannot subtract or remove an appetitive that has never been given in the first place. Not reinforcing incorrect responses is not quite the same as negative punishment. And last, reward-based training works best when you find a rhythm, when the horse steadily offers correct responses and in turn receives reinforcement for them. If you hit a dead spot, it’s best to evaluate how you can help the horse get it right and regain momentum. A lot of lulls or incorrect responses is not ideal. Rearrange the training situation. Lower the criteria, return to the previous step in your shaping plan, use a target, or figure out how to explain what you want better. Set the horse up to give the correct response, so when you try again you can hit your stride.