Tuesday, May 5, 2009

Operant Conditioning

Today I am going to try to tackle the concept of Operant Conditioning (OC). I wonder whether I am crazy to try, considering all the controversy surrounding it, but I hope that if I remain neutral enough, I can shed some light on this often confusing subject.

Before I start on OC, I should begin with Classical Conditioning. The concept originated with Pavlov, who was doing experiments with dogs. He would ring a bell and then feed a dog. After repeating this several times, he found that the dog would begin to salivate as soon as it heard the bell, even when it was not being fed. This process came to be known as Classical Conditioning.

Later, a man called Skinner began experimenting with rats and other animals, including humans. He wanted to find out what it would take to get a subject to change its behavior, because what he really wanted to study was learning. The only way he could actually measure learning was by observing a change in behavior, so he defined learning as a change in behavior. He later found that animals would change their behavior in response to at least two different kinds of stimuli applied after the behavior occurred. This was in sharp contrast to Pavlovian Classical Conditioning, in which the stimulus (the ringing of the bell) came BEFORE the behavior and in fact caused it.

In Skinner’s approach, the stimulus came AFTER the behavior was observed, and it would either increase or diminish the behavior. Skinner called this process Operant Conditioning. Note that in Operant Conditioning, Skinner had to wait until the subject offered the behavior before he could do something to encourage it or diminish its frequency.

So the fundamental difference between Operant and Classical is the following:

In Classical Conditioning, the stimulus comes before the behavior.

In Operant Conditioning, the stimulus comes after the behavior.
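This before/after rule of thumb is simple enough to write down as a tiny decision function. The sketch below is a hypothetical Python helper: the name `which_conditioning` and the use of plain numeric timestamps are assumptions made for illustration, and it only encodes the rule stated above, not the full theory of either kind of conditioning.

```python
def which_conditioning(stimulus_time: float, behavior_time: float) -> str:
    """Apply the rule of thumb: stimulus first -> classical, after -> operant.

    Times are hypothetical timestamps, e.g. seconds into a session.
    """
    if stimulus_time < behavior_time:
        return "classical"  # e.g. the bell rings, then the dog salivates
    if stimulus_time > behavior_time:
        return "operant"  # e.g. the behavior is offered, then a treat follows
    raise ValueError("stimulus and behavior at the same instant: the rule cannot decide")
```

For example, a bell at 0 s followed by salivation at 2.5 s comes out as "classical", while a behavior at 3 s followed by a treat at 4 s comes out as "operant".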

So far, this has been very simple, but this is where the simplicity ends. Skinner later sorted these stimuli into two basic categories:

One in which the stimulus encourages a recurrence of the behavior, which he called a reinforcement.

And the other in which the stimulus discourages a recurrence of the behavior, which he called a punishment.

You can see where confusion starts to creep in. In our culture, the term punishment is associated with cruelty and misery. But Skinner was not looking at it that way. He was simply trying to classify a stimulus that would discourage a behavior or decrease its frequency. He could just as easily have called it a deterrent, and it would have meant the same thing. But he didn’t, and unfortunately we are stuck with the term and the stigma that comes with it.

Next, Skinner further subdivided reinforcements and punishments according to whether the stimulus is applied (something you give to the subject) or removed (something you take away from the subject). For these categories, he selected the incredibly unfortunate terms positive for giving and negative for taking away. Of course, Skinner simply meant positive and negative in the mathematical sense of adding or subtracting something. But again, language and culture predispose us to think of positive as good and negative as bad. It is a pity, because in mathematics we do not think of adding and subtracting as good or bad, but with reinforcements and punishments, the terms positive and negative have a profound effect on our attitudes toward the stimulus.

So perhaps some examples will help rid us of the notions of good and bad, or kind and cruel.

Consider a person who is talking a lot, and I want him to be quiet. I tell him that if he stops talking, I will give him some money. I want a behavior (the talking) to diminish, so according to Skinner, whatever I do to achieve that is considered a punishment. And because I am giving him something, it is positive. So if I pay someone to be quiet, I am applying a positive punishment.

Consider another person who is talking. Let us say that he is in a court of law and speaking out of turn. The judge could levy a fine on him for contempt of court, effectively silencing him. Again, we want the behavior of talking to diminish, so we are going to punish. But instead of giving him money to be quiet, we are taking money away for speaking. Thus we have applied a negative punishment.

Of course, the opposite could be applied.  If we want him to talk more, we could reinforce his talking by paying him to speak more, or we could levy a fine if he does not talk enough.  Thus, paying him to speak is a positive reinforcement, while fining him for not talking enough is a negative reinforcement.

Now consider a person who is complaining about a headache, and I am tired of hearing him whine about it. So I give him some medicine, the headache goes away, and now he is quiet. Herein lies the confusing nature of Operant Conditioning’s terminology. Is it positive punishment because I gave him medicine that made him quiet? Or is it negative punishment because I took his headache away?

Well, this is really a trick question. It is neither, because in this case the medicine was given and the headache went away BEFORE the behavior changed. The stimulus came first and was the cause of the change. Therefore, it is NOT Operant Conditioning. By the rule above, it falls on the Classical Conditioning side of the line, and incidentally, this kind of pairing is a contributor to the placebo effect.

Remember that in Operant Conditioning, reinforcements and punishments come after the behavior and cause it to increase or diminish.

Now, if we want to engage in training, we need to increase desirable behaviors and decrease undesirable ones. Then we need to connect those behaviors with a cue or command. Operant Conditioning lends itself very well to effecting these changes, but it is most effective if you first divest yourself of the cultural prejudices attached to its terminology. Reinforcements and punishments are not kind or cruel per se; they are simply a way of distinguishing between stimuli that increase and stimuli that decrease the frequency of a particular behavior.

So we can clearly identify 4 distinct mechanisms that will increase or decrease the frequency of a behavior.  They are:

  • Positive Punishment
  • Negative Punishment
  • Positive Reinforcement
  • Negative Reinforcement
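The four names are simply two independent yes/no questions crossed with each other: was something given or taken away, and did the behavior go up or down? A minimal Python sketch, assuming you have already answered both questions for a given situation (the enum and function names are hypothetical, made up for illustration), shows how each name falls out:

```python
from enum import Enum


class Change(Enum):
    """What happens to the stimulus after the behavior (the "sign")."""
    ADDED = "Positive"    # something is given to the subject
    REMOVED = "Negative"  # something is taken away from the subject


class Effect(Enum):
    """What happens to the behavior's frequency afterwards."""
    INCREASES = "Reinforcement"
    DECREASES = "Punishment"


def quadrant(change: Change, effect: Effect) -> str:
    """Combine the two answers into one of the four names."""
    return f"{change.value} {effect.value}"


# Situations described in this post, already answered.
examples = {
    "treat for a dolphin's jump": (Change.ADDED, Effect.INCREASES),
    "leg pressure released when the Horse moves": (Change.REMOVED, Effect.INCREASES),
    "judge's fine for speaking out of turn": (Change.REMOVED, Effect.DECREASES),
}

for label, (change, effect) in examples.items():
    print(f"{label}: {quadrant(change, effect)}")
```

Notice that nothing in the function knows whether the stimulus is pleasant or unpleasant; the name depends only on the two answers, which is the whole point about leaving “good” and “bad” out of it.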

The standard way that Operant Conditioning is used is in the following manner:

We pick a behavior that we like, or one that is at least close to what we like, and then use reinforcement to make it happen more and more often. Soon the behavior becomes very predictable. Once it is predictable, we borrow a bit (not a lot, just a bit) from Classical Conditioning: as soon as you can reliably predict the behavior, you start giving some kind of signal just before you think the behavior is about to happen. This signal is called a cue, and it can be a hand signal, a voice command, or anything else the subject can easily detect. Soon you will have classically conditioned the subject to perform the behavior right after the cue or command is given. Once the command is given and the subject performs the behavior, you reinforce it with either a negative or a positive reinforcement.

Whether you select a negative or a positive reinforcement depends entirely on logistics, mechanics, and your personal preference. When training dolphins, it is difficult to reinforce a behavior by jumping into the water and taking something away from them. So, logistically, the best way to reinforce a behavior is to give them something, such as a treat.

Horses, on the other hand, are traditionally trained almost exclusively with negative reinforcement. A rider applies leg pressure to the side of the Horse, and when the Horse moves in a desirable way, the rider removes (negative) the pressure of his leg. The Horse then begins to increase the frequency of that movement under those conditions. It is the same with the reins. When a rider puts pressure on a rein, the rein transmits that pressure to the Horse’s mouth. If the Horse turns his head in the correct direction, the pressure is removed (negative), and the Horse’s tendency to move his head in the correct manner increases.

This is the traditional way that Horses have been trained for millennia. However, there is a dedicated and growing group of people who are trying to train Horses using exclusively positive reinforcement, and this has generated a new movement toward what are considered gentler techniques. I have no desire to take sides on this issue. As I see it, animals learn through all four mechanisms, and I do not believe in applying one to the exclusion of the others, since I can see how any technique could be abused. But I will also concede that it is pretty difficult to be cruel using only positive reinforcement; in fact, one would have to be pretty creative to find a way to do it. And in a world where many people do not want to take the time to filter out all the ways in which they might abuse a method, sticking strictly to positive reinforcement is not a bad way to err on the side of caution. The only problem I see with this approach is that although groundwork with a Horse is very amenable to positive reinforcement, riding may be more difficult if one restricts oneself to positives only. In fact, I am not sure how to do it. Even shifting my weight until the Horse performs the correct maneuver, and then shifting it back, is technically negative reinforcement. But I am not an expert in the exclusive application of positive reinforcement. I work with animals according to what I consider their nature, the environment, and the mechanical configuration that presents itself. I do my best to be as kind and gentle as possible, and I leave training exclusively with positives to the experts in that field.

Because Horses are prone to a lot of undesirable behaviors, it is incumbent upon me to address the stimuli that reduce the frequency of behaviors, namely punishments. Although I recognize that punishments are not inherently bad or cruel, I also recognize that they are the most open to abuse and can easily be misapplied. Thus I avoid punishments wherever my imagination allows. To illustrate this idea, here is an example:

For the Horse that does not stand still for mounting:

The common way to handle this is to put one foot in the stirrup and, when the Horse starts to walk off, hop along with him until, by some dangerous miracle, you eventually pull yourself into the saddle. Then you take the reins and pull on them as hard as possible to make the Horse stop, or even back up. Foul language is often applied at this juncture, which, although helpful for the rider, is wasted on the Horse.

Consider this alternative:

First, teach the Horse the command “Stay”, and commit to teaching it independently of any other lesson. In other words, do NOT expect the Horse to learn this command during the ONE TIME a day you mount him to go riding. Make it a lesson of its own. To do this, take your Horse somewhere with no food or other distractions. Let him stand by himself and count silently to see how long he will stand on his own without moving. Let us say you manage to count to 6 before he starts moving. You now know his threshold. Let him stand there again, and count to 5. When you reach 5, say “Good Horse” and give him a treat. If he moves before you reach the count of 5, say “Wrong”, and put him back in his place. Then try counting to 5 again. If you consistently cannot reach 5, your threshold is too high, so lower your expectations. When you find a good threshold, repeat the exercise for a couple of minutes, and then slowly start increasing the count over a period of several days. When you can ask the Horse to stand still for a count of 10, you are ready to ask him to stand still for mounting.
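The counting exercise above is really a target count that moves with the Horse’s success. Here is a minimal Python sketch of that bookkeeping, assuming one session’s trials are recorded as pass/fail. The post only says “consistently” and “slowly”, so the success-rate cut-offs below (50% and 80%) and the one-count step are illustrative assumptions, as are all the function names.

```python
# Hypothetical cut-offs: the post only says "consistently" and "slowly",
# so these numbers are illustrative assumptions, not part of the method.
LOWER_IF_BELOW = 0.5      # success rate below which the count is too ambitious
RAISE_IF_AT_LEAST = 0.8   # success rate at which the next session may go up
READY_COUNT = 10          # from the post: a count of 10 means ready for mounting


def starting_count(baseline: int) -> int:
    """Start just under how long the Horse stood on his own (6 -> 5)."""
    return max(1, baseline - 1)


def trial_feedback(stood_for: int, target: int) -> str:
    """One trial: did he stay until you reached the target count?"""
    if stood_for >= target:
        return "Good Horse"  # then give the treat
    return "Wrong"  # put him back in his place and count again


def next_session_count(target: int, outcomes: list[bool]) -> int:
    """Choose the next session's count from this session's trial outcomes."""
    if not outcomes:
        return target
    success_rate = sum(outcomes) / len(outcomes)
    if success_rate < LOWER_IF_BELOW:
        return max(1, target - 1)  # consistently failing: lower expectations
    if success_rate >= RAISE_IF_AT_LEAST:
        return min(READY_COUNT, target + 1)  # raise slowly, one count at a time
    return target  # keep practising at the current count


def ready_for_mounting(target: int) -> bool:
    """The post's finish line: standing still for a count of 10."""
    return target >= READY_COUNT
```

For example, a Horse who first stood for a count of 6 starts at 5. A session where he stays for four trials out of five moves the next session up to 6, while a session of mostly failed trials drops it back to 4. Raising by a single count per session is one way to read “slowly over several days”; the treat on success is the positive reinforcement doing the actual work.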

This example gives us the foundation for a new approach. It is based on the idea that we want to give a Horse commands to DO something, as opposed to DON’T. When we are mounting and the Horse starts to walk off, we often find ourselves giving commands that the Horse has never been taught but that we are convinced he must understand. The most common of these is “Quit”. (In my case, I am erroneously convinced that the Horse is familiar with the meanings of many words that would make a sailor blush.) This would not be so bad if the rider had ever spent dedicated time teaching the Horse what that particular command means. For the most part, Horses are simply expected to understand the English word. And of course, the command “Quit” is extremely vague and difficult to understand for everyone but the person saying it. So rather than telling the Horse what not to do, consider asking him to perform an action that physically precludes the undesirable behavior. Instead of saying “Don’t move,” consider saying “Do stand still.” The difference is subtle but powerful. And as you become a Do rider instead of a Don’t rider, you will find that your Horse begins to look for all sorts of other things he might be able to DO for you. This is a wonderful place to start and keep a relationship, and it is ultimately far more fulfilling, since you will not be constantly rebuking your Horse for doing things you don’t like.