Psychological learning theory

I briefly covered behavior modification in a prior post. In this post I’ll explain classical and operant conditioning in more detail, with examples to illustrate the concepts. The principles of behaviorism, or learning theory, are fundamental to the science of psychology. Two of the names most commonly associated with behavioral psychology are J. B. Watson and B. F. Skinner. Two key words in learning theory are stimulus and response.

Classical conditioning is also known as Pavlovian conditioning, based on Ivan Pavlov’s famous experiments with drooling dogs. Salivation is what behaviorists call an unconditioned response to an unconditioned stimulus – the presentation of food. In other words, neither dogs nor humans have to be taught to salivate when we see and smell food that appeals to us. A bell is initially a neutral stimulus, having nothing to do with food or salivation. But when a bell is rung every time food is presented, it becomes a conditioned stimulus, as the brain learns to associate it with mealtime. Eventually the ringing of the bell alone, without the presentation of food, will stimulate salivation – a conditioned response.

Classical conditioning is one of the most powerful tools used by marketers and advertisers to condition behavior on a mass scale, through the popular media. They systematically condition consumers to associate pleasant or desirable things with symbols such as McDonald’s golden arches, logos, slogans, jingles, and attractive people giving sales pitches. They use it because it works. You see bikini-clad babes posing at car and boat shows because it increases the sales of the cars and boats they’re posing in front of.

Where classical conditioning is a passive mode of learning, involving the creation of unconscious associations, operant conditioning involves systematic consequences that shape a target behavior, making it occur either more frequently or less frequently. The process starts with recording the baseline frequency of the target behavior, i.e. how frequently it naturally occurs without systematic reinforcement being applied. A behavior tends to occur more frequently if it is consistently followed by a rewarding consequence – positive reinforcement (e.g. praise, money, candy, affection). It also tends to occur more frequently if it removes or avoids an aversive consequence (e.g. pain, shaming) – negative reinforcement. If an expected reward is withheld, or the behavior is punished with an aversive consequence, the behavior tends to occur less frequently.

We might go to work even if we don’t really want to, because we know that our behavior will be reinforced by a paycheck. We know that if we stop going to work, the reinforcer will be withheld. Operant conditioning is the way we shape the behavior of our children, and train animals to obey our commands or to learn tricks. It explains the motivation athletes have to spend long hours exercising and practicing their skills.

The other principle to understand about operant conditioning is ratios of reinforcement, which can determine how lasting a conditioned behavior is. A hungry, caged rat can be taught to press a lever relatively quickly, if it’s rewarded with a food pellet every time the lever is pressed – a 1:1 ratio of reinforcement. But if you stop reinforcing the learned behavior with food, it won’t persist. In order to make the new behavior more persistent, you gradually “thin out” the frequency of reinforcement, perhaps starting with a 1:2 ratio. Now the rat only gets food every second time it presses the lever. Then you can go to other fixed ratios (1:3, 1:4); but if the ratio becomes too thin or if the food pellets stop coming, the learned behavior ceases, or in behavioral terms is extinguished.

If you really want a target behavior to persist without reinforcing it at a fixed ratio, you move to a variable ratio: you vary the ratio, so the rat doesn’t know how many times it will have to press the lever (1:2, then 1:5, then 1:3, then 1:6, then 1:2, etc.) in order to get the food pellet. A hungry rat will keep pressing the lever, having learned that it will eventually get rewarded with a pellet. A well-fed rat will find better things to do with its time.
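
For readers who like to see the difference spelled out, here is a rough Python sketch of the two kinds of schedule described above. It is only an illustration under simple assumptions: the function names (fixed_ratio, variable_ratio) are made up for this example, the list of press requirements is taken from the ratios mentioned above, and the simulation models only when a pellet is delivered, not how the rat learns or when it gives up.

```python
import random

def fixed_ratio(n):
    """Hypothetical helper: reinforce every n-th lever press (a 1:n schedule)."""
    count = 0
    def press():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # pellet delivered
        return False      # no pellet this time
    return press

def variable_ratio(requirements, rng):
    """Hypothetical helper: the number of presses needed for the next pellet
    is drawn at random from `requirements` after every pellet."""
    needed = rng.choice(requirements)
    count = 0
    def press():
        nonlocal needed, count
        count += 1
        if count >= needed:
            count = 0
            needed = rng.choice(requirements)  # the rat cannot predict this
            return True
        return False
    return press

rng = random.Random(0)  # seeded only so that repeated runs look the same
fr = fixed_ratio(2)
vr = variable_ratio([2, 5, 3, 6], rng)

fr_outcomes = [fr() for _ in range(12)]
vr_outcomes = [vr() for _ in range(12)]

print("fixed 1:2:", " ".join("pellet" if hit else "-" for hit in fr_outcomes))
print("variable: ", " ".join("pellet" if hit else "-" for hit in vr_outcomes))
```

On the fixed schedule the pellets arrive in a perfectly regular pattern, so a missing pellet is noticed right away. On the variable schedule the gaps between pellets differ from one to the next, so a long dry spell looks no different from the normal run of things.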

To take this to the level of human conditioning, think of the difference between a vending machine (with a 1:1 ratio of reinforcement) and a slot machine (with a variable ratio of reinforcement). Every time you feed the required amount of money into a soda machine and press a button, you expect to get a soda. If you don’t and you’re very thirsty, you might try a second time. But if your behavior isn’t reinforced the second time, you certainly won’t keep feeding money to the machine.

But if you’re sitting at a slot machine, you don’t expect to be reinforced every time you put in a quarter and pull the lever. You might get a sequence like this: nothing, $2, nothing, nothing, $5, nothing, nothing, nothing, $3, nothing, nothing, etc. The behavior of feeding money to the machine and pulling the lever might persist until you’re out of money. Gambling machines have been called “addictive” because when we get money back from the machine, we get a jolt of the neurotransmitter dopamine (which acts as a positive reinforcer) and persist, anticipating the next jolt – much like a hungry rat conditioned to persist in pressing a lever, having learned that it will eventually get a food pellet.

