Reinforcement Schedules in Dogs: Learning and Maintenance

Two dogs receive exactly the same treat for exactly the same behavior. One gets it every time; the other gets it unpredictably, on average every fourth repetition. After a few weeks the second dog works faster, pauses less, and keeps going far longer when the treats stop altogether.

The reinforcer was identical. What differed was the schedule, the rule specifying which responses get paid, and schedule effects are among the most reliably replicated findings in learning science. This article covers the classical schedule types and what each produces, the mechanism behind the effects, the partial reinforcement extinction effect (PREE) and its clinical consequences, and what the canine evidence actually establishes. That last point is unusually important here, because the confident schedule prescriptions in training literature rest almost entirely on research in other species. (The neurochemistry is covered separately.)


1. What a Reinforcement Schedule Is

1.1 The Definition

A reinforcement schedule is the rule determining which occurrences of a behavior produce a reinforcer. Schedules can be based on the number of responses (ratio) or on elapsed time (interval), and within either type the requirement can be fixed or variable. Continuous reinforcement, in which every correct response is paid, is the limiting case.

Skinner's analysis in The Behavior of Organisms (1938) and the systematic treatment in Ferster and Skinner's Schedules of Reinforcement (1957) established the central point: behavior is a function not merely of whether consequences occur but of how they are patterned. The same reinforcer on different schedules produces measurably different behavior.

1.2 The Variables That Travel With It

Magnitude. Larger or higher-value reinforcers generally speed acquisition, non-linearly and subject to satiation. Magnitude should be calibrated to task difficulty and current motivational state rather than held constant.

Timing. Reinforcer effectiveness decreases as the delay between behavior and consequence grows. This is the rationale for conditioned reinforcers: a marker bridges the gap and specifies precisely which behavior earned the outcome. That in turn requires having defined the behavior precisely in the first place (the operationalization problem).

Predictability. Fixed schedules make the requirement knowable; variable schedules do not. This distinction does most of the explanatory work in what follows (and connects directly to prediction error).

2. The Four Classical Schedules

2.1 Continuous Reinforcement

Every correct response is reinforced. The contingency is maximally clear, and acquisition is correspondingly fast. The limitation is low extinction resistance: behavior established only under continuous reinforcement fades quickly when payment stops — which, in ordinary life, it constantly does.

2.2 Fixed Ratio

Reinforcement after a set number of responses. FR schedules produce high steady rates interrupted by a post-reinforcement pause whose length scales with the ratio — one of the most robust findings in the schedule literature. Pushing the ratio up too quickly produces ratio strain: pausing increases, errors rise, and the animal disengages.

2.3 Variable Ratio

Reinforcement after a variable number of responses averaging some value. VR produces the highest and most sustained response rates of the classical schedules, with little post-reinforcement pausing, and the greatest resistance to extinction.

The reason is structural: because the next response might be the paid one, there is no point in the sequence at which pausing is warranted. This makes VR the standard recommendation for maintenance — with an important boundary condition established in §5.
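The two ratio rules described above can be sketched in a few lines of Python. This is a minimal illustration, not a training protocol: the class names are hypothetical, and drawing each variable-ratio requirement uniformly from 1 to 2 × mean − 1 is an assumption chosen only because it averages to the stated mean.

```python
import random


class FixedRatio:
    """Pay every n-th response (FR n). FR 1 is continuous reinforcement."""

    def __init__(self, n):
        self.n = n
        self.count = 0

    def respond(self):
        """Register one response; return True if it is reinforced."""
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Pay after a varying number of responses that averages `mean` (VR mean)."""

    def __init__(self, mean, rng=None):
        self.mean = mean
        self.rng = rng if rng is not None else random.Random()
        self.count = 0
        self.required = self._draw()

    def _draw(self):
        # Assumption: uniform on 1 .. 2*mean - 1, whose average is `mean`.
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self):
        self.count += 1
        if self.count >= self.required:
            self.count = 0
            self.required = self._draw()
            return True
        return False


fr = FixedRatio(4)
fr_pattern = [fr.respond() for _ in range(12)]  # every fourth response is paid

vr = VariableRatio(4, random.Random(0))
vr_pattern = [vr.respond() for _ in range(4000)]
vr_paid_fraction = sum(vr_pattern) / len(vr_pattern)  # close to 1/4 on average
```

Under the fixed rule the paid response is always the fourth, so its position is knowable. Under the variable rule the long-run payment rate is about the same, but no position in the sequence is reliably unpaid, which is the structural point made above.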

2.4 Fixed and Variable Interval

Fixed interval pays the first response after a set period has elapsed, producing the characteristic scallop: little responding after reinforcement, accelerating as the interval runs out. Pure FI is rare in deliberate training but appears whenever a trainer inadvertently regularizes timing — reinforcing a stay after roughly the same duration each time.

Variable interval pays the first response after a varying period. It produces steady moderate rates with little pausing, and it does not reward high rates, since extra responses before the interval elapses earn nothing. That makes VI suited to sustained-attention behaviors rather than rapid repetition.
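The claim that interval schedules do not reward high rates can be checked with a small simulation. The sketch below is illustrative only: `IntervalSchedule` and `rewards_earned` are hypothetical names, the simulated animal responds at a perfectly constant rate, and variable intervals are assumed to be exponentially distributed around the mean.

```python
import random


class IntervalSchedule:
    """Pay the first response made after the current interval has elapsed."""

    def __init__(self, mean_s, variable=False, rng=None):
        self.mean_s = mean_s
        self.variable = variable
        self.rng = rng if rng is not None else random.Random()
        self.available_at = self._next_time(0.0)

    def _next_time(self, now):
        if self.variable:
            # Assumption: VI waits are exponential with mean `mean_s`.
            wait = self.rng.expovariate(1.0 / self.mean_s)
        else:
            wait = self.mean_s
        return now + wait

    def respond(self, now):
        """Register a response at time `now` (seconds); True if reinforced."""
        if now >= self.available_at:
            self.available_at = self._next_time(now)
            return True
        return False


def rewards_earned(schedule, responses_per_s, duration_s):
    """Count reinforcers earned by responding at a constant rate."""
    n = duration_s * responses_per_s
    return sum(schedule.respond((i + 1) / responses_per_s) for i in range(n))


# FI 10 s for ten minutes, at one and at five responses per second.
fi_slow = rewards_earned(IntervalSchedule(10), 1, 600)
fi_fast = rewards_earned(IntervalSchedule(10), 5, 600)

# VI 10 s over a longer run, same seed so both runs see the same intervals.
vi_slow = rewards_earned(IntervalSchedule(10, True, random.Random(1)), 1, 6000)
vi_fast = rewards_earned(IntervalSchedule(10, True, random.Random(1)), 5, 6000)
```

On the fixed-interval schedule both response rates earn the same 60 reinforcers, and on the variable-interval schedule quintupling the rate yields only a marginal gain. Responding faster buys almost nothing, which is why moderate, steady responding is the typical result.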

3. Why Schedules Have These Effects

3.1 The Prediction Error Account

Schultz, Dayan and Montague (1997) established that midbrain dopamine neurons encode reward prediction error — the discrepancy between expected and received outcome — rather than reward itself. An unexpected reward produces a burst; a fully predicted one produces little change; an omitted expected reward produces a dip below baseline.

Applied to schedules, this yields a coherent mechanism. Under continuous reinforcement, once the contingency is learned each outcome becomes fully predicted, prediction error attenuates, and with it the teaching signal. Under variable ratio, the next response might pay, so expectation never fully converges and prediction error remains available throughout the sequence.
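A toy model makes the contrast concrete. The sketch below uses a simple delta-rule update (Rescorla-Wagner style), which is a deliberate simplification and not the temporal-difference model discussed by Schultz and colleagues. The function name, the learning rate of 0.1, and the treatment of VR 4 as a 25% chance of payment per response are all assumptions made for illustration.

```python
import random


def late_prediction_error(p_reward, trials=400, alpha=0.1, seed=0):
    """Mean absolute prediction error over the last 100 trials of training.

    p_reward: probability that any single response is paid (1.0 = continuous).
    alpha:    learning rate (hypothetical value).
    """
    rng = random.Random(seed)
    expected = 0.0  # learned reward expectation
    errors = []
    for _ in range(trials):
        reward = 1.0 if rng.random() < p_reward else 0.0
        delta = reward - expected      # prediction error on this trial
        expected += alpha * delta      # move the expectation toward the outcome
        errors.append(abs(delta))
    tail = errors[-100:]
    return sum(tail) / len(tail)


crf_error = late_prediction_error(1.0)    # continuous reinforcement
vr4_error = late_prediction_error(0.25)   # roughly VR 4, treated as random ratio
```

With continuous reinforcement the expectation converges on the outcome and the late-training error shrinks toward zero. With intermittent payment the expectation settles near the payment probability, so every trial, paid or unpaid, still produces a sizeable error. The model says nothing about canine neurophysiology; it only shows that the verbal account is internally coherent.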

3.2 What That Account Is and Is Not

This is a theoretically well-grounded integration of classical schedule findings with reward neuroscience, and it generates testable predictions. It is not a description of measured canine neurophysiology. No neuroimaging study has compared prediction error signals under different schedule types in dogs, and the dopaminergic account rests on primate and rodent recordings (the state of that evidence in dogs is covered separately).

3.3 Acquisition Versus Maintenance

The two phases have opposite requirements. Acquisition needs clarity: every correct response paid, no ambiguity about what produced the outcome. Maintenance needs durability, which comes from the animal having learned to continue through unpaid trials. Confusing the two is the most common practical error in either direction.

4. The Partial Reinforcement Extinction Effect

4.1 The Finding

Behavior trained under intermittent reinforcement extinguishes more slowly than behavior trained under continuous reinforcement. Documented by Humphreys (1939) and replicated across species, tasks, and reinforcer types, the PREE is among the most reliable phenomena in learning science.

It is not gradual forgetting. It is a learned disposition — tolerance for non-reward — specifically shaped by the training history.

4.2 Why It Happens

Amsel's frustration theory (1958, 1962) holds that intermittent training teaches the animal to keep responding in the presence of the frustrative state produced by non-reward. When extinction begins, it does what it learned to do: continue.

The sequential account proposes that the animal learns to associate the aftereffects of non-reinforcement with subsequent availability, so those aftereffects continue to cue responding during extinction. The two accounts are difficult to distinguish empirically and are not mutually exclusive.

In prediction-error terms: after continuous reinforcement, every extinction trial delivers a strong negative prediction error and the association degrades fast. After intermittent reinforcement, early extinction trials are indistinguishable from ordinary unpaid trials, so the signal accumulates slowly (the fuller extinction picture is covered separately).
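The "indistinguishable from ordinary unpaid trials" point can be put in numbers. The sketch below asks how many unpaid trials in a row are needed before such a run would be unusual under the training schedule. It is a likelihood illustration rather than a model of what the animal computes, and it assumes each response was paid independently with a fixed probability during training; the function names and the 5% threshold are arbitrary choices.

```python
import math


def unpaid_run_probability(p_reward, run_length):
    """Chance of `run_length` unpaid trials in a row while training is still on."""
    return (1.0 - p_reward) ** run_length


def trials_until_surprising(p_reward, threshold=0.05):
    """Smallest unpaid run whose training-phase probability is at or below threshold."""
    if p_reward >= 1.0:
        return 1  # under continuous reinforcement a single unpaid trial is impossible
    return math.ceil(math.log(threshold) / math.log(1.0 - p_reward))


after_crf = trials_until_surprising(1.0)    # 1
after_60 = trials_until_surprising(0.60)    # 4
after_vr4 = trials_until_surprising(0.25)   # 11
```

After continuous reinforcement the very first unpaid trial is inconsistent with everything learned so far. After training at roughly one payment in four, about eleven unpaid trials pass before the run stops looking like ordinary training, so evidence that conditions have changed arrives far more slowly.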

4.3 The Clinical Consequence Nobody Wants

The PREE applies to unwanted behavior exactly as it applies to wanted behavior. A dog whose barking is sometimes answered, sometimes ignored, and sometimes met with a push away is on an intermittent schedule, and the resulting behavior is substantially harder to extinguish than if it had been reinforced every time.

This makes consistency during an extinction programme mechanistically necessary rather than merely advisable. Occasional reinforcement does not merely slow progress; it re-establishes the behavior on an intermittent schedule and increases its future resistance.

4.4 A Correction on Extinction Bursts

The transient increase in behavior when reinforcement stops is usually presented as inevitable. It is not. Lerman and Iwata (1995) analysed 113 sets of extinction data and found bursting in 24% of cases — 36% where extinction was used alone against only 12% where it was combined with reinforcing an alternative behavior.

That work comes from applied behavior analysis with human participants, so the percentages should not be transferred literally. The direction is what matters, and it converts a warning into a method: in that dataset, pairing extinction with differential reinforcement cut the burst rate from 36% to 12%, roughly a third of the rate seen with extinction alone (and frustration is the mechanism to watch).

5. What the Canine Evidence Actually Shows

5.1 The Gap Is Real

Parametric schedule research in dogs — controlled comparison of FR, VR, FI, and VI on standardized tasks with measurement of acquisition rate, response rate, pausing, and extinction resistance — appears to be absent from the peer-reviewed literature. So does a direct demonstration of the PREE in dogs using a parametric extinction design.

This is not a footnote. It means the schedule framework is applied to dogs on the strength of seven decades of work on pigeons and rats, without species-specific validation of the parameters being prescribed.

5.2 But Schedule Manipulation in Dogs Is Not Absent

Two studies deserve more prominence than they usually receive.

Feuerbacher and Wynne (2014) gave dogs a concurrent choice between food and petting and then thinned the food schedule. All groups showed sensitivity to the thinning by reducing time allocated to food, with substantial group and individual differences in how sensitive they were. This is not a full parametric comparison, but it is a direct manipulation of schedule density in dogs with a measured behavioral consequence — and it establishes that individual sensitivity varies considerably.

Cimarelli et al. (2021) is the more consequential study and is routinely omitted from schedule discussions. Naïve dogs were clicker-trained on a novel behavior with either continuous reinforcement or reinforcement on 60% of clicks. Partial rewarding did not improve learning speed, and the partially rewarded dogs subsequently showed a more pessimistic bias in a cognitive bias test than the continuously rewarded group.

5.3 What That Result Changes

Read carefully, Cimarelli et al. does not contradict the PREE, which concerns resistance to extinction rather than speed of acquisition. It contradicts a common training conflation: the belief that thinning reinforcement early builds durability at no cost. In dogs that have not yet formed the association, it bought nothing in speed and appears to have cost something in affective state.

The correct synthesis is narrower than the usual advice. Reinforce continuously while a behavior is being acquired. Introduce variability once it is fluent. The sequence is not stylistic preference; the reversal has been tested in dogs and did not work (with the emotional cost being the part that gets overlooked).

5.4 The Applied Training Literature Was Not About Schedules

Hiby et al. (2004), Rooney and Cowan (2011), and Blackwell et al. (2008) are regularly cited in schedule discussions. None was designed to investigate schedules. They compared training method categories — broadly reward-based against punishment-based — and found reward-based approaches associated with better obedience outcomes and fewer problem behaviors.

Rooney and Cowan's finding on consistency of reinforcement delivery comes closest to a schedule interpretation, but it is a broad construct that does not isolate specific parameters. These studies support reward-based training. They do not validate any particular schedule (the method-level evidence is covered in detail separately).

6. Welfare and Schedule Design

6.1 Frustration Is Part of the Mechanism

Intermittent schedules work partly by teaching tolerance for non-reward, which means frustration is not an unwanted side effect but the process itself. Gradual ratio stretching builds that tolerance without the costs of chronic frustration.

The welfare problem arises with abruptness: transitions made too fast, ratios stretched too quickly, extinction imposed without preparation. Then frustration becomes distress, showing up as displacement behavior, avoidance of the training context, and degraded performance.

6.2 High Rate Is Not Distress

A dog working fast and enthusiastically on a well-designed variable schedule is showing motivational engagement, not stress. Effortful behavior is not evidence of a welfare problem.

The signals that a schedule has gone wrong are specific: displacement behavior out of context, avoidance of the training environment, repetitive or stereotypic behavior, and quality degrading while rate is maintained. The indicated response is to reduce the ratio or raise reinforcement density, not to conclude that the dog is unmotivated. Nor should effortful behavior be read as contentment, since behavior does not report the state directly; arousal is the variable to monitor alongside it.

6.3 Individual Assessment Is the Unit

Given Feuerbacher and Wynne's finding of substantial individual variation in schedule sensitivity, the schedule that produces optimal performance in one dog may produce distress in another. Breed- and species-level generalizations are the wrong grain (as temperament research would predict).

7. Applied Recommendations

7.1 Acquisition

Continuous reinforcement until the behavior meets criterion reliably. This applies whether the behavior is being lured, shaped, targeted, or trained through demonstration (as with social learning approaches). Introducing variability early costs acquisition speed and, on the canine evidence, affective state.

7.2 The Transition

Once acquisition is stable, plan the move deliberately: a low fixed ratio first, then a low variable ratio, with ratio and variability rising incrementally as behavioral history supports it. Watch for ratio strain and step back when it appears.

This transition is not optional for any behavior expected to hold in the real world, where reinforcement is infrequent and delayed by default.
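The transition logic described above (step up incrementally, step back when ratio strain appears) can be written as a simple ladder. The rungs and values below are hypothetical placeholders: the text specifies only the order (continuous, then low fixed ratio, then low variable ratio, rising gradually), and no dog-specific values are established in the literature. The strain judgement is an input supplied by the trainer, not something the code infers.

```python
# Hypothetical ladder: (schedule type, ratio). Values are illustrative only.
LADDER = [("CRF", 1), ("FR", 2), ("VR", 2), ("VR", 3), ("VR", 4), ("VR", 6)]


def next_rung(index, strain_observed):
    """Move up one rung after a clean session, down one when strain appears."""
    if strain_observed:
        return max(index - 1, 0)
    return min(index + 1, len(LADDER) - 1)


def run_plan(session_strain):
    """Replay a list of per-session strain observations from the bottom rung."""
    index = 0
    history = [LADDER[index]]
    for strained in session_strain:
        index = next_rung(index, strained)
        history.append(LADDER[index])
    return history


# Three clean sessions, one session with ratio strain, then two clean sessions.
plan = run_plan([False, False, False, True, False, False])
# plan == [("CRF", 1), ("FR", 2), ("VR", 2), ("VR", 3),
#          ("VR", 2), ("VR", 3), ("VR", 4)]
```

The design choice worth noting is that the ladder never skips rungs in either direction. A strained session costs one step rather than a return to the start, and progress resumes from the last schedule the dog handled cleanly.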

7.3 Maintenance

Match the schedule to the goal. Variable ratio where high-rate performance matters — sport, working tasks requiring rapid responding. Variable interval where sustained calm engagement matters — detection work, extended stays, settling.

A recurring error is expecting real-world performance from behavior that was never transitioned off continuous reinforcement. The remedy is systematic preparation, not reducing reward. High average reinforcement rates are entirely compatible with building durability: the goal is variability in when and for what, not less reward overall.

7.4 Behavior Modification

Schedule operations interact with emotional state, which changes the calculus.

In desensitization and counterconditioning, maintain continuous reinforcement throughout the sub-threshold phase. The objective is a robust positive conditioned response, and variability introduces uncertainty exactly where uncertainty is the problem (the fear-learning mechanics involved are covered separately).

In differential reinforcement programmes, keep the alternative behavior on continuous reinforcement during establishment. Thinning too early generates frustration that can worsen the presenting problem, particularly where aggression is involved (as is common in reactive cases).

And anticipate the PREE. Problem behaviors that were intermittently reinforced during the household history before assessment will resist extinction. That is a prediction to build into the treatment plan, not evidence that the plan is failing.

8. Summary: Schedules at a Glance

Continuous reinforcement — Every correct response paid. Produces: fastest acquisition, clearest contingency. Limitation: low extinction resistance. Use: establishing new behavior, and throughout sub-threshold counterconditioning.

Fixed ratio — Payment after a set number of responses. Produces: high steady rates with a post-reinforcement pause scaling with ratio. Risk: ratio strain if stretched too fast. Use: first step away from continuous reinforcement.

Variable ratio — Payment after a varying number averaging some value. Produces: highest sustained rates, minimal pausing, greatest extinction resistance. Requires: solid prior acquisition. Use: maintenance where rate matters.

Fixed interval — Payment for the first response after a set time. Produces: the scallop, with responding accelerating toward the interval's end. Mostly appears by accident when trainers regularize timing.

Variable interval — Payment for the first response after a varying time. Produces: steady moderate rates, no reward for high rates. Use: sustained vigilance and calm maintenance.

The PREE — Intermittently reinforced behavior extinguishes more slowly. Applies to unwanted behavior too, which is why inconsistent responses to problem behavior are so costly and why consistency during extinction is mechanically required rather than merely tidy.

9. Research Gaps and Critical Appraisal

No parametric schedule research in dogs. Comparison of FR, VR, FI, and VI on standardized canine tasks with the standard measures appears absent from the peer-reviewed literature. Optimal VR means, appropriate stretching rates, and schedule-by-reinforcer interactions in dogs are inferred, not established.

No direct PREE demonstration in dogs. Despite the effect's centrality to applied recommendations, a parametric extinction design in dogs analogous to the classical paradigms does not appear to exist.

But the canine evidence is not zero, and the part that exists is inconvenient. Feuerbacher and Wynne (2014) manipulated schedule density in dogs and found substantial individual variation in sensitivity. Cimarelli et al. (2021) tested early thinning directly and found no learning benefit and a measurable affective cost. Reviews that describe canine schedule evidence as entirely absent overlook both.

The applied literature does not bear on schedules. Hiby et al. (2004), Rooney and Cowan (2011), and Blackwell et al. (2008) compared method categories, not schedule parameters, and rest on cross-sectional survey data with the usual limits on causal inference.

Extinction bursts are the minority case. They appeared in 24% of analysed applications, and in 12% when combined with differential reinforcement (Lerman & Iwata, 1995) — human applied behavior analysis, not canine.

Individual and breed moderation is uncharacterized. Differences in arousal, impulsivity, and frustration tolerance plausibly moderate schedule effects and have not been systematically examined, and breed-level claims about schedule sensitivity have no empirical basis (as breed-behavior research would caution, the flexibility dimension in particular).

Emotional state interactions are unstudied. How fear, anxiety, or chronic stress modify schedule effects in dogs matters clinically and rests on extrapolation from non-clinical laboratory samples.

The neurobiological account is untested in dogs. Whether prediction error signals vary with schedule type in dogs as they do in rodents and primates is a testable question that awake canine imaging could in principle address (given the existing imaging work).

10. Conclusion

Reinforcement schedules are how individual training moments accumulate into behavioral dispositions — fast or slow, persistent or fragile, robust under real conditions or only under ideal ones. The general findings are among the most durable in learning science: continuous reinforcement builds fastest, intermittent schedules resist extinction better, variable ratio produces the highest rates, and inadvertently intermittent reinforcement of unwanted behavior is among the most consequential errors in companion dog management. What the field cannot currently supply is dog-specific parametric data justifying confident prescription of particular values, and that gap is larger than training discourse usually admits. The canine evidence that does exist is worth more attention than it gets, and it cuts against a popular recommendation: thinning reinforcement early neither speeds learning nor leaves the dog unaffected. The workable position is to hold the general principles, sequence them correctly — build continuously, then vary — and let the individual dog's behavior and welfare indicators set the parameters that the literature cannot.

Key Insights (Takeaways)

The schedule, not just the reward, determines what behavior emerges. Continuous reinforcement produces the fastest acquisition and the weakest durability; variable ratio produces the highest rates and the strongest resistance to extinction.
The partial reinforcement extinction effect works against you as reliably as for you. Behavior that is sometimes answered and sometimes ignored becomes markedly harder to extinguish, which makes consistency during an extinction programme mechanically necessary rather than merely good practice.
Do not thin reinforcement early. In naïve dogs, rewarding 60% of clicks did not improve learning speed and left the dogs with a more pessimistic cognitive bias (Cimarelli et al., 2021). Build continuously, vary once the behavior is fluent.
The schedule framework applied to dogs is largely borrowed. Parametric comparison of schedule types in dogs, and a direct demonstration of the PREE in dogs, appear absent from the literature — though canine schedule manipulation is not entirely absent, and individual sensitivity to schedule thinning varies substantially (Feuerbacher & Wynne, 2014).
Extinction bursts are the exception, not the rule, and differential reinforcement is the lever. Bursting occurred in 24% of analysed cases and only 12% when extinction was combined with reinforcing an alternative behavior (Lerman & Iwata, 1995).

References

Amsel, A. (1958). The role of frustrative nonreward in noncontinuous reward situations. Psychological Bulletin, 55(2), 102–119. https://doi.org/10.1037/h0043125

Amsel, A. (1962). Frustrative nonreward in partial reinforcement and discrimination learning: Some recent history and theoretical extension. Psychological Review, 69(4), 306–328. https://doi.org/10.1037/h0040388

Blackwell, E. J., Twells, C., Seawright, A., & Casey, R. A. (2008). The relationship between training methods and the occurrence of behaviour problems, as reported by owners, in a population of domestic dogs. Journal of Veterinary Behavior, 3(5), 207–217. https://doi.org/10.1016/j.jveb.2007.10.008

Chance, P. (2014). Learning and behavior (7th ed.). Cengage Learning.

Cimarelli, G., Schoesswender, J., Vitiello, R., Huber, L., & Virányi, Z. (2021). Partial rewarding during clicker training does not improve naïve dogs' learning speed and induces a pessimistic-like affective state. Animal Cognition, 24(1), 107–119. https://doi.org/10.1007/s10071-020-01425-9

Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement. Appleton-Century-Crofts. https://doi.org/10.1037/10627-000

Feuerbacher, E. N., & Wynne, C. D. L. (2012). Relative efficacy of human social interaction and food as reinforcers for domestic dogs and hand-reared wolves. Journal of the Experimental Analysis of Behavior, 98(1), 105–129. https://doi.org/10.1901/jeab.2012.98-105

Feuerbacher, E. N., & Wynne, C. D. L. (2014). Most domestic dogs (Canis lupus familiaris) prefer food to petting: Population, context, and schedule effects in concurrent choice. Journal of the Experimental Analysis of Behavior, 101(3), 385–405. https://doi.org/10.1002/jeab.81

Hiby, E. F., Rooney, N. J., & Bradshaw, J. W. S. (2004). Dog training methods: Their use, effectiveness and interaction with behaviour and welfare. Animal Welfare, 13(1), 63–69.

Humphreys, L. G. (1939). The effect of random alternation of reinforcement on the acquisition and extinction of conditioned eyelid reactions. Journal of Experimental Psychology, 25(2), 141–158. https://doi.org/10.1037/h0058221

Lerman, D. C., & Iwata, B. A. (1995). Prevalence of the extinction burst and its attenuation during treatment. Journal of Applied Behavior Analysis, 28(1), 93–94. https://doi.org/10.1901/jaba.1995.28-93

Lewis, D. J. (1960). Partial reinforcement: A selective review of the literature since 1950. Psychological Bulletin, 57(1), 1–28. https://doi.org/10.1037/h0044137

Mackintosh, N. J. (1974). The psychology of animal learning. Academic Press.

Mazur, J. E. (2013). Learning and behavior (7th ed.). Psychology Press.

Rooney, N. J., & Cowan, S. (2011). Training methods and owner–dog interactions: Links with dog behaviour and learning ability. Applied Animal Behaviour Science, 132(3–4), 169–177. https://doi.org/10.1016/j.applanim.2011.03.007

Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. https://doi.org/10.1126/science.275.5306.1593

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Appleton-Century-Crofts.

Michael Sauerwein

June 1, 2026
