Dissociable dopamine dynamics for learning and motivation (2019)

https://www.nature.com/articles/s41586-019-1235-y

Abstract

The dopamine projection from ventral tegmental area (VTA) to nucleus accumbens (NAc) is critical for motivation to work for rewards and reward-driven learning. How dopamine supports both functions is unclear. Dopamine cell spiking can encode prediction errors, which are vital learning signals in computational theories of adaptive behaviour. By contrast, dopamine release ramps up as animals approach rewards, mirroring reward expectation. This mismatch might reflect differences in behavioural tasks, slower changes in dopamine cell spiking or spike-independent modulation of dopamine release. Here we compare spiking of identified VTA dopamine cells with NAc dopamine release in the same decision-making task. Cues that indicate an upcoming reward increased both spiking and release. However, NAc core dopamine release also covaried with dynamically evolving reward expectations, without corresponding changes in VTA dopamine cell spiking. Our results suggest a fundamental difference in how dopamine release is regulated to achieve distinct functions: broadcast burst signals promote learning, whereas local control drives motivation.

Main

Dopamine is famously related to ‘reward’—but how exactly? One function involves learning from unexpected rewards. Brief increases in dopamine cell firing encode reward prediction errors (RPEs)1,2,3—learning signals for optimizing future motivated behaviour. Dopamine manipulations can affect learning as if they are altering RPEs4,5,6, but they also affect motivated behaviours immediately, as if dopamine signals reward expectation (value)5. Furthermore, NAc dopamine escalates during motivated approach, consistent with dopamine encoding value7,8,9,10,11.

With few exceptions2,12,13, midbrain dopamine firing has been examined during classical conditioning in head-fixed animals3,14, unlike forebrain dopamine release. We therefore compared firing with release under the same conditions. We identified VTA dopamine neurons using optogenetic tagging3,13. To measure NAc dopamine release, we used three independent methods—microdialysis, voltammetry and the optical sensor dLight15—with convergent results. Our primary conclusion is that although RPE-scaled VTA dopamine spike bursts provide abrupt changes in dopamine release appropriate for learning, separate NAc dopamine fluctuations associated with motivation arise independently from VTA dopamine cell firing.

Dopamine tracks motivation in key loci

We trained rats in an operant ‘bandit’ task5 (Fig. 1a, b). On each trial, illumination of a nose-poke port (‘Light-on’) prompted approach and entry (‘Centre-in’). After a variable hold period (0.5–1.5 s), white noise (‘Go cue’) led the rat to withdraw (‘Centre-out’) and poke an adjacent port (‘Side-in’). On rewarded trials, this Side-in event was accompanied by a food-hopper click that prompted the rat to approach a food port (‘Food-port-in’) to collect a sugar pellet. Leftward and rightward choices were each rewarded with independent probabilities, which occasionally changed without warning. When rats were more likely to receive rewards, they were more motivated to perform the task. This was apparent in their ‘latency’ (the time between Light-on and Centre-in), which was sensitive to the outcome of the preceding few trials (Extended Data Fig. 1) and thereby scaled inversely with reward rate (Fig. 1b).

Fig. 1: Dopamine release covaries with reward rate specifically in NAc core and ventral prelimbic cortex.

a, Bandit-task events. b, Example session. Top row, reward probabilities in each block (left:right); row two, ticks indicate outcome of each trial (tall, rewarded; short, unrewarded); row three, leaky-integrator estimate of reward rate (black) and running average of latency (cyan; inverted log scale); bottom row, NAc core dopamine in the same session (1-min samples). DA, dopamine. c, Top, microdialysis locations in medial frontal cortex and striatum (see also Extended Data Fig. 1). n = 51 probe locations from 12 rats, each with 2 microdialysis probes that were lowered between sessions. Bar colour indicates correlation between dopamine and reward rate. ACC, anterior cingulate cortex; dPL, dorsal prelimbic cortex; vPL, ventral prelimbic cortex; IL, infralimbic cortex; DMS, dorsal-medial striatum. Middle, averaged cross-correlograms between dopamine and reward rate. Red bars indicate 99% confidence interval from shuffled time series. Bottom, relationships between neurochemicals and reward rate (multiple regression). NA, noradrenaline; 5-HT, serotonin; ACh, acetylcholine; GABA, γ-aminobutyric acid; Glu, glutamate; NM, normetadrenaline; DOPAC, 3,4-dihydroxyphenylacetic acid; 3-MT, 3-methoxytyramine; HVA, homovanillic acid; 5-HIAA, 5-hydroxyindoleacetic acid. d, Effect of block transitions on reward rate (left), latency (middle) and NAc core dopamine (right). Transitions were classified by whether the experienced reward rate increased (n = 25) or decreased (n = 33). Data are from all 14 sessions in which NAc core dopamine was measured (one per rat, combining data from new and previously reported5 animals), and plotted as mean ± s.e.m. e, Composite maps of correlations between dopamine and reward rate (n = 19 rats, 33 sessions, 58 probe placements). Brain atlas outlines in this figure were reproduced with permission from Paxinos and Watson, 200551.

We previously reported5 a correlation between NAc dopamine release and reward rate, consistent with the motivational role of mesolimbic dopamine16. Here, we first aimed to determine whether this relationship is observed throughout forebrain targets, consistent with ‘globally broadcast’ dopamine signalling17, or is restricted to specific subregions. We further hypothesized that these dopamine dynamics would differ between striatum and cortex, as these structures have distinct dopamine uptake–degradation kinetics18 and may use dopamine for distinct functions19,20.

Using microdialysis with high performance liquid chromatography–mass spectrometry (HPLC–MS), we surveyed medial frontal cortex and striatum (Fig. 1c, Extended Data Fig. 1). We simultaneously assayed 21 neurotransmitters and metabolites with 1-min time resolution, and used regression to compare chemical time series with behavioural variables (Extended Data Fig. 2).

We replicated the correlation between reward rate and NAc dopamine—in contrast to other neurotransmitters (Fig. 1c, d). However, this relationship was localized to NAc core, and did not hold in the NAc shell or dorsal–medial striatum. Contrary to our hypothesis, we observed a similar spatial pattern in frontal cortex: dopamine release correlated with reward rate in ventral prelimbic cortex, but not in more dorsal or ventral subregions (Fig. 1c, e). Though unexpected, these twin ‘hotspots’ of value-related dopamine release have an intriguing parallel in human neuroimaging: blood oxygen level-dependent signal correlates with subjective value, specifically in NAc and ventral–medial prefrontal cortex21.

VTA firing is unrelated to motivation

We next addressed whether this motivation-related forebrain dopamine arises from variable firing of midbrain dopamine cells. The NAc core receives dopamine input from lateral portions of VTA (VTA-l)6,22,23. In head-fixed mice, VTA-l dopamine neurons reportedly have uniform, RPE-like responses to conditioned stimuli3. To record VTA-l dopamine cells, we infected the VTA with adeno-associated virus (AAV) for Cre-dependent expression of channelrhodopsin (AAV-DIO-ChR2) in rats that express Cre recombinase under a tyrosine hydroxylase (TH) promoter (see Methods). Optrodes (Fig. 2a, b) recorded single-unit responses to brief blue-laser pulses (Fig. 2c, Extended Data Figs. 3, 4, Supplementary Fig. 1). We found 27 well-isolated VTA-l cells with reliable short-latency spikes, and identified them as dopamine neurons.

Fig. 2: Activity of identified VTA dopamine neurons does not change with reward rate.

a, Left, optrode schematic with 16 tetrodes around 200-µm-diameter optic fibre. Right, example of optrode placement within lateral VTA. Scale bar, 1 mm. Red, dopamine cell marker tyrosine hydroxylase; green, ChR2–EYFP; yellow, overlap. For all placements, see Extended Data Fig. 3. b, VTA dopamine cell spikes. Red bars indicate detected bursts and number of spikes in each burst (see Methods). Scale, 0.5 s, 0.5 mV. c, Example neuron response to laser pulses of increasing duration. d, Session-wide firing rate versus spike width (at half-maximum) for each VTA cell. Blue, tagged dopamine cells; purple, a distinct cluster of presumed non-dopamine neurons. Insets, examples of average waveforms (negative voltage upwards). e, Firing rate (blue; 1-min bins) of a VTA dopamine neuron during a bandit task. Latency (cyan) covaries with reward rate, but firing rate does not. f, Firing rate for all VTA neurons (blue, dopamine; purple, non-dopamine; grey, unclassified) in low versus high reward-rate blocks. None showed significant differences (Wilcoxon signed rank test using 1-min bins, all P > 0.05 after correcting for multiple comparisons). g, Average cross-correlation between dopamine cell firing and reward rate shows no significant relationship. h, Analysis of dopamine firing rate at block transitions (same format as Fig. 1d). n = 95 reward increases, 76 decreases. i, Distributions of inter-spike-intervals (ISIs, left) and spike bursts (right) are unchanged between higher and lower reward-rate blocks (Kolmogorov–Smirnov statistics: ISIs, 0.138, P = 0.92; bursts, 0.165, P = 0.63).

All dopamine neurons were tonically active, with relatively low firing rates (mean 7.7 Hz, range 3.7–12.9 Hz; compared to all VTA-l neurons recorded together with dopamine cells, P < 0.001 one-tailed Mann–Whitney test). They also had longer-duration spike waveforms (P < 5 × 10^−6, one-tailed Mann–Whitney test), although there were exceptions (Fig. 2d), which confirms that waveform duration is an insufficient marker of dopamine cells in vivo3,24. A distinct cluster of VTA-l neurons (n = 38, from the same sessions) with brief waveforms and higher firing rates (>20 Hz; mean 41.3 Hz, range 20.1–97.1 Hz) included no tagged dopamine cells. We presume that these faster-firing cells are GABAergic and/or glutamatergic3,25, and refer to them as ‘non-dopamine’ below.

We recorded the same dopamine cells across multiple behavioural tasks. VTA-l dopamine cells responded strongly to randomly timed food-hopper clicks, and progressively less strongly when these clicks were made more predictable by preceding cues (Extended Data Fig. 5). This is consistent with canonical RPE-like coding by dopamine cells in Pavlovian tasks2,3,26.

On the basis of evidence from anaesthetized animals, it has previously been argued that altered dopamine levels measured with microdialysis arise from changes in the tonic firing rate of dopamine cells27 and/or the proportion of active versus inactive dopamine neurons28. However, in the bandit task, tonic dopamine cell firing in each block of trials was indifferent to reward rate (Fig. 2e, g). There was no significant change in the firing rates of individual dopamine cells, or those of any other VTA-l neurons, between higher- and lower-reward blocks (Fig. 2f, h; see also ref. 29 for concordant results in head-fixed mice). There was also no overall change in the rate at which dopamine cells fire bursts of spikes (Fig. 2i). Furthermore, we did not observe any dopamine cells switching between active and inactive states. The proportion of time dopamine cells spent inactive (long inter-spike intervals) was very low, and did not change between higher- and lower-reward blocks (Fig. 2i).

The anatomy of the VTA–NAc dopamine projection has been intensively investigated6,22,23, but—given this apparent functional mismatch between firing and release—we reconfirmed that we were recording from the correct portion of the VTA. Small injections of the retrograde tracer cholera toxin B (CTb) into NAc core resulted in dense labelling of TH+ neurons within the same VTA-l area as our optrode recordings (Extended Data Fig. 3). Within the approximate recording zone, 21% of TH+ cells were also CTb+, and this is likely to be an underestimate of the fraction of NAc core-projecting VTA-l dopamine cells, as our tracer injections did not completely fill the NAc core. Thus, our sample of n = 27 tagged VTA dopamine cells (plus many more untagged cells) almost certainly includes NAc core-projecting neurons. Finally, in an additional rat we recorded two tagged VTA-l dopamine cells after infusing AAV selectively into the NAc core (Extended Data Fig. 3). Both retrogradely infected cells had firing patterns that closely resembled the other tagged dopamine cells in all respects, including a lack of tonic firing changes with varying reward rate (Supplementary Fig. 1). We conclude that changes in tonic VTA-l dopamine cell firing are not responsible for motivation-related changes in forebrain dopamine release.

Tracking release on multiple timescales

Does NAc dopamine release track reward rate per se, as suggested in some theories30, or is this correlation driven by dynamic fluctuations in dopamine release that are too fast to resolve with microdialysis? We argued for the latter possibility on the basis of voltammetry data5, but sought confirmation using an independent measure of dopamine release that can span different timescales. The dLight1 suite of genetically encoded optical dopamine indicators was engineered by inserting circularly permutated GFP into dopamine D1 receptors15. Binding of dopamine causes a highly specific increase in fluorescence (Fig. 3a). We infused AAV into NAc to express either dLight1.1 (four verified NAc placements from three rats) or the brighter variant dLight1.3b (six verified NAc placements from four rats) and monitored fluorescence by fibre photometry. We observed clear NAc dopamine responses to Pavlovian reward-predictive cues, similarly to VTA dopamine cell firing (Extended Data Fig. 5).

Fig. 3: Bridging timescales of dopamine measurement.

a, Fluorescence response of dLight1.3b. Inset, titrations of dopamine (n = 15 regions of interest (ROIs)) and noradrenaline (n = 9). Main figure, bath-applied neurotransmitters (all n = 12 ROIs). His, histamine. b, Sample bandit session including normalized NAc dLight1.3b signal (1-min bins). c, dLight signal changes with block transitions. n = 35 reward rate increases, 45 decreases. d, Cross-correlation between dLight and reward rate. e, Closer view of the shaded portion of b. Arrows: black, Centre-nose-in; light red, Side-in (rewarded); light blue, Side-in (unrewarded); dark red, Food-port-in (rewarded); dark blue, Food-port-in (unrewarded). Next rows: leaky-integrator estimate of reward rate; dLight at low resolution (1 min); dLight at high resolution (50 Hz, green; five-point median-filtered, black); model state values (cyan); and RPEs (magenta). After several unrewarded trials, state values early in the trial are low, then reward delivery evokes a positive RPE and accompanying sharp increase in dopamine. Successive rewarded trials diminish RPEs, but increase state values, accompanied by ramping dopamine. f, Short-timescale cross-correlations show a close relationship between dLight and value, and a smaller relationship to RPE. g, Within-trial correlations between model variables and dLight with different lags; correlation to both value and RPE is strongest to dLight about 0.3 s later. h, In all sessions, maximum correlation was greater for value than for RPE or reward rate.

For the bandit task, we first examined the dLight signal in 1-min bins (Fig. 3b) for comparison to microdialysis. We again saw a clear relationship between NAc dopamine release and reward rate, in both cross-correlation and analysis of block transitions (Fig. 3c, d). We next examined more closely how this relationship arises. Rather than slowly varying on a timescale of minutes, the dLight signal showed highly dynamic fluctuations within and between each trial (Fig. 3e). We compared these fluctuations to instantaneous state values and RPEs estimated from a reinforcement-learning model (a semi-Markov decision process5). As was previously reported using voltammetry5, moment-by-moment NAc dopamine showed a strong correlation with state values (Fig. 3f), visible as ramping up within trials when rewards were expected (Fig. 3e). We also saw transient increases with less-expected reward deliveries, consistent with RPE (examined below). In every dLight session, dopamine showed a stronger correlation with values than either RPEs or reward rate (Fig. 3h, Extended Data Fig. 6). Correlations with both state values and RPE were maximal with respect to the dLight signal ~0.3 s later, consistent with a brief lag caused by neural processing of cues and sensor-response time (Fig. 3g; with voltammetry, we reported a lag of 0.4–0.5 s)5.
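The lag analysis described above can be illustrated with a minimal sketch. It assumes a plain Pearson correlation between a model variable (state value or RPE) and the dLight trace shifted by a range of sample lags; the function names, the synthetic traces and the 15-sample delay (0.3 s at the 50-Hz sampling rate given in Fig. 3e) are illustrative assumptions, not the analysis code used in the study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def lagged_correlation(model, signal, max_lag):
    """Correlate model[t] with signal[t + lag] for each lag in samples.

    A positive lag means the signal follows the model variable.
    """
    n = len(model)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = model[:n - lag], signal[lag:]
        else:
            x, y = model[-lag:], signal[:n + lag]
        result[lag] = pearson(x, y)
    return result

# Synthetic example (hypothetical data): the "signal" is the model trace
# delayed by 15 samples, i.e. 0.3 s at 50 Hz.
fs_hz = 50.0
model = [math.sin(0.3 * i) + 0.1 * ((i * 7) % 5) for i in range(200)]
signal = [0.0] * 15 + model[:-15]
corr = lagged_correlation(model, signal, max_lag=25)
best_lag = max(corr, key=corr.get)
best_lag_s = best_lag / fs_hz
```

Here `best_lag_s` converts the best-correlated lag from samples to seconds, which is the quantity reported in Fig. 3g.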

Dopamine firing does not explain release

We next compared dopamine cell firing and release around bandit-task events. External stimuli at Light-on, Go cue and rewarded Side-in (food-hopper click) each evoked a rapid firing increase (Fig. 4a). These responses were observed in the great majority of dopamine cells (Fig. 4c), although the relative magnitude of responses to different cues varied from cell to cell (Supplementary Fig. 1). The NAc dLight signal also responded rapidly and reliably to each of these salient cues (Fig. 4b, c), consistent with burst firing of dopamine cells driving dopamine release.

Fig. 4: Phasic VTA dopamine firing does not account for NAc dopamine dynamics.

a, Event-aligned activity of VTA-l dopamine cells. Top, spike rasters for one representative cell; bottom, average spike rate (n = 29). In all panels, error bands indicate ± s.e.m. b, Event-aligned NAc dLight. Top, representative session; bottom, average (n = 10), normalized to peak rewarded Side-in response. Throughout this figure, dLight signals are shown relative to a 2-s ‘baseline’ epoch ending 1 s before Centre-in. Note increases (arrows) shortly before Centre-in and Food-port-in. c, Cumulative distributions of time taken for dopamine cells (solid; n = 29), dLight (dashed; n = 10), to increase following cue onsets (shuffle test compared to baseline, 10,000 shuffles, P < 0.01, multiple comparisons corrected). For Light-on, only latencies <1 s included; for Side-in only rewarded trials. Median latencies (from sigmoid fit): Light-on, firing 152 ms, dLight 266 ms; Go cue, firing 67 ms, dLight 212 ms; Side-in, firing 85 ms, dLight 129 ms. Non-dopamine cells were typically indifferent to cue onsets (Extended Data Fig. 8). d, Distinct cue-evoked, approach-related dopamine release. Top, average dopamine cell firing (n = 29); middle, average dLight (n = 10); bottom, voltammetry (n = 6), normalized to peak short-latency Light-on response. Left panels, latencies <1 s, right, latencies >2 s. Data are aligned on Light-on (solid) or Centre-in (dotted); red dashed line, median latency. For longer latencies there is no increase in firing near Centre-in, but dLight and voltammetry show a marked increase. e, Scatter plot comparing peak signals aligned on Light-on (y axis) or Centre-in (x axis). For each cell or session, connected lines indicate data for distinct latency ranges (<1 s, >2 s). Dopamine firing (top) consistently shows Light-on response for short-latency trials (two-way analysis of variance (ANOVA), alignment × latency interaction, F = 7.47, P = 0.0008). dLight (middle), voltammetry (bottom) signals are consistently better aligned to Centre-in (two-way ANOVA for dLight: alignment × latency interaction, F = 9.28, P = 0.0043). f, Dopamine increases during approach, quantified as ramp angle (see Methods). Circles indicate individual dopamine cells (n = 29), dLight sessions (n = 10).

We also saw clear increases in NAc dopamine release as rats approached the start port (just before Centre-in) and the food port (just before Food-port-in). This fits well with the extensive voltammetry literature showing that motivated approach behaviours are accompanied by rapid increases in NAc core dopamine5,7,8,9,10,11. However, the VTA-l dopamine cell population did not show a corresponding increase in firing at these times (Fig. 4a; see Extended Data Fig. 7 for additional comparisons, including to non-dopamine cells).

To better dissociate cue-evoked and approach-related dopamine activity, we separated trials by short (<1 s) and long (>2 s) latencies (Fig. 4d, e). Increases in dopamine cell firing were consistently locked to the cue onset at Light-on, preferentially for short-latency trials. All 25 dopamine cells with significant firing rate increases after Light-on were better aligned to Light-on than Centre-in (Fig. 4e). By contrast, increases in NAc dopamine release before Centre-in were distinct from cue-evoked dopamine release (Fig. 4d, e). dLight signals consistently increased before Centre-in on long-latency trials (ten out of ten sessions) and before Food-port-in (nine out of ten sessions), without corresponding increases in dopamine firing (Fig. 4f).

Finally, we considered how event-related dopamine signals depend on recent reward history. During the early part of each trial, dopamine cell firing was not dependent on reward rate (Fig. 5a), despite the influence of reward rate on motivation (Fig. 5b). Subsequently, the phasic response to the reward cue at Side-in was reliably stronger when the reward rate was lower (Fig. 5a), consistent with positive RPE encoding. When this reward cue was omitted, dopamine cells paused firing, though encoding of negative RPEs was much weaker or absent, whether examined at the population level (Fig. 5a, b) or as individual cells (Extended Data Fig. 8). It has previously been proposed that negative RPEs are encoded in the duration of dopamine pauses31, but this was observed in just 2 out of 29 individual neurons. Similar results were obtained if reward expectation was estimated in other ways, including trial-based reinforcement learning models (actor-critic and Q-learning) or simply by counting recent rewards (Extended Data Fig. 8).
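As an illustration of the alternative, trial-based estimates of reward expectation mentioned above, the following minimal sketch implements a generic Q-learning update for a two-choice task and a simple count of recent rewards. The learning rate `alpha`, the initial value `q_init` and the window length are hypothetical values chosen for the example; the fitted models and parameters used in the study are not specified in this section.

```python
def q_learning_rpes(choices, rewards, alpha=0.2, q_init=0.5):
    """Trial-based Q-learning for a two-choice bandit.

    choices: sequence of "left" / "right"; rewards: sequence of 0 / 1.
    Returns the expectation held before each outcome and the resulting RPE.
    alpha and q_init are illustrative assumptions.
    """
    q = {"left": q_init, "right": q_init}
    expectations, rpes = [], []
    for choice, reward in zip(choices, rewards):
        expected = q[choice]
        rpe = reward - expected           # positive when the reward was under-predicted
        q[choice] = expected + alpha * rpe
        expectations.append(expected)
        rpes.append(rpe)
    return expectations, rpes

def recent_reward_count(rewards, window=5):
    """Number of rewards in the `window` trials preceding each trial."""
    return [sum(rewards[max(0, i - window):i]) for i in range(len(rewards))]

# Two successive rewards shrink the positive RPE; an omission then gives a negative RPE.
expectations, rpes = q_learning_rpes(["left"] * 3, [1, 1, 0], alpha=0.5, q_init=0.0)
counts = recent_reward_count([1, 0, 1, 1], window=2)
```

In this toy run the expectation rises across the two rewarded trials while the RPE falls, the same qualitative pattern that the Side-in responses are compared against.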

Fig. 5: Reward history affects VTA dopamine cell firing and NAc dopamine release differently.

a, Top, averaged firing rates of dopamine cells (n = 29) aligned to Side-in, broken down by reward rate (terciles, calculated separately for each cell). Before Side-in, activity does not depend on reward expectation. After Side-in rewarded (red) and unrewarded (blue) trials are shown separately. Food-click response is stronger when reward rate is low, consistent with encoding of positive RPEs. Bottom, fraction of individual dopamine cells with a firing rate that significantly varies with reward rate at each moment (shuffle test, P < 0.01, multiple comparisons corrected). Tick marks at top indicate times when this fraction was significantly higher than chance (binomial, P < 0.01). After Side-in, only negative correlations are tested—that is, potential RPE coding. b, Regression plots for sessions with recorded dopamine cells, showing the effect of recent reward history on (log-) latency (top) and dopamine spiking. Asterisks indicate significant regression weights (t-test, P < 0.05). During the 0.5 s before Go cue (while rat must maintain a steady nose poke for trial to proceed) dopamine spiking is unaffected by reward history (middle). This changes once the outcome is revealed (bottom; assessing peak or trough of activity in the 0.5 s after Side-in), but only for rewarded trials. c, d, Same as above, except for dLight (normalized to peak Side-in response). Dopamine release reliably scales with reward rate even before Side-in.

Dopamine release at Side-in also showed a clear, transient encoding of positive RPEs, but not of negative RPEs (Fig. 5c, d). This dLight response was slightly delayed and prolonged compared to firing, consistent with time taken for release and reuptake32, but remained a subsecond phenomenon. Unlike firing, however, dLight signals early in each trial were greater when recent trials had been rewarded (Fig. 5c), consistent with value coding. We observed this dependence on reward history even when the rat was not actively moving, but was maintaining a nose poke in the centre port while waiting for the Go cue (Fig. 5d). Overall, we conclude that NAc dopamine release reflects both cue-evoked responses and reward expectation, and that only the former can be well accounted for by VTA-l dopamine cell firing.

Discussion

VTA-l provides the predominant source of dopamine to the NAc core6,22,23. VTA-l dopamine cells, including those that project to the NAc core, consistently display RPE-encoding bursts3,12. VTA bursts are thought to be particularly important for driving NAc dopamine32, and indeed we found that cue-evoked VTA bursts were matched by NAc release. However, we additionally found value-related patterns of NAc dopamine release that were not generated by firing of VTA-l dopamine cells, either on long (tonic) or short (phasic) timescales. Other dopamine subpopulations may carry distinct signals13,33,34, and we cannot rule out the possibility that firing of dopamine cell subpopulations not recorded from here produces value-related dopamine in NAc core. However, value-related firing has never been reported for any dopamine cells, across a wide range of studies. Our results suggest that NAc dopamine dynamics are controlled in different ways, at different times and for different functions, and that recording dopamine cells is important but not sufficient for understanding dopamine signals35.

Release from dopamine terminals is potently influenced by local, non-spiking mechanisms36,37,38,39,40. For example, NAc dopamine release is modulated by the basolateral amygdala even when VTA spiking is pharmacologically suppressed41,42. It has been noted for decades that local control of dopamine release might achieve functions distinct from those of dopamine cell spiking36,43, but this has not been incorporated into theoretical views of dopamine. Distinct striatal subregions contribute to different types of decisions, and may influence their own dopamine release according to need44. It remains to be determined just how localized this control of dopamine release can be. One limitation shared by the three methods that we used to measure dopamine release is that they all sample on a spatial scale of at least 100 µm, whereas in vivo microscopy suggests that dopamine release may be heterogeneous at considerably smaller scales15.

Our results do not support the existence of any separate tonic dopamine signal that could mediate motivational effects of dopamine. Instead, dopamine shifts that appear slow if measured slowly (with microdialysis) resolve into rapid fluctuations if measured rapidly (with voltammetry or dLight). Furthermore, recordings of identified VTA dopamine cells by ourselves and others29 provide strong evidence against the idea27 that changes in tonic dopamine cell firing drive tonic changes in dopamine release. Although tonic firing can be altered by lesions or drug manipulations28, we are not aware of sustained changes in firing rate in any behavioural task. Firing can ramp downwards on a timescale of about 1 s during anticipation of motivationally relevant events45,46. However, this decline is the opposite of what would be required to boost dopamine release with reward expectation, and instead bears more resemblance to a sequence of transient negative prediction errors47. Although sustained signals encoding ongoing reward rate could be computationally useful30, dopamine instead provides rapidly fluctuating error and value signals. It remains possible that sustained signals are computed at a subsequent step, by intracellular signalling pathways downstream of dopamine receptors.

Many groups have observed ramping dopamine release as rats approach rewards5,7,8,9,10,11, consistent with encoding escalating reward expectations. Some have argued that these dopamine ramps simply reflect RPEs, by supposing that rats either rapidly forget values48 or that they have a warped set of state representations49. This latter idea is not supported by our observation that ramping is rapidly modulated from trial to trial on the basis of updated reward expectations, becoming stronger within a short sequence of successive rewards while RPE-like responses to cues become weaker (Fig. 3e). More generally, any theory in which dopamine solely conveys RPEs (learning signals) cannot account for the very well-established connection between ongoing mesolimbic dopamine and motivation16. The NAc core is not needed for highly trained responses to conditioned stimuli, but is particularly important when deciding to perform time-consuming work to obtain rewards50. NAc core dopamine appears to provide an essential dynamic signal of how worthwhile it is to allocate time and effort to work5,44, even though this signal is not present in VTA dopamine cell firing.

Methods

Animals

All animal procedures were approved by the University of Michigan or University of California San Francisco Institutional Committees on Use and Care of Animals. Male rats (300–500 g, either wild-type Long-Evans or TH-Cre+ with a Long-Evans background52) were maintained on a reverse 12:12 light:dark cycle and tested during the dark phase. Rats were mildly food deprived, receiving 15 g of standard laboratory rat chow daily in addition to food rewards earned during task performance. No sample size precalculation was performed. The investigators were not blinded to allocation during experiments and outcome assessment.


Behaviour

Pretraining and testing were performed in computer-controlled Med Associates operant chambers (25 cm × 30 cm at widest point) each with a five-hole nose-poke wall, as previously described5. Bandit-task sessions used the following parameters: block lengths were 35–45 trials, randomly selected for each block; hold period before Go cue was 500–1,500 ms (uniform distribution); left–right reward probabilities were 10, 50 and 90% (for electrophysiology, photometry, voltammetry and previously reported microdialysis rats5) or 20, 50 and 80% (newly reported microdialysis rats).
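The session parameters above can be collected into a small simulation sketch. The block lengths, hold-period range and probability sets come from the text; how left and right probabilities were paired across blocks is not stated here, so the independent random draw with no immediate repeat of the same pair is an assumption, as are all function and key names.

```python
import random

def make_block_schedule(n_blocks, probs=(0.1, 0.5, 0.9),
                        min_len=35, max_len=45, seed=None):
    """Draw a sequence of bandit blocks.

    Each block has a left and a right reward probability and a length of
    35-45 trials. Pairing rule (independent draws, no immediate repeat of
    the same left:right pair) is an illustrative assumption.
    """
    rng = random.Random(seed)
    blocks, previous = [], None
    for _ in range(n_blocks):
        while True:
            pair = (rng.choice(probs), rng.choice(probs))
            if pair != previous:
                break
        blocks.append({"p_left": pair[0], "p_right": pair[1],
                       "n_trials": rng.randint(min_len, max_len)})
        previous = pair
    return blocks

def draw_hold_period_ms(rng):
    """Hold period before the Go cue: uniform between 500 and 1,500 ms."""
    return rng.uniform(500.0, 1500.0)

schedule = make_block_schedule(20, seed=0)
hold_ms = draw_hold_period_ms(random.Random(1))
```

Passing `probs=(0.2, 0.5, 0.8)` would correspond to the probability set reported for the newly reported microdialysis rats.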

Current reward rate was estimated using a time-based leaky-integrator53. Reward rate was incremented each time a reward was received, and decayed exponentially at a rate set by parameter τ (the time in s for the reward rate to decrease by ~63%, that is, 1−1/e). For all analyses, τ was selected on the basis of the rat’s behaviour, maximizing the (negative) correlation between reward rate and log(latency) in each session. The correlations between forebrain dopamine and reward rate were not highly sensitive to this choice of τ (Extended Data Fig. 1).
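A minimal sketch of such a time-based leaky integrator, and of selecting τ by maximizing the negative correlation with log(latency), is given below. The unit increment per reward, the candidate τ grid and all function names are assumptions made for illustration; the original implementation may differ in scaling and in how τ was searched.

```python
import math

def leaky_reward_rate(reward_times, sample_times, tau):
    """Leaky-integrator reward rate evaluated at sample_times (seconds, ascending).

    The estimate increases by 1 at each reward (assumed increment) and decays
    as exp(-dt / tau) between events, so it falls by ~63% after tau seconds.
    """
    rewards = sorted(reward_times)
    rates, rate, last_t, i = [], 0.0, 0.0, 0
    for t in sample_times:
        # Absorb every reward delivered up to and including time t.
        while i < len(rewards) and rewards[i] <= t:
            rate = rate * math.exp(-(rewards[i] - last_t) / tau) + 1.0
            last_t = rewards[i]
            i += 1
        rates.append(rate * math.exp(-(t - last_t) / tau))
    return rates

def pearson(x, y):
    """Pearson correlation (0.0 if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def select_tau(reward_times, light_on_times, latencies, candidate_taus):
    """Pick the tau giving the most negative correlation between reward rate
    (sampled at each Light-on) and log(latency)."""
    log_latency = [math.log(v) for v in latencies]
    best_tau, best_r = None, float("inf")
    for tau in candidate_taus:
        r = pearson(leaky_reward_rate(reward_times, light_on_times, tau), log_latency)
        if r < best_r:
            best_tau, best_r = tau, r
    return best_tau, best_r

# One reward at t = 0 s, sampled at 0 s and at tau = 10 s.
decay_check = leaky_reward_rate([0.0], [0.0, 10.0], tau=10.0)
```

In `decay_check` the second value equals 1/e of the first, which is the ~63% decrease after τ seconds described above.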

To classify block transitions as ‘increasing’ or ‘decreasing’ in reward rate, we compared the average leaky-integrator reward rate in the last 5 min of a block to the average reward rate in the first 8 min of the subsequent block.
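The classification rule can be sketched as follows, assuming the reward rate is available as a time series with timestamps in minutes. The handling of blocks shorter than the pre-transition window (the window simply extends into the preceding block) and of ties is a simplification for illustration, and the function names are hypothetical.

```python
def mean_in_window(times, values, start, end):
    """Mean of values whose timestamps fall in [start, end); None if empty."""
    window = [v for t, v in zip(times, values) if start <= t < end]
    return sum(window) / len(window) if window else None

def classify_transitions(times, rates, block_starts, pre_min=5.0, post_min=8.0):
    """Label each block transition by comparing the mean reward rate in the
    last `pre_min` minutes of one block with the first `post_min` minutes
    of the next block. Times are in minutes; block_starts is ascending."""
    labels = []
    for boundary in block_starts[1:]:
        before = mean_in_window(times, rates, boundary - pre_min, boundary)
        after = mean_in_window(times, rates, boundary, boundary + post_min)
        if before is None or after is None or before == after:
            labels.append(None)          # not classifiable in this sketch
        elif after > before:
            labels.append("increasing")
        else:
            labels.append("decreasing")
    return labels

# Hypothetical session: three 10-min blocks sampled in 1-min bins.
times = list(range(30))
rates = [0.2] * 10 + [0.8] * 10 + [0.1] * 10
labels = classify_transitions(times, rates, block_starts=[0, 10, 20])
```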

Rats used for electrophysiology and photometry also performed a Pavlovian approach task, in the same operant chamber with the houselight on throughout the session. Three auditory cues (2 kHz, 5 kHz and 9 kHz) were associated with different probabilities of food delivery (counterbalanced across rats). Cues were played as a train of tone pips (100 ms on, 50 ms off) for a total duration of 2.6 s followed by a delay period of 500 ms. Cues and unpredicted reward deliveries were delivered in pseudorandom order with a variable inter-trial interval (15–30 s, uniform distribution).


Microdialysis

Surgery

Rats were implanted bilaterally with guide cannulae (CMA, 830 9024) in cortex and striatum. One group (n = 8) received one guide cannula targeting prelimbic and infralimbic cortex (anteroposterior (AP) +3.2 mm, mediolateral (ML) 0.6 mm relative to bregma; and dorsoventral (DV) 1.4 mm below brain surface) and another targeting dorsomedial striatum and nucleus accumbens in the opposite hemisphere (AP +1.3, ML 1.9 and DV 3.4). Both implants were angled 5 degrees away from each other along the rostral–caudal plane. A second group (n = 4) received one guide cannula targeting anterior cingulate cortex (AP +1.6, ML 0.8 and DV 0.8) and another targeting accumbens (core/shell) in the opposite hemisphere at AP +1.6, ML 1.4 and DV 5.5 (n = 2) or AP +1.6, ML 1.9 and DV 5.7 (n = 2). Implant sides were counterbalanced across rats. Animals were allowed to recover for one week before retraining.

Chemicals

Water, methanol, and acetonitrile for mobile phases were Burdick & Jackson HPLC grade, purchased from VWR (Radnor). All other chemicals were purchased from Sigma Aldrich unless otherwise noted. Artificial cerebrospinal fluid (aCSF) comprised 145 mM NaCl, 2.68 mM KCl, 1.40 mM CaCl2, 1.01 mM MgSO4, 1.55 mM Na2HPO4 and 0.45 mM NaH2PO4, adjusted to pH 7.4 with NaOH. Ascorbic acid (250 nM final concentration) was added to reduce oxidation of analytes.

Sample collection and HPLC-MS

On testing day, animals were placed in the operant chamber with the houselight on. Custom-made concentric polyacrylonitrile membrane microdialysis probes (1-mm dialysing AN69 membrane; Hospal) were inserted bilaterally into guide cannulae and perfused continuously (Chemyx, Fusion 400) with aCSF at 2 µl/min for 90 min to allow equilibration. After a 5-min baseline collection, the houselight was extinguished, cueing the animal to bandit-task availability. Sample collection continued at 1-min intervals and samples were immediately derivatized54 with 1.5 µl of 100 mM sodium carbonate; 1.5 µl benzoyl chloride (2% (v/v) benzoyl chloride in acetonitrile); and 1.5 µl isotopically labelled internal standard mixture diluted in 50% (v/v) acetonitrile containing 1% (v/v) sulfuric acid, and spiked with deuterated ACh and choline (C/D/N isotopes) to a final concentration of 20 nM. Sample series collection alternated between the two probes at 30-s intervals in each of 26 sessions, except for one session in which a broken membrane resulted in just one series (51 sample series total). Samples were analysed using Thermo Scientific UHPLC systems (Accela, or Vanquish Horizon interfaced to a Quantum Ultra triple quadrupole mass spectrometer fitted with a HESI II ESI probe), operating in multiple reaction monitoring. Five-microlitre samples were injected onto a Phenomenex core-shell biphenyl Kinetex HPLC column (2.1 mm × 100 mm). Mobile phase A was 10 mM ammonium formate with 0.15% formic acid, and mobile phase B was acetonitrile. The mobile phase was delivered as an elution gradient at 450 µl/min as follows: initial, 0% B; 0.01 min, 19% B; 1 min, 26% B; 1.5 min, 75% B; 2.5 min, 100% B; 3 min, 100% B; 3.1 min, 5% B; and 3.5 min, 5% B. Thermo Xcalibur QuanBrowser (Thermo Fisher Scientific) was used to automatically process and integrate peaks. Each of the >100,000 peaks was visually inspected individually to ensure proper integration.

Analysis

All neurochemical concentration data were smoothed with a three-point moving average (y′(t) = 0.25 × y(t − 1) + 0.5 × y(t) + 0.25 × y(t + 1)) and z-score normalized within each session to facilitate between-session comparisons. For each target region, a cross-correlogram was generated for each session and the average of the sessions was plotted. One per cent confidence boundaries were generated for each subplot by shuffling one time series 100,000 times and generating a distribution of correlation coefficients for each session. Multiple regression models were generated using the regress function in MATLAB, with the neurochemical as the outcome variable and behavioural metrics as predictors. Regression coefficients were determined significant at three alpha levels (0.05, 0.0005 and 0.000005), after Bonferroni correction for multiple comparisons (alpha/(21 chemicals × 7 regions × 9 behavioural regressors)). For analysis of block transitions, data were binned into 3-min epochs, discarding the sample that included the transition time.
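The smoothing, normalization and Bonferroni arithmetic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the original MATLAB analysis: endpoints are left unsmoothed and the z-score uses the population standard deviation, neither of which is specified in the text, and the function names are hypothetical.

```python
import statistics

def smooth3(y):
    """Three-point weighted moving average with weights 0.25, 0.5, 0.25.

    Assumption: the first and last samples are left unsmoothed.
    """
    out = list(y)
    for t in range(1, len(y) - 1):
        out[t] = 0.25 * y[t - 1] + 0.5 * y[t] + 0.25 * y[t + 1]
    return out

def zscore(y):
    """Z-score a session's samples (population SD; an assumption)."""
    mu = statistics.fmean(y)
    sd = statistics.pstdev(y)
    return [(v - mu) / sd for v in y]

def bonferroni_threshold(alpha, n_chemicals=21, n_regions=7, n_regressors=9):
    """Per-test threshold: alpha divided by the number of comparisons."""
    return alpha / (n_chemicals * n_regions * n_regressors)
```

With the counts given in the text there are 21 × 7 × 9 = 1,323 comparisons, so the nominal alpha of 0.05 corresponds to a per-test threshold of 0.05/1,323.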


Electrophysiology

Rats (n = 25) were implanted with custom-designed drivable optrodes, each consisting of 16 tetrodes (constructed from 12.5-µm nichrome wire, Sandvik) glued onto the side of a 200-µm optic fibre and extending up to 500 µm below the fibre tip. During the same surgery, we injected 1 µl AAV2/5-EF1a-DIO-ChR2(H134R)-EYFP into the lateral VTA (AP 5.6, ML 0.8, DV 7.5) or NAc core (AP 1.6, ML 1.6, DV 6.4). Wideband (1–9,000 Hz) brain signals were sampled (30,000 samples per s) using Intan digital headstages. Optrodes were lowered at least 80 µm at the end of each recording session. Individual units were isolated offline using a MATLAB implementation of MountainSort55 followed by careful manual inspection.

Classification

To identify whether an isolated VTA-l unit was dopaminergic (TH+), we used the stimulus-associated latency test56. In brief, at the end of each experimental session, we connected the optrode to a laser diode and delivered light pulse trains of different widths and frequencies. For a unit to be identified as light-responsive it needed to reach the significance level of P < 0.001 for 5-ms and 10-ms pulse trains. We also compared the light-evoked waveforms (within 10 ms of laser pulse onset) to session-wide averages; all light-evoked units had a Pearson correlation coefficient of >0.9. Dopamine neurons were successfully recorded from four rats with VTA-l AAV infusions (IM657, 1 unit; IM1002, 3 units; IM1003, 15 units; IM1037, 9 units) and one rat with NAc core AAV (IM-1078, 2 units). Peak width was defined as the full-width-at-half-maximum of the most prominent negative component of the aligned, averaged spike waveform. Non-tagged VTA neurons with session-wide firing rate >20 Hz and peak width <200 µs were classified as non-dopamine cells. To ensure that we were comparing dopamine and non-dopamine cells within the same subregions, we only analysed non-dopamine cells recorded during sessions with at least one optically tagged dopamine cell.

Analysis

Spike bursts were detected by the conventional ‘80/160 template’ approach57: each time an inter-spike-interval of 80 ms or less occurs, these and subsequent spikes are considered part of a burst until there is an interval of 160 ms or more. For comparison of ‘tonic’ firing to reward rate, dopamine spikes were counted in 1-min bins. To examine faster changes, spike density functions were constructed by convolving spike trains with a Gaussian kernel with variance 20 ms. To determine how quickly a neuron responded to a given cue, we used 40-ms bins (sliding in steps of 20 ms) and used a shuffle test (10,000 shuffles) for each time bin comparing the firing rate after cue onset to firing rate in the 250 ms immediately preceding the cue. The first bin at which the post-cue firing rate was significantly (P < 0.01, correcting for multiple comparisons) greater than baseline firing was considered the time to cue response.
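The 80/160 burst rule lends itself to a short worked example. The Python sketch below follows the rule as stated in the text (a burst starts at an inter-spike interval of 80 ms or less and ends at an interval of 160 ms or more). The function name `detect_bursts` and the use of seconds as the time unit are assumptions for the illustration.

```python
def detect_bursts(spike_times, onset_isi=0.080, offset_isi=0.160):
    """Return bursts as lists of spike times (seconds, ascending).

    A burst starts when an inter-spike interval is <= onset_isi and
    continues until an interval >= offset_isi is encountered.
    """
    bursts = []
    current = None  # spikes of the burst being built, or None if outside a burst
    for prev, nxt in zip(spike_times, spike_times[1:]):
        isi = nxt - prev
        if current is None:
            if isi <= onset_isi:
                current = [prev, nxt]   # both spikes of the onset interval
        elif isi >= offset_isi:
            bursts.append(current)      # long interval closes the burst
            current = None
        else:
            current.append(nxt)         # interval < offset: burst continues
    if current is not None:
        bursts.append(current)          # burst still open at end of train
    return bursts
```

For example, spikes at 0, 0.05, 0.15, 0.40, 1.0, 1.06 and 1.5 s yield two bursts: the first three spikes, and the pair at 1.0 and 1.06 s.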

Peak firing rate was calculated as the maximum (Gaussian-smoothed) firing rate of each trial in a 250-ms window after side-in for rewarded trials, and the valley was calculated as the minimum firing rate in a 2-s window, starting one second after side-in for unrewarded trials.

To calculate a ramp angle during approach behaviours, we smoothed mean firing rates with a 50-ms Gaussian kernel, detected the maximum/minimum of the resulting signal in a 0.5-s window before each event (centre-in or food-port-in) and measured the signed angle connecting the two extrema. To compare firing rates in ‘high’ and ‘low’ reward blocks, for each session we performed a median split of average leaky-integrator reward rate in each block.


Voltammetry and computational model

Fast-scan cyclic voltammetry results shown here reanalyse data previously presented in detail5. Within-trial estimates of state value and reward prediction errors were calculated using a semi-Markov decision process reinforcement learning model, exactly as previously described5.


Photometry

We used a viral approach to express the genetically encoded optical dopamine sensor dLight15. Under isoflurane anaesthesia, 1 μl of AAV9-CAG-dLight (1 × 10¹² viral genomes per ml; UC Davis vector core) was slowly (100 nl/min) injected (Nanoject III, Drummond) through a 30-µm glass micropipette in ventral striatum bilaterally (AP: 1.7 mm, ML: 1.7 mm, DV: −7.0 mm). During the same surgery optical fibres (400-µm core, 430-µm total diameter) attached to a metal ferrule (Doric) were inserted (target depth 200 µm higher than AAV) and cemented in place. Data were collected more than three weeks later, to allow for dLight expression.

For dLight excitation, blue (470 nm) and violet (405 nm; control) LEDs were sinusoidally modulated at distinct frequencies (211 Hz and 531 Hz, respectively58). Both excitation and emission signals passed through minicube filters (Doric) and bulk fluorescence was measured with a femtowatt detector (Newport, Model 2151) sampling at 10 kHz. Demodulation produced separate 470 nm (dopamine) and 405 nm (control) signals, which were then rescaled to each other via a least-squares fit58. Fractional fluorescence signal (dF/F) was then defined as (470 − 405_fit)/405_fit. For all analyses this signal was downsampled to 50 Hz and smoothed with a five-point median filter. For presentation of 470 nm and 405 nm signals separately, see Extended Data Fig. 7.
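The dF/F definition can be illustrated with a small Python sketch operating on already-demodulated signals. It assumes that the least-squares rescaling is a straight-line fit (slope and intercept) of the 405 nm control onto the 470 nm signal, and that the median filter window is truncated at the edges. Both are assumptions, as are the function names. Demodulation and downsampling are not shown.

```python
import statistics

def fit_control(sig470, sig405):
    """Least-squares line mapping the 405 nm control onto the 470 nm signal.

    Returns the fitted control trace (405_fit). Assumes sig405 is not constant.
    """
    n = len(sig405)
    mx = sum(sig405) / n
    my = sum(sig470) / n
    sxx = sum((x - mx) ** 2 for x in sig405)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sig405, sig470))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [slope * x + intercept for x in sig405]

def dff(sig470, sig405):
    """Fractional fluorescence: (470 - 405_fit) / 405_fit, sample by sample."""
    fit = fit_control(sig470, sig405)
    return [(s - f) / f for s, f in zip(sig470, fit)]

def median5(y):
    """Five-point median filter; the window shrinks at the edges (assumption)."""
    return [statistics.median(y[max(0, i - 2): i + 3]) for i in range(len(y))]
```

If the 470 nm trace were an exact linear function of the control, the fit would absorb it entirely and dF/F would be zero throughout. Only deviations from the control-predicted signal survive as putative dopamine-dependent fluorescence.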

Data from an optic fibre placement were included in analyses if the fibre tip was in NAc, and the fluorescence response to at least one task cue had a z-score of >1. These criteria excluded one rat, and yielded three rats/four placements (IM1065-left, IM1066-bilateral, IM1089-right) for dLight1.1, and four rats/six placements (IM1088-bilateral, IM1105-right, IM1106-bilateral, IM1107-right) for dLight1.3b. Similar results were obtained for dLight1.1 and dLight1.3b (Extended Data Fig. 7), so data were combined.

To calculate a ramp angle during approach behaviours, we detected the maximum/minimum of the photometry signal in a 0.5-s window before each event (centre-in or food-port-in) and measured the signed angle connecting the two extrema.


Affinity and molecular specificity of dLight1.3b

In vitro measurements were performed as previously described15. In brief, HEK293T (ATCC CRL#1573) cells were cultured and transfected with plasmids encoding dLight1.3b driven by a CMV promoter, and washed with HBSS (Life Technologies) supplemented with Ca2+ (4 mM) and Mg2+ (2 mM) before imaging. Imaging was performed using a 40× oil-based objective on an inverted Zeiss Observer LSM710 confocal microscope with 488 nm/513 nm (excitation/emission) wavelengths. For testing the sensor’s fluorescence responses, neurotransmitters were directly applied to the bath during time-lapse imaging, in at least two independent experiments. Titrations of dopamine and noradrenaline were obtained by performing tenfold serial dilutions to achieve eight different concentrations. All other neurotransmitters were tested at three sequential concentrations (100 nM, 1 µM and 10 µM). All neurotransmitter concentrations were obtained by dilution from a 1 mM stock concentration in HBSS, prepared fresh. Raw fluorescence intensities from time-lapse imaging were quantified in Fiji; each ROI was manually drawn on the membrane of individual cells. Fluorescence fold change (ΔF/F) was calculated as (F peak − F basal)/F basal, where F peak is the averaged fluorescence intensity of four frames and F basal is the averaged fluorescence intensity of four frames before addition of ligands. Graphs and statistical analysis were performed using GraphPad Prism 6. Data points were analysed with a one-site specific binding curve fit to obtain Kd values. In box-and-whisker plots, the box covers the 25% to 75% range and whiskers extend from minimum to maximum values.
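The ΔF/F calculation and the one-site binding model can be sketched as follows. This is not the GraphPad Prism procedure: Prism uses nonlinear regression, whereas this Python sketch swaps in a simple grid search over candidate Kd values, with the best-fitting Bmax for each candidate obtained in closed form. The model Y = Bmax × X/(Kd + X) is the standard one-site specific binding equation. The function names and the grid are assumptions.

```python
def delta_f_over_f(peak_frames, basal_frames):
    """(F_peak - F_basal) / F_basal, each F being a mean over its frames."""
    f_peak = sum(peak_frames) / len(peak_frames)
    f_basal = sum(basal_frames) / len(basal_frames)
    return (f_peak - f_basal) / f_basal

def fit_one_site(conc, resp, kd_grid):
    """Grid-search fit of resp = Bmax * conc / (Kd + conc).

    For each candidate Kd, the least-squares Bmax has a closed form;
    the (Kd, Bmax) pair with the smallest squared error is returned.
    """
    best = None
    for kd in kd_grid:
        frac = [x / (kd + x) for x in conc]            # fractional occupancy
        bmax = (sum(r * f for r, f in zip(resp, frac))
                / sum(f * f for f in frac))
        sse = sum((r - bmax * f) ** 2 for r, f in zip(resp, frac))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]
```

A finer, logarithmically spaced grid would usually be passed in practice, since titrations span several orders of magnitude in concentration.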


Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

The AAV.Synapsin.dLight1.3b virus used in this study has been deposited with Addgene (no. 125560; http://www.addgene.org). All data will be available through the Collaborative Research in Computational Neuroscience data sharing website (https://doi.org/10.6080/K0VQ30V9).

Code availability

Custom MATLAB code is available on request from J.D.B.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
2. Pan, W. X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242 (2005).
3. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
4. Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
5. Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
6. Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).
7. Phillips, P. E., Stuber, G. D., Heien, M. L., Wightman, R. M. & Carelli, R. M. Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618 (2003).
8. Roitman, M. F., Stuber, G. D., Phillips, P. E., Wightman, R. M. & Carelli, R. M. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24, 1265–1271 (2004).
9. Wassum, K. M., Ostlund, S. B. & Maidment, N. T. Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biol. Psychiatry 71, 846–854 (2012).
10. Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
11. Syed, E. C. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat. Neurosci. 19, 34–36 (2016).
12. Morris, G., Nevet, A., Arkadir, D., Vaadia, E. & Bergman, H. Midbrain dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–1063 (2006).
13. da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
14. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
15. Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).
16. Salamone, J. D. & Correa, M. The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485 (2012).
17. Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
18. Garris, P. A. & Wightman, R. M. Different kinetics govern dopaminergic transmission in the amygdala, prefrontal cortex, and striatum: an in vivo voltammetric study. J. Neurosci. 14, 442–450 (1994).
19. Frank, M. J., Doll, B. B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).
20. St Onge, J. R., Ahn, S., Phillips, A. G. & Floresco, S. B. Dynamic fluctuations in dopamine efflux in the prefrontal cortex and nucleus accumbens during risk-based decision making. J. Neurosci. 32, 16880–16891 (2012).
21. Bartra, O., McGuire, J. T. & Kable, J. W. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
22. Ikemoto, S. Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens-olfactory tubercle complex. Brain Res. Brain Res. Rev. 56, 27–78 (2007).
23. Breton, J. M. et al. Relative contributions and mapping of ventral tegmental area dopamine and GABA neurons by projection target in the rat. J. Comp. Neurol. (2018).
24. Ungless, M. A., Magill, P. J. & Bolam, J. P. Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303, 2040–2042 (2004).
25. Morales, M. & Margolis, E. B. Ventral tegmental area: cellular heterogeneity, connectivity and behaviour. Nat. Rev. Neurosci. 18, 73–85 (2017).
26. Morris, G., Arkadir, D., Nevet, A., Vaadia, E. & Bergman, H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143 (2004).
27. Floresco, S. B., West, A. R., Ash, B., Moore, H. & Grace, A. A. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat. Neurosci. 6, 968–973 (2003).
28. Grace, A. A. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nat. Rev. Neurosci. 17, 524–532 (2016).
29. Cohen, J. Y., Amoroso, M. W. & Uchida, N. Serotonergic neurons signal reward and punishment on multiple timescales. eLife 4, e06346 (2015).
30. Niv, Y., Daw, N. & Dayan, P. How fast to work: response vigor, motivation and tonic dopamine. Adv. Neural Inf. Process. Syst. 18, 1019 (2006).
31. Bayer, H. M., Lau, B. & Glimcher, P. W. Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98, 1428–1439 (2007).
32. Chergui, K., Suaud-Chagny, M. F. & Gonon, F. Nonlinear relationship between impulse flow, dopamine release and dopamine elimination in the rat brain in vivo. Neuroscience 62, 641–645 (1994).
33. Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
34. Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).
35. Trulson, M. E. Simultaneous recording of substantia nigra neurons and voltammetric release of dopamine in the caudate of behaving cats. Brain Res. Bull. 15, 221–223 (1985).
36. Glowinski, J., Chéramy, A., Romo, R. & Barbeito, L. Presynaptic regulation of dopaminergic transmission in the striatum. Cell. Mol. Neurobiol. 8, 7–17 (1988).
37. Zhou, F. M., Liang, Y. & Dani, J. A. Endogenous nicotinic cholinergic activity regulates dopamine release in the striatum. Nat. Neurosci. 4, 1224–1229 (2001).
38. Threlfell, S. et al. Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons. Neuron 75, 58–64 (2012).
39. Cachope, R. et al. Selective activation of cholinergic interneurons enhances accumbal phasic dopamine release: setting the tone for reward processing. Cell Reports 2, 33–41 (2012).
40. Sulzer, D., Cragg, S. J. & Rice, M. E. Striatal dopamine neurotransmission: regulation of release and uptake. Basal Ganglia 6, 123–148 (2016).
41. Floresco, S. B., Yang, C. R., Phillips, A. G. & Blaha, C. D. Basolateral amygdala stimulation evokes glutamate receptor-dependent dopamine efflux in the nucleus accumbens of the anaesthetized rat. Eur. J. Neurosci. 10, 1241–1251 (1998).
42. Jones, J. L. et al. Basolateral amygdala modulates terminal dopamine release in the nucleus accumbens and conditioned responding. Biol. Psychiatry 67, 737–744 (2010).
43. Schultz, W. Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461 (1986).
44. Berke, J. D. What does dopamine mean? Nat. Neurosci. 21, 787–793 (2018).
45. Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron 67, 144–155 (2010).
46. Pasquereau, B. & Turner, R. S. Dopamine neurons encode errors in predicting movement trigger occurrence. J. Neurophysiol. 113, 1110–1123 (2015).
47. Fiorillo, C. D., Newsome, W. T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).
48. Morita, K. & Kato, A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front. Neural Circuits 8, 36 (2014).
49. Gershman, S. J. Dopamine ramps are a consequence of reward prediction errors. Neural Comput. 26, 467–471 (2014).
50. Nicola, S. M. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600 (2010).
51. Paxinos, G. & Watson, C. The Rat Brain in Stereotaxic Coordinates 5th edn (Elsevier Academic, 2005).
52. Witten, I. B. et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733 (2011).
53. Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Matching behavior and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004).
54. Wong, J. M. et al. Benzoyl chloride derivatization with liquid chromatography-mass spectrometry for targeted metabolomics of neurochemicals in biological samples. J. Chromatogr. A 1446, 78–90 (2016).
55. Chung, J. E. et al. A fully automated approach to spike sorting. Neuron 95, 1381–1394 (2017).
56. Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).
57. Grace, A. A. & Bunney, B. S. The control of firing pattern in nigral dopamine neurons: burst firing. J. Neurosci. 4, 2877–2890 (1984).
58. Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).


Acknowledgements

We thank P. Dayan, H. Fields, L. Frank, C. Donaghue and T. Faust for their comments on an early version of the manuscript, and V. Hetrick, R. Hashim and T. Davidson for technical assistance and advice. This work was supported by the National Institute on Drug Abuse, the National Institute of Mental Health, the National Institute of Neurological Disorders and Stroke, the University of Michigan, Ann Arbor, and the University of California, San Francisco.

Reviewer information

Nature thanks Margaret Rice and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

A.M. performed and analysed the electrophysiology and photometry, and applied the computational model. J.R.P. performed and analysed the microdialysis with assistance from J.-M.T.W. and supervision by R.T.K. A.A.H. developed the behavioural task and initial photometry setup, and performed the voltammetry. L.T.V. performed retrograde tracing and analysis. T.P. and L.T. developed the dLight sensor and shared expertise. J.D.B. designed and supervised the study, and wrote the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Joshua D. Berke.