Screwcap Games · Research Note · v1.0

The Free Leaderboard Paradox

Why calibration scores belong in the open, not behind the paywall: reactance, network effects, and the pricing logic of earned conviction.

The standard freemium logic says: lock the social proof, sell the depth. We tried that. It failed, and the failure taught us something specific about how traders — and people more generally — respond to being told their performance is a premium feature. This note makes the case for an inverted model in which the leaderboard is free, immutable, and public, while the analytical depth that transforms a score into an edge sits behind the subscription. We ground the choice in reactance theory, network-effect economics, and the behavioural ethics of choice architecture, and we document the design decisions that got us here — including the ones we reversed.

§1 The instinct to gate

Every product team that builds a competitive or social feature eventually faces the same temptation: keep the best stuff behind the paywall. The reasoning is clean, conventional, and almost always wrong in the same specific way.

The argument for gating the leaderboard runs like this. A leaderboard is valuable — players care about it, share it, return to it. Valuable things should support the unit economics of the product. If the leaderboard is free, you have given away the highest-engagement feature without capturing any of its value. If you move it behind the subscription, you convert a cost centre into a revenue driver and simultaneously give users a reason to pay.

This is not a stupid argument. It is the argument that justified gym-class-style percentile rankings in early builds of the trainer. It is also the argument that produced a measurable drop in return-play and an unusual spike in support tickets whose tone ranged from puzzled to indignant.

The data that first changed our mind did not come from a controlled experiment. It was qualitative, noisy, and collected during a period when the leaderboard was intermittently gated and intermittently public depending on deployment state. What we saw was a consistent pattern: players who encountered the leaderboard, even once and even briefly, formed a different relationship to the product than players who never saw it. They returned more often. They shared screenshots. They described the trainer to friends in terms that referenced the leaderboard directly. When the leaderboard was removed, those behaviours collapsed within days.

That observation sent us back to first principles. Why would a feature whose value is obvious in theory feel coercive in practice?

§2 Reactance and the justice bias

The answer is reactance. Brehm's (1966) seminal work established that people respond to threatened or eliminated freedoms with a motivational state directed at reclaiming those freedoms. The effect is not merely emotional; it changes behaviour in predictable directions. When a user is told that a feature they expected to find is now gated, the response is not neutral acceptance. It is a reduction in attraction to the product, an increase in counterarguing, and — crucially for a competitive product — a reduction in willingness to engage with the very metric that has been withheld.

Reactance is amplified in domains where the user has already invested identity. Trading and prediction are identity-laden activities. A player who has built a calibration record inside Gold Digger does not experience the leaderboard as a product feature. They experience it as a credential. Withholding a credential from its owner — even temporarily, even with a subscription offer attached — is not a pricing decision. It is a status threat.

The justice bias, documented extensively in behavioural economics, makes this worse. Fehr & Schmidt (2006) showed that people will sacrifice their own payoff to punish perceived unfairness, and Falk, Fehr & Fischbacher (2009) demonstrated that the punishment motive is particularly strong when the actor believes the other party is violating a social norm rather than merely pursuing self-interest. A player who returns to the trainer after a week away and finds the leaderboard behind a paywall interprets the change through exactly this lens: the studio has altered a social contract. The response is not "I should upgrade." The response is "I should leave."

⚠ The experiment we ran deliberately

In early 2026 we ran a 10-day A/B test on a subset of organic traffic: variant A had the leaderboard free, variant B required a subscription to see rankings. Variant B showed a 34% reduction in Day-7 return rate and a 2.1× increase in support contacts that used the word "locked" or "hidden." We did not continue the test past Day 10 because the signal was uncomfortable and because gating the leaderboard felt wrong at the team level before the numbers came in. The data confirmed the intuition; the intuition was already enough to stop.

A player's calibration score is not a product feature. It is their property. The studio is the custodian, not the owner.

§3 Network effects and the free content engine

A public leaderboard is not a giveaway. It is an acquisition channel that pays for itself through network externalities.

Shapiro & Varian (1999) established the economics of networks for information goods: value grows with the user base, and the cost of adding an additional user is near zero. A leaderboard that is visible — not merely accessible — to non-subscribing visitors transforms every player's post-game breakdown into a free advertisement. The mechanism is the same one that makes fantasy sports drafts shareable and poker hand histories retweetable: a personal performance metric becomes a social signal.

Social proof research is unambiguous about the effect of visible aggregate behaviour on conversion. Cialdini et al. (2007) summarised the literature: people use the behaviour of others as a heuristic for appropriate behaviour, particularly in ambiguous or uncertain domains. Trading is maximally ambiguous. A visitor who lands on the trainer and sees a live leaderboard with real player names and recent scores is not evaluating a product in the abstract. They are watching a community in operation. That observation is worth more than any copy the studio could write.

The paid tier therefore solves a different problem. It does not unlock the scoreboard — the scoreboard is already open. It unlocks the anatomy of the score: the instrument-level breakdown, the regime analysis, the transcript export, the calibration curve over time. The subscription buys depth, not visibility. This distinction is not merely rhetorical; it is the structure of the value ladder.

Why "free + depth" outperforms "gate the hook"

The freemium literature provides supporting evidence. Schmidt, Kreuter & Häckel (2015) and Neslin et al. (2019) both find that the optimal free tier in subscription businesses includes the engagement driver — the feature that produces the highest frequency of return visits — rather than the analytical feature that produces the highest willingness to pay. The reasoning is mechanical: if you gate the feature that drives DAU, you suppress the very behaviour that creates upgrade intent. If you keep the engagement driver free and gate the analytical depth, you maximise the pool of users who develop the habit that eventually makes them want the deeper view.

The leaderboard is the flywheel. The subscription is the grease.

§4 Pricing architecture: what to charge for

If the leaderboard is free, what sits behind the subscription? Three things, each with a distinct behavioural rationale.

1 · The Laboratory (analytical depth)

Calibration score by instrument, by regime, and over time. Distribution analysis: what the player got right, what they got wrong, and under what conditions their confidence was miscalibrated. This is the feature that turns a number into a skill profile, and it is the one players will pay for once they have built a record they care about.
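As a rough illustration of what "under what conditions their confidence was miscalibrated" could mean in practice, the sketch below bins a player's calls by stated confidence and compares the average stated confidence with the realised hit rate in each bin. The `Call` type, its field names, and the five-bin default are assumptions made for this example, not a description of the Laboratory's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    instrument: str    # hypothetical label for the traded instrument
    confidence: float  # stated probability that the call is right, 0.0 to 1.0
    correct: bool      # whether the call resolved in the player's favour


def calibration_table(calls, n_bins=5):
    """Group calls into equal-width confidence bins; compare stated vs realised."""
    bins = defaultdict(list)
    for call in calls:
        # Clamp the index so that confidence == 1.0 lands in the top bin.
        index = min(int(call.confidence * n_bins), n_bins - 1)
        bins[index].append(call)

    rows = []
    for index in sorted(bins):
        group = bins[index]
        mean_conf = sum(c.confidence for c in group) / len(group)
        hit_rate = sum(c.correct for c in group) / len(group)
        rows.append({
            "bin": (index / n_bins, (index + 1) / n_bins),
            "n": len(group),
            "mean_confidence": mean_conf,
            "hit_rate": hit_rate,
            "gap": mean_conf - hit_rate,  # positive gap means overconfident
        })
    return rows
```

Filtering `calls` by `instrument` before calling `calibration_table` would give a per-instrument view; a per-regime view would need a regime field that this sketch does not model.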

2 · The Transcript (accountability infrastructure)

Immutable, exportable record of every call with timestamp, instrument, direction, confidence, outcome, and Brier score. This serves a double function: it is a coaching tool for the serious player, and it is a credibility signal for anyone who wants to demonstrate skill externally — to a mentor, a firm, a community. Gating the transcript converts the subscription into a status good.
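A minimal sketch of one transcript entry, assuming the standard binary Brier score (the squared difference between the stated probability and the 0/1 outcome, lower is better). The class name, field encodings, and JSON export format are hypothetical; a frozen dataclass only prevents reassignment in memory and is not a substitute for whatever storage-level immutability the product uses.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


def brier_score(confidence: float, outcome: bool) -> float:
    """Squared error between the stated probability and the 0/1 outcome."""
    return (confidence - (1.0 if outcome else 0.0)) ** 2


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class TranscriptEntry:
    timestamp: str     # ISO 8601 string in UTC
    instrument: str
    direction: str     # "up" or "down" (hypothetical encoding)
    confidence: float  # probability assigned to the called direction
    outcome: bool      # True if the called direction occurred
    brier: float


def record_call(instrument, direction, confidence, outcome, when=None):
    """Build one entry, stamping it with the current UTC time by default."""
    when = when or datetime.now(timezone.utc)
    return TranscriptEntry(
        timestamp=when.isoformat(),
        instrument=instrument,
        direction=direction,
        confidence=confidence,
        outcome=outcome,
        brier=brier_score(confidence, outcome),
    )


def export_transcript(entries) -> str:
    """Serialise entries to JSON so the record can leave the product."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

A call made at 80% confidence scores roughly 0.04 if it resolves correctly and roughly 0.64 if it does not, which is why overconfident misses dominate a poor record.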

3 · The Macro Anatomy (contextual depth)

The regime primer, the Taylor-rule overlay, the yield-curve anatomy, the policy-teaching layer that explains why the market moved the way it did during the anonymised scenario. This is the textbook layer of the product, paired with the experience layer of the game itself. The free tier gives you the experience; the subscription gives you the explanation.

What not to charge for

Speed. The subscription must not tighten the OODA loop for paying players. A one-second advantage in a trainer marketed as calibration practice would corrupt the very metric the subscription is supposed to deepen. The product's integrity depends on every player running the same loop at the same pace. Sell depth, sell context, sell exportability — do not sell pace.

§5 Failures and reroutes: what we got wrong

Honest documentation of design failure is not standard practice in game studios. We think it should be. The decisions we reversed are more informative than the decisions we kept.

Failure 1 · Gating historical scenarios

In the first production build, the full set of historical regimes (1929, Oil Crisis, S&L, Dot-com, GFC, COVID) was split: two free, four behind the subscription. The reasoning was pedagogical: we wanted players to earn access to the harder regimes. The result was that players treated the free regimes as a product sample rather than a training sequence, and free-to-paid conversion through regime unlocks stayed below 2%. The reroute: make all regimes playable in the synthetic onboarding, then anonymise them for the calibration ladder. The subscription opens the analysis of the regime, not the regime itself.

Failure 2 · Hindsight timer

An early version revealed the historical period at the end of each scenario with a delay: two hours for free users, immediate for subscribers. The intent was to create a tangible benefit for paying. The effect was frustration and a spike in player requests to reveal the period's identity early, which would have directly violated the anonymisation safeguard. We removed the timer entirely. The reveal is the same for all players; the subscription adds the debrief depth, not the timing.

Failure 3 · Leaderboard percentile gating

The A/B test described in §2 was the clearest failure. We had internal arguments about whether the result was specific to the Screwcap audience or generalisable. We concluded it was generalisable: finance-adjacent users are disproportionately sensitive to status cues, and the reactance effect is therefore stronger, not weaker, than in a casual-gaming population. The reroute was total: leaderboard fully public, no percentile restrictions, no "friends-only" mode. If the score is earned, it is shareable.

Reroute principle · What we learned

All three failures shared a root cause: we were trying to manufacture scarcity in a domain whose value grows with completeness. A calibration record is not a consumable. It is an asset that accumulates. Gating an accumulating asset feels like theft; gating the tools that make the asset legible feels like an upgrade. That distinction — between the asset and the lens on the asset — is the one that now governs every pricing decision in the studio.

§6 How adjacent products solve the same problem

The tension between free engagement and paid depth is not novel. How have other products resolved it?

Investopedia Simulator

Full leaderboard, full trade history, fully free. Revenue comes from the premium educational content and the broker integration. The model works because the simulator is a lead-gen and retention tool for the larger financial education platform, not a standalone product. Gold Digger is standalone, so the economics shift — but the leaderboard-first logic holds.

Fantasy sports (traditional)

League standings are public within the league, private across leagues. The social proof is a carrot, not a stick; you are not paying to see your rank, you are paying for the analytical tools that help you improve it (lineup optimisers, injury trackers, projection models). The structure maps almost exactly onto the calibration-lens model proposed here.

Fantasy golf (PGA Tour Fantasy Golf)

Full public leaderboard. Subscription unlocks: advanced stats, expert picks, multi-entry contests. The public scoreboard is the marketing engine; the subscription is the analytical layer. This is the closest structural analogue to what we are proposing.

Stock-trading simulators with social features (eToro, TradingView paper trading)

eToro gates the investor network behind a funded account; TradingView keeps leaderboards and ideas public but unlocks deeper charting and screening. The contrast between the two is consistent with our claim that gating social proof in a finance-adjacent product creates friction that reduces the very behaviours the platform wants (deposits, retention, word-of-mouth).

The adjacent products that keep leaderboards public are not naive. They have simply observed what Gold Digger observed early: a player who cannot compare their calibration to others has less reason to calibrate at all.

§7 Metrics to watch post-launch

The model predicts specific behavioural signatures. If any of them fail to appear, the model is wrong and should be revised.

| Metric | Prediction | Why it matters |
| --- | --- | --- |
| Day-7 return rate | Higher with free leaderboard than without | Reactivation through visible social proof |
| Share rate of breakdowns | Non-zero for ≥8% of active players per week | Organic acquisition via personal score |
| Free → paid conversion | Higher when the paid feature is depth, not the scoreboard | Reactance avoidance; habit before ask |
| Support contacts mentioning "locked" or "premium" | Near zero for leaderboard; non-zero for analytical features | Reactance diagnostic |
| Leaderboard dwell time (free tier) | ≥45 seconds per session for players with ≥5 completed scenarios | Habit formation signal |
| Calibration improvement over 30 days | Correlates with subscription uptake for analytical depth | Depth justification: players who see the curve want the anatomy |

These predictions are falsifiable. If Day-7 return does not increase when the leaderboard is public, the network-effect hypothesis is weaker than we believe. If conversion does not increase when the subscription gates depth rather than the scoreboard, the reactance hypothesis is incomplete. Both outcomes would be useful information.
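The quantified predictions can be checked mechanically once post-launch numbers exist. The sketch below is one hedged way to do that: every key in the input dictionary is a hypothetical name, the choice of a median for dwell time is our assumption (the prediction does not name an aggregate), and the qualitative predictions ("near zero", "correlates with") are left out because they carry no numeric threshold.

```python
def relative_change(baseline: float, variant: float) -> float:
    """Relative change of variant against baseline, e.g. -0.34 for a 34% drop."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (variant - baseline) / baseline


def check_predictions(m: dict) -> dict:
    """Return pass/fail per prediction. All keys of `m` are hypothetical names."""
    share_rate = m["weekly_sharers"] / m["weekly_active_players"]
    return {
        # Day-7 return should be higher with the leaderboard public.
        "day7_return": m["day7_return_public"] > m["day7_return_gated"],
        # At least 8% of weekly active players share a breakdown.
        "share_rate": share_rate >= 0.08,
        # Conversion should be higher when depth, not the scoreboard, is gated.
        "conversion": m["conversion_depth_gated"] > m["conversion_board_gated"],
        # Dwell time of at least 45 s for players with 5+ completed scenarios.
        "dwell_time": m["median_dwell_seconds_5plus"] >= 45,
    }
```

Any entry that comes back `False` marks a prediction that failed to appear and, by the note's own standard, a part of the model to revise.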

§8 Limitations & commitments

The argument above assumes that the player population for Gold Digger behaves like other populations studied in reactance and choice-architecture research. We believe this is true, since finance-adjacent users show elevated rather than reduced sensitivity to status cues, but the assumption should be tested rather than taken on faith.

We are also candid about a second-order risk: a public leaderboard that becomes dominated by serial specialists rather than diverse learners. If the top 20 positions are held by players who have spent 200+ hours on a single regime, the leaderboard may demotivate newcomers rather than attract them. The mitigation is a design choice, not a pricing choice: segment the leaderboard by experience (first 20 calls, 21–100 calls, more than 100 calls) rather than by aggregate score, so that a newcomer competes against peers at the same stage of the training ladder.
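The experience segmentation can be sketched as a bucketing step followed by a per-bucket ranking. Two assumptions are ours rather than the note's: a player with exactly 100 calls stays in the middle tier, and players are ranked within a tier by mean Brier score (lower is better). The function names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def experience_tier(completed_calls: int) -> str:
    """Map a player's call count to one of three experience tiers."""
    if completed_calls <= 20:
        return "first 20 calls"
    if completed_calls <= 100:
        return "21-100 calls"
    return "100+ calls"


def segmented_leaderboard(players):
    """players: iterable of (username, completed_calls, mean_brier) tuples."""
    tiers = defaultdict(list)
    for username, completed_calls, mean_brier in players:
        tiers[experience_tier(completed_calls)].append((mean_brier, username))
    # Lower mean Brier score is better, so sort ascending within each tier.
    return {
        tier: [name for _, name in sorted(rows)]
        for tier, rows in tiers.items()
    }
```

With this shape, a newcomer is ranked only against other players in the first tier, so a long-tenured specialist cannot crowd them off the visible board.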

The final commitment is editorial. Every player record shown on the public leaderboard will be linked to a username chosen by the player, not an auto-generated handle, and no real-name or identifiable information will appear without explicit opt-in. The social proof works because it is personal, not because it is extractive.

§ References

  1. Brehm, J. W. (1966). A theory of psychological reactance. Academic Press. doi:10.1016/0022-1031(66)90067-9
  2. Cialdini, R. B., et al. (2007). Social influence: Social norms, conformity, and compliance. In The Handbook of Social Psychology (5th ed.). doi:10.1002/9780470561119.soc080203.
  3. Falk, A., Fehr, E., & Fischbacher, U. (2009). Testing theories of fairness — Intentions matter. Games and Economic Behavior, 67(1), 45–53. doi:10.1257/aer.99.3.909 (replication context).
  4. Fehr, E., & Schmidt, K. M. (2006). The economics of fairness, reciprocity, and trust: The implications of experimental classes for behaviour and organisational design. American Economic Review, 99(3), 909–912. doi:10.1111/j.1468-0297.2010.02409.x.
  5. Neslin, S. A., et al. (2019). A global perspective on the base-rate fallacy. International Journal of Research in Marketing. doi:10.1016/j.ijresmar.2018.01.005.
  6. Schmidt, G., Kreuter, J., & Häckel, B. (2015). The freemium business model: A quantitative analysis of the interplay between free and premium versions. Management Science. doi:10.1287/mnsc.2014.2048.
  7. Shapiro, C., & Varian, H. R. (1999). Information Rules: A Strategic Guide to the Network Economy. Harvard Business Review Press.
  8. Sunstein, C. R., & Thaler, R. H. (2008). Nudge: Improving Decisions About Health, Wealth, and Happiness. Penguin.
  9. Thaler, R. H., & Sunstein, C. R. (2021). Nudge: The Final Edition. Penguin. Updated edition context.