League of Legends win rates by tier

An exploration of champion win rates by tier. Hypothesis: certain high-skill champions like Nidalee win more when played by Platinum players than by Silver players. Conversely, certain cheese champions like Amumu win more in Silver, where they mostly face lower-skill players.

Data comes from http://na.op.gg/statistics/champion/ and represents approximately 15 million games played in the month ending September 12, 2016.

By Nelson Minar nelson@monkey.org

In [1]:
import pandas
import seaborn as sns
from IPython.core.display import display, HTML
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_palette('deep')
In [2]:
tier_names = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond', 'Master', 'Challenger']

right_align = [{'selector': 'td', 'props': [('text-align', 'right')]}]  # for DataFrame.style

Load and prepare the data

In [3]:
# Create a bunch of DataFrames, one per TSV file
tiers = {}
for tier in tier_names:
    tiers[tier] = pandas.read_csv('data/Champion win rates by tier - %s.tsv' % tier,
                                  sep='\t', header=0,
                                  names=['N', 'X', 'champion', 'winrate', 'games', 'kda', 'cs', 'gold'],
                                  thousands = ',',
                                  index_col = 2)
    # Remove unneeded columns
    del tiers[tier]['X']
    del tiers[tier]['N']
    # Parse a couple of columns down to simple numbers:
    # strip the trailing '%' from winrate and the two-character suffix from kda
    tiers[tier]['winrate'] = tiers[tier]['winrate'].apply(lambda s: float(s[:-1]))
    tiers[tier]['kda'] = tiers[tier]['kda'].apply(lambda s: float(s[:-2]))

# Smoosh all the DataFrames into a single Panel
data = pandas.Panel(tiers, items = tier_names)
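
Note that pandas.Panel was deprecated and later removed from pandas (in version 0.25), so the cell above only runs on older releases. Here is a minimal sketch of one possible substitute, assuming a recent pandas: concatenate the per-tier DataFrames into one DataFrame with a (tier, champion) MultiIndex. The tier_frames dictionary is a hypothetical stand-in holding only a few win rates copied from the tables in this notebook. It is not the loading code used here, and the rest of the notebook still relies on the Panel.

```python
import pandas

# Hypothetical stand-in for the per-tier DataFrames built by the loading loop.
# Only two tiers and two champions, with win rates taken from this notebook.
tier_frames = {
    'Silver': pandas.DataFrame({'winrate': [44.59, 52.85]},
                               index=['Nidalee', 'Amumu']),
    'Diamond': pandas.DataFrame({'winrate': [53.96, 48.75]},
                                index=['Nidalee', 'Amumu']),
}

# One long DataFrame indexed by (tier, champion) instead of a 3-D Panel
combined = pandas.concat(tier_frames, names=['tier', 'champion'])

# Rough equivalent of looking up one champion across tiers
nidalee = combined.xs('Nidalee', level='champion')
print(nidalee)

# Rough equivalent of pulling one statistic for all champions and tiers
by_tier = combined['winrate'].unstack('tier')
print(by_tier)
```

With this layout, a per-champion lookup is a cross-section on the champion level, and a per-statistic table is an unstack of one column.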

Sample data for Nidalee

In [4]:
data.xs('Nidalee')
Out[4]:
Bronze Silver Gold Platinum Diamond Master Challenger
winrate 41.45 44.59 48.14 51.00 53.96 61.45 57.21
games 17466.00 58051.00 52953.00 39436.00 12505.00 664.00 208.00
kda 2.04 2.38 2.70 2.92 3.13 3.59 3.97
cs 107.19 122.93 133.71 138.80 142.22 144.76 146.22
gold 11625.00 12232.00 12708.00 12806.00 12767.00 12992.00 12950.00

Average statistics by tier

In [5]:
d = []
for t, df in data.iteritems():
    d.append((df['winrate'].mean(), 
              df['games'].mean(), 
              df['kda'].mean(), 
              df['cs'].mean(), 
              df['gold'].mean()))

averages = pandas.DataFrame(d, index = data.items, columns=('winrate', 'games', 'kda', 'cs', 'gold'))
(averages.style
    .format({'cs': "{:.0f}", 'games': '{:,.0f}', 'gold': '{:,.0f}', 'kda': '{:.2f}', 'winrate': "{:.2f}%"})
    .set_table_styles(right_align))
Out[5]:
winrate games kda cs gold
Bronze 45.93% 31,248 2.27 126 11,774
Silver 48.21% 89,143 2.44 142 12,058
Gold 49.42% 61,990 2.53 151 12,205
Platinum 49.90% 33,342 2.56 156 12,116
Diamond 50.05% 7,429 2.55 156 11,714
Master 54.45% 321 2.70 156 11,494
Challenger 55.38% 94 3.11 152 11,361

Win rates by tier, alternate calculation

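The averages above weight every champion equally, no matter how often it is played. This version instead weights each champion's win rate by its number of games. It seems odd that the total win rate across all data is below 50%: every completed game has one winning team and one losing team, so wins and losses should balance. Perhaps they are including games that didn't complete?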

In [6]:
d = []
for t, df in data.iteritems():
    winsPerChamp = df.games * df.winrate / 100
    d.append((100 * winsPerChamp.sum() / df.games.sum(), winsPerChamp.sum(), df.games.sum()))
tier_stats = pandas.DataFrame(d, index=data.items, columns=('win rate for tier', 'wins in tier', 'games in tier'))
(tier_stats.style
    .format({'games in tier': '{:,.0f}', 'win rate for tier': '{:.2f}%', 'wins in tier': '{:,.0f}'})
    .set_table_styles(right_align)
)
Out[6]:
win rate for tier wins in tier games in tier
Bronze 46.39% 1,913,643 4,124,704
Silver 48.50% 5,707,026 11,766,902
Gold 49.53% 4,052,918 8,182,712
Platinum 50.01% 2,200,874 4,401,080
Diamond 50.37% 493,929 980,689
Master 53.46% 20,766 38,843
Challenger 54.48% 4,101 7,528
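
The difference between the two calculations can be seen in a toy example. This is only a sketch: ChampA and ChampB and their numbers are made up for illustration and do not come from the op.gg data.

```python
import pandas

# Hypothetical two-champion tier: an unpopular strong pick and a popular weak one
df = pandas.DataFrame({'winrate': [40.0, 60.0], 'games': [3000, 1000]},
                      index=['ChampA', 'ChampB'])

# Calculation used for the averages table: every champion counts equally
simple_mean = df['winrate'].mean()
print(simple_mean)      # 50.0

# Alternate calculation: total wins divided by total games
wins_per_champ = df['games'] * df['winrate'] / 100
weighted_rate = 100 * wins_per_champ.sum() / df['games'].sum()
print(weighted_rate)    # 45.0
```

The games-weighted figure moves toward the win rate of whichever champions are played most, which is why the two tables differ slightly for each tier.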

Champion win rates from Silver to Diamond

Which champions improve the most in the hands of skilled players? Which champions' win rates fall off in higher tiers?

It turns out Nidalee improves the most with player skill; she goes from a 44.59% win rate for Silver players to 53.96% in Diamond, a gain of 9.37 percentage points. Conversely, Amumu loses 4.10 points of win rate.

The column "Silver to Diamond" is simply the difference in win rates in the two tiers. "Max Spread" is the difference between maximum and minimum win rate. It's useful for champs like Blitzcrank that are strongest in Gold (+2.09%), not Platinum (+0.87%).
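
As a worked example of the two columns, here is a small sketch using Nasus's four win rates as they appear in the results table of this section. The variable names are illustrative and are not part of the notebook's own code.

```python
import pandas

# Nasus win rates by tier, copied from the results table in this section
nasus = pandas.Series([47.69, 48.50, 46.91, 46.56],
                      index=['Silver', 'Gold', 'Platinum', 'Diamond'])

# "Silver to Diamond": last tier minus first tier
silver_to_diamond = nasus['Diamond'] - nasus['Silver']
print('%+.2f' % silver_to_diamond)   # -1.13

# "Max Spread": best tier minus worst tier, wherever they fall
max_spread = nasus.max() - nasus.min()
print('%.2f' % max_spread)           # 1.94
```

Nasus peaks in Gold rather than at either end of the range, so his Max Spread is larger than the size of his Silver to Diamond change.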

In [7]:
spreads = {}
for name in data.major_axis:
    champ_data = data.major_xs(name)
    # Consider only Silver -> Diamond data
    reduced = champ_data.transpose()[1:-2]
    spreads[name] = (
        reduced.winrate[-1] - reduced.winrate[0], 
        max(reduced.winrate) - min(reduced.winrate),
        data.Silver.loc[name].winrate,
        data.Gold.loc[name].winrate,
        data.Platinum.loc[name].winrate,
        data.Diamond.loc[name].winrate,
    )
win_rates = pandas.DataFrame.from_records(spreads, 
             index=('Silver to Diamond', 'Max Spread', 'Silver', 'Gold', 'Platinum', 'Diamond')).transpose()
win_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)
df_disp = pandas.concat([win_rates.head(10), win_rates.tail(10)])
display(df_disp.style
     .format({'Silver to Diamond': '{:+.2f}%', 'Max Spread': '{:.2f}%',
              'Silver': '{:.2f}%', 'Gold': '{:.2f}%', 'Platinum': '{:.2f}%', 'Diamond': '{:.2f}%'})
     .set_table_styles(right_align)
     .background_gradient(cmap='coolwarm', low = 0.5, high= 0.5,
                          subset=['Silver', 'Gold', 'Platinum', 'Diamond'])
)
Silver to Diamond Max Spread Silver Gold Platinum Diamond
Nidalee +9.37% 9.37% 44.59% 48.14% 51.00% 53.96%
Pantheon +7.50% 7.50% 48.01% 51.33% 52.84% 55.51%
Riven +6.07% 6.07% 47.05% 49.26% 50.63% 53.12%
Twisted Fate +5.99% 5.99% 46.75% 49.52% 51.70% 52.74%
Aurelion Sol +5.77% 5.77% 49.09% 50.19% 52.31% 54.86%
Rengar +5.57% 5.57% 45.68% 47.25% 49.28% 51.25%
Ryze +5.53% 5.53% 41.48% 42.92% 43.95% 47.01%
Kindred +5.22% 5.22% 45.64% 48.64% 50.41% 50.86%
Urgot +4.98% 4.98% 45.08% 48.46% 49.00% 50.06%
Evelynn +4.79% 4.79% 46.65% 49.29% 50.62% 51.44%
Nasus -1.13% 1.94% 47.69% 48.50% 46.91% 46.56%
Ziggs -1.18% 2.72% 49.37% 50.91% 50.36% 48.19%
Brand -1.48% 1.48% 51.66% 51.64% 51.31% 50.18%
Kalista -1.95% 3.31% 42.63% 43.99% 43.57% 40.68%
Sion -1.96% 1.96% 52.06% 51.47% 52.03% 50.10%
Aatrox -2.23% 4.21% 46.78% 47.98% 48.76% 44.55%
Dr. Mundo -2.36% 2.97% 46.82% 47.43% 47.15% 44.46%
Garen -2.56% 2.83% 49.27% 49.54% 49.20% 46.71%
Yorick -3.88% 3.88% 46.24% 45.16% 43.84% 42.36%
Amumu -4.10% 4.10% 52.85% 52.52% 51.91% 48.75%
In [8]:
g = sns.distplot(win_rates['Silver to Diamond'], bins=15)
g.set(title='Win rate differences from Silver to Diamond')
Out[8]:
[Figure: histogram titled "Win rate differences from Silver to Diamond"]

Champion popularity by tier

How popular are champions at various tiers? Which champions get more popular at higher tiers?

It turns out Janna has the largest increase in usage in higher tiers. She accounts for only 0.79% of the Silver data (93,510 games out of 11.8M) but 2.92% of the Diamond data (28,644 games out of 1M). Conversely, Leona has the biggest drop in usage, from 1.63% to 0.63%.

Note that the raw numbers reported in Silver/Gold/Platinum/Diamond are not strictly pick rates, although the two are mostly correlated. Each number is the champion's share of all the data we have for that tier, so Janna represents 0.79% of all the Silver data. The report is sorted by the column "Silver to Diamond", the difference in these shares between Silver and Diamond.
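
The Janna figures can be checked by hand. This sketch uses her game counts quoted above and the "games in tier" totals from the alternate win rate table; the variable names are illustrative.

```python
# Janna's games and the tier totals, as quoted in this notebook
janna_silver_games, silver_total = 93510, 11766902
janna_diamond_games, diamond_total = 28644, 980689

# Share of each tier's data that is Janna, in percent
silver_share = 100 * janna_silver_games / silver_total
diamond_share = 100 * janna_diamond_games / diamond_total
print('%.2f' % silver_share)     # 0.79
print('%.2f' % diamond_share)    # 2.92

# The "Silver to Diamond" column is the difference of the two shares
change = diamond_share - silver_share
print('%+.2f' % change)          # +2.13
```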

In [9]:
pick_rates = 100 * data.minor_xs('games') / data.minor_xs('games').sum()
del pick_rates['Bronze']
del pick_rates['Master']
del pick_rates['Challenger']
pick_rates.insert(0, 'Silver to Diamond', pick_rates.Diamond - pick_rates.Silver)
pick_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)
df_disp = pandas.concat([pick_rates.head(10), pick_rates.tail(10)])
(df_disp.style
     .format({'Silver': '{:,.2f}', 'Gold': '{:,.2f}', 'Platinum': '{:,.2f}', 'Diamond': '{:,.2f}',
              'Silver to Diamond': '{:+.2f}'})
     .set_table_styles(right_align)
     .background_gradient(cmap='coolwarm', low = 0.5, high= 0.5,
                          subset=['Silver', 'Gold', 'Platinum', 'Diamond']))
Out[9]:
Silver to Diamond Silver Gold Platinum Diamond
Janna +2.13 0.79 1.28 1.79 2.92
Lucian +1.38 2.29 2.81 3.27 3.67
Jhin +1.27 2.01 2.46 2.96 3.28
Ezreal +0.99 1.93 2.50 2.72 2.93
Bard +0.95 0.77 0.97 1.20 1.72
Graves +0.93 0.97 1.36 1.73 1.90
Nidalee +0.78 0.49 0.65 0.90 1.28
Karma +0.71 0.79 0.91 1.05 1.50
Rek'Sai +0.71 0.44 0.56 0.74 1.15
Elise +0.56 0.45 0.56 0.70 1.01
Annie -0.67 1.31 1.09 0.82 0.64
Miss Fortune -0.69 1.15 0.81 0.64 0.47
Garen -0.69 0.88 0.51 0.28 0.19
Xin Zhao -0.71 0.89 0.54 0.34 0.17
Master Yi -0.73 1.09 0.83 0.66 0.37
Vayne -0.78 1.87 1.89 1.58 1.09
Amumu -0.91 1.25 0.98 0.68 0.34
Lux -0.95 1.51 1.16 0.89 0.56
Jinx -0.97 2.25 2.04 1.87 1.28
Leona -1.00 1.63 1.11 0.79 0.63
In [10]:
g = sns.distplot(pick_rates['Silver to Diamond'], bins=15)
g.set(title='Pick rate differences from Silver to Diamond')
Out[10]:
[Figure: histogram titled "Pick rate differences from Silver to Diamond"]

Scatterplot of Platinum win rate vs pick rate

Are high win rate champs more popular in Platinum? Not particularly...

In [11]:
wr_vs_pick_platinum = pandas.concat((win_rates['Platinum'], pick_rates['Platinum']), axis=1)
wr_vs_pick_platinum.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', ylim=(0,3.5), xlim=(40,60), data=wr_vs_pick_platinum, kind="scatter")

Scatterplot of Win Rate improvement vs Pick Rate change

Are champions that have a bigger Silver-to-Diamond win rate change also likely to have a bigger Silver-to-Diamond pick rate change? If the two changes tracked each other exactly you'd expect the dots below to fall on the line x=y. They don't really, but there is a correlation.

In [12]:
wr_vs_pick_sd = pandas.concat((win_rates['Silver to Diamond'], pick_rates['Silver to Diamond']), axis=1)
wr_vs_pick_sd.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', data=wr_vs_pick_sd, kind="scatter")
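
One way to put a number on the relationship is a Pearson correlation coefficient. The sketch below is an illustration only: it uses just the three champions that appear in both tables above (Nidalee, Garen and Amumu), which sit at the extremes of the data, so it will overstate the correlation for the full set of champions. On the real data the same Series.corr call could be applied to the two columns of wr_vs_pick_sd.

```python
import pandas

# Silver-to-Diamond changes for the three champions listed in both tables.
# This is a tiny hand-picked sample, not the full data set.
sample = pandas.DataFrame(
    {'Win Rate': [9.37, -2.56, -4.10],
     'Pick Rate': [0.78, -0.69, -0.91]},
    index=['Nidalee', 'Garen', 'Amumu'])

# Pearson correlation between win rate change and pick rate change
r = sample['Win Rate'].corr(sample['Pick Rate'])
print('Pearson r = %.2f' % r)
```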