League of Legends win rates by tier¶

An exploration of champion win rates by tier. Hypothesis: certain high-skill champions like Nidalee win more when played by Platinum players than by Silver players. Conversely, certain cheese champions like Amumu win more in Silver, where they mostly face lower-skill opponents.

Data comes from http://na.op.gg/statistics/champion/ and represents approximately 15 million games played in the month ending September 12, 2016.

By Nelson Minar nelson@monkey.org

import pandas
import seaborn as sns
import matplotlib.pyplot as plt
from IPython.display import display, HTML
%matplotlib inline
sns.set_palette('deep')

tier_names = ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond', 'Master', 'Challenger']

right_align = [{'selector': 'td', 'props': [('text-align', 'right')]}]  # for DataFrame.style

Load and prepare the data¶

# Create a bunch of DataFrames, one per TSV file
tiers = {}
for tier in tier_names:
    tiers[tier] = pandas.read_csv('data/Champion win rates by tier - %s.tsv' % tier,
                                  sep='\t', header=0,
                                  names=['N', 'X', 'champion', 'winrate', 'games', 'kda', 'cs', 'gold'],
                                  thousands = ',',
                                  index_col = 2)
    # Remove unneeded columns
    del tiers[tier]['X']
    del tiers[tier]['N']
    # Parse a couple of columns down to simple numbers:
    # 'winrate' drops its trailing '%', 'kda' drops its trailing two-character suffix
    tiers[tier]['winrate'] = tiers[tier]['winrate'].apply(lambda s: float(s[:-1]))
    tiers[tier]['kda'] = tiers[tier]['kda'].apply(lambda s: float(s[:-2]))

# Smoosh all the DataFrames into a single Panel
# (requires pandas < 0.25; Panel was removed in later releases)
data = pandas.Panel(tiers, items = tier_names)
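On a current pandas, where Panel is no longer available, one possible substitute is a single DataFrame with a (tier, champion) MultiIndex. The sketch below is an assumption about how the same lookups could be expressed, not the code this notebook ran. It uses two tiers only; the Nidalee and Amumu win rates and the Nidalee game counts are copied from the output tables, and the Amumu game counts are placeholders.

```python
import pandas

tier_names = ['Silver', 'Gold']  # shortened list, for illustration only

def make_tier(winrates, games):
    """Build a stand-in for one per-tier DataFrame indexed by champion."""
    index = pandas.Index(['Nidalee', 'Amumu'], name='champion')
    return pandas.DataFrame({'winrate': winrates, 'games': games}, index=index)

# Amumu game counts (1000) are placeholders, not real data
tiers = {
    'Silver': make_tier([44.59, 52.85], [58051, 1000]),
    'Gold': make_tier([48.14, 52.52], [52953, 1000]),
}

# Stack the per-tier frames; rows are labelled (tier, champion)
data = pandas.concat(tiers, keys=tier_names, names=['tier', 'champion'])

# One champion across tiers (rows are tiers; transpose to match Panel.xs output)
nidalee = data.xs('Nidalee', level='champion')

# Per-tier averages, keeping the tier order of tier_names
by_tier = data.groupby(level='tier', sort=False).mean()

# Champions x tiers table of game counts, like data.minor_xs('games')
games = data['games'].unstack('tier')[tier_names]

print(nidalee.T)
print(by_tier)
print(games)
```

With this layout, the loops over `data.iteritems()` would become `groupby(level='tier')` calls, and `data.major_xs(name)` would become `data.xs(name, level='champion')`.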

Sample data for Nidalee¶

data.xs('Nidalee')

Average statistics by tier¶

d = []
for t, df in data.iteritems():
    d.append((df['winrate'].mean(), 
              df['games'].mean(), 
              df['kda'].mean(), 
              df['cs'].mean(), 
              df['gold'].mean()))

averages = pandas.DataFrame(d, index = data.items, columns=('winrate', 'games', 'kda', 'cs', 'gold'))
(averages.style
    .format({'cs': "{:.0f}", 'games': '{:,.0f}', 'gold': '{:,.0f}', 'kda': '{:.2f}', 'winrate': "{:.2f}%"})
    .set_table_styles(right_align))

Win rates by tier, alternate calculation¶

It seems odd that the total win rate across all data is below 50%. Perhaps the source data includes games that didn't complete?

d = []
for t, df in data.iteritems():
    winsPerChamp = df.games * df.winrate / 100
    d.append((100 * winsPerChamp.sum() / df.games.sum(), winsPerChamp.sum(), df.games.sum()))
tier_stats = pandas.DataFrame(d, index=data.items, columns=('win rate for tier', 'wins in tier', 'games in tier'))
(tier_stats.style
    .format({'games in tier': '{:,.0f}', 'win rate for tier': '{:.2f}%', 'wins in tier': '{:,.0f}'})
    .set_table_styles(right_align)
)
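As a cross-check on the games-weighted calculation, the per-tier rate is just wins divided by games, and the same division over the summed totals gives the overall rate. The sketch below hard-codes the "wins in tier" and "games in tier" figures from this notebook's output table; the helper name `win_rate` is made up for the example.

```python
# (wins, games) per tier, copied from the tier table in this notebook's output
tier_totals = {
    'Bronze': (1913643, 4124704),
    'Silver': (5707026, 11766902),
    'Gold': (4052918, 8182712),
    'Platinum': (2200874, 4401080),
    'Diamond': (493929, 980689),
    'Master': (20766, 38843),
    'Challenger': (4101, 7528),
}

def win_rate(wins, games):
    """Win rate in percent."""
    return 100.0 * wins / games

# Games-weighted win rate for each tier
per_tier = {tier: win_rate(w, g) for tier, (w, g) in tier_totals.items()}

# Overall win rate: total wins over total games, across all tiers
total_wins = sum(w for w, _ in tier_totals.values())
total_games = sum(g for _, g in tier_totals.values())
overall = win_rate(total_wins, total_games)

for tier, rate in per_tier.items():
    print('%-10s %.2f%%' % (tier, rate))
print('%-10s %.2f%%' % ('Overall', overall))
```

The overall figure comes out a little under 49%, dominated by Bronze, Silver and Gold, which hold most of the games. The weighted Silver rate (48.50%) also differs from the unweighted per-champion mean in the averages table (48.21%) because popular champions count for more here.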

Champion win rates from Silver to Diamond¶

Which champions improve the most in the hands of skilled players? Which champions' win rates fall off in higher tiers?

It turns out Nidalee benefits the most from player skill; she goes from a 44.59% win rate for Silver players to 53.96% in Diamond, a gain of 9.37 percentage points. Conversely, Amumu loses 4.10 points of win rate.

The column "Silver to Diamond" is simply the difference in win rates between the two tiers. "Max Spread" is the difference between the maximum and minimum win rate. It's useful for champs like Blitzcrank that are strongest in Gold (+2.09%), not Platinum (+0.87%).

spreads = {}
for name in data.major_axis:
    champ_data = data.major_xs(name)
    # Consider only Silver -> Diamond data
    reduced = champ_data.transpose()[1:-2]
    spreads[name] = (
        reduced.winrate.iloc[-1] - reduced.winrate.iloc[0],  # Diamond minus Silver
        max(reduced.winrate) - min(reduced.winrate),
        data.Silver.loc[name].winrate,
        data.Gold.loc[name].winrate,
        data.Platinum.loc[name].winrate,
        data.Diamond.loc[name].winrate,
    )
win_rates = pandas.DataFrame.from_records(spreads, 
             index=('Silver to Diamond', 'Max Spread', 'Silver', 'Gold', 'Platinum', 'Diamond')).transpose()
win_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)
df_disp = pandas.concat([win_rates.head(10), win_rates.tail(10)])
display(df_disp.style
     .format({'Silver to Diamond': '{:+.2f}%', 'Max Spread': '{:.2f}%',
              'Silver': '{:.2f}%', 'Gold': '{:.2f}%', 'Platinum': '{:.2f}%', 'Diamond': '{:.2f}%'})
     .set_table_styles(right_align)
     .background_gradient(cmap='coolwarm', low = 0.5, high= 0.5,
                          subset=['Silver', 'Gold', 'Platinum', 'Diamond'])
)
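The two derived columns can be checked by hand on a couple of rows. The sketch below uses the Nidalee and Aatrox win rates from this notebook's output table; `spread_columns` is a name invented for the example. Aatrox shows why "Max Spread" is worth having: his win rate peaks in Platinum, so the Silver-to-Diamond difference alone understates how much it moves.

```python
# Win rates (%) for Silver, Gold, Platinum, Diamond, copied from the output table
rows = {
    'Nidalee': [44.59, 48.14, 51.00, 53.96],
    'Aatrox': [46.78, 47.98, 48.76, 44.55],
}

def spread_columns(rates):
    """Return (Silver to Diamond, Max Spread) for one champion's win rates."""
    silver_to_diamond = round(rates[-1] - rates[0], 2)  # last tier minus first
    max_spread = round(max(rates) - min(rates), 2)      # best tier minus worst
    return silver_to_diamond, max_spread

for champ, rates in rows.items():
    s2d, spread = spread_columns(rates)
    print('%-8s Silver to Diamond %+.2f, Max Spread %.2f' % (champ, s2d, spread))
```

For Nidalee the two numbers coincide because her win rate rises at every step; for Aatrox they differ in both sign and size.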

g = sns.distplot(win_rates['Silver to Diamond'], bins=15)
g.set(title='Win rate differences from Silver to Diamond')

Champion popularity by tier¶

How popular are champions at various tiers? Which champions get more popular at higher tiers?

It turns out Janna has the largest increase in usage in higher tiers. She's picked in only 0.79% of Silver examples (93,510 games out of 11.8M) but 2.92% of the time in Diamond examples (28,644 games out of 1M). Conversely, Leona has the biggest drop in usage, from 1.63% to 0.63%.

Note that the numbers reported in the Silver/Gold/Platinum/Diamond columns are not strictly pick rates, although they are mostly correlated with them: each is the champion's share, in percent, of all the data we have for that tier. Janna, for example, represents 0.79% of all the Silver data. The report is sorted by the column "Silver to Diamond", the difference between the Diamond and Silver shares.

pick_rates = 100 * data.minor_xs('games') / data.minor_xs('games').sum()
del pick_rates['Bronze']
del pick_rates['Master']
del pick_rates['Challenger']
pick_rates.insert(0, 'Silver to Diamond', pick_rates.Diamond - pick_rates.Silver)
pick_rates.sort_values('Silver to Diamond', ascending=False, inplace=True)
df_disp = pandas.concat([pick_rates.head(10), pick_rates.tail(10)])
(df_disp.style
     .format({'Silver': '{:,.2f}', 'Gold': '{:,.2f}', 'Platinum': '{:,.2f}', 'Diamond': '{:,.2f}',
              'Silver to Diamond': '{:+.2f}'})
     .set_table_styles(right_align)
     .background_gradient(cmap='coolwarm', low = 0.5, high= 0.5,
                          subset=['Silver', 'Gold', 'Platinum', 'Diamond']))
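The Janna figures quoted above can be reproduced with plain arithmetic: each value is a champion's games divided by all the games recorded for that tier. This sketch uses the game counts given in the text and the "games in tier" totals from the output table; `share` is a name made up for the example.

```python
# Total games recorded per tier, from the "games in tier" output column
tier_games = {'Silver': 11766902, 'Diamond': 980689}

# Janna's games per tier, as quoted in the text
janna_games = {'Silver': 93510, 'Diamond': 28644}

def share(champ_games, total_games):
    """Champion's share of a tier's data, in percent."""
    return 100.0 * champ_games / total_games

silver = share(janna_games['Silver'], tier_games['Silver'])
diamond = share(janna_games['Diamond'], tier_games['Diamond'])

print('Silver  %.2f%%' % silver)
print('Diamond %.2f%%' % diamond)
print('Silver to Diamond %+.2f' % (diamond - silver))
```

This is the same calculation that `100 * data.minor_xs('games') / data.minor_xs('games').sum()` performs for every champion and tier at once.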

g = sns.distplot(pick_rates['Silver to Diamond'], bins=15)
g.set(title='Pick rate differences from Silver to Diamond')

Scatterplot of Platinum win rate vs pick rate¶

Are high win rate champs more popular in Platinum? Not particularly...

wr_vs_pick_platinum = pandas.concat((win_rates['Platinum'], pick_rates['Platinum']), axis=1)
wr_vs_pick_platinum.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', ylim=(0,3.5), xlim=(40,60), data=wr_vs_pick_platinum, kind="scatter")

Scatterplot of Win Rate improvement vs Pick Rate change¶

Are champions that have a bigger Silver-to-Diamond win rate change also likely to have a bigger Silver-to-Diamond pick rate change? If the two changes matched exactly you'd expect the dots below to fall on the line x=y. They don't really, but there is a correlation.

wr_vs_pick_sd = pandas.concat((win_rates['Silver to Diamond'], pick_rates['Silver to Diamond']), axis=1)
wr_vs_pick_sd.columns = ('Win Rate', 'Pick Rate')
g = sns.jointplot(x='Win Rate', y='Pick Rate', data=wr_vs_pick_sd, kind="scatter")
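To put a number on how strongly the two changes move together, one option is the Pearson correlation coefficient, which pandas exposes as `Series.corr`. The sketch below only demonstrates the call. Its three rows are the champions that happen to appear in both output tables of this notebook (Nidalee, Garen, Amumu), so they sit at the extremes and will overstate the correlation compared with the full champion list.

```python
import pandas

# Silver-to-Diamond changes for the three champions present in BOTH output
# tables. A tiny, extreme-biased subset used only to show the call; it is
# not the full data set plotted above.
wr_vs_pick_sample = pandas.DataFrame(
    {'Win Rate': [9.37, -2.56, -4.10],
     'Pick Rate': [0.78, -0.69, -0.91]},
    index=['Nidalee', 'Garen', 'Amumu'])

# Pearson correlation between the two columns (the default method of Series.corr)
r = wr_vs_pick_sample['Win Rate'].corr(wr_vs_pick_sample['Pick Rate'])
print('Pearson r = %.3f' % r)
```

On the notebook's own frame the equivalent call would be `wr_vs_pick_sd['Win Rate'].corr(wr_vs_pick_sd['Pick Rate'])`.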

Output for "Sample data for Nidalee":

	Bronze	Silver	Gold	Platinum	Diamond	Master	Challenger
winrate	41.45	44.59	48.14	51.00	53.96	61.45	57.21
games	17466.00	58051.00	52953.00	39436.00	12505.00	664.00	208.00
kda	2.04	2.38	2.70	2.92	3.13	3.59	3.97
cs	107.19	122.93	133.71	138.80	142.22	144.76	146.22
gold	11625.00	12232.00	12708.00	12806.00	12767.00	12992.00	12950.00

Output for "Average statistics by tier":

	winrate	games	kda	cs	gold
Bronze	45.93%	31,248	2.27	126	11,774
Silver	48.21%	89,143	2.44	142	12,058
Gold	49.42%	61,990	2.53	151	12,205
Platinum	49.90%	33,342	2.56	156	12,116
Diamond	50.05%	7,429	2.55	156	11,714
Master	54.45%	321	2.70	156	11,494
Challenger	55.38%	94	3.11	152	11,361

Output for "Win rates by tier, alternate calculation":

	win rate for tier	wins in tier	games in tier
Bronze	46.39%	1,913,643	4,124,704
Silver	48.50%	5,707,026	11,766,902
Gold	49.53%	4,052,918	8,182,712
Platinum	50.01%	2,200,874	4,401,080
Diamond	50.37%	493,929	980,689
Master	53.46%	20,766	38,843
Challenger	54.48%	4,101	7,528

Output for "Champion win rates from Silver to Diamond" (top 10 and bottom 10 by "Silver to Diamond"):

	Silver to Diamond	Max Spread	Silver	Gold	Platinum	Diamond
Nidalee	+9.37%	9.37%	44.59%	48.14%	51.00%	53.96%
Pantheon	+7.50%	7.50%	48.01%	51.33%	52.84%	55.51%
Riven	+6.07%	6.07%	47.05%	49.26%	50.63%	53.12%
Twisted Fate	+5.99%	5.99%	46.75%	49.52%	51.70%	52.74%
Aurelion Sol	+5.77%	5.77%	49.09%	50.19%	52.31%	54.86%
Rengar	+5.57%	5.57%	45.68%	47.25%	49.28%	51.25%
Ryze	+5.53%	5.53%	41.48%	42.92%	43.95%	47.01%
Kindred	+5.22%	5.22%	45.64%	48.64%	50.41%	50.86%
Urgot	+4.98%	4.98%	45.08%	48.46%	49.00%	50.06%
Evelynn	+4.79%	4.79%	46.65%	49.29%	50.62%	51.44%
Nasus	-1.13%	1.94%	47.69%	48.50%	46.91%	46.56%
Ziggs	-1.18%	2.72%	49.37%	50.91%	50.36%	48.19%
Brand	-1.48%	1.48%	51.66%	51.64%	51.31%	50.18%
Kalista	-1.95%	3.31%	42.63%	43.99%	43.57%	40.68%
Sion	-1.96%	1.96%	52.06%	51.47%	52.03%	50.10%
Aatrox	-2.23%	4.21%	46.78%	47.98%	48.76%	44.55%
Dr. Mundo	-2.36%	2.97%	46.82%	47.43%	47.15%	44.46%
Garen	-2.56%	2.83%	49.27%	49.54%	49.20%	46.71%
Yorick	-3.88%	3.88%	46.24%	45.16%	43.84%	42.36%
Amumu	-4.10%	4.10%	52.85%	52.52%	51.91%	48.75%

Output for "Champion popularity by tier" (top 10 and bottom 10 by "Silver to Diamond"; values are percentages of each tier's data):

	Silver to Diamond	Silver	Gold	Platinum	Diamond
Janna	+2.13	0.79	1.28	1.79	2.92
Lucian	+1.38	2.29	2.81	3.27	3.67
Jhin	+1.27	2.01	2.46	2.96	3.28
Ezreal	+0.99	1.93	2.50	2.72	2.93
Bard	+0.95	0.77	0.97	1.20	1.72
Graves	+0.93	0.97	1.36	1.73	1.90
Nidalee	+0.78	0.49	0.65	0.90	1.28
Karma	+0.71	0.79	0.91	1.05	1.50
Rek'Sai	+0.71	0.44	0.56	0.74	1.15
Elise	+0.56	0.45	0.56	0.70	1.01
Annie	-0.67	1.31	1.09	0.82	0.64
Miss Fortune	-0.69	1.15	0.81	0.64	0.47
Garen	-0.69	0.88	0.51	0.28	0.19
Xin Zhao	-0.71	0.89	0.54	0.34	0.17
Master Yi	-0.73	1.09	0.83	0.66	0.37
Vayne	-0.78	1.87	1.89	1.58	1.09
Amumu	-0.91	1.25	0.98	0.68	0.34
Lux	-0.95	1.51	1.16	0.89	0.56
Jinx	-0.97	2.25	2.04	1.87	1.28
Leona	-1.00	1.63	1.11	0.79	0.63