import pandas as pdNB: Narrow vs Wide Tables
There are in general at least two ways to represent a dataset – as a narrow table or as a set of wide tables.
Let’s look at a very simple example.
pets = pd.Series("cat dog ferret snake turtle parraot".split()).sample(1000, replace=True).to_list()
people = pd.Series("A B C D E F G".split()).sample(1000, replace=True).to_list()
NARROW = pd.DataFrame(dict(pet=pets, owner=people)).groupby(['owner', 'pet']).pet.count().to_frame('n')This is a narrow table.
It has few columns and many rows.
Columns are types of things, and values in rows are either instances or subtypes.
NARROW| n | ||
|---|---|---|
| owner | pet | |
| A | cat | 21 |
| dog | 30 | |
| ferret | 23 | |
| parraot | 28 | |
| snake | 17 | |
| turtle | 23 | |
| B | cat | 25 |
| dog | 28 | |
| ferret | 27 | |
| parraot | 24 | |
| snake | 27 | |
| turtle | 22 | |
| C | cat | 23 |
| dog | 18 | |
| ferret | 33 | |
| parraot | 23 | |
| snake | 20 | |
| turtle | 23 | |
| D | cat | 28 |
| dog | 17 | |
| ferret | 25 | |
| parraot | 26 | |
| snake | 24 | |
| turtle | 21 | |
| E | cat | 29 |
| dog | 19 | |
| ferret | 24 | |
| parraot | 26 | |
| snake | 24 | |
| turtle | 23 | |
| F | cat | 19 |
| dog | 25 | |
| ferret | 15 | |
| parraot | 25 | |
| snake | 26 | |
| turtle | 23 | |
| G | cat | 29 |
| dog | 21 | |
| ferret | 18 | |
| parraot | 23 | |
| snake | 28 | |
| turtle | 27 |
WIDE = NARROW.n.unstack()This is a wide table.
One column’s values are projected onto the feature space (as columns).
The other column becomes a unique list (as is the feature space).
WIDE| pet | cat | dog | ferret | parraot | snake | turtle |
|---|---|---|---|---|---|---|
| owner | ||||||
| A | 21 | 30 | 23 | 28 | 17 | 23 |
| B | 25 | 28 | 27 | 24 | 27 | 22 |
| C | 23 | 18 | 33 | 23 | 20 | 23 |
| D | 28 | 17 | 25 | 26 | 24 | 21 |
| E | 29 | 19 | 24 | 26 | 24 | 23 |
| F | 19 | 25 | 15 | 25 | 26 | 23 |
| G | 29 | 21 | 18 | 23 | 28 | 27 |
Narrow columns are more manageable by databases.
In fact, relational databases impose a limit on the number of columns you can have for a table, excluding high-dimensional spaces.
Wide columns are more usable for analysis.
You can do also these things with narrow tables using .group_by()
A = WIDE / WIDE.sum()B = WIDE.T / WIDE.T.sum()A.style.background_gradient()| pet | cat | dog | ferret | parraot | snake | turtle |
|---|---|---|---|---|---|---|
| owner | ||||||
| A | 0.120690 | 0.189873 | 0.139394 | 0.160000 | 0.102410 | 0.141975 |
| B | 0.143678 | 0.177215 | 0.163636 | 0.137143 | 0.162651 | 0.135802 |
| C | 0.132184 | 0.113924 | 0.200000 | 0.131429 | 0.120482 | 0.141975 |
| D | 0.160920 | 0.107595 | 0.151515 | 0.148571 | 0.144578 | 0.129630 |
| E | 0.166667 | 0.120253 | 0.145455 | 0.148571 | 0.144578 | 0.141975 |
| F | 0.109195 | 0.158228 | 0.090909 | 0.142857 | 0.156627 | 0.141975 |
| G | 0.166667 | 0.132911 | 0.109091 | 0.131429 | 0.168675 | 0.166667 |
B.style.background_gradient()| owner | A | B | C | D | E | F | G |
|---|---|---|---|---|---|---|---|
| pet | |||||||
| cat | 0.147887 | 0.163399 | 0.164286 | 0.198582 | 0.200000 | 0.142857 | 0.198630 |
| dog | 0.211268 | 0.183007 | 0.128571 | 0.120567 | 0.131034 | 0.187970 | 0.143836 |
| ferret | 0.161972 | 0.176471 | 0.235714 | 0.177305 | 0.165517 | 0.112782 | 0.123288 |
| parraot | 0.197183 | 0.156863 | 0.164286 | 0.184397 | 0.179310 | 0.187970 | 0.157534 |
| snake | 0.119718 | 0.176471 | 0.142857 | 0.170213 | 0.165517 | 0.195489 | 0.191781 |
| turtle | 0.161972 | 0.143791 | 0.164286 | 0.148936 | 0.158621 | 0.172932 | 0.184932 |
NARROW.unstack()| n | ||||||
|---|---|---|---|---|---|---|
| pet | cat | dog | ferret | parraot | snake | turtle |
| owner | ||||||
| A | 21 | 30 | 23 | 28 | 17 | 23 |
| B | 25 | 28 | 27 | 24 | 27 | 22 |
| C | 23 | 18 | 33 | 23 | 20 | 23 |
| D | 28 | 17 | 25 | 26 | 24 | 21 |
| E | 29 | 19 | 24 | 26 | 24 | 23 |
| F | 19 | 25 | 15 | 25 | 26 | 23 |
| G | 29 | 21 | 18 | 23 | 28 | 27 |
1 Hot Encoding
Project values onto the column axis.
# pd.get_dummies?pd.get_dummies(WIDE.cat)| 19 | 21 | 23 | 25 | 28 | 29 | |
|---|---|---|---|---|---|---|
| owner | ||||||
| A | False | True | False | False | False | False |
| B | False | False | False | True | False | False |
| C | False | False | True | False | False | False |
| D | False | False | False | False | True | False |
| E | False | False | False | False | False | True |
| F | True | False | False | False | False | False |
| G | False | False | False | False | False | True |
pd.get_dummies(NARROW.n)| 15 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 33 | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| owner | pet | ||||||||||||||||
| A | cat | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False |
| dog | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | |
| ferret | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | |
| snake | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | |
| turtle | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| B | cat | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False |
| dog | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | |
| ferret | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | |
| snake | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | |
| turtle | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | |
| C | cat | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False |
| dog | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | |
| ferret | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | True | |
| parraot | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| snake | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | |
| turtle | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| D | cat | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False |
| dog | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | |
| ferret | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | |
| snake | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | |
| turtle | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | |
| E | cat | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False |
| dog | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | |
| ferret | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | |
| snake | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | |
| turtle | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| F | cat | False | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False |
| dog | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | |
| ferret | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | False | |
| snake | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False | False | |
| turtle | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| G | cat | False | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False |
| dog | False | False | False | False | False | True | False | False | False | False | False | False | False | False | False | False | |
| ferret | False | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | |
| parraot | False | False | False | False | False | False | False | True | False | False | False | False | False | False | False | False | |
| snake | False | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | |
| turtle | False | False | False | False | False | False | False | False | False | False | False | True | False | False | False | False |
pd.get_dummies(NARROW.n).sum().plot.bar();