Some Pandas tips for Jupyter Notebooks

https://pandas.pydata.org/pandas-docs/stable/user_guide/

A first recommendation: use display() rather than print() to show DataFrames, since in a notebook display() renders them as formatted tables.

In some environments you may first need from IPython.display import display

import pandas as pd
import numpy as np
#from IPython.display import display
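To make the difference concrete, here is a minimal sketch that shows the same DataFrame both ways. The DataFrame `demo` is made up for illustration, and the fallback to print() is an assumption added only so the snippet also runs outside IPython.

```python
import pandas as pd

try:
    from IPython.display import display  # rich output inside Jupyter
except ImportError:
    display = print  # assumption: plain fallback when IPython is absent

# Hypothetical data, not taken from the examples in this post
demo = pd.DataFrame({'x': [1, 2], 'y': [0.5, 1.5]})

print(demo)    # plain-text representation
display(demo)  # rendered as a formatted table in a notebook
```

In a notebook the second call should give the familiar table layout, while the first gives monospaced text.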

Quickly create a DataFrame

In the Formatting output section below we use a conventional method for creating a DataFrame from a Python dictionary.

However, there is a quick and dirty method using copy and paste from an Excel sheet, or indeed an editor.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_clipboard.html

Just cut or copy data from an editor or Excel, which puts it onto the clipboard, and run read_clipboard() to generate a new DataFrame.

For example, this copy from a Notepad++ edit:

./data1.png

Running the next cell then builds the DataFrame. Be careful with the sep= parameter, as it can make a big difference to the end result:

df = pd.read_clipboard(sep=',')
display(df)
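Since read_clipboard() hands the clipboard text on to read_csv(), the effect of sep= can be tried out without a clipboard at all. The sketch below is only an illustration: the string `copied` is made-up text standing in for whatever was copied, and io.StringIO is used in place of the clipboard.

```python
import io
import pandas as pd

# Hypothetical text, standing in for what was copied to the clipboard
copied = "name,value\nalpha,1\nbeta,2\n"

# Matching separator: two columns are recognised
right = pd.read_csv(io.StringIO(copied), sep=',')

# Wrong separator: each line ends up in a single column
wrong = pd.read_csv(io.StringIO(copied), sep='\t')

print(right.shape)  # (2, 2)
print(wrong.shape)  # (2, 1)
```

If the result has only one column, the separator is the first thing to check.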

Formatting output

You can change the way that Pandas formats float output in a Jupyter Notebook GLOBALLY by setting pd.options.display.float_format to a Python format function. Note that the change remains in place until you reset the option or the kernel restarts.

Here are some examples. To make sure you see these as intended, restart the kernel first.

df = pd.DataFrame({'A': [0, 10, 200, 3000, 40000],
                   'B': [5.6789, 623.98765432, 72.345, 8987.1, 9.0],
                   'C': ['the', 'quick', 'brown', 'fox', 'jumps']})
display(df)

pd.options.display.float_format = '{:,.2f}'.format
display(df)

pd.options.display.float_format = '{:,.2g}'.format
display(df)
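If the global change is not wanted, one possible alternative is pd.option_context(), which applies an option only inside a with block. The following is a sketch under that assumption; `df_demo` is a made-up DataFrame, and to_string() is used so the result can be inspected as plain text.

```python
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame({'B': [5.6789, 8987.1]})

before = pd.get_option('display.float_format')

# The format function only applies inside the with block
with pd.option_context('display.float_format', '{:,.2f}'.format):
    inside = df_demo.to_string()

after = pd.get_option('display.float_format')

print(inside)
```

Outside the block the previous setting is back in place. A global setting can also be undone with pd.reset_option('display.float_format').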

Display control

In fact there are quite a lot of display options you can access.

https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html

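Among the many options are:

pd.options.display.width = None

pd.set_option('display.max_columns', 500)

pd.set_option('display.max_rows', 500)

pd.set_option('display.width', 1000)

Note that the older 'display.height' option is no longer available in current pandas releases, so pd.set_option('display.height', 1000) raises an error there. Use 'display.max_rows' to control how many rows are shown.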
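Options can also be read back, described and reset. The sketch below walks through that round trip for one option; the variable names are made up for illustration.

```python
import pandas as pd

# Start from the default so the example is repeatable
pd.reset_option('display.max_rows')
default_rows = pd.get_option('display.max_rows')

# Change the option and read it back
pd.set_option('display.max_rows', 500)
changed_rows = pd.get_option('display.max_rows')

# Undo the change
pd.reset_option('display.max_rows')
restored_rows = pd.get_option('display.max_rows')

# Print the built-in description of the option
pd.describe_option('display.max_rows')

print(default_rows, changed_rows, restored_rows)
```

pd.describe_option() is handy when you are not sure what an option does or what its default is.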

Styling

Or you can get into styling:

https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html

# First, reset float_format from the setting we set above
pd.options.display.float_format = None

Now set up a random(ish) DataFrame with some missing values

np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[3, 3] = np.nan
df.iloc[0, 2] = np.nan

display(df)

To see the styling, either make the styled DataFrame the last expression in a cell or pass it to display(). You can then write functions that act as style maps:

def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    values, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

# From pandas 2.1 onwards Styler.applymap is deprecated in favor of Styler.map
s = df.style.applymap(color_negative_red)
display(s)

def highlight_max(s):
    '''
    highlight the maximum in a Series in light green.
    '''
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

display(df.style.apply(highlight_max))
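The two functions work at different levels: the element-wise one receives a single value, while the one passed to apply() receives a whole column as a Series by default. A quick way to check what they return is to call them directly, outside the Styler. The sketch below repeats the two functions so that it is self-contained; the test values are made up.

```python
import numpy as np
import pandas as pd

def color_negative_red(val):
    """Element-wise: one scalar in, one CSS string out."""
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

def highlight_max(s):
    """Column-wise: one Series in, a list of CSS strings out."""
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

print(color_negative_red(-1.5))    # color: red
print(color_negative_red(np.nan))  # color: black, since NaN < 0 is False

col = pd.Series([0.5, np.nan, 2.0])  # hypothetical column with a gap
print(highlight_max(col))  # ['', '', 'background-color: lightgreen']
```

Missing values therefore come out black and are never highlighted as the maximum.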

And you can chain them together

s = df.style.\
    applymap(color_negative_red).\
    apply(highlight_max)

display(s)

And there are some built-in styles to use

# Current pandas releases take color=; older ones used null_color= for the same argument
s = df.style.highlight_null(color='red')
display(s)

Or change the way float values are displayed (as a simple alternative to pd.options.display.float_format, see above) using .format(precision=2). Older pandas releases offered .set_precision(2) for this, but it has since been removed.

s = df.style\
  .applymap(color_negative_red)\
  .apply(highlight_max)\
  .highlight_null(color='red')\
  .format(precision=2)

display(s)

	HIP	Name	Designation	V	B_V	Distance(ly)	Chart
0	32349	Sirius	9 alpha CMa	-1.44	0.01	8.6	322
1	30438	Canopus	alpha Car	-0.62	0.16		454
2	71683	Rigil Kent	alpha Cen	-0.28c	0.71	4.4	985
3	69673	Arcturus	16 alpha Boo	-0.05v	1.24	37	696
4	91262	Vega	3 alpha Lyr	0.03v	0.00	25.3	1153
5	24608	Capella	13 alpha Aur	0.08v	0.80	42	73
6	24436	Rigel	19 beta Ori	0.18v	-0.03		279
7	37279	Procyon	10 alpha CMi	0.4	0.43	11.4	224
8	7588	Achernar	alpha Eri	0.45v	-0.16	144	478
9	27989	Betelgeuse	58 alpha Ori	0.45v	1.50		229
10	68702		beta Cen	0.61v	-0.23		986
11	97649	Altair	53 alpha Aql	0.76v	0.22	16.8	1267
12	60718	Acrux	alpha Cru	0.77c	-0.24		1002

	A	B	C	D	E
0	1.0	1.329212	NaN	-0.316280	-0.990810
1	2.0	-1.070816	-1.438713	0.564417	0.295722
2	3.0	-1.626404	0.219565	0.678805	1.889273
3	4.0	0.961538	0.104011	NaN	0.850229
4	5.0	1.453425	1.057737	0.165562	0.515018
5	6.0	-1.336936	0.562861	1.392855	-0.063328
6	7.0	0.121668	1.207603	-0.002040	1.627796
7	8.0	0.354493	1.037528	-0.385684	0.519818
8	9.0	1.686583	-1.325963	1.428984	-2.089354
9	10.0	-0.129820	0.631523	-0.586538	0.290720

	A	B	C	D	E
0	1	1.32921	nan	-0.31628	-0.99081
1	2	-1.07082	-1.43871	0.564417	0.295722
2	3	-1.6264	0.219565	0.678805	1.88927
3	4	0.961538	0.104011	nan	0.850229
4	5	1.45342	1.05774	0.165562	0.515018
5	6	-1.33694	0.562861	1.39285	-0.063328
6	7	0.121668	1.2076	-0.00204021	1.6278
7	8	0.354493	1.03753	-0.385684	0.519818
8	9	1.68658	-1.32596	1.42898	-2.08935
9	10	-0.12982	0.631523	-0.586538	0.29072

GreyMamba

Thinking Allowed … (under construction)

Tips for using Pandas

	A	B	C
0	0	5.678900	the
1	10	623.987654	quick
2	200	72.345000	brown
3	3000	8987.100000	fox
4	40000	9.000000	jumps

	A	B	C
0	0	5.68	the
1	10	623.99	quick
2	200	72.34	brown
3	3000	8,987.10	fox
4	40000	9.00	jumps

	A	B	C	D	E
0	1	1.3	nan	-0.32	-0.99
1	2	-1.1	-1.4	0.56	0.3
2	3	-1.6	0.22	0.68	1.9
3	4	0.96	0.1	nan	0.85
4	5	1.5	1.1	0.17	0.52
5	6	-1.3	0.56	1.4	-0.063
6	7	0.12	1.2	-0.002	1.6
7	8	0.35	1	-0.39	0.52
8	9	1.7	-1.3	1.4	-2.1
9	10	-0.13	0.63	-0.59	0.29