Tips for using Pandas
Some Pandas tips for Jupyter Notebooks
https://pandas.pydata.org/pandas-docs/stable/user_guide/
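A first recommendation: do not use print() to show a DataFrame; use display() instead, so that the Notebook renders it as a formatted table.
If display() is not already available in your session, import it with from IPython.display import display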
import pandas as pd
import numpy as np
#from IPython.display import display
Quickly create a DataFrame
In the Formatting Output section below we use a conventional method for creating a DataFrame using a Python Dictionary.
However, there is a quick and dirty method using cut/paste from an Excel Sheet, or indeed an editor.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_clipboard.html
Just cut or copy data from an editor or from Excel, which puts it onto the clipboard, and run the read_clipboard() command to generate a new DataFrame.
For example, take this 'copy' from a Notepad++ edit:
Running the next Cell - be careful as the 'sep=' parameter can make a big difference to the end result - gives:
df = pd.read_clipboard(sep=',')
display(df)
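To see why the 'sep=' parameter matters without depending on whatever happens to be on the clipboard, here is a minimal sketch that parses the same text twice with pd.read_csv(), which is the function read_clipboard() hands the clipboard text to. The sample text and the names sample_text, by_comma and by_whitespace are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical text standing in for what was copied to the clipboard
sample_text = "name,score\nAda Lovelace,91\nAlan Turing,88\n"

# Splitting on commas gives the two intended columns
by_comma = pd.read_csv(io.StringIO(sample_text), sep=',')

# Splitting on whitespace (the default sep of read_clipboard is r'\s+')
# breaks each name at the space, so the layout is not what was intended
by_whitespace = pd.read_csv(io.StringIO(sample_text), sep=r'\s+')

print(by_comma)
print(by_whitespace)
```

If the result looks wrong after a paste, comparing the column names of the two versions is usually the quickest way to spot a separator problem.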
Formatting output
You can change how Pandas in a Jupyter Notebook formats its float output GLOBALLY by assigning a formatting function, such as the .format method of a Python format string, to pd.options.display.float_format. Note that the change remains in place until you reset it or the kernel restarts.
Here are some examples. To make sure you see these as intended, restart the kernel first.
df = pd.DataFrame({'A': [0, 10, 200, 3000, 40000],
                   'B': [5.6789, 623.98765432, 72.345, 8987.1, 9.0],
                   'C': ['the', 'quick', 'brown', 'fox', 'jumps']})
display(df)
pd.options.display.float_format = '{:,.2f}'.format
display(df)
pd.options.display.float_format = '{:,.2g}'.format
display(df)
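If a global change is more than you want, one option is to scope the setting with pd.option_context(). The sketch below is an illustration rather than part of the original notebook: the small DataFrame is made up, and to_string() is used in place of display() so the result can be inspected as plain text.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({'A': [0, 10, 200],
                   'B': [5.6789, 8987.1, 9.0]})

before = pd.get_option('display.float_format')

# Inside the with-block floats get two decimals and thousands separators
with pd.option_context('display.float_format', '{:,.2f}'.format):
    formatted = df.to_string()

# On leaving the block the previous setting is back in place
after = pd.get_option('display.float_format')

print(formatted)
```

In a Notebook you would normally call display(df) inside the with-block instead of to_string().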
Display control
In fact there are quite a lot of display options you can access.
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
Amongst many are:
pd.options.display.width = None
# 'display.height' is no longer a pandas option; 'display.max_rows' covers it
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
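Options can also be inspected and put back to their defaults. The following minimal sketch uses pd.get_option(), pd.describe_option() and pd.reset_option() on 'display.max_rows'; the variable names current and default are only for illustration.

```python
import pandas as pd

# Change an option, then read it back
pd.set_option('display.max_rows', 500)
current = pd.get_option('display.max_rows')

# Print the built-in description of the option
pd.describe_option('display.max_rows')

# Put the option back to its default value
pd.reset_option('display.max_rows')
default = pd.get_option('display.max_rows')

print(current, default)
```

Resetting options at the end of a Notebook keeps one experiment from changing how later Cells display their output.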
Styling
Or you can get into styling
https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
# First, reset float_format from the setting we set above
pd.options.display.float_format = None
Now set up a random(ish) DataFrame with some missing values
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[3, 3] = np.nan
df.iloc[0, 2] = np.nan
display(df)
To use styling, either make the styled DataFrame the last expression in the Cell or pass it to display(). You can then write functions that act as style maps:
def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    values, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color
# Note: pandas 2.1+ renames Styler.applymap to Styler.map (applymap is deprecated)
s = df.style.applymap(color_negative_red)
display(s)
def highlight_max(s):
    '''
    highlight the maximum in a Series in light green.
    '''
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]
display(df.style.apply(highlight_max))
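Style functions are ordinary Python functions, so they can be checked on their own before being handed to the Styler. The sketch below repeats the two functions so that it runs by itself and calls them on a made-up Series. It shows the difference between the two kinds: applymap() calls its function once per cell, while apply() calls its function once per column by default and expects one CSS string back per element.

```python
import numpy as np
import pandas as pd

def color_negative_red(val):
    """Return the CSS colour for one cell: red if negative, black otherwise."""
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

def highlight_max(s):
    """Return one CSS string per element, marking the maximum in light green."""
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

# Made-up values for illustration, including a missing one
sample = pd.Series([1.0, -2.5, np.nan, 4.0])

# Element-wise function: one CSS string per scalar
cell_styles = [color_negative_red(v) for v in sample]

# Column-wise function: a list of CSS strings for the whole Series
column_styles = highlight_max(sample)

print(cell_styles)
print(column_styles)
```

Note that the missing value is coloured black, because a comparison of NaN with 0 is False.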
And you can chain them together
s = df.style.\
    applymap(color_negative_red).\
    apply(highlight_max)
display(s)
And there are some built-in styles to use
# The keyword is 'color' in current pandas (it was 'null_color' before pandas 1.5)
s = df.style.highlight_null(color='red')
display(s)
Or, change the way that we display float values (as a simple alternative to pd.options.display.float_format, see above) using .format(precision=2), which replaces the older .set_precision() that pandas 2.0 removed
s = df.style\
    .applymap(color_negative_red)\
    .apply(highlight_max)\
    .highlight_null(color='red')\
    .format(precision=2)
display(s)
