GreyMamba

Thinking Allowed … (under construction)

Nothing is interesting if you're not interested
Helen Clark MacInnes
Python Stuff

I tend to spend too much time mucking around with Python code - as you can see from the bits of code scattered around this site. So, I've decided to add a page specifically for stuff related to just Python. It'll include snippets of code, interesting applications, algorithms or anything else Python related. Code that is purely there as a means to demonstrate or calculate something specific will remain as it is, in the most relevant post. Most of this stuff will be migrating to my sister pages -

- in the near future.

Sorry if you were hoping to see something on the Pythonidae family of snakes - interesting as they are.

Tips for using Pandas

Essentially, this is an HTML file generated from the original Jupyter Notebook.
Some Pandas tips for Jupyter Notebooks

https://pandas.pydata.org/pandas-docs/stable/user_guide/

A first recommendation is to not use print() but to import and use display()

You might need from IPython.display import display

In [1]:
import pandas as pd
import numpy as np
#from IPython.display import display

Quickly create a DataFrame

In the Formatting Output section below we use a conventional method for creating a DataFrame using a Python Dictionary.

However, there is a quick and dirty method using cut/paste from an Excel Sheet, or indeed an editor.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_clipboard.html

Just cut or copy the data from an editor or from Excel - which puts it onto the clipboard - and run the read_clipboard() command to generate a new DataFrame.

For example, this copy from a Notepad++ edit:

./data1.png

Running the next Cell - be careful as the 'sep=' parameter can make a big difference to the end result - gives:

In [12]:
df = pd.read_clipboard(sep=',')
display(df)
HIP Name Designation V B_V Distance(ly) Chart
0 32349 Sirius 9 alpha CMa -1.44 0.01 8.6 322
1 30438 Canopus alpha Car -0.62 0.16 454
2 71683 Rigil Kent alpha Cen -0.28c 0.71 4.4 985
3 69673 Arcturus 16 alpha Boo -0.05v 1.24 37 696
4 91262 Vega 3 alpha Lyr 0.03v 0.00 25.3 1153
5 24608 Capella 13 alpha Aur 0.08v 0.80 42 73
6 24436 Rigel 19 beta Ori 0.18v -0.03 279
7 37279 Procyon 10 alpha CMi 0.4 0.43 11.4 224
8 7588 Achernar alpha Eri 0.45v -0.16 144 478
9 27989 Betelgeuse 58 alpha Ori 0.45v 1.50 229
10 68702 beta Cen 0.61v -0.23 986
11 97649 Altair 53 alpha Aql 0.76v 0.22 16.8 1267
12 60718 Acrux alpha Cru 0.77c -0.24 1002
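
As far as I understand it, read_clipboard() simply hands the clipboard text on to read_csv(), so you can try out the effect of 'sep=' without touching the clipboard at all. Here's a minimal sketch of that idea; the little three-column sample in 'text' is made up for illustration and is not the data shown above.

```python
import io

import pandas as pd

# Hypothetical sample standing in for text that was copied to the clipboard
text = "HIP,Name,V\n32349,Sirius,-1.44\n30438,Canopus,-0.62\n"

# Comma separator: the three fields are split into three columns
df_comma = pd.read_csv(io.StringIO(text), sep=',')

# Whitespace separator: there is no whitespace within a line,
# so each line ends up as a single column
df_space = pd.read_csv(io.StringIO(text), sep=r'\s+')

print(df_comma.shape)  # (2, 3)
print(df_space.shape)  # (2, 1)
```

If the DataFrame you get back from read_clipboard() has just one wide column, the separator is the first thing to check.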

Formatting output

You can change how Pandas formats float output in a Jupyter Notebook GLOBALLY by assigning a Python format function to pd.options.display.float_format. Note that the change remains in place until you reset it or the kernel restarts.

Here are some examples - to make sure you see these as intended, restart the kernel.

In [3]:
df = pd.DataFrame({'A': [0, 10, 200, 3000, 40000],
                   'B': [5.6789, 623.98765432, 72.345, 8987.1, 9.0],
                   'C': ['the', 'quick', 'brown', 'fox', 'jumps']})
display(df)

pd.options.display.float_format = '{:,.2f}'.format
display(df)

pd.options.display.float_format = '{:,.2g}'.format
display(df)
A B C
0 0 5.678900 the
1 10 623.987654 quick
2 200 72.345000 brown
3 3000 8987.100000 fox
4 40000 9.000000 jumps
A B C
0 0 5.68 the
1 10 623.99 quick
2 200 72.34 brown
3 3000 8,987.10 fox
4 40000 9.00 jumps
A B C
0 0 5.7 the
1 10 6.2e+02 quick
2 200 72 brown
3 3000 9e+03 fox
4 40000 9 jumps
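
If you don't want the change to stick around, one option is to wrap it in pd.option_context(), which puts the old setting back when the block ends. This is only a sketch of that alternative, not something the cells above use; the small DataFrame is invented for the example, and to_string() is used so it also runs outside a notebook.

```python
import pandas as pd

# Small made-up DataFrame, just for this example
df = pd.DataFrame({'B': [5.6789, 8987.1]})

# The float format applies only inside the 'with' block
with pd.option_context('display.float_format', '{:,.2f}'.format):
    inside = df.to_string()

# Back outside, the previous setting is in force again
outside = df.to_string()

print(inside)
print(outside)
```

Alternatively, pd.reset_option('display.float_format') puts the option back to its default at any point.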

Display control

In fact there are quite a lot of display options you can access.

https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html

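Amongst many are the following (options can be set either as attributes or with pd.set_option()):

pd.options.display.width = None

pd.set_option('display.width', 1000)

pd.set_option('display.max_rows', 500)

pd.set_option('display.max_columns', 500)

Older versions of Pandas also had a 'display.height' option. It is not available in current releases, where 'display.max_rows' does the job.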
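
Any of these options can be read back and returned to its default, which is handy when you have been experimenting. A minimal sketch, using 'display.max_rows' purely as an example:

```python
import pandas as pd

# Remember whatever the current value is
default_rows = pd.get_option('display.max_rows')

# Change the option and read it back
pd.set_option('display.max_rows', 500)
changed_rows = pd.get_option('display.max_rows')
print(changed_rows)  # 500

# Put the option back to its default
pd.reset_option('display.max_rows')
restored_rows = pd.get_option('display.max_rows')
print(restored_rows == default_rows)
```

pd.describe_option('display.max_rows') prints a short description of the option together with its default and current values.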

In [4]:
# First, reset float_format from the setting we set above
pd.options.display.float_format = None

Now set up a random(ish) DataFrame with some missing values

In [5]:
np.random.seed(24)
df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4), columns=list('BCDE'))],
               axis=1)
df.iloc[3, 3] = np.nan
df.iloc[0, 2] = np.nan
In [6]:
display(df)
A B C D E
0 1.0 1.329212 NaN -0.316280 -0.990810
1 2.0 -1.070816 -1.438713 0.564417 0.295722
2 3.0 -1.626404 0.219565 0.678805 1.889273
3 4.0 0.961538 0.104011 NaN 0.850229
4 5.0 1.453425 1.057737 0.165562 0.515018
5 6.0 -1.336936 0.562861 1.392855 -0.063328
6 7.0 0.121668 1.207603 -0.002040 1.627796
7 8.0 0.354493 1.037528 -0.385684 0.519818
8 9.0 1.686583 -1.325963 1.428984 -2.089354
9 10.0 -0.129820 0.631523 -0.586538 0.290720

To use styling, you either have to make the styled DataFrame the last expression in a Cell, or pass it to display(). You can then write functions to be used as style maps:

In [7]:
def color_negative_red(val):
    """
    Takes a scalar and returns a string with
    the css property `'color: red'` for negative
    values, black otherwise.
    """
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

# applymap works element by element; from pandas 2.1 it is renamed Styler.map
s = df.style.applymap(color_negative_red)
display(s)
A B C D E
0 1 1.32921 nan -0.31628 -0.99081
1 2 -1.07082 -1.43871 0.564417 0.295722
2 3 -1.6264 0.219565 0.678805 1.88927
3 4 0.961538 0.104011 nan 0.850229
4 5 1.45342 1.05774 0.165562 0.515018
5 6 -1.33694 0.562861 1.39285 -0.063328
6 7 0.121668 1.2076 -0.00204021 1.6278
7 8 0.354493 1.03753 -0.385684 0.519818
8 9 1.68658 -1.32596 1.42898 -2.08935
9 10 -0.12982 0.631523 -0.586538 0.29072
In [8]:
def highlight_max(s):
    '''
    highlight the maximum in a Series in light green.
    '''
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

display(df.style.apply(highlight_max))
A B C D E
0 1 1.32921 nan -0.31628 -0.99081
1 2 -1.07082 -1.43871 0.564417 0.295722
2 3 -1.6264 0.219565 0.678805 1.88927
3 4 0.961538 0.104011 nan 0.850229
4 5 1.45342 1.05774 0.165562 0.515018
5 6 -1.33694 0.562861 1.39285 -0.063328
6 7 0.121668 1.2076 -0.00204021 1.6278
7 8 0.354493 1.03753 -0.385684 0.519818
8 9 1.68658 -1.32596 1.42898 -2.08935
9 10 -0.12982 0.631523 -0.586538 0.29072
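
The two style functions above are ordinary Python functions, so you can check what they return before handing them to the Styler. applymap() calls its function on one value at a time, while apply() passes in a whole column at a time by default. A small sketch, repeating the two functions so that it runs on its own; the test Series is made up:

```python
import numpy as np
import pandas as pd

def color_negative_red(val):
    """Element-wise: return a css colour string for one scalar."""
    color = 'red' if val < 0 else 'black'
    return 'color: %s' % color

def highlight_max(s):
    """Column-wise: return one css string per element of a Series."""
    is_max = s == s.max()
    return ['background-color: lightgreen' if v else '' for v in is_max]

print(color_negative_red(-1.5))    # color: red
print(color_negative_red(np.nan))  # color: black (NaN < 0 is False)

col = pd.Series([1.0, 3.0, 2.0])   # made-up test column
print(highlight_max(col))          # ['', 'background-color: lightgreen', '']
```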

And you can chain them together

In [9]:
s = df.style.\
    applymap(color_negative_red).\
    apply(highlight_max)

display(s)
A B C D E
0 1 1.32921 nan -0.31628 -0.99081
1 2 -1.07082 -1.43871 0.564417 0.295722
2 3 -1.6264 0.219565 0.678805 1.88927
3 4 0.961538 0.104011 nan 0.850229
4 5 1.45342 1.05774 0.165562 0.515018
5 6 -1.33694 0.562861 1.39285 -0.063328
6 7 0.121668 1.2076 -0.00204021 1.6278
7 8 0.354493 1.03753 -0.385684 0.519818
8 9 1.68658 -1.32596 1.42898 -2.08935
9 10 -0.12982 0.631523 -0.586538 0.29072

And there are some built-in styles to use

In [10]:
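# The colour is passed positionally: the keyword is null_color in older
# versions of pandas and color in newer ones
s = df.style.highlight_null('red')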
display(s)
A B C D E
0 1 1.32921 nan -0.31628 -0.99081
1 2 -1.07082 -1.43871 0.564417 0.295722
2 3 -1.6264 0.219565 0.678805 1.88927
3 4 0.961538 0.104011 nan 0.850229
4 5 1.45342 1.05774 0.165562 0.515018
5 6 -1.33694 0.562861 1.39285 -0.063328
6 7 0.121668 1.2076 -0.00204021 1.6278
7 8 0.354493 1.03753 -0.385684 0.519818
8 9 1.68658 -1.32596 1.42898 -2.08935
9 10 -0.12982 0.631523 -0.586538 0.29072

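Or, change the way that we display float values (as a simple alternative to pd.options.display.float_format, see above) using .set_precision(). Note that .set_precision() was deprecated in pandas 1.3 and removed in 2.0; its replacement is .format(precision=2), which fixes the number of decimal places rather than the number of significant digits, so its output differs slightly from that shown here.

In [11]:
# Colour passed positionally so it works with both old and new keyword names
s = df.style\
  .applymap(color_negative_red)\
  .apply(highlight_max)\
  .highlight_null('red')\
  .set_precision(2)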

display(s)
A B C D E
0 1 1.3 nan -0.32 -0.99
1 2 -1.1 -1.4 0.56 0.3
2 3 -1.6 0.22 0.68 1.9
3 4 0.96 0.1 nan 0.85
4 5 1.5 1.1 0.17 0.52
5 6 -1.3 0.56 1.4 -0.063
6 7 0.12 1.2 -0.002 1.6
7 8 0.35 1 -0.39 0.52
8 9 1.7 -1.3 1.4 -2.1
9 10 -0.13 0.63 -0.59 0.29
Made in RapidWeaver
