Dataframe styling in Jupyter Notebook

I have a pandas dataframe to which I applied cell coloring based on the values in a second dataframe (the two dataframes are the same size). I did this based on an SO answer.
Now that I've colored the dataframe, the cell outlines have disappeared. I saw a suggestion to use the following to add cell outlines:
df_navigator = df_navigator.data.style.set_properties(**{'text-align': 'left','border-color':'Black','border-width':'thin','border-style':'dotted'})
If I do that, then the cell coloring disappears.
How can I keep the custom cell coloring while adding the black borders back in to the displayed dataframe?
Adding the full code for reproducing:
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': 'foo bar foo'.split(),
    'B': 'one one two'.split(),
    'C': np.arange(3),
    'D': np.arange(3) * 2
})
j = [
    {'bgcolor': '#55aa2a'},
    {'bgcolor': '#d42a2a'},
    {'bgcolor': '#d42a2a'},
]
df2 = pd.DataFrame({
    'E': j,
    'F': j,
    'G': j,
    'H': j
})
df2 = df2.applymap(lambda x: 'background-color: {}'.format(x.get('bgcolor')))
def highlight(x):
    return pd.DataFrame(df2.values, columns=x.columns)
df.style.apply(highlight, axis=None)
Thanks in advance!

Related

python pandas dataframe add colour to adjusted and inserted row

I have the following data-frame
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
If the running balance of an individual's hours reaches 38, the cell that reached the 38th hour is adjusted, a duplicate row is inserted, and the balance of hours is added to the following row. The following code performs this; the difference between the original and the adjusted data can be seen in the output.
s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)
I then output to csv with the following.
out.to_csv('Output.csv', index = False)
I need the adjusted row and the inserted row to be highlighted in a color (any color) when the data is exported to CSV.
UPDATE: as CSV cannot store colours, any way to tag the adjusted and inserted rows is acceptable.

You can't add any kind of formatting, including colors, to a CSV. You can however color records in a dataframe.
# single-index:
# Load a dataset
import seaborn as sns
df = sns.load_dataset('planets')
# Now let's group the data
groups = df.groupby('method').mean()
groups
# Highlight the Maximum values
groups.style.highlight_max(color = 'lightgreen')
# multi-index:
import numpy as np
import pandas as pd
df = pd.DataFrame([['one', 'A', 100, 3], ['two', 'A', 101, 4],
                   ['three', 'A', 102, 6], ['one', 'B', 103, 6],
                   ['two', 'B', 104, 0], ['three', 'B', 105, 3]],
                  columns=['c1', 'c2', 'c3', 'c4']).set_index(['c1', 'c2']).sort_index()
print(df)
def highlight_min(data):
    color = 'red'
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_min = data == data.min()
        return [attr if v else '' for v in is_min]
    else:  # DataFrame from .apply(axis=None)
        is_min = data.groupby(level=0).transform('min') == data
        return pd.DataFrame(np.where(is_min, attr, ''),
                            index=data.index, columns=data.columns)
# Style through df.style; DataFrame.apply would only return the CSS strings
df.style.apply(highlight_min, axis=0)
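Since the update to the question accepts tagging instead of colors, here is a minimal sketch of one way to do that. It assumes, as in the question's code, that the adjusted row and the inserted row share the same index label after `df.index.repeat`. The toy frame, the `tag` column and the label strings are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'Ankit', 'Blake'],
                   'hours': [8.0, 15.0, 8.0]})

# Pretend row 1 was split: repeating its index label yields two copies.
out = df.loc[df.index.repeat([1, 2, 1])].copy()

out['tag'] = ''
# First copy of a repeated label: the row whose hours were adjusted.
out.loc[out.index.duplicated(keep='last'), 'tag'] = 'adjusted'
# Second copy of a repeated label: the row that was inserted.
out.loc[out.index.duplicated(keep='first'), 'tag'] = 'inserted'

csv_text = out.to_csv(index=False)  # pass a path to write a file instead
print(csv_text)
```

The extra column survives the CSV export, so the tagged rows can be filtered or colored later in whatever tool reads the file.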

How to style pandas dataframe using for loop

I have a dataset where I need to display different values in different colors. Only some of the cells are highlighted, not all of them.
Here are some of the colors:
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
How can I highlight all these cells with given colors?
MWE
# data
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
# without for loop
(df.style
    .apply(lambda dfx: ['background: red' if val == 'a' else '' for val in dfx], axis=1)
    .apply(lambda dfx: ['background: blue' if val == 'b' else '' for val in dfx], axis=1)
)
# How to do this using for loop (I have so many values and different colors for them)
# My attempt
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
s = df.style
for key, color in dict_colors.items():
    s = s.apply(lambda dfx: [f'background: {color}' if cell == key else '' for cell in dfx], axis=1)
display(s)

You can try this:
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
dict_colors = {'a': 'red', 'b': 'blue', 'e':'tomato'}
# create a Styler object for the DataFrame
s = df.style
def apply_color(val):
    if val in dict_colors:
        return f'background: {dict_colors[val]}'
    return ''
# apply the style to each cell (pandas 2.1+ names this method Styler.map)
s = s.applymap(apply_color)
# display the styled DataFrame
display(s)

I found a way using eval. It is not the most elegant method, but it works.
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
lst = [ 'df.style']
for key, color in dict_colors.items():
    text = f".apply(lambda dfx: ['background: {color}' if cell == '{key}' else '' for cell in dfx], axis = 1)"
    lst.append(text)
s = ''.join(lst)
display(eval(s))

Find columns in Pandas DataFrame containing dicts

I have a pandas DataFrame with several columns containing dicts. I am trying to identify columns that contain at least 1 dict.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'i': [0, 1, 2, 3],
'd': [np.nan, {'p':1}, {'q':2}, np.nan],
't': [np.nan, {'u':1}, {'v':2}, np.nan]
})
# Iterate over cols to find dicts
cdict = [i for i in df.columns if isinstance(df[i][0],dict)]
cdict
[]
How do I find cols with dicts? Is there a solution to find cols with dicts without iterating over every cell / value of columns?

You can do:
s = df.applymap(lambda x:isinstance(x, dict)).any()
dict_cols = s[s].index.tolist()
print(dict_cols)
['d', 't']

We can apply over the columns, although this still iterates under the hood:
df.apply(lambda x: [any(isinstance(y, dict) for y in x)], axis=0)
EDIT: I think using applymap is more direct. However, we can use the boolean result to get the column names:
any_dct = df.apply(lambda x: [any(isinstance(y, dict) for y in x)], axis=0, result_type="expand")
df.iloc[:, any_dct.iloc[0, :].tolist()].columns.values
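`DataFrame.applymap` is deprecated from pandas 2.1 onward in favour of `DataFrame.map`, so a variant that runs on both older and newer versions may be useful. The sketch below uses `Series.map` per column instead; it still visits every cell, as the answers above do. The names `mask` and `dict_cols` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan],
})

# For each column: does any cell hold a dict?
mask = df.apply(lambda col: col.map(lambda v: isinstance(v, dict)).any())

# Keep the labels of the columns where the answer is True.
dict_cols = mask[mask].index.tolist()
print(dict_cols)
```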

How to print individual rows of a pandas dataframe using Python?

I'm new to Python.
I'm trying to extract data from a dataframe and put it into a string to print to a docx file.
This is my current code:
add_run("Placeholder A").italic= True
for i in range(0, list(df.shape)[0]):
    A = df.iloc[findings][0]
    B = df.iloc[findings][1]
    C = df.iloc[findings][2]
    output = ('The value of A: {}, B: {}, C: {}').format(A,B,C)
    doc.add_paragraph(output)
The output I am after is:
Placeholder A
print output for row 1 of DF
Placeholder A
print output for row 2 of DF
Currently it is printing all the outputs of the dataframe under Placeholder A.
Any ideas where I am going wrong?

The Stack Overflow question "How to iterate over rows in a DataFrame in Pandas?" covers iterating over DataFrame rows. All that remains is to print each row.
Edit:
Here is an example (based on an answer to that question) that prints the rows of a previously created dataframe:
import pandas as pd
inp = [{'c1': 10, 'c2': 100, 'c3': 100}, {'c1': 11, 'c2': 110, 'c3': 100}, {'c1': 12, 'c2': 120, 'c3': 100}]
df = pd.DataFrame(inp)
for index, row in df.iterrows():
    A = row["c1"]
    B = row["c2"]
    C = row["c3"]
    print('The value of A: {}, B: {}, C: {}'.format(A, B, C))
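To get the placeholder before every row, as the question asks, the placeholder has to be emitted inside the loop rather than once before it. The sketch below collects plain strings so it runs without a docx library; in the question's setting each string would presumably go to `doc.add_paragraph(...)` instead. The `lines` list is a stand-in introduced here for illustration.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100, 'c3': 100},
       {'c1': 11, 'c2': 110, 'c3': 100},
       {'c1': 12, 'c2': 120, 'c3': 100}]
df = pd.DataFrame(inp)

lines = []
for row in df.itertuples(index=False):
    # The placeholder is added once per row, inside the loop.
    lines.append('Placeholder A')
    lines.append('The value of A: {}, B: {}, C: {}'.format(row.c1, row.c2, row.c3))

print('\n'.join(lines))
```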

Get values from one dataframe where they are between an interval in another dataframe

Given a dataframe (df) containing a numeric (float) series and a categorical ID, how can I create a dictionary of the form 'key': [], where the key is an ID from the dataframe and the list contains the differences between the numbers in the two dataframes?
I have managed this using loops, but I am looking for a more pandas-like way of doing it.
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'a': [0.75435, 0.74897, 0.60949,
                         0.87438, 0.90885, 0.28547,
                         0.27327, 0.31078, 0.15576,
                         0.58139],
                   'id': list('aaaxxbbyyy')})
rl = pd.DataFrame({'b': [0.51, 0.30], 'id': ['aaa', 'bbb']})
interval = 0.1
d = defaultdict(list)
# Note: pandas 1.3+ spells the bounds inclusive='neither' / inclusive='both'
for index, row in rl.iterrows():
    before = df[df['a'].between(row['b'] - interval, row['b'], inclusive=False)]
    after = df[df['a'].between(row['b'], row['b'] + interval, inclusive=True)]
    for x, b_row in before.iterrows():
        d[b_row['id']].append((b_row['a'] - row['b']))
    for x, a_row in after.iterrows():
        d[a_row['id']].append((a_row['a'] - row['b']))
for k, v in d.items():
    print('{k}\t{v}'.format(k=k, v=len(v)))
a 1
y 2
b 2
d
defaultdict(list,
{'a': [0.09948],
'b': [-0.01452, -0.02672],
'y': [0.07138, 0.01078]})
