I have a pandas dataframe to which I applied cell coloring based on the values in a second dataframe. (The 2 dataframes are the same size). I did this based on this SO answer shown here:
Now that I've colored the dataframe, the cell outlines have disappeared. I saw a suggestion to use the following to add cell outlines:
df_navigator = df_navigator.data.style.set_properties(**{'text-align': 'left','border-color':'Black','border-width':'thin','border-style':'dotted'})
If I do that, then the cell coloring disappears.
How can I keep the custom cell coloring while adding the black borders back in to the displayed dataframe?
Adding the full code for reproducing:
import pandas as pd
import numpy as np
df = pd.DataFrame({
'A': 'foo bar foo'.split(),
'B': 'one one two'.split(),
'C': np.arange(3),
'D': np.arange(3) * 2
})
j = [
{ 'bgcolor': '#55aa2a'},
{ 'bgcolor': '#d42a2a'},
{ 'bgcolor': '#d42a2a'},
]
df2 = pd.DataFrame({
'E': j,
'F': j,
'G': j,
'H': j
})
df2 = df2.applymap(lambda x: 'background-color: {}'.format(x.get('bgcolor')))
def highlight(x):
    return pd.DataFrame(df2.values, columns=x.columns)
df.style.apply(highlight, axis=None)
Thanks in advance!
Related
I have the following data-frame
import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)
If the running balance of an individual's hours reaches 38, the cell that reached the 38th hour is adjusted, a duplicate row is inserted, and the balance of hours is added to the following row. The following code performs this; the difference between the original data and the adjusted data can be seen in the output.
s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)
I then output to csv with the following.
out.to_csv('Output.csv', index = False)
I need to have the row that got adjusted and the row that got inserted highlighted in a color (any color) when it is exported to csv.
UPDATE: as csv does not accept colours to output, any way to tag the adjusted and insert rows is acceptable
You can't add any kind of formatting, including colors, to a CSV. You can however color records in a dataframe.
# single-index:
# Load a dataset
import seaborn as sns
df = sns.load_dataset('planets')
# Now let's group the data
groups = df.groupby('method').mean()
groups
# Highlight the Maximum values
groups.style.highlight_max(color = 'lightgreen')
# multi-index:
import numpy as np
import pandas as pd
df = pd.DataFrame([['one', 'A', 100, 3], ['two', 'A', 101, 4],
                   ['three', 'A', 102, 6], ['one', 'B', 103, 6],
                   ['two', 'B', 104, 0], ['three', 'B', 105, 3]],
                  columns=['c1', 'c2', 'c3', 'c4']).set_index(['c1', 'c2']).sort_index()
print(df)
def highlight_min(data):
    color = 'red'
    attr = 'background-color: {}'.format(color)
    if data.ndim == 1:  # Series from .apply(axis=0) or axis=1
        is_min = data == data.min()
        return [attr if v else '' for v in is_min]
    else:  # DataFrame from .apply(axis=None)
        is_min = data.groupby(level=0).transform('min') == data
        return pd.DataFrame(np.where(is_min, attr, ''),
                            index=data.index, columns=data.columns)
# Style through df.style (df.apply would only return the CSS strings);
# axis=None passes the whole frame, so the per-group branch runs
styled = df.style.apply(highlight_min, axis=None)
styled
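For the CSV case in the question, where colors are not an option, one possibility is to tag the rows in an extra column. The sketch below relies on the fact that, in the question's code, the adjusted row and its inserted copy share the same index label. The small `out` frame here is a hand-built stand-in for the question's `out`, and the column name `tag` and the labels `adjusted`/`inserted` are invented for illustration.

```python
import pandas as pd

# Stand-in for the question's `out`: index label 6 appears twice
# because that row was split at the 38-hour mark.
out = pd.DataFrame(
    {'name': ['Ankit', 'Ankit', 'Ankit', 'Ankit'],
     'hours': [12.0, 4.5, 10.5, 11.0]},
    index=[5, 6, 6, 7],
)

# First copy of a repeated label: the row trimmed to reach 38 hours.
first_copy = out.index.duplicated(keep='last')
# Second copy of a repeated label: the inserted row holding the remainder.
second_copy = out.index.duplicated(keep='first')

out['tag'] = ''
out.loc[first_copy, 'tag'] = 'adjusted'
out.loc[second_copy, 'tag'] = 'inserted'

# The tag column survives the export, unlike any styling.
csv_text = out.to_csv(index=False)
```

Writing to a file works the same way with `out.to_csv('Output.csv', index=False)`.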
I have a dataset where I need to display different values with different colors. Not all the cells in the data are highlighted and only some of the data is highlighted.
Here are some of the colors:
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
How can I highlight all these cells with given colors?
MWE
# data
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
# without for loop
(df.style
.apply(lambda dfx: ['background: red' if val == 'a' else '' for val in dfx], axis = 1)
.apply(lambda dfx: ['background: blue' if val == 'b' else '' for val in dfx], axis = 1)
)
# How to do this using for loop (I have so many values and different colors for them)
# My attempt
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
s = df.style
for key, color in dict_colors.items():
    s = s.apply(lambda dfx: [f'background: {color}' if cell == key else '' for cell in dfx], axis = 1)
display(s)
You can try that:
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
dict_colors = {'a': 'red', 'b': 'blue', 'e':'tomato'}
# create a Styler object for the DataFrame
s = df.style
def apply_color(val):
    if val in dict_colors:
        return f'background: {dict_colors[val]}'
    return ''
# apply the style to each cell (Styler.applymap is named Styler.map from pandas 2.1)
s = s.applymap(apply_color)
# display the styled DataFrame
display(s)
I found a way using eval. It is not the most elegant method, but it works.
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'), 'B': list('aabbcc'), 'C': list('aaabbb')})
dict_colors = {'a': 'red', 'b': 'blue','e':'tomato'}
lst = [ 'df.style']
for key, color in dict_colors.items():
    text = f".apply(lambda dfx: ['background: {color}' if cell == '{key}' else '' for cell in dfx], axis = 1)"
    lst.append(text)
s = ''.join(lst)
display(eval(s))
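The for-loop attempt in the question most likely fails because of late binding: each lambda looks up `key` and `color` only when the Styler is rendered, by which point the loop has finished and both names hold the last dictionary entry. The sketch below shows the effect with plain lists and no Styler; `row`, `late` and `bound` are names made up for this example.

```python
dict_colors = {'a': 'red', 'b': 'blue', 'e': 'tomato'}
row = ['a', 'b', 'e']

late, bound = [], []
for key, color in dict_colors.items():
    # Looks up key/color when called, i.e. after the loop has finished.
    late.append(lambda cells: [f'background: {color}' if c == key else ''
                               for c in cells])
    # Default arguments capture the current key/color for this function.
    bound.append(lambda cells, key=key, color=color:
                 [f'background: {color}' if c == key else '' for c in cells])

late_result = [f(row) for f in late]    # every function uses 'e'/'tomato'
bound_result = [f(row) for f in bound]  # each function uses its own pair
```

Applied to the question's loop, the same idea would be `s = s.apply(lambda dfx, key=key, color=color: [...], axis=1)`, which avoids `eval` entirely.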
I have a pandas DataFrame with several columns containing dicts. I am trying to identify columns that contain at least 1 dict.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'i': [0, 1, 2, 3],
'd': [np.nan, {'p':1}, {'q':2}, np.nan],
't': [np.nan, {'u':1}, {'v':2}, np.nan]
})
# Iterate over cols to find dicts
cdict = [i for i in df.columns if isinstance(df[i][0],dict)]
cdict
[]
How do I find cols with dicts? Is there a solution to find cols with dicts without iterating over every cell / value of columns?
You can do :
s = df.applymap(lambda x:isinstance(x, dict)).any()
dict_cols = s[s].index.tolist()
print(dict_cols)
['d', 't']
We can also apply a function over the columns. This still iterates, but it does so through apply.
df.apply(lambda x: [any(isinstance(y, dict) for y in x)], axis=0)
EDIT: I think using applymap is more direct. However, we can use our boolean result to get the column names:
any_dct = df.apply(lambda x: [any(isinstance(y, dict) for y in x)],
                   axis=0, result_type="expand")
df.iloc[:,any_dct.iloc[0,:].tolist()].columns.values
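As a version-independent alternative, here is a small sketch that skips non-object columns (which cannot hold dicts) and stops scanning a column at the first dict it finds. Some iteration is unavoidable for object columns; the helper name `has_dict` is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'i': [0, 1, 2, 3],
    'd': [np.nan, {'p': 1}, {'q': 2}, np.nan],
    't': [np.nan, {'u': 1}, {'v': 2}, np.nan],
})

def has_dict(col):
    # any() short-circuits, so scanning stops at the first dict.
    return any(isinstance(v, dict) for v in col)

# Only object-dtype columns can contain dicts, so numeric ones are skipped.
dict_cols = [name for name, dtype in df.dtypes.items()
             if dtype == object and has_dict(df[name])]
print(dict_cols)
```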
newbie to Python.
I'm trying to extract data from a dataframe and put it into a string to print to a docx file.
This is my current code:
add_run("Placeholder A").italic= True
for i in range(0, list(df.shape)[0]):
    A = df.iloc[findings][0]
    B = df.iloc[findings][1]
    C = df.iloc[findings][2]
    output = ('The value of A: {}, B: {}, C: {}').format(A,B,C)
    doc.add_paragraph(output)
The output I am after is:
Placeholder A
print output for row 1 of DF
Placeholder A
print output for row 2 of DF
Currently it is printing all the outputs of the dataframe under Placeholder A.
Any ideas where I am going wrong?
The Stack Overflow question "How to iterate over rows in a DataFrame in Pandas?" covers iterating over pandas dataframe rows. All that remains is to print(row) :)
edit:
Here is an example (based on an answer from that link) that prints the rows of a previously created dataframe:
import pandas as pd
inp = [{'c1': 10, 'c2': 100, 'c3': 100}, {'c1': 11, 'c2': 110, 'c3': 100}, {'c1': 12, 'c2': 120, 'c3': 100}]
df = pd.DataFrame(inp)
for index, row in df.iterrows():
    A = row["c1"]
    B = row["c2"]
    C = row["c3"]
    print('The value of A: {}, B: {}, C: {}'.format(A, B, C))
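To get the placeholder repeated before every row, as the question asks, the heading has to be emitted inside the loop rather than once before it. The sketch below collects the paragraphs as plain strings so it runs without python-docx; the function name `build_paragraphs` is invented here. The assumption is that each heading string would then be written with `add_run(...).italic = True` and each output string with `doc.add_paragraph(...)`, as in the question.

```python
import pandas as pd

inp = [{'c1': 10, 'c2': 100, 'c3': 100},
       {'c1': 11, 'c2': 110, 'c3': 100},
       {'c1': 12, 'c2': 120, 'c3': 100}]
df = pd.DataFrame(inp)

def build_paragraphs(frame, heading='Placeholder A'):
    """Return heading/output pairs, one pair per dataframe row."""
    paragraphs = []
    for row in frame.itertuples(index=False):
        # The heading goes inside the loop so it precedes every row.
        paragraphs.append(heading)
        paragraphs.append(
            'The value of A: {}, B: {}, C: {}'.format(row[0], row[1], row[2]))
    return paragraphs

paragraphs = build_paragraphs(df)
for text in paragraphs:
    print(text)
```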
Given a dataframe (df) containing a numeric (float) series and a categorical ID, how can I create a dictionary of the form 'key': [] where the key is an ID from the dataframe and the list contains the differences between the numbers in df and the numbers in a second dataframe (rl)?
I have managed this using loops, though I am looking for a more pandas way of doing this.
import pandas as pd
from collections import defaultdict
df = pd.DataFrame({'a': [0.75435, 0.74897, 0.60949,
0.87438, 0.90885, 0.28547,
0.27327, 0.31078, 0.15576,
0.58139],
'id': list('aaaxxbbyyy')})
rl = pd.DataFrame({'b': [0.51, 0.30], 'id': ['aaa', 'bbb']})
interval = 0.1
d = defaultdict(list)
for index, row in rl.iterrows():
    # inclusive takes 'neither'/'both' from pandas 1.3; the boolean form was removed in 2.0
    before = df[df['a'].between(row['b'] - interval, row['b'], inclusive='neither')]
    after = df[df['a'].between(row['b'], row['b'] + interval, inclusive='both')]
    for x, b_row in before.iterrows():
        d[b_row['id']].append((b_row['a'] - row['b']))
    for x, a_row in after.iterrows():
        d[a_row['id']].append((a_row['a'] - row['b']))
for k, v in d.items():
    print('{k}\t{v}'.format(k=k, v=len(v)))
a 1
y 2
b 2
d
defaultdict(list,
{'a': [0.09948],
'b': [-0.01452, -0.02672],
'y': [0.07138, 0.01078]})
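One possible loop-free approach, sketched under the assumption that pandas 1.2+ is available (for `how='cross'`), is to pair every value in `df` with every value in `rl`, compute the differences in one step, and keep those inside the window. The two `between` calls in the loop amount to `-interval < diff <= interval`. The names `pairs`, `near` and `result` are made up here, and the order of values within each list can differ from the loop version.

```python
import pandas as pd

df = pd.DataFrame({'a': [0.75435, 0.74897, 0.60949,
                         0.87438, 0.90885, 0.28547,
                         0.27327, 0.31078, 0.15576,
                         0.58139],
                   'id': list('aaaxxbbyyy')})
rl = pd.DataFrame({'b': [0.51, 0.30], 'id': ['aaa', 'bbb']})
interval = 0.1

# Every row of df paired with every reference value in rl.
pairs = df.merge(rl[['b']], how='cross')
pairs['diff'] = pairs['a'] - pairs['b']

# Same window as the loop: open on the left, closed on the right.
near = pairs[(pairs['diff'] > -interval) & (pairs['diff'] <= interval)]

# Collect the differences per id into a plain dict of lists.
result = near.groupby('id')['diff'].apply(list).to_dict()
print({k: len(v) for k, v in result.items()})
```

The cross join builds `len(df) * len(rl)` rows, so this trades memory for the absence of Python-level loops.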