What is wrong here in colouring the Excel sheet? - python

Here I need to colour rows with Age < 13 red and rows with Age >= 13 green, but the final 'Report.xlsx' isn't getting coloured. What is wrong here?
import pandas as pd
data = [['tom', 10], ['nick', 12], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
df_styled = df.style.applymap(lambda x: 'background:red' if x < 13 else 'background:green', subset=['Age'])
df_styled.to_excel('Report.xlsx',engine='openpyxl',index=False)
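One likely cause, offered as an assumption since no answer is recorded here: the Excel export in pandas reads the `background-color` CSS property and does not expand the `background` shorthand, so the fill is silently dropped. The sketch below also colours the whole row rather than only the Age cell. The helper names `highlight_by_age` and `write_report` are made up for illustration.

```python
import pandas as pd


def highlight_by_age(row, threshold=13):
    """Return one CSS string per cell so the whole row gets the same fill."""
    color = "red" if row["Age"] < threshold else "green"
    # 'background-color' (not the 'background' shorthand) is what the
    # Excel writer is assumed to pick up.
    return [f"background-color: {color}"] * len(row)


def write_report(df, path="Report.xlsx"):
    """Style every row and write the result to an Excel file."""
    # axis=1 makes Styler.apply call highlight_by_age once per row.
    styled = df.style.apply(highlight_by_age, axis=1)
    styled.to_excel(path, engine="openpyxl", index=False)


data = [["tom", 10], ["nick", 12], ["juli", 14]]
df = pd.DataFrame(data, columns=["Name", "Age"])
```

Calling `write_report(df)` should then produce the coloured file, assuming openpyxl and jinja2 are installed.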

Related

pandas remove equal rows by comparing columns in two dataframes

import pandas as pd
df1 = pd.DataFrame([['tom', 10, 1.2], ['nick', 15, 1.3], ['juli', 14, 1.4]])
df2 = pd.DataFrame([['tom', 10, 1.2], ['nick', 15, 1.3], ['juli', 100, 1.4]])
When I try to compare them and remove the equal rows using the code below:
diff = df1.compare(df2, align_axis=1, keep_equal=True, keep_shape=True).drop_duplicates(
keep=False).rename(index={'self': 'df1', 'other': 'df2'}, level=-1)
I am getting all rows back.
I want to keep only the rows that have at least one unequal value and remove the rest. That means only the last row should be present in the output, not every row. Please suggest changes.
Assuming you want everything from df1 that does not match df2:
n_columns = len(df1.columns)
df1[(df1 == df2).apply(sum, axis=1).apply(lambda x: x != n_columns)]
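The same filter can be written without counting matches per row. This is a minimal sketch, assuming both frames share the same shape, index and column labels; the names `row_differs` and `only_unequal` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([["tom", 10, 1.2], ["nick", 15, 1.3], ["juli", 14, 1.4]])
df2 = pd.DataFrame([["tom", 10, 1.2], ["nick", 15, 1.3], ["juli", 100, 1.4]])

# True for each row where at least one cell differs between the frames.
# Note: NaN != NaN, so a row with NaN in both frames is also flagged.
row_differs = (df1 != df2).any(axis=1)

# Keep only the rows of df1 that differ from df2.
only_unequal = df1[row_differs]
print(only_unequal)
```

With the sample data only the 'juli' row remains.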

Weighted Pie Chart Pandas

I'd like to create a weighted pie chart using pandas. Here is a simple example to build off of.
import pandas as pd
data = [['red', 10], ['orange', 15], ['blue', 14], ['red', 8],
['orange', 11], ['blue', 20]]
df = pd.DataFrame(data, columns = ['color', 'weight'])
The simplest solution is to create a new "totals" column and build the pie chart from it.
new_df = df.groupby(['color'])[['weight']].sum()
new_df = new_df.reset_index()
new_df.columns = ['color', 'total']
From there, I prefer to use Plotly Express:
import plotly.express as px
fig = px.pie(new_df, values='total', names='color', title='...')
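The aggregation step can be checked on its own before plotting. The sketch below is a pandas-only variant of the groupby above, assuming the same sample data; the `share` column is an addition for illustration and shows the fraction each slice would occupy.

```python
import pandas as pd

data = [["red", 10], ["orange", 15], ["blue", 14],
        ["red", 8], ["orange", 11], ["blue", 20]]
df = pd.DataFrame(data, columns=["color", "weight"])

# Sum the weights per color; as_index=False keeps 'color' as a regular column.
totals = (
    df.groupby("color", as_index=False)["weight"]
    .sum()
    .rename(columns={"weight": "total"})
)

# Fraction of the pie each color would take (illustrative extra column).
totals["share"] = totals["total"] / totals["total"].sum()
print(totals)
```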

Losing column names when converting a list back to a DataFrame

I have created a DataFrame and need to do two operations:
Convert it to a list.
Convert the same list back to a DataFrame with the original column names.
Issue: I lose the column names when I convert to a list, and when I convert back to a DataFrame I do not get them back.
Please help!
import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
#convert df to list
a=df.values.tolist()
#convert back to original dataframe
df1 = pd.DataFrame(a)
print(df1)
Current output
I am unable to get the column names.
You need to pass the column names with df.columns; if the original index is not the default one, pass it too:
df1 = pd.DataFrame(a, columns=df.columns, index=df.index)
If the original DataFrame has the default RangeIndex:
df1 = pd.DataFrame(a, columns=df.columns)
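A short round trip makes the difference visible. This is a sketch only; the non-default index `[7, 3, 9]` is a hypothetical choice to show why `index=` matters, and `rows` and `restored` are illustrative names.

```python
import pandas as pd

data = [["tom", 10], ["nick", 15], ["juli", 14]]
# Hypothetical non-default index, used only to show that it survives.
df = pd.DataFrame(data, columns=["Name", "Age"], index=[7, 3, 9])

# A plain list of rows carries neither column names nor index labels.
rows = df.values.tolist()

# Re-attach both pieces of metadata from the original frame.
restored = pd.DataFrame(rows, columns=df.columns, index=df.index)
print(restored)
```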
EDIT:
If you need a similar structure, use DataFrame.to_dict with orient='split', which converts the DataFrame to a dictionary of column names, index and data:
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
d = df.to_dict(orient='split')
print (d)
{'index': [0, 1, 2],
'columns': ['Name', 'Age'],
'data': [['tom', 10], ['nick', 15], ['juli', 14]]}
To get back the original DataFrame, use:
df2 = pd.DataFrame(d['data'], index=d['index'], columns=d['columns'])
print (df2)
   Name  Age
0   tom   10
1  nick   15
2  juli   14

Pandas get cell value by row NUMBER (NOT row index) and column NAME

import pandas as pd
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'], index = [7,3,9])
display(df)
df.iat[0,0]
I'd like to return the Age in the first row (basically something like df.iat[0,'Age']). Expected result = 10
Thanks for your help!
df['Age'].iloc[0] works too, similar to what Chris had answered.
Use iloc and Index.get_loc:
df.iloc[0, df.columns.get_loc("Age")]
Output:
10
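The approaches mentioned in this thread can be compared side by side. This is a minimal sketch using the question's data; the variable names are illustrative. All three mix a row position with a column name and ignore the index labels 7, 3, 9.

```python
import pandas as pd

data = [["tom", 10], ["nick", 15], ["juli", 14]]
df = pd.DataFrame(data, columns=["Name", "Age"], index=[7, 3, 9])

# Translate the column name into its integer position once.
age_col = df.columns.get_loc("Age")

via_iloc = df.iloc[0, age_col]    # positional row, positional column
via_iat = df.iat[0, age_col]      # scalar accessor, same positions
via_series = df["Age"].iloc[0]    # select the column by name, then row 0

print(via_iloc, via_iat, via_series)
```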

Find rows in dataframe that contain words that are bigrams/trigrams

This example is for finding bigrams:
Given:
import pandas as pd
data = [['tom', 10], ['jobs', 15], ['phone', 14],['pop', 16], ['they_said', 11], ['this_example', 22],['lights', 14]]
test = pd.DataFrame(data, columns = ['Words', 'Freqeuncy'])
test
I'd like to write a query to only find words that are separated by a "_" such that the returning df would look like this:
data2 = [['they_said', 11], ['this_example', 22]]
test2 = pd.DataFrame(data2, columns = ['Words', 'Freqeuncy'])
test2
I'm wondering why something like this doesn't work: data[data['Words'] == (len> 3)]
To use a function you need to use apply:
test[test.apply(lambda x: len(x['Words']), axis=1) > 3]
The pandas way of doing it is like this:
import pandas as pd
data = [['tom', 10], ['jobs', 15], ['phone', 14],['pop', 16], ['they_said', 11], ['this_example', 22],['lights', 14]]
test = pd.DataFrame(data, columns = ['Words', 'Freqeuncy'])
test = test[test.Words.str.contains('_')]
test
To do the opposite (keep only the words without a "_"), apply the negated mask to the original DataFrame:
test = test[~test.Words.str.contains('_')]
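The filters above can be kept as separate masks so the original frame is not overwritten. This is a sketch under the assumption that an n-gram is a word whose tokens are joined by "_"; the `token_count` column and the variable names are illustrative and not part of the original answers.

```python
import pandas as pd

data = [["tom", 10], ["jobs", 15], ["phone", 14], ["pop", 16],
        ["they_said", 11], ["this_example", 22], ["lights", 14]]
test = pd.DataFrame(data, columns=["Words", "Freqeuncy"])

# regex=False treats "_" as a literal character rather than a pattern.
has_underscore = test["Words"].str.contains("_", regex=False)
ngrams = test[has_underscore]       # words containing at least one "_"
single_words = test[~has_underscore]

# Number of "_"-separated tokens per word: 2 for bigrams, 3 for trigrams.
token_count = test["Words"].str.count("_") + 1
bigrams = test[token_count == 2]
trigrams = test[token_count == 3]

# Vectorized length filter, the working form of the len > 3 idea.
long_words = test[test["Words"].str.len() > 3]
```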
