Exporting a pandas DataFrame plus a text element to an image

I want to export a DataFrame as a PNG image and have used the following approach.
import pandas as pd
import dataframe_image as dfi
data = {'Type': ['Type 1', 'Type 2', 'Type 3', 'Total'], 'Value': [20, 21, 19, 60]}
df = pd.DataFrame(data)
dfi.export(df, 'table.png')
However, I also want to print a date stamp above the table in the image, with the intention of creating a series of images on consecutive days. If possible, I would also like to format the table with a horizontal line above the final 'Total' row to indicate that it is the sum of the values.
Is this possible with the above package, or is there a better approach?

You can set df.index.name = pd.Timestamp('now').replace(microsecond=0) to show the timestamp in the table header, as the name of the index.
To draw the horizontal line above the 'Total' row, use .style.set_table_styles:
import pandas as pd
import dataframe_image as dfi
data = {'Type': ['Type 1', 'Type 2', 'Type 3'], 'Value': [20, 21, 19]}
df = pd.DataFrame(data)
df.index.name = pd.Timestamp('now').replace(microsecond=0)
df.loc[len(df)] = ['Total',df['Value'].sum()]
test = df.style.set_table_styles([{'selector' : '.row3','props' : [('border-top','3px solid black')]}])
dfi.export(test, 'table.png')

Related

Change column names except for certain columns

Assuming I have the following dataframe
df = pd.DataFrame(
{
'ID': ['AB01'],
'Col A': ["Yes"],
'Col B': ["L"],
'Col C': ["Yes"],
'Col D': ["L"],
'Col E': ["Yes"],
'Col F': ["L"],
'Type': [85]
}
)
I want to rename every column by converting it to lowercase, replacing spaces with underscores, and appending the string _filled, except for the columns named in the list skip = ['ID', 'Type'].
How can I achieve this? The resulting dataframe should have the column names ID, col_a_filled, col_b_filled, ..., Type.
You can use df.rename along with a dict comprehension to get a nice one-liner:
df = df.rename(columns={col:col.lower().replace(" ", "_")+"_filled" for col in df.columns if col not in skip})
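For reference, here is a self-contained sketch of the same idea on a shortened version of the question's dataframe. It passes a function to df.rename instead of a dict; the helper name fill_name is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['AB01'],
    'Col A': ['Yes'],
    'Col B': ['L'],
    'Type': [85],
})
skip = ['ID', 'Type']


def fill_name(col):
    """Hypothetical helper: 'Col A' -> 'col_a_filled'."""
    return col.lower().replace(' ', '_') + '_filled'


# rename calls the function once per column label; skipped labels pass through unchanged.
renamed = df.rename(columns=lambda col: col if col in skip else fill_name(col))

print(list(renamed.columns))
# ['ID', 'col_a_filled', 'col_b_filled', 'Type']
```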

How can I force Pandas dataframe column query to return only str and not interpret what it is seeing?

I have an Excel spreadsheet with four column names: "Column One", "Column Two", "Jun-17", and "Column Three".
When I display my column names after reading in the data I get something very different from the "Jun-17" text I was hoping to receive. What should I be doing differently?
import pandas as pd
df = pd.read_excel('Sample.xlsx', sheet_name='Sheet1')
print("Column headings:")
print(df.columns.tolist())
Column headings:
['Column One', 'Column Two', datetime.datetime(2018, 6, 17, 0, 0), 'Column Three']
One of your column names is a datetime object. You can rename it to a string using datetime.strftime. Example below.
import datetime
import pandas as pd
df = pd.DataFrame(columns=['Column One', 'Column Two',
                           datetime.datetime(2018, 6, 17, 0, 0), 'Column Three'])
# rename returns a new dataframe with the datetime label replaced by its string form
df = df.rename(columns={df.columns[2]: df.columns[2].strftime('%b-%d')})
# avoid assigning into df.columns.values directly: Index objects are meant to be immutable
df.columns
# Index(['Column One', 'Column Two', 'Jun-17', 'Column Three'], dtype='object')
If you see this problem repeatedly, wrap the conversion in a function:
def normalise_columns(df):
    # format datetime labels as strings; leave every other label untouched
    df.columns = [i.strftime('%b-%d') if isinstance(i, datetime.datetime)
                  else i for i in df.columns]
    return df
normalise_columns(df).columns
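A variant that leaves the input untouched and lets the caller choose the format might look like the sketch below. The function name stringify_date_columns and the fmt parameter are assumptions for illustration, not part of pandas.

```python
import datetime

import pandas as pd


def stringify_date_columns(df, fmt='%b-%d'):
    """Return a copy of df whose datetime column labels are formatted as strings."""
    out = df.copy()
    out.columns = [
        # pd.Timestamp subclasses datetime.datetime, so it is covered as well.
        col.strftime(fmt) if isinstance(col, datetime.datetime) else col
        for col in df.columns
    ]
    return out


df = pd.DataFrame(columns=['Column One', 'Column Two',
                           datetime.datetime(2018, 6, 17, 0, 0), 'Column Three'])
clean = stringify_date_columns(df)
print(clean.columns.tolist())
```

Passing fmt='%b-%y' instead would format the label by month and two-digit year, which may be closer to what the spreadsheet author meant by "Jun-17".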

Iterative loop in dataframe - creating data dictionary dynamically

I have thousands of rows in the block structure shown below. In each block, the first row (Response Comments), the second row (Customer Name), and the last row (Recommended) are always present. The remaining fields/rows are optional.
I am trying to write code that finds each row where Column Name = 'Response Comments' and sets Key to the Column Values entry of the next row (Customer Name).
That key should apply to every row from Response Comments through Recommended;
the loop should then break and pick up a new key for the next block.
The data is from an Excel file:
from pandas import DataFrame
import pandas as pd
import os
import numpy as np
xl = pd.ExcelFile('Filepath')
df = xl.parse('Reviews_Structured')
print(type (df))
RowNum  Column Name        Column Values                Key
1       Response Comments  they have been unresponsive
2       Customer Name      Brian
.
.
.
.
13      Recommended        no
Any help regarding this loop code will be appreciated.
One way to implement your logic is using collections.defaultdict and a nested dictionary structure. Below is an example:
from collections import defaultdict
import numpy as np
import pandas as pd
# input data
df = pd.DataFrame([[1, 'Response Comments', 'they have been unresponsive'],
[2, 'Customer Name', 'Brian'],
.....
[9, 'Recommended', 'yes']],
columns=['RowNum', 'Column Name', 'Column Values'])
# fill Key columns
df['Key'] = df['Column Values'].shift(-1)
df.loc[df['Column Name'] != 'Response Comments', 'Key'] = np.nan
df['Key'] = df['Key'].ffill()
# create defaultdict of dict
d = defaultdict(dict)
# iterate dataframe
for row in df.itertuples():
    # row[0] is the index, row[2] is 'Column Name', row[3] is 'Column Values', row[4] is 'Key'
    d[row[4]].update({row[2]: row[3]})
# defaultdict(dict,
# {'April': {'Customer Name': 'April',
# 'Recommended': 'yes',
# 'Response Comments': 'they have been responsive'},
# 'Brian': {'Customer Name': 'Brian',
# 'Recommended': 'no',
# 'Response Comments': 'they have been unresponsive'},
# 'John': {'Customer Name': 'John',
# 'Recommended': 'yes',
# 'Response Comments': 'they have been very responsive'}})
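Because the example above elides part of its input, here is a small runnable sketch of the same key-filling idea. The sample rows (including the optional 'Region' field) are made up for illustration, and the dictionary is built with groupby rather than defaultdict.

```python
import pandas as pd

# Hypothetical sample data: two blocks, the second with one optional field.
rows = [
    ('Response Comments', 'they have been unresponsive'),
    ('Customer Name', 'Brian'),
    ('Recommended', 'no'),
    ('Response Comments', 'they have been responsive'),
    ('Customer Name', 'April'),
    ('Region', 'West'),
    ('Recommended', 'yes'),
]
df = pd.DataFrame(rows, columns=['Column Name', 'Column Values'])

# A block starts at every 'Response Comments' row.
is_start = df['Column Name'].eq('Response Comments')

# Take the next row's value (the customer name) on start rows only,
# then carry it forward to the end of the block.
df['Key'] = df['Column Values'].shift(-1).where(is_start).ffill()

# One inner dictionary per customer.
records = {
    key: dict(zip(group['Column Name'], group['Column Values']))
    for key, group in df.groupby('Key', sort=False)
}
print(records['Brian'])
```

Note that two blocks with the same customer name would be merged under one key in this sketch, just as in the defaultdict version.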
Am I understanding this correctly, that you want a new DataFrame with
columns = ['Response Comments', 'Customer name', ...]
to reshape your data from the parsed excel file?
Create an empty DataFrame from the known, mandatory column names, e.g.
df_new = pd.DataFrame(columns=['Response Comments', 'Customer name', ...])
Then iterate over the parsed Excel file row by row and assign your values, moving to a new output row after each 'Recommended' row:
index = 0
for k, row in df.iterrows():
    # only copy fields that are columns of df_new
    if row['Column Name'] in df_new:
        df_new.at[index, row['Column Name']] = row['Column Values']
    # 'Recommended' closes a block, so the next block goes into the next row
    if row['Column Name'] == 'Recommended':
        index += 1
Not a beauty, but I'm not quite sure what exactly you're trying to achieve :)
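If a wide dataframe with one row per block is the goal, the loop could also be replaced by a pivot. This is only a sketch: the sample rows and the 'block' column name are assumptions, and it relies on every block starting with a 'Response Comments' row, as the question states.

```python
import pandas as pd

# Hypothetical sample data: two blocks, the second with one optional field.
rows = [
    ('Response Comments', 'they have been unresponsive'),
    ('Customer Name', 'Brian'),
    ('Recommended', 'no'),
    ('Response Comments', 'they have been responsive'),
    ('Customer Name', 'April'),
    ('Region', 'West'),
    ('Recommended', 'yes'),
]
df = pd.DataFrame(rows, columns=['Column Name', 'Column Values'])

# Number the blocks: the counter increases at every 'Response Comments' row.
df['block'] = df['Column Name'].eq('Response Comments').cumsum()

# One row per block, one column per field; missing optional fields become NaN.
wide = df.pivot(index='block', columns='Column Name', values='Column Values')
print(wide)
```

pivot raises an error if a field name repeats within a block, which can double as a sanity check on the input.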

How can I graph this as a stacked bar chart?

I have the code below. It produces my data frame exactly how I want it, but I can't seem to graph it as a stacked bar chart.
I'd like to have the department on the X axis and, on the Y axis, the completed count with the remaining count stacked on top.
import pandas as pd
import matplotlib
data = pd.read_excel('audit.xls', skiprows=2, index_col = 'Employee Department')
data.rename(columns = {'Curriculum Name':'Curriculum','Organization Employee Number':'Employee_Number', 'Employee Name': 'Employee','Employee Email':'Email', 'Employee Status':'Status', 'Date Assigned':'Assigned','Completion Date':'Completed'}, inplace=True)
data.drop(['Employee_Number', 'Employee','Assigned', 'Status', 'Manager Name', 'Manager Email', 'Completion Status','Unnamed: 1', 'Unnamed: 5', 'Unnamed: 6'], axis=1, inplace=True)
new_data = data.query('Curriculum ==["CARB Security Training","OIS Support Training","Legal EO Training"]')
new_data2 = new_data.groupby('Employee Department').count().eval('Remaining = Email - Completed', inplace=False)
new_data2
I assume I need to convert it to a pivot table somehow, since that's how it is in Excel.
Have you tried something like this: new_data2[['Completed','Remaining']].plot.bar(stacked=True)
The following example works for me:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(1,10).reshape(3,3), columns=['Email', 'Completed', 'Remaining'], index=['A', 'B', 'C'])
df[['Completed', 'Remaining']].plot.bar(stacked=True)
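To connect this with the question's groupby step, the sketch below builds the counts from a few made-up records and then plots them. The department names, e-mail addresses, and dates are hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook

import pandas as pd

# Hypothetical audit records: a missing 'Completed' date means not finished yet.
records = pd.DataFrame({
    'Employee Department': ['HR', 'HR', 'IT', 'IT', 'IT'],
    'Email': ['a@example.com', 'b@example.com', 'c@example.com',
              'd@example.com', 'e@example.com'],
    'Completed': ['2024-01-02', None, '2024-01-05', '2024-01-06', None],
})

# count() tallies non-missing values per department and column.
summary = records.groupby('Employee Department').count()
summary['Remaining'] = summary['Email'] - summary['Completed']

# Departments on the X axis, Completed at the bottom, Remaining stacked on top.
ax = summary[['Completed', 'Remaining']].plot.bar(stacked=True)
ax.set_ylabel('Employees')
```

Call ax.figure.savefig(...) or plt.show() afterwards to see the chart. No pivot table is needed, because the grouped frame already has one row per department.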

Pandas indexed dataframe display: use top left empty box

Is there a way to put text in the top left box of a dataframe display? Does that field have a name? See below:
import pandas as pd
raw_data = {'Regiment': ['Nighthawks', 'Raptors'],
'Company': ['1st', '2nd'],
'preTestScore': [4, 24],
'postTestScore': [25, 94]}
pd.DataFrame(raw_data, columns = ['Regiment', 'Company', 'preTestScore', 'postTestScore']).set_index('Regiment')
Yes. That space is used for the name of the columns axis. Assuming the dataframe is assigned to df, it can be filled in by doing
df.columns.name = 'your name'
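A short sketch on the question's data follows. The label 'Metric' is an arbitrary choice for illustration. rename_axis does the same thing without modifying the original frame.

```python
import pandas as pd

raw_data = {'Regiment': ['Nighthawks', 'Raptors'],
            'Company': ['1st', '2nd'],
            'preTestScore': [4, 24],
            'postTestScore': [25, 94]}
df = pd.DataFrame(raw_data).set_index('Regiment')

# Option 1: set the name of the columns axis in place.
df.columns.name = 'Metric'

# Option 2: return a new frame with the name set, leaving the input as it is.
labelled = df.rename_axis(columns='Metric')

print(df)
```

The index name ('Regiment') is shown on its own header row below the column labels, so both names can be displayed at once.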
