left and right justify columns correctly in pandas dataframe

I have the following dictionary, where each key is associated with a dataframe.
import pandas as pd
data = {}
data['total_brands'] = pd.DataFrame({'total_brands': {0: 164}})
data['new_portfolios_added'] = pd.DataFrame({'new_portfolios_added': {0: 3}})
data['total_updated_portfolios'] = pd.DataFrame({'total_updated_portfolios': {0: 1}})
data['family_per_brand'] = pd.DataFrame({'brand_name': {0: 'Morningstar',
1: 'Vanguard',
2: 'WisdomTree',
3: 'State Street',
4: 'First Trust',
5: 'Franklin Templeton Investments',
6: 'Logicly',
7: 'Nuveen',
8: 'Scott Burns',
9: 'Paul Merriman',
10: 'Fidelity',
11: 'FlexShares',
12: 'Alpha Architect',
13: 'Rick Ferri',
14: 'Craig Israelsen',
15: 'Rajan Subramanian',
16: 'Goldman Sachs',
17: 'JPMorgan',
18: 'Xtrackers',
19: 'PIMCO',
20: 'John Hancock',
21: 'Hartford',
22: 'Invesco',
23: 'Schwab'},
'family_per_brand': {0: 7,
1: 6,
2: 5,
3: 5,
4: 4,
5: 4,
6: 3,
7: 3,
8: 2,
9: 2,
10: 2,
11: 1,
12: 1,
13: 1,
14: 1,
15: 1,
16: 0,
17: 0,
18: 0,
19: 0,
20: 0,
21: 0,
22: 0,
23: 0}})
Now I want to send all of this data in the body of a plain-text email, with the dataframes looking presentable. I searched Stack Overflow and found these functions to help with my case:
import re
import numpy as np

blanks = r'^ *([a-zA-Z_0-9-]*) .*$'
blanks_comp = re.compile(blanks)

def find_index_in_line(line):
    # index of the first non-space character that follows a run of spaces
    index = 0
    spaces = False
    for ch in line:
        if ch == ' ':
            spaces = True
        elif spaces:
            break
        index += 1
    return index

def pretty_to_string(df):
    lines = df.to_string().split('\n')
    header = lines[0]
    m = blanks_comp.match(header)
    indices = []
    if m:
        st_index = m.start(1)
        indices.append(st_index)
    non_header_lines = lines[1:len(lines)]
    for line in non_header_lines:
        index = find_index_in_line(line)
        indices.append(index)
    # strip the common left margin (the index column) from every line
    mn = np.min(indices)
    newlines = []
    for l in lines:
        newlines.append(l[mn:len(l)])
    return '\n'.join(newlines) if df.shape[0] > 1 else ':'.join(newlines)
Then I tried:
final = "\n".join(pretty_to_string(data[key]) for key in data.keys())
print(final)
This gives me the following output, which is not visually appealing, as you can see from the attachment.
Ideally, I would want 164 under total_brands, 3 under new_portfolios_added and 1 under total_updated_portfolios, all aligned to the right.
I would also want the dataframe's "brand_name" column aligned below the "total_updated_portfolios" label.

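Perhaps the fastest and easiest approach is to save each dataframe to a CSV file, open it in Excel, and copy the table into the email. That method often preserves the formatting you select, depending on your email client. Since data is a dictionary of dataframes, each one has to be written separately:
for key, df in data.items():
    df.to_csv(f'{key}.csv')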
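If the email has to stay plain text, the alignment asked for in the question can also be produced directly in Python. The following is only a sketch, not the approach from the answer above: frame_to_text and its width parameter are hypothetical names introduced here, and it assumes the email is displayed in a monospaced font. Numeric columns are padded on the left (right-justified) and all other columns on the right (left-justified).

```python
import pandas as pd


def frame_to_text(df, width=32):
    """Render a DataFrame as fixed-width text without the index.

    Numeric columns are right-justified, all other columns left-justified.
    `width` is an assumed column width; pick one wider than the longest cell.
    """
    numeric = {col: pd.api.types.is_numeric_dtype(df[col]) for col in df.columns}

    def pad(text, col):
        return text.rjust(width) if numeric[col] else text.ljust(width)

    # first row is the header, followed by every data row converted to strings
    rows = [[str(col) for col in df.columns]] + df.astype(str).values.tolist()
    return "\n".join(
        "".join(pad(text, col) for text, col in zip(row, df.columns)).rstrip()
        for row in rows
    )


# small subset of the question's data, for illustration only
data = {
    "total_brands": pd.DataFrame({"total_brands": [164]}),
    "family_per_brand": pd.DataFrame(
        {"brand_name": ["Morningstar", "Vanguard"], "family_per_brand": [7, 6]}
    ),
}
body = "\n\n".join(frame_to_text(df) for df in data.values())
print(body)
```

Because every column uses the same width, the single-value frames and the brand table line up under each other when the pieces are joined.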

Related

Python nested for loop and if statement trouble

I am looking to build a dictionary of lists that meet the following criteria:
Item0 in list1 == Item0 in list2 and Item1 in list1 == Item1 in list2 and Date2 in list1 < Date2 in list2.
Running the code as is gives me one list in the dict. That one list is the same even if I change the if statement to use > instead of <.
Everything prior to this (see below) looks correct:
"for li in Liststr2:
    for lr in Liststr1A:
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:"
Also, lr[2] and li[2] are dtype <M8[ns], if that makes a difference.
df = {'Position': {0: 1, 1: 2, 2: 1, 3: 2, 4: 1, 5: 2, 6: 1, 7: 2, 8: 1, 9: 1, 10: 1, 11: 2, 12: 1, 13: 2, 14: 1, 15: 2, 16: 1, 17: 2, 18: 1, 19: 2, 20: 1}, 'Location': {0: 'AB1', 1: 'AB2', 2: 'AB3', 3: 'AB4', 4: 'AB4', 5: 'AB4', 6: 'AB4', 7: 'AB4', 8: 'AB4', 9: 'AB2', 10: 'AB5', 11: 'AB4', 12: 'AB4', 13: 'AB6', 14: 'AB6', 15: 'AB6', 16: 'AB6', 17: 'AB6', 18: 'AB6', 19: 'AB1', 20: 'AB1'}, 'DATE': {0: Timestamp('2021-05-22 18:00:00'), 1: Timestamp('2021-05-21 13:00:00'), 2: Timestamp('2021-05-24 12:23:00'), 3: Timestamp('2021-05-23 12:25:00'), 4: Timestamp('2021-05-23 12:25:00'), 5: Timestamp('2021-05-23 12:25:00'), 6: Timestamp('2021-05-23 12:25:00'), 7: Timestamp('2021-05-23 12:25:00'), 8: Timestamp('2021-05-23 12:25:00'), 9: Timestamp('2021-05-21 18:00:00'), 10: Timestamp('2021-05-21 18:00:00'), 11: Timestamp('2021-05-24 14:08:00'), 12: Timestamp('2021-05-24 14:08:00'), 13: Timestamp('2021-05-24 16:35:00'), 14: Timestamp('2021-05-24 16:35:00'), 15: Timestamp('2021-05-24 16:35:00'), 16: Timestamp('2021-05-24 16:35:00'), 17: Timestamp('2021-05-24 19:48:00'), 18: Timestamp('2021-05-24 19:48:00'), 19: Timestamp('2021-05-25 23:45:00'), 20: Timestamp('2021-05-25 23:45:00')}, 'Item Numbers': {0: '788-33', 1: '07-1', 2: '5214-3', 3: '003', 4: '003', 5: '009J', 6: '009J', 7: '009J', 8: '009J', 9: '07-1', 10: '68-302', 11: '6-5213', 12: '6-5214', 13: '1-801', 14: '1-801', 15: '1-801', 16: '1-801', 17: '4-008', 18: '4-008', 19: 'A-001', 20: 'A-001'}}
finaltemp = []
Finallist = {}
str1Temp = []
str2Temp = []
NaValues = []
Liststr2 = []
Liststr1 = []
Listna = []
n = 0
for col, row in df.iterrows() :
    col1Temp = row['col1']
    col2Temp = row['col2']
    col3temp = row['col3']
    col4Temp = row['col4']
    if col4Temp == None:
        NaValues = [col1Temp, col3temp, col2Temp]
        Listna.append(NaValues)
    if col4Temp == 'str1':
        str1Temp = [col1Temp, col3temp, col2Temp]
        Liststr1.append(str1Temp)
    if col4Temp == 'str2':
        str2Temp = [col1Temp, col3temp, col2Temp]
        Liststr2.append(str2Temp)
    for li in Liststr2:
        for lr in Liststr1:
            if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
                finaltemp = [lr[0], lr[1], lr[2]]
                n = +1
                key = 'Bad' + str(n)
                def t() : return {key : finaltemp}
                Finallist.update(t())
print(Finallist)

This simplifies your final loop, which as I said should be at the left margin, not indented one step:
Liststr1A = Liststr1[:10]
for li in Liststr2:
    for lr in Liststr1A:
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
            # the next key is derived from the current size, so each match gets its own entry
            Finallist['Bad'+str(len(Finallist)+1)] = lr[:]
print(Finallist)
It's not clear to me why you want Finallist to be a dictionary, since you want incrementing keys. Why not just make it a list and use Finallist.append?
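A minimal sketch of the list-based variant suggested above. The two sample lists are made-up rows in the [position, location, date] layout used in the question, not the asker's real data, and bad is a hypothetical name for the result.

```python
from datetime import datetime

# hypothetical sample rows: [position, location, date]
Liststr1 = [
    [1, 'AB4', datetime(2021, 5, 24, 14, 8)],
    [2, 'AB6', datetime(2021, 5, 24, 16, 35)],
]
Liststr2 = [
    [1, 'AB4', datetime(2021, 5, 23, 12, 25)],
    [2, 'AB6', datetime(2021, 5, 24, 19, 48)],
]

bad = []
for li in Liststr2:
    for lr in Liststr1:
        # same position and location, and the str1 date is later than the str2 date
        if lr[0] == li[0] and lr[1] == li[1] and lr[2] > li[2]:
            bad.append(lr[:])  # append a copy so later edits do not alias the source row

print(bad)
```

With a list there is no counter or key to maintain; the position in the list plays the role of the incrementing 'Bad1', 'Bad2', ... keys.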

In python pandas, count the integers in a particular column and also count all the elements in particular column

I have a huge df with multiple columns, but I only want to read one specific column that interests me.
In the data below, I would like to read only the column 'Type 1'.
import numpy as np
import pandas as pd
data = {'Type 1': {0: 1, 1: 3, 2: 5, 3: 'HH', 4: 9, 5: 11, 6: 13, 7: 15, 8: 17},
'Type 2': {0: 'AA',
1: 'BB',
2: 'np.NaN',
3: '55',
4: '3.14',
5: '-96',
6: 'String',
7: 'FFFFFF',
8: 'FEEE'},
'Type 3': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0},
'Type 4': {0: '23',
1: 'fefe',
2: 'abcd',
3: 'dddd',
4: 'dad',
5: 'cfe',
6: 'cf42',
7: '321',
8: '0'},
'Type 5': {0: -120,
1: -120,
2: -120,
3: -120,
4: -120,
5: -120,
6: -120,
7: -120,
8: -120}}
df = pd.DataFrame(data)
df
int_count = df['Type 1'].count(0,numeric_only = True) # should count only cells that contain integers and return 8
total_count = df['Type 1'].count(0,numeric_only = False) # should count all the cells and return 9
I want to count only the numeric values in a particular column.
For example, df['Type 1'].count(0, numeric_only=True) should return 8 (excluding the string 'HH' in the 'Type 1' column), and
df['Type 1'].count(0, numeric_only=False) should return 9 (the total number of cells in that column).
But df['Type 1'].count(0, numeric_only=True/False) does not work as I expect.

I would suggest the following:
int_count = len(df.loc[df['Type 1'].astype(str).str.isnumeric()])
total_count = len(df)
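An alternative sketch, in case the column can also hold negative numbers or floats, which str.isnumeric() does not recognise: pd.to_numeric with errors='coerce' turns anything unparsable into NaN, so counting the non-null results gives the numeric count. Note that this also counts numeric-looking strings such as '55', which may or may not be what is wanted. The series below reproduces only the 'Type 1' column from the question.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 'HH', 9, 11, 13, 15, 17], name='Type 1')

# values that cannot be parsed as numbers become NaN and are not counted
numeric_count = int(pd.to_numeric(s, errors='coerce').notna().sum())
total_count = len(s)

print(numeric_count, total_count)
```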

Python: looping through 2 dataframes having thresholds and calculating revenue, stuck

I am trying to solve a business problem using Python but am having difficulty coming up with a script to solve it. I have tried looping through the dataframe using df.iterrows(), but I am stuck because I don't know how to proceed.
We process volumes in production orders of one type of resource, which must be consumed FIFO (first in, first out). Each lot has a certain volume and price; after using up a lot, we start on the next lot.
Question: How can I automate the calculation of the column Revenu? Can you come up with some Python code that I can use to automate this process? Would you use a while or a for loop, and would you iterate through the dataframe?
Below I posted a screenshot of the solution: on the left the production orders, and on the right the volume and price per lot.
Below the image I posted two dictionaries containing the data from the screenshot.
I would really appreciate your help.
{'Productionorder': {0: 'Productionorder 1',
1: 'Productionorder 2',
2: 'Productionorder 3',
3: 'Productionorder 4',
4: 'Productionorder 5',
5: 'Productionorder 6',
6: 'Productionorder 7',
7: 'Productionorder 8',
8: 'Productionorder 9',
9: 'Productionorder 10',
10: 'Productionorder 11',
11: 'Productionorder 12',
12: 'Productionorder 13',
13: 'Productionorder 14',
14: 'Productionorder 15',
15: 'Productionorder 16',
16: 'Productionorder 17',
17: 'Productionorder 18',
18: 'Productionorder 19',
19: 'Productionorder 20',
20: 'Productionorder 21',
21: 'Productionorder 22'},
'Processed volume': {0: 810,
1: 3240,
2: 3177,
3: 1620,
4: 6480,
5: 5120,
6: 10880,
7: 13770,
8: 21060,
9: 4860,
10: 810,
11: 1620,
12: 15390,
13: 15390,
14: 6800,
15: 4480,
16: 10200,
17: 16650,
18: 2550,
19: 9050,
20: 9900,
21: 3200},
'Lotno.': {0: 1,
1: 1,
2: 1,
3: 1,
4: 2,
5: 2,
6: 2,
7: 2,
8: 2,
9: 2,
10: 2,
11: 2,
12: 2,
13: 3,
14: 3,
15: 3,
16: 3,
17: 3,
18: 3,
19: 3,
20: 4,
21: 4},
'Left of Lotno.': {0: 8490,
1: 5250,
2: 2073,
3: 453,
4: 75973,
5: 70853,
6: 59973,
7: 46203,
8: 25143,
9: 20283,
10: 19473,
11: 17853,
12: 2463,
13: 52073,
14: 45273,
15: 40793,
16: 30593,
17: 13943,
18: 11393,
19: 2343,
20: 38443,
21: 35243},
'Revenu': {0: 1741.5,
1: 6966.0,
2: 6830.549999999999,
3: 3483.0,
4: 10315.800000000001,
5: 7936.0,
6: 16864.0,
7: 21343.5,
8: 32643.0,
9: 7533.0,
10: 1255.5,
11: 2511.0,
12: 23854.5,
13: 20622.750000000004,
14: 8840.0,
15: 5824.0,
16: 13260.0,
17: 21645.0,
18: 3315.0,
19: 11765.0,
20: 12492.15,
21: 4000.0}}
{'Date': {0: Timestamp('2021-01-01 00:00:00'),
1: Timestamp('2021-01-02 00:00:00'),
2: Timestamp('2021-01-03 00:00:00'),
3: Timestamp('2021-01-04 00:00:00')},
'Lotno.': {0: 1, 1: 2, 2: 3, 3: 4},
'Volume': {0: 9300, 1: 82000, 2: 65000, 3: 46000},
'Price': {0: 2.15, 1: 1.55, 2: 1.3, 3: 1.25}}

Assuming you have two dataframes, one for the Production Orders and another for the Lot Details, the following function should allow you to calculate the revenues (along with the intermediate 'Lotno.' and 'Left of Lotno.' columns).
Requirements for each dataframe:
The Production Orders DataFrame must:
  contain a column titled 'Processed volume'
  have an index of consecutive integers starting at 0
The Lot Details DataFrame must:
  contain the columns ['Lotno.', 'Volume', 'Price']
  have at least one row
  have its rows ordered in the order of expected depletion
Once the quantity available in the final lot is depleted, no additional revenue is generated.
def fill_revenue(df1_orig, df2):
    """
    df1_orig is the Production Orders DataFrame
    df2 is the Lot Details DataFrame
    The returned DataFrame is based on a copy of the df1_orig
    """
    df1 = df1_orig.copy()

    # Create Empty Columns for calculated fields
    df1['Lotno.'] = None
    df1['Left of Lotno.'] = None
    df1['Revenu'] = None

    def recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict=None):
        """A function used to update the new values of a row"""
        if return_dict is None:
            return_dict = {'Revenu': 0}
        return_dict.update({'Lotno.': current_lot, 'Left of Lotno.': current_lot_quantity})
        lot_info = df2.loc[df2['Lotno.'] == current_lot].iloc[0]

        # start calculation
        if current_lot_quantity > order_volume:
            return_dict['Revenu'] += order_volume * lot_info['Price']
            current_lot_quantity -= order_volume
            order_volume = 0
            return_dict['Left of Lotno.'] = current_lot_quantity
        else:
            return_dict['Revenu'] += current_lot_quantity * lot_info['Price']
            order_volume -= current_lot_quantity
            try:
                lot_info = df2.iloc[df2.index.get_loc(lot_info.name) + 1]
            except IndexError:
                return_dict['Left of Lotno.'] = 0
                return return_dict
            current_lot = lot_info['Lotno.']
            current_lot_quantity = lot_info['Volume']
            recursive_revenu_calc(order_volume, current_lot, current_lot_quantity, return_dict)
        return return_dict

    # updating each row of the Production Orders DataFrame
    for idx, row in df1.iterrows():
        order_volume = row['Processed volume']
        current_lot = df2.iloc[0]['Lotno.'] if idx == 0 else df1.iloc[idx - 1]['Lotno.']
        current_lot_quantity = df2.iloc[0]['Volume'] if idx == 0 else df1.iloc[idx - 1]['Left of Lotno.']
        update_dict = recursive_revenu_calc(order_volume, current_lot, current_lot_quantity)
        for key, value in update_dict.items():
            df1.loc[idx, key] = value
    return df1
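To make the FIFO logic easier to follow on its own, here is a stripped-down iterative sketch that works on plain Python lists instead of dataframes. It is not the function above: fifo_revenue is a hypothetical helper, and it assumes the lots are given as (volume, price) pairs already sorted in depletion order. Each order draws from the current lot until either the order is filled or the lot is empty, then moves on to the next lot.

```python
def fifo_revenue(order_volumes, lots):
    """Return the revenue of each order when lots are consumed first in, first out.

    order_volumes: list of processed volumes, in processing order
    lots: list of (volume, price) tuples, in depletion order
    """
    revenues = []
    lot_idx = 0
    remaining = lots[0][0] if lots else 0  # volume left in the current lot

    for volume in order_volumes:
        revenue = 0.0
        while volume > 0 and lot_idx < len(lots):
            take = min(volume, remaining)
            revenue += take * lots[lot_idx][1]
            volume -= take
            remaining -= take
            if remaining == 0:
                # current lot is used up, switch to the next one if there is any
                lot_idx += 1
                if lot_idx < len(lots):
                    remaining = lots[lot_idx][0]
        # any volume left once all lots are depleted generates no revenue
        revenues.append(revenue)
    return revenues


# first five production orders and first two lots from the question
orders = [810, 3240, 3177, 1620, 6480]
lots = [(9300, 2.15), (82000, 1.55)]
print(fifo_revenue(orders, lots))
```

The fifth order is the interesting case: it takes the last 453 units of lot 1 at 2.15 and the remaining 6027 units from lot 2 at 1.55, which is how an order spanning two lots ends up with a blended revenue.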

How to remove duplicates based on lower frequency [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 2 years ago.
I have a table that looks like this.
For each brand, I want to keep the id that has the highest freq. For example, for audi both ids have the same frequency, so keep only one. For mercedes-benz, keep the latter one since it has frequency 7.
This is my dataframe:
{'Brand':
{0: 'audi',
1: 'audi',
2: 'bmw',
3: 'dacia',
4: 'fiat',
5: 'ford',
6: 'ford',
7: 'honda',
8: 'honda',
9: 'hyundai',
10: 'kia',
11: 'mercedes-benz',
12: 'mercedes-benz',
13: 'nissan',
14: 'nissan',
15: 'opel',
16: 'renault',
17: 'renault',
18: 'renault',
19: 'renault',
20: 'toyota',
21: 'toyota',
22: 'volvo',
23: 'vw',
24: 'vw',
25: 'vw',
26: 'vw'},
'id':
{0: 'audi_a4_dynamic_2016_otomatik',
1: 'audi_a6_standart_2015_otomatik',
2: 'bmw_5 series_executive_2016_otomatik',
3: 'dacia_duster_laureate_2017_manuel',
4: 'fiat_egea_easy_2017_manuel',
5: 'ford_focus_trend x_2015_manuel',
6: 'ford_focus_trend x_2015_otomatik',
7: 'honda_civic_eco elegance_2017_otomatik',
8: 'honda_cr-v_executive_2018_otomatik',
9: 'hyundai_tucson_elite plus_2017_otomatik',
10: 'kia_sportage_concept plus_2015_otomatik',
11: 'mercedes-benz_c-class_amg_2016_otomatik',
12: 'mercedes-benz_e-class_edition e_2015_otomatik',
13: 'nissan_qashqai_black edition_2014_manuel',
14: 'nissan_qashqai_sky pack_2015_otomatik',
15: 'opel_astra_edition_2016_manuel',
16: 'renault_clio_joy_2016_manuel',
17: 'renault_kadjar_icon_2015_otomatik',
18: 'renault_kadjar_icon_2016_otomatik',
19: 'renault_mégane_touch_2017_otomatik',
20: 'toyota_corolla_touch_2015_otomatik',
21: 'toyota_corolla_touch_2016_otomatik',
22: 'volvo_s60_advance_2018_otomatik',
23: 'vw_jetta_comfortline_2013_otomatik',
24: 'vw_passat_highline_2017_otomatik',
25: 'vw_tiguan_sport&style_2012_manuel',
26: 'vw_tiguan_sport&style_2013_manuel'},
'freq': {0: 4,
1: 4,
2: 7,
3: 4,
4: 4,
5: 4,
6: 4,
7: 4,
8: 4,
9: 4,
10: 4,
11: 4,
12: 7,
13: 4,
14: 4,
15: 4,
16: 4,
17: 4,
18: 4,
19: 4,
20: 4,
21: 4,
22: 4,
23: 4,
24: 7,
25: 4,
26: 4}}
Edit: I tried one of the answers and got an extra level of header.

You need to group by Brand with pandas groupby and then aggregate on the maximal frequency.
Something like this should work:
df.groupby('Brand')[['id', 'freq']].agg({'freq': 'max'})

To get your result, run:
result = df.groupby('Brand', as_index=False).apply(
    lambda grp: grp[grp.freq == grp.freq.max()].iloc[0])
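Another way to keep one row per brand, sketched here as an alternative rather than as the approach of either answer: groupby(...).idxmax() returns, for each brand, the index label of the first row with the highest freq, and df.loc then selects those rows with all their columns. The small frame below is a subset of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Brand': ['audi', 'audi', 'mercedes-benz', 'mercedes-benz'],
    'id': ['audi_a4_dynamic_2016_otomatik',
           'audi_a6_standart_2015_otomatik',
           'mercedes-benz_c-class_amg_2016_otomatik',
           'mercedes-benz_e-class_edition e_2015_otomatik'],
    'freq': [4, 4, 4, 7],
})

# index label of the (first) max-freq row per brand, then select those rows
result = df.loc[df.groupby('Brand')['freq'].idxmax()].reset_index(drop=True)
print(result)
```

On ties, such as audi here, idxmax keeps the first occurrence, which matches the requirement to keep only one id.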

Function to merge pandas dataframes based on different keywords

I am trying to create a function that creates a dataframe based on different lists of words that come up in a certain column of another dataframe.
In my example, I want a dataframe created on the basis of the words "tfl" and "electronics" coming up in the "description" column of the "uncategorised" dataframe.
The point of the function is that I want to be able to run this on different lists of words so I end up with different dataframes containing just the words I want.
words_Telephone = ["tfl", "electronics"]
df_Telephone = pd.DataFrame(columns=['date','description','paid out'])
def categorise(word_list, df_name):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df"""
    for word in word_list:
        df_name = uncategorised[uncategorised['description'].str.contains(word)]
    return(df_name)
#apply the function
categorise(words_Telephone, df_Telephone)
I am expecting a dataframe that contains:
d = {'date': {0: '05/04/2017',
1: '06/04/2017',
2: '08/04/2017',
3: '08/04/2017',
4: '08/04/2017',
5: '10/04/2017',
6: '10/04/2017',
7: '10/04/2017'},
'description': {0: 'tfl',
1: 'tfl',
2: 'tfl',
3: 'tfl',
4: 'ac electronics ',
5: 'ac electronics ',},
'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
'paid out': {0: 3.0,
1: 4.3,
2: 6.1,
3: 1.5,
4: 16.39,
5: 20.4,}}
Reproducible df:
d = {'date': {0: '05/04/2017',
1: '06/04/2017',
2: '06/04/2017',
3: '08/04/2017',
4: '08/04/2017',
5: '08/04/2017',
6: '10/04/2017',
7: '10/04/2017',
8: '10/04/2017'},
'description': {0: 'tfl',
1: 'mu subscription',
2: 'tfl',
3: 'tfl',
4: 'tfl',
5: 'ac electronics ',
6: 'itunes',
7: 'ac electronics ',
8: 'google adwords'},
'index': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10},
'paid out': {0: 3.0,
1: 16.9,
2: 4.3,
3: 6.1,
4: 1.5,
5: 16.39,
6: 12.99,
7: 20.4,
8: 39.68}}
SOLUTION:
def categorise(word_list):
    """ takes the denoted terms from the "uncategorised" df and puts it into new df then deletes from the uncategorised df"""
    global uncategorised
    new_dfs = []
    for word in word_list:
        new_dfs.append(uncategorised[uncategorised['description'].str.contains(word)])
        # keep only the rows that did not match, so they stay uncategorised
        uncategorised = uncategorised[~uncategorised['description'].str.contains(word)]
    # uncategorised is updated globally, so only the matched rows are returned
    return pd.concat(new_dfs).reset_index()
#apply the function
df_Telephone = categorise(words_Telephone)
df_Telephone

import pandas as pd

words_Telephone = ["tfl", "electronics"]
original_df = pd.DataFrame({
    'date': ['05/04/2017', '06/04/2017', '06/04/2017', '08/04/2017', '08/04/2017', '08/04/2017', '10/04/2017', '10/04/2017', '10/04/2017'],
    'description': ['tfl', 'mu subscription', 'tfl', 'tfl', 'tfl', 'ac electronics', 'itunes', 'ac electronics', 'google adwords'],
    'paid out': [3.0, 16.9, 4.3, 6.1, 1.5, 16.39, 12.99, 20.4, 39.68]})

def categorise(word_list, original_df):
    """ takes the denoted terms from the original df and puts them into a new df"""
    new_dfs = []
    for word in word_list:
        new_dfs.append(original_df[original_df['description'].str.contains(word)])
    # drop=True discards the old index instead of keeping it as an 'index' column
    return pd.concat(new_dfs).reset_index(drop=True)

#apply the function
df_Telephone = categorise(words_Telephone, original_df)
print(df_Telephone)
         date     description  paid out
0  05/04/2017             tfl      3.00
1  06/04/2017             tfl      4.30
2  08/04/2017             tfl      6.10
3  08/04/2017             tfl      1.50
4  08/04/2017  ac electronics     16.39
5  10/04/2017  ac electronics     20.40
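A possible variation, sketched under the assumption that a row should appear only once even if its description contains several of the words: the words are combined into a single regular expression, so one boolean mask splits the frame into matched and remaining rows. categorise_once and its column parameter are hypothetical names, and re.escape is used so that words containing regex metacharacters are matched literally.

```python
import re

import pandas as pd


def categorise_once(df, word_list, column='description'):
    """Split df into (rows whose column contains any word, all other rows)."""
    pattern = '|'.join(re.escape(word) for word in word_list)
    mask = df[column].str.contains(pattern, na=False)
    return df[mask].reset_index(drop=True), df[~mask].reset_index(drop=True)


original_df = pd.DataFrame({
    'date': ['05/04/2017', '06/04/2017', '06/04/2017', '08/04/2017', '08/04/2017',
             '08/04/2017', '10/04/2017', '10/04/2017', '10/04/2017'],
    'description': ['tfl', 'mu subscription', 'tfl', 'tfl', 'tfl',
                    'ac electronics', 'itunes', 'ac electronics', 'google adwords'],
    'paid out': [3.0, 16.9, 4.3, 6.1, 1.5, 16.39, 12.99, 20.4, 39.68],
})

df_Telephone, uncategorised = categorise_once(original_df, ["tfl", "electronics"])
print(df_Telephone)
```

Returning the remaining rows as a second value avoids the global variable used in the earlier solution; the caller decides whether to overwrite its uncategorised frame.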
