Pandas: include columns whose names contain a specific string in an IF statement? - python

The following dataframe, called df_merged, is a snippet of a larger dataframe with 30+ commodities like oil, gold, silver, etc.
Date CRUDE_OIL CRUDE_OIL_SMA200 GOLD GOLD_SMA200 SILVER SILVER_SMA200
0 2021-04-26 61.91 48.415 1779.199951 1853.216498 26.211 25.269005
1 2021-04-27 62.939999 48.5252 1778 1853.028998 26.412001 25.30566
2 2021-04-28 63.860001 48.6464 1773.199951 1852.898998 26.080999 25.341655
3 2021-04-29 65.010002 48.7687 1768.099976 1852.748498 26.052999 25.377005
4 2021-04-30 63.580002 48.8861 1767.300049 1852.529998 25.853001 25.407725
How can I implement a way to compare the regular commodity price with the SMA200 equivalent in an IF statement?
My current setup repeats the if statement below for 30+ columns, but I believe this can be done in a function.
comm_name = []
comm_averages = []

if (df_merged.CRUDE_OIL.tail(1) > df_merged.CRUDE_OIL_SMA200.tail(1)).any():
    print("CRUDE_OIL above 200SMA")
    comm_name.append("CRUDE_OIL")
    comm_averages.append(1)
else:
    print("CRUDE_OIL under 200SMA")
    comm_name.append("CRUDE_OIL")
    comm_averages.append(0)

if (df_merged.GOLD.tail(1) > df_merged.GOLD_SMA200.tail(1)).any():
    print("GOLD above 200SMA")
    comm_name.append("GOLD")
    comm_averages.append(1)
else:
    print("GOLD under 200SMA")
    comm_name.append("GOLD")
    comm_averages.append(0)

if (df_merged.SILVER.tail(1) > df_merged.SILVER_SMA200.tail(1)).any():
    print("SILVER above 200SMA")
    comm_name.append("SILVER")
    comm_averages.append(1)
else:
    print("SILVER under 200SMA")
    comm_name.append("SILVER")
    comm_averages.append(0)

comm_signals = pd.DataFrame(
    {'Name': comm_name,
     'Signal': comm_averages
    })

comm_signals
output of comm_signals:
Name Signal
0 CRUDE_OIL 1
1 GOLD 0
2 SILVER 1
I looked through this SO thread but couldn't figure out how to implement it: Find column whose name contains a specific string
I guess the goal is a function like this:
comm_name = []
comm_averages = []

def comm_compare(df):
    if (df_merged["X"].tail(1) > df_merged["X" + "_SMA200"].tail(1)).any():
        print(X + "above 200SMA")
        comm_name.append(X)
        comm_averages.append(1)
    else:
        print(X + "under 200SMA")
        comm_name.append(X)
        comm_averages.append(0)

comm_signals = pd.DataFrame(
    {'Name': comm_name,
     'Signal': comm_averages
    })

print(comm_signals)
Name Signal
0 CRUDE_OIL 1
1 GOLD 0
2 SILVER 1
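For reference, one minimal way to fill in that "X" placeholder is to derive each commodity name from the _SMA200 suffix. This is only a sketch and assumes every column ending in _SMA200 has a matching price column with the same base name:

import pandas as pd

def comm_compare(df):
    names, signals = [], []
    # assume every column ending in _SMA200 has a matching price column
    for sma_col in [c for c in df.columns if c.endswith('_SMA200')]:
        name = sma_col[:-len('_SMA200')]
        above = df[name].iloc[-1] > df[sma_col].iloc[-1]
        print(name, 'above 200SMA' if above else 'under 200SMA')
        names.append(name)
        signals.append(int(above))
    return pd.DataFrame({'Name': names, 'Signal': signals})

comm_signals = comm_compare(df_merged)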

Try stack + groupby diff
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': ['2021-04-26', '2021-04-27', '2021-04-28', '2021-04-29',
             '2021-04-30'],
    'CRUDE_OIL': [61.91, 62.939999, 63.860001, 65.010002, 63.580002],
    'CRUDE_OIL_SMA200': [48.415, 48.5252, 48.6464, 48.7687, 48.8861],
    'GOLD': [1779.199951, 1778.0, 1773.199951, 1768.099976, 1767.300049],
    'GOLD_SMA200': [1853.216498, 1853.028998, 1852.898998, 1852.748498,
                    1852.529998],
    'SILVER': [26.211, 26.412001, 26.080999, 26.052999, 25.853001],
    'SILVER_SMA200': [25.269005, 25.30566, 25.341655, 25.377005, 25.407725]
})

# Grab tail(1) and only numeric columns
# Replace this with a more specific select, or use filter, if not all
# numeric columns are needed
s = df.tail(1).select_dtypes('number')
# Remove the suffix from all columns so each price and its SMA200 share a name
s.columns = s.columns.str.replace('_SMA200', '', regex=False)
s = (
    # Stack, diff within each commodity group, and compare to 0
    (s.stack().groupby(level=1).diff().dropna() < 0)
    .astype(int)   # Convert bool to int
    .droplevel(0)  # Clean up index levels and column names
    .reset_index()
    .rename(columns={'index': 'Name', 0: 'Signal'})
)
print(s)
print(s)
s:
Name Signal
0 CRUDE_OIL 1
1 GOLD 0
2 SILVER 1
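Why this works: after tail(1) there is exactly one row, so each commodity group in the stacked Series holds just two values, the price followed by its SMA200. diff() therefore subtracts the price from the SMA200 value, and a negative difference (price above its 200-day average) becomes 1 after the comparison and the cast to int.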

To get the output per day, you can do:
# to make the connection between 'equity' and 'equity_sma200':
df.columns = df.columns.str.split("_", expand=True)
# move the dates into the index so that only prices remain in the table:
df.set_index([("Date",)], append=False, inplace=True)
# you might not need casting
df = df.T.astype(float)
# since there are only 2 rows per day per equity, we can negate the SMA and check the aggregated sum:
mask = df.index.get_level_values(1) == "SMA200"
df.loc[mask] = -df.loc[mask]
df = df.groupby(level=0)[df.columns].sum().gt(0)
# move the columns back to a human-readable format:
df.columns = map(lambda x: x[0], df.columns)
Output for the sample data:
2021-04-26 2021-04-27 ... 2021-04-29 2021-04-30
CRUDE True True ... True True
GOLD False False ... False False
SILVER True True ... True True

Related

Python: Getting "list index out of range" error; I know why but don't know how to resolve this

I am currently working on a data science project. The idea is to clean the data from "glassdoor_jobs.csv" and present it in a much more understandable manner.
import pandas as pd
df = pd.read_csv('glassdoor_jobs.csv')
#salary parsing
#Removing "-1" Ratings
#Clean up "Founded"
#state field
#Parse out job description
df['hourly'] = df['Salary Estimate'].apply(lambda x: 1 if 'per hour' in x.lower() else 0)
df['employer_provided'] = df['Salary Estimate'].apply(lambda x: 1 if 'employer provided salary' in x.lower() else 0)
df = df[df['Salary Estimate'] != '-1']
Salary = df['Salary Estimate'].apply(lambda x: x.split('(')[0])
minus_Kd = Salary.apply(lambda x: x.replace('K', '').replace('$',''))
minus_hr = minus_Kd.apply(lambda x: x.lower().replace('per hour', '').replace('employer provided salary:', ''))
df['min_salary'] = minus_hr.apply(lambda x: int(x.split('-')[0]))
df['max_salary'] = minus_hr.apply(lambda x: int(x.split('-')[1]))
I am getting the error on that last line. After digging a bit, I found out that in minus_hr, some of the 'Salary Estimate' values only have one number instead of a range:
index  Salary Estimate
0      150
1      58
2      130
3      125-150
4      110-140
5      200
6      67- 77
And so on. Now I'm trying to figure out how to work around the "list index out of range", and make max_salary the same as the min_salary for the cells with only one value.
I am also trying to get the average of the min and max salary, and if the cell only has a single value, make that value the average.
So in the end, something like index 0 would look like:
index  min  max  average
0      150  150  150
You'll have to add in a conditional statement somewhere.
df['min_salary'] = minus_hr.apply(lambda x: int(x.split('-')[0]) if '-' in x else int(x))
The above might do it, or you can define a function.
def max_salary(cell_value):
    if '-' in cell_value:
        max_salary = int(cell_value.split('-')[1])
    else:
        max_salary = int(cell_value)
    return max_salary

df['max_salary'] = minus_hr.apply(lambda x: max_salary(x))

def avg_salary(cell_value):
    if '-' in cell_value:
        salaries = [int(s) for s in cell_value.split('-')]
        avg = sum(salaries) / len(salaries)
    else:
        avg = int(cell_value)
    return avg

df['avg_salary'] = minus_hr.apply(lambda x: avg_salary(x))
Swap in min_salary and repeat
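For completeness, the min_salary counterpart follows the same pattern (a small sketch):

def min_salary(cell_value):
    # take the number before the dash, or the single number if there is no range
    if '-' in cell_value:
        return int(cell_value.split('-')[0])
    return int(cell_value)

df['min_salary'] = minus_hr.apply(min_salary)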
Test the length of x.split('-') before accessing the elements.
def min_max(x):
    salaries = x.split('-')
    if len(salaries) == 1:
        # only one salary number is given, so assign the same value to min and max
        return int(salaries[0]), int(salaries[0])
    else:
        # two salary numbers are given
        return int(salaries[0]), int(salaries[1])

df['min_salary'], df['max_salary'] = zip(*minus_hr.apply(min_max))
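With both columns in place, the requested average can then be computed directly from them (a one-line sketch):

# average of min and max; equals the single value when both are the same
df['avg_salary'] = (df['min_salary'] + df['max_salary']) / 2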
If you want to avoid .apply()...
Try:
import numpy as np
# extract the two numbers (if there are two numbers) from the 'Salary Estimate' column
sals = df['Salary Estimate'].str.extractall(r'(?P<min_salary>\d+)[^0-9]*(?P<max_salary>\d*)?')
# reset the new frame's index
sals = sals.reset_index()
# join the extracted min/max salary columns to the original dataframe, turning blank matches into nan
df = df.join(sals[['min_salary', 'max_salary']].replace('', np.nan))
# fill any nan values in the 'max_salary' column with values from the 'min_salary' column
df['max_salary'] = df['max_salary'].fillna(df['min_salary'])
# set the type of the columns to int
df['min_salary'] = df['min_salary'].astype(int)
df['max_salary'] = df['max_salary'].astype(int)
# calculate the average
df['average_salary'] = df.loc[:,['min_salary', 'max_salary']].mean(axis=1).astype(int)
# see what you've got
print(df)
Or without using regex:
import numpy as np
# split the 'Salary Estimate' column on '-' (one or two numbers per row)
df['sals'] = df['Salary Estimate'].str.split('-')
# expand the list in sals to two columns, filling with nan
df[['min_salary', 'max_salary']] = pd.DataFrame(df.sals.tolist()).fillna(np.nan)
# delete the sals column
del df['sals']
# fill any nan values in the 'max_salary' column with values from the 'min_salary' column
df['max_salary'] = df['max_salary'].fillna(df['min_salary'])
# set the type of the columns to int
df['min_salary'] = df['min_salary'].astype(int)
df['max_salary'] = df['max_salary'].astype(int)
# calculate the average
df['average_salary'] = df.loc[:, ['min_salary', 'max_salary']].mean(axis=1).astype(int)
# see what you've got
print(df)
Output:
Salary Estimate min_salary max_salary average_salary
0 150 150 150 150
1 58 58 58 58
2 130 130 130 130
3 125-150 125 150 137
4 110-140 110 140 125
5 200 200 200 200
6 67- 77 67 77 72

How to append value to a cell in pandas dataframe

I have a dataframe where I am creating a new column and populating its value. Based on the condition, the new column needs to have some values appended to it if that row is encountered again.
So for example for a given dataframe:
df
id Stores is_open
1 'Walmart', 'Target' true
2 'Best Buy' false
3 'Target' true
4 'Home Depot' true
Now I want to add a new column, Ticker, that can be a comma-separated string of tickers or a list (whichever is preferable and easier; no preference on my end) for the given comma-separated stores.
So, for example, the ticker of Walmart is wmt and the ticker of Target is tgt. I am getting the wmt and tgt data from another dataframe based on a matching key, so I tried the code below, but not all rows get assigned even though they have values, and only a single value followed by a comma is assigned to the Tickers column instead of multiple:
df['Tickers'] = ''
for _, row in df.iterrows():
    stores = row['Stores']
    list_stores = stores.split(',')
    if len(list_stores) > 1:
        for store in list_stores:
            tmp_df = second_df[second_df['store_id'] == store]
            ticker = tmp_df['Ticker'].values[0] if len(tmp_df['Ticker'].values) > 0 else None
            if ticker:
                df.loc[
                    df['Stores'].astype(str).str.contains(store), 'Ticker'] += '{},'.format(ticker)
Expected output:
id Stores is_open Ticker
1 'Walmart', 'Target' true wmt, tgt
2 'Best Buy' false bby
3 'Target' true tgt
4 'Home Depot' true nan
I would really appreciate it if someone could help me out here.
You can use the apply method with axis=1 to pass the row and perform your calculations. See the code below:
import pandas as pd
mydict = {'id':[1,2],'Store':["'Walmart','Target'","'Best Buy'"], 'is_open':['true', 'false']}
df = pd.DataFrame(mydict, index=[0,1])
df.set_index('id',drop=True, inplace=True)
The df so far:
Store is_open
id
1 'Walmart','Target' true
2 'Best Buy' false
The lookup dataframe:
df2 = pd.DataFrame({'Store':['Walmart', 'Target','Best Buy'], 'Ticker':['wmt','tgt','bby']})
Store Ticker
0 Walmart wmt
1 Target tgt
2 Best Buy bby
Here is the code for adding the column:
def add_column(row):
    items = row['Store'].split(',')
    tkr_list = []
    for string in items:
        mystr = string.replace("'", "")
        tkr = df2.loc[df2['Store'] == mystr, 'Ticker'].values[0]
        tkr_list.append(tkr)
    return tkr_list

df['Ticker'] = df.apply(add_column, axis=1)
and this is the result for df:
Store is_open Ticker
id
1 'Walmart','Target' true [wmt, tgt]
2 'Best Buy' false [bby]
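If a comma-separated string is preferred over a list, the list column can be joined afterwards; a small sketch:

# turn each list of tickers into a single comma-separated string
df['Ticker'] = df['Ticker'].str.join(', ')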

Python compare and count row values

I'd like to compare two columns row by row and count when a specific value in each row is not correct. For instance:
group landing_page
control new_page
control old_page
treatment new_page
treatment old_page
control old_page
I'd like to count the number of times treatment is not equal to new_page or control is not equal to old_page. It could be the opposite too I guess, aka treatment is equal to new_page.
Use pandas groupby to find the counts of group/landing page pairs.
Use groupby again to find the group counts.
To find the count of other landing pages within each group, subtract each
landing page count from the group count.
df = pd.DataFrame({'group': ['control', 'control', 'treatment',
'treatment', 'control'],
'landing_page': ['new_page', 'old_page', 'new_page',
'old_page', 'old_page']})
# find counts per pairing
df_out = df.groupby(['group', 'landing_page'])['landing_page'].count().to_frame() \
.rename(columns={'landing_page': 'count'}).reset_index()
# find totals for groups
df_out['grp_total'] = df_out.groupby('group')['count'].transform('sum')
# find count not equal to landing page
df_out['inverse_count'] = df_out['grp_total'] - df_out['count']
print(df_out)
group landing_page count grp_total inverse_count
0 control new_page 1 3 2
1 control old_page 2 3 1
2 treatment new_page 1 2 1
3 treatment old_page 1 2 1
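If a single mismatch total is wanted rather than the per-pair table, it can be read off df_out; a sketch that hard-codes the expected group/page pairing:

# rows where the landing page does not match its group's expected page
mismatch = (
    ((df_out['group'] == 'treatment') & (df_out['landing_page'] != 'new_page')) |
    ((df_out['group'] == 'control') & (df_out['landing_page'] != 'old_page'))
)
print(df_out.loc[mismatch, 'count'].sum())  # 2 for the sample data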
This sounds like a job for the zip() function.
First, setup your inputs and counters:
group = ["control", "control", "treatment", "treatment", "control"]
landingPage = ["new_page", "old_page", "new_page", "old_page", "old_page"]
treatmentNotNew = 0
controlNotOld = 0
Then zip the two inputs you are comparing into an iterator of tuples:
zipped = zip(group, landingPage)
Now you can iterate over the tuple values a (group) and b (landing page) while counting each time that treatment != new_page and control != old_page:
for a, b in zipped:
    if (a == "treatment") and (not b == "new_page"):
        treatmentNotNew += 1
    if (a == "control") and (not b == "old_page"):
        controlNotOld += 1
Finally, print your result!
print("treatmentNotNew = " + str(treatmentNotNew))
print("controlNotOld = " + str(controlNotOld))
>> treatmentNotNew = 1
>> controlNotOld = 1
I would create a new column with map that maps your desired output given the input. Then you can easily test if the new mapping column equals the landing_page column.
df = pd.DataFrame({
'group': ['control', 'control', 'treatment', 'treatment', 'control'],
'landing_page': ['old_page', 'old_page', 'new_page', 'old_page', 'new_page']
})
df['mapping'] = df.group.map({'control': 'old_page', 'treatment': 'new_page'})
(df['landing_page'] != df['mapping']).sum()
# 2
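If separate counts per group are needed, the same mismatch mask can be grouped (a sketch building on the mapping column above):

# number of rows per group whose landing page does not match the mapping
mismatch = df['landing_page'] != df['mapping']
print(mismatch.groupby(df['group']).sum())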

iterations over list in dataframe

I have the following issue:
I have a dataframe with 3 columns :
The first is userID, the second is invoiceType and the third the time of creation of the invoice.
df = pd.read_csv('invoice.csv')
Output: UserID InvoiceType CreateTime
1 a 2018-01-01 12:31:00
2 b 2018-01-01 12:34:12
3 a 2018-01-01 12:40:13
1 c 2018-01-09 14:12:25
2 a 2018-01-12 14:12:29
1 b 2018-02-08 11:15:00
2 c 2018-02-12 10:12:12
I am trying to plot the invoice cycle for each user. I need to create 2 new columns, time_diff and time_diff_wrt_first_invoice. time_diff will represent the time difference between consecutive invoices for each user, and time_diff_wrt_first_invoice will represent the time difference between each invoice and the first invoice, which will be interesting for plotting purposes. This is my code:
"""
********** Exploding a variable that is a list in each dataframe cell
"""
def explode_list(df,x):
return (df[x].apply(pd.Series)
.stack()
.reset_index(level = 1, drop=True)
.to_frame(x))
"""
****** applying explode_list to all the columns ******
"""
def explode_listDF(df):
exploaded_df = pd.DataFrame()
for x in df.columns.tolist():
exploaded_df = pd.concat([exploaded_df, explode_list(df,x)],
axis = 1)
return exploaded_df
"""
******** Getting the time difference column in pivot table format
"""
def pivoted_diffTime(df1, _freq=60):
# _ freq is 1 for minutes frequency
# _freq is 60 for hour frequency
# _ freq is 60*24 for daily frequency
# _freq is 60*24*30 for monthly frequency
df = df.sort_values(['UserID', 'CreateTime'])
df_pivot = df.pivot_table(index = 'UserID',
aggfunc= lambda x : list(v for v in x)
)
df_pivot['time_diff'] = [[0]]*len(df_pivot)
for user in df_pivot.index:
try:
_list = [0]+[math.floor((x - y).total_seconds()/(60*_freq))
for x,y in zip(df_pivot.loc[user, 'CreateTime'][1:],
df_pivot.loc[user, 'CreateTime'][:-1])]
df_pivot.loc[user, 'time_diff'] = _list
except:
print('There is a prob here :', user)
return df_pivot
"""
***** Pipelining the two functions to obtain an exploaded dataframe
with time difference ******
"""
def get_timeDiff(df, _frequency):
df = explode_listDF(pivoted_diffTime(df, _freq=_frequency))
return df
And once I have time_diff, I am creating time_diff_wrt_first_invoice this way:
# We initialize this variable
df_with_timeDiff['time_diff_wrt_first_invoice'] = [[0]]*len(df_with_timeDiff)
# Then we loop over users and apply a cumulative sum over time_diff
for user in df_with_timeDiff.UserID.unique():
    df_with_timeDiff.loc[df_with_timeDiff.UserID==user, 'time_diff_wrt_first_invoice'] = np.cumsum(df_with_timeDiff.loc[df_with_timeDiff.UserID==user, 'time_diff'])
The problem is that I have a dataframe with hundreds of thousands of users, and this approach is very time consuming. I am wondering if there is a solution that fits my need better.
Check out .loc[] for pandas.
df_1 = pd.DataFrame(some_stuff)
df_2 = df_1.loc[tickers['column'] >= some-condition, 'specific-column']
you can access specific columns, run a loop to check for certain types of conditions, and if you add a comma after the condition and put in a specific column name it'll only return that column.
I'm not 100% sure if that answers whatever question you're asking, cause I didn't actually see one, but it seemed like you were running a lot of for loops and stuff to isolate columns, which is what .loc[] is for.
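A concrete version of that pattern, with made-up column names rather than the question's data, might look like this:

import pandas as pd

df_1 = pd.DataFrame({'price': [10, 25, 7], 'ticker': ['AAA', 'BBB', 'CCC']})
# boolean condition on one column, returning only another column
df_2 = df_1.loc[df_1['price'] >= 10, 'ticker']
print(df_2)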
I have found a better solution. Here's my code:
def next_diff(x):
    return [0] + [(b - a).total_seconds() / 3600 for b, a in zip(x[1:], x[:-1])]

def create_timediff(df):
    df.sort_values(['UserID', 'CreateTime'], inplace=True)
    a = df.groupby('UserID').agg({'CreateTime': lambda x: list(v for v in x)}).CreateTime.apply(next_diff)
    b = a.apply(np.cumsum)
    a = a.reset_index()
    b = b.reset_index()
    # Here I explode the lists inside the cells
    rows1 = []
    _ = a.apply(lambda row: [rows1.append([row['UserID'], nn])
                             for nn in row.CreateTime], axis=1)
    rows2 = []
    __ = b.apply(lambda row: [rows2.append([row['UserID'], nn])
                              for nn in row.CreateTime], axis=1)
    df1_new = pd.DataFrame(rows1, columns=a.columns).set_index(['UserID'])
    df2_new = pd.DataFrame(rows2, columns=b.columns).set_index(['UserID'])
    df = df.set_index('UserID')
    df['time_diff'] = df1_new['CreateTime']
    df['time_diff_wrt_first_invoice'] = df2_new['CreateTime']
    df.reset_index(inplace=True)
    return df
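A usage sketch, assuming df holds the UserID/InvoiceType/CreateTime columns shown in the question and that CreateTime still needs to be parsed to datetimes:

df['CreateTime'] = pd.to_datetime(df['CreateTime'])
df = create_timediff(df)
print(df[['UserID', 'CreateTime', 'time_diff', 'time_diff_wrt_first_invoice']])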

How to store values from loop to a dataframe?

I have created a loop that generates some values. I want to store those values in a data frame. For example, after completing one loop, append the results as the first row.
def calculate(allFiles):
    result = pd.DataFrame(columns=['Date', 'Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'])
    total_Mid_Ebb = 0
    total_Mid_Flood = 0
    total_Mid_EbbControl = 0
    total_Mid_FloodControl = 0
    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"]  # filter
        Mid_Ebb_control = df[df['Station'].isin(['C1', 'C2', 'C3'])]  # filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"]  # filter
        Mid_Flood_control = df[df['Station'].isin(['C1', 'C2', 'C3', 'SR2'])]  # filter control
        total_Mid_Ebb += Mid_Ebb.Station.nunique()  # count unique stations = sample number
        total_Mid_Flood += Mid_Flood.Station.nunique()
        total_Mid_EbbControl += Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl += Mid_Flood_control.Station.nunique()

    Mid_Ebb_withoutControl = total_Mid_Ebb - total_Mid_EbbControl
    Mid_Flood_withoutControl = total_Mid_Flood - total_Mid_FloodControl

    print('Ebb Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Ebb, Mid_Ebb_withoutControl, total_Mid_EbbControl))
    print('Flood Tide: The total number of sample is {}. Number of sample without control station is {}. Number of sample in control station is {}'.format(total_Mid_Flood, Mid_Flood_withoutControl, total_Mid_FloodControl))
The dataframe result contains 4 columns. The date is fixed. I would like to put total_Mid_Ebb, Mid_Ebb_withoutControl, and total_Mid_EbbControl into the dataframe.
I believe you need to append the scalars in the loop to a list of tuples and then use the DataFrame constructor. Finally, compute the differences in the result DataFrame:
def calculate(allFiles):
    data = []
    for file_ in allFiles:
        xls = pd.ExcelFile(file_)
        df = xls.parse('General Impact')
        Mid_Ebb = df[df['Tidal Mode'] == "Mid-Ebb"]  # filter
        Mid_Ebb_control = df[df['Station'].isin(['C1', 'C2', 'C3'])]  # filter control
        Mid_Flood = df[df['Tidal Mode'] == "Mid-Flood"]  # filter
        Mid_Flood_control = df[df['Station'].isin(['C1', 'C2', 'C3', 'SR2'])]  # filter control
        total_Mid_Ebb = Mid_Ebb.Station.nunique()  # count unique stations = sample number
        total_Mid_Flood = Mid_Flood.Station.nunique()
        total_Mid_EbbControl = Mid_Ebb_control.Station.nunique()
        total_Mid_FloodControl = Mid_Flood_control.Station.nunique()

        data.append((total_Mid_Ebb,
                     total_Mid_Flood,
                     total_Mid_EbbControl,
                     total_Mid_FloodControl))

    cols = ['total_Mid_Ebb', 'total_Mid_Flood', 'total_Mid_EbbControl', 'total_Mid_FloodControl']
    result = pd.DataFrame(data, columns=cols)
    result['Mid_Ebb_withoutControl'] = result.total_Mid_Ebb - result.total_Mid_EbbControl
    result['Mid_Flood_withoutControl'] = result.total_Mid_Flood - result.total_Mid_FloodControl

    # if you want to check all totals
    total = result.sum()
    print(total)

    return result
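A call could then look like this (the glob pattern is only a placeholder for however allFiles is built):

import glob

allFiles = glob.glob('surveys/*.xlsx')  # hypothetical location of the Excel files
result = calculate(allFiles)
print(result)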
Here is an example of loading data per column into a dataframe after each iteration of a loop. While this is not THE only method, it's one that helps you understand the concept better.
Necessary imports
import pandas as pd
from random import randint
First define an empty data-frame of 5 columns to match your problem
df = pd.DataFrame(columns=['A','B','C','D','E'])
Next we iterate through a for loop, generate values using randint(), and add one value at a time to each column, starting with 'A' all the way to 'E':
for i in range(5):  # add 5 rows of data
    df.loc[i, ['A']] = randint(0, 99)
    df.loc[i, ['B']] = randint(0, 99)
    df.loc[i, ['C']] = randint(0, 99)
    df.loc[i, ['D']] = randint(0, 99)
    df.loc[i, ['E']] = randint(0, 99)
We get a DF whose 5 rows are populated.
>>> df
A B C D E
0 4 74 71 37 90
1 41 80 77 81 8
2 14 16 82 98 89
3 1 77 3 56 91
4 34 9 85 44 19
Hope the above helps and you are able to tailor it to your needs.
Note this does not produce a row per file as requested; it is more of a comment about the general use of pandas for problems like this - it is often easier to read all the data and then process it with pandas than to write your own loops over different cases.
I think you are not using pandas in the idiomatic way here. I think you will save a lot of code and get a more understandable result if you do it this way:
controlstations = ['C1', 'C2', 'C3', 'SR2']
df = pd.concat(pd.read_excel(file_, sheet_name='General Impact') for file_ in files)
df['Control'] = df.Station.isin(controlstations)
counts = df.groupby(['Control', 'Tidal Mode']).Station.agg('nunique')
So here you are reading all the excel files into a single dataframe first, then adding a column to indicate if that is a control station or not, then using groupby to count the different combinations.
counts is a series with a two-dimensional index (for some made up data):
Control Tidal Mode
False Mid-Ebb 2
Mid-Flood 2
True Mid-Ebb 2
Mid-Flood 2
You can access the values you have in your function like this:
total_Mid_Ebb = counts.xs('Mid-Ebb', level='Tidal Mode').sum()
total_Mid_Ebb_Control = counts[True, 'Mid-Ebb']
total_Mid_Flood = counts.xs('Mid-Flood', level='Tidal Mode').sum()
total_Mid_Flood_Control = counts[True, 'Mid-Flood']
After which you can easily add them to a DataFrame:
import datetime
today = datetime.datetime.today()
totals = [total_Mid_Ebb, total_Mid_Flood, total_Mid_Ebb_Control, total_Mid_Flood_Control]
result = pd.DataFrame(data=[totals], columns=['Mid Ebb Total', 'Mid Flood Total', 'Mid Ebb Control', 'Mid Flood Control'],
index=[today])
