How can I sum a column given by a specific index? - python

How can I sum a column given by its index? I tried to use 'for row in list', but it results in a TypeError.
[Images: function, index, sample]

Your approach is right. However, you cannot sum the names if you initialize column_sum to 0 (an integer), because adding a string to an int raises a TypeError.
Summing the names does not seem meaningful anyway. But if you want get_total to work on any column, first check whether the values in that column are integers. If they are not, return 0, for instance.
def get_total(index, stations):
    if not stations:
        # The list is empty
        return 0
    if type(stations[0][index]) is int:
        # The sum is possible
        return sum(station[index] for station in stations)
    return 0
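A minimal sketch of the 'for row in list' loop described in the question, assuming each station is a list whose column 0 holds the name. The sample rows and the name get_total_loop are hypothetical. It shows why an integer starting value works for numeric columns but fails on the name column.

```python
def get_total_loop(index, stations):
    """Sum column `index` across all rows with an explicit loop."""
    column_sum = 0  # int start value: only works for numeric columns
    for row in stations:
        column_sum += row[index]
    return column_sum


# Hypothetical sample data: [name, platform count, daily riders]
stations = [
    ["Union", 4, 1200],
    ["King", 2, 850],
    ["Queen", 3, 900],
]

print(get_total_loop(2, stations))  # 2950

try:
    # The first step is 0 + "Union", which Python rejects
    get_total_loop(0, stations)
except TypeError as exc:
    print("TypeError:", exc)
```

Checking the column type up front, as get_total does, avoids the exception entirely instead of catching it.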

Related

How do you sum a dataframe based off a grouping in Python pandas?

I have a for loop intended to check for values greater than zero.
The problem is that I only want each iteration to check the sum for one group of IDs.
The grouping is based on matching the first 8 characters of the ID string.
I do the grouping before the loop, but the loop still appears to search the entire df instead of each group.
LeftGroup = newDF.groupby('ID_Left_8')
for g in LeftGroup.groups:
    if sum(newDF['Hours_Calc'] > 0):
        print(g)
Is there a way to restrict that sum to each group of IDs sharing the same leftmost 8 characters?
I was expecting the .groups attribute to accomplish this, but the loop still seems to check every single ID.
Thank you.
def filter_and_sum(group):
    return sum(group[group['Hours_Calc'] > 0]['Hours_Calc'])

LeftGroup = newDF.groupby('ID_Left_8')
results = LeftGroup.apply(filter_and_sum)
print(results)
This computes, for each group, the sum of the Hours_Calc values that are greater than 0. The resulting Series is indexed by the leftmost 8 characters (ID_Left_8), with each group's filtered Hours_Calc sum as the value.
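The loop in the question tests newDF['Hours_Calc'] on every pass, so each iteration sees all rows rather than the current group. Below is a self-contained sketch, assuming a hypothetical ID column and sample values, that shows a vectorized per-group sum of positive hours and a loop over (name, group) pairs so each check only sees that group's rows.

```python
import pandas as pd

# Hypothetical data: the 'ID' column name and its values are assumptions.
newDF = pd.DataFrame({
    "ID": ["ABCDEFGH-001", "ABCDEFGH-002", "IJKLMNOP-001",
           "IJKLMNOP-002", "QRSTUVWX-001"],
    "Hours_Calc": [2.0, -1.0, 0.0, 0.0, 3.5],
})
newDF["ID_Left_8"] = newDF["ID"].str[:8]

# Vectorized: replace non-positive hours with 0, then sum per prefix group.
positive = newDF["Hours_Calc"].where(newDF["Hours_Calc"] > 0, 0)
positive_sums = positive.groupby(newDF["ID_Left_8"]).sum()
print(positive_sums)

# Loop version: iterating the GroupBy yields (key, sub-DataFrame) pairs,
# so the condition is evaluated on the current group only.
groups_with_hours = []
for name, group in newDF.groupby("ID_Left_8"):
    if (group["Hours_Calc"] > 0).any():
        groups_with_hours.append(name)
print(groups_with_hours)
```

Iterating LeftGroup.groups only yields the group keys, which is why the original body needs to select the rows itself; iterating the GroupBy object directly hands you those rows.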

I want to compute the mean (by row) of different sets of dataframe columns and store them in a new dataframe

To do so, I have a list of lists (my clusters), for example:
asset_clusts=[[0,1],[3,5],[2,4, 12],...]
and the original dataframe (called x in my code) holds the return time series of S&P 500 companies.
I want to select columns [0, 1] of the original dataframe, compute their mean by row, and store it in a new dataframe; then compute the mean of columns [3, 5] and add it to the new dataframe, and so on.
mu = pd.DataFrame()
for j in range(get_number_of_elements(asset_clusts)):
    mu = x.iloc[:, asset_clusts[j]].mean(axis=1)
But it gives me only one column, and when I checked, that column is the mean of the last cluster's columns.
In case of ambiguity, get_number_of_elements is defined as:
def get_number_of_elements(clist):
    count = 0
    for element in clist:
        count += 1
    return count
I solved it. In case it helps others, here is the final function:
import pandas as pd

def clustered_series(x, org_asset_clust):
    """
    x: return data
    org_asset_clust: list of clusters
    ----> mean of each cluster returns by row
    """
    def get_number_of_elements(org_asset_clust):
        count = 0
        for element in org_asset_clust:
            count += 1
        return count
    mu = []
    for j in range(get_number_of_elements(org_asset_clust)):
        # Collect one row-wise mean Series per cluster instead of overwriting mu
        mu.append(x.iloc[:, org_asset_clust[j]].mean(axis=1))
    # Join the per-cluster Series side by side, one column per cluster
    cluster_mean = pd.concat(mu, axis=1)
    return cluster_mean
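The loop in the question kept only the last cluster because mu is reassigned on every iteration. Below is a more compact sketch of the same idea, assuming x is a numeric DataFrame and each cluster lists positional column indices. The function name clustered_means, the cluster_0, cluster_1, ... column labels, and the toy data are illustrative choices, not part of the original.

```python
import pandas as pd


def clustered_means(x, clusters):
    """Row-wise mean of each cluster of positional columns, one output column per cluster."""
    return pd.DataFrame(
        {f"cluster_{k}": x.iloc[:, cols].mean(axis=1) for k, cols in enumerate(clusters)}
    )


# Hypothetical toy returns: 3 dates x 4 assets
x = pd.DataFrame(
    [[0.01, 0.03, 0.00, 0.02],
     [0.02, 0.00, 0.04, 0.02],
     [-0.01, 0.01, 0.02, 0.00]],
    columns=["A", "B", "C", "D"],
)
means = clustered_means(x, [[0, 1], [2, 3]])
print(means)
```

len(clusters) already gives the number of clusters, so a counting helper is not needed; enumerate supplies both the position and the cluster in one pass.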

How to return the n-largest value in a pandas dataframe

I would like to return the second largest value within a dataframe column.
When I use .nlargest(n) with n = 1, I can get the single highest value by converting the result with float(), but when n is greater than 1 it returns both the highest and the second-highest values, as shown below. I want only the second-highest value stored in my variable.
n = 2
largest_ja = narr_df.nlargest(n, columns='Rate_Variance')['Rate_Variance'].to_string(index=False)
The result for n = 2 is shown below. I cannot call float(largest_ja) because it contains two values, not one:
2546 46363.899240
9109 9299.873859
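You can take the idxmin() of the n largest results. Note that idxmin() returns the index label of the second-largest entry, not the value itself; to get the value, take min() of the n largest instead:
df.apply(lambda x: x.nlargest(2).idxmin())  # index label of the 2nd largest in each column
narr_df['Rate_Variance'].nlargest(2).min()  # the 2nd largest value itself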
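A self-contained sketch for a single column, assuming hypothetical values that loosely mirror the output in the question. It shows the nth largest value as a plain float, the index label idxmin() returns, and how ties affect nlargest.

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
narr_df = pd.DataFrame({"Rate_Variance": [120.5, 46363.899240, 9299.873859, 15.0]})

n = 2
top_n = narr_df["Rate_Variance"].nlargest(n)  # sorted descending

nth_value = float(top_n.iloc[-1])  # last of the n largest, as a plain float
nth_label = top_n.idxmin()         # row label of that value

print(nth_value, nth_label)

# nlargest counts repeated values separately; drop duplicates first
# if you want the second-largest *distinct* value.
tied = pd.Series([5.0, 5.0, 3.0])
tie_nth = tied.nlargest(2).iloc[-1]
tie_distinct_nth = tied.drop_duplicates().nlargest(2).iloc[-1]
print(tie_nth, tie_distinct_nth)
```

Selecting a single element with iloc avoids to_string() altogether, so float() receives one number instead of a formatted table.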

First Transition Value of DataFrame Column without Temporary Variables

I am trying to find the first transition value of a dataframe column as efficiently as possible. I would prefer not to have temporary variables. Say I have a dataframe (df) with a column of:
Column1
0
0
0
-1
1
In this case, the value I'm looking for is -1, the first value that differs from the initial one. I want to use it in an if statement that checks whether the column first transitions to 1 or to -1. The pseudocode is:
if first_transition_value == 1:
    pass  # Something
elif first_transition_value == -1:
    pass  # Something else
General case
You can compare each value in the column with the first one, keep only the values that differ, and take the first of these:
df[df.Column1 != df.Column1.iloc[0]].Column1.values[0]
Special case
If the column always starts at 0, you can compare against 0 directly:
df[df.Column1 != 0].Column1.values[0]
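Both expressions raise an IndexError when the column never changes, because the filtered selection is empty. A small sketch that wraps the General case comparison in a helper and plugs it into the if/elif from the question. The helper name first_transition, returning None for "no transition", and the direction labels are illustrative assumptions; note that it does use a local variable, which the asker hoped to avoid.

```python
import pandas as pd


def first_transition(series):
    """Return the first value that differs from series.iloc[0], or None if none does."""
    changed = series[series != series.iloc[0]]
    return changed.iloc[0] if not changed.empty else None


df = pd.DataFrame({"Column1": [0, 0, 0, -1, 1]})
value = first_transition(df.Column1)

if value == 1:
    direction = "up"
elif value == -1:
    direction = "down"
else:
    direction = "no transition"  # value is None: the column never changes
print(direction)
```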

Counting the values in a function with Python

I have the following function:
def sum_NE(data, i, col='VALUES'):
    return data.iloc[get_NE(i, len(data))][col].sum()
This works great, but I'd like to do one more thing. The VALUES column contains zeros and values greater than zero. How do I count the values greater than zero that are used when evaluating sum()?
get_NE returns a list. I tried the code below, but it doesn't work: count() counts every non-null value, zeros included.
def sum_NE(data, i, col='VALUES'):
    return data.iloc[get_NE(i, len(data))][col].count()
get_NE returns a list of row positions, e.g. [5, 6, 8, 12]. These select rows of the data dataframe, and [col] picks out their values in the VALUES column. Those values are then summed. Now I want to find out how many of the summed values are greater than zero.
I found a solution:
def sum_NE(data, i, col='VALUES'):
    # Count the selected values that are greater than zero
    return sum(1 for value in data.iloc[get_NE(i, len(data))][col] if float(value) > 0)
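A vectorized sketch that returns the sum and the count of positive values together. The real get_NE is not shown in the question, so the get_NE below is a hypothetical stand-in that returns the row positions around i; the name sum_and_count_NE and the sample data are also assumptions.

```python
import pandas as pd


def get_NE(i, n):
    """Hypothetical stand-in for the question's get_NE: neighbouring row positions."""
    return [p for p in (i - 1, i, i + 1) if 0 <= p < n]


def sum_and_count_NE(data, i, col="VALUES"):
    """Sum the selected values and count how many of them are greater than zero."""
    values = data.iloc[get_NE(i, len(data))][col]
    # (values > 0) is a boolean Series; summing it counts the True entries
    return values.sum(), int((values > 0).sum())


data = pd.DataFrame({"VALUES": [0, 3, 0, 5, 2]})
total, positives = sum_and_count_NE(data, 2)
print(total, positives)
```

Summing a boolean mask keeps the work inside pandas instead of looping in Python, and the selection is computed once for both results.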
