I have several dataframes, one per quarter, spanning from 2015/09 to 2021/12, and each dataframe looks like this:
address balance
0 0xb794f5ea0 7504999.894348815
1 0xab7c74abc 1000000.004137971
2 0xdec042a90 5461102.0
3 0xd24400ae8 352884.012859933
4 0x2910543af 217051.397233717
5 0xcafb10ee6 211851.993504593
6 0x167a9333b 164052.961890484
7 0x32be343b9 113179.682105883
8 0xfbb1b73c4 69408.795824975
9 0x7180eb39a 3012.654675749
10 0x0a869d79a 85.171503551
11 0x61edcdf5b 5.0
12 0x2903cadbe 0.985099383
13 0xdd51f01d9 0.002366924
.. .... ....
I want to calculate, let's say, the sum of all balances per date and consolidate the results into one dataframe like this:
date balance_sum
2015-09-30 xx
2015-12-31 xx
2016-03-31 xx
...
2021-12-31 xx
Is there a way to do this operation? Thanks a lot in advance!
Here I'm running a for loop over all the dataframes, appending each sum to a list, then creating a dictionary with the dates as keys and the sums as values, and finally converting the dict to a DataFrame.
balance_sum = []
for i in [exchanges_2015_09, exchanges_2015_12, ...]:
    balance_sum.append(i['balance'].sum())  # each frame's column is named 'balance'
data = dict(zip(pd.date_range('2015-09-30', periods=26, freq='Q'), balance_sum))
df = pd.DataFrame(list(data.items()), columns=['date', 'balance_sum'])
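A slightly tidier sketch of the same approach, assuming the quarterly frames are collected into a list in chronological order (the exchanges_* names come from your snippet; everything else here is illustrative):
import pandas as pd

frames = [exchanges_2015_09, exchanges_2015_12]  # ... through exchanges_2021_12
df = pd.DataFrame({
    'date': pd.date_range('2015-09-30', periods=len(frames), freq='Q'),
    'balance_sum': [f['balance'].sum() for f in frames],
})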
I'm trying something I've never done before and I'm in need of some help.
Basically, I need to filter sections of a pandas dataframe, transpose each filtered section, and then concatenate the resulting sections together.
Here's a representation of my dataframe:
df:
id | text_field | text_value
1  | Date       | 2021-06-23
1  | Hour       | 10:50
2  | Position   | City
2  | Position   | Countryside
3  | Date       | 2021-06-22
3  | Hour       | 10:45
I can then use some filtering method to isolate parts of my data:
df.groupby('id').filter(lambda x: True)
test = df.query(' id == 1 ')
test = test[["text_field","text_value"]]
test_t = test.set_index("text_field").T
test_t:
text_field | Date | Hour
text_value | 2021-06-23 | 10:50
If I repeat the process looking for the rows with id == 3 and then concatenate the result with test_t, I'll have the following:
text_field | Date | Hour
text_value | 2021-06-23 | 10:50
text_value | 2021-06-22 | 10:45
I'm aware that performing this with the rows where id == 2 will give me other columns, and that's alright too; it's what I want as well.
What I can't figure out is how to do this for every "id" in my dataframe. I wasn't able to create a function or for loop that works. Can somebody help me?
To summarize:
1 - I need to separate my dataframe into sections according to the values in the "id" column
2 - After that I need to remove the "id" column and transpose each result
3 - I need to concatenate every resulting dataframe into one big dataframe
You can use pivot_table:
df.pivot_table(
    index='id', columns='text_field', values='text_value', aggfunc='first')
Output:
text_field Date Hour Position
id
1 2021-06-23 10:50 NaN
2 NaN NaN City
3 2021-06-22 10:45 NaN
It's not exactly clear how you want to deal with repeating values, though; it would be great to have some description of that (id=2 would make a good example).
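(A side note: if you'd rather keep every repeated value instead of only the first, aggfunc=list also works, e.g. df.pivot_table(index='id', columns='text_field', values='text_value', aggfunc=list), at the cost of list-valued cells.)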
Update: If you want to ignore the ids and simply concatenate all the values:
pd.DataFrame(df.groupby('text_field')['text_value'].apply(list).to_dict())
Output:
Date Hour Position
0 2021-06-23 10:50 City
1 2021-06-22 10:45 Countryside
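One caveat on the update above: it assumes every text_field occurs the same number of times, as in your sample; with unequal counts the DataFrame constructor raises on mismatched list lengths.
For completeness, here is a literal translation of the three steps from the question, sketched under the assumption that keeping only the first value per (id, text_field) pair is acceptable (repeated fields, like Position for id == 2, would otherwise create duplicate column labels):
# Split by id, transpose each section, then concatenate.
pieces = []
for _, group in df.groupby('id'):
    section = (group.drop(columns='id')
                    .drop_duplicates(subset='text_field')  # avoid duplicate columns
                    .set_index('text_field')
                    .T)
    pieces.append(section)
result = pd.concat(pieces)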
I have two CSV files:
+ File one contains specific data about different customers (like sex, age, name, ...) and
+ File two contains different transactions for each customer
I want to create new columns in File2 containing the customer-specific data from File1
file1.csv
customer_id,sex,age,name
af4wf3,m,12,mike
z20ask,f,15,sam
file2.csv
transaction_id,customer_id,amount
12h2j4hk,af4wf3,123.20
12h2j4h1,af4wf3,5.22
12h2j4h2,z20ask,13.20
12h2j4h3,af4wf3,1.20
12h2j4h4,z20ask,2341.12
12h2j4h5,z20ask,235.96
12h2j4h6,af4wf3,999.30
Load and join the dataframes:
import pandas as pd

df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Index file1 by customer_id so the join key lines up,
# and keep transaction_id as the index for the transactions.
df1.set_index('customer_id', inplace=True)
df2.set_index('transaction_id', inplace=True)

# join is a left join by default: every transaction keeps its row
# and the matching customer data is attached.
output = df2.join(df1, on='customer_id')
output.to_csv('file2_updated.csv')
file2_updated.csv
transaction_id,customer_id,amount,sex,age,name
12h2j4hk,af4wf3,123.2,m,12,mike
12h2j4h1,af4wf3,5.22,m,12,mike
12h2j4h2,z20ask,13.2,f,15,sam
12h2j4h3,af4wf3,1.2,m,12,mike
12h2j4h4,z20ask,2341.12,f,15,sam
12h2j4h5,z20ask,235.96,f,15,sam
12h2j4h6,af4wf3,999.3,m,12,mike
The same as @jc416's answer, but using pd.merge:
file2.merge(file1, on='customer_id')
transaction_id customer_id amount sex age name
0 12h2j4hk af4wf3 123.2 m 12 mike
1 12h2j4h1 af4wf3 5.22 m 12 mike
2 12h2j4h3 af4wf3 1.2 m 12 mike
3 12h2j4h6 af4wf3 999.3 m 12 mike
4 12h2j4h2 z20ask 13.2 f 15 sam
5 12h2j4h4 z20ask 2341.12 f 15 sam
6 12h2j4h5 z20ask 235.96 f 15 sam
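One design note: merge defaults to an inner join, so any transaction whose customer_id is missing from file1 would be dropped silently. If that can happen in your data, a left join keeps those rows (with NaN in the customer columns):
file2.merge(file1, on='customer_id', how='left')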
You should definitely read Pandas Merging 101.
I have a dataset which contains samples at the 1-second level from workout data (heart rate, watts, etc.). The data feed is not perfect and sometimes there are gaps. I need the dataset at 1-second intervals with no missing rows.
Once I resample the data, it looks something like this:
activity_id watts
t
1 12345 5
2 12345 NaN
3 12345 15
6 98765 NaN
7 98765 10
8 98765 12
After the resample I can't get the interpolation to work properly. The problem is that the interpolation runs across the entire dataframe, and I need it to 'reset' for every workout ID within the dataframe. The data should look like this once it's working properly:
activity_id watts
t
1 12345 5
2 12345 10
3 12345 15
6 98765 NaN
7 98765 10
8 98765 12
Here's the snippet of code I have tried. It's not throwing any errors, but it's also not doing the interpolation...
seconds = 1
df = df.groupby(['activity_id']).resample(str(seconds) + 'S').mean().reset_index(level='activity_id', drop=True)
df = df.reset_index(drop=False)
df = df.groupby('activity_id').apply(lambda group: group.interpolate(method='linear'))
This is marked as the correct answer here, but it's not working for me:
Pandas interpolate within a groupby
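In case it helps, here's a minimal sketch of the group-wise fill, assuming the frame has already been resampled to one row per second as above. transform applies the interpolation within each activity_id separately, and limit_area='inside' only fills gaps that have valid values on both sides, so the leading NaN at t=6 stays NaN, matching the desired output:
df['watts'] = (
    df.groupby('activity_id')['watts']
      .transform(lambda s: s.interpolate(method='linear', limit_area='inside'))
)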
I have FIPS codes here: http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt
And a dataset that looks like this:
fips_state fips_county value
1 1 10
1 3 34
1 5 37
1 7 88
1 9 93
How can I get the county name of each row using the data from the link above with pandas?
Simply load both data sets into DataFrames, then set the appropriate index:
df1.set_index(['fips_state', 'fips_county'], inplace=True)
This gives you a MultiIndex by state+county. Once you've done this for both datasets, you can trivially map them, for example:
df1['county_name'] = df2.county_name
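Putting it together, a minimal sketch; the column names for the census file are assumptions on my part, since national_county.txt ships without a header row (its five comma-separated fields are state abbreviation, state FIPS, county FIPS, county name, and class code):
import pandas as pd

# df is your dataset with fips_state, fips_county and value columns.
names = ['state_abbr', 'fips_state', 'fips_county', 'county_name', 'class_fp']
counties = pd.read_csv(
    'http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt',
    names=names, encoding='latin-1')  # some county names are non-ASCII
counties.set_index(['fips_state', 'fips_county'], inplace=True)

df.set_index(['fips_state', 'fips_county'], inplace=True)
df['county_name'] = counties['county_name']  # aligns on the shared MultiIndex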