Multiple columns into one [duplicate] - python

This question already has answers here:
Convert columns into rows with Pandas
(6 answers)
Closed 2 years ago.
I have a large table with multiple columns as input, in the format below:
Col-A Col-B Col-C Col-D Col-E Col-F
001 10 01/01/2020 123456 123123 123321
001 20 01/02/2020 123456 123111
002 10 01/03/2020 111000 111123
And I'd like to write code that outputs one line per value for each row, so that instead of the multiple columns Col-D, Col-E and Col-F I only have Col-D:
Col-A Col-B Col-C Col-D
001 10 01/01/2020 123456
001 10 01/01/2020 123123
001 10 01/01/2020 123321
001 20 01/02/2020 123456
001 20 01/02/2020 123111
002 10 01/03/2020 111000
002 10 01/03/2020 111123
Any ideas will be appreciated,
Thanks,
Nurbek

You can use pd.melt
import pandas as pd

newdf = pd.melt(
    df,
    id_vars=['Col-A', 'Col-B', 'Col-C'],
    value_vars=['Col-D', 'Col-E', 'Col-F']
).dropna()
This will drop 'Col-D', 'Col-E' and 'Col-F', but create two new columns, variable and value. The variable column denotes the column your value came from. To achieve what you ultimately want, you can drop the variable column and rename the value column to Col-D.
newdf = newdf.drop(['variable'], axis=1)
newdf = newdf.rename(columns={"value":"Col-D"})
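For reference, here is the whole pipeline applied to the sample data (a self-contained sketch; the DataFrame literal just reproduces the table from the question):
import pandas as pd

df = pd.DataFrame({
    'Col-A': ['001', '001', '002'],
    'Col-B': [10, 20, 10],
    'Col-C': ['01/01/2020', '01/02/2020', '01/03/2020'],
    'Col-D': [123456, 123456, 111000],
    'Col-E': [123123, 123111, 111123],
    'Col-F': [123321, None, None],
})

newdf = (
    pd.melt(df, id_vars=['Col-A', 'Col-B', 'Col-C'],
            value_vars=['Col-D', 'Col-E', 'Col-F'])
    .dropna()                            # drop the empty Col-E/Col-F cells
    .drop(columns='variable')
    .rename(columns={'value': 'Col-D'})
    .sort_values(['Col-A', 'Col-B'], kind='stable')  # melt orders by source column
    .reset_index(drop=True)
)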

What about something like this:
import pandas as pd

df2 = df[["Col-A", "Col-B", "Col-C", "Col-D"]]
columns = ["Col-E", "Col-F"]  # ..., and any further value columns up to "Col-Z"
for col in columns:
    part = df[["Col-A", "Col-B", "Col-C", col]].rename(columns={col: "Col-D"})
    df2 = pd.concat([df2, part], ignore_index=True)
You just concatenate each extra column, renamed to Col-D, onto your original dataframe; a final df2.dropna() discards the empty cells as in the desired output.

Related

Ways to compute values from multiple dataframes and organize them into one consolidated dataframe?

I have several dataframes spanning 2015/09 - 2021/12, and each dataframe looks like this:
address balance
0 0xb794f5ea0 7504999.894348815
1 0xab7c74abc 1000000.004137971
2 0xdec042a90 5461102.0
3 0xd24400ae8 352884.012859933
4 0x2910543af 217051.397233717
5 0xcafb10ee6 211851.993504593
6 0x167a9333b 164052.961890484
7 0x32be343b9 113179.682105883
8 0xfbb1b73c4 69408.795824975
9 0x7180eb39a 3012.654675749
10 0x0a869d79a 85.171503551
11 0x61edcdf5b 5.0
12 0x2903cadbe 0.985099383
13 0xdd51f01d9 0.002366924
.. .... ....
I want to calculate, let's say, the sum of all balances per date and consolidate the results into one dataframe as
date balance_sum
2015-09-30 xx
2015-12-31 xx
2016-03-31 xx
...
2021-12-31 xx
Is there a way to do this operation? Thanks a lot in advance!
Here I'm running a for loop over all the dataframes, appending each sum to a list, then creating a dictionary with dates as keys and sums as values, and finally converting the dict to a DataFrame.
balance_sum = []
for i in [exchanges_2015_09, exchanges_2015_12]:  # ..., through exchanges_2021_12
    balance_sum.append(i['balance'].sum())  # each dataframe's column is named 'balance'
data = dict(zip(pd.date_range('2015-09-30', periods=26, freq='Q'), balance_sum))
# a dict of scalars can't go straight into pd.DataFrame; build rows from its items
df = pd.DataFrame(list(data.items()), columns=['date', 'balance_sum'])
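Equivalently, you can build the frame in one step. A sketch, assuming the dataframes are collected in a hypothetical dict exchanges_by_quarter mapping each quarter-end Timestamp to its dataframe:
import pandas as pd

# hypothetical: {pd.Timestamp('2015-09-30'): exchanges_2015_09, ...}
rows = [(date, frame['balance'].sum())
        for date, frame in exchanges_by_quarter.items()]
df = pd.DataFrame(rows, columns=['date', 'balance_sum'])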

Filtering, transposing and concatenating with Pandas

I'm trying something I've never done before and I'm in need of some help.
Basically, I need to filter sections of a pandas dataframe, transpose each filtered section, and then concatenate every resulting section together.
Here's a representation of my dataframe:
df:
id | text_field | text_value
1 Date 2021-06-23
1 Hour 10:50
2 Position City
2 Position Countryside
3 Date 2021-06-22
3 Hour 10:45
I can then use some filtering method to isolate parts of my data:
df.groupby('id').filter(lambda x: True)
test = df.query(' id == 1 ')
test = test[["text_field","text_value"]]
test_t = test.set_index("text_field").T
test_t:
text_field | Date | Hour
text_value | 2021-06-23 | 10:50
If I repeat the process looking for rows with id == 3 and then concatenate the result with test_t, I'll have the following:
text_field | Date | Hour
text_value | 2021-06-23 | 10:50
text_value | 2021-06-22 | 10:45
I'm aware that performing this with rows where id == 2 will give me other columns, and that's alright too; it's what I want as well.
What I can't figure out is how to do this for every "id" in my dataframe. I wasn't able to create a function or for loop that works. Can somebody help me?
To summarize:
1 - I need to separate my dataframe into sections according to the values from the "id" column
2 - After that i need to remove the "id" column and transpose the result
3 - I need to concatenate every resulting dataframe into one big dataframe
You can use pivot_table:
df.pivot_table(
    index='id', columns='text_field', values='text_value', aggfunc='first')
Output:
text_field Date Hour Position
id
1 2021-06-23 10:50 NaN
2 NaN NaN City
3 2021-06-22 10:45 NaN
It's not exactly clear how you want to deal with repeating values, though; it would be great to have some description of that (id=2 would make a good example).
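If instead every repeated value should land on its own row, one option (a sketch, not part of the original answer; occ is just a hypothetical helper column) is to number the repeats before pivoting:
# number repeated (id, text_field) pairs so each occurrence gets its own row
out = (df.assign(occ=df.groupby(['id', 'text_field']).cumcount())
         .pivot_table(index=['id', 'occ'], columns='text_field',
                      values='text_value', aggfunc='first'))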
Update: If you want to ignore the ids and simply concatenate all the values:
pd.DataFrame(df.groupby('text_field')['text_value'].apply(list).to_dict())
Output:
Date Hour Position
0 2021-06-23 10:50 City
1 2021-06-22 10:45 Countryside

how to map two dataframes with pandas [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 3 years ago.
I have two Excel files:
+ File one contains specific data about different customers (like: Sex, Age, Name...) and
+ File two contains different transactions for each customer
I want to create a new column in File2 containing the data specific to each customer from File1.
file1.csv
customer_id,sex,age,name
af4wf3,m,12,mike
z20ask,f,15,sam
file2.csv
transaction_id,customer_id,amount
12h2j4hk,af4wf3,123.20
12h2j4h1,af4wf3,5.22
12h2j4h2,z20ask,13.20
12h2j4h3,af4wf3,1.20
12h2j4h4,z20ask,2341.12
12h2j4h5,z20ask,235.96
12h2j4h6,af4wf3,999.30
Load and join the dataframes
import pandas as pd
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
df1.set_index('customer_id', inplace=True)
df2.set_index('transaction_id', inplace=True)
output = df2.join(df1, on='customer_id')
output.to_csv('file2_updated.csv')
file2_updated.csv
transaction_id,customer_id,amount,sex,age,name
12h2j4hk,af4wf3,123.2,m,12,mike
12h2j4h1,af4wf3,5.22,m,12,mike
12h2j4h2,z20ask,13.2,f,15,sam
12h2j4h3,af4wf3,1.2,m,12,mike
12h2j4h4,z20ask,2341.12,f,15,sam
12h2j4h5,z20ask,235.96,f,15,sam
12h2j4h6,af4wf3,999.3,m,12,mike
The same as @jc416's answer, but using pd.merge:
file2.merge(file1, on='customer_id')
transaction_id customer_id amount sex age name
0 12h2j4hk af4wf3 123.2 m 12 mike
1 12h2j4h1 af4wf3 5.22 m 12 mike
2 12h2j4h3 af4wf3 1.2 m 12 mike
3 12h2j4h6 af4wf3 999.3 m 12 mike
4 12h2j4h2 z20ask 13.2 f 15 sam
5 12h2j4h4 z20ask 2341.12 f 15 sam
6 12h2j4h5 z20ask 235.96 f 15 sam
You should definitely read Pandas Merging 101.

Python Resample and Interpolate within a group

I have a data set which contains samples at the 1-second level from workout data (heart rate, watts, etc.). The data feed is not perfect and sometimes there are gaps. I need the dataset at 1-second intervals with no missing rows.
Once I resample the data it looks along the lines of this:
activity_id watts
t
1 12345 5
2 12345 NaN
3 12345 15
6 98765 NaN
7 98765 10
8 98765 12
After the resample I can't get the interpolation to work properly. The problem is that the interpolation runs across the entire dataframe, and I need it to 'reset' for every workout ID within the dataframe. The data should look like this once it's working properly:
activity_id watts
t
1 12345 5
2 12345 10
3 12345 15
6 98765 NaN
7 98765 10
8 98765 12
Here's the snippet of code I have tried. It's not throwing any errors, but it's also not doing the interpolation...
seconds = 1
df = df.groupby(['activity_id']).resample(str(seconds) + 'S').mean().reset_index(level='activity_id', drop=True)
df = df.reset_index(drop=False)
df = df.groupby('activity_id').apply(lambda group: group.interpolate(method='linear'))
The answer marked as correct here is not working for me:
Pandas interpolate within a groupby
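One pattern that keeps the interpolation inside each workout (a sketch, assuming the resampled frame has activity_id and watts as columns, as in the snippet above) is a group-wise transform:
# interpolate watts within each activity_id so values never bleed across
# workouts; the leading NaN of activity 98765 has no earlier point in its
# group, so linear interpolation leaves it as NaN
df['watts'] = (
    df.groupby('activity_id')['watts']
      .transform(lambda s: s.interpolate(method='linear'))
)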

Use pandas to get county name using fips codes

I have fips codes here: http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt
And a dataset that looks like this:
fips_state fips_county value
1 1 10
1 3 34
1 5 37
1 7 88
1 9 93
How can I get the county name of each row using the data from the link above with pandas?
Simply load both data sets into DataFrames, then set the appropriate index:
df1.set_index(['fips_state', 'fips_county'], inplace=True)
This gives you a MultiIndex by state+county. Once you've done this for both datasets, you can trivially map them, for example:
df1['county_name'] = df2.county_name
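Putting it together (a sketch: the column names for national_county.txt are assumptions based on the file's layout of state abbreviation, state FIPS, county FIPS, county name, and class code; df1 is your dataset):
import pandas as pd

# the census file ships without a header row, so supply the assumed column names
cols = ['state', 'fips_state', 'fips_county', 'county_name', 'class_code']
df2 = pd.read_csv(
    'http://www2.census.gov/geo/docs/reference/codes/files/national_county.txt',
    header=None, names=cols)
df2.set_index(['fips_state', 'fips_county'], inplace=True)

df1.set_index(['fips_state', 'fips_county'], inplace=True)
# assignment aligns on the shared MultiIndex, filling each row's county name
df1['county_name'] = df2.county_name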
