Append dataframe columns in a loop to yield a single dataframe

Append dataframe columns in a loop to yield a single dataframe - python

I wrote code to extract data from a csv and put them into a dataframe and sort them after. The code looks as such:
def highest_value_sorter(value):
sorted_df = df_result[value].astype('float64').sort_values(ascending=False)
sorted_df = sorted_df.head(10).to_frame().reset_index()
return sorted_df
sorted_df = pd.DataFrame(data=[values])
for value in values:
sorted_tmp_df = highest_value_sorter(value)
sorted_tmp_df = sorted_tmp_df.drop(columns=['index'])
sorted_tmp_df in my code yields the following result in a loop:
apples
0 922640.524589
1 862396.590682
2 848624.249550
oranges
0 2.394991e+11
1 1.875155e+11
2 6.409508e+10
bananas
0 1.852440e+08
1 6.143871e+07
2 5.757801e+07
my goal is to get all of these into one dataframe as such:
apples oranges
0 922640.524589 862396.590682
1 862396.590682 5.757801e+07
2 5.757801e+07 922640.524589
So far I've tried .join and .append as such: sorted_df = sorted_df.append(sorted_tmp_df)/sorted_df = sorted_df.join(sorted_tmp_df) and neither seem to work. Any tips would help, thanks!

You can use pandas.concat() to concat list of dataframes on columns with axis set to 1.
dfs = []
for value in values:
sorted_tmp_df = highest_value_sorter(value)
sorted_tmp_df = sorted_tmp_df.drop(columns=['index'])
dfs.append(sorted_tmp_df)
df_ = pd.concat(dfs, axis=1)

Related

Extract values within the quotes signs into two separate columns with python

How can i extract the values within the quotes signs into two separate columns with python. The dataframe is given below:
df = pd.DataFrame(["'FRH02';'29290'", "'FRH01';'29300'", "'FRT02';'29310'", "'FRH03';'29340'",
"'FRH05';'29350'", "'FRG02';'29360'"], columns = ['postcode'])
df
postcode
0 'FRH02';'29290'
1 'FRH01';'29300'
2 'FRT02';'29310'
3 'FRH03';'29340'
4 'FRH05';'29350'
5 'FRG02';'29360'
i would like to get an output like the one below:
postcode1 postcode2
FRH02 29290
FRH01 29300
FRT02 29310
FRH03 29340
FRH05 29350
FRG02 29360
i have tried several str.extract codes but havent been able to figure this out. Thanks in advance.

Finishing Quang Hoang's solution that he left in the comments:
import pandas as pd
df = pd.DataFrame(["'FRH02';'29290'",
"'FRH01';'29300'",
"'FRT02';'29310'",
"'FRH03';'29340'",
"'FRH05';'29350'",
"'FRG02';'29360'"],
columns = ['postcode'])
# Remove the quotes and split the strings, which results in a Series made up of 2-element lists
postcodes = df['postcode'].str.replace("'", "").str.split(';')
# Unpack the transposed postcodes into 2 new columns
df['postcode1'], df['postcode2'] = zip(*postcodes)
# Delete the original column
del df['postcode']
print(df)
Output:
postcode1 postcode2
0 FRH02 29290
1 FRH01 29300
2 FRT02 29310
3 FRH03 29340
4 FRH05 29350
5 FRG02 29360

You can use Series.str.split:
p1 = []
p2 = []
for row in df['postcode'].str.split(';'):
p1.append(row[0])
p2.append(row[1])
df2 = pd.DataFrame()
df2["postcode1"] = p1
df2["postcode2"] = p2

How to Convert a text data into DataFrame

How i can convert the below text data into a pandas DataFrame:
(-9.83334315,-5.92063135,-7.83228037,5.55314146), (-5.53137301,-8.31010785,-3.28062536,-6.86067081),
(-11.49239039,-1.68053601,-4.14773043,-3.54143976), (-22.25802006,-10.12843806,-2.9688831,-2.70574665), (-20.3418791,-9.4157625,-3.348587,-7.65474665)
I want to convert this to Data frame with 4 rows and 5 columns. For example, the first row contains the first element of each parenthesis.
Thanks for your contribution.

Try this:
import pandas as pd
with open("file.txt") as f:
file = f.read()
df = pd.DataFrame([{f"name{id}": val.replace("(", "").replace(")", "") for id, val in enumerate(row.split(",")) if val} for row in file.split()])

import re
import pandas as pd
with open('file.txt') as f:
data = [re.findall(r'([\-\d.]+)',data) for data in f.readlines()]
df = pd.DataFrame(data).T.astype(float)
Output:
0 1 2 3 4
0 -9.833343 -5.531373 -11.492390 -22.258020 -20.341879
1 -5.920631 -8.310108 -1.680536 -10.128438 -9.415762
2 -7.832280 -3.280625 -4.147730 -2.968883 -3.348587
3 5.553141 -6.860671 -3.541440 -2.705747 -7.654747

Your data is basically in tuple of tuples forms, hence you can easily use pass a list of tuples instead of a tuple of tuples and get a DataFrame out of it.
Your Sample Data:
text_data = ((-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665))
Result:
As you see it's default takes up to 6 decimal place while you have 7, hence you can use pd.options.display.float_format and set it accordingly.
pd.options.display.float_format = '{:,.8f}'.format
To get your desired data, you simply use transpose altogether to get the desired result.
pd.DataFrame(list(text_data)).T
0 1 2 3 4
0 -9.83334315 -5.53137301 -11.49239039 -22.25802006 -20.34187910
1 -5.92063135 -8.31010785 -1.68053601 -10.12843806 -9.41576250
2 -7.83228037 -3.28062536 -4.14773043 -2.96888310 -3.34858700
3 5.55314146 -6.86067081 -3.54143976 -2.70574665 -7.65474665
OR
Simply, you can use as below as well, where you can create a DataFrame from a list of simple tuples.
data = (-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665)
# data = [(-9.83334315,-5.92063135,-7.83228037,5.55314146),(-5.53137301,-8.31010785,-3.28062536,-6.86067081),(-11.49239039,-1.68053601,-4.14773043,-3.54143976),(-22.25802006,-10.12843806,-2.9688831,-2.70574665),(-20.3418791,-9.4157625,-3.348587,-7.65474665)]
pd.DataFrame(data).T
0 1 2 3 4
0 -9.83334315 -5.53137301 -11.49239039 -22.25802006 -20.34187910
1 -5.92063135 -8.31010785 -1.68053601 -10.12843806 -9.41576250
2 -7.83228037 -3.28062536 -4.14773043 -2.96888310 -3.34858700
3 5.55314146 -6.86067081 -3.54143976 -2.70574665 -7.65474665

wrap the tuples as a list
data=[(-9.83334315,-5.92063135,-7.83228037,5.55314146),
(-5.53137301,-8.31010785,-3.28062536,-6.86067081),
(-11.49239039,-1.68053601,-4.14773043,-3.54143976),
(-22.25802006,-10.12843806,-2.9688831,-2.70574665),
(-20.3418791,-9.4157625,-3.348587,-7.65474665)]
df=pd.DataFrame(data, columns=['A','B','C','D'])
print(df)
output:
A B C D
0 -9.833343 -5.920631 -7.832280 5.553141
1 -5.531373 -8.310108 -3.280625 -6.860671
2 -11.492390 -1.680536 -4.147730 -3.541440
3 -22.258020 -10.128438 -2.968883 -2.705747
4 -20.341879 -9.415762 -3.348587 -7.654747

How to extract values from a list in Python and put into a dataframe

I have trained a model and have asked the model to produce the coefficients:
modelcoeffs = model.fit(X_train, y_train).coef_
coeffslist = list(modelcoeffs)
which yiels me for example:
print(coeffslist):
[0.17005542 0.72965947 0.6833308 0.02509676]
I am trying to split these 4 coefficients out however they dont seem to be individual elements?
does anyone know how to split these into four numbers?
I am trying to get:
df['1'] = coeffslist[0]
df['2'] = coeffslist[1]
df['3'] = coeffslist[2]
df['4'] = coeffslist[3]
But it gives me NaN in the df. Does anyone have any ideas? thanks!
UPDATE
I am basically trying to get the coeffs to append to a df
print(df)
1 2 3 4
.... ..... ..... .....
0.17005542 0.72965947 0.6833308 0.02509676

This coeffslist doesn't look like a valid Python structure, it's missing commas.
But you might try this:
import pandas as pd
df = pd.DataFrame([0.17005542, 0.72965947, 0.6833308, 0.02509676])
print(df)
Output:
0
0 0.170055
1 0.729659
2 0.683331
3 0.025097
To get the coefs as row try this:
import pandas as pd
df = pd.DataFrame(columns=list("1234"))
df.loc[len(df)] = [0.17005542, 0.72965947, 0.6833308, 0.02509676]
print(df)
Output:
1 2 3 4
0 0.170055 0.729659 0.683331 0.025097
And if you want to add another row (append) of coefs, just do this:
df.loc[1] = [0.17005542, 0.72965947, 0.6833308, 0.02509676]
print(df)
Output:
1 2 3 4
0 0.170055 0.729659 0.683331 0.025097
1 0.170055 0.729659 0.683331 0.025097

you can convert [0.17005542 0.72965947 0.6833308 0.02509676] to a sting, split it on space, convert to float again and then append to a dataframe.
str_list= str(coeffslist[0])
float_list= [float(x) for x in str_list.split()]
df=pd.DataFrame(columns=['1','2','3','4'])
a_series = pd.Series(float_list, index = df.columns)
df = df.append(a_series, ignore_index=True)

groupby and sum two columns and set as one column in pandas

I have the following data frame:
import pandas as pd
data = pd.DataFrame()
data['Home'] = ['A','B','C','D','E','F']
data['HomePoint'] = [3,0,1,1,3,3]
data['Away'] = ['B','C','A','E','D','D']
data['AwayPoint'] = [0,3,1,1,0,0]
i want to groupby the columns ['Home', 'Away'] and change the name as Team. Then i like to sum homepoint and awaypoint as name as Points.
Team Points
A 4
B 0
C 4
D 1
E 4
F 3
How can I do it?
I was trying different approach using the following post:
Link
But I was not able to get the format that I wanted.
Greatly appreciate your advice.
Thanks
Zep.

A simple way is to create two new Series indexed by the teams:
home = pd.Series(data.HomePoint.values, data.Home)
away = pd.Series(data.AwayPoint.values, data.Away)
Then, the result you want is:
home.add(away, fill_value=0).astype(int)
Note that home + away does not work, because team F never played away, so would result in NaN for them. So we use Series.add() with fill_value=0.
A complicated way is to use DataFrame.melt():
goo = data.melt(['HomePoint', 'AwayPoint'], var_name='At', value_name='Team')
goo.HomePoint.where(goo.At == 'Home', goo.AwayPoint).groupby(goo.Team).sum()
Or from the other perspective:
ooze = data.melt(['Home', 'Away'])
ooze.value.groupby(ooze.Home.where(ooze.variable == 'HomePoint', ooze.Away)).sum()

You can concatenate, pairwise, columns of your input dataframe. Then use groupby.sum.
# calculate number of pairs
n = int(len(df.columns)/2)+1)
# create list of pairwise dataframes
df_lst = [data.iloc[:, 2*i:2*(i+1)].set_axis(['Team', 'Points'], axis=1, inplace=False) \
for i in range(n)]
# concatenate list of dataframes
df = pd.concat(df_lst, axis=0)
# perform groupby
res = df.groupby('Team', as_index=False)['Points'].sum()
print(res)
Team Points
0 A 4
1 B 0
2 C 4
3 D 1
4 E 4
5 F 3

pandas dataframe contains list

I have currently run the following script which uses Fuzzylogic to replace some common words from the list. Dataframe df1 contains my default list of possible values. Dataframe df2 is the main dataframe where transformations/changes are undertaken after referring to Dataframe df1. The code is as follows:
df1 = pd.DataFrame(['one','two','three','four','five','tsst'])
df2 = pd.DataFrame({'not_shifted':[np.nan,'one','too','three','fours','five','six',np.nan,'test']})
# Drop nan value
df2=pd.DataFrame(df2['not_shifted'].fillna(value=''))
df2['not_shifted'] = df2['not_shifted'].map(lambda x: difflib.get_close_matches(x, df1[0]))
The problem is the output is a dataframe which contains square brackets. To make matters worse, none of the texts within df2['not_shifted'] are viewable/ recallable:
Out[421]:
not_shifted
0 []
1 [one]
2 [two]
3 [three]
4 [four]
5 [five]
6 []
7 []
8 [tsst]
Please help.

df2.not_shifted.apply(lambda x: x[0] if len(x) != 0 else "") or simply df2.not_shifted.str[0] as solved by #Psidom

def replace_all(eg):
rep = {"[":"",
"]":"",
"u":"",
"}":"",
"'":"",
'"':"",
"frozenset":""}
for i,j in rep.items():
eg = eg.replace(i,j)
return eg
for each in df.columns:
df[each] = df[each].apply(lambda x : replace_all(str(x)))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Append dataframe columns in a loop to yield a single dataframe - python

You can use pandas.concat() to concat list of dataframes on columns with axis set to 1. dfs = [] for value in values: sorted_tmp_df = highest_value_sorter(value) sorted_tmp_df = sorted_tmp_df.drop(columns=['index']) dfs.append(sorted_tmp_df) df_ = pd.concat(dfs, axis=1)

Related

Extract values within the quotes signs into two separate columns with python

How to Convert a text data into DataFrame

How to extract values from a list in Python and put into a dataframe

groupby and sum two columns and set as one column in pandas

pandas dataframe contains list

Categories

Resources