This question already has answers here:
Split a Pandas column of lists into multiple columns
(11 answers)
Closed 1 year ago.
I have 4 values per row as the output of a function. Here's my data:
Name Grade
usia (75,78,90,52)
shdh (85,68,60,72)
fbjg (95,58,65,66)
Here's what I want
Name Math English Physics Chemistry
usia 75 78 90 52
shdh 85 68 60 72
fbjg 95 58 65 66
Use the DataFrame constructor with DataFrame.pop to remove the original Grade column:
import ast

# if the values are strings instead of tuples, parse them first
# df['Grade'] = df['Grade'].apply(ast.literal_eval)

cols = ['Math', 'English', 'Physics', 'Chemistry']
df[cols] = pd.DataFrame(df.pop('Grade').tolist(), index=df.index)
print(df)
Name Math English Physics Chemistry
0 usia 75 78 90 52
1 shdh 85 68 60 72
2 fbjg 95 58 65 66
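For completeness, here is a minimal, self-contained sketch of the same approach, assuming Grade arrives as strings such as "(75,78,90,52)" (the sample frame below is made up to mirror the question):
import ast
import pandas as pd

# made-up sample mirroring the question, with Grade stored as strings
df = pd.DataFrame({'Name': ['usia', 'shdh', 'fbjg'],
                   'Grade': ['(75,78,90,52)', '(85,68,60,72)', '(95,58,65,66)']})

# parse each string into a tuple, then expand into separate columns
df['Grade'] = df['Grade'].apply(ast.literal_eval)
cols = ['Math', 'English', 'Physics', 'Chemistry']
df[cols] = pd.DataFrame(df.pop('Grade').tolist(), index=df.index)
print(df)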
I'm interested in figuring out how to do vectorized computations in a numpy array / pandas dataframe where each new cell is updated with local information.
For example, let's say I'm a weatherman interested in making predictions about the weather. My prediction algorithm will be the mean of the past 3 days. While this prediction is simple, I'd like to be able to do this with an arbitrary function.
Example data:
day temp
1 70
2 72
3 68
4 67
...
After a transformation should become
day temp prediction
1 70 None (no previous data)
2 72 70 (only one data point)
3 68 71 (two data points)
4 67 70
5 70 69
...
I'm only interested in the prediction column, so there's no need to join the data back together after computing the prediction! Thanks!
Use rolling with window=3 and min_periods=1, then shift so each day's prediction only uses previous days:
df['prediction'] = df['temp'].rolling(window = 3, min_periods = 1).mean().shift()
df
day temp prediction
0 1 70 NaN
1 2 72 70
2 3 68 71
3 4 67 70
4 5 70 69
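Since the question mentions wanting an arbitrary prediction function, the same pattern generalizes to rolling().apply(); a small sketch below, where recency_weighted is just a made-up illustrative function, not anything from the original answer:
import pandas as pd

df = pd.DataFrame({'day': [1, 2, 3, 4, 5],
                   'temp': [70, 72, 68, 67, 70]})

def recency_weighted(window):
    # toy example: weight more recent days more heavily
    weights = range(1, len(window) + 1)
    return sum(w * t for w, t in zip(weights, window)) / sum(weights)

# shift() keeps each prediction from seeing the current day's temp
df['prediction'] = (df['temp']
                    .rolling(window=3, min_periods=1)
                    .apply(recency_weighted, raw=True)
                    .shift())
print(df)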
My df looks as follows:
Roll Name Age Physics English Maths
0 A1 Max 16 87 79 90
1 A2 Lisa 15 47 75 60
2 A3 Luna 17 83 49 95
3 A4 Ron 16 86 79 93
4 A5 Silvia 15 57 99 91
I'd like to add the columns Physics, English, and Maths and display the results in a separate column 'Grade'.
I've tried the code:
df['Physics'] + df['English'] + df['Maths']
But it just concatenates. I haven't been taught the lambda function yet.
How do I go about this?
df['Grade'] = df['Physics'] + df['English'] + df['Maths']
If it concatenates, your data is probably stored as strings; convert it to float or integer.
Check the data types with df.dtypes.
Try:
df["total"] = df[["Physics", "English", "Maths"]].sum(axis=1)
df
Check the code below. It's possible your columns are in string format; the following will solve that:
import pandas as pd

df = pd.DataFrame({"Physics": ['1', '2', '3'],
                   "English": ['1', '2', '3'],
                   "Maths": ['1', '2', '3']})
df['Total'] = df['Physics'].astype('int') + df['English'].astype('int') + df['Maths'].astype('int')
df
Output:
  Physics English Maths  Total
0       1       1     1      3
1       2       2     2      6
2       3       3     3      9
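A slightly more defensive variant (my own suggestion, not part of the answer above) is pd.to_numeric with errors='coerce', which turns anything unparseable into NaN instead of raising:
import pandas as pd

df = pd.DataFrame({"Physics": ['1', '2', '3'],
                   "English": ['1', '2', '3'],
                   "Maths": ['1', '2', 'oops']})   # one bad value for illustration

subjects = ["Physics", "English", "Maths"]
# coerce non-numeric strings to NaN rather than raising a ValueError
df[subjects] = df[subjects].apply(pd.to_numeric, errors='coerce')
df["Total"] = df[subjects].sum(axis=1)
print(df)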
When creating a dataframe as below (instructions from here), the order of the columns changes from "Day, Visitors, Bounce Rate" to "Bounce Rate, Day, Visitors"
import pandas as pd
web_stats = {'Day': [1, 2, 3, 4, 5, 6],
             'Visitors': [43, 34, 65, 56, 29, 76],
             'Bounce Rate': [65, 67, 78, 65, 45, 52]}
df = pd.DataFrame(web_stats)
Gives:
Bounce Rate Day Visitors
0 65 1 43
1 67 2 34
2 78 3 65
3 65 4 56
4 45 5 29
5 52 6 76
How can the order be kept intact (i.e. Day, Visitors, Bounce Rate)?
One approach is to use the columns parameter.
Ex:
import pandas as pd
web_stats = {'Day': [1, 2, 3, 4, 5, 6],
             'Visitors': [43, 34, 65, 56, 29, 76],
             'Bounce Rate': [65, 67, 78, 65, 45, 52]}
df = pd.DataFrame(web_stats, columns = ['Day', 'Visitors', 'Bounce Rate'])
print(df)
Output:
Day Visitors Bounce Rate
0 1 43 65
1 2 34 67
2 3 65 78
3 4 56 65
4 5 29 45
5 6 76 52
Dictionaries are not considered to be ordered in Python <3.7.
You can use collections.OrderedDict instead:
from collections import OrderedDict
web_stats = OrderedDict([('Day', [1, 2, 3, 4, 5, 6]),
                         ('Visitors', [43, 34, 65, 56, 29, 76]),
                         ('Bounce Rate', [65, 67, 78, 65, 45, 52])])
df = pd.DataFrame(web_stats)
If you don't want to write out the column names (which becomes inconvenient when you have many keys), you can use:
df = pd.DataFrame(web_stats, columns = web_stats.keys())
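Worth noting: on Python 3.7+ plain dicts preserve insertion order, so the original code already keeps Day, Visitors, Bounce Rate on modern versions. If you ever need to reorder after construction, plain column selection also works:
import pandas as pd

web_stats = {'Day': [1, 2, 3, 4, 5, 6],
             'Visitors': [43, 34, 65, 56, 29, 76],
             'Bounce Rate': [65, 67, 78, 65, 45, 52]}

df = pd.DataFrame(web_stats)
# reorder (or restore) the columns explicitly after construction
df = df[['Day', 'Visitors', 'Bounce Rate']]
print(df)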
I have a dataframe (df_input), and I'm trying to convert it to another dataframe (df_output) by applying a formula to each element in each row. The formula requires information about the whole row (min, max, median).
df_input:
A B C D E F G H I J
2011-01-01 60 48 26 29 41 91 93 87 39 65
2011-01-02 88 52 24 99 1 27 12 26 64 87
2011-01-03 13 1 38 60 8 50 59 1 3 76
df_output:
F(A) F(B) F(C) F(D) F(E) F(F) F(G) F(H) F(I) F(J)
2011-01-01 93 54 45 52 8 94 65 37 2 53
2011-01-02 60 44 94 62 78 77 37 97 98 76
2011-01-03 53 58 16 63 60 9 31 44 79 35
I'm trying to go from df_input to df_output, as above, by applying f(x) to each cell per row. The function foo maps element x to f(x) by doing an OLS regression of the row's min, median and max against some co-ordinates. This is done for each period.
I'm aware that I can iterate over the rows and then, for each row, apply the function to each element. Where I'm struggling is getting the output of foo into df_output.
for index, row in df_input.iterrows():
    row_min = row.min()
    row_max = row.max()
    row_mean = row.mean()
    # apply function to row
    new_row = row.apply(lambda x: foo(x, row_min, row_max, row_mean))
    # add this to df_output
help!
My current thinking is to build up the new df row by row. I'm trying to do that, but I'm getting a lot of MultiIndex columns etc. Any pointers would be great.
thanks so much... merry xmas to you all.
Consider calculating row-wise aggregates with DataFrame methods and then passing the Series values into a DataFrame.apply() across columns:
# ROW-WISE AGGREGATES
df['row_min'] = df.min(axis=1)
df['row_max'] = df.max(axis=1)
df['row_mean'] = df.mean(axis=1)

# COLUMN-WISE CALCULATION (DEFAULT axis=0)
new_df = df[list('ABCDEFGHIJ')].apply(
    lambda col: foo(col, df['row_min'], df['row_max'], df['row_mean'])
)
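If you prefer to keep the explicit loop from the question, here is a minimal sketch that collects each transformed row and builds df_output at the end; foo here is a hypothetical placeholder (a simple min-max scaling), since the real regression isn't shown:
import numpy as np
import pandas as pd

def foo(x, row_min, row_max, row_mean):
    # hypothetical stand-in for the OLS-based mapping: scale x within its row
    return (x - row_min) / (row_max - row_min)

df_input = pd.DataFrame(np.random.randint(1, 100, (3, 10)),
                        columns=list('ABCDEFGHIJ'),
                        index=pd.date_range('2011-01-01', periods=3))

rows = []
for index, row in df_input.iterrows():
    rows.append(row.apply(lambda x: foo(x, row.min(), row.max(), row.mean())))

df_output = pd.DataFrame(rows)
df_output.columns = ['F(%s)' % c for c in df_input.columns]
print(df_output)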
This question already has answers here:
Remove row with null value from pandas data frame
(5 answers)
Closed 5 years ago.
I have the following dataframe. If there is a null in any of Participation, Homework, Test, or Presentation (i.e. a null in any of the four columns), then I want to remove that row. How do I achieve this in pandas?
Name Participation Homework Test Presentation Attendance
Andrew 92 Null 85 95 88
John 95 88 98 Null 90
Carrie 82 99 96 89 92
Simone 100 91 88 99 90
Here, I would want to remove everyone except for Carrie and Simone from the dataframe. How do I achieve this in pandas?
I found this on Stack Overflow, which I think may help: df = df[pd.notnull(df['column_name'])], but is there any way I can do this for all four columns (a subset) at once instead of each column individually?
Thanks!
You can skip the replace if you compare against the 'Null' string directly with ne:
df[df.ne('Null').all(1)]
Name Participation Homework Test Presentation Attendance
2 Carrie 82 99 96 89 92
3 Simone 100 91 88 99 90
As preparation, let's replace the string 'Null' with np.nan first. Then we can use notnull and all with axis=1:
df[df.replace('Null',np.nan).notnull().all(1)]
Output:
Name Participation Homework Test Presentation Attendance
2 Carrie 82 99 96 89 92
3 Simone 100 91 88 99 90
Or using isnull, any, and ~:
df[~df.replace('Null',np.nan).isnull().any(1)]
replace + dropna
df.replace({'Null':np.nan}).dropna()
Out[504]:
Name Participation Homework Test Presentation Attendance
2 Carrie 82 99 96 89 92
3 Simone 100 91 88 99 90
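Since the question only cares about the four graded columns, you can also restrict the check with dropna(subset=...); a small sketch, assuming the 'Null' strings are converted to real NaN first:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Andrew', 'John', 'Carrie', 'Simone'],
                   'Participation': [92, 95, 82, 100],
                   'Homework': ['Null', 88, 99, 91],
                   'Test': [85, 98, 96, 88],
                   'Presentation': [95, 'Null', 89, 99],
                   'Attendance': [88, 90, 92, 90]})

# convert the 'Null' strings to real NaN, then drop rows missing any
# of the four graded columns (Attendance is ignored by the subset)
df = df.replace('Null', np.nan).dropna(
    subset=['Participation', 'Homework', 'Test', 'Presentation'])
print(df)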