How to transfer rows to columns in a DataFrame using Python [duplicate] - python

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I need some help.
I have a CSV file that I load into the following DataFrame:
How can I move the case counts into columns named week 1, week 2, and so on, using Python and pandas?
The result should look something like this:

x = (
    df.pivot_table(
        index=["city", "population"],
        columns="week",
        values="cases",
        aggfunc="max",
    )
    .add_prefix("week ")
    .reset_index()
    .rename_axis("", axis=1)
)
print(x)
Prints:
city population week 1 week 2
0 x 50000 5 10
1 y 88000 2 15
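The question's input table did not survive, so here is a minimal self-contained sketch of the same pivot. The long-format rows are an assumption, reconstructed from the printed result above; the column names follow the answer's code.

```python
import pandas as pd

# Assumed input: one row per city and week (reconstructed from the printed result).
df = pd.DataFrame(
    {
        "city": ["x", "x", "y", "y"],
        "population": [50000, 50000, 88000, 88000],
        "week": [1, 2, 1, 2],
        "cases": [5, 10, 2, 15],
    }
)

wide = (
    df.pivot_table(
        index=["city", "population"],
        columns="week",
        values="cases",
        aggfunc="max",  # only matters if a city/week pair appears more than once
    )
    .add_prefix("week ")         # 1 -> "week 1", 2 -> "week 2"
    .reset_index()               # turn city/population back into ordinary columns
    .rename_axis(None, axis=1)   # drop the leftover "week" label on the column axis
)
print(wide)
```

If every city/week pair is unique, aggfunc has no effect on the values; it only decides what happens to duplicates.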

Related

Select rows with conditions based on two columns(Start date and end date) [duplicate]

This question already has answers here:
pandas: multiple conditions while indexing data frame - unexpected behavior
(5 answers)
Pandas slicing/selecting with multiple conditions with or statement
(1 answer)
Closed 2 years ago.
I have a dataframe which looks like this:
id start_date end_date
0 1 2017/06/01 2021/05/31
1 2 2018/10/01 2022/09/30
2 3 2015/01/01 2019/02/28
3 4 2017/11/01 2021/10/31
Can anyone tell me how I can select only the rows with a start date of 2017/06/01 and an end date of 2021/10/31?
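No answer was captured for this question. One possible approach, offered as a sketch, is boolean masking. The question can be read two ways, and in the sample data no single row has both dates, so both readings are shown. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "start_date": ["2017/06/01", "2018/10/01", "2015/01/01", "2017/11/01"],
        "end_date": ["2021/05/31", "2022/09/30", "2019/02/28", "2021/10/31"],
    }
)

# Parse the strings so comparisons are made on real dates, not text.
df["start_date"] = pd.to_datetime(df["start_date"], format="%Y/%m/%d")
df["end_date"] = pd.to_datetime(df["end_date"], format="%Y/%m/%d")

starts = df["start_date"] == pd.Timestamp("2017-06-01")
ends = df["end_date"] == pd.Timestamp("2021-10-31")

either = df[starts | ends]  # rows matching at least one condition
both = df[starts & ends]    # rows matching both conditions (empty for this sample)
```

Note that the masks are combined with | and &, not the Python keywords or and and, which raise an error on a Series.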

How to concat hour with date in python [duplicate]

This question already has an answer here:
Python: Adding hours to pandas timestamp
(1 answer)
Closed 3 years ago.
I have a pandas dataframe where the date and hour are in two different columns, as shown below.
I want to combine these two columns into a new datetime column so that I can apply pandas window/shift functions. Please share your views.
date hour
0 20190409 0
1 20190409 0
2 20190409 0
3 20190409 0
4 20190409 0
Use pd.to_datetime and pd.to_timedelta and add the results together:
df['datetime'] = pd.to_datetime(df['date'], format='%Y%m%d') + pd.to_timedelta(df['hour'], unit='h')
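A self-contained version of the same idea might look like the sketch below. The sample values are hypothetical (the hours are varied so the result is easier to read), and the gap column is only an illustration of the shift operations mentioned in the question.

```python
import pandas as pd

# Hypothetical sample; dates are stored as integers such as 20190409.
df = pd.DataFrame({"date": [20190409, 20190409, 20190410], "hour": [0, 13, 5]})

df["datetime"] = (
    pd.to_datetime(df["date"].astype(str), format="%Y%m%d")  # midnight of each day
    + pd.to_timedelta(df["hour"], unit="h")                  # plus the hour offset
)

# With a real datetime column, shift and window operations work as usual.
df["gap"] = df["datetime"] - df["datetime"].shift(1)
```

Casting the date to str first avoids depending on how integer input is parsed.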

How do I clean phone numbers in pandas [duplicate]

This question already has answers here:
How to only do string manupilation on column of pandas that have 4 digits or less?
(3 answers)
Closed 3 years ago.
I have a pandas dataframe with a Phone column; however, the data is a bit inconsistent. Here are some examples that I would like to focus on.
df["Phone"]
0 732009852
1 738073222
2 755920306
3 0755353288
Row 3 has the necessary leading 0 for an Australian number. How do I update rows like 0, 1 and 2?
Use pandas.Series.str.zfill:
s = pd.Series(['732009852', '0755353288'])
s.str.zfill(10)
Output:
0 0732009852
1 0755353288
Or pd.Series.str.rjust:
print(df["Phone"].str.rjust(10, '0'))
Output:
0 0732009852
1 0738073222
2 0755920306
3 0755353288
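Both methods above need a string column. If the file was read with numeric inference, the values may be integers, which is an assumption about this data rather than something stated in the question. A more defensive sketch casts to str and pads only the values that are exactly nine digits long:

```python
import pandas as pd

# Assumption: some values arrive as integers, which cannot keep a leading zero.
df = pd.DataFrame({"Phone": [732009852, 738073222, 755920306, "0755353288"]})

phone = df["Phone"].astype(str)

# Pad only the 9-digit values; anything else is left untouched for manual review.
needs_zero = phone.str.fullmatch(r"\d{9}")
df["Phone"] = phone.where(~needs_zero, "0" + phone)
```

Reading the file with dtype={"Phone": str} in pd.read_csv keeps existing leading zeros from being dropped in the first place.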

Pandas DataFrame GroupBy sum/count to new DataFrame [duplicate]

This question already has answers here:
Specifying column order following groupby aggregation
(2 answers)
Closed 5 years ago.
My DataFrame is
State|City|Year|Budget|Income
S1|C1|2000|1000|1
S1|C2|2000|1200|2
S2|C3|2000|5500|3
I need to get a new DataFrame with columns:
State, Year, Count, Sum_Budget, Sum_Income:
That is,
State|Year|Count|Sum_Budget|Sum_Income
S1|2000|2|2200|3
S2|2000|1|5500|3
In C# the code would be:
dataframe
    .GroupBy(x => new { x.State, x.Year })
    .Select(x => new {
        x.Key.State,
        x.Key.Year,
        Count = x.Count(),
        Sum_Budget = x.Sum(y => y.Budget),
        Sum_Income = x.Sum(y => y.Income)
    })
    .ToArray();
How do I do so with Pandas?
Use agg:
d = {'Income':'Sum_Income','Budget':'Sum_Budget','City':'Count'}
agg_d = {'Budget':'sum', 'Income':'sum', 'City':'size'}
df = df.groupby(['State', 'Year'], as_index=False).agg(agg_d).rename(columns=d)
print(df)
State Year Sum_Income Sum_Budget Count
0 S1 2000 3 2200 2
1 S2 2000 3 5500 1
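As an alternative sketch, named aggregation (available in pandas 0.25 and later) sets the output names and their order in a single call, so no separate rename is needed. The sample frame is rebuilt from the question's data; the name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["S1", "S1", "S2"],
        "City": ["C1", "C2", "C3"],
        "Year": [2000, 2000, 2000],
        "Budget": [1000, 1200, 5500],
        "Income": [1, 2, 3],
    }
)

# Each keyword is an output column: name=(source column, aggregation).
summary = df.groupby(["State", "Year"], as_index=False).agg(
    Count=("City", "size"),
    Sum_Budget=("Budget", "sum"),
    Sum_Income=("Income", "sum"),
)
print(summary)
```

The columns come out in the order the keywords are written, which matches the layout requested in the question.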

Python: combining two columns [duplicate]

This question already has answers here:
Combine two columns of text in pandas dataframe
(21 answers)
Closed 5 years ago.
I have two columns, one has the year, and another has the month data, and I am trying to make one column from them (containing year and month).
Example:
click_year
-----------
2016
click_month
-----------
11
I want to have
YearMonth
-----------
201611
I tried
date['YearMonth'] = pd.concat((date.click_year, date.click_month))
but it gave me "cannot reindex from a duplicate axis" error.
Bill's answer on the linked post might be what you are looking for.
import pandas as pd
df = pd.DataFrame({'click_year': ['2014', '2015'], 'click_month': ['10', '11']})
>>> df
click_month click_year
0 10 2014
1 11 2015
df['YearMonth'] = df[['click_year','click_month']].apply(lambda x : '{}{}'.format(x['click_year'], x['click_month']), axis=1)
>>> df
click_month click_year YearMonth
0 10 2014 201410
1 11 2015 201511
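A vectorized alternative avoids the row-wise apply. This sketch assumes the columns may hold integers and zero-pads the month so that every value is six characters wide; the padding is an assumption, not something the question asks for.

```python
import pandas as pd

# Hypothetical sample with integer columns and one single-digit month.
df = pd.DataFrame({"click_year": [2016, 2015], "click_month": [11, 3]})

# Cast to str before concatenating; zero-pad the month so 3 becomes "03".
df["YearMonth"] = (
    df["click_year"].astype(str) + df["click_month"].astype(str).str.zfill(2)
)
```

The original pd.concat call failed because it stacks the two Series end to end, producing duplicate index labels, rather than joining them element by element.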
