Python: combining two columns [duplicate] - python

This question already has answers here:
Combine two columns of text in pandas dataframe
(21 answers)
Closed 5 years ago.
I have two columns, one has the year, and another has the month data, and I am trying to make one column from them (containing year and month).
Example:
click_year
-----------
2016
click_month
-----------
11
I want to have
YearMonth
-----------
201611
I tried
date['YearMonth'] = pd.concat((date.click_year, date.click_month))
but it gave me "cannot reindex from a duplicate axis" error.

Bill's answer on the post might be what you are looking for.
import pandas as pd
df = pd.DataFrame({'click_year': ['2014', '2015'], 'click_month': ['10', '11']})
>>> df
click_month click_year
0 10 2014
1 11 2015
df['YearMonth'] = df[['click_year','click_month']].apply(lambda x: '{}{}'.format(x['click_year'], x['click_month']), axis=1)
>>> df
click_month click_year YearMonth
0 10 2014 201410
1 11 2015 201511
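A vectorized alternative to the row-wise apply above, offered as a sketch only. It assumes the columns may hold integers (as in the question) and that single-digit months should be zero-padded to two digits; the sample values are made up for illustration.

```python
import pandas as pd

# Illustrative data: integer year and month columns (assumed, not from the post).
df = pd.DataFrame({'click_year': [2016, 2017], 'click_month': [11, 2]})

# Convert both columns to strings and concatenate them element-wise.
# zfill(2) pads single-digit months so that February becomes "02".
df['YearMonth'] = (
    df['click_year'].astype(str)
    + df['click_month'].astype(str).str.zfill(2)
)
print(df)
```

String concatenation of two Series is aligned on the index, so this avoids the reindexing problem that pd.concat ran into in the question.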

Related

Read columns with brackets [duplicate]

This question already has answers here:
Pandas column access w/column names containing spaces
(6 answers)
Closed last year.
I'm trying to read a column named Goods_Issue_Date_(GID)
How can I read this?
I tried:
Df.Goods_Issue_Date_(GID)
Returns Invalid Syntax
Using the following dataframe as an example
data = [['Carrots', "Tuesday"], ['Apples', "Monday"], ['Pears', "Sunday"]]
df = pd.DataFrame(data, columns = ['Product', 'Goods_Issue_Date_(GID)'])
df.head()
Product Goods_Issue_Date_(GID)
0 Carrots Tuesday
1 Apples Monday
2 Pears Sunday
You can select the Goods_Issue_Date_(GID) column like so
df['Goods_Issue_Date_(GID)']
0 Tuesday
1 Monday
2 Sunday
Name: Goods_Issue_Date_(GID), dtype: object
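If attribute-style access is really wanted, one option is to rename the columns so they become valid Python identifiers. This is a sketch under that assumption; the regular expression and the variable name clean are illustrative choices, not part of the original answer.

```python
import pandas as pd

data = [['Carrots', 'Tuesday'], ['Apples', 'Monday'], ['Pears', 'Sunday']]
df = pd.DataFrame(data, columns=['Product', 'Goods_Issue_Date_(GID)'])

# Work on a copy so the original column names stay available.
clean = df.copy()

# Collapse every run of non-word characters and underscores into a single
# underscore, then trim underscores left at either end of the name.
clean.columns = (
    clean.columns
    .str.replace(r'[\W_]+', '_', regex=True)
    .str.strip('_')
)

# The renamed column can now be read with dot notation.
print(clean.Goods_Issue_Date_GID)
```

Bracket access with the original name, as shown above, remains the simpler choice when renaming is not desirable.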

How to transfer rows to columns in a DataFrame using Python [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 1 year ago.
I need some help.
I have the following CSV file with this DataFrame:
How could I move the case counts into columns week 1, week 2 (...) using Python and pandas?
It would be something like this:
x = (
df.pivot_table(
index=["city", "population"],
columns="week",
values="cases",
aggfunc="max",
)
.add_prefix("week ")
.reset_index()
.rename_axis("", axis=1)
)
print(x)
Prints:
city population week 1 week 2
0 x 50000 5 10
1 y 88000 2 15
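The question's input data is not shown, so the following self-contained sketch assumes a long-format frame with columns city, population, week and cases, with values chosen to reproduce the printed result. The column names come from the pivot call; the rows themselves are an assumption.

```python
import pandas as pd

# Assumed long-format input: one row per (city, week).
df = pd.DataFrame({
    'city': ['x', 'x', 'y', 'y'],
    'population': [50000, 50000, 88000, 88000],
    'week': [1, 2, 1, 2],
    'cases': [5, 10, 2, 15],
})

x = (
    df.pivot_table(
        index=['city', 'population'],  # columns that stay as row identifiers
        columns='week',                # each distinct week becomes a column
        values='cases',
        aggfunc='max',                 # how duplicate (city, week) rows are combined
    )
    .add_prefix('week ')               # 1 -> "week 1", 2 -> "week 2"
    .reset_index()                     # turn city/population back into columns
    .rename_axis('', axis=1)           # drop the leftover "week" columns label
)
print(x)
```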

How to set index of pandas dataframe in python? [duplicate]

This question already has answers here:
Dataframe set_index not setting
(2 answers)
Pandas set_index does not set the index
(1 answer)
Closed 3 years ago.
I have tried to choose a column to be the index of a data frame. The examples I've seen so far suggest using the method set_index(), but it doesn't work in my case. I use Python 3.7.0
import pandas as pd
df = pd.DataFrame({'Fruit' : ['Apples','Oranges'],
'Amount': [ 1, 17 ]})
df.set_index('Fruit')
print(df)
The output that I get is
Fruit Amount
0 'Apples' 1
1 'Oranges' 17
The output I want would be something like
Amount
Fruit
'Apples' 1
'Oranges' 17
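The likely cause is that set_index() returns a new DataFrame and leaves the original untouched unless the result is assigned. A minimal sketch of the fix, reusing the question's data (the name df1 for the result is a choice made here):

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['Apples', 'Oranges'],
                   'Amount': [1, 17]})

# set_index does not modify df; keep the returned frame.
df1 = df.set_index('Fruit')
print(df1)
```

Reassigning to the same name (df = df.set_index('Fruit')) works equally well when the original frame is no longer needed.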

subset the dataframe into a new one using copy [duplicate]

This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 4 years ago.
I have a dataframe df
a b c
0 5 6 9
1 6 7 10
2 7 8 11
3 8 9 12
So if I want to select only col a and b and store it in another df I would use something like this
df1 = df[['a','b']]
But I have seen places where people write it this way
df1 = df[['a','b']].copy()
Can anyone let me know what .copy() does, given that the earlier code works just fine?
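For example, suppose you want to modify a dataframe in place (example using replace):
df2 = df
df2.replace('blah', 'foo', inplace=True)
Here df2 is just another name for the same object, so df is changed as well:
df.equals(df2)
Will be:
True
If you want the change to apply only to df2:
df2 = df.copy()
df2.replace('blah', 'foo', inplace=True)
Then now (assuming df contained 'blah'):
df.equals(df2)
Returns:
False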
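The same idea applied to the question's a/b/c frame, as a small sketch. The names alias and df1 and the values written into them are illustrative; the point is only that plain assignment shares the object while .copy() gives an independent one.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 6, 7, 8],
                   'b': [6, 7, 8, 9],
                   'c': [9, 10, 11, 12]})

# Plain assignment creates a second name for the same DataFrame,
# so a write through alias is visible through df as well.
alias = df
alias.loc[0, 'a'] = 99
print(df.loc[0, 'a'])

# An explicit copy of the subset is independent of df:
# editing df1 leaves df as it was.
df1 = df[['a', 'b']].copy()
df1.loc[0, 'a'] = -1
print(df.loc[0, 'a'], df1.loc[0, 'a'])
```

Calling .copy() on a subset also makes the intent explicit, which is why it often appears right before code that assigns into the new frame.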

Summing over months with pandas

I know there is a simple way to do this, but I cannot remember the syntax. I have a simple pandas time series and want to summarize the data by month. Specifically, I want to sum the data over months and years. I can write it with slicing, but I remember seeing syntax that does it automatically.
import pandas as pd
from numpy.random import randn
df = pd.Series(randn(100), index=pd.date_range('2012-01-01', periods=100))
A multi-indexed Series with years, sub-indexed by months, would be first prize.
Partial Answer:
df.resample('M').sum() # calendar monthly totals (newer pandas spells this alias 'ME')
df.resample('A').sum() # calendar yearly totals (newer pandas spells this alias 'YE')
Any idea how to elegantly get sums multi-indexed by year and month?
In [1]: import pandas as pd
from numpy.random import randn
In [2]: df = pd.Series(randn(500), index=pd.date_range('2012-01-01', periods=500))
In [3]: s2 = df.groupby([lambda x: x.year, lambda x: x.month]).sum()
In [4]: s2
Out[4]:
2012 1 3.853775
2 4.259941
3 4.629546
4 -10.812505
5 -16.383818
6 -5.255475
7 5.901344
8 13.375258
9 1.758670
10 6.570200
11 6.299812
12 7.237049
2013 1 -1.331835
2 3.399223
3 2.011031
4 7.905396
5 1.127362
dtype: float64
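A variant of the groupby above that uses the index's year and month attributes instead of lambdas, sketched with a series of ones so the monthly sums are easy to check by hand (each sum equals the number of days covered in that month). The names s and monthly and the level names are choices made here.

```python
import numpy as np
import pandas as pd

# 500 consecutive days starting 2012-01-01, each with value 1.0.
idx = pd.date_range('2012-01-01', periods=500)
s = pd.Series(np.ones(500), index=idx)

# Group by the year and month of each timestamp; the result has a
# two-level (year, month) MultiIndex.
monthly = s.groupby([s.index.year, s.index.month]).sum()
monthly.index.names = ['year', 'month']
print(monthly)

# All months of one year can then be selected by the outer level.
print(monthly.loc[2013])
```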
