convert series to dataframe and rename - python

I have a series that looks as below:
Col
0.006325 1
0.050226 2
0.056898 2
0.075840 2
0.089026 2
0.099637 1
0.115992 1
0.129045 1
0.148997 1
0.164790 2
0.188730 5
0.207524 3
0.235777 1
I want to create a df that looks like
Col Frequency
0.006325 1
0.050226 2
0.056898 2
0.075840 2
0.089026 2
0.099637 1
I have tried series.reset_index().rename(columns={'col','frequency'}) with no success.

Your attempt fails because {'col','frequency'} is a set literal, not a mapping; rename expects a dict such as {0: 'frequency'}. More directly, use the name= parameter of Series.reset_index(), as follows:
df = series.reset_index(name='frequency')
Demo
import pandas as pd

data = {0.006325: 1,
        0.050226: 2,
        0.056898: 2,
        0.07584: 2,
        0.089026: 2,
        0.099637: 1,
        0.115992: 1,
        0.129045: 1,
        0.148997: 1,
        0.16479: 2,
        0.18873: 5,
        0.207524: 3,
        0.235777: 1}
series = pd.Series(data).rename_axis(index='Col')
print(series)
Col
0.006325 1
0.050226 2
0.056898 2
0.075840 2
0.089026 2
0.099637 1
0.115992 1
0.129045 1
0.148997 1
0.164790 2
0.188730 5
0.207524 3
0.235777 1
dtype: int64
df = series.reset_index(name='frequency')
print(df)
Col frequency
0 0.006325 1
1 0.050226 2
2 0.056898 2
3 0.075840 2
4 0.089026 2
5 0.099637 1
6 0.115992 1
7 0.129045 1
8 0.148997 1
9 0.164790 2
10 0.188730 5
11 0.207524 3
12 0.235777 1

I can think of two pretty sensible options.
pd_series = pd.Series(range(5), name='series')
# Option 1
# Rename the series and convert to dataframe
pd_df1 = pd.DataFrame(pd_series.rename('Frequency'))
# Option 2
# Pass the series in a dictionary
# the key in the dictionary will be the column name in dataframe
pd_df2 = pd.DataFrame(data={'Frequency': pd_series})
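Both options yield the same single-column DataFrame; here is a quick sanity check (a minimal sketch, assuming the pd_series above):

import pandas as pd

pd_series = pd.Series(range(5), name='series')
pd_df1 = pd.DataFrame(pd_series.rename('Frequency'))
pd_df2 = pd.DataFrame(data={'Frequency': pd_series})

# Both frames share the column name 'Frequency' and the original index
print(pd_df1.equals(pd_df2))  # True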

Related

PANDAS groupby 2 columns with condition

I have a data frame and I need to group by 2 columns and create a new column based on condition.
My data looks like this:
ID  week  day_num
1   1     2
1   1     3
1   2     4
1   2     1
2   1     1
2   2     2
3   1     4
I need to group by the columns ID & week so there's a row for each ID for each week. The grouping is based on a condition: if for a certain week an ID has the value 1 in the day_num column, the grouped value is 1, otherwise 0. For example, ID 1 in week 1 has only 2 & 3 in its rows, so it gets 0; in week 2, ID 1 has a row with value 1, so it gets 1.
The output I need looks like this:
ID  week  day1
1   1     0
1   2     1
2   1     1
2   2     0
3   1     0
I searched and found this code, but it uses count, where I just need to write the value 1 or 0.
df1=df1.groupby('ID','week')['day_num'].apply(lambda x: (x=='1').count())
Is there a way to do this?
Thanks!
You can approach it from the other way: check equality against 1 in "day_num" and group that by ID & week. Then aggregate with any to see whether there was any 1 in each group. Lastly, convert the True/False values to 1/0 and move the groupers back to columns.
df["day_num"].eq(1).groupby([df["ID"], df["week"]]).any().astype(int).reset_index()
ID week day_num
0 1 1 0
1 1 2 1
2 2 1 1
3 2 2 0
4 3 1 0
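If you also want the result column named day1, as in the desired output, a variant of the same idea (a sketch, not the answerer's exact code):

import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
                   'week': [1, 1, 2, 2, 1, 2, 1],
                   'day_num': [2, 3, 4, 1, 1, 2, 4]})
# Flag rows where day_num == 1, then take the per-group maximum of the flag
out = (df.assign(day1=df['day_num'].eq(1).astype(int))
         .groupby(['ID', 'week'], as_index=False)['day1']
         .max())  # max of 0/1 flags behaves like any()
print(out)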
import pandas as pd

src = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 3],
                    'week': [1, 1, 2, 2, 1, 2, 1],
                    'day_num': [2, 3, 4, 1, 1, 2, 4],
                    })
# Map day_num to 1 where it equals 1, otherwise 0
src['day_num'] = (~(src['day_num'] - 1).astype(bool)).astype(int)
# After sorting by day_num, a row with 1 (if any) comes last in each
# ID/week group, so drop_duplicates(keep='last') retains it
r = (src.sort_values(by=['day_num'])
        .drop_duplicates(['ID', 'week'], keep='last')
        .sort_index()
        .reset_index(drop=True))
print(r)
Result
ID week day_num
0 1 1 0
1 1 2 1
2 2 1 1
3 2 2 0
4 3 1 0

Sort 'pandas.core.series.Series' so that largest value is in the centre

I have a Pandas Series that looks like this:
import pandas as pd
x = pd.Series([3, 1, 1])
print(x)
0 3
1 1
2 1
I would like to sort the output so that the largest value is in the center. Like this:
0 1
1 3
2 1
Do you have any ideas on how to do this, also for series of different lengths (all of them are sorted in decreasing order)? The length of the series will always be odd.
Thank you very much!
Anna
First sort the values, then take every other element on the way up and the remaining elements on the way back down, and join the two pieces with concat:
x = pd.Series([6, 4, 4, 2, 2, 1, 1])
x = x.sort_values()
print (pd.concat([x[::2], x[len(x)-2:0:-2]]))
5 1
3 2
1 4
0 6
2 4
4 2
6 1
dtype: int64
x = pd.Series(range(7))
x = x.sort_values()
print (pd.concat([x[::2], x[len(x)-2:0:-2]]))
0 0
2 2
4 4
6 6
5 5
3 3
1 1
dtype: int64
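Wrapped up as a small helper (a minimal sketch; iloc is used so the slicing is positional regardless of the index):

import pandas as pd

def sort_largest_to_center(s: pd.Series) -> pd.Series:
    # Ascend through every other value, then descend through the rest,
    # which places the maximum in the middle (assumes odd length)
    s = s.sort_values()
    return pd.concat([s.iloc[::2], s.iloc[len(s) - 2:0:-2]])

print(sort_largest_to_center(pd.Series([3, 1, 1])))
# 1    1
# 0    3
# 2    1
# dtype: int64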

Pandas apply a function over groups with same size response

I am trying to duplicate this result from R in Python. The function I want to apply (np.diff) takes an input and returns an array of the same size. When I try to group I get an output the size of the number of groups, not the number of rows.
Example DataFrame:
df = pd.DataFrame({'sample':[1,1,1,1,1,2,2,2,2,2],'value':[1,2,3,4,5,1,3,2,4,3]})
If I apply diff to it I get close to the result I want, except at the group borders. The (-4) value is a problem.
x = np.diff([df.loc[:,'value']], 1, prepend=0)[0]
df.loc[:,'delta'] = x
sample value delta
0 1 1 1
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
5 2 1 -4
6 2 3 2
7 2 2 -1
8 2 4 2
9 2 3 -1
I think the answer is to use groupby and apply or transform but I cannot figure out the syntax. The closest I can get is:
df.groupby('sample').apply(lambda df: np.diff(df['value'], 1, prepend =0 ))
sample
1 [1, 1, 1, 1, 1]
2 [1, 2, -1, 2, -1]
Here you can use DataFrameGroupBy.diff, replace the first (missing) value in each group with 1, and then convert the values to integers:
df['delta'] = df.groupby('sample')['value'].diff().fillna(1).astype(int)
print (df)
sample value delta
0 1 1 1
1 1 2 1
2 1 3 1
3 1 4 1
4 1 5 1
5 2 1 1
6 2 3 2
7 2 2 -1
8 2 4 2
9 2 3 -1
Your solution can also be made to work with GroupBy.transform: select the column to process right after the groupby, so the lambda receives the Series directly rather than the whole frame:
df['delta'] = df.groupby('sample')['value'].transform(lambda x: np.diff(x, 1, prepend = 0))
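A quick sanity check that the two approaches agree on this data (a sketch; note they differ in general, since fillna(1) always writes 1 at a group start while prepend=0 writes the group's first value):

import numpy as np
import pandas as pd

df = pd.DataFrame({'sample': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'value': [1, 2, 3, 4, 5, 1, 3, 2, 4, 3]})
a = df.groupby('sample')['value'].diff().fillna(1).astype(int)
b = df.groupby('sample')['value'].transform(lambda x: np.diff(x, 1, prepend=0))
# Equal here because each group's first value happens to be 1
print(a.equals(b))  # True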

How to perform an IF statement for duplicate values within the same column

I have a DataFrame and want to find duplicate values within a column. When a value repeats, a new column should append a zero for every duplicate occurrence (i.e. multiply by 10 each time), while the first occurrence keeps the original value.
Original DataFrame:
Code1
1
2
3
4
5
1
2
1
1
New DataFrame:
Code1 Code2
1 1
2 2
3 3
4 4
5 5
6 6
1 10
2 20
1 100
1 1000
6 60
Use groupby and cumcount
df.assign(counts=df.groupby("Code1").cumcount(),
          Code2=lambda x: x["Code1"] * 10 ** (x["counts"])
          ).drop("counts", axis=1)
Code1 Code2
0 1 1
1 2 2
2 3 3
3 4 4
4 5 5
5 1 10
6 2 20
7 1 100
8 1 1000
There might be a solution using transform (but I just don't have time right now to investigate). However, the following is really explicit about what is happening:
import pandas as pd

data = [1, 2, 3, 4, 5, 1, 2, 1, 1]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Code1'])

code2 = []
x = {}  # last emitted value per distinct Code1
for d in data:
    if d not in x:
        x[d] = d
    else:
        x[d] = x[d] * 10
    code2.append(x[d])
df['Code2'] = code2
print(df)
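For the record, the more compact groupby idea the answer alludes to could look like this (a sketch using cumcount, the same mechanism as the earlier answer):

# Count prior occurrences of each value and scale by a power of ten
df['Code2'] = df['Code1'] * 10 ** df.groupby('Code1').cumcount()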

How to filter groupby for first N items

In Pandas, how can I modify groupby to only take the first N items in the group?
Example
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2],
'values': [1, 2, 3, 4, 5, 6, 7]})
>>> df
id values
0 1 1
1 1 2
2 1 3
3 2 4
4 2 5
5 2 6
6 2 7
Desired functionality
# This doesn't work, but I am trying to return the first two items per group.
>>> df.groupby('id').first(2)
id values
0 1 1
1 1 2
3 2 4
4 2 5
What I've tried
I can perform a groupby and iterate through the groups to take the index of the first n values, but there must be a simpler solution.
n = 2 # First two rows.
idx = [i for group in df.groupby('id').groups.itervalues() for i in group[:n]]
>>> df.ix[idx]
id values
0 1 1
1 1 2
3 2 4
4 2 5
You can use head:
In [11]: df.groupby("id").head(2)
Out[11]:
id values
0 1 1
1 1 2
3 2 4
4 2 5
Note: in older versions this used to be equivalent to .apply(pd.DataFrame.head), but since 0.15 (?) it's more efficient: it now uses cumcount under the hood.
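Given that note, an equivalent filter written out with cumcount might look like this (a sketch, not necessarily what head does internally today):

# Keep rows whose position within their group is less than n
n = 2
df[df.groupby('id').cumcount() < n]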
