This question already has answers here:
Set value for particular cell in pandas DataFrame using index
(23 answers)
Closed 2 years ago.
I am trying to change a number in the df, but pandas truncates it to an integer.
A B
0 1 4
1 2 5
2 3 6
I change a number:
df['B'][1] = 1.2
it gives:
A B
0 1 4
1 2 1
2 3 6
instead of:
A B
0 1 4
1 2 1.2
2 3 6
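Pandas has some rather complex view/copy behavior. With chained indexing, df['B'][1] = 1.2 writes into an intermediate Series instead of going through the DataFrame: the write may land on a copy, or, as here, go straight into the existing integer column, where 1.2 is truncated to 1. You can update the value in a single step, letting pandas convert the column to float, via: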
df.loc[1, "B"] = 1.2
result:
A B
0 1 4.0
1 2 1.2
2 3 6.0
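A self-contained sketch of the fix, assuming column B starts out with an integer dtype as the question's output suggests. Newer pandas versions may warn about, or reject, writing a float into an integer column, so this version casts B to float before assigning; that cast is an addition of mine, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Column B is an integer column, so it cannot hold 1.2 as-is.
# Cast it to float explicitly before writing the new value.
df["B"] = df["B"].astype(float)

# Assign in one .loc step (row label 1, column "B") instead of df["B"][1].
df.loc[1, "B"] = 1.2

print(df)
```

Keeping the cast explicit makes the dtype change visible in the code instead of relying on pandas to upcast silently.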
This question already has an answer here:
Pandas long to wide (unmelt or similar?) [duplicate]
(1 answer)
Closed 2 months ago.
I have a pandas DataFrame with a few columns; let's say it looks like this:
Heading 1  Values
A          1
A          2
B          9
B          8
B          6
What I want is to "pivot" or group the table so it would look something like:
Heading 1  Value 1  Value 2  Value 3
A          1        2
B          9        8        6
I tried to group the table or pivot/unpivot it in several ways, but I cannot figure out how to do it properly.
You can derive a new column that will hold a row number (so to speak) for each partition of heading 1.
import pandas as pd
df = pd.DataFrame({"heading 1": ['A', 'A', 'B', 'B', 'B'], "Values": [1, 2, 9, 8, 6]})
df['rn'] = df.groupby('heading 1').cumcount() + 1
  heading 1  Values  rn
0         A       1   1
1         A       2   2
2         B       9   1
3         B       8   2
4         B       6   3
Then you can pivot, using the newly derived column as your columns argument:
df = df.pivot(index='heading 1', columns='rn', values='Values').reset_index()
rn heading 1    1    2    3
0          A  1.0  2.0  NaN
1          B  9.0  8.0  6.0
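Putting the two steps together, here is a minimal runnable sketch that also renames the pivoted columns to the "Value 1", "Value 2", "Value 3" headers asked for in the question. The name wide and the renaming step are my own additions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"heading 1": ["A", "A", "B", "B", "B"],
                   "Values": [1, 2, 9, 8, 6]})

# Number the rows within each "heading 1" group: 1, 2, 3, ...
df["rn"] = df.groupby("heading 1").cumcount() + 1

# Spread the values across columns keyed by that row number.
wide = df.pivot(index="heading 1", columns="rn", values="Values")

# Rename the columns 1, 2, 3 to "Value 1", "Value 2", "Value 3",
# then turn the "heading 1" index back into a regular column.
wide.columns = [f"Value {n}" for n in wide.columns]
wide = wide.reset_index()

print(wide)
```

Group A has only two values, so its "Value 3" cell is NaN, which is also why the value columns come out as floats.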
This question already has answers here:
Count number of values in an entire DataFrame
(3 answers)
Closed 1 year ago.
I have a DataFrame and I want to find the frequency of each value across the whole frame.
a b
0 5 7
1 7 8
2 5 7
The result should be like:
5 2
7 3
8 1
Use DataFrame.stack with Series.value_counts and Series.sort_index:
s = df.stack().value_counts().sort_index()
Or DataFrame.melt:
s = df.melt()['value'].value_counts().sort_index()
print(s)
5 2
7 3
8 1
Name: value, dtype: int64
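For anyone who wants to run this end to end, here is a small sketch that rebuilds the frame from the question and counts values across all columns. The numpy.unique variant at the end is an alternative I am adding for comparison; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [5, 7, 5], "b": [7, 8, 7]})

# stack() flattens every column into one long Series;
# value_counts() tallies it and sort_index() orders by value.
counts = df.stack().value_counts().sort_index()
print(counts)

# Alternative: count directly on the underlying NumPy array.
values, freq = np.unique(df.to_numpy(), return_counts=True)
print(dict(zip(values.tolist(), freq.tolist())))
```

The NumPy route skips building an intermediate Series, which may matter on large, purely numeric frames.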
A simple way is to use pd.Series.value_counts to count how often each value occurs:
import pandas as pd
# create the Series
s = pd.Series(data=[5, 10, 9, 8, 8, 4, 5, 9, 10, 0, 1])
# count the occurrences of each value
print(s.value_counts())
output:
10 2
9 2
8 2
5 2
4 1
1 1
0 1
This question already has an answer here:
Cumsum as a new column in an existing Pandas data
(1 answer)
Closed 2 years ago.
For example, assume we have the dataframe below:
Num
0 2
1 4
2 1
3 5
4 3
The expected output, in another column "sum", should be as below:
Num sum
0 2 2
1 4 6 (2+4)
2 1 7 (2+4+1)
3 5 12 (2+4+1+5)
4 3 15 (2+4+1+5+3)
This can be achieved using cumsum:
df['sum'] = df['Num'].cumsum()
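A minimal runnable version, using the numbers from the question:

```python
import pandas as pd

df = pd.DataFrame({"Num": [2, 4, 1, 5, 3]})

# cumsum() returns the running total: each row is the sum of
# that row and every row above it.
df["sum"] = df["Num"].cumsum()

print(df)
```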
This question already has answers here:
How are iloc and loc different?
(6 answers)
Selection with .loc in python
(5 answers)
Closed 4 years ago.
If I have a pandas data frame like this:
A B C D E
1 3 4 2 5 1
2 5 4 2 4 4
3 5 1 8 1 3
4 1 1 9 9 4
5 3 6 4 1 1
and want to look up the value at row 3 and column D, how do I go about doing it?
In this case, with a row value of 3 and a column value of D, how would I get a return of 1?
Or, with a row value of 2 and a column value of B, how would I get a return of 4?
You can use DataFrame.loc: df.loc[row, 'col_name'], e.g. df.loc[2, 'B'] returns 4.
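A short sketch that rebuilds the frame from the question, assuming the leftmost numbers 1 to 5 are the index labels. It also shows .at (a scalar-only label lookup) and .iloc (position-based), which I am adding for contrast with .loc.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [3, 5, 5, 1, 3],
     "B": [4, 4, 1, 1, 6],
     "C": [2, 2, 8, 9, 4],
     "D": [5, 4, 1, 9, 1],
     "E": [1, 4, 3, 4, 1]},
    index=[1, 2, 3, 4, 5],  # assumed: the row labels shown in the question
)

# Label-based lookup: row label 3, column label "D".
print(df.loc[3, "D"])

# Same lookup with .at, which only accepts a single row/column pair.
print(df.at[2, "B"])

# Position-based lookup: third row, fourth column (both zero-based).
print(df.iloc[2, 3])
```

Because the index starts at 1 here, label 3 and position 2 refer to the same row, which is the usual source of confusion between .loc and .iloc.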
This question already has answers here:
cartesian product in pandas
(13 answers)
Closed 4 years ago.
For example, the data is:
a=pd.DataFrame({'aa':[1,2,3]})
b=pd.DataFrame({'bb':[4,5]})
What I want is to combine these two data frames so that the new frame is:
aa bb
1 4
1 5
2 4
2 5
3 4
3 5
You can see that every value in a is linked to all the values in b in the new frame. I probably can use tile or repeat to do this. But I have multiple frames which need to be done repeatedly. So I want to know if there is a better way?
Could anyone help me out here?
You can do it like this:
In [24]: a['key'] = 1
In [25]: b['key'] = 1
In [27]: pd.merge(a, b, on='key').drop('key', axis=1)
Out[27]:
aa bb
0 1 4
1 1 5
2 2 4
3 2 5
4 3 4
5 3 5
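If your pandas is recent enough (I believe how="cross" was added in 1.2), merge can build the cartesian product without a helper key. The sketch below also chains it over several frames with functools.reduce, since the question mentions doing this repeatedly; the third frame c is hypothetical and only there to show the chaining.

```python
from functools import reduce

import pandas as pd

a = pd.DataFrame({"aa": [1, 2, 3]})
b = pd.DataFrame({"bb": [4, 5]})
c = pd.DataFrame({"cc": [6, 7]})  # hypothetical third frame

# Cross join: every row of a paired with every row of b.
ab = a.merge(b, how="cross")
print(ab)

# Fold the same cross join over any number of frames.
abc = reduce(lambda left, right: left.merge(right, how="cross"), [a, b, c])
print(abc)
```

Unlike the key-column approach, this leaves a and b unmodified.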
You can use pd.MultiIndex.from_product and then reset_index. It generates all the combinations between both sets of data (the same idea as itertools.product):
df_output = (pd.DataFrame(index=pd.MultiIndex.from_product([a.aa, b.bb], names=['aa', 'bb']))
             .reset_index())
and you get
aa bb
0 1 4
1 1 5
2 2 4
3 2 5
4 3 4
5 3 5
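To make the itertools.product comparison concrete, here is a rough sketch that builds the same frame directly from the product of the two columns. The variable name out is mine.

```python
from itertools import product

import pandas as pd

a = pd.DataFrame({"aa": [1, 2, 3]})
b = pd.DataFrame({"bb": [4, 5]})

# product() yields every (aa, bb) pair, with the first column varying slowest.
out = pd.DataFrame(list(product(a["aa"], b["bb"])), columns=["aa", "bb"])

print(out)
```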