This question already has answers here:
Pandas sum across columns and divide each cell from that value
(5 answers)
Closed 3 years ago.
I want to divide each cell by the sum of its row. In practice there are many columns, not only A and B.
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1],
                     'B':[4,5,6,4,5,6,4]})
sum_row = data.sum(axis=1)
Here is an example of what I expect.
I think this should do the trick
import pandas as pd
data = pd.DataFrame({'A':[1,2,3,1,2,3,1],
                     'B':[4,5,6,4,5,6,4]})
value_cols = list(data.columns)  # remember the original columns before adding new ones
data['sum_row'] = data[value_cols].sum(axis=1)
for col in value_cols:
    # divide each column (not just 'A') by the row total
    data[col + ' / Sum_Row'] = data[col] / data['sum_row']
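A vectorized alternative to the loop, as a minimal sketch: `DataFrame.div` with `axis=0` divides every column by the row totals in one step, so it works for any number of columns. The name `shares` is only illustrative.

```python
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1],
                     'B': [4, 5, 6, 4, 5, 6, 4]})

# Row totals, one value per row
sum_row = data.sum(axis=1)

# axis=0 aligns sum_row with the row index, so each row is divided by its own total
shares = data.div(sum_row, axis=0)
print(shares)
```

Each row of `shares` should add up to 1, which is a quick way to check the result.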
This question already has answers here:
Python dataframe replace last n rows with a list of n elements
(2 answers)
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
I'm trying to write a series of 20 values into the last 20 rows of a dataframe that has more than 20 rows.
The original values are coming from a numpy array 'Y_pred':
[[3495.47227957]
[3493.27865109]
[3491.08502262]
[3488.89139414]
[3486.69776567]
[3484.50413719]
[3482.31050871]
[3480.11688024]
[3477.92325176]
[3475.72962329]
[3473.53599481]
[3471.34236633]
[3469.14873786]
[3466.95510938]
[3464.7614809 ]
[3462.56785243]
[3460.37422395]
[3458.18059548]
[3455.986967 ]
[3453.79333852]]
I create a column Y_pred and try to insert the converted series:
df['Y_pred'] = np.nan
df.Y_pred.iloc[-len(Y_pred):].append(pd.Series({'Y_pred': Y_pred}), ignore_index=True)
The result is that all rows are NaN.
I also tried this:
series = pd.Series(Y_pred[:, 0])
df.Y_pred.iloc[-20:].append(series, ignore_index=True)
and
df['Y_pred'].append(Y_pred)
Nothing works. How do I do this properly?
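One possible approach, as a sketch rather than a definitive answer: `append` returns a new object and never changes `df` in place (and `Series.append` was removed in pandas 2.0), so the values have to be assigned instead. The 25-row frame, the column `x`, and the generated `Y_pred` values below are hypothetical stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a 25-row frame and a (20, 1) prediction array
df = pd.DataFrame({'x': np.arange(25)})
Y_pred = np.linspace(3495.0, 3453.0, 20).reshape(-1, 1)

df['Y_pred'] = np.nan
n = len(Y_pred)

# Assign in place by position: last n rows, Y_pred column.
# ravel() flattens the (n, 1) array to the 1-D shape the column slice expects.
df.iloc[-n:, df.columns.get_loc('Y_pred')] = Y_pred.ravel()
print(df.tail(n + 2))
```

The rows before the last `n` keep their NaN values, and only the tail of the column is filled.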
This question already has answers here:
Extract int from string in Pandas
(8 answers)
Closed 1 year ago.
Below is the dataframe
import pandas as pd
import numpy as np
d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)
I need to extract the "1621992600749" and "1621992600799" values.
I have tried several approaches, including the split function:
new = df["col1"].str.split("||", n = 1, expand = True)
but it doesn't give the expected results. Any thoughts would be helpful.
You can use extract with a regex:
df['col1'].str.extract(r'(\d+)')
#output
               0
0  1621992600749
1  1621992600799
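If you would rather keep the split approach, a likely reason it misbehaves is that a multi-character pattern such as "||" is interpreted as a regular expression, where `|` means alternation. A minimal sketch with the pipes escaped (the names `parts` and `timestamps` are illustrative):

```python
import pandas as pd

d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
              'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)

# Escape the pipes so the pattern matches a literal "||"
parts = df['col1'].str.split(r'\|\|', expand=True)

# The timestamp is the second field (column 1 of the expanded frame)
timestamps = parts[1]
print(timestamps)
```

This also gives access to the other fields, which the single-group `extract` does not.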
This question already has answers here:
Why is python pandas dataframe rounding my values?
(5 answers)
Closed 3 years ago.
I'm trying to load and extract data from a CSV with pandas and I'm noticing that it is changing the numbers loaded. How do I prevent this?
I've got a CSV, test.csv:
q,a,b,c,d,e,f
z,0.999211563,0.945548791,0.756781883,0.572315951,1.191243688,0.867855435
Here I load data:
df = pd.read_csv("test.csv")
print(df)
This outputs the following rounded figures:
q a b c d e f
0 z 0.999212 0.945549 0.756782 0.572316 1.191244 0.867855
What I ultimately want to do is access values by position:
print(df.iloc[0, [1, 2, 3, 4, 5, 6]].tolist())
But this adds digits to some of the figures:
[0.999211563, 0.9455487909999999, 0.7567818829999999, 0.572315951, 1.191243688, 0.867855435]
Pandas is altering my data. How can I stop pandas from rounding, and adding numbers to figures?
import pandas as pd
with pd.option_context('display.precision', 10):
    df = pd.read_csv("test.csv", float_precision=None)
    print(df)
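There are two separate effects here, and a sketch may help tell them apart. The six-digit figures from `print(df)` are display rounding only, which `display.precision` controls. The extra trailing digits from `tolist()` come from how the text is parsed into floats; passing `float_precision="round_trip"` asks the C parser to match Python's own `float()` conversion. The CSV is inlined below through `io.StringIO` only to keep the example self-contained.

```python
import io
import pandas as pd

# Inline copy of test.csv from the question, so the sketch is self-contained
csv_text = (
    "q,a,b,c,d,e,f\n"
    "z,0.999211563,0.945548791,0.756781883,0.572315951,1.191243688,0.867855435\n"
)

# 'round_trip' makes the C parser produce the same float that Python's float() would
df = pd.read_csv(io.StringIO(csv_text), float_precision="round_trip")

# Only the display is affected here; the stored values do not change
with pd.option_context("display.precision", 10):
    print(df)

values = df.iloc[0, [1, 2, 3, 4, 5, 6]].tolist()
print(values)
```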
This question already has answers here:
How to iterate over consecutive chunks of Pandas dataframe efficiently
(8 answers)
Closed 3 years ago.
I have a dataframe with 40 rows,
and I want to iterate over it in 4 iterations of 10 rows each, in order.
So group#0 will be rows 0-9 , group#1 will be rows 10-19 and so on.
How can I do it?
Here are two solutions from this Stack Overflow question: How to iterate over consecutive chunks of Pandas dataframe efficiently
I advise you to check the link.
Solution from DSM:
import numpy as np
for k, g in df.groupby(np.arange(len(df)) // 10):
    print(k, g)
Solution from Ryan:
def chunker(seq, size):
    # yield consecutive slices of `size` rows
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
for i in chunker(df, 10):
    print(i)
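For the 40-row case in the question, the same idea can be written with explicit positional slicing. This is a sketch on a hypothetical frame (the `value` column is made up), not a replacement for the linked answers:

```python
import numpy as np
import pandas as pd

# Hypothetical 40-row frame standing in for the asker's data
df = pd.DataFrame({'value': np.arange(40)})

chunk_size = 10
chunks = [df.iloc[start:start + chunk_size]
          for start in range(0, len(df), chunk_size)]

for group_number, chunk in enumerate(chunks):
    # group 0 holds rows 0-9, group 1 holds rows 10-19, and so on
    print(group_number, chunk.index[0], chunk.index[-1])
```

Using `iloc` keeps the slicing positional, so it behaves the same even if the index is not a plain 0..n-1 range.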
import pandas as pd
import numpy as np
df1 = {
'State':['Arizona','Georgia','Newyork','Indiana','Florida'],
'Score1':[4,47,55,74,31]}
df1 = pd.DataFrame(df1,columns=['State','Score1'])
print(df1)
To generate a row number, add an offset (here 430) to the index and store the result in a new column, as shown below.
df1['New_ID'] = df1.index + 430
print(df1)
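Adding the offset to the index relies on the index being the default 0, 1, 2, ... range. If the frame has been sorted or filtered, a positional counter is one way around that. The sketch below assumes a sort by `Score1` purely for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'State': ['Arizona', 'Georgia', 'Newyork', 'Indiana', 'Florida'],
                    'Score1': [4, 47, 55, 74, 31]})

# After sorting, the index labels are no longer 0, 1, 2, ...
df1 = df1.sort_values('Score1')

# Number rows by position instead of by index label
df1['New_ID'] = np.arange(len(df1)) + 430
print(df1)
```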
This question already has answers here:
dropping rows from dataframe based on a "not in" condition [duplicate]
(2 answers)
Closed 4 years ago.
I can do something like:
data = df[ df['Proposal'] != 'C000' ]
to remove all proposals with the string C000, but how can I do something like:
data = df[ df['Proposal'] not in ['C000','C0001'] ]
to remove all proposals that match either C000 or C0001 (and so on)?
You can try this:
df = df.drop(df[df['Proposal'].isin(['C000','C0001'])].index)
Or, to select only the rows you want to keep:
df = df[~df['Proposal'].isin(['C000','C0001'])]
import numpy as np
data = df.loc[np.logical_not(df['Proposal'].isin({'C000','C0001'})), :]
# or
data = df.loc[ ~df['Proposal'].isin({'C000','C0001'}) , :]
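To see why the `not in` attempt fails while `isin` works, here is a small self-contained sketch. The sample rows and the `Amount` column are hypothetical; only the `Proposal` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Proposal' column name comes from the question
df = pd.DataFrame({'Proposal': ['C000', 'C0001', 'C0002', 'C0003'],
                   'Amount': [10, 20, 30, 40]})

excluded = ['C000', 'C0001']

try:
    # Python's `not in` needs a single True/False, but the comparison yields a whole Series
    df[df['Proposal'] not in excluded]
    not_in_failed = False
except ValueError:
    not_in_failed = True

# isin() builds an element-wise boolean mask; ~ inverts it
data = df[~df['Proposal'].isin(excluded)]
print(not_in_failed)
print(data)
```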