This question already has answers here:
Why is python pandas dataframe rounding my values?
(5 answers)
Closed 3 years ago.
I'm trying to load and extract data from a CSV with pandas and I'm noticing that it is changing the numbers loaded. How do I prevent this?
I've got a CSV, test.csv:
q,a,b,c,d,e,f
z,0.999211563,0.945548791,0.756781883,0.572315951,1.191243688,0.867855435
Here I load data:
df = pd.read_csv("test.csv")
print(df)
This outputs the following rounded figures:
q a b c d e f
0 z 0.999212 0.945549 0.756782 0.572316 1.191244 0.867855
What I ultimately want to do is access values by position:
print(df.iloc[0, [1, 2, 3, 4, 5, 6]].tolist())
But this is adding numbers to some of the figures.
[0.999211563, 0.9455487909999999, 0.7567818829999999, 0.572315951, 1.191243688, 0.867855435]
Pandas is altering my data. How can I stop pandas from rounding and adding extra digits to my figures?
Pandas isn't changing the stored values: read_csv keeps full float64 precision, and what you see is only the default 6-digit display (the trailing ...999 digits in the tolist() output are ordinary binary floating-point representation). You can widen the display instead:
import pandas as pd
with pd.option_context('display.precision', 10):
    df = pd.read_csv("test.csv", float_precision=None)
    print(df)
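If you want proof that the underlying values are untouched, you can also parse with the round-trip float converter and compare against the literal from the CSV; a minimal sketch (the round_trip option and the comparison are illustrative additions, not part of the original answer):
import pandas as pd

# Parse floats with the repr-round-trip converter so each value prints
# back exactly as it appears in the CSV (assumes the same test.csv as above).
df = pd.read_csv("test.csv", float_precision="round_trip")

# The stored values keep full double precision; only the default
# display is rounded to 6 digits.
print(df.iloc[0, 1])                 # 0.999211563
print(df.iloc[0, 1] == 0.999211563)  # True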
This question already has answers here:
access value from dict stored in python df
(3 answers)
Closed 3 months ago.
Edit: the dummy dataframe has been updated.
I have a pandas data frame with 200 rows and a column of the kind shown below.
Let's say the name of df is data.
B
---------------------------------------
{'animal':'cat', 'bird':'peacock'...}
I want to extract the value of animal to a separate column C for all the rows.
I tried the below code but it doesn't work.
data['C'] = data["B"].apply(lambda x: x.split(':')[-2] if ':' in x else x)
Please help.
The dictionary can be unpacked with pd.json_normalize:
import pandas as pd
data = pd.DataFrame({'B': [{0: {'animal': 'cat', 'bird': 'peacock'}}]})
data['C'] = pd.json_normalize(data['B'])['0.animal']
I'm not totally sure of the structure of your data. Does this look right?
import pandas as pd
import re
df = pd.DataFrame({
    "B": ["'animal':'cat'", "'bird':'peacock'"]
})
df["C"] = df.B.apply(lambda x: re.sub(r".*?\:(.*$)", r"\1", x))
This question already has answers here:
Python dataframe replace last n rows with a list of n elements
(2 answers)
df.append() is not appending to the DataFrame
(2 answers)
Closed 1 year ago.
I'm trying to put a series of 20 values into the last 20 rows of a dataframe that has more than 20 rows.
The original values are coming from a numpy array 'Y_pred':
[[3495.47227957]
[3493.27865109]
[3491.08502262]
[3488.89139414]
[3486.69776567]
[3484.50413719]
[3482.31050871]
[3480.11688024]
[3477.92325176]
[3475.72962329]
[3473.53599481]
[3471.34236633]
[3469.14873786]
[3466.95510938]
[3464.7614809 ]
[3462.56785243]
[3460.37422395]
[3458.18059548]
[3455.986967 ]
[3453.79333852]]
Here I create the column Y_pred and try to assign the converted series:
df['Y_pred'] = np.nan
df.Y_pred.iloc[-len(Y_pred):].append(pd.Series({'Y_pred': Y_pred}), ignore_index=True)
The result is that all rows are still NaN.
I tried as well this:
series = pd.Series(Y_pred[:, 0])
df.Y_pred.iloc[-20:].append(series, ignore_index=True)
and
df['Y_pred'].append(Y_pred)
Nothing works. How do I do this properly?
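One way that should work (a sketch, assuming df already has more than 20 rows and Y_pred is the (20, 1) array shown above) is to flatten the array and write it into the last rows with positional indexing; append returns a new object and never modifies df in place:
import numpy as np
import pandas as pd

# Stand-in frame and predictions, just to make the sketch runnable.
df = pd.DataFrame({'X': np.arange(30)})
Y_pred = np.linspace(3495.47, 3453.79, 20).reshape(-1, 1)

df['Y_pred'] = np.nan
# Flatten (20, 1) -> (20,) and assign into the last len(Y_pred) rows in place.
df.iloc[-len(Y_pred):, df.columns.get_loc('Y_pred')] = Y_pred.ravel()
print(df.tail(3))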
This question already has answers here:
Extract int from string in Pandas
(8 answers)
Closed 1 year ago.
Below is the dataframe
import pandas as pd
import numpy as np
d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)
I need to extract the "1621992600749" and "1621992600799" values.
I have tried multiple ways, for example using the split function:
new = df["col1"].str.split("||", n = 1, expand = True)
but it doesn't give the expected results. Any thoughts would be helpful.
You can use str.extract with a regex:
df['col1'].str.extract(r'(\d+)')
#output
0
0 1621992600749
1 1621992600799
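If you prefer the split approach, the separator has to be treated literally (escape it, or pass regex=False on newer pandas), and then you can pick the field by position; a sketch, where the column name ts is just for illustration:
import pandas as pd

d = {'col1': ['Get URI||1621992600749||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90',
              'Get URI||1621992600799||com.particlenews.newsbreak||https://graph.fb.com||2021-05-26 01:30:00||1.3.0-QA-1100||90']}
df = pd.DataFrame(data=d)

# Escape the pipes so "||" is split on literally, then take the second field.
df['ts'] = df['col1'].str.split(r'\|\|').str[1]
print(df['ts'].tolist())  # ['1621992600749', '1621992600799']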
This question already has answers here:
Pandas sum across columns and divide each cell from that value
(5 answers)
Closed 3 years ago.
I want to divide each cell by the sum of its row. In reality there are many columns, not only A and B.
import pandas as pd
data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1],
                     'B': [4, 5, 6, 4, 5, 6, 4]})
sum_row = data.sum(axis=1)
Here is an example of what I expect.
I think this should do the trick:
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1],
                     'B': [4, 5, 6, 4, 5, 6, 4]})

cols = list(data.columns)                 # remember the original value columns
data['sum_row'] = data[cols].sum(axis=1)
for col in cols:
    # divide each value by its row sum, element-wise
    data[col + ' / Sum_Row'] = data[col] / data['sum_row']
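As a side note, the same result (without the helper columns) can be had in one vectorised step with DataFrame.div; a sketch, not part of the original answer:
import pandas as pd

data = pd.DataFrame({'A': [1, 2, 3, 1, 2, 3, 1],
                     'B': [4, 5, 6, 4, 5, 6, 4]})

# Divide every cell by its row sum; axis=0 aligns the row-sum Series with the rows.
normalised = data.div(data.sum(axis=1), axis=0)
print(normalised.head(3))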
This question already has answers here:
DataFrame String Manipulation
(3 answers)
Closed 8 years ago.
I have a dataframe which I load from an excel file like this:
df = pd.read_excel(filename, 0, index_col=0, skiprows=0, parse_cols=[0, 8, 9], tz='UTC',
parse_dates=True)
I do some simple changing of the column names just for my own readability:
df.columns = ['Ticker', 'Price']
The data in the ticker column looks like:
AAV.
AAV.
AAV.UN
AAV.UN
I am trying to remove the period at the end when no other letters follow it.
I know I could use something like:
df['Ticker'].str.rstrip('.')
But that does not work. Is there some other way to do what I need? I think my issue is that the method is for a Series rather than a column of values. I tried apply and could not seem to get that to work either.
Any suggestions?
You can use map() and a lambda like this:
df['Ticker'] = df['Ticker'].map( lambda x : x[:-1] if x.endswith('.') else x)
Ticker
0 AAV
1 AAV
2 AAV.UN
3 AAV.UN
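For what it's worth, the str.rstrip('.') approach from the question also works on the column, as long as the result is assigned back; a small sketch with the sample tickers:
import pandas as pd

df = pd.DataFrame({'Ticker': ['AAV.', 'AAV.', 'AAV.UN', 'AAV.UN']})

# str.rstrip returns a new Series rather than modifying the column in place,
# so it has to be assigned back.
df['Ticker'] = df['Ticker'].str.rstrip('.')
print(df['Ticker'].tolist())  # ['AAV', 'AAV', 'AAV.UN', 'AAV.UN']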