How to deal with "could not convert string to float" error - python

This question has been asked several times here and I checked most of them, but couldn't figure out how to deal with it.
I read a CSV file and try to convert its values to float as follows:
testdataframe = pd.read_csv(r'H:\myCSVfile.csv')
testdataset = testdataframe.values
testdataset = testdataset.astype('float32')
I get this error: ValueError: could not convert string to float: '2020-08-05 22:45:00'
Here is the content of testdataframe.values:
array([['2020-08-05 22:45:00', 5.670524],
['2020-08-05 23:00:00', 5.6840434],
['2020-08-05 23:15:00', 5.6911097],
['2020-08-05 23:30:00', 5.6869917],
['2020-08-05 23:45:00', 5.6786237],
['2020-08-06 00:00:00', 5.6710806]], dtype=object)
Thanks in advance for your help.

As @John Gordon correctly mentioned, the value that fails is a date/time string, not a number.
You should apply astype(float) only to the numeric columns. However, if you still want to call it on the whole DataFrame, pass errors='ignore' so that data that cannot be converted is returned unchanged instead of raising:
df=pd.DataFrame({"A":[1.2,'1.2','a'],"B":['2020-10-2 10:00:00','2020-10-2 11:00:00','2020-10-2 12:00:00']})
df.astype(float, errors='ignore')
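A minimal sketch of the first suggestion, converting only the numeric column and parsing the date column separately. The column names `timestamp` and `value` and the inline CSV text are assumptions for illustration; the real file's header is not shown in the question.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the CSV file; the column names are hypothetical.
csv_text = """timestamp,value
2020-08-05 22:45:00,5.670524
2020-08-05 23:00:00,5.6840434
2020-08-05 23:15:00,5.6911097
"""

# Parse the date/time column as datetimes instead of leaving it as strings.
testdataframe = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Convert only the numeric column to float32.
testdataset = testdataframe[["value"]].to_numpy(dtype="float32")

print(testdataset.dtype, testdataset.shape)
```

Selecting the numeric columns explicitly keeps the timestamps out of the float conversion, so nothing has to be silently ignored.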

Related

pandas: saving data frame to dict in correct time format

I want the following output from my pandas dataframe df:
{1622564509268542720: '36.15', 1622564509311439360: '37.83', 1622564509312406784: '38.20', 1622564509357944832: '40.40', 1622564509358921984: '33.46', 1622564509404489472: '38.37', 1622564509405471232: '37.15'}
When I type df.head(3).to_dict(), it outputs the following format:
{'ApparentPower': {Timestamp('2021-06-01 16:21:50.754080768'): 40.83, Timestamp('2021-06-01 16:21:50.755921664'): 106.41, Timestamp('2021-06-01 16:21:50.800695808'): 46.56}}
What's the easiest way to get it into the format I need above?
Try converting the index to integer:
df.set_index(df.index.astype(int))['ApparentPower'].to_dict()
Output:
{1622564510754080768: 40.83,
1622564510755921664: 106.41,
1622564510800695808: 46.56}
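The output above has float values, while the question asks for strings such as '36.15'. A possible sketch that produces integer keys and two-decimal string values follows; the small DataFrame is rebuilt here from the values shown in the question, and the two-decimal format is an assumption based on the desired output.

```python
import pandas as pd

# Rebuild a small frame from the values shown in the question.
index = pd.to_datetime([
    "2021-06-01 16:21:50.754080768",
    "2021-06-01 16:21:50.755921664",
    "2021-06-01 16:21:50.800695808",
])
df = pd.DataFrame({"ApparentPower": [40.83, 106.41, 46.56]}, index=index)

# Keys: nanoseconds since the epoch as plain Python ints.
# Values: floats formatted as strings with two decimals (assumed format).
result = {
    int(ts): f"{value:.2f}"
    for ts, value in zip(df.index.astype("int64"), df["ApparentPower"])
}

print(result)
```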

How do I convert from int64 back to timestamp or datetime?

I am working on a project that looks at how much a pitcher's different pitches break in each game. An earlier answer I found here fixed one error, but now I get some weird numbers: when I print what I expect to be August 3rd, 2020, I get 1.5964128e+18. Here's how I got there.
hughes2020=pd.read_csv(r"C:/Users/Stratus/Downloads/Hughes2020Test.csv",parse_dates=['game_date'])
game=hughes2020['game_date'].astype(np.int64)
#Skipping to next part to an example
elif name[i]=="Curveball":
    if c < curve:
        xcurve[c]=totalx[i]
        ycurve[c]=totaly[i]
        cudate[c]=game[i]
        c+=1
When I print cudate, it gives me the large number, and I am wondering how I can change it back.
And if I run it as
game=hughes2020['game_date'] #.astype(np.int64)
#Skipping to next part to an example
elif name[i]=="Curveball":
    if c < curve:
        xcurve[c]=totalx[i]
        ycurve[c]=totaly[i]
        cudate[c]=game[i]
        c+=1
It gives me an
TypeError: float() argument must be a string or a number, not 'Timestamp'
To convert int to datetime use pd.to_datetime():
df = pd.DataFrame(data=[1.5964128e+18], columns = ['t'])
df['t2'] = pd.to_datetime(df['t'])
              t         t2
0  1.596413e+18 2020-08-03
However, a better solution would be to convert the dates when the CSV is read (as @sinanspd correctly pointed out). Use parse_dates and the other related options of pd.read_csv(); see the pd.read_csv() documentation for details.
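As a rough illustration of the round trip, the sketch below converts a datetime column to int64 (nanoseconds since the epoch) and back. The sample dates are made up for the example, and `unit="ns"` is spelled out to make the assumption about the integer's meaning explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical game dates stored with nanosecond resolution.
game = pd.Series(np.array(["2020-08-03", "2020-08-05"], dtype="datetime64[ns]"))

# Datetime -> int64: nanoseconds since 1970-01-01.
as_int = game.astype("int64")

# int64 -> datetime: tell pandas the integers are nanoseconds.
restored = pd.to_datetime(as_int, unit="ns")

print(as_int.iloc[0])
print(restored.iloc[0])
```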

logger print error: not enough arguments for format string

I'm trying to fix a "logger print error: not enough arguments for format string" cropping up on a jupyter lab report and have tried a few solutions but no joy.
my dataframe looks like this:
df_1 = pd.DataFrame(df, columns = ['col1','col2','col3','col4','col5','col6','col7', 'col8', 'col9', 'col10'])
#I'm applying a % format because I only need last four columns in percentage:
df_1['col7'] = df_1['col7'].apply("{0:.0f}%".format)
df_1['col8'] = df_1['col8'].apply("{0:.0f}%".format)
df_1['col9'] = df_1['col9'].apply("{0:.0f}%".format)
df_1['col10'] = df_1['col10'].apply("{0:.0f}%".format)
I want to maintain the table format/structure, so I'm not doing print(df_1) but rather just:
df_1
The above works fine, but I can't seem to get past the "logger print error: not enough arguments for format string" error.
P.S. I've also tried using formats like "{:.2%}" or "{0:.0%}", but they turn -3 into -300%.
Here is what the columns look like without any format:
Edit: fixed by removing this line from dataframe source query '%Y-%m-%d'
If you are using Python 3.6 or later, an f-string inside a lambda should do it:
df_1['col7'] = df_1['col7'].apply(lambda v: f"{v:.0f}%")
df_1['col8'] = df_1['col8'].apply(lambda v: f"{v:.0f}%")
df_1['col9'] = df_1['col9'].apply(lambda v: f"{v:.0f}%")
df_1['col10'] = df_1['col10'].apply(lambda v: f"{v:.0f}%")
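For reference, the `%` presentation type in formats like "{:.0%}" multiplies the value by 100 before appending the sign, which is why -3 becomes -300%. A small sketch of appending a percent sign without rescaling, applied to several columns in a loop, is shown below; the sample values and the two-column frame are assumptions, since the original data is not shown.

```python
import pandas as pd

# Hypothetical data standing in for the last columns of df_1.
df_1 = pd.DataFrame({"col7": [-3.0, 12.4], "col8": [50.0, 0.6]})

# The '%' presentation type rescales by 100.
print("{:.0%}".format(-3))   # -300%

# Appending a literal '%' keeps the number as it is.
for col in ["col7", "col8"]:
    df_1[col] = df_1[col].map("{:.0f}%".format)

print(df_1)
```

Note that the formatted columns hold strings afterwards, so any numeric work should happen before this step.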

Python: numpy loadtxt with integer and datetime

I don't understand how the loadtxt method of numpy works. I have read some questions and answers on this site, but it is still not clear to me.
I have a file 'data.txt' which contains:
WEIGHT,DAY
75.1,16/10/2018
75.2,17/10/2018
...
My code is:
import numpy as np
from datetime import datetime
def parsetime(v):
    print(type(v))
    print(v)
    return np.datetime64(
        datetime.strptime(v, '%d/%m/%Y')
    )
data = np.loadtxt('masse.txt',delimiter=',',usecols=(0, 1),converters = {1:parsetime},skiprows=1)
But it doesn't work correctly because it passes bytes, not a string, to the parsetime function:
<class 'bytes'>
b'16/10/2018'
I just want an np.array which has a number in the first column and a date in the second column.
I'm a bit lost.
Thanks a lot in advance.
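One possible approach, sketched under assumptions: read both columns as strings with loadtxt, then convert each column separately and combine them in a structured array. This sidesteps the converter (and therefore the bytes question) entirely. The field names `weight` and `day` and the inline sample text are hypothetical.

```python
import io
from datetime import datetime

import numpy as np

# Stand-in for the file contents shown in the question.
text = "WEIGHT,DAY\n75.1,16/10/2018\n75.2,17/10/2018\n"

# Read everything as strings; no converter is involved.
raw = np.loadtxt(io.StringIO(text), delimiter=",", skiprows=1, dtype=str)

# Convert each column on its own.
weights = raw[:, 0].astype(float)
days = np.array(
    [datetime.strptime(s, "%d/%m/%Y").date() for s in raw[:, 1]],
    dtype="datetime64[D]",
)

# A plain 2-D array cannot mix floats and dates, so use a structured array.
data = np.empty(len(raw), dtype=[("weight", "f8"), ("day", "datetime64[D]")])
data["weight"] = weights
data["day"] = days

print(data)
```

If the converter must be kept, decoding the argument first when it arrives as bytes (for example with `v.decode()`) is another option, at the cost of depending on how the installed numpy version passes values to converters.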

ValueError: could not convert string to float:?

I have a dataset with object type, which was imported as a txt file into Jupyter Notebook. But now I am trying to plot some auto-correlation for an individual column and it is not working.
My first attempt was to convert the object columns to float but I get the error message:
could not convert string to float: ?
How do I fix this?
Okay this is my script:
book = pd.read_csv('Book1.csv', parse_dates=True)
t= str(book.Global_active_power)
t
'0 4.216\n1 5.36\n2 5.374\n3 5.388\n4 3.666\n5 3.52\n6 3.702\n7 3.7\n8 3.668\n9 3.662\n10 4.448\n11 5.412\n12 5.224\n13 5.268\n14 4.054\n15 3.384\n16 3.27\n17 3.43\n18 3.266\n19 3.728\n20 5.894\n21 7.706\n22 7.026\n23 5.174\n24 4.474\n25 3.248\n26 3.236\n27 3.228\n28 3.258\n29 3.178\n ... \n1048545 0.324\n1048546 0.324\n1048547 0.324\n1048548 0.322\n1048549 0.322\n1048550 0.322\n1048551 0.324\n1048552 0.324\n1048553 0.326\n1048554 0.326\n1048555 0.324\n1048556 0.324\n1048557 0.322\n1048558 0.322\n1048559 0.324\n1048560 0.322\n1048561 0.322\n1048562 0.324\n1048563 0.388\n1048564 0.424\n1048565 0.42\n1048566 0.418\n1048567 0.418\n1048568 0.42\n1048569 0.422\n1048570 0.426\n1048571 0.424\n1048572 0.422\n1048573 0.422\n1048574 0.422\nName: Global_active_power, Length: 1048575, dtype: object'
I believe the reason is that I have to format my column first so that every value has the same number of decimal places before I can convert it to float, but trying to format it like this is not working for me:
print("{:0<4s}".format(book.Global_active_power))
The column contains a '?' entry, which cannot be parsed as a number. Clean this up (along with any other extraneous entries) and you should not see this error.
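A minimal sketch of two ways to do that clean-up, assuming the placeholder is exactly the string "?". The inline CSV is a made-up stand-in for Book1.csv.

```python
import io

import pandas as pd

# Stand-in for Book1.csv with one '?' placeholder.
csv_text = "Global_active_power\n4.216\n?\n5.36\n"

# Option 1: treat '?' as missing while reading, so the column is float already.
book = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
power = book["Global_active_power"]

# Option 2: read as-is, then coerce anything non-numeric to NaN.
book_raw = pd.read_csv(io.StringIO(csv_text))
power_coerced = pd.to_numeric(book_raw["Global_active_power"], errors="coerce")

print(power.dtype, power.isna().sum())
```

Either way the missing entries become NaN, which can then be dropped or filled before plotting the autocorrelation.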
