I have a dataset with object-dtype columns, which was imported from a txt file into a Jupyter Notebook. But now I am trying to plot the auto-correlation for an individual column and it is not working.
My first attempt was to convert the object columns to float, but I get the error message:
could not convert string to float: ?
How do I fix this?
Okay this is my script:
book = pd.read_csv('Book1.csv', parse_dates=True)
t= str(book.Global_active_power)
t
'0 4.216\n1 5.36\n2 5.374\n3 5.388\n4 3.666\n5 3.52\n6 3.702\n7 3.7\n8 3.668\n9 3.662\n10 4.448\n11 5.412\n12 5.224\n13 5.268\n14 4.054\n15 3.384\n16 3.27\n17 3.43\n18 3.266\n19 3.728\n20 5.894\n21 7.706\n22 7.026\n23 5.174\n24 4.474\n25 3.248\n26 3.236\n27 3.228\n28 3.258\n29 3.178\n ... \n1048545 0.324\n1048546 0.324\n1048547 0.324\n1048548 0.322\n1048549 0.322\n1048550 0.322\n1048551 0.324\n1048552 0.324\n1048553 0.326\n1048554 0.326\n1048555 0.324\n1048556 0.324\n1048557 0.322\n1048558 0.322\n1048559 0.324\n1048560 0.322\n1048561 0.322\n1048562 0.324\n1048563 0.388\n1048564 0.424\n1048565 0.42\n1048566 0.418\n1048567 0.418\n1048568 0.42\n1048569 0.422\n1048570 0.426\n1048571 0.424\n1048572 0.422\n1048573 0.422\n1048574 0.422\nName: Global_active_power, Length: 1048575, dtype: object'
I believe the reason is that I have to format my column to an equal number of decimal places first, and then I can convert to float, but trying to format it like this is not working for me:
print("{:0<4s}".format(book.Global_active_power))
The column contains a ? entry. Clean this up (along with any other extraneous entries) and you should not see this error.
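One common way to do that cleanup is `pd.to_numeric` with `errors='coerce'`, which turns `?` (and any other non-numeric junk) into `NaN`. A minimal sketch, using a tiny inline stand-in for Book1.csv (the assumption being that `?` is how missing readings are recorded in the real file):

```python
import io
import pandas as pd

# Tiny stand-in for Book1.csv (assumption: '?' marks missing readings)
csv = io.StringIO("Global_active_power\n4.216\n?\n5.36\n")
book = pd.read_csv(csv)

# errors='coerce' converts anything non-numeric to NaN, leaving a float
# column that plots and autocorrelates normally
book["Global_active_power"] = pd.to_numeric(book["Global_active_power"],
                                            errors="coerce")
```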
I am working on a project to look at how much a pitcher's different pitches break each game. I looked here for an earlier error, which fixed my error, but it gives me some weird numbers. What I mean is that when I print what I hope to be August 3rd, 2020, I get 1.5964128e+18. Here's how I got there.
hughes2020=pd.read_csv(r"C:/Users/Stratus/Downloads/Hughes2020Test.csv",parse_dates=['game_date'])
game=hughes2020['game_date'].astype(np.int64)
# Skipping to the next part, as an example
elif name[i] == "Curveball":
    if c < curve:
        xcurve[c] = totalx[i]
        ycurve[c] = totaly[i]
        cudate[c] = game[i]
        c += 1
and when I print cudate, it gives me the large number, and I am wondering how I can change it back.
And if I run it as
game=hughes2020['game_date'] #.astype(np.int64)
# Skipping to the next part, as an example
elif name[i] == "Curveball":
    if c < curve:
        xcurve[c] = totalx[i]
        ycurve[c] = totaly[i]
        cudate[c] = game[i]
        c += 1
It gives me a
TypeError: float() argument must be a string or a number, not 'Timestamp'
To convert the integer back to a datetime, use pd.to_datetime():
df = pd.DataFrame(data=[1.5964128e+18], columns = ['t'])
df['t2'] = pd.to_datetime(df['t'])
t t2
0 1.596413e+18 2020-08-03
However, a better solution would be to convert the dates at the time of CSV reading (as #sinanspd correctly pointed out): use parse_dates and the related options in pd.read_csv(). The function manual is here
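Putting both halves of that together, a sketch using a hypothetical two-row slice of Hughes2020Test.csv (the real file's other columns are omitted):

```python
import io
import pandas as pd

# Hypothetical two-row slice of Hughes2020Test.csv
csv = io.StringIO("game_date,pitch_name\n2020-08-03,Curveball\n2020-08-04,Fastball\n")
df = pd.read_csv(csv, parse_dates=["game_date"])

# The column stays datetime64; no .astype(np.int64) round-trip is needed
print(df["game_date"].iloc[0].strftime("%Y-%m-%d"))  # → 2020-08-03

# And an integer-nanosecond value can always be converted back
back = pd.to_datetime(1.5964128e+18)
```

Keeping the column as datetime64 throughout avoids ever seeing the epoch-nanosecond integers in the first place.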
This question has been asked several times here and I checked most of them, but couldn't figure out how to deal with it.
I read a CSV file and try to convert its values to float as follows:
testdataframe = pd.read_csv(r'H:\myCSVfile.csv')
testdataset = testdataframe.values
testdataset = testdataset.astype('float32')
I get this error: ValueError: could not convert string to float: '2020-08-05 22:45:00'
here is testdataframe:
array([['2020-08-05 22:45:00', 5.670524],
['2020-08-05 23:00:00', 5.6840434],
['2020-08-05 23:15:00', 5.6911097],
['2020-08-05 23:30:00', 5.6869917],
['2020-08-05 23:45:00', 5.6786237],
['2020-08-06 00:00:00', 5.6710806]], dtype=object)
Thanks in advance for your help.
As #John Gordon correctly mentioned, that value is a date/time string.
You should apply astype(float) only to numeric columns. However, if you still want to apply it across the whole frame, here is the logic to ignore the errors:
df=pd.DataFrame({"A":[1.2,'1.2','a'],"B":['2020-10-2 10:00:00','2020-10-2 11:00:00','2020-10-2 12:00:00']})
df.astype(float, errors='ignore')
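A more robust alternative to blanket `astype` is converting each column to the type it actually holds; a small sketch with the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1.2, "1.2", "a"],
                   "B": ["2020-10-2 10:00:00", "2020-10-2 11:00:00",
                         "2020-10-2 12:00:00"]})

# Convert each column according to its contents instead of casting everything
df["A"] = pd.to_numeric(df["A"], errors="coerce")  # the stray 'a' becomes NaN
df["B"] = pd.to_datetime(df["B"])                  # datetimes stay datetimes
```

This way the numeric column comes out as float64 and the timestamp column as datetime64, with no ValueError along the way.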
I don't understand how the loadtxt method of numpy works. I have read some questions/answers on this website, but it's still not clear to me.
I have a file 'data.txt' which is :
WEIGHT,DAY
75.1,16/10/2018
75.2,17/10/2018
...
My code is :
def parsetime(v):
    print(type(v))
    print(v)
    return np.datetime64(
        datetime.strptime(v, '%d/%m/%Y')
    )

data = np.loadtxt('masse.txt', delimiter=',', usecols=(0, 1), converters={1: parsetime}, skiprows=1)
But it doesn't work correctly, because it passes the function parsetime a bytes object and not a string:
<class 'bytes'>
b'16/10/2018'
I just want an np.array with an integer in the first column and a date in the second column.
I'm a bit lost.
Thanks a lot in advance,
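For what it's worth, a sketch of one way this can be made to work: decode the bytes defensively inside the converter (older numpy passes bytes, newer versions pass str), and use a structured dtype so the float and the date can live in one array. The file is recreated inline here so the example is self-contained:

```python
import numpy as np
from datetime import datetime

# Recreate a small sample of the file so the sketch is self-contained
with open("masse.txt", "w") as f:
    f.write("WEIGHT,DAY\n75.1,16/10/2018\n75.2,17/10/2018\n")

def parsetime(v):
    # Older numpy versions pass bytes to converters; newer ones pass str,
    # so decode defensively before parsing
    if isinstance(v, bytes):
        v = v.decode("ascii")
    # Re-emit the date in ISO form so numpy can store it as a string field
    return datetime.strptime(v, "%d/%m/%Y").strftime("%Y-%m-%d")

# A structured dtype keeps the float weight and the date string side by side
data = np.loadtxt("masse.txt", delimiter=",", skiprows=1,
                  converters={1: parsetime},
                  dtype=[("weight", "f8"), ("day", "U10")])

dates = data["day"].astype("datetime64[D]")  # now a proper datetime column
```

Note the weights here are floats, not integers, so the first field is "f8"; the ISO round-trip through a string field sidesteps storing datetime64 directly in loadtxt.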
I read in a csv file.
govselldata = pd.read_csv('govselldata.csv', dtype={'BUS_LOC_ID': str})
#or by this
#govselldata = pd.read_csv('govselldata.csv')
I have values in string format.
govselldata.dtypes
a int64
BUS_LOC_ID object
But they are not like this, '255048925478501030', but rather scientific, like 2.55048925478501e+17.
How do I convert it to '255048925478501030'?
Edit: Using float() did not work. This could be due to some white space.
govselldata['BUS_LOC'] = govselldata['BUS_LOC_ID'].map(lambda x: float(x))
ValueError: could not convert string to float:
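For what it's worth, the scientific-notation form usually means the file itself already stores the value that way (e.g. it was saved out of Excel as a float), so dtype=str faithfully reads the scientific string. A float64 carries only ~15-16 significant digits, so the original trailing digits (the ...030) are unrecoverable; a minimal sketch of what can still be done:

```python
s = "2.55048925478501e+17"  # what the CSV actually contains

# Round-trip through float to get a plain integer string; digits beyond
# float64 precision (e.g. the original ...030) are already lost
recovered = str(int(float(s.strip())))  # .strip() guards against white space
```

If the exact 18-digit IDs matter, the fix has to happen upstream, by exporting the column as text before it is ever stored as a float.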
I have a large csv file containing some bus network information.
The stop codes are made of a large number with a certain letter at the end; however, some of them are only numbers. When I read them into pandas, the large numbers end up in scientific notation, like
code_o lat_o lon_o code_d
490016444HN 51.56878 0.1811568 490013271R
490013271R 51.57493 0.1781319 490009721A
490009721A 51.57708 0.1769355 490010407C
490010407C 51.57947 0.1775409 490011659G
490011659G 51.5806 0.1831088 490009810M
490009810M 51.57947 0.1848733 490014448S
490014448S 51.57751 0.185111 490001243Y
490001243Y 51.57379 0.1839945 490013654S
490013654S 51.57143 0.184776 490013482E
490013482E 51.57107 0.187039 490015118E
490015118E 51.5724 0.1923417 490011214E
490011214E 51.57362 0.1959939 490006980E
490006980E 51.57433 0.1999537 4.90E+09
4.90E+09 51.57071 0.2087701 490003049E
490003049E 51.5631 0.2146196 490004001A
490004001A 51.56314 0.2165552 490015350F
Their type is object; however, I need them to be plain numbers in order to join against other tables.
Since the column is not 'int' or 'float', I cannot modify the whole column at once.
Any suggestions?
I attached the file from dropbox
https://www.dropbox.com/s/jhbxsncd97rq1z4/gtfs_OD_links_L.csv?dl=0
IIUC, try forcing object type for the code_d column on import:
import numpy as np
import pandas as pd
df = pd.read_csv('your_original_file.csv', dtype={'code_d': 'object'})
You can then parse that column, discarding the letter at the end and casting the result to integer type:
df['code_d'] = df['code_d'].str[:-1].astype(np.int64)
Keep it simple: df = pd.read_csv('myfile.csv', dtype=str) will read everything in as strings. Or, as #Alberto posted earlier, to specify only that column: df = pd.read_csv('myfile.csv', dtype={'code_o': str})
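Combining the two answers, here is a self-contained sketch using an inline two-row sample standing in for gtfs_OD_links_L.csv (same column layout assumed). A regex extract is used instead of `.str[:-1]`, since the sample shows codes with no letter suffix and two-letter suffixes like 'HN', where dropping exactly one character would break:

```python
import io
import pandas as pd

# Self-contained sample standing in for gtfs_OD_links_L.csv (layout assumed)
csv = io.StringIO(
    "code_o,lat_o,lon_o,code_d\n"
    "490016444HN,51.56878,0.1811568,490013271R\n"
    "490013271R,51.57493,0.1781319,490009721A\n"
)
df = pd.read_csv(csv, dtype={"code_o": str, "code_d": str})

# Extract just the leading digits; this copes with zero-, one-, or
# two-letter suffixes, where str[:-1] would fail
df["code_d_num"] = df["code_d"].str.extract(r"^(\d+)", expand=False).astype("int64")
```

The numeric column can then be used as a join key while the original coded strings are kept intact.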