I am new to pandas and I am trying to convert a Time column into datetime format. Unfortunately I get the time with an added date, which is not my intention.
My DataFrame is the following:
After running data['Time'] = pd.to_datetime(data['Time'], format = '%H:%M:%S') I get the following:
What am I doing wrong?
Try converting to a timedelta instead; a timedelta carries only the hours/minutes/seconds, with no date attached:

import pandas as pd

data = {'time':['05:05:30','06:04:23','03:40:45','12:05:30'], 'value':[2,3,5,7]}
data = pd.DataFrame(data)
data['TIME'] = pd.to_timedelta(data['time'])

Running data.info() then shows TIME with the desired dtype:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   time    4 non-null      object
 1   value   4 non-null      int64
 2   TIME    4 non-null      timedelta64[ns]
dtypes: int64(1), object(1), timedelta64[ns](1)
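If you only need the clock time for display rather than for arithmetic, here is a hedged sketch of two alternatives (the helper column names `time_only` and `time_str` are illustrative, not from the original post): parse with pd.to_datetime, then keep either datetime.time objects or formatted strings.

```python
import pandas as pd

data = pd.DataFrame({'time': ['05:05:30', '06:04:23', '03:40:45', '12:05:30'],
                     'value': [2, 3, 5, 7]})

# Parse with an explicit format; pandas fills in a dummy date (1900-01-01)
parsed = pd.to_datetime(data['time'], format='%H:%M:%S')

# Keep only the clock time as datetime.time objects (dtype becomes object)
data['time_only'] = parsed.dt.time

# Or format back to a string if you only need it for display
data['time_str'] = parsed.dt.strftime('%H:%M:%S')
print(data)
```

Note that both variants lose the datetime64 dtype; keep the parsed column itself if you still need date arithmetic.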
My dataframe ddg looks like this:

   price  quantity                 high time
0   10.4         3 2021-11-08 14:26:00-05:00

The datatype of the high time column is datetime64[ns, America/New_York].
I want the high time to be only 14:26:00 (getting rid of 2021-11-08 and -05:00), but I got an error when using the code below:

ddg['high_time'] = ddg['high_time'].dt.strftime('%H:%M')
I think it's because you don't have the right column name:
# Your code
>>> ddg['high_time'].dt.strftime('%H:%M')
...
KeyError: 'high_time'
# With right column name
>>> ddg['high time'].dt.strftime('%H:%M')
0 14:26
Name: high time, dtype: object
# My dataframe:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   price      1 non-null      float64
 1   quantity   1 non-null      int64
 2   high time  1 non-null      datetime64[ns, America/New_York]
dtypes: datetime64[ns, America/New_York](1), float64(1), int64(1)
memory usage: 152.0 bytes
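A hedged alternative is to normalize the column names once, so spaced names like 'high time' can never cause this kind of KeyError again. This sketch reconstructs the asker's single-row frame from the values shown in the question (the construction itself is an assumption):

```python
import pandas as pd

# Reconstruction of the single-row frame from the question
ddg = pd.DataFrame({
    'price': [10.4],
    'quantity': [3],
    'high time': pd.to_datetime(['2021-11-08 14:26:00']).tz_localize('America/New_York'),
})

# Replace spaces in every column name, so 'high time' becomes 'high_time'
ddg.columns = ddg.columns.str.replace(' ', '_')

# Now the original code works; strftime drops the date and the UTC offset
ddg['high_time'] = ddg['high_time'].dt.strftime('%H:%M:%S')
print(ddg['high_time'])
```

Renaming up front keeps the rest of the script free of bracketed, space-containing names.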
I am trying to convert all the cell values (except the date) to floating-point numbers, but I'm getting an error:
Can only use .str accessor with string values!
Here is my code:
df['Market Cap_'+str(coin)] = df['Market Cap_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Volume_'+str(coin)] = df['Volume_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Open_'+str(coin)] = df['Open_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Close_'+str(coin)] = df['Close_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
here is the output of df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 1 to 30
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Date_ETHEREUM        30 non-null     datetime64[ns]
 1   Market Cap_ETHEREUM  30 non-null     float64
 2   Volume_ETHEREUM      30 non-null     float64
 3   Open_ETHEREUM        30 non-null     float64
 4   Close_ETHEREUM       30 non-null     object
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.4+ KB
Here is an image of my dataframe:
Note: coin is just a string, added dynamically from the URL for each particular coin table.
I would appreciate any help or an alternative solution.
You have a $ sign in the values, so they cannot be parsed as floats. Remove it before converting the column to a float type.
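As a hedged sketch of a more defensive cleanup (the toy values and the errors='coerce' choice are mine, not from the original post): force the column to strings first so the .str accessor is always valid, strip the currency characters literally, and let pd.to_numeric absorb anything unparseable.

```python
import pandas as pd

# Toy frame standing in for the asker's data (values are illustrative)
df = pd.DataFrame({'Close_ETHEREUM': ['$1,234.56', '$2,000.00', '$987.65']})

col = 'Close_ETHEREUM'
# astype(str) guarantees the .str accessor works even on mixed/object columns;
# regex=False makes '$' a literal character, not a regex end-of-string anchor
cleaned = (df[col].astype(str)
                 .str.replace(',', '', regex=False)
                 .str.replace('$', '', regex=False))

# errors='coerce' turns anything unparseable into NaN instead of raising
df[col] = pd.to_numeric(cleaned, errors='coerce')
print(df[col])
```

The regex=False flag matters: with regex matching, '$' is an anchor and the replace silently leaves the dollar signs in place.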
So I have two spreadsheets in CSV format that I've been provided with for my masters uni course.
Part of the processing of the data involved merging the files, followed by running some reports off the merged content using dates. This I've completed successfully, however...
The current date format, I'm led to believe, is epoch; for example, the first date on the spreadsheet is 43471.
So, firstly I ran this code to check what format it was looking at:

df = pd.read_csv('bookloans_merged.csv')
df.info()

This returned the result:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1958 entries, 0 to 1957
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Number          1958 non-null   int64
 1   Title           1958 non-null   object
 2   Author          1854 non-null   object
 3   Genre           1958 non-null   object
 4   SubGenre        1958 non-null   object
 5   Publisher       1845 non-null   object
 6   member_number   1958 non-null   int64
 7   date_of_loan    1958 non-null   int64
 8   date_of_return  1958 non-null   int64
dtypes: int64(4), object(5)
memory usage: 137.8+ KB
I then ran the following code:
# parsing date values
df = pd.read_csv('bookloans_merged.csv')
df[['date_of_loan','date_of_return']] = df[['date_of_loan','date_of_return']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')
df.to_csv('bookloans_merged_dates.csv', index=False)
Running this again:

df = pd.read_csv('bookloans_merged_dates.csv')
df.info()
I get:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1958 entries, 0 to 1957
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   Number          1958 non-null   int64
 1   Title           1958 non-null   object
 2   Author          1854 non-null   object
 3   Genre           1958 non-null   object
 4   SubGenre        1958 non-null   object
 5   Publisher       1845 non-null   object
 6   member_number   1958 non-null   int64
 7   date_of_loan    1958 non-null   datetime64[ns]
 8   date_of_return  1958 non-null   datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(5)
memory usage: 137.8+ KB
So I can see that date_of_loan and date_of_return are now datetime64.
Trouble is, all the dates are now showing as 1970-01-01 00:00:00.000043471.
How do I get to 01/03/2019 format, please?
Thanks
David.
So I managed to get this figured out, with a little help. The dates are Excel-style serial day numbers. The premise is this:

from datetime import datetime

excel_date = 43139
# Excel counts 1900-01-01 as day 1 and wrongly treats 1900 as a leap year,
# hence the "- 2" correction
d_time = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
t_time = d_time.timetuple()
print(d_time)
print(t_time)

And here is how I used that premise in my program (an origin of 1899-12-30 applies the same two-day correction):

import pandas as pd

df1 = pd.DataFrame(data_frame, columns=['Title','Author','date_of_loan'])
df1['date_of_loan'] = pd.to_datetime(df1['date_of_loan'], unit='d', origin=pd.Timestamp('1899-12-30'))
df1 = df1.sort_values('date_of_loan', ascending=True)
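The question also asked for an 01/03/2019 display format, which conversion to datetime64 alone does not produce. One possible sketch (a day/month order is assumed from the 01/03/2019 example, and the serial numbers here are illustrative):

```python
import pandas as pd

# Excel-style serial day numbers, as in the question (43471 falls in early 2019)
df = pd.DataFrame({'date_of_loan': [43471, 43525]})

# Origin 1899-12-30 absorbs Excel's two-day offset
# (1-based counting plus the phantom 1900 leap day)
df['date_of_loan'] = pd.to_datetime(df['date_of_loan'], unit='D',
                                    origin=pd.Timestamp('1899-12-30'))

# Render as dd/mm/yyyy strings for display (yields object dtype, not datetime)
df['loan_display'] = df['date_of_loan'].dt.strftime('%d/%m/%Y')
print(df)
```

Keeping the datetime64 column and adding a separate display column preserves sorting and date arithmetic on the original.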
I am looking to parse data with multiple timezone offsets in a single column. I am using the pd.to_datetime function.
df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00', '2019-05-21 12:15:00-07:00']})
df['timestamp'] = pd.to_datetime(df.timestamp)
df.info()
This results in:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   timestamp  2 non-null      object
dtypes: object(1)
memory usage: 144.0+ bytes
I did some testing and noticed that the same does not happen when the offsets are all the same:
df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00', '2019-05-21 12:15:00-06:00']})
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   timestamp  2 non-null      datetime64[ns, pytz.FixedOffset(-360)]
dtypes: datetime64[ns, pytz.FixedOffset(-360)](1)
memory usage: 144.0 bytes
If this behavior is confirmed, it has direct implications for the datetime accessors, and it also breaks some compatibility (or assumed compatibility) with libraries that perform conversions on the types. pd.to_datetime() is able to convert every element to a datetime.datetime, but libraries like pyarrow will apply a fixed tz offset to the column.
Based on many questions on StackOverflow (ex: Convert pandas column with multiple timezones to single timezone) this was not the behavior of pandas in previous versions.
I am on pandas 1.2.4 (I updated from 1.2.2 that shows the same). Python 3.7.9.
Should I report this as a GitHub issue?
I'd suggest keeping the original timestamp column with the offset (so as not to lose that info) and working with UTC (utc=True). If you know the time zone that produced that offset in the data, you can also tz_convert.
Example, a cleaned-up version of the linked question:
import pandas as pd
# sample data
df = pd.DataFrame({'timestamp':['2019-05-21 12:00:00-06:00',
'2019-02-21 12:15:00-07:00']})
# assuming we know the origin time zone
zone = 'America/Boise'
# skip the .dt.tz_convert(zone) part if you don't have the specific zone
df['datetime'] = pd.to_datetime(df['timestamp'], utc=True).dt.tz_convert(zone)
df
timestamp datetime
0 2019-05-21 12:00:00-06:00 2019-05-21 12:00:00-06:00
1 2019-02-21 12:15:00-07:00 2019-02-21 12:15:00-07:00
df['datetime']
0 2019-05-21 12:00:00-06:00
1 2019-02-21 12:15:00-07:00
Name: datetime, dtype: datetime64[ns, America/Boise]
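If you don't know the source time zone and skip the tz_convert step, here is a minimal sketch (assumptions: same sample data as above; the 'utc' column name is mine) of what utc=True alone gives you:

```python
import pandas as pd

df = pd.DataFrame({'timestamp': ['2019-05-21 12:00:00-06:00',
                                 '2019-02-21 12:15:00-07:00']})

# utc=True resolves every per-row offset into one tz-aware datetime64 dtype
df['utc'] = pd.to_datetime(df['timestamp'], utc=True)
print(df['utc'])
```

Both rows land in a single datetime64[ns, UTC] column, so the .dt accessors work again; without utc=True the mixed offsets leave the column as object.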