I have a CSV that looks like this:
time,result
1308959819,1
1379259923,2
1318632821,3
1375216682,2
1335930758,4
The times are Unix timestamps (seconds since the epoch). I want to extract the hour from each time and group the data by those hour values.
I tried
times = pd.to_datetime(df.time, unit='s')
or even
times = pd.DataFrame(pd.to_datetime(df.time, unit='s'))
but in both cases I got an error with
times.hour
>>>AttributeError: 'DataFrame' object has no attribute 'hour'
You're getting that error because Series and DataFrames don't have an hour attribute. You can access the information you want through the .dt datetime accessor:
>>> times = pd.to_datetime(df.time, unit='s')
>>> times
0 2011-06-24 23:56:59
1 2013-09-15 15:45:23
2 2011-10-14 22:53:41
3 2013-07-30 20:38:02
4 2012-05-02 03:52:38
Name: time, dtype: datetime64[ns]
>>> times.dt
<pandas.tseries.common.DatetimeProperties object at 0xb5de94c>
>>> times.dt.hour
0 23
1 15
2 22
3 20
4 3
dtype: int64
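Since the end goal is to group by those hour values, here's a minimal sketch (assuming the sample above is saved as data.csv, and using the mean of result purely as an example aggregation):
import pandas as pd

df = pd.read_csv('data.csv')  # the sample file from the question; the path is an assumption
hours = pd.to_datetime(df.time, unit='s').dt.hour
# group rows by hour of day; mean() is just one possible aggregation
print(df.groupby(hours)['result'].mean())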
You can also use the built-in datetime module to do this. Note that datetime.fromtimestamp converts to your local timezone, whereas pd.to_datetime(..., unit='s') gives UTC, so the resulting hours can differ.
import datetime

# apply fromtimestamp element-wise over the column
hours = df.time.apply(lambda t: datetime.datetime.fromtimestamp(t).hour)
I have the following data set output (shown below) that was produced by the following code:
df_EVENT5_5['dtin'] = pd.to_datetime(df_EVENT5_5['dtin'])
df_EVENT5_5['age'] = df_EVENT5_5['dtin'].apply(dt.datetime.date) - df_EVENT5_5['dtbuilt'].apply(dt.datetime.date)
id age
1 6252 days, 0:00:00
2 1800 days, 0:00:00
3 5873 days, 0:00:00
In the above data set, after running dtypes on the data frame, age appears to be an object.
I want to convert the "age" column into an integer datatype that only has the value of days. Below is my desired output:
id age
1 6252
2 1800
3 5873
I tried the following code:
df_EVENT5_5['age_no_days'] = df_EVENT5_5['age'].dt.total_seconds() / (24 * 60 * 60)
Below is the error:
AttributeError: Can only use .dt accessor with datetimelike values
The fact that you are getting an object column suggests to me that there are some values that can't be interpreted as proper timedeltas. If that's the case, I would use pd.to_timedelta with the argument errors='coerce', then call dt.days:
df['age'] = pd.to_timedelta(df['age'], errors='coerce').dt.days
>>> df
id age
0 1 6252
1 2 1800
2 3 5873
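For reference, a self-contained sketch of that approach; the mixed column below is a hypothetical reconstruction, just to show what errors='coerce' does with an unparseable value:
import pandas as pd
from datetime import timedelta

df = pd.DataFrame({'id': [1, 2, 3],
                   'age': [timedelta(days=6252), timedelta(days=1800), 'bad value']})
# unparseable entries become NaT instead of raising; .dt.days then extracts whole days
df['age'] = pd.to_timedelta(df['age'], errors='coerce').dt.days
print(df)  # the bad value becomes NaN, so the column is float rather than int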
I have microseconds that I want to essentially truncate from a pandas column. I tried something like analyze_me['how_long_it_took_to_order'] = analyze_me['how_long_it_took_to_order'].apply(lambda x: x.replace(microsecond=0)), but this error came up: replace() takes no keyword arguments.
For example: I want 00:19:58.582052 to become 00:19:58 or 00:19:58.58
I think you need to convert your strings into timedeltas with pd.to_timedelta and then take advantage of the excellent dt accessor with the floor method, which truncates to a frequency given as a string. Here are the first two rows of your data:
df['how_long_it_took_to_order'] = pd.to_timedelta(df['how_long_it_took_to_order'])
df['how_long_it_took_to_order'].dt.floor('s')
0 00:19:58
1 00:25:09
You can also floor to the hundredth of a second:
df['how_long_it_took_to_order'].dt.floor('10ms')
0 00:19:58.580000
1 00:25:09.100000
Here I create a Series of timedeltas and then use the dt accessor with the floor method to truncate down to the nearest second.
d = pd.timedelta_range(0, periods=6, freq='644257us')
s = pd.Series(d)
s
0 00:00:00
1 00:00:00.644257
2 00:00:01.288514
3 00:00:01.932771
4 00:00:02.577028
5 00:00:03.221285
dtype: timedelta64[ns]
Now truncate
s.dt.floor('s')
0 00:00:00
1 00:00:00
2 00:00:01
3 00:00:01
4 00:00:02
5 00:00:03
dtype: timedelta64[ns]
If you want to truncate to the nearest hundredth of a second do this:
s.dt.floor('10ms')
0 00:00:00
1 00:00:00.640000
2 00:00:01.280000
3 00:00:01.930000
4 00:00:02.570000
5 00:00:03.220000
dtype: timedelta64[ns]
Your how_long_it_took_to_order column seems to be of string (object) dtype.
So try this:
analyze_me['how_long_it_took_to_order'] = \
analyze_me['how_long_it_took_to_order'].str.split('.').str[0]
or:
analyze_me['how_long_it_took_to_order'] = \
analyze_me['how_long_it_took_to_order'].str.replace(r'(\.\d{2})\d+', r'\1', regex=True)
for "centiseconds", like: 00:19:58.58
I needed this for a simple script where I wasn't using Pandas, and came up with a simple hack which should work everywhere.
age = age - timedelta(microseconds=age.microseconds)
where age is my timedelta object.
You can't directly modify the microseconds attribute of a timedelta object because it's immutable, but you can, of course, replace the whole object with a new one.
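A minimal plain-Python sketch of that hack:
from datetime import timedelta

age = timedelta(minutes=19, seconds=58, microseconds=582052)
# subtracting the sub-second part yields a new, truncated timedelta
age = age - timedelta(microseconds=age.microseconds)
print(age)  # 0:19:58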
I have a Series containing datetime64[ns] elements called series, and would like to increment the months. I thought the following would work fine, but it doesn't:
series.dt.month += 1
The error is
ValueError: modifications to a property of a datetimelike object are not supported. Change values on the original.
Is there a simple way to achieve this without needing to redefine things?
First, I created an example time series:
import datetime
t = [datetime.datetime(2015, 4, 18, 23, 33, 58),
     datetime.datetime(2015, 4, 19, 14, 32, 8),
     datetime.datetime(2015, 4, 20, 18, 42, 44),
     datetime.datetime(2015, 4, 20, 21, 41, 19)]
import pandas as pd
df = pd.DataFrame(t,columns=['Date'])
Timeseries:
df
Out[]:
Date
0 2015-04-18 23:33:58
1 2015-04-19 14:32:08
2 2015-04-20 18:42:44
3 2015-04-20 21:41:19
Now for the increment part, you can use an offset:
df['Date'] + pd.DateOffset(days=30)
Output:
df['Date']+pd.DateOffset(days=30)
Out[66]:
0 2015-05-18 23:33:58
1 2015-05-19 14:32:08
2 2015-05-20 18:42:44
3 2015-05-20 21:41:19
Name: Date, dtype: datetime64[ns]
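Since the question was specifically about months, note that pd.DateOffset also accepts a months argument, which shifts by calendar months rather than a fixed number of days:
df['Date'] + pd.DateOffset(months=1)
Out[]:
0   2015-05-18 23:33:58
1   2015-05-19 14:32:08
2   2015-05-20 18:42:44
3   2015-05-20 21:41:19
Name: Date, dtype: datetime64[ns]
(Here the result happens to match days=30 only because April has 30 days.)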
I have a dataset like:
Date/Time Byte
0 2015-04-02 10:44:31 1
1 2015-04-02 10:44:21 10
2 2015-04-02 11:01:11 2
3 2015-04-02 11:01:21 20
I wish to print all rows related to:
2015-04-02 at 11h
I tried many different solutions, but got no results.
df is my DataFrame.
For instance, to print only the flows related to hour 11, I tried the following:
res = df.loc[df['stamp'].hour == 11]
With error:
AttributeError: 'Series' object has no attribute 'hour'
How can I extract all rows related to a specific hour?
How can I extract all rows related to a specific hour of a specific day?
Thanks, have a good day
Use pd.to_datetime() on your timestamps if they are stored as strings.
Then you can do
df[df['a_date_col'].apply(lambda x: x.hour) == 11]
Or you can use the .dt accessor:
df[df['a_date_col'].dt.hour == 11]
http://pandas.pydata.org/pandas-docs/dev/timeseries.html
http://pandas.pydata.org/pandas-docs/dev/basics.html#dt-accessor
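For the second part of the question (a specific hour of a specific day), you can combine two boolean masks. A sketch, assuming the column is called stamp and already has datetime64 dtype:
mask = (df['stamp'].dt.date == pd.Timestamp('2015-04-02').date()) \
       & (df['stamp'].dt.hour == 11)
res = df[mask]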
Accessing years within a dataframe in Pandas
Objective:
To create an index that accommodates a pre-existing set of price data from a CSV file. I can build an index using list comprehensions; done that way, the construction gives me a filtered list of length 86,772 when run over 1/3/2007-8/30/2012 at 42 times per day (i.e. 10-minute intervals). However, my price data coming from the CSV has length 62,034. Observe that the difference in length is due to data-cleaning issues.
That said, I am not sure how to overcome the apparent mismatch between the real data and this pre-built (list comp) dataframe.
Attempt:
Am I using the first two lines incorrectly?
data=pd.read_csv('___.csv', parse_dates={'datetime':[0,1]}).set_index('datetime')
dt_index = pd.DatetimeIndex([datetime.combine(i.date,i.time) for i in data.index])
ts = pd.Series(data.prices.values, dt_index)
Questions:
As I understand it, I should use 'combine' since I want the index construction to be completely informed by my csv file. And, 'combine' returns a new datetime object whose date components are equal to the given date object’s, and whose time components are equal to the given time object’s.
When I parse_dates, is it lumping the time and date together and considering it to be a 'date'?
Is there a better way to achieve the stated objective?
Traceback Error:
AttributeError: 'unicode' object has no attribute 'date'
You can write this neatly as follows:
ts = df.prices
Here's an example:
In [1]: df = pd.read_csv('prices.csv',
parse_dates={'datetime': [0,1]}).set_index('datetime')
In [2]: df # dataframe
Out[2]:
prices duty
datetime
2012-11-12 10:00:00 1 0
2012-12-12 10:00:00 2 0
2012-12-12 11:00:00 3 1
In [3]: df.prices # timeseries
Out[3]:
datetime
2012-11-12 10:00:00 1
2012-12-12 10:00:00 2
2012-12-12 11:00:00 3
Name: prices
In [4]: ts = df.prices
You can groupby date like so (similar to the groupby examples in the pandas docs):
In [5]: key = lambda x: x.date()
In [6]: df.groupby(key).sum()
Out[6]:
prices duty
2012-11-12 1 0
2012-12-12 5 1
In [7]: ts.groupby(key).sum()
Out[7]:
2012-11-12 1
2012-12-12 5
Where prices.csv contains:
date,time,prices,duty
11/12/2012,10:00,1,0
12/12/2012,10:00,2,0
12/12/2012,11:00,3,1
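As a side note on the parse_dates question above: merging the date and time columns this way already produces a proper DatetimeIndex, so the datetime.combine step is unnecessary. A quick check, assuming the prices.csv shown above:
import pandas as pd

df = pd.read_csv('prices.csv', parse_dates={'datetime': [0, 1]}).set_index('datetime')
print(type(df.index))  # <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
ts = df.prices         # already a time series; no datetime.combine needed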