I would like to compare 2 datetime columns in pandas and select the rows where they do not have the same month. I cannot find a good source for working with datetime fields in this way. I attempted the standard gdf.loc[(gdf[Field1] != gdf[Field2])], but when I add .month, .dt.month, or pd.to_datetime/pd.DatetimeIndex it gives me a "'Series' object has no attribute" error. The 2 columns are datetime objects.
import geopandas as gpd, pandas as pd, datetime
Field1 = 'TrackStartTime'
Field2 = 'TrackEndTime'
gdf = gpd.read_file('Tracks.gpkg', driver='GPKG', layer='segments')
gdf.loc[~(gdf[Field1] == gdf[Field2])]
print(gdf[[Field1, Field2]])
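A minimal sketch of one way to do the month comparison, assuming both columns really are (or can be coerced to) datetime64 dtype; if they were read back as plain strings, the explicit pd.to_datetime step below avoids the ".dt accessor" error:
import geopandas as gpd
import pandas as pd

Field1 = 'TrackStartTime'
Field2 = 'TrackEndTime'

gdf = gpd.read_file('Tracks.gpkg', driver='GPKG', layer='segments')

# make sure both columns are datetime64 (no-op if they already are)
gdf[Field1] = pd.to_datetime(gdf[Field1])
gdf[Field2] = pd.to_datetime(gdf[Field2])

# keep only the rows whose start and end fall in different months
diff_month = gdf.loc[gdf[Field1].dt.month != gdf[Field2].dt.month]
print(diff_month[[Field1, Field2]])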
TimeSeries = pandas.Series(df['time_col'].values.tolist())
pandas.to_timedelta(TimeSeries).mean()
After taking mean() I need to convert the result to the Timestamp datatype so I can add it to the DataFrame.
The lines below are not working:
pandas.to_timestamp(pandas.to_timedelta(TimeSeries).mean())
pandas.Timestamp(pandas.to_timedelta(TimeSeries).mean())
To convert a timedelta to a timestamp it is necessary to add it to some datetime, e.g.:
import pandas as pd
out = pd.to_datetime('2000-01-01') + pd.to_timedelta(TimeSeries).mean()
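For instance, with a small series of timedeltas (the same example values as in the next answer), the mean is kept as an offset from whatever anchor date you choose:
import pandas as pd

TimeSeries = pd.Series(['00:00:02.285932', '00:00:11.366717', '00:00:11.367594'])
out = pd.to_datetime('2000-01-01') + pd.to_timedelta(TimeSeries).mean()
# Timestamp('2000-01-01 00:00:08.340081')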
You could take total_seconds() of the mean timedelta and feed it to pd.Timestamp with the correct unit specified:
import pandas as pd
# example Series:
TimeSeriesDelta = pd.Series(pd.to_timedelta(['00:00:02.285932',
'00:00:11.366717',
'00:00:11.367594']))
timestamp = pd.Timestamp(TimeSeriesDelta.mean().total_seconds(), unit='s')
# Timestamp('1970-01-01 00:00:08.340081')
Note that this will add a date, 1970-01-01.
I want to calculate the number of business days between two dates and create a new pandas dataframe column with those days. I also have a holiday calendar and I want to exclude dates in the holiday calendar while making my calculation.
I looked around and I saw the numpy busday_count function as a useful tool for it. The function counts the number of business days between two dates and also allows you to include a holiday calendar.
I also looked around and saw the holidays package, which gives the holiday dates for different countries. I thought it would be great to feed this holiday calendar into the numpy function.
Then I proceeded as follows:
import pandas as pd
import numpy as np
import holidays
from datetime import datetime, timedelta, date
df = {'start': ['2019-01-02', '2019-02-01'],
      'end': ['2020-01-04', '2020-03-05']}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UnitedKingdom')
start_date = [d.date for d in df['start']]
end_date = [d.date for d in df['end']]
holidays_numpy = holidays_country[start_date:end_date]
df['business_days'] = np.busday_count(begindates=start_date,
                                      enddates=end_date,
                                      holidays=holidays_numpy)
When I run this code, it throws this error: TypeError: Cannot convert type '<class 'list'>' to date
When I looked further, I noticed that start_date and end_date are lists, and that might be why the error was occurring.
I then changed the holidays_numpy variable to holidays_numpy = holidays_country['2019-01-01':'2019-12-31'] and it worked.
However, since my dates are different for each row in my dataframe, is there a way to set the two arguments in my holiday_numpy variable to select corresponding values (just like the zip function) each from start_date and end_date?
I'm also open to alternative ways of solving this problem.
This should work:
import pandas as pd
import numpy as np
import holidays
df = {'start': ['2019-01-02', '2019-02-01'],
      'end': ['2020-01-04', '2020-03-05']}
df = pd.DataFrame(df)
holidays_country = holidays.CountryHoliday('UK')
def f(x):
    return np.busday_count(x[0], x[1], holidays=holidays_country[x[0]:x[1]])
df['business_days'] = df[['start','end']].apply(f,axis=1)
df.head()
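As an aside, np.busday_count only counts holidays that actually fall inside each [begindates, enddates) interval, so you can also build one holiday list covering the whole span and make a single vectorized call instead of a row-wise apply. A sketch, assuming the holidays package's years argument and the same df as above:
import numpy as np
import pandas as pd
import holidays

df = pd.DataFrame({'start': ['2019-01-02', '2019-02-01'],
                   'end': ['2020-01-04', '2020-03-05']})

# populate the UK calendar for every year the data can touch
uk_holidays = holidays.CountryHoliday('UnitedKingdom', years=[2019, 2020])
holiday_dates = np.array(sorted(uk_holidays.keys()), dtype='datetime64[D]')

# one vectorized call; holidays outside a given row's range are simply ignored
df['business_days'] = np.busday_count(
    df['start'].values.astype('datetime64[D]'),
    df['end'].values.astype('datetime64[D]'),
    holidays=holiday_dates)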
I have a DataFrame of dates and would like to filter for a particular date +- some days.
import pandas as pd
import numpy as np
import datetime
dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="D")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])
If I select, let's say, date 2010-08-03 and a window of 5 days, the output would be similar to:
>>>
Power
2010-07-29 713.108020
2010-07-30 1055.109543
2010-07-31 951.159099
2010-08-01 1350.638983
2010-08-02 453.166697
2010-08-03 1066.859386
2010-08-04 1381.900717
2010-08-05 107.489179
2010-08-06 1195.945723
2010-08-07 1209.762910
2010-08-08 349.554492
N.B.: the original problem I am trying to solve is described under Python: Filter DataFrame in Pandas by hour, day and month grouped by year
The function I created to accomplish this is filterDaysWindow and can be used as follows:
import pandas as pd
import numpy as np
import datetime
dates = pd.date_range(start="08/01/2009",end="08/01/2012",freq="D")
df = pd.DataFrame(np.random.rand(len(dates), 1)*1500, index=dates, columns=['Power'])
def filterDaysWindow(df, date, daysWindow):
    """
    Filter a DataFrame by a date within a window of days

    @type df: DataFrame
    @param df: DataFrame of dates
    @type date: datetime.date
    @param date: date to focus on
    @type daysWindow: int
    @param daysWindow: number of days to perform the days window selection
    @rtype: DataFrame
    @return: a DataFrame with dates within date +- daysWindow
    """
    dateStart = date - datetime.timedelta(days=daysWindow)
    dateEnd = date + datetime.timedelta(days=daysWindow)
    return df[dateStart:dateEnd]
df_filtered = filterDaysWindow(df, datetime.date(2010,8,3), 5)
print(df_filtered)
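As a variation, the same window selection can be written with pandas Timedelta arithmetic on the DatetimeIndex instead of datetime.timedelta; a sketch, assuming the same df as above:
import pandas as pd

center = pd.Timestamp('2010-08-03')
window = pd.Timedelta(days=5)

# boolean mask keeps the rows whose index falls within center +- window
df_filtered = df[(df.index >= center - window) & (df.index <= center + window)]
print(df_filtered)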
I have a csv with data like this:
id   names          timestamp                    is_valid
1    name:surname   2016-06-09 23:29:50.083093   True
I need to select rows based on this condition: is_valid is True and more than 24 hours have passed since timestamp. So with is_valid True and a current time of 2016-06-10 23:29:50.083093, the row above would pass the condition.
How can I achieve this? I know how to apply the first condition:
from datetime import datetime, timedelta
import pandas as pd
from dateutil import parser
df=pd.read_csv('acc.csv')
user=(df[df['is_valid']==True])
I can even print the timestamp, parse it and compare it with datetime.now(), but doing that row by row is definitely a terrible thing to do.
try this:
from datetime import datetime, timedelta
import pandas as pd
from dateutil import parser
df = pd.read_csv('acc.csv')
tidx = pd.to_datetime(df['timestamp'].values)
past_24 = (datetime.now() - tidx).total_seconds() > 60 * 60 * 24
user = df[df['is_valid'] & past_24]
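Alternatively, if you let read_csv parse the column up front you can stay entirely in pandas; a sketch, assuming the CSV column is literally named timestamp:
import pandas as pd

# parse the timestamp column while reading the file
df = pd.read_csv('acc.csv', parse_dates=['timestamp'])

# rows that are valid and whose timestamp is more than 24 hours old
older_than_24h = pd.Timestamp.now() - df['timestamp'] > pd.Timedelta(hours=24)
user = df[df['is_valid'] & older_than_24h]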
Using Python, how can I extract the date values from a DateTime column?
Like this example using SQL:
SELECT
CONVERT(DATE, GETDATE()) date;
Given a string (datestring in this example) that represents a value of that column, you can use the strptime method of the datetime module:
import datetime as dt
datestring = "2016-02-05 00:48:23"
date = dt.datetime.strptime(datestring, "%Y-%m-%d %H:%M:%S").date()
Then you can have access to the day, month and year as follows:
day = date.day
month = date.month
year = date.year
You could use a Pandas DataFrame and read the sql table using the pandas.read_sql() function, given your SQL connection:
import pandas as pd
df = pd.read_sql('select referrer_col, timestamp_col from my_table', your_connection)
Then convert the timestamp column using Series.dt.date
df['date_only'] = df['timestamp_col'].dt.date
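If the timestamp column comes back as plain strings rather than datetime64 (for example from read_csv instead of read_sql), converting it first avoids the ".dt accessor" error; a short sketch with made-up values:
import pandas as pd

df = pd.DataFrame({'timestamp_col': ['2016-02-05 00:48:23', '2016-02-06 10:15:00']})

# ensure the column is datetime64 before using the .dt accessor
df['timestamp_col'] = pd.to_datetime(df['timestamp_col'])
df['date_only'] = df['timestamp_col'].dt.date
print(df)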