I would like to be able to pass in text from a user or a file to filter a pandas DataFrame, and query seems like the best way to handle it. However, I have a datetime index and can't figure out a way to use timedeltas. I know I can filter dates with > or <, like
query_string = 'index < datetime.datetime(2020, 2, 20, 11, 8, 19, 615268)'
df.query(query_string)
and
date = datetime.datetime.now()
query_string = 'index < @date'
df.query(query_string)
What I want to do is get a relative date range like getting the last 10 seconds of entries
date = datetime.datetime.now()
query_string = 'index > @date - datetime.timedelta(seconds=10)'
df.query(query_string)
This fails, and I can't seem to find a way to do something like filter anything relative to a timestamp. Is there any other way to format it so I can add/subtract a time from a date using df.query()?
query does not support timedelta arithmetic inside the expression (checked with pandas 1.0.5). You can work around it with boolean indexing on the index:
df[df.index >= datetime.datetime.now() - datetime.timedelta(seconds=10)]
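If the filter still has to go through df.query(), one option is to do the date arithmetic in Python first and reference the result with the @ prefix. This is only a minimal sketch, assuming a DataFrame with a DatetimeIndex and no column named index; the sample data and the names now, cutoff and recent are made up for illustration:

```python
import datetime

import pandas as pd

# Hypothetical sample data: four rows stamped 30, 20, 5 and 1 seconds before "now".
now = datetime.datetime(2020, 2, 20, 11, 8, 19)
index = pd.DatetimeIndex([now - datetime.timedelta(seconds=s) for s in (30, 20, 5, 1)])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=index)

# Do the timedelta arithmetic outside the query string...
cutoff = now - datetime.timedelta(seconds=10)

# ...and reference the precomputed local variable with "@" inside it.
recent = df.query("index > @cutoff")
print(recent)
```

Only the comparison is left inside the query string, so user-supplied text never needs to contain timedelta arithmetic.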
Related
Okay, so I am relatively new to programming and this has me absolutely stumped. I'm scraping data from a website and the data changes every week. I want to run my scraping process once for each time the data changed, starting back on 09-09-2015 and running to the present.
I know how to do this easily by running through every number, like 0909, then 0910, then 0911, but that is not what I need, as it would send far too many pointless requests to the server.
Here is the format of the URL
http://www.myexamplesite.com/?date=09092015
I know the simple:
for i in range(startDate, endDate):
    url = 'http://www.myexamplesite.com/?date={}'.format(i)
    driver.get(url)
But one thing I've never been able to figure out is how to manipulate Python's datetime to accurately reflect the format the website uses.
i.e:
09092015
09162015
09232015
09302015
10072015
...
09272017
If all else fails, I only need to do this once, so it wouldn't take too long to skip the loop altogether, manually enter each date I wish to scrape from, and then append all of my dataframes together. I'm mainly curious about how to manipulate datetime in this way for future projects that may require more data.
A good place to start is the docs for the datetime, date and timedelta objects.
First, let's construct our starting date and ending date (today):
>>> from datetime import date, timedelta
>>> start = date(2015, 9, 9)
>>> end = date.today()
>>> start, end
(datetime.date(2015, 9, 9), datetime.date(2017, 9, 27))
Now let's define the unit of increment -- one day:
>>> day = timedelta(days=1)
>>> day
datetime.timedelta(1)
A nice thing about dates (date/datetime) and time deltas (timedelta) is that they can be added:
>>> start + day
datetime.date(2015, 9, 10)
We can also use format() to get that date in the month-day-year form the website uses:
>>> "{date.month:02}{date.day:02}{date.year}".format(date=start+day)
'09102015'
So, when we put all this together:
from datetime import date, timedelta
start = date(2015, 9, 9)
end = date.today()
week = timedelta(days=7)
mydate = start
# <= so that the end date itself is included when it falls on a weekly step
while mydate <= end:
    # month first, then day, to match the site's MMDDYYYY format
    print("{date.month:02}{date.day:02}{date.year}".format(date=mydate))
    mydate += week
we get a simple iteration over dates starting with 2015-09-09 and ending with today, incremented by 7 days (a week):
09092015
09162015
09232015
09302015
10072015
...
Take a look here
https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior
You can see the table pictured here for formatting dates and times and the usage.
Of course, if the format of the dates changes in the future or you are parsing different strings, you will have to make code changes. There really is no way around that.
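As a rough illustration of the strftime table linked above, the weekly URLs from the question can be built with the %m, %d and %Y codes. This is a sketch, not a definitive implementation: the helper name weekly_urls and its base parameter are made up here, and the URL is the example one from the question.

```python
from datetime import date, timedelta


def weekly_urls(start, end, base="http://www.myexamplesite.com/?date={}"):
    """Yield one URL per week from start to end (inclusive), dated as MMDDYYYY."""
    current = start
    while current <= end:
        # %m = zero-padded month, %d = zero-padded day, %Y = four-digit year
        yield base.format(current.strftime("%m%d%Y"))
        current += timedelta(days=7)


# Hypothetical short range, just to show the output shape.
urls = list(weekly_urls(date(2015, 9, 9), date(2015, 10, 7)))
for url in urls:
    print(url)
```

In the scraper, each yielded URL would be passed to driver.get() instead of being printed.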
I have a dictionary with many sorted dates. How could I write a loop in Python that checks whether a certain date is in the dictionary and, if not, returns the closest date available? I want it to subtract one day from the date, check again whether that date exists in the dictionary, and keep subtracting until it finds an existing date.
Thanks in advance
from datetime import timedelta
def function(date):
    if date not in dictio:
        date -= timedelta(days=1)
    return date
I've made a recursive function to solve your problem:
import datetime
def find_date(date, date_dict):
    if date not in date_dict.keys():
        return find_date(date-datetime.timedelta(days=1), date_dict)
    else:
        return date
I don't know what your dictionary contains, but the following example should show you how this works:
import numpy as np
# creates a dictionary of random dates
months = np.random.randint(3,5,20)
days = np.random.randint(1,30,20)
dates = {
    datetime.date(2019,m,d): '{}_{:02}_{:02}'.format(2019,m,d)
    for m,d in zip(months,days)}
# select the date to find
target_date = datetime.date(2019, np.random.randint(3,5), np.random.randint(1,30))
# print the result
print("The date I wanted: {}".format(target_date))
print("The date I got: {}".format(find_date(target_date, dates)))
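Since the question mentions that the dates are sorted, a different technique from the recursion above is a binary search with the standard bisect module, which avoids stepping back one day at a time. This is a sketch under that assumption; the function name closest_earlier_date and the sample dictionary are made up for illustration.

```python
import bisect
import datetime


def closest_earlier_date(target, date_dict):
    """Return the latest key that is <= target, or None if every key is later."""
    # Sorted list of the keys; for many lookups, build this once outside the function.
    keys = sorted(date_dict)
    # Number of keys that are <= target.
    pos = bisect.bisect_right(keys, target)
    return keys[pos - 1] if pos else None


# Hypothetical data, only for the demonstration.
dates = {
    datetime.date(2019, 3, 4): "a",
    datetime.date(2019, 3, 20): "b",
    datetime.date(2019, 4, 11): "c",
}
found = closest_earlier_date(datetime.date(2019, 4, 1), dates)
print(found)
```

Unlike the recursive version, this returns None instead of recursing without end when no earlier date exists.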
What you are looking for is probably a while loop, but beware: if it never finds the date, it will run forever. Perhaps you want to define a limit of attempts after which the script should give up?
from datetime import timedelta, date
d1 = {
    date(2019, 4, 1): None
}
def function(date, dictio):
    while date not in dictio:
        date -= timedelta(days=1)
    return date
res_date = function(date.today(), d1)
print(res_date)
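The limit of attempts suggested above could look roughly like this. It is a minimal sketch: the name find_earlier_date, the max_days parameter and its default of 365 are assumptions, not part of the original code.

```python
from datetime import date, timedelta


def find_earlier_date(target, date_dict, max_days=365):
    """Step back one day at a time; give up after max_days steps and return None."""
    current = target
    # max_days + 1 checks: the target itself plus max_days earlier days.
    for _ in range(max_days + 1):
        if current in date_dict:
            return current
        current -= timedelta(days=1)
    return None


d1 = {date(2019, 4, 1): None}
print(find_earlier_date(date(2019, 4, 5), d1))   # found within the limit
print(find_earlier_date(date(2019, 3, 1), d1))   # no earlier key, so None
```

Returning None makes the "not found" case explicit for the caller; raising an exception would be an equally reasonable choice.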
I'm using pandas to generate a list of dates and times within a specified range to query an API. My aim is to query weeks or months on a per-hour basis.
time_range = pd.date_range('20180601T07:00:0000', '20180701T07:00:0000', freq='H')
time_range = time_range.strftime("%Y%m%d"+'T%H:00-0000')
yields a list of times in the desired list format. Where I'm encountering difficulty is that the URL is formatted...
startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
I understand I need to start with values 0 and 1 from the pandas list, but I don't know how to cycle through them. Should I be thinking of a dictionary like...
{date1:[hour1, hour2, etc...], date2:[hour1, hour2, etc...], ...}
and use a .format where startdatetime={1}&enddatetime={2} ?
or should it be more like a for loop...
for date in date_range:
    url = 'http://somename?startdatetime={date}&enddatetime{date2}'
    urldate = url.format(date=date)
    urldate2 = url.format(date2=date + 1)
Any help is appreciated!
If I understand correctly, you want to iterate from a starting date/time (2018-06-01 07:00) to an ending date/time (2018-07-01 07:00) with a step of one hour, and produce a URL for each one-hour interval.
I don't know why you use pandas for that when you can do it with the standard library, like this:
import datetime
start = datetime.datetime(2018, 6, 1, 7)
end = datetime.datetime(2018, 7, 1, 7)
delta = datetime.timedelta(hours=1)
# "T" separates the date from the hour, and "enddatetime" needs its "=", to match the URL in the question
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
while start < end:
    date1 = start
    date2 = start + delta
    url = fmt.format(date1=date1, date2=date2)
    print(url)
    start = date2
You get:
http://somename?startdatetime=20180601T07:00-0000&enddatetime=20180601T08:00-0000
http://somename?startdatetime=20180601T08:00-0000&enddatetime=20180601T09:00-0000
http://somename?startdatetime=20180601T09:00-0000&enddatetime=20180601T10:00-0000
http://somename?startdatetime=20180601T10:00-0000&enddatetime=20180601T11:00-0000
...
In the loop, I work with datetime instances. I use a format string, like “{date2:%Y%m%dT%H:00-0000}”, to format the date and time in the required format.
Notice that the date_range() function is easy to implement with the standard library:
def date_range(start, end, delta):
    while start < end:
        yield start
        start = start + delta
To get the list of dates with an interval of one hour, you can do:
dates = list(date_range(
    datetime.datetime(2018, 6, 1, 7),
    datetime.datetime(2018, 7, 1, 7),
    datetime.timedelta(hours=1)))
Then, the solution becomes:
fmt = 'http://somename?startdatetime={date1:%Y%m%dT%H:00-0000}&enddatetime={date2:%Y%m%dT%H:00-0000}'
for date1, date2 in zip(dates[:-1], dates[1:]):
    url = fmt.format(date1=date1, date2=date2)
    print(url)
The trick is to use the zip() function with the list of dates shifted by one item to get the pairs of consecutive dates.
I'm trying to filter my query by a datetime. This datetime marks the value range the customer wants information for; I'm trying to set it to the first of the month selected by the customer. I pass in the month number, convert it to the correct string format, and then convert that to a datetime object, because simply filtering on the string returned no values and Django's documentation says you need to do it like:
pub_date__gte=datetime(2005, 1, 30)
This is the code I use to get the date selected by the customer and turn it into a datetime:
if 'billing-report' in request.POST:
    customer_id = int(post_data['selected_customer'])
    selected_date = int(post_data['month'])
    if selected_date < 10:
        selected_date = '0'+str(selected_date)
    year = datetime.now()
    year = year.year
    query_date = str(year) + '-' + str(selected_date) + '-01'
    query_date_filter = datetime.strptime(query_date, "%Y-%m-%d")
    compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = query_date_filter)
django debug shows: datetime.datetime(2014, 10, 1, 0, 0)
query_date looks like: 2014-07-01 before it is converted
No error but no data is returned
I used to use:
compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = datetime(query_date_filter))
which was causing the error. I'm sorry for changing my question as it evolved; that is why I'm re-including what I was doing before, so the comments make sense.
Almost all of that code is irrelevant to your question.
I don't understand why you are calling datetime on query_date_filter. That is already a datetime, as you know because you converted it to one with strptime earlier. So there's no need for any more conversion:
ComputeUsages.objects.filter(customer_id=customer_id).filter(values_date=query_date_filter)
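As a side note on the conversion itself, the first of the month can be built directly from the integers, without formatting a string and parsing it back with strptime. This is only a sketch using the standard library; the helper name first_of_month and its year parameter are hypothetical and not part of the code in the question.

```python
from datetime import datetime


def first_of_month(month, year=None):
    """Return midnight on the first day of `month` (1-12) in `year`."""
    if year is None:
        # Same default as the question: the current year.
        year = datetime.now().year
    return datetime(year, month, 1)


# Hypothetical values matching the example shown in the question.
query_date_filter = first_of_month(7, year=2014)
print(query_date_filter)
```

The resulting value can be passed to the filter in the same way as the strptime result.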
Well, after spending some time exploring setting the query filter to datetime(year, month, day), I came to the realization that Django doesn't convert it to a neutral datetime format; it has to match exactly. Also, the data in my database was stored as year, day, month.
Learning point:
You have to use datetime() exactly how the value is stored in the database; Django does not convert to a neutral format and compare. I assumed it was like writing a query with to_date or to_timestamp, where the db takes your format and converts it to a neutral format to compare against the rest of the db.
Here is the correct way
compute_usages = ComputeUsages.objects.filter(customer_id = customer_id).filter(values_date = datetime(year, day, selected_month))
Right now I am using the following functions to calculate a date and time int like this (ymd), (hms). I believe it is easier to do this for comparison.
def getDayAsInt():
    time = datetime.datetime.now()
    year = time.strftime("%Y")
    month=makeTimeTwoDigit(time.strftime("%m"))
    day=makeTimeTwoDigit(time.strftime("%d"))
    return year+month+day
def getTimeOfDay():
    time = datetime.datetime.now()  # was missing; `time` is otherwise undefined here
    day=makeTimeTwoDigit(time.strftime("%d"))
    hour=makeTimeTwoDigit(time.strftime("%H"))
    minute=makeTimeTwoDigit(time.strftime("%M"))
    second=makeTimeTwoDigit(time.strftime("%S"))
    return hour+minute+second
I initially tried something like this:
'date': str(datetime.now()),
However, I ran into the issue that it is harder to generate a date range to query with that format. For example, if today is 20140616 I can simply query dates between 20140601 and 20140616, whereas generating all of the possible date-times is harder. Does that make sense?
For example, I want to find events that happened today, but a date-time string stored in DynamoDB is harder to match (there are more things to match against).
I'm wondering if there is an easier or more efficient way. Is breaking the date and time down like that commonly done? Should I take this:
year = time.strftime("%Y")
month=makeTimeTwoDigit(time.strftime("%m"))
day=makeTimeTwoDigit(time.strftime("%d"))
And do it in one line? Like, should I do time.strftime("%Y%m%d")?
If you are doing the comparisons in Python, an easier solution would be to use the built-in datetime objects and the normal comparison operators, like < and >.
from datetime import datetime
dt_object = datetime.strptime('Jun 1 2005 1:33PM', '%b %d %Y %I:%M%p')
if datetime(2006, 6, 5, 0, 0, 0) <= dt_object < datetime(2006, 6, 6, 0, 0, 0):
    # do something when date is anytime on June 5th, 2006
    pass
If you must do the comparison in the query, you can use regular string comparison as long as your dates are stored in ISO-8601 format. The advantage of ISO-8601 is that chronological sorting is equivalent to lexicographic sorting, i.e. you can treat the dates as normal strings.
The equivalent comparison using ISO-8601 format:
'2006-06-05T00:00:00Z' <= dt < '2006-06-06T00:00:00Z'
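The string comparison above can be checked in plain Python. This is a small sketch that assumes every stored value is in UTC and uses exactly the same fixed-width layout; the helper iso_utc and the sample events are made up for illustration.

```python
from datetime import datetime


def iso_utc(moment):
    """Format a datetime (assumed to already be in UTC) as a fixed-width ISO-8601 string."""
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# Hypothetical stored events, deliberately out of order.
events = [
    datetime(2006, 6, 6, 0, 0, 0),
    datetime(2006, 6, 5, 13, 33, 0),
    datetime(2006, 6, 4, 23, 59, 59),
    datetime(2006, 6, 5, 0, 0, 0),
]
stored = [iso_utc(e) for e in events]

# Plain string comparison selects everything on June 5th, 2006.
lower, upper = "2006-06-05T00:00:00Z", "2006-06-06T00:00:00Z"
on_june_5 = sorted(s for s in stored if lower <= s < upper)
print(on_june_5)

# Sorting the strings gives the same order as sorting the datetimes.
same_order = sorted(stored) == [iso_utc(e) for e in sorted(events)]
print(same_order)
```

The equivalence only holds while every field is zero-padded and every value uses the same time zone.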
I think breaking the day (year/month/date) from the time (hour/minute/second) is the cleanest solution for you, since you want to query by day.
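For the day/time split, a single strftime call per key is enough, because %m, %d, %H, %M and %S are already zero-padded. A minimal sketch follows; the function name day_and_time_keys and the fixed sample timestamp are assumptions for illustration.

```python
from datetime import datetime


def day_and_time_keys(moment):
    """Return (YYYYMMDD, HHMMSS) strings for a datetime."""
    return moment.strftime("%Y%m%d"), moment.strftime("%H%M%S")


# Hypothetical timestamp standing in for datetime.now().
day_key, time_key = day_and_time_keys(datetime(2014, 6, 16, 9, 5, 3))
print(day_key, time_key)

# Day keys compare correctly as plain strings, so a range check stays simple.
in_june_so_far = "20140601" <= day_key <= "20140616"
print(in_june_so_far)
```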