SQLite query not functioning properly - python

I have made every attempt that I know of to make this work, but at this point I think I am just running in circles.
I am taking user input and using that to query a database. The caveat is that there are dates within the database that need to have days added to them, and to make sure that the user is seeing all the UPDATED information between the dates they chose, I changed the user's start date so that it includes two months beforehand.
At this point, the information is passed into a dataframe where it is then filtered to contain only relevant information as well as adjusting the dates that need to be adjusted. After that, it's passed through a mask on the dataframe to make sure that the user is seeing the updated information only, and not dates that are outside of their chosen range that originally weren't.
There were a few points throughout this process that my code was running properly, but I kept realizing there were changes that needed to be made. As to be expected, those changes caused my code to break and I've not been able to figure out how to fix it.
One issue is that the SQL queries are not returning the correct information. It seems that the chosen start date will allow any entries past that date, but the chosen end date will only include database entries if the end date is very near to the highest date in the database. The problem with that is that the user may not always know what the highest value in the database is, therefore they need to be able to choose an arbitrary value to query by.
There is also an issue where the query seems to work only some of the time. On two separate occasions I ran the exact same queries, and they worked one time but not the other.
Here is my code:
self.StartDate = (self.StartMonth.get() + " " + self.StartDay.get() + "," + " " + self.StartYear.get())
self.StartDate = datetime.strptime(self.StartDate, '%b %d, %Y').date()
self.StartDate = self.StartDate - timedelta(days = 60)
self.StartDate = self.StartDate.strftime('%b %d, %Y')
self.EndDate = (self.EndMonth.get() + " " + self.EndDay.get() + "," + " " + self.EndYear.get())
self.EndDate = datetime.strptime(self.EndDate, '%b %d, %Y').date()
self.EndDate = self.EndDate.strftime('%b %d, %Y')
JobType = self.JobType.get()
if JobType == 'All':
    self.cursor.execute('''
        SELECT
            *
        FROM
            MainTable
        WHERE
            ETADate >= ? and
            ETADate <= ?
        ''',
        (self.StartDate, self.EndDate,)
    )
    self.data = self.cursor.fetchall()
else:
    self.cursor.execute('''
        SELECT
            *
        FROM
            MainTable
        WHERE
            ETADate BETWEEN
            ? AND ?
            AND EndUse = ?
        ''',
        (self.StartDate, self.EndDate, JobType,)
    )
    self.data = self.cursor.fetchall()
self.Data_Cleanup()
def Data_Cleanup(self):
    self.df = pd.DataFrame(
        self.data,
        columns = [
            'id',
            'JobNumber',
            'ETADate',
            'Balance',
            'EndUse',
            'PayType'
        ]
    )
    remove = ['id', 'JobNumber']
    self.df = self.df.drop(columns = remove)
    self.df['ETADate'] = pd.to_datetime(self.df['ETADate'])
    self.df.loc[self.df['PayType'] == '14 Days', 'ETADate'] = self.df['ETADate'] + timedelta(days = 14)
    self.df.loc[self.df['PayType'] == '30 Days', 'ETADate'] = self.df['ETADate'] + timedelta(days = 30)
    self.df['ETADate'] = self.df['ETADate'].astype('category')
    self.df['EndUse'] = self.df['EndUse'].astype('category')
    self.df['PayType'] = self.df['PayType'].astype('category')
    mask = (self.df['ETADate'] >= self.StartDate) & (self.df['ETADate'] <= self.EndDate)
    print(self.df.loc[mask])
Ideally, the data would be updated before it is added to the database, but unfortunately the source of this data isn't capable of updating it correctly.
I appreciate any help.

You are storing dates as a string, formatted like Jan 02, 2021. That means you'll compare the month first, alphabetically, then the day numerically, then the year. Or, to take a few random dates, the sort order looks like this:
Dec 23, 2021
Jan 01, 2021
Nov 07, 2026
Nov 16, 2025
If you want a query that makes sense, you'll either need quite a bit of SQL logic to parse these dates on the SQLite side, or preferably, just store the dates using a format that sorts correctly as a string. If you use .strftime("%Y-%m-%d") those same dates will sort in order:
2021-01-01
2021-12-23
2025-11-16
2026-11-07
This will require changing the format of the columns in your database, of course.
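Here is a minimal sketch of the difference, using an in-memory SQLite database. The table and column names follow the question; the sample rows and the helper names to_iso and eta_between are made up for illustration.

```python
import sqlite3
from datetime import datetime

# Hypothetical sample rows in the question's 'Jan 02, 2021' style.
raw_dates = ["Dec 23, 2021", "Jan 01, 2021", "Nov 07, 2026", "Nov 16, 2025"]

def to_iso(text):
    """Convert 'Jan 01, 2021' style text to '2021-01-01'."""
    return datetime.strptime(text, "%b %d, %Y").strftime("%Y-%m-%d")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainTable (id INTEGER PRIMARY KEY, ETADate TEXT)")
conn.executemany(
    "INSERT INTO MainTable (ETADate) VALUES (?)",
    [(to_iso(d),) for d in raw_dates],
)

def eta_between(start, end):
    """Return ETADate values between two ISO-formatted date strings."""
    rows = conn.execute(
        "SELECT ETADate FROM MainTable "
        "WHERE ETADate BETWEEN ? AND ? ORDER BY ETADate",
        (start, end),
    ).fetchall()
    return [row[0] for row in rows]

# String sorting of the original format is alphabetical, not chronological.
print(sorted(raw_dates))
# With ISO strings, a plain text comparison matches date order.
print(eta_between("2021-06-01", "2025-12-31"))
```

The query parameters have to be formatted the same way as the stored values, so the user's start and end dates would also go through strftime("%Y-%m-%d") before being passed to execute.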

Related

Regex with date as String in Azure path

I have many folders (in Microsoft Azure data lake), each folder is named with a date as the form "ddmmyyyy". Generally, I used the regex to extract all files of all folders of an exact month of a year in the way
path_data="/mnt/data/[0-9]*032022/data_[0-9]*.json" # all folders of all days of month 03 of 2022
result=spark.read.json(path_data)
My problem now is to extract all folders that match exactly one year before a given date
For example: for the date 14-03-2022; I need a regex to automatically read all files of all folders between 14-03-2021 and 14-03-2022.
I tried to extract the month and year in vars using strings, then using those two strings in a regex respecting the conditions ( for the showed example month should be greater than 03 when year equal to 2021 and less than 03 when the year is equal to 2022). I tried something similar to (while replacing the vars with 03, 2021 and 2022).
date_regex="([0-9]{2}[03-12]2021)|([0-9]{2}[01-03]2022)"
Is there any hint on how I can perform such a task?
Thanks in advance
If I understand your question correctly, to find dates between ??-03-2021 and ??-03-2022 in the file name field, you can use the following regex:
date_regex="([0-9]{2}-03-2021)|([0-9]{2}-03-2022)"
If you want to customize the pattern further, you can experiment with it at the link below:
https://regex101.com/r/AgqFfH/1
Update: extract any folder named with a date between 14032021 and 14032022.
Solution: first extract the date in ddmmyyyy format with a regex, then compare it against the two bounds, assuming the format is correct and such a string is present in the name. The extracted text has to be parsed into a date before comparing, because ddmmyyyy digits compared as plain numbers do not sort in date order.
import re
from datetime import datetime
date_regex = r"(0[1-9]|[12][0-9]|3[01])(0[1-9]|1[012])[0-9]{4}"
match = re.search(date_regex, folder_name)  # folder_name: the name being tested
if match and datetime(2021, 3, 14) < datetime.strptime(match.group(0), "%d%m%Y") < datetime(2022, 3, 14):
    ...  # do any operation
The above is only a rough sketch to give an overview of the approach.
I hope this solution helps.
It certainly isn't pretty, but here you go:
#input
day = "14"
month = "03"
startYear = "2021"
#day construction
sameTensAfter = '(' + day[0] + '[' + day[1] + '-9])'
theDaysAfter = '([' + chr(ord(day[0])+1) + '-9][0-9])'
sameTensBefore = '(' + day[0] + '[0-' + day[1] + '])'
theDaysBefore = ''
if day[0] != '0':
    theDaysBefore = '([0-' + chr(ord(day[0])-1) + '][0-9])'
#build the part for the dates with the same month as query
afterDayPart = '%s|%s' %(sameTensAfter, theDaysAfter)
beforeDayPart = '%s|%s' %(sameTensBefore, theDaysBefore)
theMonthAfter = str(int(month) + 1).zfill(2)
afterMonthPart = theMonthAfter[0] + '([' + theMonthAfter[1] + '-9])'
if theMonthAfter[0] == '0':
    afterMonthPart += '|(1[0-2])'
theMonthBefore = str(int(month) - 1).zfill(2)
beforeMonthPart = theMonthBefore[0] + '([0-' + theMonthBefore[1] + '])'
if theMonthBefore[0] == '1':
    beforeMonthPart = '(0[0-9])|' + beforeMonthPart
#4 kinds of matches:
startDateRange = '((%s)(%s)(%s))' %(afterDayPart, month, startYear)
anyDayAfterMonth = '((%s)(%s)(%s))' %('[0-9]{2}', afterMonthPart, startYear)
endDateRange = '((%s)(%s)(%s))' %(beforeDayPart, month, int(startYear)+1)
anyDayBeforeMonth = '((%s)(%s)(%s))' %('[0-9]{2}', beforeMonthPart, int(startYear)+1)
#print regex
date_regex = startDateRange + '|' + anyDayAfterMonth + '|' + endDateRange + '|' + anyDayBeforeMonth
print(date_regex)
#this prints:
#(((1[4-9])|([2-9][0-9]))(03)(2021))|(([0-9]{2})(0([4-9])|(1[0-2]))(2021))|(((1[0-4])|([0-0][0-9]))(03)(2022))|(([0-9]{2})(0([0-2]))(2022))
startDateRange: the month is the same and it's the starting year, this will take all the days including and after.
anyDayAfterMonth: the month is greater and it's the starting year, this will take any day.
endDateRange: the month is the same and it's the ending year, this will take all the days including and before.
anyDayBeforeMonth: the month is less than and it's the ending year, this will take any day.
Here's an example: https://regex101.com/r/i76s58/1
To compare the dates, use the datetime module, as in the example below.
Then you can extract only the folders that meet your condition.
# importing datetime module
import datetime
# date in yyyy/mm/dd format
d1 = datetime.datetime(2018, 5, 3)
d2 = datetime.datetime(2018, 6, 1)
# Comparing the dates will return
# either True or False
print("d1 is greater than d2 : ", d1 > d2)
print("d1 is less than d2 : ", d1 < d2)
print("d1 is not equal to d2 : ", d1 != d2)
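Applied to the folder names from the question, the comparison could look roughly like the sketch below. It assumes the folder names have already been listed into a Python list (how that listing is obtained is left out), and folders_in_range is a hypothetical helper name.

```python
from datetime import datetime

def folders_in_range(folder_names, start, end):
    """Keep names that parse as ddmmyyyy and fall within [start, end]."""
    selected = []
    for name in folder_names:
        try:
            folder_date = datetime.strptime(name, "%d%m%Y")
        except ValueError:
            continue  # not a valid ddmmyyyy name, skip it
        if start <= folder_date <= end:
            selected.append(name)
    return selected

# Hypothetical folder names for illustration.
names = ["13032021", "14032021", "01042021", "14032022", "15032022", "notadate"]
picked = folders_in_range(names, datetime(2021, 3, 14), datetime(2022, 3, 14))
print(picked)  # ['14032021', '01042021', '14032022']
```

The selected names can then be used to build the list of paths to read, which avoids encoding the date range in a regex at all.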

Getting API Report Results into a pandas dataframe

I am having an issue because of one of my vendors. For some reason, whenever I run any report through their statistics API it is always run using Pacific Standard Time, regardless of the fact that I am in Eastern Standard Time. To account for this, I have to run the report with the start and end date dialed back by three hours, then I need to manually change the time of the "TimeStamp" column forward by three hours. Finally, I need all the results inserted into my MS SQL instance. I have gotten to the point where I can get the results back, but I am stuck on what to do next. My instincts say it's probably going to be a pandas solution, but I am not sure how to get the results into a pandas dataframe. Here is what I have so far (note the vendor I am working with is called Five9, and I found a library for them that helps me connect to the API and get the report results I want):
from five9 import Five9
import datetime
from datetime import datetime, timedelta
import time
from pytz import timezone
import pyodbc
import json
now_utc = datetime.now(timezone('UTC'))
now_eastern = now_utc.astimezone(timezone('US/Eastern'))
#Change days from current time
startreportime = now_eastern - timedelta(days=2)
endreportime = now_eastern - timedelta(days=1)
#Set start and end time for report criteria
starttime = f"{(startreportime):%Y-%m-%d}" + 'T21:00:00.000'
endtime = f"{(endreportime):%Y-%m-%d}" + 'T20:59:00.000'
#connect to API
client = Five9('MyUID','MyPWD')
#Set variables as start and end
start = starttime
end = endtime
#set criteria using variables
criteria = {'time':{'end':end, 'start':start}}
#Get report and set criteria for report
identifier = client.configuration.runReport(folderName='Five9 Import Data',
                                            reportName='Agent State Details',criteria=criteria)
#Sleep so report has time to complete
time.sleep(30)
#Get report results
get_results = client.configuration.getReportResult(identifier)
results = get_results['records']
print(results)
Using this I get these kinds of results:
[{
'values': {
'data': [
'Mon, 22 Feb 2021 21:00:00',
'abowling#*****.com',
'Adam',
'Bowling',
'Login',
None,
None,
'TUPSS, Telamon Inbound, Stericycle Environment Inbound, Stericycle ComSol Inbound,
'01:18:05',
'08 - TS'
]
}
If I could get these results into a dataframe I am pretty sure I could manage the rest. I know how to use a timedelta to handle the timestamp issues, and I can handle getting it from a dataframe to sql. I am just having a heck of a time trying to figure out how to get these results into a dataframe.
Not sure if anyone will read this, but I got it to work with the following:
def process_rows(rows):
    for row in rows:
        date1 = row['values']['data'][0]
        date1 = datetime.strptime(date1, '%a, %d %b %Y %H:%M:%S').astimezone(timezone("US/Pacific"))
        date2 = date1.astimezone(timezone("US/Eastern"))
        date2 = date2.strftime('%Y-%m-%d %H:%M:%S')
        cloned_row = [value for value in row['values']['data']]
        cloned_row[0] = str(date2)
        if cloned_row[8] == '24:00:00':
            cloned_row[8] = '00:00:00'
        yield cloned_row
args = process_rows(results)
insertSQL = ('''
    INSERT INTO [Reporting].[dbo].[AgentState]
    (TimeStamp, Agent, FirstName, LastName, State, ReasonCode, Media, Skill, StateTime, [Group])
    VALUES (?,?,?,?,?,?,?,?,?,?)
    '''
)
cursor.fast_executemany = True
cursor.executemany(insertSQL, args)
conn.commit()
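One thing worth checking in the conversion step: calling astimezone on a naive datetime treats it as being in the machine's local time zone, so the result depends on where the script runs. Below is a minimal sketch of an explicit Pacific to Eastern conversion. It swaps in the standard library zoneinfo module (Python 3.9+) instead of pytz, and pacific_to_eastern is a hypothetical helper name.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")
EASTERN = ZoneInfo("America/New_York")

def pacific_to_eastern(stamp):
    """Parse a report timestamp as Pacific time and return it in Eastern time."""
    naive = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    aware = naive.replace(tzinfo=PACIFIC)  # attach the zone explicitly
    return aware.astimezone(EASTERN).strftime("%Y-%m-%d %H:%M:%S")

# Timestamp taken from the sample results in the question.
print(pacific_to_eastern("Mon, 22 Feb 2021 21:00:00"))  # 2021-02-23 00:00:00
```

With pytz itself, the equivalent of the replace step would be timezone("US/Pacific").localize(naive).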

Python Date issue when starting a new month

I have an issue with a python request. In this request I need to set two dates, today and yesterday. This has functioned without issue throughout my testing until today.
The issue here being that of course we have just started a new month.
I am currently using the following date codes; however, as I have now realized, they do not take the monthly reset into consideration.
yesterday = str(datetime.datetime.today().month) + "/" +
str(datetime.datetime.today().day-1) + "/" +
str(datetime.datetime.today().year)
today = str(datetime.datetime.today().month) + "/" +
str(datetime.datetime.today().day) + "/" +
str(datetime.datetime.today().year)
As soon as the date is not 0 the application works like a charm.
Use datetime.timedelta
Ex:
import datetime
today = datetime.datetime.now()
print(today.strftime("%m/%d/%Y"))
yesterday = (today - datetime.timedelta(days=1)).strftime("%m/%d/%Y")
print(yesterday)
Output:
10/01/2018
09/30/2018
Use strftime to get your required date format in string
You are making things too complicated. In your code you subtract 1 from the day number, but on the first of the month (for example, October 1st) that gives "October 0th" (sic).
It is better to perform the arithmetic on the date object:
yesterday_date = datetime.date.today() - datetime.timedelta(days=1)
and then convert it to a string with:
yesterday = yesterday_date.strftime('%m/%d/%Y')
At the moment of writing, this generates:
>>> yesterday_date.strftime('%m/%d/%Y')
'09/30/2018'
Doing arithmetic inside the formatting code gives that code two responsibilities at once, which is typically bad software design: each piece of code should have one responsibility.
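As a quick check that date arithmetic handles the boundaries, here is a small sketch with fixed dates instead of today's date. The helper name day_before is made up for illustration.

```python
import datetime

def day_before(day):
    """Return the calendar day before the given date."""
    return day - datetime.timedelta(days=1)

# First day of a month, first day of a year, and March 1st of a leap year.
samples = [
    datetime.date(2018, 10, 1),
    datetime.date(2019, 1, 1),
    datetime.date(2020, 3, 1),
]
for sample in samples:
    print(day_before(sample).strftime("%m/%d/%Y"))
# 09/30/2018
# 12/31/2018
# 02/29/2020
```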

How to resolve created events' time mismatches due to Calendar API upgrade?

For reference, my timezone is Eastern - New York.
I am inserting events from a PostgreSQL database into a Google Calendar. I have been using UTC-4 since early June, when I finally got my app moved from v2 to v3, and for a couple of years in v2. Up until August 18 that gave me the correct time. On August 18 the time was off by one hour, so I changed the setting to UTC-5. That worked for about 2 hours and then I had to reset it back to UTC-4.
Now today, August 21, it is off an hour again and I have set the UTC back to -5. The events are getting inserted as they should with the exception of an event being an hour off and the UTC needing to be changed sometimes. The system time is correct on my server.
Any ideas on what is happening?
Some of my code snippets:
#get an event from a PostgreSQL database to insert into a Google Calendar
curs.execute("SELECT c_event_title,c_name,c_event_date,c_event_starttime,c_event_endtime,c_department,seat_arrange,c_attendee_count from sched_421 where sched_id_421=%i;" %recnum)
mit=curs.fetchall() # mit IS NOW ALL THE RESULTS OF THE QUERY
for myrec in mit: # FOR THE ONE RECORD (EVENT) IN THE QUERY RESULTS
myend_time = time.strftime("%I:%M %p", time.strptime(str(myrec[4]),"%H:%M:%S"))
if myend_time[0]=='0': # Remove leading zero for 01:00 - 09:00
myend_time = myend_time[1:]
title = ' - %s %s - Group:%s' %(myend_time,myrec[0],myrec[5])
mycontent = myrec[0]+' - '+ myrec[5]
content = mycontent
where = where_dict[room_calendar]
# THIS IS WHERE THE UTC IS, SOMETIMES 4 WORKS SOMETIMES 5 WORKS
start_time = '%sT%s-05:00' %(myrec[2],myrec[3]) # Google format
end_time = '%sT%s-05:00' %(myrec[2],myrec[4]) # Google format
myend_time = '%s' %myrec[4] # User format (am/pm)
seat_arrange = '\nSeating - %s' %str(myrec[6])
attendee_count = '\nNumber of participants: %s' %str(myrec[7])
descript = str(myrec[0]) + ' ' + seat_arrange + attendee_count+ "\n Created By: me#somewhere.com"
# upload the event to the calendar
created_event = service.events().insert(calendarId=calendar_dict[room_calendar], body=event).execute()
Are the dates you are looking at on different sides of the daylight savings switch?
Eastern Time Zone is UTC-4:00 from March to November and UTC-5:00 from November to March.
Hard-coding the TZ Offset like that is a bad idea, especially in a TZ that uses daylight savings. It would be best to store all the times as UTC and just apply the TZ information at the endpoints (data input and data display).
At the very least, you will want to have something calculate the correct TZ offset, based on the date, like a helper function or some block of logic.
I'm not sure how much control you have over the data in the database, so that would dictate which path you choose.
Ideally, you could change the 3 fields (date, start time, end time) in the database into 2 (start datetime UTC, end datetime UTC)
I have had to change this code:
# THIS IS WHERE THE UTC IS, SOMETIMES 4 WORKS SOMETIMES 5 WORKS
start_time = '%sT%s-05:00' %(myrec[2],myrec[3]) # Google format
end_time = '%sT%s-05:00' %(myrec[2],myrec[4]) # Google format
to (check to see if the event is in daylight savings time or not, this was not necessary with v2)
if bool (pytz.timezone('America/New_York').dst(datetime.datetime(myrec[2].year,myrec[2].month,myrec[2].day), is_dst=None)):
    utc_offset = '4'
else:
    utc_offset = '5'
start_time = '%sT%s-0%s:00' %(myrec[2],myrec[3],utc_offset)
end_time = '%sT%s-0%s:00' %(myrec[2],myrec[4],utc_offset)
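An alternative that avoids choosing the offset by hand is to attach the zone to the datetime and let the library format the offset. This is only a sketch: it uses the standard library zoneinfo module (Python 3.9+) rather than pytz, google_timestamp is a hypothetical helper, and the dates are arbitrary examples.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

def google_timestamp(event_date, event_time):
    """Combine a date and a time into an ISO 8601 string with the zone's offset."""
    local = datetime.combine(event_date, event_time, tzinfo=NEW_YORK)
    return local.isoformat()

# A summer date gets -04:00, a winter date gets -05:00.
print(google_timestamp(date(2014, 8, 21), time(9, 30)))  # 2014-08-21T09:30:00-04:00
print(google_timestamp(date(2014, 12, 1), time(9, 30)))  # 2014-12-01T09:30:00-05:00
```

Here event_date and event_time stand in for the date and time values read from the database row (myrec[2] and myrec[3] in the code above).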

Queryset of people with a birthday in the next X days

How do I get a queryset of people with a birthday in the next X days? I saw this answer, but it does not suit me, because it only returns people whose year of birth is the current year.
Assuming a model like this--
class Person(models.Model):
name = models.CharField(max_length=40)
birthday = models.DateTimeField() # their next birthday
The next step would be to create a query filtering out any records with birthdays having a month and day in between (now.month, now.day) and (then.month, then.day). You can actually access the month and day attributes of the datetime object using the queryset API by passing Person.objects.filter a keyword argument like this: "birthday__month." I tried this with an actual queryset API method like "birthday__month__gte" and it failed though. So I would suggest simply generating a literal list of month/day tuples representing each (month, day) in the date range you want records for, then compose them all into a query with django.db.models.Q, like so:
from datetime import datetime, timedelta
from functools import reduce  # reduce is no longer a builtin in Python 3
import operator
from django.db.models import Q
def birthdays_within(days):
    now = datetime.now()
    then = now + timedelta(days)
    # Build the list of month/day tuples.
    monthdays = []
    while now <= then:
        monthdays.append((now.month, now.day))
        now += timedelta(days=1)
    # Transform each into queryset keyword args.
    monthdays = (dict(zip(("birthday__month", "birthday__day"), t))
                 for t in monthdays)
    # Compose the django.db.models.Q objects together for a single query.
    query = reduce(operator.or_, (Q(**d) for d in monthdays))
    # Run the query.
    return Person.objects.filter(query)
After debugging, this should return a queryset with each person who has a birthday with month and day equal to any of the months or days in the specified list of tuples.
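The list-building step can be tried on its own without Django. The sketch below uses a fixed start date so the wrap-around at the end of the year is visible; upcoming_month_days is a hypothetical helper name.

```python
from datetime import date, timedelta

def upcoming_month_days(start, days):
    """Return the (month, day) pairs from start through start + days."""
    pairs = []
    for offset in range(days + 1):
        current = start + timedelta(days=offset)
        pairs.append((current.month, current.day))
    return pairs

pairs = upcoming_month_days(date(2023, 12, 30), 3)
print(pairs)  # [(12, 30), (12, 31), (1, 1), (1, 2)]
```

Each pair would then become one Q(birthday__month=..., birthday__day=...) term, with the terms OR-ed together as in the function above.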
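Assuming it's a datetime field, do something like this (using future_date from dimosaur's answer). Note that filter is used rather than get, since get raises an error when more than one person matches:
Profile.objects.filter(
    Q(birthday__lte=future_date),
    Q(birthday__gte=datetime.date.today())
)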
I can think of 2 ways without using custom queries, both with "problems"
1) Not efficient as it does 1 query per day
start = datetime.date.today()
max_days = 14
days = [ start + datetime.timedelta(days=i) for i in range(0, max_days) ]
birthdays = []
for d in days:
    for p in Profile.objects.filter(birthday__month=d.month, birthday__day=d.day):
        birthdays.append(p)
print(birthdays)
2) Single query, but requires a model change. You would need to add bday_month and bday_day integer fields. These can obviously be populated automatically from the real date.
The limitation of this example is that you can only check against 2 months, the start month and the end month. Setting 29 days, you could jump over February, showing only Jan 31 and Mar 1.
from django.db.models import Q
start = datetime.date.today()
end = start + datetime.timedelta(days=14)
print(Profile.objects.filter(
    Q(bday_month=start.month) & Q(bday_day__gte=start.day) |
    Q(bday_month=end.month) & Q(bday_day__lte=end.day)
))
If X is a constant that you know:
import datetime
future_date = datetime.date.today() + datetime.timedelta(days=X)
Profile.objects.filter(
birth_date__month=future_date.month,
birth_date__day=future_date.day
)
Something like that.
I have tried to do it in a really silly way, but seems it works:
import datetime
from django.db.models import Q
x = 5
q_args = ''
for d in range(x):
    future_date = datetime.date.today() + datetime.timedelta(days=d)
    q_args += 'Q(birth_date__month=%d, birth_date__day=%d)%s' % (
        future_date.month,
        future_date.day,
        ' | ' if d < x - 1 else ''
    )
people = People.objects.filter(eval(q_args))
I was unsatisfied with all the replies here. They are all a variant on "check each date/year one by one in a range...", making for long, ugly queries. Here is a simple solution, if one is willing to denormalize a bit:
Change your model so that, instead of just a datetime birthdate(yyyy, mm, dd) holding the real date, you add a datetime birthday(DUMMY_YEAR, mm, dd) column. So every person in your DB will have their real birth date saved, and then another birth date with a fixed year, shared with everyone else. Don't show this second field to users, though, and don't allow them to edit it.
Once you edited your model, make sure the birthdate and birthday are always connected by extending models.Model save method in your class:
def save(self, *args, **kwargs):
    self.birthday = datetime.date(BIRTHDAY_YEAR,
        self.birthdate.month, self.birthdate.day)
    super(YOUR_CLASS, self).save(*args, **kwargs)
And once you ensured that whenever a date is saved as birthdate, the birthday is updated too, you can filter it with just birthday__gte/birthday__lte. See an excerpt from my admin filter, where I take care of a year boundary:
def queryset(self, request, queryset):
    if self.value() == 'today':
        # if we are looking for just today, it is simple
        return queryset.filter(birthday = datetime.date(
            BIRTHDAY_YEAR, now().month, now().day
        ))
    if self.value() == 'week':
        # However, if we are looking for next few days,
        # we have to bear in mind what happens on the eve
        # of a new year. So if the interval we are looking at
        # is going over the new year, break the search into
        # two with an OR.
        future_date = (now() + datetime.timedelta(days=7)).date()
        if (now().year == future_date.year):
            return queryset.filter(
                Q(birthday__gte = datetime.date(
                    BIRTHDAY_YEAR, now().month, now().day
                )) &
                Q(birthday__lte = datetime.date(
                    BIRTHDAY_YEAR,
                    future_date.month,
                    future_date.day)
                )
            )
        else:
            return queryset.filter(
                # end of the old year
                Q(birthday__gte = datetime.date(
                    BIRTHDAY_YEAR, now().month, now().day
                )) &
                Q(birthday__lte = datetime.date(BIRTHDAY_YEAR,12, 31)) |
                # beginning of the new year
                Q(birthday__gte = datetime.date(BIRTHDAY_YEAR, 1, 1)) &
                Q(birthday__lte = datetime.date(BIRTHDAY_YEAR,
                    future_date.month,
                    future_date.day)
                )
            )
In case you wonder what Q() is, see "Complex lookups with Q objects" in the Django documentation.
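The year-boundary logic in that filter can be isolated into a plain function and checked without Django. This is a sketch under a few assumptions: BIRTHDAY_YEAR is set to 2000 here (a leap year, so that a Feb 29 birthday can be stored), the window is shorter than a year, and birthday_ranges is a hypothetical helper name.

```python
from datetime import date, timedelta

BIRTHDAY_YEAR = 2000  # assumption: a leap year, so Feb 29 is a valid date

def birthday_ranges(today, days):
    """Return (start, end) ranges in BIRTHDAY_YEAR covering the next `days` days."""
    end = today + timedelta(days=days)
    start_dummy = date(BIRTHDAY_YEAR, today.month, today.day)
    end_dummy = date(BIRTHDAY_YEAR, end.month, end.day)
    if today.year == end.year:
        return [(start_dummy, end_dummy)]
    # The window crosses New Year: split it into two ranges.
    return [
        (start_dummy, date(BIRTHDAY_YEAR, 12, 31)),
        (date(BIRTHDAY_YEAR, 1, 1), end_dummy),
    ]

print(birthday_ranges(date(2023, 6, 1), 7))
print(birthday_ranges(date(2023, 12, 28), 7))
```

Each returned range corresponds to one birthday__gte / birthday__lte pair, and two ranges are combined with an OR, as in the admin filter above.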
