Queries while retrieving data from a list in Python - python

I have a list in Python containing the data below. Each item is a table name:
[tablename_20211011, tablename_20201010, tablename_20211009,
tablename_20211009, tablename_20211008]
The suffix (for example 20211011) is the date on which the table was created.
How can I fetch the names of the tables created within the last year?
If the criterion is 1 year, the result should be tablename_20211011, tablename_20211009, tablename_20211008, tablename_20211009.

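This works, and you don't need to enter last year's date manually; it is computed automatically:
from datetime import datetime, timedelta
# d1 is the date 365 days ago, formatted as YYYYMMDD
d1 = (datetime.now() - timedelta(days=365)).strftime("%Y%m%d")
# a is the list of table names; x[-8:] is the trailing date of each name
[x for x in a if x[-8:] >= d1]
This returns the items dated on or after the cutoff. Comparing the strings directly is valid because YYYYMMDD strings sort in chronological order.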

Assuming that the last 8 characters of each name are always the date (i.e. in YYYYMMDD format), you could just use:
files = ['tablename_20211011', 'tablename_20201010', 'tablename_20211009', 'tablename_20211009', 'tablename_20211008']
print ([x for x in files if x[-8:] >= '20210101'])
Simply set the date-string to the right of the >= symbol as needed.
If the date is not always the last 8 characters of the string, then you may need to use a regular expression (regex) approach to extract it.
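If the date is not guaranteed to sit in the last 8 characters, the regex approach mentioned above could look something like the sketch below. The helper name recent_tables and its today parameter are made up for illustration, and the sketch assumes each name contains a single run of 8 digits in YYYYMMDD form.

```python
import re
from datetime import date, datetime, timedelta

def recent_tables(names, today=None, days=365):
    """Return the names whose embedded YYYYMMDD date is within `days` of `today`.

    Hypothetical helper: assumes the first run of 8 digits in a name is its date.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=days)
    result = []
    for name in names:
        match = re.search(r"\d{8}", name)
        if match is None:
            continue  # no date found in this name, skip it
        try:
            created = datetime.strptime(match.group(), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a real date
        if created >= cutoff:
            result.append(name)
    return result

files = ['tablename_20211011', 'tablename_20201010', 'tablename_20211009',
         'tablename_20211009', 'tablename_20211008']
# A fixed reference date keeps the example reproducible.
print(recent_tables(files, today=date(2021, 10, 12)))
```

Parsing into real date objects costs a little more than comparing strings, but it also rejects digit runs that are not valid dates.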

Related

Split based on _ and find the difference between dates

I am trying to find the difference between the two dates below. The dates are in "dd-mm-yyyy" order, separated by underscores. I split the two strings on _ to extract the day, month and year.
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
previous_array = previous_date.split("_")
I am not sure what to do after that to combine the parts into a date and find the difference between the dates in days.
Any leads/suggestions would be appreciated.
You could index into the list after the split, for example previous_array[1], to get the values and pass them to date.
But instead of using split, you might use a pattern with 3 capture groups to make the match a bit more specific, then subtract the dates and read the days value.
You might make the date like pattern more specific using the pattern on this page
import re
from datetime import date
previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"
pattern = r"filename_(\d{2})_(\d{2})_(\d{4})"
pMatch = re.match(pattern, previous_date)
cMatch = re.match(pattern, current_date)
pDate = date(int(pMatch.group(3)), int(pMatch.group(2)), int(pMatch.group(1)))
cDate = date(int(cMatch.group(3)), int(cMatch.group(2)), int(cMatch.group(1)))
print((cDate - pDate).days)
Output
6
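The split-based route from the question can also be completed without a regex. The sketch below assumes the part before the first underscore never contains an underscore itself; date_from_name is a hypothetical helper name.

```python
from datetime import datetime

def date_from_name(name):
    """Parse the dd_mm_yyyy suffix of a name such as 'filename_03_03_2021'."""
    # Split once on the first underscore: 'filename' and '03_03_2021'.
    _, date_part = name.split("_", 1)
    return datetime.strptime(date_part, "%d_%m_%Y").date()

previous_date = "filename_03_03_2021"
current_date = "filename_09_03_2021"

# Subtracting two date objects gives a timedelta; .days is the whole-day difference.
difference = date_from_name(current_date) - date_from_name(previous_date)
print(difference.days)
```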

Processing data with incorrect dates like 30th of February

In trying to process a large number of bank account statements given in CSV format I realized that some of the dates are incorrect (30th of February, which is not possible).
So this snippet fails [1] telling me that some dates are incorrect:
df_from_csv = pd.read_csv( csv_filename
, encoding='cp1252'
, sep=";"
, thousands='.', decimal=","
, dayfirst=True
, parse_dates=['Buchungstag', 'Wertstellung']
)
I could of course pre-process those CSV files and replace the 30th of Feb with 28th of Feb (or whatever the Feb ended in that year).
But is there a way to do this in Pandas, while importing? Like "If this column fails, set it to X"?
Sample row
775945;28.02.2018;30.02.2018;;901;"Zinsen"
As you can see, the date 30.02.2018 is not correct, because there ain't no 30th of Feb. But this seems to be a known problem in Germany. See [2].
[1] Here's the error message:
ValueError: day is out of range for month
[2] https://de.wikipedia.org/wiki/30._Februar
Here is how I solved it:
I added a custom date parser:
import calendar
import glob
from datetime import datetime
import pandas as pd
def mydateparser(dat_str):
    """Given a date like `30.02.2018` create a correct date `28.02.2018`"""
    if dat_str.startswith("30.02"):
        (d, m, y) = [int(el) for el in dat_str.split(".")]
        # monthrange returns (weekday of the first day, number of days in the month):
        (first, last) = calendar.monthrange(y, m)
        # Use the correct last day (`last`) in creating a new date string:
        dat_str = f"{last:02d}.{m:02d}.{y}"
    # pd.datetime was removed in pandas 2.0, so use the standard-library datetime
    return datetime.strptime(dat_str, "%d.%m.%Y")
# and used it in `read_csv`
for csv_filename in glob.glob(f"{path}/*.csv"):
    # read csv into DataFrame
    df_from_csv = pd.read_csv(csv_filename,
                              encoding='cp1252',
                              sep=";",
                              thousands='.', decimal=",",
                              dayfirst=True,
                              parse_dates=['Buchungstag', 'Wertstellung'],
                              date_parser=mydateparser  # deprecated since pandas 2.0
                              )
This allows me to fix those incorrect "30.02.XX" dates and lets pandas convert those two date columns (['Buchungstag', 'Wertstellung']) into dates instead of objects.
You could load it all up as text, then run it through a regex to identify illegal dates, to which you could then apply some adjustment function.
A sample regex you might apply could be:
ok_date_pattern = re.compile(r"^(0[1-9]|[12][0-9]|3[01])[-](0[1-9]|1[012])[-](19|20|99)[0-9]{2}\b")
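This finds dates in DD-MM-YYYY format where DD is constrained to 01 to 31 (i.e. a day of 42 would be considered illegal), MM is constrained to 01 to 12, and YYYY is constrained to the range 1900 to 2099 (the pattern as written also accepts years starting with 99). Note that it checks day and month independently, so on its own it still accepts 30-02, and that it expects hyphens as separators, so for dot-separated dates like the sample row you would replace [-] with [.].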
There are other regexes that go into more depth - such as some of the inventive answers found here
What you then need is a working adjustment function - perhaps that parses the date as best it can and returns a nearest legal date. I'm not aware of anything that does that out of the box, but a function could be written to deal with the most common edge cases I guess.
Then it'd be a case of tagging legal and illegal dates using an appropriate regex, and assigning some date-conversion function to deal with these two classes of dates appropriately.
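One possible shape for such an adjustment function, as a minimal sketch: it assumes dot-separated DD.MM.YYYY strings like the sample row and simply clamps the day to the last day of the month. The name nearest_valid_date is made up for illustration, and it only handles an out-of-range day, not an invalid month.

```python
import calendar
from datetime import date

def nearest_valid_date(dat_str):
    """Parse DD.MM.YYYY, clamping the day to the last valid day of that month."""
    day, month, year = (int(part) for part in dat_str.split("."))
    # monthrange returns (weekday of the first day, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day, last_day))

print(nearest_valid_date("30.02.2018"))  # clamped to the end of February 2018
print(nearest_valid_date("28.02.2018"))  # already valid, returned unchanged
```

Because the clamp uses the real month length, leap years are handled without special cases.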

How to import date column from csv in python in format d/m/y

I have a data sheet in which issue_d is a date column whose values are displayed in the format 11-Dec. Clicking any cell of the column shows the date as 12/11/2018.
But when I read the CSV file, issue_d is imported as 11-Dec; the year is not imported.
How do I get the issue_d column in d/m/y format?
Code I tried:
import pandas
data=pandas.read_csv('Project_data.csv')
print(data)
Checking the issue_d column: data['issue_d']
Result:
0 11-Dec
1 11-Dec
2 11-Dec
expected:
0 11-Dec-2018
1 11-Dec-2018
2 11-Dec-2018
You can use to_datetime after appending the year to the column:
df['issue_d'] = pd.to_datetime(df['issue_d'] + '-2018')
print (df)
issue_d
0 2018-12-11
1 2018-12-11
2 2018-12-11
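To see what happens to a single value, and to get the d/m/y text the question asks for, the same idea can be sketched with the standard library alone. The helper name with_year is hypothetical, the year 2018 is taken from the question, and %b assumes English month abbreviations (it depends on the current locale).

```python
from datetime import datetime

def with_year(day_month, year=2018):
    """Turn a value like '11-Dec' into a 'dd/mm/yyyy' string for the given year."""
    # Append the missing year, then parse day, abbreviated month name and year.
    parsed = datetime.strptime(f"{day_month}-{year}", "%d-%b-%Y")
    return parsed.strftime("%d/%m/%Y")

print(with_year("11-Dec"))
```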
A more 'controllable' way of getting the data is to first get the datetime from the data frame as normal, and then convert it:
dt = dt.strftime('%Y-%m-%d')
In this case, you'd put %d in front, for example '%d/%m/%Y'. strftime is a useful technique because it allows the most customization when converting a datetime variable.
After you do this, you can slice out the individual month, day and year, and then use
strftime("%B")
to get the string-name of the month (e.g. "February").
Good Luck!

What is an efficient way to trim a date in Python?

Currently I am trying to trim the current date into day, month and year with the following code.
#Code from my local machine
from datetime import datetime
from datetime import timedelta
five_days_ago = datetime.now()-timedelta(days=5)
# result: 2017-07-14 19:52:15.847476
get_date = str(five_days_ago).rpartition(' ')[0]
#result: 2017-07-14
#Extract the day
day = get_date.rpartition('-')[2]
# result: 14
#Extract the year
year = get_date.rpartition('-')[0]
# result: 2017-07
I am not a Python professional, as I only picked up this language a couple of months ago, but I want to understand a few things here:
Why did I receive 2017-07 if str.rpartition() is supposed to split a string at a separator you specify (-, /, " ")? I was expecting to receive 2017...
Is there an efficient way to separate day, month and year? I do not want to repeat the same mistakes with my insecure code.
I tried my code in the following tech. setups:
local machine with Python 3.5.2 (x64), Python 3.6.1 (x64) and repl.it with Python 3.6.1
Try the code online, copy and paste the line codes
Try the following:
from datetime import date, timedelta
five_days_ago = date.today() - timedelta(days=5)
day = five_days_ago.day
year = five_days_ago.year
If what you want is a date (not a date and time), use date instead of datetime. Then, the day and year are simply properties on the date object.
As to your question regarding rpartition, it works by splitting on the rightmost separator (in your case, the hyphen between the month and the day); that's what the r in rpartition means. So get_date.rpartition('-') returns the tuple ('2017-07', '-', '14').
If you want to persist with your approach, your year code would be made to work if you replace rpartition with partition, e.g.:
year = get_date.partition('-')[0]
# result: 2017
However, there's also a related (better) approach - use split:
parts = get_date.split('-')
year = parts[0]
month = parts[1]
day = parts[2]
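The difference between the three string methods is easiest to see side by side. This is a small sketch using the fixed string from the question rather than the current date:

```python
get_date = "2017-07-14"

# rpartition splits once, at the LAST separator.
from_right = get_date.rpartition("-")
# partition splits once, at the FIRST separator.
from_left = get_date.partition("-")
# split breaks the string at EVERY separator.
year, month, day = get_date.split("-")

print(from_right)
print(from_left)
print(year, month, day)
```

Note that all three return strings; wrap the parts in int() if you need numbers.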

How to add variables together into a new variable where you control the separation

Let's say I've declared three variables which make up a date. How can I combine them into a new variable so that I can print them in the correct 1/2/03 format by simply printing the new variable?
month = 1
day = 2
year = 03
date = month, day, year <<<<< What would this code have to be?
print(date)
I know I could set the sep='/' argument in the print statement if I pass all three variables individually, but this means I can't add additional text to the print statement without it also being separated by a /. Therefore I need a single variable I can use.
The .join() method does what you want (assuming the inputs are strings):
>>> '/'.join((month, day, year))
'1/2/03'
As do all of Python's formatting options, e.g.:
>>> '%s/%s/%s' % (month, day, year)
'1/2/03'
But date formatting (and working with dates in general) is tricky, and there are existing tools to do it "right", namely the datetime module, see date.strftime().
>>> import datetime
>>> date = datetime.date(2003, 1, 2)
>>> date.strftime('%m/%d/%y')
'01/02/03'
>>> date.strftime('%-m/%-d/%y')
'1/2/03'
Note the - before the m and the d to suppress leading zeros on the month and day. This flag is a platform-specific extension: it works on Linux and macOS but not on Windows.
You can use the join method. You can also use a generator expression to format the numbers so they are each 2 digits wide.
>>> '/'.join('%02d' % i for i in [month, day, year])
'01/02/03'
You want to read about the str.format() method:
https://docs.python.org/3/library/stdtypes.html#str.format
Or if you're using Python 2:
https://docs.python.org/2/library/stdtypes.html#str.format
The join() function will also work in this case, but learning about str.format() will be more useful to you in the long run.
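As a minimal sketch of the str.format() route, assuming the three values are integers and that only the year should keep its leading zero (the variable names formatted, same_with_fstring and message are made up for illustration):

```python
month, day, year = 1, 2, 3  # year holds the two-digit year 03 as an integer

# {:02d} pads the year to two digits; the plain {} fields are left unpadded.
formatted = "{}/{}/{:02d}".format(month, day, year)

# The same format specification works in an f-string (Python 3.6+).
same_with_fstring = f"{month}/{day}/{year:02d}"

# The result is an ordinary string, so it can be mixed with other text freely.
message = "Today is {}.".format(formatted)
print(message)
```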
The correct answer is: use the datetime module:
import datetime
month = 1
day = 2
year = 2003
date = datetime.date(year, month, day)
print(date)
print(date.strftime("%m/%d/%Y"))
# etc
Trying to handle dates as tuples is just a PITA, so don't waste your time.
