How to remove DD in YYYYMMDD in Python

I need to remove the day from a date. I tried datetime.strptime and datetime.strftime but couldn't get them to work. I need to build a tuple of 2 items (date, price) from a nested list, but I have to change the date format first.
Here's part of the code:
def get_data(my_csv):
    with open("my_csv.csv", "r") as csv_file:
        csv_reader = csv.reader(csv_file, delimiter = (','))
        next(csv_reader)
        data = []
        for line in csv_reader:
            data.append(line)
    return data
def get_monthly_avg(data):
    oldformat = '20040819'
    datetimeobject = datetime.strptime(oldformat,'%y%m%d')
    newformat = datetime.strftime('%y%m ')

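You have a typo in the date format: 'Y' has to be capitalized (%Y is the four-digit year, %y is the two-digit year). Also, strftime needs the datetime object it should format.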
from datetime import datetime
# use datetime to convert
def strip_date(data):
    d = datetime.strptime(data,'%Y%m%d')
    return datetime.strftime(d,'%Y%m')
data = '20110513'
print (strip_date(data))
# or just cut off day (2 last symbols) from date string
print (data[:6])
The first variant is better because it verifies that the string really is a valid date in the expected format.
Output:
201105
201105
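To tie this back to the original goal, here is a minimal sketch of building (month, price) tuples and averaging per month. The row layout (date string in column 0, price in column 1), the sample values, and the names to_month and monthly_averages are assumptions for illustration; they are not taken from the asker's file.

```python
from collections import defaultdict
from datetime import datetime


def to_month(date_string):
    """Validate a YYYYMMDD string and return it as YYYYMM."""
    return datetime.strptime(date_string, '%Y%m%d').strftime('%Y%m')


def monthly_averages(rows):
    """Return a sorted list of (YYYYMM, average price) tuples.

    Assumes each row is [date, price, ...] with the date as YYYYMMDD.
    """
    prices_by_month = defaultdict(list)
    for row in rows:
        prices_by_month[to_month(row[0])].append(float(row[1]))
    return [(month, sum(prices) / len(prices))
            for month, prices in sorted(prices_by_month.items())]


# Hypothetical rows, shaped like the nested list returned by get_data()
sample_rows = [
    ['20040819', '100.0'],
    ['20040820', '110.0'],
    ['20040901', '90.0'],
]
print(monthly_averages(sample_rows))
```

Grouping on the converted YYYYMM string keeps the date handling in one place, so a malformed date raises a ValueError instead of silently ending up in the wrong month.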

You didn't specify any code, but this might work:
date = functionThatGetsDate()
date = date[0:6]

Related

How to use python to find the maximum value in a csv file list

I am very new to Python and need some help finding the maximum value in a column of data (time) that is imported from a CSV file. This is the code I have tried:
file = open ("results.csv")
unneeded = file.readline()
for line in file:
    data = file.readline ()
    linelist = line.split(",")
    hours = linelist[4]
    maxtime = 0
    for x in hours:
        if x > maxtime:
            maxtime = x
print (maxtime)
Any help is appreciated.
Edit: I tried this code but it gives me the wrong answer:
file = open ("results.csv")
unneeded = file.readline()
maxtime = 0
for line in file:
    data = file.readline ()
    linelist = line.split(",")
    hours = linelist[4]
    if hours > str(maxtime):
        maxtime = hours
print (maxtime)
First few lines of results.csv: https://i.stack.imgur.com/z3pEJ.png
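I haven't tested it, but this should work. The csv library makes parsing CSV files easy.
import csv
with open("results.csv") as file:
    csv_reader = csv.reader(file, delimiter=',')
    next(csv_reader)  # skip the header row
    maxtime = 0.0  # initialise once, before the loop
    for row in csv_reader:
        # assumes column 4 holds a number; convert so the comparison is numeric, not text
        hours = float(row[4])
        if hours > maxtime:
            maxtime = hours
print(maxtime)
# no file.close() needed: the with block closes the file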
My recommendation is using the pandas module for anything CSV-related.
Using dateutil, I create a dataset of dates and identification numbers whose date values are shuffled (no specific order) like so:
from dateutil.parser import *
from dateutil.rrule import *
from random import shuffle
dates = list(rrule(
    DAILY,
    dtstart=parse("19970902T090000"),
    until=parse("19971224T000000")
))
shuffle(dates)
with open('out.csv', mode='w', encoding='utf-8') as out:
    for i,date in enumerate(dates):
        out.write(f'{i},{date}\n')
In this dataset, 1997-12-23 09:00:00 is the "largest" date. Because the values are written in an ISO 8601-style layout (year first, every field zero-padded), plain string comparison is enough to find the maximum:
from pandas import read_csv
# header=None: out.csv has no header row, so no data rows are skipped
df = read_csv('out.csv', names=['index', 'time'], header=None)
print(max(df['time']))
After running it, we indeed get 1997-12-23 09:00:00 printed in the terminal!
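If pandas is not available, the same idea works with the standard library. This is only a sketch: the screenshot of results.csv is not reproduced here, so the sample data and column names below are made up, and it assumes the times in column 4 are plain numbers. The key point is converting to float before comparing, because comparing the raw strings ranks them alphabetically.

```python
import csv
import io

# Made-up stand-in for results.csv: a header row, then data rows
sample = io.StringIO(
    "id,name,team,lap,hours\n"
    "1,Ann,A,1,9.5\n"
    "2,Bob,B,1,10.25\n"
    "3,Cy,A,2,7.0\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

# Convert to float so the comparison is numeric, not alphabetical
max_hours = max(float(row[4]) for row in reader)
print(max_hours)

# Comparing the same values as strings picks a different "maximum"
string_max = max(['9.5', '10.25', '7.0'])
print(string_max)
```

This is also why the second attempt in the question gives the wrong answer: `hours > str(maxtime)` compares text character by character.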

How to remove time from date values pulled from a JSON file?

I am working in Python, pulling info from a JSON file and writing it to a CSV file. The code I am using is as follows:
import csv
import json
csv_kwargs = {
'dialect': 'excel',
'doublequote': True,
'quoting': csv.QUOTE_MINIMAL
}
inpfile = open('checkin.json', 'r', encoding='utf-8')
outfile = open('checkin.csv', 'w', encoding='utf-8')
writer = csv.writer(outfile, **csv_kwargs, lineterminator="\n")
for line in inpfile:
    d = json.loads(line)
    writer.writerow([d['business_id'],d['date']])
inpfile.close()
outfile.close()
checkin.json has the keys business_id and date. The date values are in the form 'MM:DD:YYYY HH:MM:SS', showing the date and then the time. Each business_id has multiple dates associated with it. To show how each business_id and its dates look, here is a line from the JSON file:
{"business_id":"--1UhMGODdWsrMastO9DZw","date":"2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 02:45:18, 2016-11-18 01:54:50, 2017-04-20 18:39:06, 2017-05-03 17:58:02"}
My question is: how do I keep the date but drop the time, given that they are in the same key value?
You can parse each date in your JSON as a timestamp and then truncate it to a date using Python's built-in datetime module.
Import the module:
from datetime import datetime
Parse the dates while writing:
for line in inpfile:
    d = json.loads(line)
    # the key is 'date', and the timestamps in it are separated by commas
    dates = map(lambda dt: datetime.strptime(dt.strip(), '%Y-%m-%d %H:%M:%S').strftime('%Y-%m-%d'), d['date'].split(','))
    for date in dates:
        writer.writerow([d['business_id'], date])
The format of the date values described in your question isn't consistent: first you say it's MM:DD:YYYY, but in the sample line from the JSON input file it appears to be YYYY-MM-DD. Such details can matter, but this one doesn't affect the revised code below. What did matter was the fact that there can be more than one date per key, which is why I'm updating my answer.
import csv
import json
csv_kwargs = {
'dialect': 'excel',
'doublequote': True,
'quoting': csv.QUOTE_MINIMAL,
}
with open('checkin.json', 'r', encoding='utf-8') as inpfile, \
     open('checkin.csv', 'w', encoding='utf-8', newline='') as outfile:
    writer = csv.writer(outfile, **csv_kwargs)
    for line in inpfile:
        d = json.loads(line)
        # Convert date value string into list of dates with the times removed.
        dates = [date.strip().split(' ')[0] for date in d['date'].split(',')]
        writer.writerow([d['business_id']] + dates)
If you're strictly using this program to convert the JSON file to CSV, you can simply use string slices on each timestamp:
stamp = d['date'].split(',')[0].strip()  # first timestamp, e.g. '2016-04-26 19:49:16'
date, time = stamp[:10], stamp[11:]
If you want to store it as a datetime object to do something else:
from datetime import datetime
dt = datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S')
# Other stuff
dt_string = dt.strftime('%Y-%m-%d')
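As a rough sketch of how the pieces above fit together, the helper below splits the comma-separated value, validates each timestamp, and keeps only the date part. The function name dates_only is made up for illustration, and the format string assumes the YYYY-MM-DD HH:MM:SS layout seen in the sample line rather than the one described in the question text.

```python
from datetime import datetime


def dates_only(date_field):
    """Split a comma-separated timestamp string and drop the time part."""
    result = []
    for stamp in date_field.split(','):
        parsed = datetime.strptime(stamp.strip(), '%Y-%m-%d %H:%M:%S')
        result.append(parsed.date().isoformat())
    return result


# Shortened version of the sample line from the question
d = {
    "business_id": "--1UhMGODdWsrMastO9DZw",
    "date": "2016-04-26 19:49:16, 2016-08-30 18:36:57, 2016-10-15 02:45:18",
}

# One output row per date; pass each row to writer.writerow() in the real script
rows = [[d['business_id'], day] for day in dates_only(d['date'])]
print(rows)
```

Parsing with strptime is slower than slicing, but it raises a ValueError on a malformed timestamp instead of writing a truncated string to the CSV.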

How to compare date from csv(string) to actual date

filenameA ="ApptA.csv"
filenameAc = "CheckoutA.csv"
def checkouttenantA():
    global filenameA
    global filenameAc
    import csv
    import datetime
    with open(filenameA, 'r') as inp, open(filenameAc, 'a' , newline = "") as out:
        my_writer = csv.writer(out)
        for row in csv.reader(inp):
            my_date= datetime.date.today()
            string_date = my_date.strftime("%d/%m/%Y")
            if row[5] <= string_date:
                my_writer.writerow(row)
Dates are saved in the format %d/%m/%Y in column [5] of an Excel file. I am trying to compare the dates in the CSV file with today's date, but only the %d part is being compared. I assume that is because the dates are strings.
There are a few other improvements to make as well, but the main problem is this: you're converting today's date to a string with strftime() and comparing the two strings. Instead, you should convert the string date from the CSV file to a datetime object and compare the datetime objects.
I'll add plenty of comments to try and explain the code and the reasoning behind it.
# imports should go at the top
import csv
# notice we are importing the `datetime` type from the module `datetime`
from datetime import datetime
# try to avoid globals where possible (they're not needed here)
def check_dates_in_csv(input_filepath):
    ''' function to load csv file and compare dates to todays date'''
    # create a list to store the rows which meet our criteria
    # appending the rows to this will make a list of lists (nested list)
    output_data = []
    # get todays date before loop to avoid calling now() every line
    # we only need this once and it'll slow the loop down calling it every row
    todays_date = datetime.now()
    # open your csv here using the function argument
    with open(input_filepath, newline='') as csv_file:
        reader = csv.reader(csv_file)
        # iterate over the rows and grab the date in each row
        for row in reader:
            string_date = row[5]
            # convert the string to a datetime object
            csv_date = datetime.strptime(string_date, '%d/%m/%Y')
            # compare the dates and append if it meets the criteria
            if csv_date <= todays_date:
                output_data.append(row)
    # function should only do one thing, compare the dates
    # save the output after
    return output_data
# then run the script here
# this comparison is basically the entry point of the python program
# this answer explains it better than I could: https://stackoverflow.com/questions/419163/what-does-if-name-main-do
if __name__ == "__main__":
    # use our new function to get the output data
    output_data = check_dates_in_csv("input_file.csv")
    # save the data here
    with open("output.csv", "w", newline='') as output_file:
        writer = csv.writer(output_file)
        writer.writerows(output_data)
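To see why the string comparison in the question misbehaves, here is a small sketch with a fixed "today" so the result is repeatable. The two dates are made-up values for illustration.

```python
from datetime import date, datetime

today = date(2023, 3, 2)                   # fixed "today" so the result is repeatable
today_string = today.strftime("%d/%m/%Y")  # '02/03/2023'
csv_value = "15/01/2023"                   # an earlier date, as stored in the CSV

# Strings compare character by character, so the leading '1' vs '0' decides
string_result = csv_value <= today_string
print(string_result)

# Parsed dates compare chronologically
csv_date = datetime.strptime(csv_value, "%d/%m/%Y").date()
date_result = csv_date <= today
print(date_result)
```

Because the day comes first in %d/%m/%Y, the string comparison is effectively decided by the day digits, which matches the symptom described in the question.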
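I would recommend using pandas for such tasks:
import pandas as pd
filenameA ="ApptA.csv"
filenameAc = "CheckoutA.csv"
today = pd.Timestamp.today()  # pd.datetime is no longer available in recent pandas versions
# dayfirst=True because the dates are stored as %d/%m/%Y
df = pd.read_csv(filenameA, parse_dates=[5], dayfirst=True)
df.loc[df.iloc[:, 5] <= today].to_csv(filenameAc, index=False)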

ValueError: time data '' does not match format '%d-%m-%Y %H:%M:%S'

What I am trying to do is:
Delete all rows where csv date is lower than 25.05.2016 23:59
Save the file with a different name
I have the following data in a csv in col A
WFQVG98765
FI Quality-Value-Growth
Some Random String 1
Datum
13-05-2016 23:59
14-05-2016 23:59
15-05-2016 23:59
16-05-2016 23:59
17-05-2016 23:59
18-05-2016 23:59
19-05-2016 02:03
.
.
.
.
This is what I have tried now
import csv
import datetime
from dateutil.parser import parse
def is_date(string):
    try:
        parse(string)
        return True
    except ValueError:
        return False
'''
1. Delete all rows where csv date is lower than 25.05.2016 23:59
2. Save the file with a different name
'''
cmpDate = datetime.datetime.strptime('25.05.2016 23:59:00', '%d.%m.%Y %H:%M:%S')
with open('WF.csv', 'r') as csvfile:
    csvReader = csv.reader(csvfile, delimiter=',')
    for row in csvReader:
        print (row[0])
        if is_date(row[0]) and not row[0].strip(' '):
            csvDate = datetime.datetime.strptime(row[0], '%d-%m-%Y %H:%M:%S')  # Error here: ValueError: time data '' does not match format '%d-%m-%Y %H:%M:%S'
I also tried this for the error line:
            csvDate = datetime.datetime.strptime(row[0], '%d-%m-%Y %H:%M')  # but got the same error
            if csvDate<cmpDate:
                print (row[0]+'TRUE')
How can I delete the row if the condition is true, and finally save the file under a different name?
You can check each row's date and collect the rows you want to keep in a list. You can then write those rows to a new CSV file and delete the old one if you don't need it anymore.
Here's a snippet that does what you're asking for:
import csv
from datetime import datetime
cmpDate = datetime.strptime('25.05.2016 23:59:00', '%d.%m.%Y %H:%M:%S')
def is_lower(date_str):
    try:
        # parse the argument, not the global `row`
        csvDate = datetime.strptime(date_str, '%d-%m-%Y %H:%M')
        return csvDate < cmpDate
    except ValueError:
        # not a date (e.g. a header line): keep the row
        return False
with open('WF.csv', 'r', newline='') as csvfile:
    csvReader = csv.reader(csvfile, delimiter=',')
    data = [row for row in csvReader if not is_lower(row[0])]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    writer.writerows(data)
is_date() is giving you false positives. Be more strict when you check the date format and consistent when you load a date string into datetime - follow one of the principles of Zen of Python - "There should be one-- and preferably only one --obvious way to do it":
def is_date(date_string):
    try:
        datetime.datetime.strptime(date_string, '%d-%m-%Y %H:%M:%S')
        return True
    except ValueError:
        return False
In other words, don't mix dateutil.parser.parse() and datetime.datetime.strptime().
The datetime.datetime.strptime exception indicates you are passing a blank string to the function in row[0].
Once you get that issue resolved, you need to add code to write acceptable rows to a new file.
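One way to stay strict while still accepting both layouts tried in the question (with and without seconds) is to loop over a short list of known formats. This is a sketch under that assumption; the names KNOWN_FORMATS and try_parse are made up for illustration.

```python
from datetime import datetime

# Formats tried in the question: with and without seconds
KNOWN_FORMATS = ('%d-%m-%Y %H:%M:%S', '%d-%m-%Y %H:%M')


def try_parse(text):
    """Return a datetime for the first matching format, else None."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


# A date cell, a header cell, a blank cell and an id cell from column A
for cell in ['13-05-2016 23:59', 'Datum', '', 'WFQVG98765']:
    print(repr(cell), '->', try_parse(cell))
```

Blank cells and header text both come back as None, so the caller can decide in one place whether such rows are kept or dropped.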
You're doing the wrong comparison when you call strip. Two things:
First of all, just use row[0].strip() (with no arguments). This will strip all whitespace (like newlines, etc), not just spaces.
Secondly, if is_date(row[0]) and not row[0].strip(' ') only passes when row[0] is empty, which is the opposite of what you want. This should be if row[0].strip() and is_date(row[0]):
Even better, given how your is_date function is implemented, you should probably just put your datetime creation into a function that handles errors. This is my usual pattern:
def parse_date(str_date):
    try:
        return datetime.datetime.strptime(str_date, '%d-%m-%Y %H:%M')
    except ValueError:
        return None
cmp_date = datetime.datetime.strptime('25.05.2016 23:59:00', '%d.%m.%Y %H:%M:%S')
output_rows = []
with open('WF.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        csv_date = parse_date(row[0].strip()) # returns a datetime or None
        if csv_date and csv_date >= cmp_date:
            output_rows.append(row)
# Finally, write output_rows to the output file
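The final write step could look roughly like the sketch below. It uses in-memory buffers and made-up sample rows so it runs without WF.csv; with real files you would open 'WF.csv' for reading and an output file of your choice with newline=''. Note that, like the pattern above, it drops rows whose first cell is not a date.

```python
import csv
import io
from datetime import datetime

cmp_date = datetime(2016, 5, 25, 23, 59)


def parse_date(str_date):
    """Return a datetime, or None if the text is not in the expected format."""
    try:
        return datetime.strptime(str_date, '%d-%m-%Y %H:%M')
    except ValueError:
        return None


# In-memory stand-in for WF.csv (made-up rows)
source = io.StringIO(
    "Datum\n"
    "24-05-2016 23:59\n"
    "25-05-2016 23:59\n"
    "26-05-2016 10:00\n"
)

output_rows = []
for row in csv.reader(source):
    csv_date = parse_date(row[0].strip()) if row else None
    if csv_date and csv_date >= cmp_date:
        output_rows.append(row)

# Write the kept rows; with a real file use open('output.csv', 'w', newline='')
target = io.StringIO()
csv.writer(target).writerows(output_rows)
print(target.getvalue())
```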

Remove specific rows from CSV file in python

I am trying to remove rows with a specific ID within particular dates from a large CSV file.
The CSV file contains a column [3] with dates formatted like "1962-05-23" and a column with identifiers [2]: "ddd:011232700:mpeg21:a00191".
Within the following date range:
01-01-1951 to 12-31-1951
07-01-1962 to 12-31-1962
01-01 to 09-30-1963
7-01 to 07-31-1965
10-01 to 10-31-1965
04-01-1966 to 11-30-1966
01-01-1969 to 12-31-1969
01-01-1970 to 12-31-1989
I want to remove rows that contain the ID ddd:11*
I think I have to create a variable that contains both the date range and the ID, and look for these in every row, but I'm very new to Python so I'm not sure what would be an elegant way to do this.
This is what I have now (code updated):
import csv
import collections
import sys
import re
from datetime import datetime
csv.field_size_limit(sys.maxsize)
dateranges = [("01-01-1951","12-31-1951"),("07-01-1962","12-31-1962")]
dateranges = list(map(lambda dr: tuple(map(lambda x: datetime.strptime(x,"%m-%d-%Y"),dr)),dateranges))
def datefilter(x):
    x = datetime.strptime(x,"%Y-%m-%d")
    for r in dateranges:
        if r[0]<=x and r[1]>=x: return True
    return False
writer = csv.writer(open('filtered.csv', 'wb'))
for row in csv.reader('my_file.csv', delimiter='\t'):
    if datefilter(row[3]):
        if not row[2].startswith("dd:111"):
            writer.writerow(row)
    else:
        writer.writerow(row)
writer.close()
I'd recommend using pandas: it's great for filtering tables. Nice and readable.
import pandas as pd
# assumes the csv contains a header, and the 2 columns of interest are labeled "mydate" and "identifier"
# Note that "date" is a pandas keyword so not wise to use for column names
df = pd.read_csv(inputFilename, parse_dates=[2]) # assumes mydate column is the 3rd column (0-based)
df = df[~df.identifier.str.contains('ddd:11')] # filters out all rows with 'ddd:11' in the 'identifier' column
# then filter out anything not inside the specified date ranges:
df = df[((pd.to_datetime("1951-01-01") <= df.mydate) & (df.mydate <= pd.to_datetime("1951-12-31"))) |
((pd.to_datetime("1962-07-01") <= df.mydate) & (df.mydate <= pd.to_datetime("1962-12-31")))]
df.to_csv(outputFilename)
See Pandas Boolean Indexing
Here is how I would approach that, but it may not be the best method.
import csv
from datetime import datetime
dateranges = [("01-01-1951","12-31-1951"),("07-01-1962","12-31-1962")]
dateranges = list(map(lambda dr: tuple(map(lambda x: datetime.strptime(x,"%m-%d-%Y"),dr)),dateranges))
def datefilter(x):
    # The date format is different here to match the format of the csv
    x = datetime.strptime(x,"%Y-%m-%d")
    for r in dateranges:
        if r[0]<=x and r[1]>=x: return True
    return False
with open(main_file, newline='') as fp:
    root = csv.reader(fp, delimiter='\t')
    kept_rows = []
    for row in root:
        # use a regular expression or any other means to filter on id here
        if datefilter(row[3]) and row[2].startswith("ddd:11"):
            continue  # date and id both match: remove the row
        kept_rows.append(row)
What I have done is create a list of tuples of your date ranges (for brevity, I only put 2 ranges in it), and then I convert those into datetime objects.
I have used maps for doing that in one line: first loop over all tuples in that list, applying a function which loops over all entries in that tuple and converts to a date time, using the tuple and list functions to get back to the original structure. Doing it the long way would look like:
dateranges2=[]
for dr in dateranges:
    dateranges2.append((datetime.strptime(dr[0],"%m-%d-%Y"),datetime.strptime(dr[1],"%m-%d-%Y")))
dateranges = dateranges2
Notice that I just convert each item in the tuple into a datetime, and add the tuples to the new list, replacing the original (which I don't need anymore).
Next, I create a datefilter function which takes a date string, converts it to a datetime, and then loops over all the ranges, checking if the value is in the range. If it is, we return True (indicating this item should be filtered); otherwise we return False once all ranges have been checked with no match (indicating that we don't filter this item).
Now you can check out the id using any method that you want once the date has matched, and remove the item if desired. As your example is constant in the first few characters, we can just use the string startswith function to check the id. If it is more complex, we could use a regex.
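The same steps can also be written with a list comprehension, any(), and a regex for the id check. This is only a sketch: the regex pattern, the helper names in_ranges and should_remove, and the sample rows are assumptions for illustration, and only two of the date ranges are included.

```python
import re
from datetime import datetime

raw_ranges = [("01-01-1951", "12-31-1951"), ("07-01-1962", "12-31-1962")]

# Same conversion as the nested map() call, written as a comprehension
dateranges = [
    (datetime.strptime(start, "%m-%d-%Y"), datetime.strptime(end, "%m-%d-%Y"))
    for start, end in raw_ranges
]

# Hypothetical pattern: ids that begin with 'ddd:11'
ID_PATTERN = re.compile(r'^ddd:11')


def in_ranges(date_string):
    """True if a YYYY-MM-DD string falls inside any configured range."""
    day = datetime.strptime(date_string, "%Y-%m-%d")
    return any(start <= day <= end for start, end in dateranges)


def should_remove(row):
    """Assumes the id is in column 2 and the date in column 3."""
    return in_ranges(row[3]) and bool(ID_PATTERN.match(row[2]))


# Made-up rows for illustration
rows = [
    ['a', 'b', 'ddd:110232700:mpeg21:a00191', '1962-08-01'],  # removed
    ['a', 'b', 'ddd:011232700:mpeg21:a00191', '1962-08-01'],  # kept: id differs
    ['a', 'b', 'ddd:110232700:mpeg21:a00191', '1962-05-23'],  # kept: date outside ranges
]
kept = [row for row in rows if not should_remove(row)]
print(len(kept))
```

Keeping the date test and the id test in separate small functions makes it easy to swap the prefix check for a more complex pattern later.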
My approach works like this:
import csv
import re
import datetime
field_id = 'ddd:11'
d1 = datetime.date(1951, 1, 1) #change the start date
d2 = datetime.date(1951, 12, 31) #change the end date
diff = d2 - d1
date_list = []
for i in range(diff.days + 1):
    date_list.append((d1 + datetime.timedelta(i)).isoformat())
with open('mwevers_example_2016.01.02-07.25.55.csv', newline='') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        for date in date_list:
            if row[3] == date:
                print(row)
                var = re.search('\\b'+field_id,row[2])
                if var:
                    print('olalala') #here you can make a function to copy those rows into another file or any list
import csv
import sys
import re
from datetime import datetime
csv.field_size_limit(sys.maxsize)
field_id = 'ddd:11'
dateranges = [("1951-01-01", "1951-12-31"),
              ("1962-07-01", "1962-12-31"),
              ("1963-01-01", "1963-09-30"),
              ("1965-07-01", "1965-07-31"),
              ("1965-10-01", "1965-10-31"),
              ("1966-04-01", "1966-11-30"),
              ("1969-01-01", "1989-12-31")
              ]
dateranges = list(map(lambda dr:
                      tuple(map(lambda x:
                                datetime.strptime(x, "%Y-%m-%d"), dr)),
                      dateranges))
def datefilter(x):
    x = datetime.strptime(x, "%Y-%m-%d")
    for r in dateranges:
        if r[0] <= x and r[1] >= x:
            return True
    return False
output = []
with open('my_file.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t', quotechar='"')
    next(reader)
    for row in reader:
        if datefilter(row[4]):
            # inside a date range: keep the row only if the id does not match
            var = re.search('\\b'+field_id, row[3])
            if not var:
                output.append(row)
        else:
            output.append(row)
with open('output.csv', 'w', newline='') as outputfile:
    writer = csv.writer(outputfile, delimiter='\t', quotechar='"')
    writer.writerows(output)
