I want to create a csv file with a filename of the following format:
"day-month-year hour:minute-malware_scan.csv"
Example: "6-8-2016 21:45-malware_scan.csv"
The first part of the filename is the date and time at file creation; "-malware_scan.csv" is a fixed suffix.
I know that in order to get the date and time I should use the time or datetime module and the strftime() function for formatting.
At first I tried with:
import datetime

t = datetime.datetime.now()
formatted_time = t.strftime("%d-%m-%y %H:%M")
filename = formatted_time + "-malware_scan.csv"
with open(filename, "a") as f:
    ...............
I didn't get the expected result, so I tried another way:
i = datetime.datetime.now()
file_to_open = "{day}-{month}-{year} {hour}:{minute}-malware_scan.csv".format(day=i.day, month=i.month, year=i.year, hour=i.hour, minute=i.minute)
with open(file_to_open, "a") as f:
    .......................
The code above doesn't give the expected result either.
I get a filename like "6-8-2016 21": the day, month, year and hour are displayed, but the minutes and the rest of the string (-malware_scan.csv) are not.
I'm focusing only on the filename with this question, not on the csv writing itself, whose code is omitted.
The : character is not allowed in filenames on Windows. You could discard the : separator entirely:
>>> from datetime import datetime
>>> t = datetime.now()
>>> formatted_time = t.strftime('%d-%m-%y %H%M')
>>> formatted_time
'06-08-16 2226'
>>> datetime.strptime(formatted_time, '%d-%m-%y %H%M')
datetime.datetime(2016, 8, 6, 22, 26)
Or replace that character with an underscore or hyphen.
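A minimal sketch of the hyphen variant, assuming a four-digit year is wanted as in the question's example. The helper name scan_filename and its parameters are made up for illustration:

```python
from datetime import datetime

def scan_filename(now=None, suffix="-malware_scan.csv"):
    """Build a colon-free name such as '06-08-2016 21-45-malware_scan.csv'."""
    if now is None:
        now = datetime.now()
    # %Y is the four-digit year; '-' stands in for the ':' that Windows rejects.
    return now.strftime("%d-%m-%Y %H-%M") + suffix

print(scan_filename(datetime(2016, 8, 6, 21, 45)))
# → 06-08-2016 21-45-malware_scan.csv
```

Note that strftime() zero-pads the day and month, so the result starts with 06-08 rather than 6-8.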
Thanks to Moses Koledoye for spotting the problem. I was thinking I made a mistake in the Python code, but actually the problem was the characters of the filename.
According to MSDN the following are reserved characters that cannot be used in a filename on Windows:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
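If the name comes from somewhere you don't control, one option is to replace every character from the list above before opening the file. This is only a sketch: sanitize_filename is a hypothetical helper, and it covers the character list only, not the other Windows naming rules.

```python
import re

# The nine reserved characters from the list above, as a regex character class.
RESERVED = re.compile(r'[<>:"/\\|?*]')

def sanitize_filename(name, replacement="_"):
    """Replace each reserved character in `name` with `replacement`."""
    return RESERVED.sub(replacement, name)

print(sanitize_filename("6-8-2016 21:45-malware_scan.csv"))
# → 6-8-2016 21_45-malware_scan.csv
```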
I have a document that contains several time measurements that I want to average, so I'm converting minutes and seconds to total seconds. The file looks something like:
Boring text
time 15:07
Right now I can get there with the following:
if line.startswith('time'):
    rtminutes = re.findall('([0-9][0-9]):', line)
    for min in rtminutes:
        rtmintosec = float(min) * 60
    rtseconds = re.findall(':([0-9][0-9])', line)
    for sec in rtseconds:
        totsecs = rtmintosec + float(sec)
    print("Total roast time in seconds:", totsecs)
It seems like the better way would be using time and total_seconds.
import time
if line.startswith('time'):
    rt = re.search('\d{2}:\d{2}', line)
    rtime = time.strftime(rt, '%M:%S')
    rtime.total_seconds()
Every time I've attempted a time approach, I get errors along the lines of:
TypeError: strftime() argument 1 must be str, not re.Match
I'm obviously missing the boat somewhere. Suggestions appreciated.
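You need to make sure a match was found and, if so, extract the matched string from the re.Match object with group(). Note also that time.strftime() formats a time and returns a str, which has no total_seconds(); parse the string with datetime.strptime() and build a timedelta instead.
from datetime import datetime, timedelta
rt = re.search(r'\d{2}:\d{2}', line)
if rt:  # re.search() returns None when nothing matches, and None has no .group()
    parsed = datetime.strptime(rt.group(), '%M:%S')
    rtime = timedelta(minutes=parsed.minute, seconds=parsed.second)
    print(rtime.total_seconds())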
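Since the goal is an average over several measurements, plain integer arithmetic on the regex groups may be enough, with no time module at all. A sketch under that assumption; roast_seconds is a hypothetical name and the second sample time is invented for the demo:

```python
import re

# Matches lines such as 'time 15:07' and captures minutes and seconds.
TIME_RE = re.compile(r'^time\s+(\d{1,2}):(\d{2})')

def roast_seconds(lines):
    """Yield the total seconds for every line of the form 'time MM:SS'."""
    for line in lines:
        m = TIME_RE.match(line)
        if m:  # lines without a time are skipped
            minutes, seconds = (int(g) for g in m.groups())
            yield minutes * 60 + seconds

# 'time 14:53' is made-up sample data, not from the original file.
sample = ["Boring text", "time 15:07", "More text", "time 14:53"]
values = list(roast_seconds(sample))
print(values)                     # → [907, 893]
print(sum(values) / len(values))  # → 900.0
```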
I tested this code; I just stripped the trailing \n and it works for me:
# 2 is the number of the line that contains "time 15:07"
# 'Boringtext' is the file name
from colorama import Fore, Style
import linecache as s
from datetime import timedelta
from datetime import datetime

def file_len(fname, line):
    return s.getline(fname, line)

line = file_len('Boringtext', 2)
if line.startswith('time'):
    rt = line.split(' ')
    rtime = datetime.strptime(rt[1].rstrip("\n"), '%M:%S')
    t = timedelta(minutes=rtime.minute, seconds=rtime.second)
    print(Fore.BLUE, t.total_seconds(), Style.RESET_ALL)
I have a log file and am trying to print the data between two dates.
2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n
2020-01-31T20:12:39.1234Z, abcdef,...\n
2020-01-31T20:12:40.1234Z, ghikjl,...\n
2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n
2020-01-31T20:12:42.1234Z, wxyzdsasad,...\n
This is the sample log file and I want to print the lines between 2020-01-31T20:12:39 up to 2020-01-31T20:12:41.
So far I have managed to find and print the line with the starting date, which I pass in as start.
linenum = 0
with open("logfile.log") as myFile:
    for line in myFile:
        linenum += 1
        if line.find(start) != -1:
            print("Line " + str(linenum) + ": " + line.rstrip('\n'))
But how do I keep printing until the end date?
Not an answer in Python, but in bash:
sed -n '/2020-01-31T20:12:38.1234Z/,/2020-01-31T20:12:41.1234Z/p' file.log
Output:
2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...\n
2020-01-31T20:12:39.1234Z, abcdef,...\n
2020-01-31T20:12:40.1234Z, ghikjl,...\n
2020-01-31T20:12:41.1234Z, mnopqrstuv,...\n
Since the time string is already structured nicely in your file, you can just do a simple string comparison between the times you're interested in without converting the string to a datetime object.
Use the csv module to read in the file, using the default comma delimiter, and then the filter() function to filter between two dates.
import csv
reader = csv.reader(open("logfile.log"))
filtered = filter(lambda p: '2020-01-31T20:12:39' <= p[0].split('.')[0] <= '2020-01-31T20:12:41', reader)
for l in filtered:
    print(','.join(l))
Edit:
I used split() to remove the fractional part of the time string before comparing, since you're interested in times to the nearest second, e.g. 2020-01-31T20:12:39.
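The string comparison works because these timestamps are fixed-width and zero-padded, with the largest unit first, so alphabetical order and chronological order coincide. A sketch of the same idea without the csv module, assuming every line starts with such a timestamp in the same time zone; lines_between is a hypothetical helper:

```python
def lines_between(lines, start, stop):
    """Yield lines whose leading timestamp, cut to whole seconds, is in [start, stop]."""
    for line in lines:
        # Keep 'YYYY-MM-DDTHH:MM:SS' and drop the fractional part and the rest.
        stamp = line.split(',', 1)[0].split('.', 1)[0]
        if start <= stamp <= stop:
            yield line

log = [
    "2020-01-31T20:12:38.1234Z, asdasdasdasdasdasd,...",
    "2020-01-31T20:12:39.1234Z, abcdef,...",
    "2020-01-31T20:12:40.1234Z, ghikjl,...",
    "2020-01-31T20:12:41.1234Z, mnopqrstuv,...",
    "2020-01-31T20:12:42.1234Z, wxyzdsasad,...",
]
selected = list(lines_between(log, "2020-01-31T20:12:39", "2020-01-31T20:12:41"))
for line in selected:
    print(line)
```

This prints the three lines stamped 20:12:39, 20:12:40 and 20:12:41. An open file object can be passed in place of the list.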
If you want it in Python:
import time
from datetime import datetime as dt

def to_timestamp(date, forma='%Y-%m-%dT%H:%M:%S'):
    return time.mktime(dt.strptime(date, forma).timetuple())

# startdate and enddate are strings such as '2020-01-31T20:12:39'
start = to_timestamp(startdate)
end = to_timestamp(enddate)
logs = {}
with open("logfile.log") as f:
    for line in f:
        date = line.split(', ')[0].split('.')[0]
        logline = line.split(', ')[1].strip('\n')
        # compare the timestamp of the current line, not the end value
        if start <= to_timestamp(date) <= end:
            logs[date] = logline
I'm trying to make a dynamic function: I give two datetime values and it could read the log between those datetime values, for example:
start_point = "2019-04-25 09:30:46.781"
stop_point = "2019-04-25 10:15:49.109"
I'm thinking of algorithm that checks:
if the dates are equal:
check if the start hour 0 char (09 -> 0) is higher or less than stop hour 0 char (10 -> 1);
same check with the hour 1 char ((start) 09 -> 9, (stop) 10 -> 0);
same check with the minute 0 char;
same check with the minute 1 char;
if the dates differ:
some other checks...
I may be reinventing the wheel here, and I'm really lost. Here are the things I tried:
1.
...
cmd = subprocess.Popen(['egrep "2019-04-19 ([0-1][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9].[0-9]{3}" file.log'], shell=True, stdout=subprocess.PIPE)
cmd_result = cmd.communicate()[0]
for i in str(cmd_result).split("\n"):
    print(i)
...
The problem with this one: when I substituted the values from the example it couldn't work, because it produces invalid character ranges, e.g. [9-0] for the second hour digit and [3-1] for the first minute digit.
2.
Tried the following solutions from The best way to filter a log by a dates range in python
Any help is appreciated.
EDIT
the log line structure:
...
2019-04-25 09:30:46.781 text text text ...
2019-04-25 09:30:46.853 text text text ...
...
EDIT 2
So I tried the code:
from datetime import datetime as dt
s1 = "2019-04-25 09:34:11.057"
s2 = "2019-04-25 09:59:43.534"
start = dt.strptime('2019-04-25 09:34:11.057','%Y-%m-%d %H:%M:%S.%f')
stop = dt.strptime('2019-04-25 09:59:43.534', '%Y-%m-%d %H:%M:%S.%f')
start_1 = dt.strptime('09:34:11.057','%H:%M:%S.%f')
stop_1 = dt.strptime('09:59:43.534','%H:%M:%S.%f')
with open('file.out','r') as file:
    for line in file:
        ts = dt.strptime(line.split()[1],'%H:%M:%S.%f')
        if (ts > start_1) and (ts < stop_1):
            print line
and I got the error
ValueError: time data 'Platform' does not match format '%H:%M:%S.%f'
So it seems I found another problem: some lines start with something other than a datetime. Is there a way to provide a regex in which I specify the datetime format?
EDIT 3
Fixed the ValueError raised when a line starts with some other string, and the IndexError raised when a line has too few fields:
try:
    ts = dt.strptime(line.split()[1],'%H:%M:%S.%f')
    if (ts > start_1) and (ts < stop_1):
        print line
except IndexError as err:
    continue
except ValueError as err:
    continue
Now the output is not limited to the range I provide; it prints log lines from 2019-02-27 09:38:46.229 to 2019-02-28 09:57:11.028. Any thoughts?
Your edit 2 had the right idea. You need to put exception handling in to catch lines which are not formatted correctly and skip them, for example blank lines, or lines that do not have the timestamp. This can be done as follows:
from datetime import datetime

s1 = "2019-04-25 09:24:11.057"
s2 = "2019-04-25 09:59:43.534"
fmt = '%Y-%m-%d %H:%M:%S.%f'
start = datetime.strptime(s1, fmt)
stop = datetime.strptime(s2, fmt)

with open('file.out', 'r') as file:
    for line in file:
        line = line.strip()
        try:
            ts = datetime.strptime(' '.join(line.split(' ', maxsplit=2)[:2]), fmt)
            if start <= ts <= stop:
                print(line)
        except ValueError:
            # not a timestamped line (blank line, other text): skip it
            pass
The whole timestamp (date and time) is used to create ts so that it can be compared correctly with start and stop. Comparing only the time of day would also match lines from other dates.
Each line first has its surrounding whitespace, including the trailing newline, removed. It is then split on spaces at most twice. The first two fields are joined back together and converted into a datetime object. If this fails, the line is not correctly formatted.
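To see what the split-and-join step does on its own, here is a small sketch that isolates it in a helper. The name parse_leading_timestamp and the 'Platform started' sample line are assumptions made for illustration:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def parse_leading_timestamp(line):
    """Return the datetime at the start of `line`, or None if there is none."""
    fields = line.strip().split(" ", maxsplit=2)  # [date, time, rest of line]
    try:
        return datetime.strptime(" ".join(fields[:2]), FMT)
    except ValueError:
        return None

print(parse_leading_timestamp("2019-04-25 09:30:46.781 text text text ..."))
# → 2019-04-25 09:30:46.781000
print(parse_leading_timestamp("Platform started"))  # hypothetical non-timestamp line
# → None
```

Returning None instead of raising keeps the filtering loop free of try/except, at the cost of one extra None check per line.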
I am trying to convert the date formats and make them uniform throughout the document using Python 3.6.
Here is a sample of the dates in my document: (There can be other formats as the document is large.)
9/21/1989
19640430
6/27/1980
5/11/1987
Mar 12 1951
2 aug 2015
I have checked the datetime library, but could not understand how to detect and change the format of the dates automatically. Here is what I have tried so far:
>>> from datetime import datetime
>>> oldformat = '20140716'
>>> datetimeobject = datetime.strptime(oldformat,'%Y%m%d')
>>> newformat = datetimeobject.strftime('%m-%d-%Y')
>>> print (newformat)
07-16-2014
But I don't see how to make the program detect the date patterns automatically and convert them to one uniform pattern, mm/dd/yyyy.
Please suggest how I can achieve this using Python 3.6.
There is no universal Python way of doing this, but I'd recommend using regex to identify the type and then converting it correctly:
Example Python
import re
from datetime import datetime

with open("in.txt", "r") as fi, open("out.txt", "w") as fo:
    for line in fi:
        line = line.strip()
        dateObj = None
        if re.match(r"^\d{8}$", line):
            dateObj = datetime.strptime(line, '%Y%m%d')
        elif re.match(r"^\d{1,2}/", line):
            dateObj = datetime.strptime(line, '%m/%d/%Y')
        elif re.match(r"^[a-z]{3}", line, re.IGNORECASE):
            dateObj = datetime.strptime(line, '%b %d %Y')
        elif re.match(r"^\d{1,2} [a-z]{3}", line, re.IGNORECASE):
            dateObj = datetime.strptime(line, '%d %b %Y')
        if dateObj is not None:  # skip lines that match none of the patterns
            fo.write(dateObj.strftime('%m-%d-%Y') + "\n")
Example Input
9/21/1989
19640430
6/27/1980
5/11/1987
Mar 12 1951
2 aug 2015
Example Output
09-21-1989
04-30-1964
06-27-1980
05-11-1987
03-12-1951
08-02-2015
I tried using the dateutil library to detect date strings in any format, and then the datetime library to convert them into the required format.
Here is the code:
>>> import dateutil.parser
>>> yourdate = dateutil.parser.parse("May 24 2016")
>>>
>>> print(yourdate)
2016-05-24 00:00:00
>>> # parse() already returns a datetime object, so it can be formatted directly
>>> newformat = yourdate.strftime('%m-%d-%Y')
>>> print(newformat)
05-24-2016
This works.
(There can be other formats as the document is large.)
Unfortunately, Python does not provide "guess what I mean" functionality (although you might be able to repurpose GNU date for that, as it is quite flexible). You will have to make a list of all of the formats you want to support, and then try each in turn (using datetime.strptime() as you've shown) until one of them works.
Python does not try to guess because, in an international context, it is not generally possible to divine what the user wants. In the US, 2/3/1994 means "February 3rd, 1994," but in Europe the same string means "The 2nd of March, 1994." Python deliberately abstains from this confusion.
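A minimal sketch of the try-each-format approach. The format list is an assumption based on the sample dates in the question (with US month/day order), and normalize_date is a hypothetical helper name:

```python
from datetime import datetime

# Formats to try, in order. Assumed from the question's samples; extend as needed.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y%m%d", "%b %d %Y", "%d %b %Y")

def normalize_date(text, out_format="%m/%d/%Y"):
    """Return `text` reformatted as out_format, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(out_format)
        except ValueError:
            continue  # wrong format, try the next one
    return None

for raw in ["9/21/1989", "19640430", "Mar 12 1951", "2 aug 2015", "not a date"]:
    print(raw, "->", normalize_date(raw))
```

The order of the list matters when a string could match more than one format, which is exactly the ambiguity described above. Month names are parsed with %b, so this also assumes an English locale.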
import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
    inr = csv.reader(infile,delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = [format(s,'02')+format(d.year,'04')+format(d.month,'02')+format(d.day,'02')+format(d.hour,'02')+format(d.minute,'02')+format(int(p*0.2),'04')]
        outfile.writelines(nr+'/n')
Using the above script, I have read in a .txt file and reformatted it as 'nr' so it looks like this:
['012015072314000000']
['012015072313450000']
['012015072313300000']
['012015072313150000']
['012015072313000000']
['012015072312450000']
['012015072312300000']
['012015072312150000']
..etc.
I need to now print it onto my new .txt file, but Python is not allowing me to print 'nr' with line breaks after each entry, I think because the data is in strings. I get this error:
TypeError: can only concatenate list (not "str") to list
Is there another way to do this?
You are trying to combine a list with a string, which cannot work. Simply don't create a list in nr.
import csv
import datetime
with open('soundTransit1_remote_rawMeasurements_15m.txt','r') as infile, open('soundTransit1.txt','w') as outfile:
    inr = csv.reader(infile,delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = "{:02d}{:%Y%m%d%H%M}{:04d}\n".format(s,d,int(p*0.2))
        outfile.write(nr)
There is no need to put your string into a list; just use outfile.write() here and build a string without a list:
nr = format(s,'02') + format(d.year,'04') + format(d.month, '02') + format(d.day, '02') + format(d.hour, '02') + format(d.minute, '02') + format(int(p*0.2), '04')
outfile.write(nr + '\n')
Rather than use 7 separate format() calls, use str.format():
nr = '{:02}{:%Y%m%d%H%M}{:04}\n'.format(s, d, int(p * 0.2))
outfile.write(nr)
Note that I formatted the datetime object with one formatting operation, and I included the newline into the string format.
You appear to have hard-coded the s value; you may as well put that into the format directly:
nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, int(p * 0.2))
outfile.write(nr)
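To check the format string in isolation, here is a small sketch with a fixed datetime. The sample values of d and p are chosen for illustration and are not read from the real input file:

```python
from datetime import datetime

d = datetime(2015, 7, 23, 14, 0)  # sample timestamp, chosen for illustration
p = 0                             # sample value standing in for int(row[5])

# A datetime accepts strftime codes as its format spec inside str.format().
nr = '01{:%Y%m%d%H%M}{:04}'.format(d, int(p * 0.2))
print(nr)  # → 012015072314000000

# The same spec works with the built-in format() and, on Python 3.6+, f-strings.
same_date_part = format(d, '%Y%m%d%H%M')
same_line = f'01{d:%Y%m%d%H%M}{int(p * 0.2):04}'
```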
Together, that updates your script to:
import csv
import datetime

with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile,\
     open('soundTransit1.txt', 'w') as outfile:
    inr = csv.reader(infile, delimiter='\t')
    for row in inr:
        d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        p = int(int(row[5]) * 0.2)
        nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, p)
        outfile.write(nr)
Take into account that the csv module works better if you follow the guidelines about opening files; in Python 2 you need to open the file in binary mode ('rb'), in Python 3 you need to set the newline parameter to ''. That way the module can control newlines correctly and supports including newlines in column values.
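For Python 3, the newline advice can be sketched like this. The temporary file and the sample row are made up for the demo; only the newline='' argument is the point:

```python
import csv
import os
import tempfile

# Throwaway file for the demo; the path is not from the original script.
path = os.path.join(tempfile.mkdtemp(), "demo.tsv")

# newline='' leaves newline handling to the csv module on both write and read.
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["2015-07-23 14:00:00", "first line\nsecond line"])

with open(path, "r", newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

# The embedded newline survives the round trip as part of one column value.
print(rows)
```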