Saving Multiple files in Python - python

I am trying to create a new file each time the following loop runs. At the moment it creates one file and just overwrites it. Is there a way to make it not overwrite and instead create a new file on each pass through the loop?
import xml.etree.ElementTree as ET
import time
import csv
with open('OrderCSV.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        orders_data = ET.Element('orders_data')
        orders = ET.SubElement(orders_data, 'orders')
        ##Order Details
        order_reference = ET.SubElement(orders, 'order reference')
        order_reference.set('',"12345")
        order_date = ET.SubElement(order_reference, 'order_date')
        order_priority = ET.SubElement(order_reference, 'order_priority')
        order_category = ET.SubElement(order_reference, 'order_category')
        delivery_service = ET.SubElement(order_reference, 'delivery_service')
        delivery_service.text = row['delivery_service']
        timestr = time.strftime("%Y%m%d%H%M%S")
        mydata = ET.tostring(orders_data)
        myfile = open(timestr, "wb")
        myfile.write(mydata)

You could check whether the file already exists and, if it does, wait a bit before trying again:
# requires "import os" at the top of the script
while True:
    timestr = time.strftime("%Y%m%d%H%M%S")
    if not os.path.exists(timestr):
        break
    time.sleep(.1)
with open(timestr, "wb") as myfile:
    mydata = ET.tostring(orders_data)
    myfile.write(mydata)
Instead of waiting, you could just add seconds to the timestamp until the name is free. This will cause the file names to drift forward in time if you process a lot of them per second.
mytime = time.time()
while True:
    timestr = time.strftime("%Y%m%d%H%M%S", time.localtime(mytime))
    if not os.path.exists(timestr):
        break
    mytime += 1  # name is taken: move one second ahead instead of sleeping
with open(timestr, "wb") as myfile:
    mydata = ET.tostring(orders_data)
    myfile.write(mydata)
Another option is to get a single timestamp before the loop and append the row index to it as you go.
mytime = time.strftime("%Y%m%d%H%M%S")
for index, row in enumerate(reader):
    ....
    timestr = f"{mytime}-{index}"
    ....
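To make the index-based naming concrete, here is a minimal, self-contained sketch. It is not the asker's full script: the SAMPLE_CSV text, the write_orders function and the .xml suffix are assumptions made for illustration, and the XML is reduced to a single element.

```python
import csv
import io
import os
import tempfile
import time
import xml.etree.ElementTree as ET

# Hypothetical stand-in for OrderCSV.csv so the sketch runs on its own.
SAMPLE_CSV = "delivery_service\nexpress\nstandard\neconomy\n"


def write_orders(csv_text, out_dir):
    """Write one XML file per CSV row and return the paths created."""
    stamp = time.strftime("%Y%m%d%H%M%S")  # taken once, before the loop
    paths = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        orders_data = ET.Element("orders_data")
        service = ET.SubElement(orders_data, "delivery_service")
        service.text = row["delivery_service"]
        # The index keeps names distinct even within the same second.
        path = os.path.join(out_dir, f"{stamp}-{index}.xml")
        with open(path, "wb") as myfile:
            myfile.write(ET.tostring(orders_data))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    created = write_orders(SAMPLE_CSV, tmp)
    print(len(created), "files written")
```

Because the timestamp is fixed before the loop, uniqueness within one run comes entirely from the index. Two separate runs started within the same second could still collide.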

Change the file name on each pass through the loop. I would also suggest using a with statement to open the file, because a file you open also has to be closed, and with does that for you:
with open(timestr, 'wb') as myfile:
    myfile.write(mydata)
Edit: the only flaw I can see in your code is that it does not close the file after opening it.

Related

Capture the last timestamp, without reading the complete file using Python

I am fairly new to Python and I am trying to capture the last line of a syslog file, but I have been unable to do so. This is a huge log file, so I want to avoid loading the complete file into memory. I just want to read the last line of the file and capture the timestamp for further analysis.
The code below captures every timestamp in the file, which takes a really long time to reach the last one. My plan was to reverse the list once it completed and take the first element (index 0):
The lastFile function uses the glob module to find the most recent log file, whose name is stored in recentEdit in the main function.
Is there a better way of doing this?
Script1:
#!/usr/bin/python
import glob
import os
import re
def main():
    syslogDir = (r'Location/*')
    listOfFiles = glob.glob(syslogDir)
    recentEdit = lastFile(syslogDir)
    print(recentEdit)
    astack=[]
    with open(recentEdit, "r") as f:
        for line in f:
            result = [re.findall(r'\d{4}.\d{2}.\d{2}T\d{2}.\d{2}.\d{2}.\d+.\d{2}.\d{2}',line)]
            print(result)
def lastFile(i):
    listOfFiles = glob.glob(i)
    latestFile = max(listOfFiles, key=os.path.getctime)
    return(latestFile)
if __name__ == '__main__': main()
Script2:
###############################################################################
###############################################################################
#The readline() gives me the first line of the log file which is also not what I am looking for:
#!/usr/bin/python
import glob
import os
import re
def main():
    syslogDir = (r'Location/*')
    listOfFiles = glob.glob(syslogDir)
    recentEdit = lastFile(syslogDir)
    print(recentEdit)
    with open(recentEdit, "r") as f:
        fLastLine = f.readline()
        print(fLastLine)
#    astack=[]
#    with open(recentEdit, "r") as f:
#        for line in f:
#            result = [re.findall(r'\d{4}.\d{2}.\d{2}T\d{2}.\d{2}.\d{2}.\d+.\d{2}.\d{2}',line)]
#            print(result)
def lastFile(i):
    listOfFiles = glob.glob(i)
    latestFile = max(listOfFiles, key=os.path.getctime)
    return(latestFile)
if __name__ == '__main__': main()
I really appreciate your help!!
Sincerely.
If you want to go directly to the end of the file, follow these steps:
1. Every time your program runs, persist (store) the index of the last '\n'.
2. If you have a persisted index of the last '\n', you can seek directly to that index using
file.seek(yourpersistedindex)
3. After this, when you call file.readline(), you will get the lines starting from yourpersistedindex.
4. Store this index every time you run your script.
For example, suppose your file log.txt has content like:
timestamp1 \n
timestamp2 \n
timestamp3 \n
import pickle
lastNewLineIndex = None
#here trying to read the lastNewLineIndex
try:
    rfile = open('pickledfile', 'rb')
    lastNewLineIndex = pickle.load(rfile)
    rfile.close()
except (OSError, EOFError, pickle.UnpicklingError):
    #no usable saved index yet (for example on the first run)
    pass
logfile = open('log.txt','r')
newLastNewLineIndex = None
if lastNewLineIndex:
    #seek(index) will take filepointer to the index
    logfile.seek(lastNewLineIndex)
    #will read the line starting from the index we provided in seek function
    lastLine = logfile.readline()
    print(lastLine)
    #tell() gives you the current index
    newLastNewLineIndex = logfile.tell()
else:
    counter = 0
    text = logfile.read()
    for c in text:
        if c == '\n':
            newLastNewLineIndex = counter
        counter+=1
#close the log file whichever branch ran
logfile.close()
#here saving the new LastNewLineIndex
wfile = open('pickledfile', 'wb')
pickle.dump(newLastNewLineIndex,wfile)
wfile.close()
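If persisting an index between runs is not practical, a different technique (not the one in the answer above) is to seek to the end of the file and read backwards in small chunks until a full last line is available. The sketch below is one way to do that. The function name read_last_line and the 4096-byte default chunk size are choices made for illustration, and it assumes the log is UTF-8 compatible text.

```python
import os


def read_last_line(path, chunk_size=4096):
    """Return the last non-empty line of a file, reading backwards in chunks."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b""
        while position > 0:
            step = min(chunk_size, position)
            position -= step
            f.seek(position)
            buffer = f.read(step) + buffer
            # Ignore trailing line endings, then look for the newline
            # that precedes the final line.
            stripped = buffer.rstrip(b"\r\n")
            if b"\n" in stripped:
                last = stripped.rsplit(b"\n", 1)[1]
                return last.decode("utf-8", errors="replace")
        # Reached the start of the file: the whole buffer is one line.
        return buffer.rstrip(b"\r\n").decode("utf-8", errors="replace")
```

Only the tail of the file is read, so the cost depends on the length of the last line rather than on the size of the log. The returned string can then be passed to re.findall with the timestamp pattern from the question.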

Wondering if there is a better way to update files?

I currently have a Python program that is both a web scraper and a file writer, which updates databases on my desktop using the Windows 10 Task Scheduler. The problem is that, for some reason, the Task Scheduler doesn't run the Python files at the specified time 100% of the time. I was wondering if there is a better approach to ensure that the files get updated at their specified times, as long as the computer is on.
I've tried changing the Task Scheduler settings, but I still have this problem.
import requests
from bs4 import BeautifulSoup
from datetime import datetime
#Updates Everyday.
#Fantasy5-WebScraper
response = requests.get('https://www.lotteryusa.com/michigan/fantasy-5/')
soup = BeautifulSoup(response.text, 'html.parser')
date = soup.find(class_='date')
results = soup.find(class_='draw-result list-unstyled list-inline')
d = datetime.strptime(date.time['datetime'], '%Y-%m-%d')
Fantasy5 = (d.strftime("%Y-%m-%d")+(',')+results.get_text().strip().replace('\n',','))
print(Fantasy5)
#Writing to DataBase
with open("Filename.txt", "r") as f:
    data = f.read()
with open("Filename.txt", "w") as f:
    f.write('{}{}{}'.format(Fantasy5, '\n' if data else '', data))
    f.close()
#Writing to DataFrame
with open("Filename.txt", "r") as f:
    data = f.read()
with open("Filename.txt", "w") as f:
    f.write('{}{}{}'.format(Fantasy5, '\n' if data else '', data))
    f.close()
You can use the third-party schedule library for this task, then add the Python file to your startup programs so it is executed every time you start the computer.
This program will do the job every day at 6 am.
import schedule
import time
import requests
from bs4 import BeautifulSoup
from datetime import datetime
def job(t):
    response = requests.get('https://www.lotteryusa.com/michigan/fantasy-5/')
    soup = BeautifulSoup(response.text, 'html.parser')
    date = soup.find(class_='date')
    results = soup.find(class_='draw-result list-unstyled list-inline')
    d = datetime.strptime(date.time['datetime'], '%Y-%m-%d')
    Fantasy5 = (d.strftime("%Y-%m-%d")+(',')+results.get_text().strip().replace('\n',','))
    print(Fantasy5)
    #Writing to DataBase
    with open("Filename.txt", "r") as f:
        data = f.read()
    with open("Filename.txt", "w") as f:
        f.write('{}{}{}'.format(Fantasy5, '\n' if data else '', data))
        f.close()
    #Writing to DataFrame
    with open("Filename.txt", "r") as f:
        data = f.read()
    with open("Filename.txt", "w") as f:
        f.write('{}{}{}'.format(Fantasy5, '\n' if data else '', data))
        f.close()
    return
schedule.every().day.at("06:00").do(job,'It is 06:00')
while True:
    schedule.run_pending()
    time.sleep(60)

Adding notes to a data file (csv) in python

I am trying to capture data from an oscilloscope using a Python script. The script saves it in CSV format. I need to add a few lines of text describing the data at the beginning of the file.
I looked at existing threads to see if there was a possible solution. I just started learning Python. I am using code that came with the instrument.
This is part of the script that saves the data as csv.
NewD = (np.insert(Wav_Data, 0, DataTime, axis = 0)).T
filename = BASE_DIRECTORY + BASE_FILE_NAME + ".csv"
now = time.time() # Only to show how long it takes to save
with open(filename, 'w') as filehandle:
    np.savetxt(filename, NewD, delimiter = ',', header = column_titles)
I tried to use the section below from another code but am not sure how to append this to the csv file.
with open("notes.txt") as f:
    NOTES = f.readlines()
NOTES = "".join(NOTES)
It is unable to find notes.txt which is located in the same directory as the script.
Eager to hear your feedback. Thanks in advance.
Updated to:
# Save data
NewD = (np.insert(Wav_Data, 0, DataTime, axis = 0)).T
filename = BASE_DIRECTORY + BASE_FILE_NAME + ".csv"
with open("notes.txt") as f:
    NOTES = f.readlines()
NOTES = "".join(NOTES)
with open(filename, "a") as fh:
    fh.write(NOTES)
now = time.time() # Only to show how long it takes to save
with open(filename, 'w') as filehandle:
    np.savetxt(filename, NewD, delimiter = ',', header = column_titles)
Just open the file once, for appending or writing, and write both pieces through the same file handle.
If you want to write the notes first and then the CSV data:
with open("notes.txt") as f:
    NOTES = f.readlines()
NOTES = "".join(NOTES)
with open(filename, "w") as fh:
    fh.write(NOTES)
    # this time we give np the opened filehandle, not the filename
    np.savetxt(fh, NewD, delimiter = ',', header = column_titles)
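The same single-handle idea can be tried without the instrument or NumPy. The sketch below swaps np.savetxt for the standard csv module so it runs on its own. The function name write_notes_then_rows and the sample notes and rows are hypothetical. Each note line is prefixed with "# ", which matches the prefix np.savetxt puts on its header by default.

```python
import csv
import io


def write_notes_then_rows(fh, notes, header, rows):
    """Write comment-prefixed notes, then a header and CSV rows, to one handle."""
    for line in notes.splitlines():
        fh.write(f"# {line}\n")
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical notes and data, standing in for notes.txt and NewD.
buffer = io.StringIO()
write_notes_then_rows(
    buffer,
    notes="Scope capture, channel 1\nProbe: 10x",
    header=["time", "voltage"],
    rows=[[0.0, 0.12], [0.1, 0.15]],
)
print(buffer.getvalue())
```

Marking the notes as comment lines is a design choice rather than a requirement. It lets a reader that skips "#" lines load the numeric data without tripping over the free-form text.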

How to call a date within a gzip.open call

I want to write a script that opens a gzipped file with today's date in its name.
Here is what I have so far:
todays_date = time.strftime("%Y%m%d") #format time as YYYYMMDD
nextpath = os.getcwd()
service_file = glob.glob(nextpath+"\\"+"shot_*_"+todays_date+"*_vice.gz")
input_file = glob.glob(nextpath+"\\"+"input_file.csv")
myData = gzip.open(service_file, 'rb')
myFile = open(input_file, 'wb') with myFile:
writer = csv.writer(myFile)
writer.writerows(myData)
This was working when I wrote the full path:
myData = gzip.open(D:/Temp/shot_655_20180109121455_vice.gz
myFile = open(D:/Temp/input_file.csv, 'wb') with myFile:
But since I have attempted to change it to make the date variable changeable I get the error:
SyntaxError: invalid syntax
I know I am calling on it wrong somehow but I am stuck and any help would be appreciated.
Thanks
You're using 'with open' incorrectly. It should look like this:
with open(my_file, 'r') as mf:
    # do stuff here
This way you don't have to worry about closing it later. Otherwise, you can just assign the result of open() to a variable and close it yourself:
mf = open(my_file, 'r')
....
mf.close()
Here's a link to the docs, with more information https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files
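One more thing to watch for in the question's code: glob.glob() returns a list of matching paths, not a single path, so its result cannot be passed straight to gzip.open(). The sketch below combines the with syntax and the date-based pattern. The function name extract_todays_file is hypothetical, and it assumes the archive's contents should be copied byte for byte into input_file.csv, rather than passed through csv.writer as in the question.

```python
import glob
import gzip
import os
import shutil
import time


def extract_todays_file(directory, output_name="input_file.csv"):
    """Decompress the first shot_*_<today>*_vice.gz found in directory."""
    todays_date = time.strftime("%Y%m%d")  # format time as YYYYMMDD
    pattern = os.path.join(directory, f"shot_*_{todays_date}*_vice.gz")
    matches = sorted(glob.glob(pattern))  # a list, possibly empty
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern}")
    output_path = os.path.join(directory, output_name)
    # Both files are closed automatically when the with block ends.
    with gzip.open(matches[0], "rb") as source, open(output_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return output_path
```

Calling extract_todays_file(os.getcwd()) mirrors the question's use of the current working directory. If several files match, this sketch simply takes the first one in sorted order.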

trying to read xlrd, extract data, and write csv

I am trying to read an excel file, extract some data, and write it out as a csv. This is pretty new to me and I'm messing up somewhere: I keep getting an empty csv. I'm sure I'm missing something very basic, but darned if I can see it. Here is the code:
import xlrd
import os
import csv
from zipfile import ZipFile
import datetime
datafile = "./2013_ERCOT_Hourly_Load_Data.xls"
outfile = "./2013_Max_Loads.csv"
def parse_file(datafile):
    workbook = xlrd.open_workbook(datafile)
    sheet = workbook.sheet_by_index(0)
    data = None
    outputlist = []
    for col in range(1, sheet.ncols):
        cv = sheet.col_values(col, start_rowx=1, end_rowx=None)
        header = sheet.cell_value(0,col)
        maxval = max(cv)
        maxpos = cv.index(maxval) + 1
        maxtime = sheet.cell_value(maxpos, 0)
        realtime = xlrd.xldate_as_tuple(maxtime, 0)
        year = realtime[0]
        month = realtime[1]
        day = realtime[2]
        hour = realtime[3]
        data = [
            'Region:', header,
            'Year:', year,
            'Month:', month,
            'Day:', day,
            'Hour:', hour,
            maxpos,
            maxtime,
            realtime,
            maxval,
        ]
    path = "./2013_Max_Loads.csv"
    return outputlist
def save_file(data, filename):
    with open(filename, "wb") as f:
        writer = csv.writer(f, delimiter='|')
        for line in data:
            writer.writerow(line)
parse_file(datafile)
save_file(parse_file(datafile),"2013_Max_Loads.csv")
You declare outfile but you don't use it.
You aren't passing a directory (path) for the file to be saved in.
I also think that calling parse_file twice might be messing you up. Just pass the filename and call it from within the save_file function.
I also found that you were returning outputlist as an empty list, because nothing is ever appended to it.
So here, try this. I will assume your xlrd commands are correct, because I have not personally used the module.
import csv
import xlrd
def parse_file(datafile):
    workbook = xlrd.open_workbook(datafile)
    sheet = workbook.sheet_by_index(0)
    outputlist = []
    outputlist_append = outputlist.append
    for col in range(1, sheet.ncols):
        cv = sheet.col_values(col, start_rowx=1, end_rowx=None)
        header = sheet.cell_value(0,col)
        maxval = max(cv)
        maxpos = cv.index(maxval) + 1
        maxtime = sheet.cell_value(maxpos, 0)
        realtime = xlrd.xldate_as_tuple(maxtime, 0)
        year = realtime[0]
        month = realtime[1]
        day = realtime[2]
        hour = realtime[3]
        data = [
            'Region:', header,
            'Year:', year,
            'Month:', month,
            'Day:', day,
            'Hour:', hour,
            maxpos,
            maxtime,
            realtime,
            maxval,
        ]
        # collect one row per column; without this the result stays empty
        outputlist_append(data)
    return outputlist
def save_file(data, filename):
    # keep the parsed rows; "data" itself is only the input file name
    rows = parse_file(data)
    with open(filename, 'wb') as f:
        writer = csv.writer(f, delimiter='|')
        for line in rows:
            writer.writerow(line)
    return
datafile = "./2013_ERCOT_Hourly_Load_Data.xls"
outfile = "./2013_Max_Loads.csv"
save_file(datafile, outfile)
UPDATE: Edited the code in save_file() to implement @wwii's suggestion.
Try substituting the new save_file() below:
def save_file(data, filename):
    # keep the parsed rows; "data" itself is only the input file name
    rows = parse_file(data)
    with open(filename, 'wb') as f:
        wr = csv.writer(f, delimiter='|')
        wr.writerows(rows)
    return
Also, change the variable (you used writer) to something like wr. You really want to avoid any possible conflicts with having a variable with the same name as a method, a function, or class you are calling.
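The code in this thread opens the output file with 'wb', which is the Python 2 convention for the csv module. On Python 3, csv.writer expects a text-mode file opened with newline=''. The sketch below shows that variant on its own. The function name save_rows and the sample rows (including the REGION_A and REGION_B labels) are made up for illustration and only mimic the shape of the lists that parse_file() builds.

```python
import csv
import os
import tempfile


def save_rows(rows, filename):
    """Write rows to a pipe-delimited file using Python 3's text-mode csv API."""
    # newline="" lets the csv module control line endings itself.
    with open(filename, "w", newline="") as f:
        wr = csv.writer(f, delimiter="|")
        wr.writerows(rows)


# Hypothetical rows shaped like the lists that parse_file() builds.
rows = [
    ["Region:", "REGION_A", "Year:", 2013, "Month:", 8, "Day:", 13, "Hour:", 17],
    ["Region:", "REGION_B", "Year:", 2013, "Month:", 8, "Day:", 5, "Hour:", 17],
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "2013_Max_Loads.csv")
    save_rows(rows, out_path)
    with open(out_path, newline="") as f:
        print(f.read())
```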
