I have a confusing issue in some code I'm currently running in a Jupyter notebook. I am reading in a couple thousand text files, each of them tens of MB, with Pandas, and searching for certain values within them. To track the progress while this happens, I threw in some print statements every once in a while. The important pieces of the code here are:
read_dir = '/Volumes/Data/'
all_text_files = os.listdir(read_dir)
all_text_files.sort()
# Split the file up so we at least get some output if this doesn't finish running
file_chunks = [all_text_files[i:i+100] for i in range(0,len(all_text_files),100)]
for chunk_num, current_chunk in enumerate(file_chunks):
print('Chunk', chunk_num)
for file_num, file in enumerate(current_chunk):
if file_num % 10 == 0:
print('File', file_num, '-', file)
# Read in the data down here
Normally, I see a bunch of print statements like
File 10 - some_file.txt
However, I recently saw one of these print statements stall midway. That is, the Jupyter output sat at File 10 - for a couple of minutes, and then eventually it completed to the above line, and the code continued to run.
Any idea how it got stuck in the middle of a print statement like that? Is that stalling likely within the Python print function, or is it in displaying that output on the Jupyter notebook? And finally, is there any information to take away about this - has something in my own code potentially caused this?
Related
TLDR: How can I make a notebook cell save its own python code to a file so that I can reference it later?
I'm doing tons of small experiments where I make adjustments to Python code to change its behaviour, and then run various algorithms to produce results for my research. I want to save the cell code (the actual python code, not the output) into a new uniquely named file every time I run it so that I can easily keep track of which experiments I have already conducted. I found lots of answers on saving the output of a cell, but this is not what I need. Any ideas how to make a notebook cell save its own code to a file in Google Colab?
For example, I'm looking to save a file that contains the entire below snippet in text:
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
All cell codes are stored in a List variable In.
For example you can print the lastest cell by
print(In[-1]) # show itself
# print(In[-1]) # show itself
So you can easily save the content of In[-1] or In[-2] to wherever you want.
Posting one potential solution but still looking for a better and cleaner option.
By defining the entire cell as a string, I can execute it and save to file with a separate command:
cell_str = '''
df['signal adjusted'] = df['signal'].pct_change() + df['baseline']
results = run_experiment(df)
'''
exec(cell_str)
with open('cell.txt', 'w') as f:
f.write(cell_str)
I have multiple codes written in different cells in jupyter notebook. The first cell contains the file name on which I need to perform the task. I am facing difficulty in running loop in jupyter notebook as I want to perform operation file by file. i.e. first take file1 to go through all the cells and then come back to lookout for file2 and so on.
I know a similar question has been asked Link1 but I am not sure how it can be done in jupter as I know in spyder we can indent in for loop and it'll run till we complete all the task then it jumps to another one but here in jupyter it seems difficult as it's cell by cell operation.
Cell 1 file_names = ['file1','file2','file3']
Cell 2 a = []
Cell 3 for file in file_names:
a.append(file)
Your distribution of code to the cells is fine (i.e. it is one possibility out of many).
You define variables in cells 1 and 2 and execute the loop in cell 3. Your file processing takes place only in cell 3.
Of course, every cell must contain valid python syntax (as must a script opened in spyder). This means the body of the for-loop must be correctly indented.
I apologize for the length of this. I am a relative Neophyte to Excel VBA and even more junior with Python. I have run into an issue with an error that occasionally occurs in python using OpenPyXl (just trying that for the first time).
Background: I have a series of python scripts (12) running and querying an API to gather data and populate 12 different, though similar, workbooks. Separately, I have a equal number of Excel instances periodically looking for that data and doing near-real-time analysis and reporting. Another python script looks for key information to be reported from the spreadsheets and will text it to me when identified. The problem seems to occur between the data gathering python scripts and a copy command in the data analysis workbooks.
The way the python data gathering scripts "talk" to the analysis workbooks is via the sheets they build in their workbooks. The existing vba in the analysis workbooks will copy the data workbooks to another directory (so that they can be opened and manipulated without impacting their use by the python scripts) and then interpret and copy the data into the Excel analysis workbook. Although I recently tested a method to read the data directly from those python-created workbooks without opening them, the vba will require some major surgery to convert to that method and is likely not going to happen soon.
TL,DR: There are data workbooks and analysis workbooks. Python builds the data workbooks and the analysis workbooks use VBA to copy the data workbooks to another directory and load specific data from the copied data workbooks. There is a one-to-one correspondence between the data and analysis workbooks.
Based on the above, I believe that the only "interference" that occurs with the data workbooks is when the macro in the analysis workbook copies the workbook. I thought this would be a relatively safe level of interference, but it apparently is not.
The copy is done in VBA with this set of commands (the actual VBA sub is about 500 lines):
fso.CopyFile strFromFilePath, strFilePath, True
where fso is set thusly:
Set fso = CreateObject("Scripting.FileSystemObject")
and the strFromFilePath and strFilePath both include a fully qualified file name (with their respective paths). This has not generated any errors on the VBA side.
The data is copied about once a minute (though it varies from 40 seconds to about 5 minutes) and seems to work fine from a VBA perspective.
What fails is the python side about 1% of the time (which is probably 12 or fewer times daily. While that seems small, the associated data capture process halts until I notice and restart it. This means anywhere from 1 to all 12 of the data capture processes will fail at some point each day.
Here is what a failure looks like:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
monitor('DLD',1,13,0)
File "<string>", line 794, in monitor
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'DLD20210819.xlsx'
and I believe it occurs as a result of the following lines of python code (which comes after a while statement with various if conditions to populate the worksheets). The python script itself is about 200 lines long:
time.sleep(1) # no idea why wb.save sometimes fails; trying a delay
wb.save(FileName)
Notice, I left in one of the attempts to correct this. I have tried waiting as much as 3 seconds with no noticeable difference.
I admit I have no idea how to detect errors thrown by OpenPyXl and am quite unskilled at python error handling, but I had tried this code yesterday:
retries = 1
success = False
while not success and retries < 3:
try:
wb.save
success = True
except PermissionError as saveerror:
print ('>>> Save Error: ',saveerror)
wait = 3
print('=== Waiting %s secs and re-trying... ===' % wait)
#sys.stdout.flush()
time.sleep(wait)
retries += 1
My review of the output tells me that the except code never executed while testing the data capture routine over 3000 times. However, the "save" also never happened so the analysis spreadsheets did not receive any information until later when the python code saved the workbook and closed it.
I also tried adding a wb.close after setting the success variable to true, but got the same results.
I am considering either rewriting the VBA to try to grab the data directly from the unopened data workbooks without first copying them (which actually sounds more dangerous) or using an external synching tool to copy them outside of VBA (which could potentially cause exactly the same problem).
Does anyone have an idea of what may be happening and how to address it? It works nearly all the time but just fails several times a day.
Can someone help me to better understand how to trap the error thrown by OpenPyXl so that I can have it retry rather than just abending?
Any suggestions are appreciated. Thank you for reading.
Not sure if this is the best way, but the comment from simpleApp gave me an idea that I may want to use a technique I used elsewhere in the VBA. Since I am new to these tools, perhaps someone can suggest a cleaner approach, but I am going to try using a semaphore file to signal when I am copying the file to alert the python script that it should avoid saving.
In the below I am separating out the directory the prefix and the suffix. The prefix would be different for each of the 12 or more instances I am running and I have not figured out where I want to put these files nor what suffix I should use, so I made them variables.
For example, in the VBA I will have something like this to create a file saying currently available:
Dim strSemaphoreFolder As String
Dim strFilePrefix As String
Dim strFileDeletePath As String
Dim strFileInUseName As String
Dim strFileAvailableName As String
Dim strSemaphoreFileSuffix As String
Dim fso As Scripting.FileSystemObject
Dim fileTemp As TextStream
Set fso = CreateObject("Scripting.FileSystemObject")
strSemaphoreFileSuffix = ".txt"
strSemaphoreFolder = "c:\temp\monitor\"
strFilePrefix = "RJD"
strFileDeletePath = strSemaphoreFolder & strFilePrefix & "*" & strSemaphoreFileSuffix
' Clean up remnants from prior activities
If Len(Dir(strFileDeletePath)) > 0 Then
Kill strFileDeletePath
End If
' files should be gone
' Set the In-use and Available Names
strFileInUseName = strFilePrefix & "InUse" & strSemaphoreFileSuffix
strFileAvailableName = strFilePrefix & "Available" & strSemaphoreFileSuffix
' Create an available file
Set fileTemp = fso.CreateTextFile(strSemaphoreFolder & strFileAvailableName, True)
fileTemp.Close
' available file should be there
Then, when I am about to copy the file, I will briefly change the filename to indicate that the file is in use, perform the potentially problematic copy and then change it back with something like this:
' Temporarily name the semaphore file to "In Use"
Name strSemaphoreFolder & strFileAvailableName As strSemaphoreFolder & strFileInUseName
fso.CopyFile strFromFilePath, strFilePath, True
' After copying the file name it back to "Available"
Name strSemaphoreFolder & strFileInUseName As strSemaphoreFolder & strFileAvailableName
Over in the Python script, before I do the wb.save command, I will insert a check to see whether the file indicates that it is available or in use with something like this:
prefix = 'RJD'
directory = 'c:\\temp\\monitor\\'
suffix = '.txt'
filepathname = directory + prefix + 'Available' + suffix
while not (os.path.isfile(directory + prefix + 'Available' + suffix)):
time.sleep(1)
wb.save
Does this seem like it would work?
I am thinking that it should avoid the failure if I have properly identified it as an attempt to save the file in the Python script while the VBA script is telling the operating system to copy it.
Thoughts?
afterthoughts:
Using the technique I described, I probably need to create the "Available" semaphore file in the Python script and simply assume it will be there in the VBA script since the Python script is collecting the data and may be doing so before the VBA is even started.
A better alternative may be to simply check for the existence of the "In Use" file which will never be there unless the VBA wants it there, like this:
while (os.path.isfile(directory + prefix + 'InUse' + suffix)):
time.sleep(1)
wb.save
I am trying to make my program read from a file every 5 seconds until it reaches its end.
I have this code:
data = pd.read_csv(path + file_name)
n = len(data.index)
for i in range(0, n):
element = data['first_column_name'][0]
I tried writing time.sleep(5) after reading the element, but it has to be from the beginning, to look like streaming data... if possible
How can I make it read the element from the file every 5 seconds?
Python is by default single threaded, meaning that it only executes one thing at a time. When you use time.sleep, python can do nothing else but watch this timer count down. You seem to want your program to do stuff and periodically check on a file. What I think you are looking for is async/multithreading. This is a big topic with lots of different options for different circumstances; this Real Python article gives a gentle introduction.
I have a written a small program that reads values from two pieces of equipment every minuet and then saves it to a .csv file. I wanted the file to be updated and saved after every collection of every point so that if pc crashes, or other problem occurs no data loss occurs. To do that I open the file (ab mode), use write row and the close the file in a loop. The time between collections is about 1 minuet. This works quiet well, but the problem is after 5-6 hours of data collection, it stops saving to .csv file, and does not bring up any errors, the code continues to run with graph being update like nothing happened, but opening the .csv file reveals that data is lost. I would like to know if there is something wrong with the code I am using. I should also not I am running a subprocess from this that does live plotting, but I do not think it would cause an issue... I added those code lines as well.
##Initial file declaration and header
with open(filename,'wb') as wdata:
savefile=csv.writer(wdata,dialect='excel')
savefile.writerow(['System time','Time from Start(s)','Weight(g)','uS/cm','uS','Measured degC','%/C','Ideal degC','/cm'])
##Open Plotting Subprocess
draw=subprocess.Popen('TriPlot.py',shell=True,stdin=subprocess.PIPE,stdout=subprocess.PIPE)
##data collection loop
while True:
Collect Data x and y
Waits for data about 60 seconds, no sleep or pause commoand used, pyserial inteface is used.
## Send Data to subprocess
draw.stdin.write('%d\n' % tnow)
draw.stdin.write('%d\n' % s_w)
draw.stdin.write('%d\n' % tnow)
draw.stdin.write('%d\n' % float(s_c[5]))
##Saving data Section
wdata=open(filename,'ab')
savefile=csv.writer(wdata,dialect='excel')
savefile.writerow([tcurrent,tnow,s_w,s_c[5],s_c[7],s_c[9],s_c[11],s_c[13],s_c[15]])
wdata.close()
P.S This code uses the following packages for code not shown. pyserial, csv, os, subprocess,Tkinter, string, numpy, time and wx.
If draw.stdin.write() blocks it probably means that you are not consuming draw.stdout in a timely manner. The docs warn about the dead-lock due to the full OS pipe buffer.
If you don't need the output you could set stdout=devnull where devnull = open(os.devnull, 'wb') otherwise there are several approaches to read the output without blocking your code: threads, select, tempfile.TemoraryFile.