How to design data provider - python

I have an application made of many individual scripts. The output of each one is the input of the next. Each script reads its data at the beginning and saves the modified data as its last action. In short:
script1.py: reads mariadb data to df -> does stuff -> saves raw data in mysql.sql sqlite3 format
script2.py: reads sqlite3 file -> does stuff -> saves raw data in data.txt - tab separated values
program3.exe: reads data.txt -> does stuff -> writes another.txt - tab separated values
script4.py: reads another.txt -> does stuff -> creates data4.csv
script5.py: reads data4.csv -> does stuff -> inserts mariadb entries
What I am looking for is this: is there a design pattern (or other mechanism) for creating a data provider in a situation like this? The "data provider" should be an abstraction layer which:
has different data source types (MariaDB connection, CSV files, TXT files, others) predefined, with a list that is easy to extend;
reads data from the specified source and delivers it to a given script or program (for instance, by executing the script with a parameter);
validates that the output of each part of the application (each script or program) is valid, or takes over the task of generating that data.
In general, the "data provider" would run script1.py with some parameter (a dataframe?) in some sandbox, take over the data before it is saved, and prepare the data so that script2.py can run properly. Or it could simply run script1.py with some parameter, wait for it to finish, check that the output is valid, convert that output to another format if necessary, and run script2.py with the prepared data.
I have access to python script sources (script1.py ... script5.py) and I can modify them. I am unable to modify program3.exe source code but it is always one part of the whole process. What is the best way (or just a way) to design such a layer?

Since you include a .exe file, I'll assume you are using Windows. You can write a batch file or a PowerShell script. On Linux the equivalent would be a bash script.
If your sources and destinations are hard-coded, then the batch file is going to be something like
script1.py
REM assume output file is named mysql.sql
script2.py
REM assume output file is data.txt and has tab separated values
program3.exe
REM assume output file is another.txt and has tab separated values
script4.py
REM creates data4.csv
script5.py
The REM is short for REMARK in a batch file and allows for commenting.
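If you would rather drive the chain from Python than from a batch file, a minimal sketch could look like the following. The names run_step, run_pipeline and STEPS are hypothetical, and the only validation assumed here is that each step exits with code 0 and leaves behind a non-empty output file; real validation (column checks, row counts) would go in the same place.

```python
import subprocess
import sys
from pathlib import Path


def run_step(command, expected_output):
    """Run one pipeline step and confirm it produced a non-empty output file."""
    # check=True raises CalledProcessError when the step exits with a non-zero code
    subprocess.run(command, check=True)
    output = Path(expected_output)
    if not output.is_file() or output.stat().st_size == 0:
        raise RuntimeError(f"{command[0]} did not produce {expected_output}")


def run_pipeline(steps):
    """Run (command, expected_output) pairs in order, stopping at the first failure."""
    for command, expected_output in steps:
        run_step(command, expected_output)


# Assumption: file names as listed in the question; script5.py writes to
# MariaDB rather than to a file, so it would need its own kind of check.
STEPS = [
    ([sys.executable, "script1.py"], "mysql.sql"),
    ([sys.executable, "script2.py"], "data.txt"),
    (["program3.exe"], "another.txt"),
    ([sys.executable, "script4.py"], "data4.csv"),
]
```

Calling run_pipeline(STEPS) would then run the four file-producing steps in order and stop as soon as one of them fails or produces nothing.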

Related

workbook save failing, not sure why

I apologize for the length of this. I am a relative Neophyte to Excel VBA and even more junior with Python. I have run into an issue with an error that occasionally occurs in python using OpenPyXl (just trying that for the first time).
Background: I have a series of python scripts (12) running and querying an API to gather data and populate 12 different, though similar, workbooks. Separately, I have an equal number of Excel instances periodically looking for that data and doing near-real-time analysis and reporting. Another python script looks for key information to be reported from the spreadsheets and will text it to me when identified. The problem seems to occur between the data gathering python scripts and a copy command in the data analysis workbooks.
The way the python data gathering scripts "talk" to the analysis workbooks is via the sheets they build in their workbooks. The existing vba in the analysis workbooks will copy the data workbooks to another directory (so that they can be opened and manipulated without impacting their use by the python scripts) and then interpret and copy the data into the Excel analysis workbook. Although I recently tested a method to read the data directly from those python-created workbooks without opening them, the vba will require some major surgery to convert to that method and is likely not going to happen soon.
TL,DR: There are data workbooks and analysis workbooks. Python builds the data workbooks and the analysis workbooks use VBA to copy the data workbooks to another directory and load specific data from the copied data workbooks. There is a one-to-one correspondence between the data and analysis workbooks.
Based on the above, I believe that the only "interference" that occurs with the data workbooks is when the macro in the analysis workbook copies the workbook. I thought this would be a relatively safe level of interference, but it apparently is not.
The copy is done in VBA with this set of commands (the actual VBA sub is about 500 lines):
fso.CopyFile strFromFilePath, strFilePath, True
where fso is set thusly:
Set fso = CreateObject("Scripting.FileSystemObject")
and the strFromFilePath and strFilePath both include a fully qualified file name (with their respective paths). This has not generated any errors on the VBA side.
The data is copied about once a minute (though it varies from 40 seconds to about 5 minutes) and seems to work fine from a VBA perspective.
What fails is the python side, about 1% of the time (which is probably 12 or fewer times daily). While that seems small, the associated data capture process halts until I notice and restart it. This means anywhere from 1 to all 12 of the data capture processes will fail at some point each day.
Here is what a failure looks like:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
monitor('DLD',1,13,0)
File "<string>", line 794, in monitor
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'DLD20210819.xlsx'
and I believe it occurs as a result of the following lines of python code (which comes after a while statement with various if conditions to populate the worksheets). The python script itself is about 200 lines long:
time.sleep(1) # no idea why wb.save sometimes fails; trying a delay
wb.save(FileName)
Notice, I left in one of the attempts to correct this. I have tried waiting as much as 3 seconds with no noticeable difference.
I admit I have no idea how to detect errors thrown by OpenPyXl and am quite unskilled at python error handling, but I had tried this code yesterday:
retries = 1
success = False
while not success and retries < 3:
    try:
        wb.save
        success = True
    except PermissionError as saveerror:
        print ('>>> Save Error: ',saveerror)
        wait = 3
        print('=== Waiting %s secs and re-trying... ===' % wait)
        #sys.stdout.flush()
        time.sleep(wait)
        retries += 1
My review of the output tells me that the except code never executed while testing the data capture routine over 3000 times. However, the "save" also never happened so the analysis spreadsheets did not receive any information until later when the python code saved the workbook and closed it.
I also tried adding a wb.close after setting the success variable to true, but got the same results.
I am considering either rewriting the VBA to try to grab the data directly from the unopened data workbooks without first copying them (which actually sounds more dangerous) or using an external synching tool to copy them outside of VBA (which could potentially cause exactly the same problem).
Does anyone have an idea of what may be happening and how to address it? It works nearly all the time but just fails several times a day.
Can someone help me to better understand how to trap the error thrown by OpenPyXl so that I can have it retry rather than just abending?
Any suggestions are appreciated. Thank you for reading.
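One detail in the retry attempt is worth noting: wb.save written without parentheses only refers to the method and never calls it, which would explain both why the except branch never ran and why nothing was saved. A minimal sketch of a retry wrapper follows; save_with_retry and its parameters are hypothetical names, and it takes any zero-argument callable so it does not depend on OpenPyXl itself.

```python
import time


def save_with_retry(save, attempts=3, wait_seconds=3):
    """Call save() until it succeeds or the attempts run out.

    Returns True on success and False if every attempt raised PermissionError.
    """
    for attempt in range(1, attempts + 1):
        try:
            save()  # the parentheses matter: without them nothing is called
            return True
        except PermissionError as error:
            print(f">>> Save error on attempt {attempt}: {error}")
            if attempt < attempts:
                print(f"=== Waiting {wait_seconds} secs and re-trying... ===")
                time.sleep(wait_seconds)
    return False
```

In the monitoring script this might be called as save_with_retry(lambda: wb.save(FileName)), with the caller deciding what to do when it returns False.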
Not sure if this is the best way, but the comment from simpleApp gave me an idea that I may want to use a technique I used elsewhere in the VBA. Since I am new to these tools, perhaps someone can suggest a cleaner approach, but I am going to try using a semaphore file to signal when I am copying the file to alert the python script that it should avoid saving.
In the below I am separating out the directory the prefix and the suffix. The prefix would be different for each of the 12 or more instances I am running and I have not figured out where I want to put these files nor what suffix I should use, so I made them variables.
For example, in the VBA I will have something like this to create a file saying currently available:
Dim strSemaphoreFolder As String
Dim strFilePrefix As String
Dim strFileDeletePath As String
Dim strFileInUseName As String
Dim strFileAvailableName As String
Dim strSemaphoreFileSuffix As String
Dim fso As Scripting.FileSystemObject
Dim fileTemp As TextStream
Set fso = CreateObject("Scripting.FileSystemObject")
strSemaphoreFileSuffix = ".txt"
strSemaphoreFolder = "c:\temp\monitor\"
strFilePrefix = "RJD"
strFileDeletePath = strSemaphoreFolder & strFilePrefix & "*" & strSemaphoreFileSuffix
' Clean up remnants from prior activities
If Len(Dir(strFileDeletePath)) > 0 Then
Kill strFileDeletePath
End If
' files should be gone
' Set the In-use and Available Names
strFileInUseName = strFilePrefix & "InUse" & strSemaphoreFileSuffix
strFileAvailableName = strFilePrefix & "Available" & strSemaphoreFileSuffix
' Create an available file
Set fileTemp = fso.CreateTextFile(strSemaphoreFolder & strFileAvailableName, True)
fileTemp.Close
' available file should be there
Then, when I am about to copy the file, I will briefly change the filename to indicate that the file is in use, perform the potentially problematic copy and then change it back with something like this:
' Temporarily name the semaphore file to "In Use"
Name strSemaphoreFolder & strFileAvailableName As strSemaphoreFolder & strFileInUseName
fso.CopyFile strFromFilePath, strFilePath, True
' After copying the file name it back to "Available"
Name strSemaphoreFolder & strFileInUseName As strSemaphoreFolder & strFileAvailableName
Over in the Python script, before I do the wb.save command, I will insert a check to see whether the file indicates that it is available or in use with something like this:
import os
import time
prefix = 'RJD'
directory = 'c:\\temp\\monitor\\'
suffix = '.txt'
filepathname = directory + prefix + 'Available' + suffix
# Wait until the VBA side signals that the workbook is not being copied
while not os.path.isfile(filepathname):
    time.sleep(1)
wb.save(FileName)
Does this seem like it would work?
I am thinking that it should avoid the failure if I have properly identified it as an attempt to save the file in the Python script while the VBA script is telling the operating system to copy it.
Thoughts?
afterthoughts:
Using the technique I described, I probably need to create the "Available" semaphore file in the Python script and simply assume it will be there in the VBA script since the Python script is collecting the data and may be doing so before the VBA is even started.
A better alternative may be to simply check for the existence of the "In Use" file which will never be there unless the VBA wants it there, like this:
while os.path.isfile(directory + prefix + 'InUse' + suffix):
    time.sleep(1)
wb.save(FileName)
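One risk with a bare polling loop is that a leftover "In Use" file (for example, if Excel is closed between the two Name statements) would block the Python script forever. A minimal sketch of the same check with a timeout is below; wait_until_absent and its default values are hypothetical and would need tuning.

```python
import os
import time


def wait_until_absent(path, poll_seconds=1.0, timeout_seconds=60.0):
    """Return True once `path` no longer exists, or False if the timeout expires first."""
    deadline = time.monotonic() + timeout_seconds
    while os.path.isfile(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

The save would then only run when wait_until_absent(directory + prefix + 'InUse' + suffix) returns True; a False result could be logged or treated as a stale semaphore.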

How do I send/read data from VBA in Python?

Background
Right now I'm creating a macro to help automate the creation of some graphs in VBA. However, the creation of the graphs requires specific tasks to be done, for example, certain points in a series to be larger depending on previous instances. I would much rather do this data manipulation in python.
Problem
I want to use Excel for its user-friendly interface but want to handle all the data manipulation within Python. How can I send data I create in VBA to Python? To clarify, I'm not trying to read specific cells in the Excel sheet.
If I define a string in VBA say...
Dim example_string as String
example_string = "Hello, 1, 2, 3, Bye"
How can I send this information I created within VBA to Python for manipulation?
More Specifics
I have a textbox in excel that is filled by the user, which I read using VBA. I want to send that txt data from VBA to python. The user highlights the desired cells, which are not necessarily the same each time, clicks a button and fills a textbox. I don't want to use range or specific cell selection since this would require the user to specifically enter all the desired data into cells (too time-consuming).
I want to understand the basic procedure of how to send data between VBA and python.
You can do the whole thing in Python, which will be more efficient. You can use either Excel or sqlite3 as the database, tkinter for a graphical interface, and pandas and numpy to process your data.
If you insist on sending data to Python, import sys in your Python script to read parameters and then run it from VBA through a shell.
EDIT: You wanted an example, here it is =>
Open a new excel file, create a procedure like this (VBA CODE):
Sub sendToPython()
    Dim shell As Object
    Dim python As String
    Dim callThis As String
    Dim passing
    Set shell = VBA.CreateObject("Wscript.Shell")
    '/* This is where you installed python (notice the triple quotes, and always use your own path) */
    python = """C:\Users\yourUserName\appdata\local\programs\python\python37\python.exe"""
    '/* This is the data you'll be passing to the python script */
    passing = "The*eye*of*the*tiger"
    callThis = "C:\Users\yourUserName\desktop\yourScriptName.py " & passing
    '/* A space must separate the interpreter path from the script path */
    shell.Run python & " " & callThis
End Sub
The idea is to create some kind of a parser in python, this is my silly example (PYTHON CODE):
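import sys
# Split the argument on "*" and re-join the pieces with spaces
arg = " ".join(sys.argv[1].split("*"))
# Write to a log file so there is something to inspect after the console window closes
with open("log.txt", "w") as f:
    print("This is the parameter I've entered: " + arg, file=f)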
Notice how I used sys to read the parameter and wrote the result to a file so there is something to see; otherwise you'll just see a black window pop up for a millisecond or so.
I also found this article, but it requires you to wrap the python script in a class and I don't know if that works for you.
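For the example string from the question, the parsing side could be sketched as below. This assumes the whole string arrives as a single command-line argument (it contains spaces, so it would have to be quoted on the VBA side); parse_fields is a hypothetical helper name.

```python
import sys


def parse_fields(raw):
    """Split a comma-separated string and convert the numeric pieces to int."""
    fields = []
    for piece in raw.split(","):
        piece = piece.strip()
        try:
            fields.append(int(piece))
        except ValueError:
            fields.append(piece)  # not a whole number: keep it as text
    return fields


if __name__ == "__main__" and len(sys.argv) > 1:
    print(parse_fields(sys.argv[1]))
```

With the input "Hello, 1, 2, 3, Bye", parse_fields returns ['Hello', 1, 2, 3, 'Bye'].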

Using pyTDMS in Python

I am currently running a labview script that uses a DAQmx controller to take in voltage readings from batteries. The script takes in the voltage data and the amount of time that the test was run for and writes a TDMS file.
I would like to write a python script to take in this TDMS file and average some of the values. I have not been able to figure out how to use pyTDMS in order to read in this file.
Does anyone know how to use pyTDMS in python?
This is what my file looks like when I open it in Notepad++:
TDSm® i ä ä /ÿÿÿÿ name junk071316_164717 /'DAQ Assistant_0'ÿÿÿÿ /'DAQ Assistant_0'/'Voltage'i ÿÿÿÿ NI_Scaling_Status unscaled NI_Number_Of_Scales NI_Scale[1]_Scale_Type Linear NI_Scale[1]_Linear_Slope
I H Hà4? NI_Scale[1]_Linear_Y_Intercept
¿Ú$À NI_Scale[1]_Linear_Input_Source /'DAQ Assistant_0'/'Voltage_0'i ÿÿÿÿ NI_Scaling_Status unscaled NI_Number_Of_Scales NI_Scale[1]_Scale_Type Linear NI_Scale[1]_Linear_Slope
òòòà4? NI_Scale[1]_Linear_Y_Intercept
`IÝ$À N

How to operate on unsaved Excel file?

I'd like to automate a loop:
1. ABAQUS generates an Excel file;
2. Matlab uses the data in the Excel file;
3. repeat steps 1 and 2.
Now my question is: after step 1, the Excel file from ABAQUS is open but unsaved, as Book1. I cannot use a Matlab command to save it. Is there a way to use the data in this "Book1" file without saving it? Or can I find where it is stored so I can use the data inside? (I assume that Excel always saves the file somewhere even if the user doesn't?)
Thank you! 
As agentp mentioned, if you are running Abaqus via a Python script, you can just use Python to create a .txt file to save all the relevant information. If well structured, a .txt file can be as readable as an Excel spreadsheet. Because Matlab and Python have intrinsic functions to read and write files this communication can be easily done.
As for Matlab calling Abaqus, you can use something similar to:
system('abaqus cae nogui=YOUR_SCRIPT.py')
Your script that pipes to Excel should have some code similar to this:
abq_ExcelUtilities.excelUtilities.XYtoExcel(
xyDataNames='S:Mises PI: PART-1-1 E: 4309 IP: 1', trueName='')
To write the same data to a report (.rpt) file instead, the code looks like this:
x0 = session.xyDataObjects['S:Mises PI: PART-1-1 E: 4309 IP: 1']
session.writeXYReport(fileName='abaqus.rpt', xyData=(x0, ))
Now, to "roll your own", use that x0 object: x0.data is a regular Python tuple holding the actual data, which you can write to a file however you like, e.g.:
with open('myfile.csv', 'w') as f:
    for point in x0.data:
        f.write('%g,%g\n' % point)
(You can comment out or delete the writeXYReport call.)
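The same idea can be sketched with the standard csv module, which takes care of quoting and line endings. The function name write_xy_csv and the "x"/"y" header are hypothetical; in an Abaqus script the points argument would be x0.data, assumed here to be a tuple of (x, y) pairs.

```python
import csv


def write_xy_csv(points, path):
    """Write (x, y) pairs to a CSV file, one pair per row, below a header line."""
    # newline="" lets the csv module control line endings on every platform
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["x", "y"])
        writer.writerows(points)
```

Matlab can then read the resulting file directly, which avoids the unsaved workbook altogether.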

What is the best way to write a Batch script in?

I have around 20 scripts. Each produces one output file, which is fed as input to the next script. I now want to give the user the option to restart the batch from any point in the chain.
My friend suggested using make or ant, with a target defined for each python script. I would like to hear your (advanced hackers') suggestions.
Thank you
Make works like this:
target: dependencies
	commands
Based on your scripts, you might try this type of Makefile, where each target is the output file of one step:
output20: output19
	script20 # reads output19 and produces output20, the final output
output19: output18
	script19 # reads output18 and produces output19
.. etc ..
output2: output1
	script2 # reads output1 and produces output2
output1:
	script1 # produces output1
That way, each script won't be run until the output from the previous step has been produced. Running make output20 will travel down the entire chain, and start at script1 if none of the outputs exist. Or, if the outputs up to output15 already exist and are up to date, it will start at script16.
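Since the steps are Python scripts anyway, the staleness rule that make applies can also be sketched in Python. This is only an illustration of the rule (a target is rebuilt when it is missing or older than a dependency), not a replacement for make; is_stale and first_stale_step are hypothetical names.

```python
from pathlib import Path


def is_stale(target, dependencies):
    """Mimic make: a target is stale if it is missing or older than any dependency."""
    target = Path(target)
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(Path(dep).stat().st_mtime > built for dep in dependencies)


def first_stale_step(steps):
    """Return the index of the first stale step in an ordered chain.

    `steps` is a list of (target, dependencies) pairs, earliest step first.
    Returns len(steps) when every target is up to date.
    """
    for index, (target, dependencies) in enumerate(steps):
        if is_stale(target, dependencies):
            return index
    return len(steps)
```

A driver would run every script from the returned index onward; letting the user pass a larger or smaller index by hand gives the "restart from any point" behaviour.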
