I'd like to automate a loop:
ABAQUS generates an Excel file;
Matlab utilises the data in the Excel file;
loop steps 1 and 2.
Now my question is: after step 1, the Excel file from ABAQUS is left unsaved as "Book1", and I cannot use a Matlab command to save it. Is there a way to use the data in this "Book1" file without saving it? Or can I find where it is stored so I can use the data inside? (I assume that Excel always keeps the file somewhere even if the user doesn't save it?)
Thank you!
As agentp mentioned, if you are running Abaqus via a Python script, you can just use Python to create a .txt file to save all the relevant information. If well structured, a .txt file can be as readable as an Excel spreadsheet. Because Matlab and Python both have built-in functions to read and write files, this communication can be done easily.
As for Matlab calling Abaqus, you can use something similar to:
system('abaqus cae nogui=YOUR_SCRIPT.py')
Your script that pipes to Excel should have some code similar to this:
abq_ExcelUtilities.excelUtilities.XYtoExcel(
    xyDataNames='S:Mises PI: PART-1-1 E: 4309 IP: 1', trueName='')
Writing the same data to a report (.rpt) file, the code looks like this:
x0 = session.xyDataObjects['S:Mises PI: PART-1-1 E: 4309 IP: 1']
session.writeXYReport(fileName='abaqus.rpt', xyData=(x0, ))
now to "roll your own", use that x0 object: x0.data is a regular python tuple holding the actual data which you can write to a file however you like, eg:
file=open('myfile.csv','w')
for point in x0.data: file.write('%g,%g\n'%point)
file.close()
(You can comment out or delete the writeXYReport call.)
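Putting those pieces together, a minimal sketch of what YOUR_SCRIPT.py might contain (the XY data name and the CSV file name are just the examples from above, and the sketch assumes the XY data object has already been created in the session, e.g. from your .odb earlier in the same script):

# Minimal sketch of YOUR_SCRIPT.py, run as: abaqus cae nogui=YOUR_SCRIPT.py
# Assumes the session already contains the XY data object named below.
from abaqus import *

x0 = session.xyDataObjects['S:Mises PI: PART-1-1 E: 4309 IP: 1']

# write the (x, y) pairs to a plain CSV file that Matlab can read back
out = open('myfile.csv', 'w')
for point in x0.data:
    out.write('%g,%g\n' % point)
out.close()

After the system('abaqus cae nogui=YOUR_SCRIPT.py') call returns, Matlab can pick up myfile.csv with its own file-reading functions (e.g. csvread or readmatrix).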
I apologize for the length of this. I am a relative neophyte at Excel VBA and even more junior with Python. I have run into an issue with an error that occasionally occurs in Python using openpyxl (which I am just trying for the first time).
Background: I have a series of Python scripts (12) running and querying an API to gather data and populate 12 different, though similar, workbooks. Separately, I have an equal number of Excel instances periodically looking for that data and doing near-real-time analysis and reporting. Another Python script looks for key information to be reported from the spreadsheets and texts it to me when identified. The problem seems to occur between the data-gathering Python scripts and a copy command in the data analysis workbooks.
The way the Python data-gathering scripts "talk" to the analysis workbooks is via the sheets they build in their workbooks. The existing VBA in the analysis workbooks copies the data workbooks to another directory (so that they can be opened and manipulated without impacting their use by the Python scripts) and then interprets and copies the data into the Excel analysis workbook. Although I recently tested a method to read the data directly from those Python-created workbooks without opening them, the VBA will require some major surgery to convert to that method, and that is likely not going to happen soon.
TL;DR: There are data workbooks and analysis workbooks. Python builds the data workbooks, and the analysis workbooks use VBA to copy the data workbooks to another directory and load specific data from the copies. There is a one-to-one correspondence between the data and analysis workbooks.
Based on the above, I believe that the only "interference" that occurs with the data workbooks is when the macro in the analysis workbook copies the workbook. I thought this would be a relatively safe level of interference, but it apparently is not.
The copy is done in VBA with this set of commands (the actual VBA sub is about 500 lines):
fso.CopyFile strFromFilePath, strFilePath, True
where fso is set thusly:
Set fso = CreateObject("Scripting.FileSystemObject")
and the strFromFilePath and strFilePath both include a fully qualified file name (with their respective paths). This has not generated any errors on the VBA side.
The data is copied about once a minute (though it varies from 40 seconds to about 5 minutes) and seems to work fine from a VBA perspective.
What fails is the Python side, about 1% of the time (which is probably 12 or fewer times daily). While that seems small, the associated data capture process halts until I notice and restart it. This means anywhere from 1 to all 12 of the data capture processes will fail at some point each day.
Here is what a failure looks like:
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    monitor('DLD',1,13,0)
  File "<string>", line 794, in monitor
  File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
    save_workbook(self, filename)
  File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
    archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
  File "C:\Users\abcd\AppData\Local\Programs\Python\Python39\lib\zipfile.py", line 1239, in __init__
    self.fp = io.open(file, filemode)
PermissionError: [Errno 13] Permission denied: 'DLD20210819.xlsx'
and I believe it occurs as a result of the following lines of Python code (which come after a while statement with various if conditions to populate the worksheets; the Python script itself is about 200 lines long):
time.sleep(1) # no idea why wb.save sometimes fails; trying a delay
wb.save(FileName)
Notice that I left in one of my attempts to correct this (the sleep). I have tried waiting as much as 3 seconds with no noticeable difference.
I admit I have no idea how to detect errors thrown by openpyxl and am quite unskilled at Python error handling, but I had tried this code yesterday:
retries = 1
success = False
while not success and retries < 3:
    try:
        wb.save
        success = True
    except PermissionError as saveerror:
        print('>>> Save Error: ', saveerror)
        wait = 3
        print('=== Waiting %s secs and re-trying... ===' % wait)
        #sys.stdout.flush()
        time.sleep(wait)
        retries += 1
My review of the output tells me that the except code never executed while testing the data capture routine over 3000 times. However, the "save" also never happened so the analysis spreadsheets did not receive any information until later when the python code saved the workbook and closed it.
I also tried adding a wb.close after setting the success variable to true, but got the same results.
I am considering either rewriting the VBA to grab the data directly from the unopened data workbooks without first copying them (which actually sounds more dangerous) or using an external syncing tool to copy them outside of VBA (which could potentially cause exactly the same problem).
Does anyone have an idea of what may be happening and how to address it? It works nearly all the time but just fails several times a day.
Can someone help me to better understand how to trap the error thrown by OpenPyXl so that I can have it retry rather than just abending?
Any suggestions are appreciated. Thank you for reading.
Not sure if this is the best way, but the comment from simpleApp gave me an idea: I may want to use a technique I used elsewhere in the VBA. Since I am new to these tools, perhaps someone can suggest a cleaner approach, but I am going to try using a semaphore file to signal when I am copying the file, to alert the Python script that it should avoid saving.
In the code below I am separating out the directory, the prefix and the suffix. The prefix will be different for each of the 12 or more instances I am running, and I have not figured out where I want to put these files nor what suffix I should use, so I made them variables.
For example, in the VBA I will have something like this to create a file saying currently available:
Dim strSemaphoreFolder As String
Dim strFilePrefix As String
Dim strFileDeletePath As String
Dim strFileInUseName As String
Dim strFileAvailableName As String
Dim strSemaphoreFileSuffix As String
Dim fso As Scripting.FileSystemObject
Dim fileTemp As TextStream
Set fso = CreateObject("Scripting.FileSystemObject")
strSemaphoreFileSuffix = ".txt"
strSemaphoreFolder = "c:\temp\monitor\"
strFilePrefix = "RJD"
strFileDeletePath = strSemaphoreFolder & strFilePrefix & "*" & strSemaphoreFileSuffix
' Clean up remnants from prior activities
If Len(Dir(strFileDeletePath)) > 0 Then
    Kill strFileDeletePath
End If
' files should be gone
' Set the In-use and Available Names
strFileInUseName = strFilePrefix & "InUse" & strSemaphoreFileSuffix
strFileAvailableName = strFilePrefix & "Available" & strSemaphoreFileSuffix
' Create an available file
Set fileTemp = fso.CreateTextFile(strSemaphoreFolder & strFileAvailableName, True)
fileTemp.Close
' available file should be there
Then, when I am about to copy the file, I will briefly change the filename to indicate that the file is in use, perform the potentially problematic copy and then change it back with something like this:
' Temporarily name the semaphore file to "In Use"
Name strSemaphoreFolder & strFileAvailableName As strSemaphoreFolder & strFileInUseName
fso.CopyFile strFromFilePath, strFilePath, True
' After copying the file name it back to "Available"
Name strSemaphoreFolder & strFileInUseName As strSemaphoreFolder & strFileAvailableName
Over in the Python script, before I do the wb.save command, I will insert a check to see whether the file indicates that it is available or in use with something like this:
prefix = 'RJD'
directory = 'c:\\temp\\monitor\\'
suffix = '.txt'
filepathname = directory + prefix + 'Available' + suffix

while not os.path.isfile(filepathname):
    time.sleep(1)
wb.save(FileName)
Does this seem like it would work?
I am thinking that it should avoid the failure, if I have properly identified the cause as the Python script attempting to save the file while the VBA script is telling the operating system to copy it.
Thoughts?
Afterthoughts:
Using the technique I described, I probably need to create the "Available" semaphore file in the Python script, and simply assume it will be there in the VBA script, since the Python script is collecting the data and may be doing so before the VBA is even started.
A better alternative may be to simply check for the existence of the "In Use" file which will never be there unless the VBA wants it there, like this:
while os.path.isfile(directory + prefix + 'InUse' + suffix):
    time.sleep(1)
wb.save(FileName)
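For completeness, here is an untested sketch of how that "In Use" check could be combined with a retry that actually traps the PermissionError from the traceback above (FileName, directory, prefix and suffix as defined earlier; the retry count and wait time are arbitrary):

import os
import time

def save_when_available(wb, FileName, directory, prefix, suffix,
                        max_retries=3, wait=3):
    # Wait for the VBA "In Use" semaphore to clear, then save, retrying
    # if the copy still collides with the save.
    for attempt in range(max_retries):
        while os.path.isfile(directory + prefix + 'InUse' + suffix):
            time.sleep(1)
        try:
            wb.save(FileName)   # note the parentheses: wb.save alone does not call save
            return True
        except PermissionError as saveerror:
            print('>>> Save Error:', saveerror)
            print('=== Waiting %s secs and re-trying... ===' % wait)
            time.sleep(wait)
    return False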
I have an IDL save file. I import it into Python using readsav from scipy, change a parameter, and I want to export/save it back to the original, IDL-readable format.
This is how I import it:
from scipy.io.idl import readsav
input = readsav('Original_file.inp')
I haven't tested any of this, but here are a few options to try:
Python-to-IDL/IDL-to-Python Bridge
The Python-to-IDL bridge provides a way to run IDL routines within Python. You could try the following:
from idlpy import *
from scipy.io.idl import readsav
input = readsav('Original_file.inp')
# ** change parameter **
IDL.run("SAVE, /VARIABLES, FILENAME = 'New_file.sav'")
There is also an IDL to Python bridge, which might allow you to perform your desired Python operation within IDL, and skip all the loading and saving of files...
Read/Write JSON
It looks like readsav() just returns a dictionary of the contents of the IDL save file. I'm not sure of the contents of your file, so I don't know if this would work, but perhaps you could just write it as a JSON string,
import json
from scipy.io.idl import readsav

input = readsav('Original_file.inp')
# ** change parameter **
with open('New_file.txt', 'w') as outfile:
    json.dump(input, outfile)
and then read it back into IDL with JSON_PARSE() (documentation here).
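One caveat (this is my assumption about the contents, since readsav typically returns numpy arrays and numpy scalars): json.dump cannot serialize numpy types directly, so you may need to convert them first. A rough sketch:

import json
import numpy as np
from scipy.io.idl import readsav

data = readsav('Original_file.inp')
# ** change parameter **

def to_jsonable(value):
    # numpy arrays and scalars are not JSON-serializable as-is
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, np.generic):
        return value.item()
    return value

with open('New_file.txt', 'w') as outfile:
    json.dump({k: to_jsonable(v) for k, v in data.items()}, outfile)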
Write your own hack
If all else fails, you could look at Craig Markwardt's Unofficial Format Specification
of the IDL "SAVE" File, and write some custom code to write an IDL save file directly from Python. If nothing else, it would be an interesting exercise.
I have a Python script that currently runs on my desktop. It takes a CSV file with roughly 25 million lines (maybe 15 or so columns) and performs line-by-line operations.
For each line of input, multiple output lines are produced. The results are then written line by line into a CSV file; the output ends up at around 100 million lines.
Code looks something like this:
with open(outputfile, "a") as outputcsv:
    with open(inputfile, "r") as inputcsv:
        reader = csv.reader(inputcsv)
        headerlist = next(reader)
        for row in reader:
            variable1 = row[headerlist.index("VAR1")]
            variableN = row[headerlist.index("VARN")]
            while calculations not complete:
                do stuff  # Some complex calculations are done at this point
                outputcsv.write(stuff)
We're now trying to convert the script to run via Hadoop, using pyspark.
I have no idea how to even start. I'm trying to work out how to iterate through an RDD object but don't think it can be done.
Is a line by line calculation like this suitable for distributed processing?
If you want to run the script directly, you could do so via spark-submit:
spark-submit --master local[*] other_parameters path_to_your_script.py
(use --master yarn instead of local[*] to run on the cluster)
But I would suggest going for the Spark APIs, as they are easy to use and will lower the coding overhead.
First you have to create a SparkSession variable so that you can access all the Spark functions:
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("SparkSessionZipsExample")
         .config("parameters", "value")
         .getOrCreate())
Next, if you want to load a csv file:
file = spark.read.csv("path to file")
You can specify optional parameters like header, inferSchema, etc.:
file = spark.read.option("header", "true").csv("path to your file")
'file' will now be a PySpark DataFrame.
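For instance, to also have Spark infer the column types (these option names are the ones the CSV reader accepts):

file = (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("path to your file"))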
You can now write the end output like this:
file.write.csv("output_path")
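Between reading and writing, the row-by-row expansion described in the question (one input row producing multiple output rows) maps naturally onto flatMap. A rough sketch, where expand_row is a hypothetical stand-in for your calculations and VAR1 is assumed to be one of the columns:

def expand_row(row):
    # stand-in for the complex calculations: emit several output tuples per input row
    return [(row["VAR1"], i) for i in range(3)]

output = file.rdd.flatMap(expand_row).toDF(["VAR1", "result"])
output.write.csv("output_path")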
Please refer to the Spark documentation for transformations and other information.
I'm used to programming in Python. My company now has a Hadoop cluster with Jupyter installed. Until now I have never used Spark / PySpark for anything.
I am able to load files from HDFS as easy as this:
text_file = sc.textFile("/user/myname/student_grades.txt")
And I'm able to write output like this:
text_file.saveAsTextFile("/user/myname/student_grades2.txt")
The thing I'm trying to achieve is to use a simple for loop to read the text files one by one and write their content into one HDFS file. So I tried this:
list = ['text1.txt', 'text2.txt', 'text3.txt', 'text4.txt']
for i in list:
    text_file = sc.textFile("/user/myname/" + i)
    text_file.saveAsTextFile("/user/myname/all.txt")
So this works for the first element of the list, but then gives me this error message:
Py4JJavaError: An error occurred while calling o714.saveAsTextFile.
: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
XXXXXXXX/user/myname/all.txt already exists
To avoid confusion, I blurred out the IP address with XXXXXXXX.
What is the right way to do this?
I will have tons of datasets (like 'text1', 'text2', ...) and want to apply a Python function to each of them before saving them to HDFS. But I would like to have the results all together in "one" output file.
Thanks a lot!
MG
EDIT:
It seems like my final goal was not really clear. I need to apply a function to each text file separately and then append the output to the existing output directory. Something like this:
for i in list:
    text_file = sc.textFile("/user/myname/" + i)
    text_file = really_cool_python_function(text_file)
    text_file.saveAsTextFile("/user/myname/all.txt")
I wanted to post this as comment but could not do so as I do not have enough reputation.
You have to convert your RDD to a DataFrame and then write it in append mode. To convert an RDD to a DataFrame, please look at this answer:
https://stackoverflow.com/a/39705464/3287419
or this link http://spark.apache.org/docs/latest/sql-programming-guide.html
To save dataframe in append mode below link may be useful:
http://spark.apache.org/docs/latest/sql-programming-guide.html#save-modes
Almost the same question is also here: Spark: Saving RDD in an already existing path in HDFS. But the answer provided is for Scala; I hope something similar can be done in Python as well.
There is yet another (but ugly) approach: convert your RDD to a string, say resultString, and use subprocess to append that string to the destination file, i.e.:
subprocess.call("echo "+resultString+" | hdfs dfs -appendToFile - <destination>", shell=True)
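A rough sketch of the DataFrame route, assuming each text_file ends up as an RDD of plain text lines, a SparkSession named spark is available, and the output path is just an example name:

for i in list:
    text_file = sc.textFile("/user/myname/" + i)
    text_file = really_cool_python_function(text_file)
    # wrap each line in a single-column DataFrame and append it to one output directory
    df = spark.createDataFrame(text_file.map(lambda line: (line,)), ["value"])
    df.write.mode("append").text("/user/myname/all_output")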
You can read multiple files and save them like this:
textfile = sc.textFile(','.join(['/user/myname/'+f for f in list]))
textfile.saveAsTextFile('/user/myname/all')
You will get all the part files within the output directory.
If the text files all have the same schema, you could use Hive to read the whole folder as a single table, and directly write that output.
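As a hedged sketch of that idea in PySpark (assuming Hive support is enabled on the cluster; the table name, column name and output path here are made up for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# one external table over the whole folder of text files
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS student_grades_all (line STRING)
    LOCATION '/user/myname/'
""")

# read it back as a single DataFrame and write one combined output
spark.table("student_grades_all").write.text("/user/myname/all_output")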
I would try this; it should be fine:
list = ['text1.txt', 'text2.txt', 'text3.txt', 'text4.txt']
for i in list:
    text_file = sc.textFile("/user/myname/" + i)
    text_file.saveAsTextFile(f"/user/myname/{i}")
I have the following program running:
collector.py
data = 0
while True:
    # collects data
    data = data + 1
I have another program cool.py which wants to access the current data value. How can I do this?
Ultimately, something like:
cool.py
getData()
An idea would be to use a global variable for data?
You can use memory mapping.
http://docs.python.org/2/library/mmap.html
For example, you open a file in a tmp directory, then map that file to memory in both programs, and write your data to it.
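An untested sketch of that idea, assuming both scripts agree on the file path and on an 8-byte integer layout (both are arbitrary choices for this example):

import mmap
import struct

SHARED_PATH = '/tmp/collector_data'
SIZE = 8  # one 64-bit signed integer

def collect():
    # collector.py side: create the file, then keep publishing data
    with open(SHARED_PATH, 'wb') as f:
        f.write(b'\x00' * SIZE)          # reserve the shared slot
    with open(SHARED_PATH, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), SIZE)
        data = 0
        while True:
            # collects data
            data = data + 1
            mm[:SIZE] = struct.pack('q', data)   # write the current value

def getData():
    # cool.py side: read whatever value the collector last wrote
    with open(SHARED_PATH, 'rb') as f:
        mm = mmap.mmap(f.fileno(), SIZE, access=mmap.ACCESS_READ)
        return struct.unpack('q', mm[:SIZE])[0]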