I've built a GUI with wxPython in which I use a process to build a table that feeds some charts when I click a button.
I build the table and store it in a variable so I can use the information to feed my matplotlib chart.
My problem is that once the chart is finished (built from the table stored in that variable) and the process has ended, I lose the information in that variable. I still need that same information to make my plot interactive (i.e. to change the plot from line to bar, stacked, or whatever), but the only way I've found is to re-run the process that builds the table over and over again.
Is there a way to use the stored information of that variable in other processes / modules / charts? In other words, is there a way to keep my variable "alive" even after the process that created it has finished?
Thanks a lot for your guidance :)
This is done rather easily with the pickle module. Here is a simple working example:
from pickle import dumps, loads

a_variable = 15  # arbitrary value

with open("a_file.txt", "wb") as fileobj:
    # create a pickle string representation of the data
    fileobj.write(dumps(a_variable))

# Then to load it from another process
with open("a_file.txt", "rb") as fileobj:
    # load the pickle string representation of the data
    a_variable = loads(fileobj.read())
Related
I want to ask how I can keep a variable loaded so I don't have to refill it every time I execute the script. As an example, I read a file and assign all of its lines to a variable. Then I create some functions that work with that data. After running the script I realized I needed to change something in one of those functions, so I changed a few lines and ran the script again. The file is large and I have to wait for it to load, so I'm wondering how I can keep the variable that holds this file available at all times and easily make changes to my script without having to wait so long for it to load each time.
import numpy as np
from tqdm import tqdm
from scipy import spatial

# This is the variable that I want to keep always open
embeddings_dict = {}

# This is the current file
filename = "/some_filename"
with open(filename, 'r', encoding="utf-8") as f:
    lines = f.readlines()
    for i in tqdm(range(len(lines))):
        values = lines[i].split()
        word = values[0]
        vector = np.asarray(values[1:], "float32")
        embeddings_dict[word] = vector

# This is the process
def find_closest_embeddings_euc(embedding):
    return sorted(embeddings_dict.keys(),
                  key=lambda word: spatial.distance.euclidean(embeddings_dict[word], embedding))

print(find_closest_embeddings_euc(embeddings_dict['software'])[:10])
I hope to understand how I can do this.
You can't really persist memory in RAM once a process finishes. What you're describing is a classic workflow in the ML community (having to load a huge dataset into memory and then apply and tweak a series of transformations to it), and a notebook environment is usually the answer.
You can check out how to setup your environment at either of these links:
https://docs.jupyter.org/en/latest/install/notebook-classic.html
https://code.visualstudio.com/docs/datascience/jupyter-notebooks (I recommend this one if you are already using VS Code)
Once you create your first notebook, you can add two cells to it - one for the data loading and another for the transformations. Now you can execute them independently - you can load your data once and apply the transformations and experiment with them as many times as you'd like.
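For example, here is a rough sketch of how your loading code could be split into two cells, using the # %% cell markers that the VS Code Jupyter extension recognizes (in a classic notebook these would simply be two separate cells):

# %% Cell 1 - load the data once; re-run only when the file changes
import numpy as np
from tqdm import tqdm

embeddings_dict = {}
with open("/some_filename", "r", encoding="utf-8") as f:
    for line in tqdm(f.readlines()):
        values = line.split()
        embeddings_dict[values[0]] = np.asarray(values[1:], "float32")

# %% Cell 2 - experiment freely; re-running this cell reuses embeddings_dict
from scipy import spatial

def find_closest_embeddings_euc(embedding):
    return sorted(embeddings_dict.keys(),
                  key=lambda word: spatial.distance.euclidean(embeddings_dict[word], embedding))

print(find_closest_embeddings_euc(embeddings_dict['software'])[:10])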
I'm trying to take an API call response and parse the XML data into a list, but I am struggling with the multiple child/parent relationships.
My hope is to export a new XML file that would line up each job ID and tracking number, which I could then import into Excel.
Here is what I have so far
The source XML file looks like this:
<project>
    <name>October 2019</name>
    <jobs>
        <job>
            <id>5654206</id>
            <tracking>
                <mailPiece>
                    <barCode>00270200802095682022</barCode>
                    <address>Accounts Payable,1661 Knott Ave,La Mirada,CA,90638</address>
                    <status>En Route</status>
                    <dateTime>2019-10-12 00:04:21.0</dateTime>
                    <statusLocation>PONTIAC,MI</statusLocation>
                </mailPiece>
            </tracking>...
Code:
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement

tree = ET.parse('mailings.xml')
root = tree.getroot()
print(root.tag)

for x in root[1].findall('job'):
    id = x.find('id').text
    tracking = x.find('tracking').text
    print(root[1].tag, id, tracking)
The script currently returns the following:
jobs 5654206 None
jobs 5654203 None
Debugging is your friend...
I am struggling with the multiple child/parent relationships.
The right way to resolve this yourself is by using a debugger. For example, with VS Code, after setting a breakpoint and running the script with the debugger, it stops at the breakpoint and I can inspect all the variables in memory and run commands in the debug console just as if they were in my script, with the Variables window showing everything in scope.
There are various ways to do this at the command line, or with a REPL like IPython, etc. (a minimal pdb sketch follows the list below), but I find that debugging in a modern IDE environment like VS Code or PyCharm is definitely the way to go. Their debuggers remove the need to pepper print statements everywhere to test out your code, or to rewrite it just to expose more variables that must be printed to the console.
A debugger allows you to see all the variables as a snapshot, exactly how the Python interpreter sees them, at any point in your code execution. You can:
step through your code line by line and watch the variables change in real time in the window
set up a separate watch window with only the variables you care about
and set up breakpoints that will only trigger if variables are set to particular values, etc.
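For completeness, here is a minimal command-line sketch with the built-in pdb module (a throwaway example, not tied to your script):

import pdb

answer = 42
pdb.set_trace()   # execution pauses here; at the (Pdb) prompt try: p answer, n (next), c (continue)
print(answer)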
Child Hierarchy with the XML find method
Inspecting the variables as I step through your code, the issue becomes clear: x.find('tracking') returns the tracking element itself, and its text property only contains whatever text sits directly inside the <tracking> tag (nothing, or just whitespace), not the contents of its children. That is why the script prints None. The data you actually want lives one level further down, in the mailPiece children of that tracking element, which you can see when you expand the element in the debugger.
So, one way to resolve your issue is to take each mailPiece element in turn and pull out the individual values you want from it (e.g. barCode, address, etc.) using find.
Here is some code that pulls all of this into a combined hierarchy of lists and dictionaries that you can then use to build your Excel outputs.
Note: the most memory-efficient way to do this is to extract what you need as you read the XML, but building the full structure first is better for readability, maintainability, and any post-processing that requires knowledge of more than one node at a time.
import xml.etree.ElementTree as ET
from xml.etree.ElementTree import Element, SubElement
from types import SimpleNamespace

tree = ET.parse('mailings.xml')
root = tree.getroot()

jobs = []
for job in root[1].findall('job'):
    jobdict = {}
    jobdict['id'] = job.find('id').text
    jobdict['trackingMailPieces'] = []
    for tracking in job.find('tracking'):
        if tracking.tag == 'mailPiece':
            mailPieceDict = {}
            mailPieceDict['barCode'] = tracking.find('barCode').text
            mailPieceDict['address'] = tracking.find('address').text
            mailPieceDict['status'] = tracking.find('status').text
            mailPieceDict['dateTime'] = tracking.find('dateTime').text
            mailPieceDict['statusLocation'] = tracking.find('statusLocation').text
            jobdict['trackingMailPieces'].append(mailPieceDict)
    jobs.append(jobdict)

for job in jobs:
    print('Job ID: {}'.format(job['id']))
    for mp in job['trackingMailPieces']:
        print('  mailPiece:')
        for key, value in mp.items():
            print('    {} = {}'.format(key, value))
The result is:
Job ID: 5654206
  mailPiece:
    barCode = 00270200802095682022
    address = Accounts Payable,1661 Knott Ave,La Mirada,CA,90638
    status = En Route
    dateTime = 2019-10-12 00:04:21.0
    statusLocation = PONTIAC,MI
Output?
I didn't address what to do with the output as that is beyond the scope of this question, but consider writing out to a CSV file, or even directly to an Excel file, if you don't need to pass on the XML to another program for some reason. There are Python packages that handle writing CSV and Excel files.
No need to create an intermediate format that you then need to manipulate after bringing it into Excel, for example.
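For instance, here is a minimal sketch (the output file name is an assumption) that flattens the jobs list built above into one CSV row per mailPiece using the csv module from the standard library:

import csv

with open('mailings.csv', 'w', newline='') as csvfile:
    fieldnames = ['jobId', 'barCode', 'address', 'status', 'dateTime', 'statusLocation']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for job in jobs:
        for mp in job['trackingMailPieces']:
            # one row per mailPiece, tagged with the job it belongs to
            writer.writerow(dict(mp, jobId=job['id']))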
I'm extracting extensions from a multi-extension FITS file, manipulate the data, and save the data (with the extension's header information) to a new FITS file.
To my knowledge pyfits.writeto() does the task. However, when I give it a data parameter in the form of an array, it gives me the error:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Here is a sample of my code:
file = 'hst_11166_54_wfc3_ir_f110w_drz.fits'
hdulist = pyfits.open(dir + file)
sci = hdulist[1].data  # science image data
exp = hdulist[5].data  # exposure time data
sci = sci * exp        # converts electrons/second to electrons
file = 'test_counts.fits'
hdulist.writeto(file, sci, clobber=True)
hdulist.close()
I appreciate any help with this. Thanks in advance.
You're confusing the HDUList.writeto method, and the writeto function.
What you're calling is a method on the HDUList object that is returned when you call pyfits.open. You can think of this object as something like a file handle to your original drizzled FITS file. You can manipulate this object in place and either write it out to a new file or save updates in place (if you open the file in mode='update').
The writeto function on the other hand is not tied to any existing file. It's just a high-level function for writing an array out to a file. In your example you could write your array of electron counts out like:
pyfits.writeto(filename, data)
This will create a single-HDU FITS file with the array data in the PRIMARY HDU.
Do be aware of the admonishment at the top of this section of the docs: http://docs.astropy.org/en/v1.0.3/io/fits/index.html#convenience-functions
Functions like pyfits.writeto are there for convenience in interactive work, but are not recommended for code that will be run repeatedly, as in a script. Instead, have a look at these instructions to start.
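As a rough sketch of the object-oriented approach (file names taken from your question; note that newer astropy versions spell the clobber keyword overwrite):

import pyfits

hdulist = pyfits.open('hst_11166_54_wfc3_ir_f110w_drz.fits')
sci = hdulist[1].data * hdulist[5].data  # electrons/second -> electrons

# Build a new HDU that keeps the original science header, then write it out
hdu = pyfits.PrimaryHDU(data=sci, header=hdulist[1].header)
pyfits.HDUList([hdu]).writeto('test_counts.fits', clobber=True)
hdulist.close()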
The problem is probably that you should be calling hdulist.writeto(file, clobber=True). There is only one required argument:
https://pythonhosted.org/pyfits/api_docs/api_hdulists.html#pyfits.HDUList.writeto
If you give a second positional argument, it is used for output_verify, which should be a string, not a numpy array. This probably explains your AttributeError.
Django and Python newbie here. Ok, so I want to make a webpage where the user can enter a number between 1 and 10. Then, I want to display an image corresponding to that number. Each number is associated with an image filename, and these 10 pairs are stored in a list in a .txt file.
One way to retrieve the appropriate filename is to create a NumToImage model, which has an integer field and a string field, and store all 10 NumToImage objects in the SQL database. I could then retrieve the filename for any query number. However, this does not seem like such a great solution for storing a simple .txt file which I know is not going to change.
So, what is the way to do this in Python, without using a database? I am used to C++, where I would create an array of strings, one for each of the numbers, and load these from the .txt file when the application starts. This vector would then lie within a static object such that I can access it from anywhere in my application.
How can a similar thing be done in Python? I don't know how to instantiate a Python object and then enable it to be accessible from other Python scripts. The only way I can think of doing this is to pass the object instance as an argument for every single function that I call, which is just silly.
What's the standard solution to this?
Thank you.
The Python way is quite similar: you run code at the module level, and create objects in the module namespace that can be imported by other modules.
In your case it might look something like this:
myimage.py
imagemap = {}

# Read the (image_num, image_path) pairs from the file one line at a time
# (the file name and line format below are placeholders -- adjust to your .txt file):
with open('num_to_image.txt') as f:
    for line in f:
        num, path = line.split()
        imagemap[int(num)] = path
views.py
from myimage import imagemap

def my_view(image_num):
    image_path = imagemap[image_num]
    # do something with image_path
I have the following program running
collector.py
data = 0
while True:
    # collects data
    data = data + 1
I have another program cool.py which wants to access the current data value. How can I do this?
Ultimately, something like:
cool.py
getData()
An idea would be to use a global variable for data?
You can use memory mapping.
http://docs.python.org/2/library/mmap.html
For example, you open a file in the tmp directory, then map that file to memory in both programs and write your data to it.
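A minimal sketch of that idea (the file name, the 8-byte value size, and the update interval are assumptions):

# collector.py -- writes the current value into a memory-mapped file
import mmap
import struct
import time

with open("/tmp/shared_data", "wb") as f:
    f.write(b"\x00" * 8)  # reserve 8 bytes for one integer

with open("/tmp/shared_data", "r+b") as f:
    mm = mmap.mmap(f.fileno(), 8)
    data = 0
    while True:
        data = data + 1
        mm[:8] = struct.pack("q", data)  # publish the latest value
        time.sleep(0.1)

# cool.py -- reads the latest value written by collector.py
import mmap
import struct

def getData():
    with open("/tmp/shared_data", "r+b") as f:
        mm = mmap.mmap(f.fileno(), 8)
        return struct.unpack("q", mm[:8])[0]

print(getData())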