How to use .format with variables in loop with QGIS/Python - python

I've got the following code, but I can't quite get it to work:
import glob
import os
import processing # for QGIS functionality
os.chdir("/home/mydir/vector")
columnid = ['SAMP300','SAMP100','SAMP30', 'SAMP10']
for x in glob.glob("prefix*.shp"):
    for y in columnid:
        for z in [1, 2, 3]:
            output_alg0='"/home/mydir/vector/samples/{x}_{y}_{z}.shp"'.format(x=x.rstrip('.shp'), y=y, z=z)
            output_0='processing.runalg("qgis:randompointsinsidepolygonsvariable","/home/mydir/vector/{x}",0,"{y}",1000,output_alg0)'.format(x=x, y=y)
The script reads through a directory of .shp files, and uses variables to name the outputs. output_alg0 creates the output file name used in the next step. The file name is based on the original file, and two variables within the loop. output_0 is the actual QGIS algorithm that is run, which references each .shp file in the loop, passes the columnid variable and some fixed parameters, and references output_alg0 for naming the output .shp file.
If I prepend print to the two commands within the loop, I get the output that I'm expecting (i.e., the {x}, {y}, and {z} variables are correctly populated). Furthermore, the script gives no error when executed, but no output is produced.
Here's an example of the output by appending print and parentheses to the two lines within the loop:
output_alg0="/home/mydir/vector/samples/prefix_SAMP10_3.shp"
output_0=processing.runalg("qgis:randompointsinsidepolygonsvariable","/home/mydir/vector/prefix.shp",0,"SAMP10",1000,output_alg0)
I can copy and paste both lines exactly as they appear above into the QGIS Python Console and the commands are executed as expected (i.e., random point file is generated based on the input shapefile, and an output shapefile is created as specified).
I think it has something to do with how I'm using .format and/or perhaps how I'm using single and/or double quotations, and/or some sort of Python/QGIS interaction that I don't quite understand.
EDIT: Instead of using the two output_ prefixes, I also tried out this one-liner version, and the script executes (without error), but no output is created.
import glob
import os
import processing # for QGIS functionality
os.chdir("/home/mydir/vector")
columnid = ['SAMP300','SAMP100','SAMP30', 'SAMP10']
for x in glob.glob("prefix*.shp"):
    for y in columnid:
        for z in [1, 2, 3]:
            'processing.runalg("qgis:randompointsinsidepolygonsvariable","/home/mydir/vector/{w}",0,"{y}",1000,"/home/mydir/vectorsamples/{x}_{y}_{z}.shp")'.format(w=x, x=x.rstrip('.shp'), y=y, z=z)

Currently your loop is generating the variables the way you want and assigning them to output_alg0 and output_0 the way you want, but you don't ever DO anything with them after that.
I've asked for your expected output in a comment. If you add that we can probably figure out how to get you there. My GUESS is that you're trying to execute output_0 (which currently is just a string that looks like a python command). In which case you'd have to use exec(output_0) but be wary of using exec in general. It's a scary command.
EDIT:
If I were you, I'd do something more like:
# inside your loops:
output_alg0 = "/home/mydir/vector/samples/{x}_{y}_{z}.shp".format(x=x.rstrip('.shp'), y=y, z=z)
output_0 = processing.runalg("qgis:randompointsinsidepolygonsvariable", "/home/mydir/vector/{x}".format(x=x), 0, y, 1000, output_alg0)
That way you're running the script directly rather than writing a bit of code then executing it.

I changed my EDIT-ed line, written as:
'processing.runalg("qgis:randompointsinsidepolygonsvariable","/home/mydir/vector/{w}",0,"{y}",1000,"/home/mydir/vectorsamples/{x}_{y}_{z}.shp")'.format(w=x, x=x.rstrip('.shp'), y=y, z=z)
to this, which wraps the line above in exec():
exec('processing.runalg("qgis:randompointsinsidepolygonsvariable","/home/mydir/vector/{w}",0,"{y}",1000,"/home/mydir/vectorsamples/{x}_{y}_{z}.shp")'.format(w=x, x=x.rstrip('.shp'), y=y, z=z))
The above modification to the script runs and iterates through the list of .shp files in the specified directory; however, I don't think this is the most kosher answer possible. I'll hold off accepting my own answer in case something better and/or more QGIS-friendly comes along.
PS: thanks to @Adam Smith for pointing out exec
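One caveat worth flagging in the snippets above: str.rstrip('.shp') does not remove the literal suffix '.shp'; it strips any trailing run of the characters '.', 's', 'h', and 'p', so file names ending in those letters get over-truncated. A safer sketch using os.path.splitext (the file name here is just illustrative):

```python
import os.path

name = "prefix_samples.shp"  # hypothetical input file name

# rstrip treats '.shp' as a character set, so it also eats the
# stem's trailing 's' here:
print(name.rstrip('.shp'))        # prefix_sample

# splitext removes exactly one extension:
print(os.path.splitext(name)[0])  # prefix_samples
```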

Related

Snakemake with unknown names of multiple outputs from seaborn plots in python

I am working snakemake and using seaborn in python.
I have come to a point in the pipeline where snakemake gives missing output.
Pseudocode for snakemake would look a bit like this:
WC1 = Wildcard1
WC2 = Wildcard2

rule all:
    expand("/path/to/outputs{wc1}_{wc2}.png", wc1=WC1, wc2=WC2)

checkpoint seaborn:
    input:
        "/OtherPath/to/file1.csv",
        "/AnotherPath/to/file2.tsv"
    output:
        "/path/to/outputs{wc1}_{wc2}.png"
    shell:
        "python SeabornPlot.py"
The python script, as somewhat pseudocode, would be like this:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

CSV1 = pd.read_csv(snakemake.input[0], delimiter=",")
CSV2 = pd.read_csv(snakemake.input[1], delimiter=",")

for column in CSV1:
    g = sns.barplot(x=column, data=CSV2)
    fig = g.get_figure()
    fig.savefig(snakemake.output[0])
    plt.clf()
This does not work. The python script works outside of snakemake (with the correct path and name for the file). I suspect it has something to do with using "snakemake.output[0]", but what can I use instead?
I am quite sure checkpoint is correct here, instead of rule. But please correct me if I am wrong.
Finally, I know something is missing in rule all. I probably should add a function with a new wildcard for each plot.
But the main problem I have is to get python to output the plots to the correct path.
Thanks in advance!
The obvious error in your Snakefile is the rule all. It should have an input.
rule all:
    input: expand("/path/to/outputs{wc1}_{wc2}.png", wc1=WC1, wc2=WC2)
There is nothing special about the name all. If you do not specify a rule or an output file, snakemake simply uses the first rule as the target. By convention (carried over from the original make program), the first rule is called all, and it only has dependencies (i.e., input:). It does not usually have its own outputs or an action (e.g., script:).

compare whether two python files result in same byte code (are code wise identical)

We're doing some code cleanup.
The cleanup is only about formatting (if that's an issue, then let's assume that line numbers don't change, though ideally I'd like to ignore line-number changes as well).
In order to be sure, that there is no accidental code change I'd like to find a simple / fast way to compare the two source codes.
So let's assume, that I have file1.py and file2.py
What works is to use py_compile.compile(filename) to create .pyc files, then run uncompyle6 on each .pyc file, strip off comments, and compare the results. But this is overkill and very slow.
Another approach I imagined is to copy file1.py to file.py, call py_compile.compile("file.py"), and save the .pyc file; then copy file2.py to file.py, call py_compile.compile("file.py"), and save that .pyc file; and finally compare the two generated .pyc files.
Would this work reliably with all (current) versions >= Python 3.6?
If I remember correctly, at least for Python 2 the .pyc files could contain timestamps or absolute paths, which could make the comparison fail (at least if the .pyc files were generated on two different machines).
Is there a clean way to compare the byte code of two .py files?
As bonus feature (if possible) I'd like to create a hash for each byte code, that I could store for future reference.
You might try using Python's built-in compile function, which can compile from a string (read in from a file in your case). The example below compiles two equivalent programs and one almost-equivalent program, compares the resulting code objects, and then, purely for demo purposes (something you would not normally want to do), executes a couple of the code objects:
import hashlib
import marshal


def compute_hash(code):
    code_bytes = marshal.dumps(code)
    code_hash = hashlib.sha1(code_bytes).hexdigest()
    return code_hash


source1 = """x = 3
y = 4
z = x * y
print(z)
"""
source2 = "x=3;y=4;z=x*y;print(z)"

source3 = "a=3;y=4;z=a*y;print(z)"

obj1 = compile(source=source1, filename='<string>', mode='exec', dont_inherit=1)
obj2 = compile(source=source2, filename='<string>', mode='exec', dont_inherit=1)
obj3 = compile(source=source3, filename='<string>', mode='exec', dont_inherit=1)

print(obj1 == obj2)
print(obj1 == obj3)

exec(obj1)
exec(obj3)
print(compute_hash(obj1))
Prints:
True
False
12
12
48632a1b64357e9d09d19e765d3dc6863ee67ab9
This will save you from having to copy .py files, create .pyc files, compare .pyc files, etc.
Note:
The compute_hash function is for when you need a repeatable hash, i.e., one that returns the same value for the same code object across successive program runs.
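Building on that, here is a minimal sketch (the file names are hypothetical) that hashes the compiled code of source files straight from disk. One caveat: marshal serializes the complete code object, including line-number tables, so this hash may be stricter than == on code objects; formatting changes that shift line numbers can change the hash even when == still reports equality.

```python
import hashlib
import marshal


def file_code_hash(path):
    """Compile a Python source file and return a SHA-1 of its marshaled bytecode."""
    with open(path) as f:
        # A constant filename keeps co_filename out of the comparison.
        code = compile(f.read(), filename='<string>', mode='exec', dont_inherit=1)
    return hashlib.sha1(marshal.dumps(code)).hexdigest()


# Hypothetical usage with the file names from the question:
# print(file_code_hash('file1.py') == file_code_hash('file2.py'))
```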
It might not be the desired answer, but why don't you use a diff tool to check whether the files have changed?
https://linuxhandbook.com/diff-command/
And if the files have changed, use a merge tool like Meld to compare the changes: http://meldmerge.org/

Python script that changes itself on execution

How can I write a script that changes itself when executed?
For example, say a script consists of two rows:
list = [1, 2, 3, 4, 5]
sliced_list = list[0:1]
After executing it, the 2nd row should be:
sliced_list = list[1:2]
and then, after the next run,
sliced_list = list[2:3]
I want to modify the variable sliced_list every time I run this file.
Generally this is not something you should ever want to do, since it is likely to result in non-deterministic behavior, and in the event of a bug it's possible to completely overwrite your script and lose data.
If you want to change the data your script is operating on, you should store it persistently in some fashion. This could be in a separate file somewhere or in an environment variable.
But to do what you're asking, you would need to open the script, copy the contents, and modify the contents as you desire, like this:
with open("/path/to/script.py", 'r+') as script:
    contents = script.read()
    # ... some string logic here ...
    # Point the cursor back to the beginning of the file.
    script.seek(0)
    script.write(contents)
    # If the original contents were longer than the new contents,
    # unwanted data would remain at the end of the file, so drop it.
    script.truncate()
You could save the start index into a file when you run the script, then increment it and save it again. Something like what is shown below:
import os

List = [1, 2, 3, 4, 5]

file = open('file.txt', 'r+')
start_index = int(file.read())
print(List[start_index:start_index + 1])
file.close()

os.remove('file.txt')
file = open('file.txt', 'w')
file.write(str(start_index + 1))
file.close()
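A variation on the same idea (keeping the file name file.txt from above) that also handles the very first run, when the counter file does not exist yet:

```python
import os

items = [1, 2, 3, 4, 5]
counter_path = 'file.txt'  # persists the index between runs

# Read the previous index, defaulting to 0 on the first run.
start = 0
if os.path.exists(counter_path):
    with open(counter_path) as f:
        start = int(f.read())

print(items[start:start + 1])

# Save the incremented index for the next run.
with open(counter_path, 'w') as f:
    f.write(str(start + 1))
```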

Looping through ipynb files in python

So say I have the same piece of code inside 10 separate .ipynb files with different names, and let's say the code is as follows:
x = 1+1
Pretty simple stuff, but I want to change the variable x to y. Is there any way, using Python, to loop through each .ipynb file and do some sort of find-and-replace anywhere it sees x, changing it to y? Or will I have to open each file in a Jupyter notebook and make the change manually?
I never tried this before, but .ipynb files are simply JSON. These pretty much function like nested dictionaries. The cells are contained under the key 'cells', and 'cell_type' tells you whether a cell is code. You then access the contents of a code cell (the code itself) with the 'source' key.
In a notebook I am writing I can look for a particular piece of code like this:
import json

with open('UW_Demographics.ipynb') as f:
    ff = json.load(f)

for cell in ff['cells']:
    if cell['cell_type'] == 'code':
        for elem in cell['source']:
            if "pd.read_csv('UWdemographics.csv')" in elem:
                print("OK")
You can iterate over your .ipynb files, identify the code you want to change as above, change it, and save it with json.dump in the normal way.
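Putting that together, here is a minimal find-and-replace sketch; the glob pattern, the helper name, and the exact x-to-y replacement string are assumptions based on the question:

```python
import glob
import json


def replace_in_notebooks(pattern, old, new):
    """Find-and-replace a snippet in every code cell of matching notebooks."""
    for path in glob.glob(pattern):
        with open(path) as f:
            nb = json.load(f)
        for cell in nb['cells']:
            if cell['cell_type'] == 'code':
                # 'source' is a list of source lines; replace within each line.
                cell['source'] = [line.replace(old, new) for line in cell['source']]
        with open(path, 'w') as f:
            json.dump(nb, f, indent=1)


# Hypothetical usage:
# replace_in_notebooks('*.ipynb', 'x = 1+1', 'y = 1+1')
```

Note that this rewrites the notebook files in place, so keep backups (or run it in a version-controlled directory).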

How to load .mat file into workspace using Matlab Engine API for Python?

I have a .mat workspace file containing 4 character variables. These variables contain paths to various folders I need to be able to cd to and from relatively quickly. Usually, when using only Matlab I can load this workspace as follows (provided the .mat file is in the current directory).
load paths.mat
Currently I am experimenting with the Matlab Engine API for Python. The Matlab help docs recommend using the following Python formula to send variables to the current workspace in the desktop app:
import matlab.engine
eng = matlab.engine.start_matlab()
x = 4.0
eng.workspace['y'] = x
a = eng.eval('sqrt(y)')
print(a)
Which works well. However the whole point of the .mat file is that it can quickly load entire sets of variables the user is comfortable with. So the above is not efficient when trying to load the workspace.
I have also tried two different variations in Python:
eng.load("paths.mat")
eng.eval("load paths.mat")
The first variation successfully loads a dict variable in Python containing all four keys and values, but this does not propagate to the workspace in Matlab. The second variation throws an error:
File "", line unknown
SyntaxError: Error: Unexpected MATLAB expression.
How do I load up a workspace through the engine without having to manually do it in Matlab? This is an important part of my workflow....
You didn't specify the number of output arguments from the MATLAB engine, which is a possible reason for the error.
I would expect the error from eng.load("paths.mat") to read something like
TypeError: unsupported data type returned from MATLAB
The difference in error messages may arise from different versions of MATLAB, engine API...
In any case, try specifying the number of output arguments like so,
eng.load("paths.mat", nargout=0)
This was giving me fits for a while, so here are a few things to try. I was able to get this working with Matlab 2019a and Python 3.7. I had the most trouble trying to build a string and use it as an argument for load and eval/evalin, so there might be some trickiness with single or double quotes, or a need for an additional set of quotes inside the string.
Make sure the MAT file is on the Matlab Path. You can use addpath and rmpath really easily with pathlib objects:
from pathlib import Path

mat_file = Path('local/path/from/cwd/example.mat').resolve()  # get absolute path
eng.addpath(str(mat_file.parent))
# Execute other commands
eng.rmpath(str(mat_file.parent))
Per dML's answer, make sure to specify nargout=0 when there are no outputs from the function, and always when calling a script. If there is one output you don't have to capture it in Python, and if there is more than one, the outputs come back as a tuple.
You can also turn your script into a function (just won't have access to base workspace without using evalin/assignin):
function load_example_matfile()
    evalin('base', 'load example.mat')
end
eng.feval('load_example_matfile')
And plain vanilla eval and load do seem to work as well, but if you leave off the nargout=0 it either errors out or returns the contents of the file directly to Python.
Both of these work.
eng.eval('load example.mat', nargout=0)
eng.load('example.mat', nargout=0)
