Can the Python read() command be structured with two input variables? - python

I've been given some Python code (at least I was told it was Python, and it doesn't match MATLAB code structure) to get running, and one of the lines is:
data = f.read(1024x1024, 'int32')
I'm getting a syntax error, which doesn't surprise me, as I thought read() could only take one input, and that was the size...
I checked the docs https://docs.python.org/2/tutorial/inputoutput.html
and had a general look around, for example here:
http://www.tutorialspoint.com/python/python_files_io.htm and here: http://pymbook.readthedocs.org/en/latest/file.html
There is no indication that read() can take two inputs, never mind one with an 'x' in it.
(I am also not clear on what the intention of 1024x1024 was, which is why I'm questioning whether it's Python. It looks like they're trying to set the size, but the read method doesn't work like that.)
Does anyone know what I'm missing? (Or can anyone work out what was originally meant by the command?)
Whole script section:
f = open(filename, 'r')
out = open(outfile, 'w')
data = f.read(1024x1024, 'int32')
result = out.write(data[0:256000])
out.closed
f.closed
It's basically notes on what they want to happen in a particular section of the script but they wrote it as if it was code and I have no idea what the intention of the data line is.

It looks more like pseudocode than anything; specifying "int32" makes me think they are reading from a binary file. You probably need something like
import numpy as np

def load_array(filename, dtype="int32", shape=(1024, 1024)):
    # Read the raw binary values as int32 and reshape them into a 1024x1024 grid
    return np.fromfile(filename, dtype=dtype).reshape(shape)
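Assuming the intent of the whole script section is to read 1024*1024 int32 values and copy the first 256000 of them to the output file (the notes don't say whether 256000 counts values or bytes; this sketch assumes values), a NumPy version could look like this. `copy_first_values` is a hypothetical name:

```python
import numpy as np

def copy_first_values(filename, outfile, count=256000, dtype="int32", shape=(1024, 1024)):
    # Read the raw binary file as int32 and reshape it into a 1024x1024 grid
    data = np.fromfile(filename, dtype=dtype).reshape(shape)
    # Flatten back to 1-D so the slice counts values, like data[0:256000] in the notes
    first = data.ravel()[:count]
    # Write the selected values as raw binary int32, the same format as the input
    first.tofile(outfile)
    return first
```

np.fromfile uses the machine's native byte order; if the file was written on a machine with a different one, a dtype such as "<i4" or ">i4" makes the order explicit.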

Your syntax error has nothing to do with the read "command". A syntax error means that the interpreter/parser couldn't make sense of what you wrote at all. When that happens, Python will normally point at what's confusing the parser, e.g.:
data = f.read(1024x1024, 'int32')
^
SyntaxError: invalid syntax
Note the ^ pointing at 1024x1024, which is the fault: Python simply doesn't understand what 1024x1024 is (so it never gets as far as actually calling the read method). If you meant to multiply the numbers, you should have written 1024*1024 instead.
When you change it to 1024*1024 you'll get other errors, because read() doesn't take those arguments: a file object's read() accepts only a single, optional size argument.
As for the language, I doubt there's any sane language with such a construct. x doesn't work well as a multiplication operator, because it would be ambiguous with names like axe (did the author mean a*e, or the variable named axe?). It looks more like pseudocode.
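If NumPy isn't available, the standard-library `array` module can read a given number of integers from a binary file, which is roughly what `f.read(1024*1024*4)` plus manual decoding would do. A minimal sketch; `read_int32_values` is a hypothetical name, and it assumes the C int behind typecode "i" is 4 bytes (true on common platforms, but not guaranteed), so it checks `itemsize` first:

```python
import array

def read_int32_values(filename, count=1024 * 1024):
    values = array.array("i")
    # "i" is a C int; make sure it really is 32 bits on this platform
    if values.itemsize != 4:
        raise RuntimeError("array typecode 'i' is not 4 bytes here")
    # Binary mode: the file holds raw integers, not text
    with open(filename, "rb") as f:
        # fromfile reads exactly `count` items and raises EOFError if the file is shorter
        values.fromfile(f, count)
    return values
```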

Related

Sublime Text 3 : How to display the line numbers of the evaluated lines in REPL Python?

using Sublime-Text-3
When I evaluate some selected Python code in the Python REPL (with SublimeREPL), I get something like this:
>>> print("thank you")
thank you
(I have already set "show_transferred_text": true.)
Instead of sending and seeing the whole code that was evaluated, I would like to see the line numbers of the evaluated code. Do you have any ideas about this?
(For example, to display something like this: >>> evaluated lines (1:30))
Thanks!
Unfortunately this is not possible with the default install of SublimeREPL. Additionally, the project is no longer being maintained, so until someone steps forward to steward the project, this feature won't get added unless you add it yourself. It's all open-source and written in Python.

Divide and Conquer Lists in Python (to read sav files using pyreadstat)

I am trying to read sav files using pyreadstat in Python, but in some rare scenarios I get a UnicodeDecodeError because a string variable contains special characters.
To handle this, I think that instead of loading the entire variable set, I will load only the variables which do not raise this error.
Below is the pseudo-code that I have. It is not very efficient, since I check each item of the list for errors using try and except.
import pyreadstat

# Read only the metadata to get information about the variables
df, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
columns = meta.column_names  # All variable names are stored in this list
result = []
for var in columns:
    print(var)
    try:
        df, meta = pyreadstat.read_sav('Test.sav', usecols=[var])
        # No error means this variable can be loaded, so store it in result
        result.append(var)
    except UnicodeDecodeError:
        pass
# Finally, load the sav file with only the error-free variables
df, meta = pyreadstat.read_sav('Test.sav', usecols=result)
For a sav file with 1000+ variables this takes a long time.
I was wondering if a divide-and-conquer approach could make it faster. Below is my suggested approach, but I am not very good at implementing recursive algorithms. Could someone please help me with pseudo-code? It would be very helpful.
1. Take the list and try to read the sav file.
2. If there is no error, store the output in result and read the sav file.
3. If there is an error, split the list into 2 parts and run these again.
4. Step 3 needs to run again until we have a list that does not give any error.
Using this approach, 90% of my sav files would load on the first pass, hence I think recursion is a good method.
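The divide-and-conquer steps above can be sketched as a recursive bisection. This is a minimal sketch that is not tied to pyreadstat: `readable_columns` and the `can_read` callback are hypothetical names, and `can_read` is assumed to return True when reading exactly the given columns succeeds.

```python
def readable_columns(columns, can_read):
    """Return the columns that can be read, splitting the list in half on failure."""
    if not columns:
        return []
    if can_read(columns):
        # The whole group loads without error, so keep all of it
        return list(columns)
    if len(columns) == 1:
        # A single failing column cannot be split further: drop it
        return []
    mid = len(columns) // 2
    # Try each half independently and combine the columns that survive
    return (readable_columns(columns[:mid], can_read)
            + readable_columns(columns[mid:], can_read))
```

With pyreadstat, `can_read` could call `pyreadstat.read_sav('Test.sav', usecols=cols)` inside a try/except UnicodeDecodeError and return False on error; the result would then be passed as `usecols` to the final `read_sav` call. When the whole list loads, this costs a single read; with only a few bad columns, the number of reads grows roughly with the number of bad columns times log2 of the number of columns, instead of with the total number of columns.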
You can try to reproduce the issue for sav file here
For this specific case I would suggest a different approach: you can pass an "encoding" argument to pyreadstat.read_sav to set the encoding manually. If you don't know which encoding it is, you can iterate over the list of encodings here: https://gist.github.com/hakre/4188459 to find out which one makes sense. For example:
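import pyreadstat

# codes should be the list of all the encodings in the link mentioned above;
# this short list is only an illustrative subset
codes = ["UTF8", "LATIN1", "CP1252"]
for c in codes:
    try:
        df, meta = pyreadstat.read_sav("Test.sav", encoding=c)
        print(c)
        print(df.head())
    except Exception:
        pass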
I did, and there were a few that might make sense, assuming the string is in a non-Latin alphabet. However, the most promising one is not in the list: encoding="UTF8" (the list contains UTF-8, with a dash, and that fails). Using UTF8 (no dash) I get this:
నేను గతంలో వాడిన బ
which according to Google Translate means "I used to come b" in Telugu. Not sure if that fully makes sense, but it's one way.
The advantage of this approach is that if you find the right encoding, you will not be losing data, and reading the data will be fast. The disadvantage is that you may not find the right encoding.
If you do not find the right encoding, you would still be reading the problematic columns very fast, and you can discard them later in pandas by inspecting which character columns contain non-Latin characters. This will be much faster than the algorithm you were suggesting.
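A minimal sketch of that pandas clean-up step. The function names and the cut-off are assumptions: here anything above U+024F (the end of Latin Extended-B) counts as non-Latin, so typographic punctuation such as curly quotes would also flag a column, which makes this a rough filter rather than a rule.

```python
def has_non_latin(text, limit=0x024F):
    # U+0000..U+024F covers Basic Latin through Latin Extended-B;
    # any character above that is treated as "non-Latin" (an assumption)
    return any(ord(ch) > limit for ch in text)

def non_latin_columns(table):
    """Names of columns whose string values contain non-Latin characters.

    Works with a pandas DataFrame (iterating it yields column names) or with a
    plain dict mapping column names to lists of values.
    """
    flagged = []
    for name in table:
        if any(isinstance(v, str) and has_non_latin(v) for v in table[name]):
            flagged.append(name)
    return flagged
```

With a DataFrame read using one of the candidate encodings, `df = df.drop(columns=non_latin_columns(df))` would remove the flagged columns.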

Pweave - putting LaTeX output within python functions

I'm putting together a fairly complex python script with functions that may or may not be called depending on the data that's being analysed.
In pure python, all works well. As soon as I break out of the code block to create a LaTeX section for the results, I get undefined variable errors. Stripping this back to its most simple case:
<<echo=False,complete=False>>=
def getValues(title, start, end):
    #
\section{<%= title %>}
... more LaTeX code...
<<echo=False,complete=False>>=
    return
#
If I strip out the Pweave code block tags and the LaTeX markup, this works correctly. As soon as I add the markup, the \section line reports that title is undefined when I attempt to pweave the file.
My understanding from the documentation was that complete=False would combine the code blocks, although I get the same error with or without it.
Since I want the output documentation to be dependent on the functions called, how can I achieve this?
I'd be very grateful if anyone can point me to a missed example, but I've been unable to find a way of doing what I need.
This does not work because `complete=False` does not apply to inline blocks, so `title` is undefined when your code runs. You could generate your LaTeX output inside Python chunks using the results="tex" chunk option.
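A minimal sketch of that suggestion, assuming the whole function lives in one Python chunk opened with `<<echo=False, results="tex">>=` and closed with `@`, so that whatever the function prints is passed into the document as raw LaTeX. `section_tex` and the body text are hypothetical:

```python
def section_tex(title, body_lines):
    # Build a LaTeX section as plain text; the doubled braces in the
    # f-string produce a literal { and } around the title
    return "\n".join([f"\\section{{{title}}}"] + list(body_lines))

def getValues(title, start, end):
    # In a results="tex" chunk the printed text becomes part of the .tex output,
    # so the section only appears when this function is actually called
    print(section_tex(title, [f"Values from {start} to {end}."]))
```

Because the LaTeX is produced by Python, the output document can depend on which functions the analysis calls.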

How to write back to a PDB file after using Superimposer on the atoms of a protein with Bio.PDB in Python

I read and extracted information about the atoms in a PDB file and used Superimposer() to align a mutant to the wild type. How can I write the aligned atom coordinates back to a PDB file? I tried to use PDBIO(), but it doesn't work, since it doesn't accept a list as input. Does anyone have an idea how to do it?
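from Bio.PDB import PDBParser, Superimposer

mutantAtoms = []
mutantStructure = PDBParser().get_structure("name", pdbFile)
mutantChain = mutantStructure[0]["B"]
# Extract the atoms of every residue in the chain
for residue in mutantChain:
    for atom in residue:
        mutantAtoms.append(atom)
# Do the alignment (wildtypeAtoms is assumed to be defined elsewhere, with the same number of atoms)
si = Superimposer()
si.set_atoms(wildtypeAtoms, mutantAtoms)
si.apply(mutantAtoms)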
Now mutantAtoms holds the mutant atoms aligned to the wild-type atoms. I need to write this information to a PDB file. My question is how to convert the list of aligned atoms into a structure and use PDBIO() (or some other way) to write it to a PDB file.
As I see in an example of the PDBIO module in the Biopython documentation:
from Bio.PDB import PDBParser, PDBIO
p = PDBParser()
s = p.get_structure("1fat", "1fat.pdb")
io = PDBIO()
io.set_structure(s)
io.save("out.pdb")
It seems the PDBIO module needs an object of class Structure to work, which is in principle what I understand Superimposer works with. When you say it does not accept a list, do you mean you have a list of structures? In that case you could simply iterate through the structures, as in:
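for i, s in enumerate(my_results_list):
    io.set_structure(s)
    # Use a different file name per structure so earlier results are not overwritten
    io.save("out_%d.pdb" % i)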
If what you have is a list of atoms, I guess you could create a Structure object from it and then pass it to PDBIO.
However, it is difficult to say more without knowing more about your problem. You could add to your question the lines of code where you hit the problem.
Edit: Now I better understand what you want to do. The Biopython Structural Bioinformatics FAQ has some information about the Structure class, which is apparently somewhat complex. At first sight, I do not see an easy way to create Structure objects from scratch. What you could do instead is modify the structure you get from PDBParser, replacing its atoms with the result you get from Superimposer, and then write the .pdb file from that modified structure. In other words, try to put your mutantAtoms list into the mutantStructure object you already have.
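`Superimposer.apply()` moves the Atom objects it is given in place, and those Atom objects still belong to the structure returned by `PDBParser`. So one way to follow this suggestion is to apply the rotation to every atom of the mutant structure and then save that same structure with `PDBIO`. A minimal sketch, assuming both files contain chain "B" and pairing the CA atoms of residues that share a residue ID; `align_and_save` is a hypothetical name:

```python
from Bio.PDB import PDBIO, PDBParser, Superimposer

def align_and_save(wildtype_file, mutant_file, out_file, chain_id="B"):
    parser = PDBParser(QUIET=True)
    wt_chain = parser.get_structure("wildtype", wildtype_file)[0][chain_id]
    mutant = parser.get_structure("mutant", mutant_file)

    # Pair CA atoms of residues present in both chains (same residue id)
    fixed, moving = [], []
    for residue in mutant[0][chain_id]:
        if "CA" in residue and residue.id in wt_chain and "CA" in wt_chain[residue.id]:
            moving.append(residue["CA"])
            fixed.append(wt_chain[residue.id]["CA"])

    si = Superimposer()
    si.set_atoms(fixed, moving)  # fixed (reference) atoms first, moving atoms second
    # apply() transforms the atoms in place, so the mutant structure itself is updated
    si.apply(list(mutant.get_atoms()))

    io = PDBIO()
    io.set_structure(mutant)
    io.save(out_file)
    return si.rms
```

Pairing by residue ID and using only CA atoms is a design choice for this sketch; any pairing works as long as both atom lists have the same length and order.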

error with gdalbuildvrt, in Python

I am new to Python/GDAL and am running into what may be a trivial issue. It may stem from the fact that I don't really understand how to use GDAL properly in Python, or from something careless, but even though I think I am following the help doc, I keep getting a syntax error when trying to use "gdalbuildvrt".
What I want to do is take several (the number varies for each set; call it N) geotagged 1-band binary rasters (all values are either 0 or 1) of different sizes (the rasters in a set mostly overlap, though) and "stack" them on top of each other so that they are aligned according to their coordinate information. I want this "stack" simply so I can sum the values and produce a 'total' TIFF whose extent matches the combined extent (meaning not just the overlap region) of all the original rasters. The resulting TIFF would have values ranging from 0 to N, representing the total number of "hits" the pixel at that location received over the N rasters.
I was led to gdalbuildvrt [http://www.gdal.org/gdalbuildvrt.html] and after reading about it, it seemed that by using the keyword -separate, I would be able to achieve what I need. However, each time I try to run my program, I get a syntax error. The following shows two of the several different ways I tried calling gdalbuildvrt:
gdalbuildvrt -separate -input_file_list stack.vrt inputlist.txt
gdalbuildvrt -separate stack.vrt inclassfiles
Here inputlist.txt is a text file with the path to one TIFF on every line, just as the help doc specifies, and inclassfiles is a Python list of the path names. Every single time, no matter which way I call it, I get a syntax error on the first word after the keywords (i.e. 'inputlist' in inputlist.txt, or 'stack' in stack.vrt).
Could someone please shed some light on what I might be doing wrong? Alternatively, does anyone know how else I could use python to get what I need?
Thanks so much.
gdalbuildvrt is a GDAL command-line utility, not a Python function. From your example it's a bit unclear how you actually run it, but if you type the command directly into Python code, Python tries to parse it as Python, which is what produces the syntax error. From within Python you should execute it as a subprocess.
Also, in your first line you have the .vrt and the .txt in the wrong order: the text file listing the inputs should follow directly after -input_file_list.
From within Python you can call gdalbuildvrt like:
import os
os.system('gdalbuildvrt -separate -input_file_list inputlist.txt stack.vrt')
Note that the command is provided as a string. Using a Python list of the files can be done with something like:
os.system('gdalbuildvrt -separate stack.vrt %s' % ' '.join(data))
The ' '.join(data) part converts the list to a string with a space between the items.
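Instead of building one command string, `subprocess` can take the arguments as a list, which skips the string formatting step and avoids quoting problems with paths that contain spaces. A minimal sketch; the function names are hypothetical, and it assumes `gdalbuildvrt` is on the PATH:

```python
import subprocess

def build_vrt_command(vrt_path, input_files, separate=True):
    # One list element per argument: no shell is involved, so no quoting is needed
    cmd = ["gdalbuildvrt"]
    if separate:
        cmd.append("-separate")
    cmd.append(vrt_path)
    cmd.extend(input_files)
    return cmd

def build_vrt(vrt_path, input_files):
    # check=True raises CalledProcessError if gdalbuildvrt exits with an error
    subprocess.run(build_vrt_command(vrt_path, input_files), check=True)
```

Recent GDAL Python bindings also offer `gdal.BuildVRT("stack.vrt", files, separate=True)`, which avoids the subprocess altogether.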
Depending on how your GDAL is built, it's sometimes possible to use wildcards as well:
os.system('gdalbuildvrt -separate stack.vrt *.tif')
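For the summing step in the question: with -separate, each input raster becomes its own band of stack.vrt. Assuming those bands have been read into a NumPy array of shape (bands, rows, cols), for example with GDAL's `ReadAsArray()` on the opened VRT, and that pixels outside a raster's footprint come back as 0 (check your nodata settings), the per-pixel hit count is a sum over the band axis. `count_hits` is a hypothetical name:

```python
import numpy as np

def count_hits(stack):
    # stack: (bands, rows, cols) array of 0/1 values, one band per input raster
    stack = np.asarray(stack)
    if stack.ndim != 3:
        raise ValueError("expected a (bands, rows, cols) array")
    # Sum across bands; each result pixel ranges from 0 to N (the number of rasters)
    return stack.sum(axis=0, dtype=np.int32)
```

The resulting 2-D array could then be written to the 'total' TIFF using the VRT's geotransform and projection.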
