Reading an h5 file created by Python 2 into Python 3 - python

I am in the process of transferring some code from Python 2 (2.7) to Python 3 (3.7 or later).
However, this code reads an h5 file that was created by code running under Python 2.7. That code will also be transferred to Python 3, but not by me. I need the data in the h5 file to check whether the conversion to Python 3 on my end works well (internally the data is a pandas DataFrame).
Therefore I am looking for a trick (using either Python 2 or Python 3) to convert this h5 file into something that I can then read with Python 3. It does not need to be a neat solution since it will only be temporary.
The data is rather sizeable.

So what I ended up doing is using Python 2 to read the h5 file and store it as JSON (one file per key in the h5).
Then I used a Python 3 script to read the JSON files and store them as an h5 file again:
# (in Python 2)
import pandas
foo = pandas.read_hdf('file.h5', key='bla', mode='r')  # read one key from the old h5 file
foo.to_json('file.json')                               # dump it as JSON

# (in Python 3)
import pandas
foo = pandas.read_json('file.json')                    # read the JSON back in
foo.to_hdf('file2.h5', key='bla', mode='w')            # write a fresh h5 file
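Since the h5 file holds several keys, the same round trip can be run in a loop; this is a minimal sketch (the key names and file names are placeholders, not from the original answer):

# (in Python 2) dump every key in the store to its own JSON file
import pandas
store = pandas.HDFStore('file.h5', mode='r')
keys = [k.lstrip('/') for k in store.keys()]
store.close()
for key in keys:
    pandas.read_hdf('file.h5', key=key, mode='r').to_json(key + '.json')

# (in Python 3) rebuild a single h5 file from those JSON files
import pandas
for key in ['bla', 'other_key']:  # assumed: the same key names known from the Python 2 side
    pandas.read_json(key + '.json').to_hdf('file2.h5', key=key, mode='a')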
So it ended up being rather simple. Hopefully this answer will help someone who is stuck with the same problem.

Related

How to Decompress a TAR file into TXT (read a CEL file) in either Python or R

I was wondering if anyone knows how to decompress TAR files in R, and how to extract data from large numbers of GZ files? In addition, does anyone know how to read large amounts of data (around 100 files) simultaneously while maintaining the integrity of the data files (at some point my computer can't handle the amount of data and starts producing unreadable output)?
I am a novice programmer still learning. I was given an assignment to analyze and cross-reference data on similar genes found between different cell structures for a disease trait. I managed to get TXT dataset files to work and formatted them to be recognized by another program known as GSEA.
1.) I installed a program called "WinZip", which helped me decompress my TAR files into GZ files.
I stored these files in a newly created folder under "Downloads".
2.) I then tried to use R to access the files with this code:
>untar("file.tar", list=TRUE)
And it produced approximately 170 results (the GZ files contained in the TAR).
3.) When I tried to read in one of the GZ files, it generated over a thousand lines of alphanumeric characters that were unintelligible to me.
>989 ™šBx
>990 33BŸ™šC:LÍC\005€
>991 LÍB¬
>992 B«™šBꙚB™™šB¯
>993 B¡
>994 BŸ
>995 C\003
>996 BŽ™šBð™šB¦
>997 B(
>998 LÍAòffBó
>999 LÍBñ™šBó
>1000 €
> [ reached 'max' / getOption("max.print") -- omitted 64340 rows ]
Warning messages:
>1: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 1 appears to contain embedded nulls
>2: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 2 appears to contain embedded nulls
>3: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 3 appears to contain embedded nulls
>4: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 4 appears to contain embedded nulls
>5: In read.table("GSM2458563_Control_1_0.CEL.gz") :
line 5 appears to contain embedded nulls
>6: In scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
embedded nul(s) found in input
What I am trying to do is access all of these files simultaneously, without overloading the computer, and maintain the integrity of the data. Then I want to read the information properly so that it resembles some sort of data table (ideally, I was wondering whether a conversion from TAR to TXT would be possible so that GSEA could read and identify the data).
Does anyone know any programs compatible with Windows that could properly decompress and read such files, or any R commands that would help me generate or convert such data files?
Background Research
So I've been working on it for around an hour - here are the results.
The file that you are trying to open, GSM2458563_Control_1_0, is compressed inside a .gz file and contains a binary .CEL file, therefore it is unreadable as plain text.
Such files are published by the National Center for Biotechnology Information (NCBI).
I have seen Python 2 code to open them:
from Bio.Affy import CelFile

with open('GSM2458563_Control_1_0.CEL') as file:
    c = CelFile.read(file)
I've found documentation about Bio.Affy on version 1.74 of biopython.
Yet current biopython readme says:
"...Biopython 1.76 was our final release to support Python 2.7 and Python 3.5."
Nowadays Python 2 is deprecated, not to mention that the library mentioned above has evolved and changed tremendously.
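As a side note on the unpacking step itself (before any CEL parsing), Python's standard library can extract the TAR archive and decompress the .gz members without extra software; a minimal sketch, with file and folder names taken from the question as placeholders:

import gzip
import shutil
import tarfile
from pathlib import Path

# Extract every member of the archive into a folder
with tarfile.open('file.tar') as archive:
    archive.extractall('extracted')

# Decompress each .gz member next to it; the result is still a binary .CEL file
for gz_path in Path('extracted').glob('*.gz'):
    with gzip.open(gz_path, 'rb') as src, open(gz_path.with_suffix(''), 'wb') as dst:
        shutil.copyfileobj(src, dst)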
Solution
So I found another way around it, using R.
My Specs:
Operating System: Windows 64-bit
RStudio : Version 1.3.1073
R Version : R-4.0.2 for Windows
I've pre-installed the dependencies mentioned below.
Use the getGEO function from the GEOquery package to fetch the file from NCBI GEO.
# Prerequisites
# Download and install Rtools from http://cran.r-project.org/bin/windows/Rtools/
# Install BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
library(GEOquery)
# Download and open the data
gse <- getGEO("GSM2458563", GSEMatrix = TRUE)
show(gse)
# ****** Data Table ******
# ID_REF VALUE
# 1 7892501 1.267832
# 2 7892502 3.254963
# 3 7892503 1.640587
# 4 7892504 7.198422
# 5 7892505 2.226013
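If you would rather stay in Python, the GEOparse package offers a similar route to the processed GEO record; I have not verified it against this exact accession, so treat it as a sketch:

import GEOparse

# Download and parse the sample's SOFT record from NCBI GEO
gsm = GEOparse.get_GEO(geo="GSM2458563", destdir="./geo_cache")

# gsm.table is a pandas DataFrame with ID_REF / VALUE columns,
# comparable to the data table shown above
print(gsm.table.head())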

How to convert .pkl file into .csv or read .pkl into R as dataframe [duplicate]

This question already has answers here:
Converting .pkl file to .csv file
(2 answers)
Closed 1 year ago.
I've never used Python, and I've received some .pkl files containing tracking data. The data set consists of a training set with 7500 sequences and two separate sets of sequences for testing. The format of each sequence is as follows:
- Each sequence is a matrix (a NumPy 2D array) with 46 columns. Each row contains 23 pairs of (x, y) coordinates... and so on.
I've tried to use the reticulate package, but, for example, with the file in my working directory, running this code hasn't worked and I don't know what else to do...
> data_1 = py_load_object(test_data_1.pkl, pickle = "pickle")
Error in py_resolve_dots(list(...)) : object 'test_data_1.pkl' not found
You are probably pretty close. I am not familiar with reticulate, but if the files you have were serialized with the pickle module, you should be able to de-serialize them with the same module.
import pickle

with open('test_data_1.pkl', 'rb') as f:
    data_1 = pickle.load(f)
You must give pickle.load() a file handle using the built-in open. If you don't want to keep all the pickle files in the same working folder as your script, you can use an absolute or relative path given as a string. There are more details about open here. You can also use pathlib.Path objects for the filepath if you want to get fancier.
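If the end goal is a .csv that R can read directly, and assuming the pickle holds a list of 2D NumPy arrays as described in the question, a sketch along these lines would write one CSV per sequence (file names are placeholders):

import pickle
import numpy as np

with open('test_data_1.pkl', 'rb') as f:
    sequences = pickle.load(f)  # assumed: a list of 2D numpy arrays with 46 columns

for i, seq in enumerate(sequences):
    # one CSV per sequence, rows of comma-separated coordinates
    np.savetxt('sequence_{}.csv'.format(i), seq, delimiter=',')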

How to pass data (args, parameters, lists, etc.) from Python 2 to Python 3 in memory? [duplicate]

This question already has answers here:
What is the preferred way of passing data between two applications on the same system?
(6 answers)
Closed 5 years ago.
For certain reasons I must use the "win32com" module to output my DataFrame from another application (but this module seems to be available only in Python 2).
However, I want to do some calculations in Python 3 with this DataFrame (output by Python 2).
How can I send the DataFrame from Python 2 to Python 3 in memory (other than writing the data out to a file)?
Supplementary explanation 1:
My OS is Windows 10 64-bit (with both Python 2 and Python 3 installed).
In short, I want to know how I can pass the data (produced by Python 2) to Python 3.
Supplementary explanation 2:
I have a Python script A that needs to run in Python 2.
Script A will generate some data (maybe JSON, a DataFrame, ...).
I then want to pass this data to a Python script B.
Script B must run in Python 3.
My OS is Windows 10 64-bit (both have Python 2 and 3).
I am new to Python. I have tried "write the data out to a file, then have B.py read the file", however this I/O approach is too slow, so I want to pass the data in memory. How can I do that?
(My English is not very good, please bear with me.)
JSON is a convenient option.
import json
Read the docs for specifics and examples:
https://docs.python.org/2/library/json.html
https://docs.python.org/3/library/json.html
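One way to keep the data in memory rather than in a file is to have the Python 3 script start the Python 2 script as a subprocess and read the JSON from its standard output; a minimal sketch (the script names and the python2 command are assumptions about your setup):

# A.py (run under Python 2): print the data as JSON on stdout
import json
data = {'values': [1, 2, 3]}  # placeholder for the DataFrame, e.g. converted with to_dict()
print(json.dumps(data))

# B.py (run under Python 3): launch A.py and parse its output, no temporary file involved
import json
import subprocess
out = subprocess.check_output(['python2', 'A.py'])
data = json.loads(out)
print(data['values'])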

How to load .mat file into workspace using Matlab Engine API for Python?

I have a .mat workspace file containing 4 character variables. These variables contain paths to various folders I need to be able to cd to and from relatively quickly. Usually, when using only Matlab I can load this workspace as follows (provided the .mat file is in the current directory).
load paths.mat
Currently I am experimenting with the Matlab Engine API for Python. The Matlab help docs recommend the following Python code to send variables to the current workspace in the desktop app:
import matlab.engine
eng = matlab.engine.start_matlab()
x = 4.0
eng.workspace['y'] = x
a = eng.eval('sqrt(y)')
print(a)
This works well. However, the whole point of the .mat file is that it can quickly load an entire set of variables the user is comfortable with, so the above is not efficient when trying to load the workspace.
I have also tried two different variations in Python:
eng.load("paths.mat")
eng.eval("load paths.mat")
The first variation successfully loads a dict variable in Python containing all four keys and values but this does not propagate to the workspace in Matlab. The second variation throws an error:
File "", line unknown SyntaxError: Error: Unexpected MATLAB
expression.
How do I load up a workspace through the engine without having to manually do it in Matlab? This is an important part of my workflow....
You didn't specify the number of output arguments from the MATLAB engine, which is a possible reason for the error.
I would expect the error from eng.load("paths.mat") to read something like
TypeError: unsupported data type returned from MATLAB
The difference in error messages may arise from different versions of MATLAB, engine API...
In any case, try specifying the number of output arguments like so,
eng.load("paths.mat", nargout=0)
This was giving me fits for a while, so here are a few things to try. I was able to get this working on Matlab 2019a with Python 3.7. I had the most trouble trying to create a string and use it as an argument for load and eval/evalin, so there might be some trickiness with single or double quotes, or a need for an additional set of quotes in the string.
Make sure the MAT file is on the Matlab Path. You can use addpath and rmpath really easily with pathlib objects:
from pathlib import Path
mat_file = Path('local/path/from/cwd/example.mat').resolve()  # get absolute path
eng.addpath(str(mat_file.parent))
# Execute other commands
eng.rmpath(str(mat_file.parent))
Per dML's answer, make sure to specify nargout=0 when there are no outputs from the function, and always when calling a script. If there are one or more outputs you don't have to assign them in Python, and if there is more than one they are returned as a tuple.
You can also turn your script into a function (just won't have access to base workspace without using evalin/assignin):
function load_example_matfile()
    evalin('base','load example.mat')
end

eng.feval('load_example_matfile')
And it does seem to work with plain vanilla eval and load as well, but if you leave off nargout=0 it either errors out or returns the contents of the file to Python directly.
Both of these work.
eng.eval('load example.mat', nargout=0)
eng.load('example.mat', nargout=0)
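Once the .mat file has been loaded with nargout=0, the variables sit in the engine's base workspace and can be read back into Python or used for the cd step mentioned in the question; for example (the variable name data_dir is hypothetical):

import matlab.engine

eng = matlab.engine.start_matlab()
eng.load('paths.mat', nargout=0)    # variables are now in the engine's base workspace

folder = eng.workspace['data_dir']  # hypothetical char variable stored in paths.mat
eng.cd(folder, nargout=0)           # change MATLAB's current folder to that path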

How to operate on unsaved Excel file?

I'd like to automate a loop:
ABAQUS generates an Excel file;
Matlab uses the data in the Excel file;
repeat steps 1 and 2.
Now my question is: after step 1, the Excel file from ABAQUS is left unsaved as Book1. I cannot use a Matlab command to save it. Is there a way to use the data in this "Book1" file without saving it, or to find where it is stored so I can use the data inside? (I assume that Excel always writes the file somewhere even if the user doesn't save it?)
Thank you! 
As agentp mentioned, if you are running Abaqus via a Python script, you can just use Python to create a .txt file to save all the relevant information. If well structured, a .txt file can be as readable as an Excel spreadsheet. Because Matlab and Python have built-in functions to read and write files, this communication can be done easily.
As for Matlab calling Abaqus, you can use something similar to:
system('abaqus cae nogui=YOUR_SCRIPT.py')
Your script that pipes to Excel should have some code similar to this:
abq_ExcelUtilities.excelUtilities.XYtoExcel(
xyDataNames='S:Mises PI: PART-1-1 E: 4309 IP: 1', trueName='')
Writing the same data to a report (.rpt) file, the code looks like this:
x0 = session.xyDataObjects['S:Mises PI: PART-1-1 E: 4309 IP: 1']
session.writeXYReport(fileName='abaqus.rpt', xyData=(x0, ))
now to "roll your own", use that x0 object: x0.data is a regular python tuple holding the actual data which you can write to a file however you like, eg:
with open('myfile.csv', 'w') as f:
    for point in x0.data:
        f.write('%g,%g\n' % point)
(You can comment out or delete the writeXYReport call.)
