This question already has answers here:
Saving plots (AxesSubPlot) generated from python pandas with matplotlib's savefig
(6 answers)
Pandas plotting in Windows terminal
(2 answers)
Pandas plot doesn't show
(4 answers)
Closed 7 months ago.
I am aware that pandas offers the ability to visualize data with plots. Most of the examples I can find, and even the pandas documentation itself, use Jupyter Notebook.
This code doesn't work in a raw Python shell.
#!/usr/bin/env python3
import pandas as pd
df = pd.DataFrame({'A': range(100)})
obj = df.hist(column='A')
# array([[<AxesSubplot:title={'center':'A'}>]], dtype=object)
How can I "show" that?
This script does not run in an IDE. It runs in a Python 3.9.10 interpreter in a Windows command prompt ("DOS box") on Windows 10.
Installing Jupyter or transferring the data to an external service is not an option in my case.
Demonstrating a solution building on code provided by OP:
Save this as a script named save_test.py in your working directory:
import pandas as pd
df = pd.DataFrame({'A': range(100)})
the_plot_array = df.hist(column='A')
fig = the_plot_array[0][0].get_figure()
fig.savefig("output.png")
Run that script on command line using python save_test.py.
You should see it create a file called output.png in your working directory. Open the generated image with your favorite image viewer. If you are working remotely, download the image file and view it on your local machine.
You should also be able to run those lines in succession in an interpreter if the OP prefers.
Explanation:
This solution relies on the fact that pandas plotting uses Matplotlib as its default plotting backend (which can be changed), so you can use Matplotlib's ability to save generated plots as images. It combines Wael Ben Zid El Guebsi's answer to 'Saving plots (AxesSubPlot) generated from python pandas with matplotlib's savefig' with using type() to drill down and see that the pandas histogram is returned as a NumPy array of arrays. The first item in the inner array is a matplotlib.axes._subplots.AxesSubplot object, which the_plot_array[0][0] retrieves. The get_figure() method then returns the figure that contains that AxesSubplot object.
Try something like this
df = pd.DataFrame({'A': list(range(100))})
df.plot(kind='line')
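In a plain interpreter, neither df.hist() nor df.plot() opens a window on its own; the figure is only drawn once Matplotlib is asked to show it. Below is a minimal sketch of that step, assuming an interactive Matplotlib backend (for example Tk) is available on the machine. The helper name show_histogram is made up for illustration and is not part of pandas or Matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd


def show_histogram(df, column, block=True):
    """Draw a histogram of one column and hand it to Matplotlib's viewer.

    show_histogram is a hypothetical helper, not a pandas function.
    """
    # df.hist returns a 2-D NumPy array of Axes; one column gives shape (1, 1).
    axes = df.hist(column=column)
    # Outside Jupyter nothing is displayed until show() is called.
    # With block=True the call returns only after the window is closed.
    plt.show(block=block)
    return axes[0][0]
```

Calling show_histogram(pd.DataFrame({'A': range(100)}), 'A') from the shell should open a window with the histogram. If no interactive backend is installed, saving to a file as shown in the other answer remains the fallback.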
This question already has an answer here:
Treat binary data as a file object?
(1 answer)
Closed 2 years ago.
I'm given a byte array for a .tdms file and I need to read this data but from documentation nptdms.read() and nptdms.open() require a path to a file.
I could write the byte array to disk and use one of these methods but that seems inefficient.
Other than enhancing nptdms to accept a byte array and the solution I described above, do I have any good options?
from io import BytesIO
from nptdms import TdmsFile
def tdms_bytearray_to_tdms_obj(tdms_bytearray):
    file_like_tdms = BytesIO(tdms_bytearray)
    return TdmsFile.read(file_like_tdms)
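The function above works because io.BytesIO wraps in-memory bytes in an object that offers the usual read() and seek() methods, so a reader that accepts an already opened file cannot tell the difference. Here is a standard-library-only sketch of that behaviour; read_magic and the payload bytes are made up for illustration and have nothing to do with the real TDMS layout.

```python
from io import BytesIO


def read_magic(file_like, length=4):
    """Return the first `length` bytes of a binary file-like object."""
    file_like.seek(0)
    magic = file_like.read(length)
    file_like.seek(0)  # rewind so the next reader starts from the beginning
    return magic


# Made-up payload standing in for a byte array received from elsewhere.
payload = bytearray(b"DEMO" + bytes(12))
buffer = BytesIO(payload)  # no file on disk is involved
print(read_magic(buffer))
```

Note that BytesIO copies the data it is given, so later changes to the original byte array do not show up in the buffer.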
Is it possible to read binary MATLAB .mat files in Python?
I've seen that SciPy has alleged support for reading .mat files, but I'm unsuccessful with it. I installed SciPy version 0.7.0, and I can't find the loadmat() method.
An import is required, import scipy.io...
import scipy.io
mat = scipy.io.loadmat('file.mat')
Neither scipy.io.savemat nor scipy.io.loadmat works for MATLAB version 7.3 files. The good part is that MATLAB version 7.3 files are HDF5 datasets, so they can be read with any HDF5 tool and the contents converted to NumPy arrays.
For Python, you will need the h5py extension, which requires HDF5 on your system.
import numpy as np
import h5py
f = h5py.File('somefile.mat','r')
data = f.get('data/variable1')
data = np.array(data) # For converting to a NumPy array
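Since the right reader depends on the file version, it can help to check the file before picking one. The sketch below peeks at the text header that MATLAB writes at the start of the file; it assumes the default header text ("MATLAB 5.0 MAT-file" for v5 to v7.2 files, "MATLAB 7.3 MAT-file" for v7.3 files). The function name mat_file_flavor and its return values are made up for illustration.

```python
def mat_file_flavor(path):
    """Guess which reader a .mat file needs from its text header.

    Returns "hdf5" for v7.3 files, "mat5" for v5 to v7.2 files and
    "unknown" for anything else (for example v4 or text files).
    """
    with open(path, "rb") as f:
        header = f.read(19)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "hdf5"   # candidate for h5py.File
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "mat5"   # candidate for scipy.io.loadmat
    return "unknown"
```

A caller could then branch on the result, for example using h5py only when mat_file_flavor('somefile.mat') returns "hdf5".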
First, in MATLAB, save the .mat file in the version 7 format:
save('test.mat', '-v7')
After that, in Python, use the usual loadmat function:
import scipy.io as sio
test = sio.loadmat('test.mat')
There is a nice package called mat4py which can easily be installed using
pip install mat4py
It is straightforward to use (from the website):
Load data from a MAT-file
The function loadmat loads all variables stored in the MAT-file into a simple Python data structure, using only Python’s dict and list objects. Numeric and cell arrays are converted to row-ordered nested lists. Arrays are squeezed to eliminate arrays with only one element. The resulting data structure is composed of simple types that are compatible with the JSON format.
Example: Load a MAT-file into a Python data structure:
from mat4py import loadmat
data = loadmat('datafile.mat')
The variable data is a dict with the variables and values contained in the MAT-file.
Save a Python data structure to a MAT-file
Python data can be saved to a MAT-file, with the function savemat. Data has to be structured in the same way as for loadmat, i.e. it should be composed of simple data types, like dict, list, str, int, and float.
Example: Save a Python data structure to a MAT-file:
from mat4py import savemat
savemat('datafile.mat', data)
The parameter data shall be a dict with the variables.
Having MATLAB 2014b or newer installed, the MATLAB engine for Python could be used:
import matlab.engine
eng = matlab.engine.start_matlab()
content = eng.load("example.mat", nargout=1)
Reading the file
import scipy.io
mat = scipy.io.loadmat(file_name)
Inspecting the type of MAT variable
print(type(mat))
#OUTPUT - <class 'dict'>
The keys inside the dictionary are MATLAB variables, and the values are the objects assigned to those variables.
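Besides the MATLAB variables, the dictionary also carries a few bookkeeping entries whose names start with two underscores ('__header__', '__version__', '__globals__'). A small sketch for separating the two; user_variables is a made-up helper name, and the example dict only stands in for a real loadmat result.

```python
def user_variables(mat):
    """Drop loadmat's bookkeeping entries such as '__header__'."""
    return {name: value for name, value in mat.items()
            if not name.startswith("__")}


# Stand-in for the dict returned by scipy.io.loadmat (values are made up).
example = {"__header__": b"...", "__version__": "1.0",
           "__globals__": [], "matrix": [[1, 2], [3, 4]]}
print(sorted(user_variables(example)))
```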
There is a great library for this task called: pymatreader.
Just do as follows:
Install the package: pip install pymatreader
Import the relevant function of this package: from pymatreader import read_mat
Use the function to read the matlab struct: data = read_mat('matlab_struct.mat')
Use data.keys() to locate where the data is actually stored.
The keys will usually look like: dict_keys(['__header__', '__version__', '__globals__', 'data_opp']). Here data_opp is the key that stores the actual data. The name of this key can of course differ between files.
Last step - Create your DataFrame (with pandas imported as pd): my_df = pd.DataFrame(data['data_opp'])
That's it :)
There is also the MATLAB Engine for Python by MathWorks itself. If you have MATLAB, this might be worth considering (I haven't tried it myself, but it has a lot more functionality than just reading MATLAB files). However, I don't know whether it may be distributed to other users (it is probably not a problem if those users have MATLAB; otherwise, maybe NumPy is the right way to go?).
Also, if you want to do all the basics yourself, MathWorks provides detailed documentation on the structure of the file format (if the link changes, try searching for matfile_format.pdf or its title, MAT-File Format). It's not as complicated as I personally thought, but obviously this is not the easiest way to go. It also depends on how many features of the .mat files you want to support.
I've written a "small" (about 700 lines) Python script which can read some basic .mat-files. I'm neither a Python expert nor a beginner and it took me about two days to write it (using the MathWorks documentation linked above). I've learned a lot of new stuff and it was quite fun (most of the time). As I've written the Python script at work, I'm afraid I cannot publish it... But I can give some advice here:
First read the documentation.
Use a hex editor (such as HxD) and look into a reference .mat-file you want to parse.
Try to figure out the meaning of each byte by saving the bytes to a .txt file and annotating each line.
Use classes to save each data element (such as miCOMPRESSED, miMATRIX, mxDOUBLE, or miINT32)
The .mat-files' structure is optimal for saving the data elements in a tree data structure; each node has one class and subnodes
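To give an idea of what the first parsing steps look like, here is a rough sketch for a Level 5 MAT-file: it reads the 128-byte header and the 8-byte tag of one data element. It is not the script mentioned above. The type codes are the ones I remember from the format document (only a subset is listed), the function names are made up, and the compact "small data element" layout is deliberately ignored.

```python
import struct

# Data type codes from the MAT-file (Level 5) format document; subset only.
MI_TYPES = {1: "miINT8", 2: "miUINT8", 5: "miINT32", 6: "miUINT32",
            9: "miDOUBLE", 14: "miMATRIX", 15: "miCOMPRESSED"}


def read_header(raw):
    """Return the descriptive header text and a struct byte-order prefix."""
    text = raw[:116].decode("ascii", errors="replace").rstrip()
    # Bytes 126-127 hold the characters "MI"; they appear as "IM" when the
    # file was written on a little-endian machine.
    order = "<" if raw[126:128] == b"IM" else ">"
    return text, order


def read_tag(raw, offset, order):
    """Read one 8-byte data element tag: (type name, payload size in bytes)."""
    type_code, n_bytes = struct.unpack_from(order + "II", raw, offset)
    return MI_TYPES.get(type_code, "type %d" % type_code), n_bytes
```

With raw holding the whole file as bytes, read_tag(raw, 128, order) gives the tag of the first element after the header. From there, each miMATRIX element contains sub-elements with the same tag layout, which is where the tree structure comes in.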
To read a .mat file holding a struct with mixed data types into a pandas DataFrame:
import pandas as pd
import scipy.io as sio
mat = sio.loadmat('file.mat')  # load the MAT-file
mdata = mat['myVar']  # struct variable stored in the MAT-file
ndata = {n: mdata[n][0, 0] for n in mdata.dtype.names}  # unwrap each field
columns = [n for n, v in ndata.items() if v.size == 1]  # keep single-value fields
d = dict((c, ndata[c][0]) for c in columns)
df = pd.DataFrame.from_dict(d)
print(df)
Apart from scipy.io.loadmat for v4 (Level 1.0), v6, and v7 to v7.2 MAT-files, and h5py.File for v7.3 MAT-files, there is another type of MAT-file that stores data as text instead of binary. These files are usually created by Octave and can't even be read in MATLAB.
Neither scipy.io.loadmat nor h5py.File can load them (tested on scipy 1.5.3 and h5py 3.1.0); the only solution I found is numpy.loadtxt.
import numpy as np
mat = np.loadtxt('xxx.mat')
You can also use the hdf5storage library. See its official documentation for details on MATLAB version support.
import hdf5storage
label_file = "./LabelTrain.mat"
out = hdf5storage.loadmat(label_file)
print(type(out)) # <class 'dict'>
from os.path import dirname, join as pjoin
import scipy.io as sio
data_dir = pjoin(dirname(sio.__file__), 'matlab', 'tests', 'data')
mat_fname = pjoin(data_dir, 'testdouble_7.4_GLNX86.mat')
mat_contents = sio.loadmat(mat_fname)
You can use the above code to read a .mat file saved with the default options in Python.
After struggling with this problem myself and trying other libraries (mat4py is a good one as well, but it has a few limitations), I built this library ("matdata2py"), which can handle most variable types and, most importantly for me, the "string" type. The .mat file needs to be saved in the -v7.3 format. I hope this can be useful for the community.
Installation:
pip install matdata2py
How to use this lib:
import matdata2py as mtp
To load the Matlab data file:
Variables_output = mtp.loadmatfile(file_Name, StructsExportLikeMatlab = True, ExportVar2PyEnv = False)
print(Variables_output.keys()) # with ExportVar2PyEnv = False the variables are as elements of the Variables_output dictionary.
With ExportVar2PyEnv = True, each variable is available separately as a Python variable with the same name it has in the MAT-file.
Flag descriptions
StructsExportLikeMatlab = True/False: structures are exported in dictionary format (False) or in a dot-based format similar to MATLAB (True)
ExportVar2PyEnv = True/False: export all variables in a single dictionary (False) or as separate individual variables into the Python environment (True)
SciPy loads these .mat files without trouble.
loadmat returns a dict whose values are already NumPy arrays, so get() simply retrieves the array by its variable name.
import matplotlib.pyplot as plt
import scipy.io
mat = scipy.io.loadmat('point05m_matrix.mat')
x = mat.get("matrix")
print(type(x))
print(len(x))
plt.imshow(x, extent=[0,60,0,55], aspect='auto')
plt.show()
To load and read .mat files in Python:
Install mat4py from the command line. On successful installation, pip reports: Successfully installed mat4py-0.5.0.
pip install mat4py
Import loadmat from mat4py.
Store the file's actual location in a variable.
Load the MAT-file into a Python data structure.
from mat4py import loadmat
boston = r"E:\Downloads\boston.mat"
data = loadmat(boston, meta=False)
This question already has answers here:
Converting .pkl file to .csv file
(2 answers)
Closed 1 year ago.
I've never used Python, and I've received some .pkl files containing tracking data. The data set consists of a training set with 7500 sequences and two separate sets of sequences for testing. The format of each sequence is as follows:
- Each sequence is a matrix (numpy 2D array) with 46 columns. Each row contains 23 pairs of (x,y) coordinates...and so on.
I've tried the reticulate package in R. For example, with the file in my working directory, running this code hasn't worked, and I don't know what else to do:
> data_1 = py_load_object(test_data_1.pkl, pickle = "pickle")
Error in py_resolve_dots(list(...)) : object 'test_data_1.pkl' not found
You are probably pretty close. I am not familiar with reticulate but if the files you have were serialized with the pickle module, you should be able to de-serialize with the same module.
import pickle
with open('test_data_1.pkl','rb') as f:
    data_1 = pickle.load(f)
You must give pickle.load() a file handle opened in binary mode with the built-in open. If you don't want to keep all the pickle files in the same working folder as your script, you can pass an absolute or relative path as a string. The documentation for open has more details. You can also use pathlib.Path objects for the file path if you want to get fancier.
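For the pathlib variant, something along these lines should do. load_pickle is a made-up helper name, and the usual caveats apply: only unpickle files from a source you trust, and NumPy has to be installed for pickled NumPy arrays to load.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Unpickle one object from `path` (a str or pathlib.Path)."""
    # Path.open("rb") is equivalent to the built-in open(path, "rb").
    with Path(path).open("rb") as f:
        return pickle.load(f)
```

With the files in a hypothetical data folder next to the script, the call would look like load_pickle(Path("data") / "test_data_1.pkl").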
This question already has answers here:
What is the preferred way of passing data between two applications on the same system?
(6 answers)
Closed 5 years ago.
For certain reasons, I must use the "win32com" module to export my DataFrame from another application (but this module seems to be available only in Python 2).
However, I want to do some calculations in Python 3 with this DataFrame (produced by Python 2).
How can I send the DataFrame from Python 2 to Python 3 in memory (that is, without writing the data to a file)?
Supplementary explanation 1:
My OS is Windows 10 64-bit (with both Python 2 and Python 3 installed).
In short, I want to know how to pass the data (produced by Python 2) to Python 3.
Supplementary explanation 2:
I have a Python script A that needs to run in Python 2.
Script A generates some data (maybe JSON, a DataFrame, ...).
I then want to pass this data to Python script B.
Script B must run in Python 3.
My OS is Windows 10 64-bit (with both Python 2 and 3 installed).
I am new to Python. I have tried writing the data to a file and having B.py read it. However, this I/O approach is too slow, so I want to pass the data in memory. How can I do that?
(My English is not very good, please bear with me.)
JSON is a convenient option.
import json
Read the docs for specifics and examples:
https://docs.python.org/2/library/json.html
https://docs.python.org/3/library/json.html
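One way to move the JSON without a file is a pipe between the two processes. The sketch below is for the Python 3 side (script B) and assumes that script A, run under Python 2, prints json.dumps(data) to standard output and nothing else. The function name read_json_from is made up for illustration.

```python
import json
import subprocess


def read_json_from(command):
    """Run `command` and parse whatever it prints to stdout as JSON."""
    # check=True raises CalledProcessError if the child exits with an error.
    result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    return json.loads(result.stdout.decode("utf-8"))
```

Script B would then call something like read_json_from([r"C:\Python27\python.exe", "A.py"]), where the interpreter path is only a guess at a typical install location. A DataFrame would first have to be turned into something JSON can hold on the Python 2 side, for example with DataFrame.to_dict(), and rebuilt on the Python 3 side.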