Transform pandas dataframe into numpy array for fast plotting - python

I am writing a script to plot some data.
I am using Python 3.7.1 on Windows and have the following code to plot:
import pandas as pd
import matplotlib.pyplot as plt
files=['path']
for i in range(len(files)):
    data = pd.read_csv(files[i], sep=';', skiprows=17, header=None, engine='python', decimal=",")
    # build the output folder from the path of the file being read
    c=files[i].split('\\')
    path='\\'.join(c[:-1])
    x= data.loc[:,0].values
    y= data.loc[:,1].values
    c,data=None,None
    plt.ioff() # turns off interactive mode
    plt.plot(x,y)
    plt.xlabel('x')
    plt.ylabel('y')
    plt.savefig(path+'\\ title123') # saves the image
I want to transform the dataframe from pandas into a numpy array dtype float64.
Currently, the code I have transforms the data into an object type. I cannot plot this because the code is taking too long to run.
An example of what I am trying to achieve is:
import numpy as np
import matplotlib.pyplot as plt
x=np.linspace(0,10,1000000)
y=np.sin(x)
plt.plot(x,y)
I will leave a link to the file.
https://drive.google.com/open?id=1kir-cGlk3bZSLmvD_tfnbGUaTYzvcW-3
Can anyone help me?
Kind Regards!

I just noticed that it was a problem with ',' and '.'. Sort of a math "language" conflict.
However, the for loop runs extremely slow when more than one file is loaded.
Kind regards to all!
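A minimal sketch of the fix described above, assuming semicolon-separated files with a decimal comma. The sample data, the helper names `load_xy` and `plot_to_file`, and the output location are hypothetical. It shows that `decimal=","` yields float64 columns. It also gives each file its own figure and closes it after saving, so lines do not pile up on one Axes when several files are processed; that accumulation is one plausible cause of the slow loop, not a confirmed one.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: figures are only written to disk
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for a real file: ';' separator, ',' decimal.
SAMPLE = "0,0;1,5\n0,5;2,25\n1,0;3,0\n"


def load_xy(source):
    """Read the first two columns as float64 NumPy arrays."""
    data = pd.read_csv(source, sep=";", header=None, decimal=",")
    x = data[0].to_numpy(dtype="float64")
    y = data[1].to_numpy(dtype="float64")
    return x, y


def plot_to_file(x, y, out_path):
    """Plot into a fresh figure, save it, and close it."""
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so nothing accumulates between files


x, y = load_xy(io.StringIO(SAMPLE))
out_path = os.path.join(tempfile.mkdtemp(), "title123.png")
plot_to_file(x, y, out_path)
print(x.dtype, y.dtype)
```

For real files, `load_xy` would be called once per path inside the loop (adding `skiprows=17` if the files have that header block), followed by `plot_to_file`.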

Related

plt.plot TypeError: unhashable type: 'numpy.ndarray'

I want to implement a Butterworth filter with Python in a Jupyter notebook. Python is new to me and I don't know why I get an error. I searched here but didn't find a solution.
The data come from a CSV file called Samples.csv.
The data in Samples.csv look like this:
998,4778415
1009,209592
1006,619094
1001,785406
993,9426543
990,1408991
992,736118
995,8127334
1002,381664
1006,094429
1000,634799
999,3287747
1002,318812
999,3287747
1004,427698
1008,516733
1007,964781
1002,680906
1000,14449
994,257009
The column is called Euclidian Norm. The data range from 0 to 1679.286158 and there are 1838 rows.
I wrote this code, based on a tutorial.
from scipy.signal import filtfilt
from scipy import stats
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
def plot():
    data=pd.read_csv('Samples.csv',sep=";")
    sensor_data=data[['Euclidian Norm']]
    sensor_data=np.array(sensor_data)
    time=np.linspace(0,1679.286158,1838)
    plt.plot(time, sensor_data)
    plt.show()
plot()
I get the error TypeError: unhashable type: 'numpy.ndarray', and the line plt.plot(time, sensor_data) is highlighted in yellow.
I don't know what is wrong, because I don't see a type error in the code. Does anyone know what could be wrong?
The problem is that you are using , as the decimal separator in your CSV file but you haven't told Pandas that you are doing that.
Try replacing the line
data=pd.read_csv('Samples.csv',sep=";")
with
data=pd.read_csv('Samples.csv',sep=";", decimal=",")
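To see the effect, here is a small sketch using an in-memory sample. The three values are copied from the question; reading from a string instead of Samples.csv is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

# A few rows in the same format as the question's file.
SAMPLE = "Euclidian Norm\n998,4778415\n1009,209592\n1006,619094\n"

# Without decimal=",": the values are kept as text, not numbers.
as_text = pd.read_csv(io.StringIO(SAMPLE), sep=";")

# With decimal=",": the values are parsed as floats.
as_float = pd.read_csv(io.StringIO(SAMPLE), sep=";", decimal=",")

print(as_text["Euclidian Norm"].dtype)
print(as_float["Euclidian Norm"].dtype)

# A 1-D float array, suitable for plt.plot(time, sensor_data).
sensor_data = as_float["Euclidian Norm"].to_numpy()
```

Selecting the column with single brackets, as here, gives a 1-D array; the double brackets in the question give a 2-D array with one column, which Matplotlib also accepts once the values are numeric.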

Saving a plot from multiple subplots

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
fig,ax=plt.subplots(2,2,figsize=(15,10))
x=np.linspace(-3,3)
ax[0,0].plot(x,foo-function)
Now I need a way to save each of the 4 plots into one file, like this:
plt1=topleft_plot.saveNOTfigBUTplot('quadfunction.pdf')
How can I do this?
Using the answer here: https://stackoverflow.com/a/4328608/16299117
We can do the following to save a SINGLE subplot from the overall figure:
import matplotlib.pyplot as plt
import numpy as np
fig,ax=plt.subplots(2,2,figsize=(15,10))
x=np.linspace(-3,3)
ax[0,0].plot(x,x**2) # This is just to make an actual plot.
# I am not using jupyter notebook, so I use this to show it instead of %inline
plt.show()
# Getting only the axes specified by ax[0,0]
extent = ax[0,0].get_window_extent().transformed(fig.dpi_scale_trans.inverted())
# Saving it to a pdf file.
fig.savefig('ax2_figure.pdf', bbox_inches=extent.expanded(1.1, 1.2))
EDIT: I believe I may have misunderstood what you want. If you want to save EACH plot individually, say as 4 different pages in a pdf, you can do the following adapted from this answer: https://stackoverflow.com/a/29435953/16299117
This will save each subplot from the figure as a different page in a single pdf.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
fig,ax=plt.subplots(2,2,figsize=(15,10))
x=np.linspace(-3,3)
ax[0,0].plot(x,x**2) # This is just to make an actual plot.
with PdfPages('foo.pdf') as pdf:
    # row/col index the subplot grid (renamed so they do not shadow the data array x)
    for row in range(ax.shape[0]):
        for col in range(ax.shape[1]):
            extent = ax[row, col].get_window_extent().transformed(fig.dpi_scale_trans.inverted())
            pdf.savefig(bbox_inches=extent.expanded(1.1, 1.2))
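If separate files are wanted instead of pages in one PDF, the same bounding-box approach can be applied once per subplot. This is a minimal sketch; the `subplot_<row>_<col>.pdf` file names and the temporary output directory are assumptions, not part of the answers above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(2, 2, figsize=(15, 10))
x = np.linspace(-3, 3)
ax[0, 0].plot(x, x**2)  # placeholder data so one subplot has content
fig.canvas.draw()       # make sure axes positions are computed before measuring

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for row in range(ax.shape[0]):
    for col in range(ax.shape[1]):
        # Bounding box of this subplot, converted from pixels to inches.
        extent = ax[row, col].get_window_extent().transformed(
            fig.dpi_scale_trans.inverted()
        )
        name = os.path.join(out_dir, f"subplot_{row}_{col}.pdf")
        # Expand slightly so tick labels are not cut off.
        fig.savefig(name, bbox_inches=extent.expanded(1.1, 1.2))
        saved.append(name)

print(len(saved), "files written to", out_dir)
```

The expansion factors (1.1, 1.2) are taken from the answer above and may need adjusting if axis labels or titles are clipped.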

pandas DataFrame.plot() method

I'm new to data science and trying out some Python libraries. I know it sounds a bit silly, but I'm confused by the code below, which I found in the pandas docs. I'm assuming that 'ts' is a pandas object, but how exactly can a pandas object use a matplotlib method here? What's the connection between pandas and matplotlib? Can someone explain that to me? Thank you.
In [3]: ts = pd.Series(np.random.randn(1000),index=pd.date_range('1/1/2000', periods=1000))
In [4]: ts = ts.cumsum()
In [5]: ts.plot()
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x7fa17967caf0>
Matplotlib is a library that makes it easy to generate plots in Python. Pandas is a library for working with labeled, tabular data (Series and DataFrame objects) in Python.
According to the Pandas docs:
The plot method on Series and DataFrame is just a simple wrapper around plt.plot()
So the only connection between Pandas and Matplotlib is that Pandas uses Matplotlib to generate the plot for you.
If you want to see that plot, you have to add a couple of extra lines:
import matplotlib.pyplot as plt
plt.show()
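The relationship can be checked directly: the object returned by `ts.plot()` is a Matplotlib Axes, so ordinary Matplotlib methods work on it. A short sketch, where the title and label text are made up for illustration and the Agg backend is chosen only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: run without a GUI; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ts = pd.Series(
    np.random.randn(1000),
    index=pd.date_range("1/1/2000", periods=1000),
).cumsum()

# pandas draws the series with Matplotlib and hands back the Axes it used.
ax = ts.plot()

# Because ax is a Matplotlib object, it can be customized as usual.
ax.set_title("Cumulative sum")
ax.set_ylabel("value")

print(type(ax))
print(isinstance(ax, matplotlib.axes.Axes))
```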

Create heatmap of matrix using Seaborn matplotlib in Python

I have exported a large matrix from Matlab to a data.dat file, which is tab delimited. I am importing this data into an IPython script and using seaborn to create a heatmap of the matrix with the following script:
import numpy as np
import seaborn as sns
import matplotlib.pylab as plt
uniform_data = np.loadtxt("data.dat", delimiter="\t")
ax = sns.heatmap(uniform_data, linewidth=0.0)
plt.show()
This code runs fine and outputs a correct heatmap. For small matrices, the output shows clear variation between the matrix elements.
However, as the matrix grows, the result appears to have a uniform colour and does not seem to contain any extractable information, which suggests that the data need to be normalised.
How can I address this?

Plotting a graph takes a long time in Python

To plot 100,000 to 500,000 data points from a text file, I use the following code.
The problem is:
If I copy and paste the data points into plotting software, the plot appears in just 30 seconds, but with the following code it may take 1 hour or more in Python.
import numpy as np
import matplotlib.pyplot as plt
from math import *
cmin=502.8571071527562
c,O=np.genfromtxt('textfile.txt',unpack=True)
for i in range(len(O)):
    q=exp(-0.5*(c[i]-cmin))
    plt.plot(O[i], q, 'bo')
plt.show()
What is the problem? How could I solve it?
I appreciate your help.
Some general rules:
Use numpy, not math.
Avoid for-loops.
Do not create unnecessary artists.
Here you want to create a single artist with all points, instead of 500000 single artists with one point each.
import numpy as np
import matplotlib.pyplot as plt
cmin=502.8571071527562
c,O=np.genfromtxt('textfile.txt',unpack=True)
q=np.exp(-0.5*(c-cmin))
plt.plot(O, q, 'bo')
plt.show()
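The difference in artist count can be checked without the original data file. The sketch below substitutes synthetic random values for textfile.txt (an assumption; the real columns are not available) and confirms that the vectorized version puts a single line artist on the Axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the example runs without a display
import matplotlib.pyplot as plt
import numpy as np

cmin = 502.8571071527562

# Synthetic stand-in for the two columns of textfile.txt.
rng = np.random.default_rng(0)
n_points = 100_000
c = cmin + rng.random(n_points)
O = rng.random(n_points)

# One vectorized NumPy call replaces the per-point math.exp loop.
q = np.exp(-0.5 * (c - cmin))

fig, ax = plt.subplots()
ax.plot(O, q, "bo")  # a single call creates a single Line2D holding all points

print("points:", q.size, "artists:", len(ax.lines))
```

Calling `plt.plot` once per point would instead leave one Line2D per point on the Axes, which is what makes the looped version slow to build and draw.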
