I have the following issue with matplotlib. I have a NumPy matrix and plot its first column with plt.plot(wp[:, 0]), which works flawlessly. The x-axis is then labelled with consecutive numbers (1, 2, 3, ..., 9 in this example).
Instead, I would like the x-axis to show the current value of the second column (0, 1, 2, 3, 4). (The size of wp varies, so I need a general solution.)
wp=[[x1,0],
[x2,1],
[x3,1],
[x4,2],
[x5,2],
[x6,2],
[x7,3],
[x8,3],
[x9,4]]
Edit: Sorry, I made a huge mistake, when I was lazy and created the example matrix. The x-values are all different.
Edit2: To be more precise: in this example, x1 should be at x-position 0 with the tick 0. Then x2 should be to the right of x1 and have the x-tick 1. x3 should be to the right of x2, with no x-tick displayed. x4 should be to the right of x3 with the x-tick 2, and so forth. So it should be plotted as plt.plot(wp[:, 0]) does, but the x-axis should show in which region the second column is 0, 1, 2, ...
import matplotlib.pyplot as plt
import numpy as np
# creating random values for x1,x2 and x3
x1 = 1
x2 = 2
x3 = 3
wp=[[x1,0],
[x2,1],
[x3,1],
[x1,2],
[x2,2],
[x3,2],
[x1,3],
[x2,3],
[x3,4]]
my_xticks = [x[1] for x in wp] # taking the second values from tuple
my_xticks = sorted(set(my_xticks)) # removing duplicates, in ascending order
# In [16]: my_xticks
# Out[16]: [0, 1, 2, 3, 4] # values of my_xticks after removing duplicates
x = [x[0] for x in wp]
y = x
plt.xticks(my_xticks)
plt.plot(x, y)
plt.show()
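The behaviour described in Edit2 (plot against the row index, but put a tick only where the second column changes) can be sketched as follows. This is a minimal sketch, assuming equal values in the second column sit in consecutive rows; the helper name first_occurrence_ticks and the sample x-values are made up for illustration.

```python
import numpy as np

def first_occurrence_ticks(labels):
    """Return the row indices where a new label value starts, and those labels."""
    labels = np.asarray(labels)
    # A row starts a new group when its label differs from the previous row's label.
    starts = np.flatnonzero(np.r_[True, labels[1:] != labels[:-1]])
    return starts, labels[starts]

# Hypothetical data: arbitrary x-values in column 0, group labels in column 1.
wp = np.array([[5.0, 0], [3.0, 1], [4.0, 1], [7.0, 2], [6.0, 2],
               [8.0, 2], [2.0, 3], [9.0, 3], [1.0, 4]])

positions, tick_labels = first_occurrence_ticks(wp[:, 1])
print(positions)    # [0 1 3 6 8]
print(tick_labels)  # [0. 1. 2. 3. 4.]
```

With these two arrays, plt.plot(wp[:, 0]) followed by plt.xticks(positions, tick_labels.astype(int)) should place one tick at the first row of each group and leave the other rows unlabelled.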
I'm processing joystick data. There are two time series, one for the joystick's X motion and another for its Y motion. The two data sets have different time stamps. In the end, I hope to use matplotlib to plot a parametric 2D graph of the joystick data (where time is implicit, and the X and Y motion make up the points on the graph). However, before this end goal, I have to "merge" the two time series. For convenience, I'm going to assume that joystick motion is linear between timestamps.
I've coded something that does this (see below), but it seems needlessly complex. I'm hoping to find a simpler way to do this linear interpolation, if possible.
import numpy as np
import matplotlib.pyplot as plt
# Example data
X = np.array([[0.98092103, 1013],
[1.01400101, 375],
[1.0561214, -8484],
[1.06982589, -17181],
[1.09453125, -16965]])
Y = np.array([[0.98092103, 534],
[1.00847602, 1690],
[1.0392499, -5327],
[1.06982589, -27921],
[1.10026598, -28915]])
data = []
# keep track of which index was used last
current_indices = [-1, -1]
# make ordered list of all timestamps between both data sets, no repeats
all_timestamps = sorted(set(X[:, 0]).union(set(Y[:, 0])))
for ts in all_timestamps:
    # for each dimension (X & Y), index where timestamp exists, if timestamp exists. Else None
    ts_indices = tuple(indx[0] if len(indx := np.where(Z[:, 0] == ts)[0]) > 0 else None
                       for Z in (X, Y))
    # Out of range timesteps assumed to be zero
    ts_vals = [0, 0]
    for variable_indx, (current_z_indx, Z) in enumerate(zip(ts_indices, (X, Y))):
        last_index_used = current_indices[variable_indx]
        if current_z_indx is not None:
            # If timestep is present, get value
            current_indices[variable_indx] = current_z_indx
            ts_vals[variable_indx] = Z[current_z_indx, 1]
        elif last_index_used not in (-1, len(Z[:, 0]) - 1):
            # If timestep within range of data, linearly interpolate
            t0, z0 = Z[last_index_used, :]
            t1, z1 = Z[last_index_used + 1, :]
            ts_vals[variable_indx] = z0 + (z1 - z0) * (ts - t0) / (t1 - t0)
    data.append([ts, *ts_vals])
merged_data = np.array(data)
plt.plot(merged_data[:,1],merged_data[:,2])
plt.show()
You are looking for np.interp to simplify the linear interpolation.
Following your example:
import numpy as np
import matplotlib.pyplot as plt
# Example data
X = np.array([[0.98092103, 1013],
[1.01400101, 375],
[1.0561214, -8484],
[1.06982589, -17181],
[1.09453125, -16965]])
Y = np.array([[0.98092103, 534],
[1.00847602, 1690],
[1.0392499, -5327],
[1.06982589, -27921],
[1.10026598, -28915]])
#extract all timestamps
all_timestamps = sorted(set(X[:, 0]).union(set(Y[:, 0])))
#linear interpolation
valuesX = np.interp(all_timestamps, X[:,0], X[:,1])
valuesY = np.interp(all_timestamps, Y[:,0], Y[:,1])
#plotting
plt.plot(valuesX, valuesY)
plt.show()
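One difference from the loop in the question is worth checking: for timestamps outside the known range, np.interp holds the first or last value by default, whereas the question's code assumes zero. The left and right parameters of np.interp control this. A small sketch with made-up numbers (t_known, v_known and t_query are illustrative, not the joystick data):

```python
import numpy as np

t_known = np.array([0.0, 1.0, 2.0])     # timestamps with measured values
v_known = np.array([10.0, 20.0, 40.0])  # measured values
t_query = np.array([-1.0, 0.5, 1.5, 3.0])  # includes out-of-range timestamps

# Default: out-of-range queries take the nearest edge value.
clamped = np.interp(t_query, t_known, v_known)
# With left/right set, out-of-range queries take the given fill value instead.
zeroed = np.interp(t_query, t_known, v_known, left=0.0, right=0.0)

print(clamped)  # [10. 15. 30. 40.]
print(zeroed)   # [ 0. 15. 30.  0.]
```

Note that np.interp expects the known timestamps to be increasing, which holds for the example data above.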
My experience with Python is pretty basic. I have written Python code to import data from an external file and perform a calculation. My result looks something like this (except much larger in reality).
1 1
1 1957
1 0.15
2 346
2 0.90
2 100
3 1920
3 100
3 40
What I want to do is plot these two columns as a single series, but then distinguish each data point according to a certain pattern. I know this sounds unnecessarily complicated, but it's something I need to do to help out the people who will use my code. Unfortunately, my Python skills fail me here. More specifically:
1. The first column has "1," "2," or "3." So first I want to make all the "1" data points circles (for example), all the "2" data points some other symbol, and likewise for the "3" data points.
2. Next. There are three rows for each distinct number. So for "1," the "0.15" in the second column is the average value, the "1957" is the maximum value, the "1" is the minimum value. I want to make the data point associated with each number's average value (the top row for each number) green (for example). I want the maximum and minimum values to have their own colors too.
So I will end up with a plot that shows one series only, but where each data point looks distinct. If anyone could please point me in the right direction, I would be very grateful. If I have not said this clearly, please let me know and I'll try again!
For different marker styles you currently need to create different plot instances (see this github issue). Using different colors can be done by passing an array as the color argument. So for example:
import matplotlib.pyplot as plt
import numpy as np
data = np.array([
[1, 0.15],
[1, 1957],
[1, 1],
[2, 346],
[2, 0.90],
[2, 100],
[3, 1920],
[3, 100],
[3, 40],
])
x, y = np.transpose(data)
symbols = ['o', 's', 'D']
colors = ['blue', 'orange', 'green']
for value, marker in zip(np.unique(x), symbols):
    mask = (x == value)
    plt.scatter(x[mask], y[mask], marker=marker, color=colors)
plt.show()
What I would do is to separate the data into three different columns so you have a few series. Then I'd use the plt.scatter with different markers to get the desired effect.
code
import matplotlib.pyplot as plt
import numpy as np
# Fixing random state for reproducibility
np.random.seed(19680801)
N = 100
r0 = 0.6
x = 0.9 * np.random.rand(N)
y = 0.9 * np.random.rand(N)
area = (20 * np.random.rand(N))**2 # 0 to 10 point radii
c = np.sqrt(area)
r = np.sqrt(x ** 2 + y ** 2)
area1 = np.ma.masked_where(r < r0, area)
area2 = np.ma.masked_where(r >= r0, area)
plt.scatter(x, y, s=area1, marker='^', c=c)
plt.scatter(x, y, s=area2, marker='o', c=c)
# Show the boundary between the regions:
theta = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
plt.show()
source: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
I have a data set of 2 1D arrays. My goal is to count the points in each section of a grid (with a size of my choosing).
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(8,7))
np.random.seed(5)
x = np.random.random(100)
y = np.random.random(100)
plt.plot(x,y,'bo')
plt.grid(True)
My Plot
I would like to be able to split each section into its own unique set of two 1D arrays or one 2D array.
import numpy as np
def split(arr, cond):
    return [arr[cond], arr[~cond]]
a = np.array([1,3,5,7,2,4,6,8])
print(split(a, a<5))
This will return a list of two arrays containing [1, 3, 2, 4] and [5, 7, 6, 8] (the original order within each part is preserved).
Try using this function based on the conditions you set (intervals of 0.2 it seems)
NOTE: to implement this correctly for your problem, you'll have to modify the split function seeing that you want to split the data into more than two sections. I'll leave that as an exercise for you to do :)
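One possible way to generalise the two-way split to several intervals is np.digitize, which returns the interval index of each value. The following is only a sketch: the name split_into_bins, the sample values and the edges at multiples of 0.2 are assumptions for illustration.

```python
import numpy as np

def split_into_bins(arr, edges):
    """Split arr into len(edges) - 1 arrays, one per interval [edges[i], edges[i+1])."""
    arr = np.asarray(arr)
    # np.digitize returns i such that edges[i-1] <= value < edges[i];
    # subtracting 1 gives the zero-based interval index.
    bin_index = np.digitize(arr, edges) - 1
    return [arr[bin_index == i] for i in range(len(edges) - 1)]

a = np.array([0.05, 0.95, 0.41, 0.2, 0.39, 0.6])
edges = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
parts = split_into_bins(a, edges)
for lower, part in zip(edges[:-1], parts):
    print(lower, part)
```

Values below the first edge or at or above the last edge fall outside every interval and are dropped by this sketch. For the 2D case, the same index can be computed for x and y separately and combined in a mask.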
This function takes in two 1D arrays and returns a 2D matrix, in which each element is the number of points in the grid section corresponding to your image:
import numpy as np
def count_points(arr1, arr2, bin_width):
    x = np.floor(arr1/bin_width).astype(int) # Bin number for each value
    y = np.floor(arr2/bin_width).astype(int) # Bin number for each value
    # Rows follow y and columns follow x, so the shape is (y bins, x bins)
    counts = np.zeros(shape=(max(y)+1, max(x)+1), dtype=int)
    for i in range(x.shape[0]):
        row = max(y) - y[i]
        col = x[i]
        counts[row, col] += 1
    return counts
Note that x and y don't line up with the column and row index, since the origin is at the bottom left in the plot but the "origin" (index [0,0]) of the matrix is at the top left. I rearranged the matrix so that the elements line up with what you see in the plot.
Example:
np.random.seed(0)
x = np.random.random(100)
y = np.random.random(100)
print(count_points(x, y, 0.2)) # 0.2 matches the default gridlines in matplotlib
# Output:
#[[8 4 5 4 0]
# [2 5 5 7 4]
# [7 1 3 8 3]
# [4 2 5 3 4]
# [4 4 3 1 4]]
These counts match the number of points in each grid cell of the plot.
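The same kind of count can also be obtained without an explicit loop using np.histogram2d. This is a sketch under the assumption that all points lie in [0, 1) and the cells are 0.2 wide; the variable names are illustrative, and the random data here differs from the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)
y = rng.random(100)

# Cell edges at 0.0, 0.2, ..., 1.0 for both axes.
edges = np.linspace(0.0, 1.0, 6)
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

# counts[i, j] holds x-bin i and y-bin j. Transposing puts y on the rows,
# and reversing the rows puts the largest y at the top, as in the plot.
grid = counts.T[::-1].astype(int)
print(grid)
```

np.histogram2d also returns the edges it used, which can help when the data range is not known in advance.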
I am having trouble parsing a text file that I created with another program. The text file looks something like this:
velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
...
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287
...
I want to grab the number in the same row as velocity (the header) and save it as the title of the plot of the subsequent data. Every other row apart from the header represents the x and y coordinates of a shooting ball.
So if I have five different headers, I would like to see five different lines on a single graph with a legend displaying the different velocities.
Here is my python code so far. I am close to what I want to get, but I am missing the first set of data (velocity = 4 m/s) and the colors on my legend don't match the line colors.
import matplotlib.pyplot as plt
xPoints = []
yPoints = []
fig, ax = plt.subplots()
with open('artilleryMotion.txt') as inf:
    for line in inf:
        column = line.split()
        if line.startswith("v"):
            velocity = column[1]
            ax.plot(xPoints, yPoints, label = '%s m/s' % velocity)
        else:
            xPoints.append(column[0])
            yPoints.append(column[1])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(ymin = 0)
ax.set_autoscaley_on(1)
I have been struggling with this for a while.
Edit_1: This is my output at the moment:
Artillery motion plot
Edit_2: I removed the indentation of the last lines of code. The color problem still occurs.
Edit_3: How would I go about saving the x and y points to a new array for each velocity? This may solve my issues.
Edit_4: Thanks to Charles Morris, I was able to create these plots. I just need to now determine if the initial upwards "arcing" motion by the ping pong ball for the higher velocities is representative of the physics or is a limitation of my code.
Artillery Motion Final
Edit: Ignore the old information, and see Solved solution below:
The following code works on an example text file, input.txt:
velocity 4
0 0
0.0800284750334461 0.0702333599787275
0.153911082737118 0.128537103048848
0.222539323234924 0.176328826156044
0.286621942300277 0.21464146333504
0.346732028739683 0.244229944930359
0.403339781262399 0.265638972071027
velocity 8
0 0
0.169153136373962 0.124121036173475
0.312016311613761 0.226778846267302
0.435889653693839 0.312371513797743
0.545354054604357 0.383832483710643
0.643486956562741 0.443203331839287
1) Import our text file
We use np.genfromtxt() for the import. Specifying dtype=float means that numbers are imported as floats, while strings (in this case 'velocity') are imported as NaN.
Source:
https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html
How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby
from numpy import nan
A = np.genfromtxt('input.txt',dtype=float)
>>>
array([[ nan, 4. ],
[ 0. , 0. ],
[ 0.08002848, 0.07023336],
[ 0.15391108, 0.1285371 ],
[ 0.22253932, 0.17632883],
[ 0.28662194, 0.21464146],
[ 0.34673203, 0.24422994],
[ 0.40333978, 0.26563897],
[ nan, 8. ],
[ 0. , 0. ],
[ 0.16915314, 0.12412104],
[ 0.31201631, 0.22677885],
[ 0.43588965, 0.31237151],
[ 0.54535405, 0.38383248],
[ 0.64348696, 0.44320333]])
2) Slice the imported array A
We can slice these arrays into separate X and Y arrays representing our X and Y values. Read up on array slicing in numpy here: https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
In this case, we take all values with index = 0 (X) and all values with index 1 (Y):
# x values
X = A[:,0]
# y values
Y = A[:,1]
>>> X = array([ nan, 0. , 0.08002848, 0.15391108, 0.22253932,
0.28662194, 0.34673203, 0.40333978, nan, 0. ,
0.16915314, 0.31201631, 0.43588965, 0.54535405, 0.64348696])
>>> Y = array([ 4. , 0. , 0.07023336, 0.1285371 , 0.17632883,
0.21464146, 0.24422994, 0.26563897, 8. , 0. ,
0.12412104, 0.22677885, 0.31237151, 0.38383248, 0.44320333])
3) Split the data for each velocity.
Here we want to separate our X and Y values into one group per velocity. Our X values are separated by nan and our Y values are separated by the velocities 4, 8, 16, ...
Thus, for x, we split at nan. The nan values are a result of genfromtxt() parsing the string 'velocity' as a float.
Sources:
numpy: split 1D array of chunks separated by nans into a list of the chunks
Split array at value in numpy
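For y, we split our array at the numbers 4, 8, 16, etc. To do this, we exclude values that have zero remainder when divided by 4 (using Python's % operator), while keeping 0 itself, since 0 is a valid height. Note that this assumes no actual y value is a nonzero multiple of 4.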
Sources:
Split array at value in numpy
How to check if a float value is a whole number
Split NumPy array according to values in the array (a condition)
Find the division remainder of a number
How do I use Python's itertools.groupby()?
XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]
>>>
XX = [[0.0,
0.080028475033446095,
0.15391108273711801,
0.22253932323492401,
0.28662194230027699,
0.34673202873968301,
0.403339781262399],
[0.0,
0.16915313637396201,
0.31201631161376098,
0.43588965369383897,
0.54535405460435704,
0.64348695656274102]]
>>> YY =
[[0.0,
0.070233359978727497,
0.12853710304884799,
0.17632882615604401,
0.21464146333504,
0.24422994493035899,
0.26563897207102699],
[0.0,
0.124121036173475,
0.22677884626730199,
0.31237151379774297,
0.38383248371064299,
0.44320333183928701]]
4) Extract labels
Using a similar technique as above, we keep only the values equal to our velocities 4, 8, 16, etc. In this case, we accept only those numbers that have zero remainder when divided by 4 and are not 0. We then convert each one to a string and append ' m/s'.
Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]
>>> Ylabels = [[4.0], [8.0]]
>>> Velocities = ['4.0 m/s', '8.0 m/s']
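Since the header rows are exactly the rows whose first column is nan, steps 3 and 4 can also be done without the remainder test, by splitting the whole array at those rows. The following is a sketch on a shortened, made-up version of the array A; the names header_rows, velocities and trajectories are illustrative.

```python
import numpy as np

# Shortened stand-in for the array returned by np.genfromtxt above.
A = np.array([[np.nan, 4.0],
              [0.0, 0.0],
              [0.08, 0.07],
              [0.15, 0.13],
              [np.nan, 8.0],
              [0.0, 0.0],
              [0.17, 0.12]])

# Header rows are the rows whose first column could not be parsed as a float.
header_rows = np.flatnonzero(np.isnan(A[:, 0]))
velocities = A[header_rows, 1]

# Split before each header row; the first piece (rows before the first
# header) is empty here and is discarded.
chunks = np.split(A, header_rows)[1:]
# Drop the header row itself from each chunk, leaving only (x, y) rows.
trajectories = [chunk[1:] for chunk in chunks]

for v, traj in zip(velocities, trajectories):
    print(v, traj[:, 0], traj[:, 1])
```

This variant does not depend on the velocities being multiples of 4, nor on the y values avoiding such multiples.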
5) Plot
Plot values by index for each velocity.
fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(bottom = 0)  # 'bottom' replaces the older 'ymin' keyword
ax.set_autoscaley_on(1)
Code Altogether:
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby
from numpy import nan
A = np.genfromtxt('input.txt',dtype=float)
X = A[:,0]
Y = A[:,1]
Ylabels = [list(v) for k,v in groupby(Y,lambda x: x % 4 == 0 and x != 0) if k]
Velocities = [str(i[0]) + ' m/s' for i in Ylabels]
XX = [list(v) for k,v in groupby(X,np.isfinite) if k]
YY = [list(v) for k,v in groupby(Y,lambda x: x % 4 != 0 or x == 0) if k]
fig, ax = plt.subplots()
for i in range(0,len(XX)):
    plt.plot(XX[i],YY[i],label = Velocities[i])
ax.legend()
plt.title("Ping-Pong Ball Artillery Motion")
plt.xlabel("distance")
plt.ylabel("height")
plt.ylim(bottom = 0)  # 'bottom' replaces the older 'ymin' keyword
ax.set_autoscaley_on(1)
Old Answer:
The first time you iterate over all lines in the file, your xPoints and yPoints arrays are empty. Therefore, when you try and plot values for v = 4, you are plotting an empty array - hence your missing line.
You need to populate the arrays first, and then plot them. At the moment, you are plotting the values for v = 4 in the line labelled v = 8, and for v = 8, the values for v = 16 and so on.
Ignore:
For the array population, try the following:
xPoints = []
yPoints = []
with open('artilleryMotion.txt') as inf:
    # initialize placeholder velocity variable
    velocity = 0
    for line in inf:
        column = line.split()
        if line.startswith("v"):
            velocity = column[1]
        else:
            xPoints.append({velocity: column[0]})
            yPoints.append({velocity: column[1]})
In the above, you save the data as a list of dictionaries (separate for x and y points), where the key is equal to the velocity that has been read in most recently, and the values are the x and y coordinates.
As a new velocity is read in, the placeholder variable velocity is updated and so the x and y values can be identified according the key that they have.
This allows you to separate your plots by dictionary key (look up D.items(), or D.iteritems() in Python 2), so you can plot each set of points individually.
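A variation on this idea is to keep one dictionary keyed by velocity, with one pair of lists per header, instead of one dictionary per point. The sketch below assumes the file layout shown in the question; read_trajectories and the inline sample text are hypothetical, and the coordinates are converted to float so that matplotlib treats them as numbers rather than strings.

```python
import io

# Inline stand-in for the text file, so the sketch runs on its own.
sample = io.StringIO("""velocity 4
0 0
0.08 0.07
velocity 8
0 0
0.17 0.12
0.31 0.23
""")

def read_trajectories(lines):
    """Map each velocity header to a pair (x_values, y_values)."""
    trajectories = {}
    current = None
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        if columns[0] == "velocity":
            # Start a new, empty pair of lists for this header.
            current = columns[1]
            trajectories[current] = ([], [])
        elif current is not None:
            xs, ys = trajectories[current]
            xs.append(float(columns[0]))
            ys.append(float(columns[1]))
    return trajectories

trajectories = read_trajectories(sample)
for velocity, (xs, ys) in trajectories.items():
    print(velocity, xs, ys)
```

Each entry can then be drawn with ax.plot(xs, ys, label='%s m/s' % velocity) inside that final loop, which gives one line and one legend entry per velocity.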
After running a multiple linear regression using numpy.linalg.lstsq I get 4 arrays, as described in the documentation. However, it is not clear to me how to get the intercept value. Does anyone know this? I'm new to statistical analysis.
Here is my model:
X1 = np.array(a)
X2 = np.array(b)
X3 = np.array(c)
X4 = np.array(d)
X5 = np.array(e)
X6 = np.array(f)
X1l = np.log(X1)
X2l = np.log(X2)
X3l = np.log(X3)
X6l = np.log(X6)
Y = np.array(g)
A = np.column_stack([X1l, X2l, X3l, X4, X5, X6l, np.ones(len(a), float)])
result = np.linalg.lstsq(A, Y)
This is a sample of what my model is generating:
(array([ 654.12744154, -623.28893569, 276.50269246, 11.52493817,
49.92528734, -375.43282832, 3852.95023087]), array([ 4.80339071e+11]),
7, array([ 1060.38693842, 494.69470547, 243.14700033, 164.97697748,
58.58072929, 19.30593045, 13.35948642]))
I believe the intercept is the second array, still I'm not sure about that, as its value is just too high.
The intercept is the coefficient that corresponds to the column of ones, which in this case is:
result[0][6]
To make it clearer to see, consider your regression, which is something like:
y = c1*x1 + c2*x2 + c3*x3 + c4*x4 + m
written in matrix form as:
[[y1],     [[x1_1, x2_1, x3_1, x4_1, 1],     [[c1],
 [y2],      [x1_2, x2_2, x3_2, x4_2, 1],      [c2],
 [y3],  =   [x1_3, x2_3, x3_3, x4_3, 1],  *   [c3],
 ...        ...                               [c4],
 [yn]]      [x1_n, x2_n, x3_n, x4_n, 1]]      [m]]
or:
Y = A * C
where A is the so-called "coefficient" matrix and C is the vector containing the solution of your regression. Note that m corresponds to the column of ones.
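This can be checked on synthetic data where the intercept is known in advance. The sketch below uses made-up predictors x1 and x2 and a noise-free response with intercept 5; it also unpacks the four values returned by np.linalg.lstsq, of which the second is the sum of squared residuals rather than the intercept.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.random(50)
x2 = rng.random(50)
# Synthetic response with known slopes 2 and -3 and known intercept 5.
y = 2.0 * x1 - 3.0 * x2 + 5.0

# The column of ones goes last, so the intercept is the last coefficient.
A = np.column_stack([x1, x2, np.ones(len(x1))])
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)

slopes, intercept = coeffs[:-1], coeffs[-1]
print(slopes)     # approximately [ 2. -3.]
print(intercept)  # approximately 5.0
```

The position of the intercept in the solution vector follows the position of the ones column in A, so moving that column to the front would make it coeffs[0] instead.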