How to iterate distance calculation for different vehicles from coordinates - python

I am new to coding and need help building a Time Space Diagram (TSD) from a CSV file that I exported from a VISSIM simulation.
A typical TSD and my CSV were attached as images.
I want to use the following columns. "VEHICLE:SIMSEC" is the simulation time, which should be the X axis of the TSD. "NO" is the vehicle number; there are 185 different vehicles, and each one should be drawn as its own line on the TSD. "COORDFRONTX" and "COORDFRONTY" are the x and y coordinates in the simulation, which should give the position on the Y axis of the TSD.
I have tried the following code but did not get the result I want.
import pandas as pd
import matplotlib.pyplot as mp
# take data
data = pd.read_csv(r"C:\Users\hk385\Desktop\VISSIM_DATA_CSV.csv")
df = pd.DataFrame(data, columns=["VEHICLE:SIMSEC", "NO", "DISTTRAVTOT"])
# plot the dataframe
df.plot(x="NO", y=["DISTTRAVTOT"], kind="scatter")
# show the scatter plot
mp.show()
The plot was uninterpretable because there were too many dots. Could you help me or guide me toward getting a TSD from the CSV I have?
As suggested by mitoRibo, here are the first 20 lines of the CSV (header included):
VEHICLE:SIMSEC,NO,LANE\LINK\NO,LANE\INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,8.42
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,93.0
7.0,1,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,101.49
7.1,1,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,109.99
7.2,1,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,118.49
7.3,1,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,127.0
7.4,1,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,135.51
7.5,1,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,144.03
7.6,1,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,152.56
7.7,1,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,161.09
Thank you.

You can use groupby to iterate over the vehicles, adding each one to the plot as its own line. I changed your example data so that it contains 2 different vehicles.
import pandas as pd
import io
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO("""
VEHICLE:SIMSEC,NO,LANE_LINK_NO,LANE_INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,0
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,90
6.0,2,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,0
6.1,2,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,30
6.2,2,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,40
6.3,2,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,50
6.4,2,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,60
6.5,2,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,70
6.6,2,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,80
6.7,2,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,90
"""),sep=',')
fig = plt.figure()
# Iterate through each vehicle, adding it to the plot
for vehicle_no, vehicle_df in df.groupby('NO'):
    plt.plot(vehicle_df['VEHICLE:SIMSEC'], vehicle_df['DISTTRAVTOT'], label=vehicle_no)
plt.legend()  # comment this out if you don't want a legend
plt.show()
plt.close()
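The title asks about distance computed from coordinates, while the loop above plots the DISTTRAVTOT column directly. As a minimal sketch, assuming the goal is the cumulative path length per vehicle from COORDFRONTX and COORDFRONTY, the distance could be derived with groupby, diff and cumsum. The column names STEP and DIST_FROM_XY are made up for illustration, and the inline CSV is a trimmed copy of the sample rows.

```python
import io

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Trimmed copy of the sample rows (two vehicles, as in the answer above).
csv_text = """VEHICLE:SIMSEC,NO,COORDFRONTX,COORDFRONTY
5.9,1,-1.259,-3.518
6.0,1,0.932,-4.86
6.1,1,3.125,-6.203
6.0,2,22.93,-18.334
6.1,2,25.138,-19.687
"""
df = pd.read_csv(io.StringIO(csv_text))

# Rows must be in time order within each vehicle before differencing.
df = df.sort_values(["NO", "VEHICLE:SIMSEC"])

# Step between consecutive positions of the same vehicle.
# diff() is NaN on each vehicle's first row, so that step is set to 0.
dx = df.groupby("NO")["COORDFRONTX"].diff()
dy = df.groupby("NO")["COORDFRONTY"].diff()
df["STEP"] = np.hypot(dx, dy).fillna(0.0)           # hypothetical column name

# Running total of the steps, restarted for every vehicle.
df["DIST_FROM_XY"] = df.groupby("NO")["STEP"].cumsum()  # hypothetical column name

# One line per vehicle: time on the X axis, distance on the Y axis.
fig, ax = plt.subplots()
for vehicle_no, vehicle_df in df.groupby("NO"):
    ax.plot(vehicle_df["VEHICLE:SIMSEC"], vehicle_df["DIST_FROM_XY"], label=vehicle_no)
ax.set_xlabel("VEHICLE:SIMSEC")
ax.set_ylabel("distance from coordinates")
# plt.show()  # uncomment to display the figure interactively
```

In the sample rows, consecutive coordinate steps come to about 2.57 while DISTTRAVTOT grows by about 8.4 per step, so the two may be in different units. That is worth checking before mixing them on one axis.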

If you don't mind, could you please try this?
mp.scatter(x="NO", y="DISTTRAVTOT", data=df)
If it still does not work, please attach your data so that I can test it on my side.

Related

Barplot with significant differences and interactions in python?

I started to use Python 6 months ago, so maybe my question is a naive one. I would like to visualize my data and ANOVA statistics. It is common to do this using a barplot with added lines indicating significant differences and interactions. How do you make a plot like this using Python?
Here is a simple dataframe, with 3 columns (A,B and the p_values already calculated with a t-test)
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
ar = np.array([ [565.0, 81.0, 1.630947e-02],
[1006.0, 311.0, 1.222740e-27],
[2929.0, 1292.0, 5.559912e-12],
[3365.0, 1979.0, 2.507474e-22],
[2260.0, 1117.0, 1.540305e-01]])
df = pd.DataFrame(ar,columns = ['A', 'B', 'p_value'])
ax = plt.subplot()
# I calculate the percentage
(df.iloc[:,0:2]/df.iloc[:,0:2].sum()*100).plot.bar(ax=ax)
for container, p_val in zip(ax.containers,df['p_value']):
    labels = [f"{round(v,1)}%" if (p_val > 0.05) else f"(**)\n{round(v,1)}%" for v in container.datavalues]
    ax.bar_label(container,labels=labels, fontsize=10,padding=8)
plt.show()
Initially I just wanted to add a "**" each time a significant difference is observed between the 2 columns A & B, but the code above is not really working.
Now I would prefer to add lines indicating significant differences and interactions between the A & B columns, but I have no idea how to make that happen.
Regards
JYK
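One possible way to draw such markers, offered as a sketch rather than a definitive answer: the loop in the question pairs ax.containers (one container per column, so two of them) with the p-values (one per row, so five), which may be why the labels do not line up. The version below instead walks the A and B bars of each row together and draws a small bracket with "**" above the pair when the row's p-value is below 0.05. The threshold, the bracket height (pad) and the marker text are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [[565.0, 81.0, 1.630947e-02],
     [1006.0, 311.0, 1.222740e-27],
     [2929.0, 1292.0, 5.559912e-12],
     [3365.0, 1979.0, 2.507474e-22],
     [2260.0, 1117.0, 1.540305e-01]],
    columns=["A", "B", "p_value"],
)

# Percentage of each column total, as in the question.
pct = df[["A", "B"]] / df[["A", "B"]].sum() * 100

fig, ax = plt.subplots()
pct.plot.bar(ax=ax)

# pandas creates one bar container per plotted column: first A, then B.
bars_a, bars_b = ax.containers
pad = 0.02 * pct.to_numpy().max()  # assumed bracket height, 2% of the tallest bar

for bar_a, bar_b, p_val in zip(bars_a, bars_b, df["p_value"]):
    if p_val >= 0.05:
        continue  # not significant at the assumed 0.05 level: no marker
    # Horizontal centres of the two bars in this row.
    x_a = bar_a.get_x() + bar_a.get_width() / 2
    x_b = bar_b.get_x() + bar_b.get_width() / 2
    top = max(bar_a.get_height(), bar_b.get_height()) + pad
    # Bracket: up from A, across to B, down to B.
    ax.plot([x_a, x_a, x_b, x_b], [top, top + pad, top + pad, top],
            color="black", linewidth=1)
    ax.text((x_a + x_b) / 2, top + pad, "**", ha="center", va="bottom")

ax.margins(y=0.15)  # leave headroom for the brackets
# plt.show()  # uncomment to display the figure interactively
```

Reading bar positions from the patches avoids hard-coding the bar width that pandas happens to use.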

How can I calculate the time lag between two similar time series?

I'm trying to compute/visualize the time lag between 2 time series (I want to know the time lag between the humidity progression of outside and inside a room).
Each data point of my series was taken hourly. Plotting the 2 series together, I can clearly see a shift between them (sorry for hiding the axes in the plot).
Here is part of my time series data, packed into 2 arrays:
inside_humidity = [11.77961297, 11.59755268, 12.28761522, 11.88797553, 11.78122077, 11.5694668,
11.70421932, 11.78122077, 11.74272005, 11.78122077, 11.69438733, 11.54126933,
11.28460592, 11.05624965, 10.9611012, 11.07527934, 11.25417308, 11.56040908,
11.6657186, 11.51171572, 11.49246536, 11.78594142, 11.22968373, 11.26840678,
11.26840678, 11.29447992, 11.25553344, 11.19711371, 11.17764047, 11.11922075,
11.04132778, 10.86996123, 10.67410607, 10.63493504, 10.74922916, 10.74922916,
10.6294765, 10.61011497, 10.59075345, 10.80373021, 11.07479154, 11.15223764,
11.19711371, 11.17764047, 11.15816723, 11.22250051, 11.22250051, 11.202915,
11.18332948, 11.16374396, 11.14415845, 11.12457293, 11.10498742, 11.14926578,
11.16896413, 11.16896413, 11.14926578, 10.8307902, 10.51742195, 10.28187137,
10.12608544, 9.98977276, 9.62267727, 9.31289289, 8.96438546, 8.77077022,
8.69332413, 8.51907042, 8.30609366, 8.38353975, 8.4513867, 8.47085994,
8.50980642, 8.52927966, 8.50980642, 8.55887037, 8.51969934, 8.48052831,
8.30425867, 8.2177078, 7.98402891, 7.92560918, 7.89950166, 7.83489682,
7.75789537, 7.5984808, 7.28426807, 7.39778913, 7.71943214, 8.01149931,
8.18276652, 8.23009255, 8.16215295, 7.93822471, 8.00350215, 7.93843482,
7.85072729, 7.49778011, 7.31782649, 7.29862668, 7.60162032, 8.29665484,
8.58797834, 8.50011383, 8.86757784, 8.76600556, 8.60491125, 8.4222628,
8.24923231, 8.14470714, 8.17351638, 8.52530093, 8.72220151, 9.26745883,
9.1580007, 8.61762692, 8.22187405, 8.43693644, 8.32414835, 8.32463974,
8.46833012, 8.55865487, 8.72647164, 9.04112806, 9.35578449, 9.59465974,
10.47339785, 11.07218093, 10.54091351, 10.56138918, 10.46099958, 10.38129168,
10.16434831, 10.10612612, 10.009246, 10.53502351, 10.8307902, 11.13420052,
11.64337309, 11.18958511, 10.49630791, 10.60856932, 10.37029108, 9.86281478,
9.64699826, 9.95341012, 10.24329812, 10.6848196, 11.47604231, 11.30505352,
10.72194974, 10.30058448, 10.05022037, 10.06318411, 9.90118897, 9.68530059,
9.47790657, 9.48585784, 9.61639418, 9.86244265, 10.29009361, 10.28297229,
10.32073088, 10.65389513, 11.09656351, 11.20188562, 11.24124169, 10.40503955,
9.74632512, 9.07606098, 8.85145589, 9.37080152, 9.65082743, 10.0707891,
10.68776091, 11.25879751, 11.0416348, 10.89558456, 10.7908258, 10.66539685,
10.7297755, 10.77571398, 10.9268264, 11.16021492, 11.60961709, 11.43827534,
11.96155427, 12.16116437, 12.80412266, 12.52540805, 11.96752965, 11.58099292]
outside_humidity = [10.17449206, 10.4823292, 11.06818167, 10.82768699, 11.27582592, 11.4196233,
10.99393027, 11.4122507, 11.18192837, 10.87247831, 10.68664321, 10.37949651,
9.57155882, 10.86611665, 11.62547196, 11.32004266, 11.75537602, 11.51292063,
11.03107569, 10.7297755, 10.4345622, 10.61271497, 9.49271162, 10.15594248,
9.99053828, 9.80915398, 9.6452438, 10.06900573, 11.18075689, 11.8289847,
11.83334752, 11.27480708, 11.14370467, 10.88149985, 10.73930381, 10.7236597,
10.26210496, 11.01260226, 11.05428228, 11.58321342, 12.70523808, 12.5181118,
11.90023799, 11.67756426, 11.28859471, 10.86878222, 9.73984486, 10.18253902,
9.80915398, 10.50980784, 11.38673459, 11.22751685, 10.94171823, 10.56484228,
10.38220753, 10.05388847, 9.96147203, 9.90698862, 9.7732203, 9.85262125,
8.7412938, 8.88281702, 8.07919545, 8.02883587, 8.32341424, 8.07357711,
7.27302616, 6.73660684, 6.66722819, 7.29408637, 7.00046542, 6.46322019,
6.07150988, 6.00207234, 5.8818402, 6.82443881, 7.20212882, 7.52167696,
7.88857771, 8.351627, 8.36547023, 8.24802846, 8.18520693, 7.92420816,
7.64926024, 7.87944972, 7.82118727, 8.02091833, 7.93071882, 7.75789457,
7.5416447, 6.94430133, 6.65907535, 6.67454591, 7.25493614, 7.76939457,
7.55357806, 6.61479472, 7.17641357, 7.24664082, 8.62732387, 8.66913548,
8.70925667, 9.0477017, 8.24558224, 8.4330502, 8.44366397, 8.17995798,
8.1875752, 9.33296518, 9.66567041, 9.88581085, 8.95449382, 8.3587624,
9.20584448, 8.90605388, 8.87494884, 9.12694892, 8.35055177, 7.91879933,
7.78867253, 8.22800878, 9.03685287, 12.49630018, 11.11819755, 10.98869374,
10.65897176, 10.36444573, 10.052609, 10.87627021, 10.07379564, 10.02233847,
9.62022856, 11.21575473, 10.85483543, 11.67324627, 11.89234248, 11.10068132,
10.06942096, 8.50405894, 8.13168561, 8.83616476, 8.35675085, 8.33616802,
8.35675085, 9.02209801, 9.5530404, 9.44738836, 10.89645958, 11.44771721,
11.79943601, 10.7765335, 11.1453622, 10.74874776, 10.55195175, 10.34494483,
9.83813522, 11.26931785, 11.20641798, 10.51555027, 10.90808954, 11.80923545,
11.68300879, 11.60313809, 7.95163365, 7.77213815, 7.54209557, 7.30603673,
7.17842173, 8.25899805, 8.56494995, 10.44245578, 11.08542758, 11.74129079,
11.67979686, 12.94362214, 11.96285343, 11.8289847, 11.01388413, 10.6793698,
11.20662595, 11.97684701, 12.46383177, 11.34178655, 12.12477078, 12.48698059,
12.89325064, 12.07470295, 12.6777319, 10.91689448, 10.7676326, 10.66710434]
I know cross correlation is the right term to use, but after a while I still don't get the idea of using scipy.signal.correlate and numpy.correlate, because all I got is an array full of NaNs. So clearly I need some more knowledge in this area.
What I expect to achieve is probably a plot like those in the answer section of the thread "How to make a correlation plot with a certain lag of two time series", where I can see at how many hours the time lag is most likely.
Thank you a lot in advance!
With the given data, you can use the numpy and matplotlib modules to fit and plot the relationship between the two series.
For example, you can do something like this:
import numpy as np
from matplotlib import pyplot as plt
x = np.array(inside_humidity)
y = np.array(outside_humidity)
fig = plt.figure()
# fit a curve of your choice
a, b = np.polyfit(inside_humidity, outside_humidity, 1)
y_fit = a * x + b
# scatter plot, and fitted plot (best fit used)
plt.scatter(inside_humidity, outside_humidity)
plt.plot(x, y_fit)
plt.show()
This gives a scatter plot of outside_humidity against inside_humidity with the fitted line drawn over it.
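The linear fit above relates the two series but does not estimate a lag. A minimal sketch of the cross-correlation approach mentioned in the question follows, using numpy.correlate on mean-removed series. The helper name estimate_lag is made up for illustration, and the synthetic test signal stands in for the real humidity arrays. Any NaN in the input makes every correlation value NaN, which may explain the all-NaN result described in the question; here NaNs are replaced by the series mean, which is only one of several reasonable choices.

```python
import numpy as np


def estimate_lag(a, v):
    """Return (best_lag, lags, corr) for two equally sampled 1-D series.

    A positive best_lag means `a` is delayed relative to `v` by that many
    samples, i.e. a[n + best_lag] lines up with v[n].
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Remove the mean so the peak reflects shape, not offset.
    # NaNs become 0 after centring, i.e. they are treated as the mean (assumption).
    a = np.nan_to_num(a - np.nanmean(a))
    v = np.nan_to_num(v - np.nanmean(v))
    corr = np.correlate(a, v, mode="full")
    # In "full" mode the output covers lags -(len(v)-1) .. len(a)-1.
    lags = np.arange(-(len(v) - 1), len(a))
    return lags[np.argmax(corr)], lags, corr


# Synthetic check: `inside` is `outside` delayed by 3 samples.
rng = np.random.default_rng(0)
base = rng.normal(size=203)
outside = base[3:]
inside = base[:-3]

best_lag, lags, corr = estimate_lag(inside, outside)
print(best_lag)
```

With the real data, calling estimate_lag(inside_humidity, outside_humidity) and plotting corr against lags would give a lag plot of the kind the question refers to; since the samples are hourly, the lag is in hours.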

Plotting data with matplotlib takes forever & plot crashes with higher number of samples

I have an issue with plotting x, y data. x is the time series and y is the value as y(x). Data3.txt is a text file that contains all the data, with no headers, as a matrix with 20 columns and 65534 rows.
Here is the code
import csv
import numpy as np
import matplotlib.pyplot as plt
dates = []
with open('Data3.txt') as csvDataFile:
    csvReader = csv.reader(csvDataFile,quoting=csv.QUOTE_NONNUMERIC)
    for row in csvReader:
        dates.append(row)
np.array(map(float, dates))
time=[]
value=[]
samples=8000
for row in dates:
    time.append(row[0])
    value.append(row[1])
print(len(time))
print(len(time[:samples]))
plt.plot(time[:samples], value[:samples])
plt.ylim(0,40)
plt.xlim(0,1200)
plt.show()
The plot is shown until I set samples to 7000 - see attached Figure_1. As soon as I set samples to 8000 it takes a lot longer to plot & the outcome is Figure_2.
print(len(time))
gives 65543
print(len(time[:samples]))
gives 8000
I am really confused by that. Can anyone explain where this error might come from? Any hint concerning a smarter way of plotting is much appreciated as well. My next step would be to assign each column to its own list and make the plots.
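The cause of the slowdown is not clear from the post alone. As one possible "smarter way of plotting", and only a sketch: loading the file into a single numeric array with numpy.loadtxt makes the types explicit and gives every column by slicing, without building lists row by row. The comma delimiter is an assumption (it is the default of the csv.reader call above), and the three-line inline sample stands in for Data3.txt.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

# Stand-in for Data3.txt: numbers only, no header (layout assumed).
sample = io.StringIO("0.0,1.5,9.0\n0.1,2.5,9.0\n0.2,3.5,9.0\n")

# For the real file this would be np.loadtxt('Data3.txt', delimiter=',').
data = np.loadtxt(sample, delimiter=",")  # 2-D float array: rows x columns

time = data[:, 0]   # first column
value = data[:, 1]  # second column; data[:, k] gives any other column

samples = 8000      # slicing past the end is safe, it just returns all rows
fig, ax = plt.subplots()
ax.plot(time[:samples], value[:samples])
# plt.show()  # uncomment to display the figure interactively
```

Because the array is guaranteed to be floating point, matplotlib treats both axes as numeric.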

Problem with matplotlib.pyplot with matplotlib.pyplot.scatter in the argument s

My name is Luis Francisco Gomez and I am in the course Intermediate Python > 1 Matplotlib > Sizes that belongs to the Data Scientist with Python in DataCamp. I am reproducing the exercises of the course where in this part you have to make a scatter plot in which the size of the points are equivalent to the population of the countries. I try to reproduce the results of DataCamp with this code:
# load subpackage
import matplotlib.pyplot as plt
## load other libraries
import pandas as pd
import numpy as np
## import data
gapminder = pd.read_csv("https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv")
gdp_cap = gapminder["gdp_cap"].tolist()
life_exp = gapminder["life_exp"].tolist()
# create an np array that contains the population
pop = gapminder["population"].tolist()
pop_np = np.array(pop)
plt.scatter(gdp_cap, life_exp, s = pop_np*2)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
# Display the plot
plt.show()
However, the plot I get does not match the one DataCamp shows as the expected result.
I don't understand what the problem is with the argument s in plt.scatter.
You need to scale your s:
plt.scatter(gdp_cap, life_exp, s = pop_np*2/1000000)
Per the docs, s is the marker size in points**2, so raw population values produce markers far larger than the figure.
This is because your sizes are too large, so scale them down. Also, there's no need to create all the intermediate arrays:
plt.scatter(gapminder.gdp_cap,
gapminder.life_exp,
s=gapminder.population/1e6)
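I think you should use
plt.scatter(gdp_cap, life_exp, s = np.array(gdp_cap)*2)
or maybe reduce or scale pop_np. Note that gdp_cap is a plain list, so it has to be converted to an array before multiplying; gdp_cap*2 would repeat the list instead of doubling its values.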
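The answers above all divide by a fixed constant. As a hedged alternative sketch, the sizes can be rescaled relative to the largest value so that the biggest marker has a chosen area. The helper scale_sizes, its max_area default and the three population figures are made up for illustration and are not from the DataCamp exercise.

```python
import numpy as np


def scale_sizes(values, max_area=1000.0):
    """Rescale positive values so the largest one maps to max_area (points**2)."""
    values = np.asarray(values, dtype=float)
    return values / values.max() * max_area


# Illustrative populations only, not taken from the gapminder file.
population = np.array([1.0e6, 5.0e7, 1.3e9])
sizes = scale_sizes(population)
print(sizes)
```

The result can be passed straight to plt.scatter(..., s=sizes). Because s is an area, marker areas stay proportional to population under this scaling.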

Choosing the correct values in excel in Python

General Overview:
I am creating a graph of a large data set; however, I have created a sample text document so that it is easier to work through the problems.
The data is from an Excel document that will be saved as a CSV.
Problem:
I am able to load the data and graph it (see below). However, the way I pull the data will not work for all of the different Excel sheets I am going to pull from.
More Detail of problem:
The Y-values (labeled 'Value' and 'Value1') are pulled from the Excel sheet using the numbers 26 and 31 (see picture and code).
This is a problem because the values 26 and 31 will not be the same for each graph.
Let's take a look so that this makes more sense.
Here is my code:
import pandas as pd
import matplotlib.pyplot as plt
pd.read_csv('CSV_GM_NB_Test.csv').T.to_csv('GM_NB_Transpose_Test.csv', header=False)
df = pd.read_csv('GM_NB_Transpose_Test.csv', skiprows = 2)
DID = df['SN']
Value = df['26']
Value1 = df['31']
x= (DID[16:25])
y= (Value[16:25])
y1= (Value1[16:25])
"""
print(x,y)
print(x,y1)
"""
plt.plot(x.astype(int), y.astype(int))
plt.plot(x.astype(int), y1.astype(int))
plt.show()
The output plot and a screenshot of the data set were attached as images.
Below in the comments you will find the 0bin link to my data set, because I do not have enough reputation to post two links.
In the data set screenshot:
X- DID = Blue
Y-Value = Green
Y-Value1 = Grey
Troublesome Values = Red
The problem, again, is that the data for the Y-values are pulled from rows 10 and 11 by way of the values 26 and 31 under SN.
Let me know if more information is needed.
Thank you
Not sure why you are creating the transposed CSV version. It is also possible to work directly from your original data. For example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('CSV_GM_NB_Test.csv', skiprows=8)
data = df.iloc[:, 19:].T
data.columns = df['SN']
data.plot()
plt.show()
You can use pandas.DataFrame.iloc to get a sliced version of your data using integer positions (the older .ix indexer has been removed from pandas). The [:, 19:] selects every row and columns 19 onwards. The final .T transposes the result. You can then use the values of the SN column as column headings by assigning them to .columns.
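To make the slicing and transposing steps concrete, here is a small sketch on a made-up frame. The column names meta, c1, c2, c3 and all values are hypothetical; only the SN column name and the labels 26 and 31 come from the question. Because rows are selected by position and then labeled from SN, nothing in the code depends on the labels being 26 and 31.

```python
import pandas as pd

# Hypothetical stand-in for the CSV: one row per series, identified by SN.
df = pd.DataFrame({
    "SN": [26, 31],
    "meta": ["x", "y"],   # leading non-data column(s) to skip
    "c1": [1, 4],
    "c2": [2, 5],
    "c3": [3, 6],
})

# Keep every row, take columns from position 2 onwards, then transpose
# so that each original row becomes a column.
data = df.iloc[:, 2:].T

# Label the new columns with whatever values SN holds in this file.
data.columns = df["SN"]

print(data)
```

Calling data.plot() on the result draws one line per SN value, whatever those values are in a given sheet.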
