I have code that plots multiple plots using matplotlib. My code is given below.
index = LFRO_tdms["Measurement Config"].as_dataframe()["Frequencies"]
for vdd in set_vdds:
    for DUT in unique_DUTs:
        temp_df = df[(df["Serial"] == str(DUT)) & (df["VDD (V)"] == vdd)]
        temp_df["Serial ID - Temp(C)"] = temp_df["Serial"] + " - " + temp_df["Target Temp (°C)"].astype(dtype=str)
        df_LFRO_data_to_plot = df_LFRO[temp_df["#Magnitude"].to_numpy(dtype=str)]
        df_LFRO_data_to_plot.index = index
        df_LFRO_data_to_plot.columns = temp_df["Serial ID - Temp(C)"]
        df_LFRO_data_to_plot.plot(logx=True, colormap="jet")
        plt.title("Unit: " + DUT + " Vdd: " + str(vdd))
After running this code, it outputs 12 or 13 plots, as shown below.
I need to implement the same using plotly. Below is my code, but it outputs the plots in a different way:
fig = go.Figure()
index = LFRO_tdms["Measurement Config"].as_dataframe()["Frequencies"]
for vdd in set_vdds:
    for DUT in unique_DUTs:
        temp_df = df[(df["Serial"] == str(DUT)) & (df["VDD (V)"] == vdd)]
        temp_df["Serial ID - Temp(C)"] = temp_df["Serial"] + " - " + temp_df["Target Temp (°C)"].astype(dtype=str)
        df_LFRO_data_to_plot = df_LFRO[temp_df["#Magnitude"].to_numpy(dtype=str)]
        df_LFRO_data_to_plot.index = index
        df_LFRO_data_to_plot.columns = temp_df["Serial ID - Temp(C)"]
        # df_LFRO_data_to_plot.plot(logx=True, colormap="jet")
        # plt.title("Unit: "+ DUT + " Vdd: " + str(vdd))
        fig.add_traces(go.Scatter(x=df_LFRO_data_to_plot.index, y=df_LFRO_data_to_plot.columns, mode='lines', name=("Unit: " + DUT + " Vdd: " + str(vdd))))
fig.show()
I need the plots to come out as shown in the first image. May I know what mistake I am making?
The dataframe df looks like this:
Test Station Position Serial Timestamp ETC Temp (°C) ETC Pressure (kPa) ETC Humidity (%RH) Ref Mic Temp (°C) Site Temp (°C) Target Temp (°C) VDD (V) LFRO (Hz) #Magnitude
0 1 1 1 07LX-1 2022-11-17 23:05:51.591926 23.151848 99.334515 3.564379 14.349645 -30.041135 -35.0 1.60 9.221194 0
1 1 1 2 07LX-2 2022-11-17 23:05:51.591926 23.151848 99.334515 3.564379 14.349645 -30.257592 -35.0 1.60 8.995556 1
2 1 1 3 07LX-3 2022-11-17 23:05:51.591926 23.151848 99.334515 3.564379 14.349645 -30.511629 -35.0 1.60 9.452866 2
3 1 1 4 07LX-4 2022-11-17 23:05:51.591926 23.151848 99.334515 3.564379 14.349645 -29.863173 -35.0 1.60 9.299079 3
4 1 1 1 07LX-1 2022-11-17 23:09:41.373825 22.475499 99.338778 3.574306 12.311989 -28.114924 -35.0 1.66 7.390171 4
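For reference, here is one possible sketch of how the plotly loop could mirror the matplotlib one, reusing the variables built in the question's loop. The two differences are a fresh figure per (DUT, vdd) pair and one trace per column, with y given the column's values; passing df_LFRO_data_to_plot.columns to y plots the label strings, not the data.
import plotly.graph_objects as go

for vdd in set_vdds:
    for DUT in unique_DUTs:
        # ... build df_LFRO_data_to_plot exactly as in the matplotlib version ...
        fig = go.Figure()
        for col in df_LFRO_data_to_plot.columns:
            fig.add_trace(go.Scatter(x=df_LFRO_data_to_plot.index,
                                     y=df_LFRO_data_to_plot[col],
                                     mode='lines', name=str(col)))
        fig.update_layout(title="Unit: " + DUT + " Vdd: " + str(vdd))
        fig.update_xaxes(type="log")  # counterpart of logx=True
        fig.show()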
I'm trying to calculate daily returns using the time weighted rate of return formula:
(Ending Value - (Beginning Value + Net Additions)) / (Beginning Value + Net Additions)
My DF looks like:
Account # Date Balance Net Additions
1 9/1/2022 100 0
1 9/2/2022 115 10
1 9/3/2022 117 0
2 9/1/2022 50 0
2 9/2/2022 52 0
2 9/3/2022 40 -15
It should look like:
Account # Date Balance Net Additions Daily TWRR
1 9/1/2022 100 0
1 9/2/2022 115 10 0.04545
1 9/3/2022 117 0 0.01739
2 9/1/2022 50 0
2 9/2/2022 52 0 0.04
2 9/3/2022 40 -15 0.08108
After calculating the daily returns for each account, I want to link all the returns throughout the month to get the monthly return:
((1 + return) * (1 + return)) - 1
The final result should look like:
Account # Monthly Return
1 0.063636
2 0.12432
Through research (and trial and error) I was able to get the output I am looking for, but as a new Python user I'm sure there is an easier/better way to accomplish this.
DF["Numerator"] = DF.groupby("Account #")[Balance].diff() - DF["Net Additions"]
DF["Denominator"] = ((DF["Numerator"] + DF["Net Additions"] - DF["Balance"]) * -1) + DF["Net Additions"]
DF["Daily Returns"] = (DF["Numerator"] / DF["Denominator"]) + 1
DF = DF.groupby("Account #")["Daily Returns"].prod() - 1
Any help is appreciated!
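For what it's worth, a minimal sketch of a more compact version of the same calculation, assuming the DataFrame layout shown above and rows already sorted by date within each account (the shifted balance plus net additions is exactly the denominator of the TWRR formula):
import pandas as pd

# beginning value for each day = previous day's balance, plus net additions
prev = DF.groupby("Account #")["Balance"].shift() + DF["Net Additions"]
DF["Daily TWRR"] = (DF["Balance"] - prev) / prev
# link the daily returns per account; prod() skips each account's first-day NaN
monthly = (DF["Daily TWRR"] + 1).groupby(DF["Account #"]).prod() - 1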
I have written code to perform some data cleaning to get the final columns and values from a tab-separated file.
import pandas as pd
import matplotlib.image as image
import numpy as np
import tkinter as tk
import matplotlib.ticker as ticker
from tkinter import filedialog
import matplotlib.pyplot as plt
root = tk.Tk()
root.withdraw()
root.call('wm', 'attributes', '.', '-topmost', True)
files1 = filedialog.askopenfilename(multiple=True)
files = root.tk.splitlist(files1)
List = list(files)
%gui tk
for i, file in enumerate(List, 1):
    d = pd.read_csv(file, sep=None, engine='python')
    h = d.drop(d.index[19:])
    transpose = h.T
    header = transpose.iloc[0]
    df = transpose[1:]
    df.columns = header
    df.columns = df.columns.str.strip()
    all_columns = list(df)
    df[all_columns] = df[all_columns].astype(str)
    k = df.drop(columns=['Op:', 'Comment:', 'Mod Type:', 'PN', 'Irradiance:', 'Irr Correct:', 'Lamp Voltage:', 'Corrected To:', 'MCCC:', 'Rseries:', 'Rshunt:'], axis=1)
    k.head()
I want to run this code on multiple files, do the same processing, and concatenate all the results into one data frame.
For example, if I select 20 files, the new data frame should have one header line and all 20 results below it, in increasing order of the values in the column ['Module Temp:'].
It would be great if someone could provide a solution to this problem.
Please find the link to the sample data: https://drive.google.com/drive/folders/1sL2-CwCGeGm0-fvcpzMVzgFnYzN3wzVb?usp=sharing
The following code shows how to parse the files and extract the data. It doesn't show the tkinter GUI component. files will represent your selected files.
Assumptions:
The first 92 rows of the files are always the measurement parameters
Rows from 93 are the measurements.
The 'Module Temp' for each file is different
The lists will be sorted based on the sort order of mod_temp, so the data will be in order in the DataFrame.
The list sorting uses the accepted answer to Sorting list based on values from another list?
import pandas as pd
from pathlib import Path
# set path to files
path_ = Path('e:/PythonProjects/stack_overflow/data/so_data/2020-11-16')
# select the correct files
files = path_.glob('*.ivc')
# create lists for metrics
measurement_params = list()
mod_temp = list()
measurements = list()
# iterate through the files
for f in files:
    # get the first 92 rows with the measurement parameters
    mp = pd.read_csv(f, sep='\t', nrows=91, index_col=0)
    # remove the whitespace and : from the end of the index names
    mp.index = mp.index.str.replace(':', '').str.strip().str.replace('\\s+', '_', regex=True)
    # get the column header
    col = mp.columns[0]
    # get the module temp
    mt = mp.loc['Module_Temp', col]
    # add Module_Temp to mod_temp
    mod_temp.append(float(mt))
    # get the measurements
    m = pd.read_csv(f, sep='\t', skiprows=92, nrows=3512)
    # remove the whitespace and : from the end of the column names
    m.columns = m.columns.str.replace(':', '').str.strip()
    # add Module_Temp column
    m['mod_temp'] = mt
    # store the measurement parameters
    measurement_params.append(mp.T)
    # store the measurements
    measurements.append(m)
# sort lists based on mod_temp sort order
measurement_params = [x for _, x in sorted(zip(mod_temp, measurement_params))]
measurements = [x for _, x in sorted(zip(mod_temp, measurements))]
# create a dataframe for the measurement parameters
df_mp = pd.concat(measurement_params)
# create a dataframe for the measurements
df_m = pd.concat(measurements).reset_index(drop=True)
df_mp
Title: Comment Op ID Mod_Type PN Date Time Irradiance IrrCorr Irr_Correct Lamp_Voltage Module_Temp Corrected_To MCCC Voc Isc Rseries Rshunt Pmax Vpm Ipm Fill_Factor Active_Eff Aperture_Eff Segment_Area Segs_in_Ser Segs_in_Par Panel_Area Vload Ivld Pvld Frequency SweepDelay SweepLength SweepSlope SweepDir MCCC2 MCCC3 MCCC4 LampI IntV IntV2 IntV3 IntV4 LoadV PulseWidth1 PulseWidth2 PulseWidth3 PulseWidth4 TRef1 TRef2 TRef3 TRef4 MCMode Irradiance2 IrrCorr2 Voc2 Isc2 Pmax2 Vpm2 Ipm2 Fill_Factor2 Active_Eff2 ApertureEff2 LoadV2 PulseWidth12 PulseWidth22 Irradiance3 IrrCorr3 Voc3 Isc3 Pmax3 Vpm3 Ipm3 Fill_Factor3 Active_Eff3 ApertureEff3 LoadV3 PulseWidth13 PulseWidth23 RefCellID RefCellTemp RefCellIrrMM RefCelIscRaw RefCellIsc VTempCoeff ITempCoeff PTempCoeff MismatchCorr Serial_No Soft_Ver
Nease 345W N345M72 STC Admin MCIND2021-058 ModuleType1 NaN 10-09-2020 19:12:52 100.007 100 Ref Cell 2400 25.2787 25 1.3669 46.4379 9.13215 0.43411 294.467 331.924 38.3403 8.65732 0.78269 1.89434 1.7106 243.36 72 1 19404 0 0 0 218000 10 100 0.025 0 1 1.155 1.155 20.4736 6.87023 6.8645 6 6 6.76 107.683 109.977 0 0 27.2224 0 0 0 False -1.#INF 70 0 0 0 0 0 0 0 0 5 107.683 109.977 -1.#INF 40 0 0 0 0 0 0 0 0 5 107.683 109.977 WPVS mono C-Si Ref Cell 25.9834 1001.86 0.15142 0.15135 -0.31 0.05 -0.4 0.9985 S91-00052 5.5.1
Solarium SGE24P330 STC Admin MCIND_2021_0074 ModuleType1 NaN 17-09-2020 15:06:12 99.3671 100 Ref Cell 2400 25.3380 25 1.3669 45.2903 8.87987 0.48667 216.763 311.031 36.9665 8.41388 0.77338 1.77510 1.60292 243.36 72 1 19404 0 0 0 218000 10 100 0.025 0 1 1.155 1.155 20.405 6.82362 6.8212 6 6 6.6 107.660 109.977 0 0 25.9418 0 0 0 False -1.#INF 70 0 0 0 0 0 0 0 0 4.943 107.660 109.977 -1.#INF 40 0 0 0 0 0 0 0 0 4.943 107.660 109.977 WPVS mono C-Si Ref Cell 25.3315 998.370 0.15085 0.15082 -0.31 0.05 -0.4 0.9985 S91-00052 5.5.1
Nease 345W N345M72 STC Admin MCIND2021-058 ModuleType1 NaN 10-09-2020 19:11:32 100.010 100 Ref Cell 2400 25.3557 25 1.3669 46.4381 9.11368 0.41608 299.758 331.418 38.3876 8.63345 0.78308 1.89144 1.70798 243.36 72 1 19404 0 0 0 218000 10 100 0.025 0 1 1.155 1.155 20.3820 6.87018 6.8645 6 6 6.76 107.683 109.977 0 0 27.2535 0 0 0 False -1.#INF 70 0 0 0 0 0 0 0 0 5 107.683 109.977 -1.#INF 40 0 0 0 0 0 0 0 0 5 107.683 109.977 WPVS mono C-Si Ref Cell 25.9614 1003.80 0.15171 0.15164 -0.31 0.05 -0.4 0.9985 S91-00052 5.5.1
Nease 345W N345M72 STC Admin MCIND2021-058 ModuleType1 NaN 10-09-2020 19:14:09 99.9925 100 Ref Cell 2400 25.4279 25 1.3669 46.4445 9.14115 0.43428 291.524 332.156 38.2767 8.67776 0.78236 1.89566 1.71179 243.36 72 1 19404 0 0 0 218000 10 100 0.025 0 1 1.155 1.155 20.5044 6.87042 6.8645 6 6 6.76 107.660 109.977 0 0 27.1989 0 0 0 False -1.#INF 70 0 0 0 0 0 0 0 0 5 107.660 109.977 -1.#INF 40 0 0 0 0 0 0 0 0 5 107.660 109.977 WPVS mono C-Si Ref Cell 26.0274 1000.93 0.15128 0.15121 -0.31 0.05 -0.4 0.9985 S91-00052 5.5.1
df_m.head()
Voltage Current mod_temp
0 -1.193405 9.202885 25.2787
1 -1.196560 9.202489 25.2787
2 -1.193403 9.201693 25.2787
3 -1.196558 9.201298 25.2787
4 -1.199711 9.200106 25.2787
df_m.tail()
Voltage Current mod_temp
14043 46.30869 0.315269 25.4279
14044 46.31411 0.302567 25.4279
14045 46.31949 0.289468 25.4279
14046 46.32181 0.277163 25.4279
14047 46.33039 0.265255 25.4279
Plot
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 8))
sns.scatterplot(x='Current', y='Voltage', data=df_m, hue='mod_temp', s=10)
plt.show()
Note
After doing this, I was having trouble plotting the data because the columns were not float type. However, an error occurred when trying to set the type. Looking back at the data, after row 92, there are multiple headers throughout the two columns.
Row 93: Voltage: Current:
Row 3631: Ref Cell: Lamp I:
Row 7169: Voltage2: Current2:
Row 11971: Ref Cell2: Lamp I2:
Row 16773: Voltage3: Current3:
Row 21575: Ref Cell3: Lamp I3:
Row 26377: Raw Voltage: Raw Current :
Row 29915: WPVS Voltage: WPVS Current:
I went back and used the nrows parameter when creating m, so only the first set of headers and associated measurements are extracted from the file.
I recommend writing a script using the csv module to read each file and create a new file beginning at each blank row; this will give the files consistent types of measurements (see the sketch below).
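A rough sketch of that splitting idea, assuming blank rows separate the measurement sections; the input filename and the output naming scheme are hypothetical:
import csv
from pathlib import Path

src = Path('data.ivc')  # hypothetical input file
with src.open(newline='') as f:
    section, part = [], 0
    for row in csv.reader(f, delimiter='\t'):
        if any(cell.strip() for cell in row):
            section.append(row)      # still inside the current section
        elif section:                # a blank row closes the section
            with open(f'{src.stem}_part{part}.tsv', 'w', newline='') as out:
                csv.writer(out, delimiter='\t').writerows(section)
            section, part = [], part + 1
if section:  # write any trailing section without a final blank row
    with open(f'{src.stem}_part{part}.tsv', 'w', newline='') as out:
        csv.writer(out, delimiter='\t').writerows(section)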
This should be a new question, if needed.
There are various ways to do it. You can append one dataframe to another (basically stack one on top of the other), and you can do it in the loop. Here is an example; I use fake dfs, but you will use your own:
import pandas as pd
import numpy as np
combined = None
for _ in range(5):
    # stub df creation -- you will use your real code here
    df = pd.DataFrame(columns=['Module Temp', 'A', 'B'], data=np.random.random((5, 3)))
    if combined is None:
        # initialize with the first one
        combined = df.copy()
    else:
        # add the next one
        combined = combined.append(df, sort=False, ignore_index=True)
combined.sort_values('Module Temp', inplace=True)
Here combined will have all the dfs, sorted by 'Module Temp'
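Note that DataFrame.append was removed in pandas 2.0; collecting the frames in a list and concatenating once is the equivalent modern pattern (and avoids repeated copying). A sketch under the same stub setup:
import pandas as pd
import numpy as np

frames = []
for _ in range(5):
    # stub df creation, as above
    frames.append(pd.DataFrame(columns=['Module Temp', 'A', 'B'],
                               data=np.random.random((5, 3))))
combined = pd.concat(frames, ignore_index=True, sort=False)
combined.sort_values('Module Temp', inplace=True)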
I'm trying to calculate a weighted percentage in a Django query.
This is an example of what my data looks like:
id start_date agency_id area_id housetype_id no_of_changed price_change_percentage total
6716 2017-08-26 11 1 1 16 -0.09 35
6717 2017-08-26 11 1 3 44 -0.11 73
6718 2017-08-26 11 1 4 7 -0.1 12
6719 2017-08-26 11 1 5 0 0 4
6720 2017-08-26 11 1 6 0 0 1
6721 2017-08-26 21 1 1 0 0 1
6722 2017-08-26 34 1 1 0 0 1
6723 2017-08-26 35 1 1 0 0 1
6724 2017-08-26 38 1 1 0 0 1
and this is my current code:
from django.db.models import F, FloatField, ExpressionWrapper
from app.models import PriceChange

def weighted_percentage(area_id, date_range, agency_id, housetype):
    data = PriceChange.objects.filter(area_id=area_id,
                                      start_date__range=date_range,
                                      agency_id=agency_id,
                                      )
    if housetype:
        data = data.filter(housetype=housetype) \
            .values('start_date') \
            .annotate(price_change_total=ExpressionWrapper((F('price_change_percentage') * F('no_of_changed')) / F('total'), output_field=FloatField())) \
            .order_by('start_date')
    else:
        # what to do?
        pass
    x = [x['start_date'] for x in data]
    y = [y['price_change_total'] for y in data]
    return x, y
I figured out how to do the calculation when housetype is defined and I only need data from one row. I can't figure out how to do it when I need to calculate over multiple rows with the same start_date. I don't want a value for each row but one value per start_date.
As an example (two rows with same start_date, area_id, agency_id but different housetype_ids):
no_of_changed price_change_percentage total
16 -0.09 35
44 -0.11 73
The calculation in pseudocode is:
((no_of_changed[0] * price_change_percentage[0]) + (no_of_changed[1] * price_change_percentage[1])) / (total[0] + total[1]) = price_change_total
((16 * -0.09) + (44 * -0.11)) / (35 + 73) = -0.0581481
I'm using Django 1.11 and Python 3.6.
You need to wrap the expression in a Sum expression.
Add the following import:
from django.db.models import Sum
Then add the following query
else:
    data = data.values('start_date') \
        .annotate(
            price_change_total=ExpressionWrapper(
                Sum(F('price_change_percentage') * F('no_of_changed')) / Sum(F('total')),
                output_field=FloatField()
            )
        ) \
        .order_by('start_date')
What is happening here is that when you use an aggregation expression such as Sum inside an annotate() call, it is translated into a GROUP BY query in the database. All columns listed in the preceding values() clause are used to build the GROUP BY clause.
See this blog post for further explanation and a breakdown of the resulting SQL query.
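As a quick usage check (the argument values here are hypothetical), the else branch then yields one aggregated value per start_date:
# one weighted price_change_total per start_date in the range
x, y = weighted_percentage(area_id=1,
                           date_range=('2017-08-01', '2017-08-31'),
                           agency_id=11,
                           housetype=None)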
I am trying to plot a number of bar charts with matplotlib, with exactly 26 timestamps/slots on the x-axis and two integers on the y-axis. For most data sets this scales fine, but in some cases matplotlib lets the bars overlap:
[Images: left chart with overlapping bars not aligned to the xticks, right chart OK]
So instead of being given enough space, the bars overlap, although my width is set to 0.1 and my data sets have 26 values, which I checked.
My code to plot these charts is as follows:
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.dates as matplotdates

# Plot something
rows = len(data_dict) // 2 + 1
fig = plt.figure(figsize=(15, 5*rows))
gs1 = gridspec.GridSpec(rows, 2)
grid_x = 0
grid_y = 0
for dataset_name in data_dict:
    message1_list = []
    message2_list = []
    ts_list = []
    slot_list = []
    for slot, counts in data_dict[dataset_name].items():
        slot_list.append(slot)
        message1_list.append(counts["Message1"])
        message2_list.append(counts["Message2"])
        ts_list.append(counts["TS"])
    ax = fig.add_subplot(gs1[grid_y, grid_x])
    ax.set_title("Activity: " + dataset_name, fontsize=24)
    ax.set_xlabel("Timestamps", fontsize=14)
    ax.set_ylabel("Number of messages", fontsize=14)
    ax.xaxis_date()
    hfmt = matplotdates.DateFormatter('%d.%m,%H:%M')
    ax.xaxis.set_major_formatter(hfmt)
    ax.set_xticks(ts_list)
    plt.setp(ax.get_xticklabels(), rotation=60, ha='right')
    ax.tick_params(axis='x', pad=0.75, length=5.0)
    rects = ax.bar(ts_list, message2_list, align='center', width=0.1)
    rects2 = ax.bar(ts_list, message1_list, align='center', width=0.1, bottom=message2_list)
    # update grid position
    if grid_x == 1:
        grid_x = 0
        grid_y += 1
    else:
        grid_x = 1
plt.tight_layout(pad=0.01)
plt.savefig(r"output_files\activity_barcharts.svg", bbox_inches='tight')
plt.gcf().clear()
The input data looks as follows (example of a plot with overlapping bars, second picture)
slot - message1 - message2 - timestamp
0 - 0 - 42 - 2017-09-11 07:59:53.517000+00:00
1 - 0 - 4 - 2017-09-11 09:02:28.827875+00:00
2 - 0 - 0 - 2017-09-11 10:05:04.138750+00:00
3 - 0 - 0 - 2017-09-11 11:07:39.449625+00:00
4 - 0 - 0 - 2017-09-11 12:10:14.760500+00:00
5 - 0 - 0 - 2017-09-11 13:12:50.071375+00:00
6 - 0 - 13 - 2017-09-11 14:15:25.382250+00:00
7 - 0 - 0 - 2017-09-11 15:18:00.693125+00:00
8 - 0 - 0 - 2017-09-11 16:20:36.004000+00:00
9 - 0 - 0 - 2017-09-11 17:23:11.314875+00:00
10 - 0 - 0 - 2017-09-11 18:25:46.625750+00:00
11 - 0 - 0 - 2017-09-11 19:28:21.936625+00:00
12 - 0 - 0 - 2017-09-11 20:30:57.247500+00:00
13 - 0 - 0 - 2017-09-11 21:33:32.558375+00:00
14 - 0 - 0 - 2017-09-11 22:36:07.869250+00:00
15 - 0 - 0 - 2017-09-11 23:38:43.180125+00:00
16 - 0 - 0 - 2017-09-12 00:41:18.491000+00:00
17 - 0 - 0 - 2017-09-12 01:43:53.801875+00:00
18 - 0 - 0 - 2017-09-12 02:46:29.112750+00:00
19 - 0 - 0 - 2017-09-12 03:49:04.423625+00:00
20 - 0 - 0 - 2017-09-12 04:51:39.734500+00:00
21 - 0 - 0 - 2017-09-12 05:54:15.045375+00:00
22 - 0 - 0 - 2017-09-12 06:56:50.356250+00:00
23 - 0 - 0 - 2017-09-12 07:59:25.667125+00:00
24 - 0 - 20 - 2017-09-12 09:02:00.978000+00:00
25 - 0 - 0 - 2017-09-12 10:04:36.288875+00:00
Does anyone know how to prevent this from happening?
I calculated exactly 26 bars for every chart and expected them to have equal width. I also tried replacing the 0 values with 1e-5 (which another post proposed), but that did not prevent the overlapping.
The width of the bars is given in data units. I.e., if you want bars that are 1 minute wide, you would set the width to
plt.bar(..., width=1./(24*60.))
because the numeric axis unit for datetime axes in matplotlib is days and there are 24*60 minutes in a day.
For an automatic determination of the bar width, you may take the width to be the smallest difference between any two successive values in the input time list. In that case, something like the following will do the trick:
import numpy as np
import matplotlib.pyplot as plt
import datetime
import matplotlib.dates
t = [datetime.datetime(2017,9,12,8,i) for i in range(60)]
x = np.random.rand(60)
td = np.diff(t).min()
s1 = matplotlib.dates.date2num(datetime.datetime.now())
s2 = matplotlib.dates.date2num(datetime.datetime.now()+td)
plt.bar(t, x, width=s2-s1, ec="k")
plt.show()
I am using the pandas library and I am having some performance problems with .iloc.
The idea of the main software is to search through each row and column of a dataframe and, if any condition is met, update that specific row and column with a new value.
Below follow some lines of this code:
for cont, val in enumerate(id_truck_list):
    print cont
    for index, row in all_travel.iterrows():
        id_tr = int(all_travel.iloc[index, 0])
        begin = all_travel.iloc[index, 5]
        end = all_travel.iloc[index, 11]
        if int(val) == id_tr:
            #print "test1"
            #print id_tr
            #print begin_list[cont]
            #print begin
            #print end_list[cont]
            #print end
            if begin_list[cont] >= begin:
                if end_list[cont] <= begin:
                    pass
                else:
                    #print 'h1'
                    all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 3
            else:
                if begin < end_list[cont]:
                    if end <= end_list[cont]:
                        #print 'h2'
                        #print(all_travel.iloc[index, 18])
                        all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 5
                        #print(all_travel.iloc[index, 18])
                        #print str(index)
                    else:
                        #print 'h3'
                        all_travel.iloc[index, 18] = all_travel.iloc[index, 18] + 7
                else:
                    pass
This approach performs very slowly (more or less 10 rows per minute). Do you have any idea how to speed it up using the pandas library?
Below follows all_travel.head():
truck_id id_farm gatec_dist gps_go_dist gps_ret_dist t1gatec \
0 2010028.0 76.0 11 11.8617 0.211655 2016-03-09 00:24:00
1 2010028.0 1.0 16.2 9.86 0.0637544 2016-03-13 23:57:00
2 2010028.0 75.0 18 10.78 9.65 2016-03-18 09:17:00
3 2010028.0 62.0 6 8.51291 3.99291 2016-03-19 20:16:00
4 2010028.0 62.0 6 2.91 0.0428008 2016-03-21 03:00:00
t1gps t2gatec t2gps t3gatec \
0 03/09/2016 00:09:58 0 03/09/2016 00:43:46 0
1 03/13/2016 23:46:00 0 03/14/2016 00:53:10 0
2 03/18/2016 09:13:15 0 03/18/2016 10:17:14 0
3 03/19/2016 20:29:59 0 03/19/2016 21:22:40 0
4 03/21/2016 02:49:34 0 03/21/2016 03:38:59 0
t3gps t4gatec t4gps wait_mill \
0 03/09/2016 07:00:15 2016-03-09 02:14:55 03/09/2016 02:14:55 154.500000
1 03/14/2016 13:54:30 2016-03-14 01:12:58 03/14/2016 01:12:58 124.733333
2 03/18/2016 12:07:00 2016-03-18 12:37:41 03/18/2016 12:44:01 408.316667
3 03/19/2016 23:57:22 2016-03-19 22:00:08 03/19/2016 22:00:08 256.083333
4 03/22/2016 00:09:56 2016-03-21 04:01:20 03/21/2016 04:01:20 47.333333
go_field wait_field ret_mill tot_trav maintenance_level
0 33.800000 376.483333 -285.333333 124.950000 1
1 67.166667 781.333333 -761.533333 86.966667 1
2 63.983333 109.766667 37.016667 210.766667 1
3 52.683333 154.700000 -117.233333 90.150000 1
4 49.416667 1230.950000 -1208.600000 71.766667 1
I have found another solution that improved my speed performance a lot.
I changed parts of the dataframe to lists, due to the better performance of lists over iterating over a dataframe.
The conclusion: now I need to wait two minutes for the answer, not 3 days.
Below follows the modification:
for cont, val in enumerate(id_truck_list):
    for cont2, val2 in enumerate(id_truck_list2):
        id_tr = int(id_truck_list2[cont2])
        begin = begin_list2[cont2]
        end = end_list2[cont2]
        if int(id_truck_list[cont]) == id_tr:
            if begin_list[cont] >= begin:
                if begin_list[cont] >= end:
                    pass
                else:
                    maintenance_list[cont2] = maintenance_list[cont2] + 3
            else:
                if begin < end_list[cont]:
                    if end <= end_list[cont]:
                        #print 'h2'
                        maintenance_list[cont2] = maintenance_list[cont2] + 5
                        #print str(index)
                    else:
                        #print 'h3'
                        maintenance_list[cont2] = maintenance_list[cont2] + 7
                else:
                    pass
print 'list size ' + str(len(maintenance_list))
for cont3, val3 in enumerate(maintenance_list):
    print 'list update ' + str(cont3)
    all_travel.iloc[cont3, 18] = maintenance_list[cont3]
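For comparison, a hedged, vectorized sketch of the same scoring, following the logic of the first loop. The column names 'truck_id', 't1gatec', 't4gatec' and 'maintenance_level' are assumed from positions 0, 5, 11 and 18 of all_travel, the events frame is hypothetical (built from the three parallel lists), and the begin/end columns and lists are assumed to hold comparable datetime values:
import numpy as np
import pandas as pd

events = pd.DataFrame({'truck_id': [float(v) for v in id_truck_list],
                       'ev_begin': begin_list,
                       'ev_end': end_list})

# pair every travel row with every event of the same truck
m = all_travel.reset_index().merge(events, on='truck_id')

overlaps = m['t1gatec'] < m['ev_end']  # event ends after the travel begins
add3 = (m['ev_begin'] >= m['t1gatec']) & (m['ev_end'] > m['t1gatec'])
add5 = (m['ev_begin'] < m['t1gatec']) & overlaps & (m['t4gatec'] <= m['ev_end'])
add7 = (m['ev_begin'] < m['t1gatec']) & overlaps & (m['t4gatec'] > m['ev_end'])
m['bump'] = np.select([add3, add5, add7], [3, 5, 7], default=0)

# sum the bumps per original row and add them to the maintenance column
bumps = m.groupby('index')['bump'].sum().reindex(all_travel.index, fill_value=0)
all_travel['maintenance_level'] += bumps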