Export data efficiently to CSV using Python

I am using my Arduino to analyze analog inputs, accessing it through the pyfirmata library. I am basically measuring voltages on the 6 analog inputs of my Arduino Uno. I need to find a way to stream this data into a CSV file in real time, efficiently, and I am not sure of the best way to do that.
Any suggestion would help, but please write out the code you suggest. I would prefer to use pandas if possible because it's easier.
voltage0 through voltage5 are my variables, and I am trying to report them in a clean format that will later have to be analyzed.
import time
from datetime import datetime
import pyfirmata
import pandas as pd

board = pyfirmata.Arduino('/dev/ttyACM1')

analog_pin0 = board.get_pin('a:0:i')
analog_pin1 = board.get_pin('a:1:i')
analog_pin2 = board.get_pin('a:2:i')
analog_pin3 = board.get_pin('a:3:i')
analog_pin4 = board.get_pin('a:4:i')
analog_pin5 = board.get_pin('a:5:i')

it = pyfirmata.util.Iterator(board)
it.start()

analog_pin0.enable_reporting()
analog_pin1.enable_reporting()
analog_pin2.enable_reporting()
analog_pin3.enable_reporting()
analog_pin4.enable_reporting()
analog_pin5.enable_reporting()

data = []
count = 0
x = 0
start = 0

while x <= 1000:
    # read() returns None until the Iterator has delivered a value
    reading0 = analog_pin0.read()
    if reading0 is not None:
        voltage0 = round(reading0 * 5, 2)
    else:
        voltage0 = float('nan')
    reading1 = analog_pin1.read()
    if reading1 is not None:
        voltage1 = round(reading1 * 5, 2)
    else:
        voltage1 = float('nan')
    reading2 = analog_pin2.read()
    if reading2 is not None:
        voltage2 = round(reading2 * 5, 2)
    else:
        voltage2 = float('nan')
    reading3 = analog_pin3.read()
    if reading3 is not None:
        voltage3 = round(reading3 * 5, 2)
    else:
        voltage3 = float('nan')
    reading4 = analog_pin4.read()
    if reading4 is not None:
        voltage4 = round(reading4 * 5, 2)
    else:
        voltage4 = float('nan')
    reading5 = analog_pin5.read()
    if reading5 is not None:
        voltage5 = round(reading5 * 5, 2)
    else:
        voltage5 = float('nan')
    datarow = {
        'Voltage0': voltage0, 'Voltage1': voltage1, 'Voltage2': voltage2,
        'Voltage3': voltage3, 'Voltage4': voltage4, 'Voltage5': voltage5,
        'Time': time.strftime("%Y-%m-%d_%H:%M:%S"),
    }
    data.append(datarow)
    if count % 500 == 0:
        dataframe = pd.DataFrame(data)
        dataframe.to_csv('data.csv')
    x += 1
    count += 1
    #time.sleep(1)

Your code seems to work, but it's not very efficient. Every 500 iterations, you rewrite all of your data instead of appending just the new rows at the end. You might consider saving it this way instead:
if count % 500 == 0:
    dataframe = pd.DataFrame(data)
    dataframe.to_csv('data.csv', mode='a', header=False)
    data = []
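One caveat with append mode: since header=False is passed on every call, the file never gets a header row, and leftovers from a previous run will be appended to. A small sketch of one way to handle that, assuming the same file name as in your code: create the file with the header once before the loop, then let the loop append data-only chunks.
columns = ['Voltage0', 'Voltage1', 'Voltage2',
           'Voltage3', 'Voltage4', 'Voltage5', 'Time']
# Write the header row once (overwriting any old file);
# the loop then appends rows with header=False
pd.DataFrame(columns=columns).to_csv('data.csv')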
If it's still not fast enough, you might consider saving your data to a binary format such as .npy (NumPy's format) and converting it to CSV later.
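For illustration, a minimal sketch of that binary route, assuming you keep only the numeric voltages in the array (timestamps would have to be stored separately, since .npy arrays are homogeneous); the file names are just examples:
import numpy as np

rows = [[1.23, 0.02, 4.99, 0.00, 2.50, 3.30]]  # collected voltage rows
np.save('data.npy', np.array(rows))            # fast binary dump during the run

arr = np.load('data.npy')                      # convert to CSV afterwards
np.savetxt('data_from_npy.csv', arr, delimiter=',', fmt='%.2f')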

Related

How to optimize the finding of divergences between 2 signals

I am trying to create an indicator that will find all the divergences between 2 signals. The output of the function so far looks like this.
The problem is that it is painfully slow when I try to use it with long signals. Could anyone help me make it faster, if possible?
My code:
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd
from scipy.signal import argrelextrema
from sortedcontainers import SortedSet


def find_divergence(price: pd.Series, indicator: pd.Series, width_divergence: int, order: int):
    div = pd.DataFrame(index=range(price.size), columns=[
        f"Bullish_{width_divergence}_{order}",
        f"Berish_{width_divergence}_{order}"
    ])
    div[f'Bullish_idx_{width_divergence}_{order}'] = False
    div[f'Berish_idx_{width_divergence}_{order}'] = False

    def calc_argrelextrema(price_: np.ndarray):
        return argrelextrema(price_, np.less_equal, order=order)[0]

    price_ranges = []
    for i in range(len(price)):
        price_ranges.append(price.values[0:i + 1])

    f = []
    with ThreadPoolExecutor(max_workers=16) as exe:
        for i in price_ranges:
            f.append(exe.submit(calc_argrelextrema, i))

    prices_lows = SortedSet()
    for r in concurrent.futures.as_completed(f):
        data = r.result()
        for d in reversed(data):
            if d not in prices_lows:
                prices_lows.add(d)
            else:
                break

    price_lows_idx = pd.Series(prices_lows)

    for idx_1 in range(price_lows_idx.size):
        min_price = price[price_lows_idx[idx_1]]
        min_indicator = indicator[price_lows_idx[idx_1]]
        for idx_2 in range(idx_1 + 1, idx_1 + width_divergence):
            if idx_2 >= price_lows_idx.size:
                break
            if price[price_lows_idx[idx_2]] < min_price:
                min_price = price[price_lows_idx[idx_2]]
            if indicator[price_lows_idx[idx_2]] < min_indicator:
                min_indicator = indicator[price_lows_idx[idx_2]]
            consistency_price_rd = min_price == price[price_lows_idx[idx_2]]
            consistency_indicator_rd = min_indicator == indicator[price_lows_idx[idx_1]]
            consistency_price_hd = min_price == price[price_lows_idx[idx_1]]
            consistency_indicator_hd = min_indicator == indicator[price_lows_idx[idx_2]]
            diff_price = price[price_lows_idx[idx_1]] - price[price_lows_idx[idx_2]]  # should be neg
            diff_indicator = indicator[price_lows_idx[idx_1]] - indicator[price_lows_idx[idx_2]]  # should be pos
            is_regular_divergence = diff_price > 0 and diff_indicator < 0
            is_hidden_divergence = diff_price < 0 and diff_indicator > 0
            if is_regular_divergence and consistency_price_rd and consistency_indicator_rd:
                div.at[price_lows_idx[idx_2], f'Bullish_{width_divergence}_{order}'] = (price_lows_idx[idx_1], price_lows_idx[idx_2])
                div.at[price_lows_idx[idx_2], f'Bullish_idx_{width_divergence}_{order}'] = True
            elif is_hidden_divergence and consistency_price_hd and consistency_indicator_hd:
                div.at[price_lows_idx[idx_2], f'Berish_{width_divergence}_{order}'] = (price_lows_idx[idx_1], price_lows_idx[idx_2])
                div.at[price_lows_idx[idx_2], f'Berish_idx_{width_divergence}_{order}'] = True
    return div
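One observation, not a full answer: calc_argrelextrema is re-run on every expanding prefix of price, so the extrema search alone is O(n²) before the thread pool can help. If the expanding-window behaviour near the right edge of each prefix is not essential to your strategy, a single pass over the full series yields the same candidate lows far faster. A minimal sketch of that idea:
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema

def find_price_lows(price: pd.Series, order: int) -> pd.Series:
    # One argrelextrema call over the whole array instead of one per prefix
    lows = argrelextrema(price.values, np.less_equal, order=order)[0]
    return pd.Series(lows)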

Making Excel histograms using python

This is the output of my Python script so far:
[Excel table]
The vertical axis of the table contains road names; the horizontal axis contains dates. The values indicate whether a road was under construction at the time, and why. I'd like to make a line graph that groups the dates by year (2017, 2018, 2019, etc.) and plots the longest amount of time within each group that a road was under construction, as well as the average for the whole year. I'm a complete novice in Excel and don't know how to leverage its features to achieve my goal, though I suspect there may be built-in functions that do what I want without much difficulty. Any suggestions on how I can achieve my desired output would be much appreciated.
EDIT: It was suggested that I post my code so far.
import re
import time
startTime = time.time()
import collections
import xlsxwriter as xlswr
import scipy.spatial as spy
from itertools import islice
from itertools import groupby
from natsort import natsorted
from functools import partial
from collections import Counter
from datetime import date as DATE
from indexed import IndexedOrderedDict
from multiprocessing.dummy import Pool as ThreadPool
import multiprocessing as mp

workBook = xlswr.Workbook("testfix.xlsx")
cell_format = workBook.add_format()
format1 = workBook.add_format({'num_format': 'mm/dd/yy'})
sheet = workBook.add_worksheet()

def to_raw(string):
    return fr"{string}"

def cvrt(x):
    ans = re.split(r'(\d+)(?!.*\d)', x)
    return int(ans[1])

def indexer(s):
    pattern = re.compile(r'I, [0-9]+, ')
    gm = re.split(pattern, s)
    values = s.rsplit(gm[1])
    gm = gm[1]
    values[1] = gm
    return values

def int2Date(x):
    string = str(x)
    Y = int(string[0:4])
    M = int(string[4:6])
    D = int(string[6:8])
    return DATE(Y, M, D)

def dDelta(x, y):
    string1 = str(x)
    string2 = str(y)
    Y1 = int(string1[0:4])
    M1 = int(string1[4:6])
    D1 = int(string1[6:8])
    Y2 = int(string2[0:4])
    M2 = int(string2[4:6])
    D2 = int(string2[6:8])
    f_date = DATE(Y1, M1, D1)
    l_date = DATE(Y2, M2, D2)
    delta = l_date - f_date
    if isinstance(y, int):
        return float(int((delta.days) / 30.44))
    else:
        return int((delta.days) / 30.44)

def Book(path):
    file = open(path, 'r')
    lines = file.readlines()
    file.close()
    book = IndexedOrderedDict()
    for line in lines:
        if re.match("I", line):
            IDs = indexer(line)[1]
        if re.match(" 0.00,", line):
            rID = line
        # "GM_FINAL_AUTH,0,[1-9]"
        if re.search("GM_FINAL_AUTH,0,[1-9]", line):
            book.update({(rID, line): to_raw(IDs)})
    return sort_book(book)

def dUpdate(dic, key, value):
    return dic.update({(key[0], "GM_FINAL_AUTH,0,0"): value})

def valSplt(s):
    pattern = re.compile(r'(\d+)')
    gm = re.split(pattern, s)
    values = s.rsplit(gm[1])
    gm = gm[1]
    values[1] = gm
    return values

def sort_book(book):
    book = natsorted([value, key] for key, value in book.items())
    book = IndexedOrderedDict((data[1], data[0]) for data in book)
    return book

def alph_order(word1, word2):
    for i in range(min(len(word1), len(word2))):
        if ord(word1[i]) == ord(word2[i]):
            pass
        elif ord(word1[i]) > ord(word2[i]):
            return word2
        else:
            return word1
    return word1

def read(cpdm, date_list):
    sCnt = [0] * len(cpdm)
    lowest_number = 999999999999
    terminationCondition = [True] * len(cpdm)
    saved_results = [0] * len(cpdm)
    current_prefix = None
    cnt = 0
    while any(terminationCondition) is True:
        saved_results = [0] * len(cpdm)
        last_prefix = None
        lowest_number = 999999999999
        for dicIdx, dicVal in enumerate(sCnt):
            if dicVal < len(cpdm[dicIdx]):
                ID = cpdm[dicIdx].values()[dicVal]
                # print(entry)
                current_prefix, road_number = valSplt(ID)
                road_number = int(road_number)
                if last_prefix is None:
                    last_prefix = current_prefix
                higherOrder_prefix = alph_order(last_prefix, current_prefix)
                # print('check:', [higherOrder_prefix, last_prefix, current_prefix])
                if current_prefix == higherOrder_prefix:
                    if current_prefix != last_prefix:
                        lowest_number = road_number
                        last_prefix = current_prefix
                    elif road_number < lowest_number:
                        lowest_number = road_number
                        last_prefix = current_prefix
        for dicIdx, dicVal in enumerate(sCnt):
            if dicVal < len(cpdm[dicIdx]):
                # print(dicIdx, dicVal, len(cpdm[dicIdx]))
                ID = cpdm[dicIdx].values()[dicVal]
                VALUE = cpdm[dicIdx].keys()[dicVal]
                # print(entry)
                road_name, road_number = valSplt(ID)
                road_number = int(road_number)
                if road_name == last_prefix and lowest_number == road_number:
                    saved_results[dicIdx] = [ID, VALUE[1], date_list[dicIdx], VALUE[0]]
                    if dicVal < len(cpdm[dicIdx]):
                        sCnt[dicIdx] += 1
                    else:
                        terminationCondition[dicIdx] = False
            else:
                terminationCondition[dicIdx] = False
        for rst in range(len(saved_results)):
            if saved_results[rst] == 0:
                pass
            else:
                sheet.write(cnt + 1, 0, str(saved_results[rst][0]))
                sheet.write(cnt + 1, rst + 1, cvrt(saved_results[rst][1]))
                # sheet.write(cnt+1, 2*et+3, int2Date(saved_results[et][2]), format1)
                # sheet.write(cnt+1, 0, saved_results[rst][3])
        cnt += 1

def main():
    # 2018 MAPS
    path1 = "W:\\Scripting\\2018\\DBData_84577881.txt"
    path2 = "W:\\Scripting\\2018\\DBData_84639568.txt"
    path3 = "W:\\Scripting\\2018\\DBData_84652483.txt"
    path4 = "W:\\Scripting\\2018\\DBData_84670490.txt"
    # 2019 MAPS
    path5 = "W:\\Scripting\\2019\\DBData_84706383.txt"
    path6 = "W:\\Scripting\\2019\\DBData_84715201.txt"
    path7 = "W:\\Scripting\\2019\\DBData_84743195.txt"
    path8 = "W:\\Scripting\\2019\\DBData_84777742.txt"
    path9 = "W:\\Scripting\\2019\\DBData_84815446.txt"
    path10 = "W:\\Scripting\\2019\\DBData_84835743.txt"
    # 2020 MAPS
    path11 = "W:\\Scripting\\2020\\DBData_84882849.txt"
    path12 = "W:\\Scripting\\2020\\DBData_84966202.txt"
    path13 = "W:\\Scripting\\2020\\DBData_84988789.txt"
    p_list = [path1, path2, path3, path4, path5, path6, path7,
              path8, path9, path10, path11, path12, path13]
    pool = mp.Pool(mp.cpu_count())
    CPDM = pool.map(Book, p_list)
    pool.close()
    # pool.join()
    date_list = [20180809, 20180913, 20181011, 20181204, 20190222, 20190325,
                 20190501, 20190628, 20190815, 20190925, 20200207, 20200501, 20200617]
    # CPDM = [b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13]
    for i in CPDM:
        print(len(i))
    # sheet.write("A1", "Lat Long")
    sheet.write("A1", "ID")
    # for i in range(len(CPDM)):
    cn = 0
    for i in date_list:
        # sheet.write(0, 3*i+1, "ID" + str(i+1))
        sheet.write(0, cn + 1, int2Date(i), format1)
        cn += 1
        # sheet.write(0, 2*i+3, "Date" + str(i+1))
    read(CPDM, date_list)
    workBook.close()

if __name__ == "__main__":
    main()
    executionTime = (time.time() - startTime)
    print('Execution time in minutes: ' + str(executionTime / 60))
Long story short, what you want is not exactly possible. Your data contains spot measurements, so what happened in between, or after? Was the road under construction or not? This makes it impossible to calculate an accurate number of days that a road was under construction.
It is possible to do something that approximates what you want, but that will require some choices on your side. For example, if you measure that the road is under construction on 08/15/2019 but not anymore on 05/01/2020, do you count all the days between those two dates as closed, or only until New Year's?
To help you get started, I've added a little script that does some formatting on your data. It should give you an idea of how to handle the data.
import pandas
import plotly.express as px

# Read the Excel file
df = pandas.read_excel("./test.xlsx", index_col="ID")

# Flip the dataframe (dates should be on the index)
df = df.transpose()

# Fill any empty cells with 0
df = df.fillna(0)

# Combine columns with the same name
df = df.groupby(df.columns, axis=1).agg(lambda column: column.max(axis=1))

# Make sure the dates are sorted
df = df.sort_index()

# Create a list to hold all the periods per road
roads = []
for road_name in df.columns:
    # Group by consecutive 1's
    groups = df.loc[df[road_name] == 1, road_name].groupby((df[road_name] != 1).cumsum())
    # Every group denotes a period for which the road was under construction
    for _, group in groups:
        # Get the start and finish for each group
        roads.append({
            "road": road_name,
            "start": group.index[0],
            # Add one day, because groups with the same start and finish
            # would otherwise not be visible on the plot
            "finish": group.index[-1] + pandas.Timedelta(1, unit="D"),
        })

# Convert back to a dataframe
roads_df = pandas.DataFrame(roads)

# Create a Gantt chart with Plotly (NOTE: you'll need version 4.9+ of Plotly)
fig = px.timeline(roads_df, x_start="start", x_end="finish", y="road")
fig.update_yaxes(autorange="reversed")  # otherwise tasks are listed from the bottom up
fig.show()
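From there, a rough way to get at the numbers you described (the longest single construction period per year and the yearly average), assuming each period is attributed to the year in which it starts:
# Hypothetical follow-up: duration of each period in days,
# then the longest and the average per starting year
roads_df["days"] = (roads_df["finish"] - roads_df["start"]).dt.days
per_year = roads_df.groupby(roads_df["start"].dt.year)["days"].agg(["max", "mean"])
print(per_year)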

Python: A value is trying to be set on a copy of a slice from a DataFrame

What am I doing to deserve this warning from Python? I would like to avoid errors, if the warning is telling me something...
if DELfile.exists():
    print(DELfile)
    sectorDEL = pd.read_csv(DELfile, sep=';', header=0, float_precision='round_trip')
    if i == 1:
        sectorDELmax = sectorDEL
        i = i + 1
    else:
        k = 0
        for sensor in sectorDEL['SENSOR_NO']:
            if sectorDEL['DAMAGE_EQUIVALENT_LOAD'][k] > sectorDELmax['DAMAGE_EQUIVALENT_LOAD'][k]:
                sectorDELmax['DAMAGE_EQUIVALENT_LOAD'][k] = sectorDEL['DAMAGE_EQUIVALENT_LOAD'][k]
            k = k + 1
else:
    print('Could not find ' + str(DELfile))
    return None
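The warning comes from the chained indexing (sectorDELmax['DAMAGE_EQUIVALENT_LOAD'][k] = ...): pandas cannot tell whether sectorDELmax still shares data with sectorDEL. A sketch of the usual fix, assuming the rows of the two frames line up: take an explicit copy when initialising (sectorDELmax = sectorDEL.copy()) and replace the loop with a vectorized element-wise maximum.
import numpy as np

# Element-wise maximum of the two columns, instead of the chained-indexing loop
sectorDELmax['DAMAGE_EQUIVALENT_LOAD'] = np.maximum(
    sectorDELmax['DAMAGE_EQUIVALENT_LOAD'].to_numpy(),
    sectorDEL['DAMAGE_EQUIVALENT_LOAD'].to_numpy(),
)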

The while loop cannot progress past i = 1

I have a small algorithm like this one, and I coded it in Python:
import pandas as pd

raw = pd.read_csv  # (the file argument was omitted in the original post)
i = 0
T = pd.DataFrame(columns=['Values'])
singleT = raw.mean() + raw.std()
T = T.append(singleT, ignore_index=True)
if i == 0:
    raw = raw.where(raw < T.iloc[i, :])
    i += 1
while True:
    singleT = raw.mean() + raw.std()
    T = T.append(singleT, ignore_index=True)
    if T.iloc[i, :].values == T.iloc[i - 1, :].values:
        break
    background = T.iloc[i, :].values
else:
    raw = raw.where(raw < T.iloc[i, :])
    i += 1
    print('iteration{:02}'.format(i))
However, the loop doesn't get past i = 1 and keeps repeating; the whole T array is filled with the value from i = 1. I tried several modifications to my code, but they didn't work at all.
Any advice on how to fix this problem would be appreciated!
Thank you very much
Edit: I have inserted one tab for the last 4 lines, as you guys suggested, to make sure the else belongs to the while, but now I get an invalid syntax error.
Edit2: Here is the corrected code for this problem; it will not enter the infinite loop.
i = 0
T = pd.DataFrame(columns=['Pulse counts'])
singleT = raw.mean() + raw.std()
T = T.append(singleT, ignore_index=True)
if i == 0:
    filtered = raw.where(raw < T.iloc[i, :])
    i += 1
while True:
    singleT = filtered.mean() + filtered.std()
    T = T.append(singleT, ignore_index=True)
    if T.iloc[i, :].values == T.iloc[i - 1, :].values or T.iloc[i - 1, :].values == 0:
        background = T.iloc[i - 1, :].values
        break
    else:
        filtered = filtered.where(filtered < T.iloc[i, :])
        print('iteration{}'.format(i))
        i += 1
Try this way:
import pandas as pd

raw = pd.read_csv  # (the file argument was omitted in the original post)
i = 0
T = pd.DataFrame(columns=['Values'])
singleT = raw.mean() + raw.std()
T = T.append(singleT, ignore_index=True)
while True:
    singleT = raw.mean() + raw.std()
    T = T.append(singleT, ignore_index=True)
    if T.iloc[i, :].values == T.iloc[i - 1, :].values:
        background = T.iloc[i, :].values
        break
    else:
        raw = raw.where(raw < T.iloc[i, :])
        # i += 1
    print('iteration{:02}'.format(i))
    i += 1
If your indentation is correct, the else part:
else:
    raw = raw.where(raw < T.iloc[i, :])
    i += 1
    print('iteration{:02}'.format(i))
doesn't belong to the if but to the while, which has a different meaning: it is executed only if the loop ends without a break (a less-known feature, also available with for, which saves the need to define a flag when a break has been reached).
So i is never incremented: an infinite loop.
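A tiny self-contained illustration of that while/else behaviour (not from the original post):
n = 0
while n < 3:
    if n == 10:  # never true here, so break never runs
        break
    n += 1
else:
    # executed because the loop ended without a break
    print("loop finished normally, n =", n)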

How to do assignment in Pandas without warning?

I'm trying to port this R code to Python using pandas.
This is my R code (assume data is a data.frame):
transform <- function(data) {
  baseValue <- data$baseValue
  na.base.value <- is.na(baseValue)
  baseValue[na.base.value] <- 1
  zero.base.value <- baseValue == 0
  baseValue[zero.base.value] <- 1
  data$adjustedBaseValue <- data$baseRatio * baseValue
  baseValue[na.base.value] <- -1
  baseValue[zero.base.value] <- 0
  data$baseValue <- baseValue
  return(data)
}
This is my attempt to port the R code in Python (assume data is pandas.DataFrame):
import pandas as pd

def transform(data):
    base_value = data['baseValue']
    na_base_value = base_value.isnull()
    base_value.loc[na_base_value] = 1
    zero_base_value = base_value == 0
    base_value.loc[zero_base_value] = 1
    data['adjustedBaseValue'] = data['baseRatio'] * base_value
    base_value.loc[na_base_value] = -1
    base_value.loc[zero_base_value] = 0
    return data
But then I got this warning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
I have read through the documentation but still don't understand how to fix it. What should I do to fix the code so that there is no more warning? I don't want to suppress the warning, though.
If you want to modify the same object that was passed to the function, then this should work, as long as what's passed in as data isn't already a view of another dataframe.
def transform(data):
    base_value = data['baseValue']
    na_base_value = base_value.isnull()
    data.loc[na_base_value, 'baseValue'] = 1
    zero_base_value = base_value == 0
    data.loc[zero_base_value, 'baseValue'] = 1
    data['adjustedBaseValue'] = data['baseRatio'] * base_value
    data.loc[na_base_value, 'baseValue'] = -1
    data.loc[zero_base_value, 'baseValue'] = 0
    return data
If you want to work with a copy and return that manipulated copied data then this is your answer.
def transform(data):
    data = data.copy()
    base_value = data['baseValue'].copy()
    na_base_value = base_value.isnull()
    base_value.loc[na_base_value] = 1
    zero_base_value = base_value == 0
    base_value.loc[zero_base_value] = 1
    data['adjustedBaseValue'] = data['baseRatio'] * base_value
    base_value.loc[na_base_value] = -1
    base_value.loc[zero_base_value] = 0
    data['baseValue'] = base_value  # write the recoded column back, matching the R code
    return data
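For reference, a quick check of the copying version on hypothetical sample data (not from the original question):
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'baseValue': [2.0, np.nan, 0.0],
    'baseRatio': [1.5, 2.0, 3.0],
})
print(transform(df))
# adjustedBaseValue is baseRatio * baseValue with NaN and 0 treated as 1;
# baseValue itself is recoded to -1 where it was NaN and 0 where it was 0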
