For my espresso machine I am programming a GUI in Python with PyQt5.
I want to save different "profiles" which I can recall depending on the bean I am using.
There is firmware (for the machine, a "Decent Espresso Machine") written in Tcl which does what I want (I will attach 2 images and a snippet of one of those description files; line 1 in this file contains the different steps).
I am not sure which parser I should use. I obviously always want to save different values for the same parameters (in C I would say the same struct), so it feels wrong to use something like configparser where you always have independent sections.
Can somebody give me a hint which library I should use? The parameters are always the same, but different recipes may contain a different number of steps.
I would like to have something like this pseudocode:
recipe = open('recipe1.txt')
for step in recipe.steps:
    pressure = step.pressure
    flow = step.flow
    ...
    brew()
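For illustration, a minimal sketch of how this pseudocode could look with a plain JSON recipe file (the file layout and key names are placeholders I made up, not the Decent format):
import json

# hypothetical recipe1.json:
# {"title": "Advanced spring lever",
#  "steps": [{"name": "infuse", "pressure": 1.0, "flow": 6.0, "temperature": 90.0, "seconds": 20.0},
#            {"name": "rise and hold", "pressure": 9.0, "temperature": 90.0, "seconds": 10.0}]}
with open('recipe1.json') as f:
    recipe = json.load(f)

for step in recipe['steps']:
    pressure = step['pressure']
    flow = step.get('flow')  # not every step needs every key
    # ... hand the values to the machine, e.g. brew(pressure, flow)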
[Attached images: "Steps", "Overview"]
This is in the TCL file:
advanced_shot {{exit_if 1 flow 6.0 volume 100 transition fast exit_flow_under 0 temperature 90.0 name infuse pressure 1 sensor coffee pump flow exit_type pressure_over exit_flow_over 6 exit_pressure_over 3.0 exit_pressure_under 0 seconds 20.0} {exit_if 0 volume 100 transition fast exit_flow_under 0 temperature 90.0 name {rise and hold} pressure 9.0 sensor coffee pump pressure exit_flow_over 6 exit_pressure_over 11 seconds 10.0 exit_pressure_under 0} {exit_if 1 volume 100 transition smooth exit_flow_under 0 temperature 90.0 name decline pressure 4.0 sensor coffee pump pressure exit_type pressure_under exit_flow_over 1.2 exit_pressure_over 11 seconds 20.0 exit_pressure_under 4.0} {exit_if 1 flow 1.2 volume 100 transition smooth exit_flow_under 0 temperature 90.0 name {pressure limit} pressure 4.0 sensor coffee pump pressure exit_type flow_over exit_flow_over 1.0 exit_pressure_over 11 exit_pressure_under 0 seconds 10.0} {exit_if 0 flow 1.0 volume 100 transition smooth exit_flow_under 0 temperature 90.0 name {flow limit} pressure 3.0 sensor coffee pump flow exit_flow_over 6 exit_pressure_over 11 seconds 30.0 exit_pressure_under 0}}
author Decent
beverage_type espresso
espresso_decline_time 30
espresso_hold_time 15
espresso_pressure 6.0
espresso_temperature 90.0
final_desired_shot_volume 32
final_desired_shot_volume_advanced 0
final_desired_shot_weight 32
final_desired_shot_weight_advanced 36
flow_profile_decline 1.2
flow_profile_decline_time 17
flow_profile_hold 2
flow_profile_hold_time 8
flow_profile_minimum_pressure 4
flow_profile_preinfusion 4
flow_profile_preinfusion_time 5
preinfusion_flow_rate 4
preinfusion_guarantee 1
preinfusion_stop_pressure 4.0
preinfusion_time 20
pressure_end 4.0
profile_hide 0
profile_language en
profile_notes {An advanced spring lever profile by John Weiss that addresses a problem with simple spring lever profiles, by using both pressure and flow control. The last two steps keep pressure/flow under control as the puck erodes, if the shot has not finished by the end of step 3. Please consider this as a starting point for tweaking.}
profile_title {Advanced spring lever}
settings_profile_type settings_2c
tank_desired_water_temperature 0
water_temperature 80
I have several CSV files with data of voltage over time; each CSV file is approximately 7000 rows, and the data looks like this:
Time (us)  Voltage (V)
0 32.96554106
0.5 32.9149649
1 32.90484966
1.5 32.86438874
2 32.8542735
2.5 32.76323642
3 32.74300595
3.5 32.65196886
4 32.58116224
4.5 32.51035562
5 32.42943376
5.5 32.38897283
6 32.31816621
6.5 32.28782051
7 32.26759005
7.5 32.21701389
8 32.19678342
8.5 32.16643773
9 32.14620726
9.5 32.08551587
10 32.04505495
10.5 31.97424832
11 31.92367216
11.5 31.86298077
12 31.80228938
12.5 31.78205891
13 31.73148275
13.5 31.69102183
14 31.68090659
14.5 31.67079136
15 31.64044567
15.5 31.59998474
16 31.53929335
16.5 31.51906288
I read the CSV file into a pandas DataFrame, and after plotting the data from one CSV file in matplotlib, the figure looks like below.
I would like to split every single square waveform/bit and store the corresponding voltage values for each bit separately. So the resulting voltage values of each bit would be stored in a row and should look like this:
I don't have any idea how to do that. I guess I have to write a function with a threshold value: if the voltage values are going down for maybe 20 time steps, then capture all the values, or if the voltage level is going up for 20 time steps, then capture all the voltage values. Could someone help?
If you get the gradient of your Voltage (here using diff as the time is regularly spaced), this gives you the following:
You can thus easily use a threshold (I tested with 2) to identify the peak starts. Then pivot your data:
# get threshold of gradient
m = df['Voltage (V)'].diff().gt(2)
# group start = value above threshold preceded by value below threshold
group = (m&~m.shift(fill_value=False)).cumsum().add(1)
df2 = (df
       .assign(id=group,
               t=lambda d: d['Time (us)'].groupby(group).apply(lambda s: s - s.iloc[0])
               )
       .pivot(index='id', columns='t', values='Voltage (V)')
       )
output:
t 0.0 0.5 1.0 1.5 2.0 2.5 \
id
1 32.965541 32.914965 32.904850 32.864389 32.854273 32.763236
2 25.045314 27.543777 29.182444 30.588462 31.114454 31.984364
3 25.166697 27.746081 29.415095 30.719960 31.326873 32.125977
4 25.277965 27.877579 29.536477 30.912149 31.367334 32.206899
5 25.379117 27.978732 29.667975 30.780651 31.670791 32.338397
6 25.631998 27.634814 28.959909 30.173737 30.659268 31.053762
7 23.528030 26.137759 27.948386 29.253251 30.244544 30.649153
8 23.639297 26.380525 28.464263 29.971432 30.902034 31.458371
9 23.740449 26.542369 28.707028 30.295120 30.881803 31.862981
10 23.871948 26.673867 28.889103 30.305235 31.185260 31.873096
11 24.387824 26.694097 28.342880 29.678091 30.315350 31.134684
...
t 748.5 749.0
id
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 21.059913 21.161065
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 NaN NaN
11 NaN NaN
[11 rows x 1499 columns]
plot:
df2.T.plot()
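Each row of df2 now holds one pulse, so as a follow-up (reusing df2 from above, nothing new assumed beyond that) you can pull out a single bit or compute per-bit statistics:
# one bit as a clean Series (drop the NaN padding introduced by the pivot)
first_bit = df2.loc[1].dropna()
# quick per-bit statistics
print(df2.notna().sum(axis=1))   # number of samples per bit
print(df2.mean(axis=1))          # mean voltage per bit
print(df2.max(axis=1))           # peak voltage per bit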
I'm struggling to solve this problem. I'm creating a data frame generated from data that can vary from one day to the next, but I need to save the first version and block further updates.
This is the code:
# Create data frame for the ideal burndown line
df_ideal_burndown = pd.DataFrame(columns=['dates', 'ideal_trend'])
df_ideal_burndown['dates'] = range_sprint
#### Dates preparation
df_ideal_burndown['dates'] = pd.to_datetime(df_ideal_burndown['dates'], dayfirst=True)
df_ideal_burndown['dates'] = df_ideal_burndown['dates'].dt.strftime('%Y-%m-%d')
# Define the sprint length
days_sprint = int(len(range_sprint)) - int(cont_nonworking)
# Get how many items are in the current sprint
commited = len(df_current_sprint)
# Define the ideal number of items that should be delivered per day
ideal_burn = round(commited/days_sprint,1)
# Create a list of remaining items to be delivered by day
burndown = [commited - ideal_burn]
# Day of the sprint -> starts with 2, since the first day is already in the list above
sprint_day = 2
# Iterate to create the ideal trend line in numbers
for i in range(1, len(df_ideal_burndown), 1):
    burndown.append(round((commited - (ideal_burn * sprint_day)), 1))
    sprint_day += 1
# Add the ideal burndown to the column
df_ideal_burndown['ideal_trend'] = burndown
df_ideal_burndown
This is the output:
dates ideal_trend
0 2022-03-14 18.7
1 2022-03-15 17.4
2 2022-03-16 16.1
3 2022-03-17 14.8
4 2022-03-18 13.5
5 2022-03-21 12.2
6 2022-03-22 10.9
7 2022-03-23 9.6
8 2022-03-24 8.3
9 2022-03-25 7.0
10 2022-03-28 5.7
11 2022-03-29 4.4
12 2022-03-30 3.1
13 2022-03-31 1.8
14 2022-04-01 0.5
My main problem is related to commited = len(df_current_sprint), since the df_current_sprint is (and needs to be) used by other parts of my code.
Basically, even if the API returns new data that would be stored in df_current_sprint, I should keep using the version I'd just created.
I am pretty new to Python and I do not know if there is a way to store and, let's say, cache this information until I need to use fresh new data.
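For illustration, a minimal sketch of the kind of snapshot I have in mind (just assuming a plain deep copy or a pickle on disk would be acceptable):
# keep an independent copy so later API refreshes of df_current_sprint don't change it
df_sprint_snapshot = df_current_sprint.copy(deep=True)
commited = len(df_sprint_snapshot)
# or persist the first version and reload it instead of refreshing from the API
# df_sprint_snapshot.to_pickle('sprint_snapshot.pkl')
# df_sprint_snapshot = pd.read_pickle('sprint_snapshot.pkl')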
I appreciate your support, clues, and guidance.
Marcelo
I want to create a loop to automate finding MACD divergence with a specific scenario/criterion, but I am finding it difficult to execute, although it's very easy to spot when looking at a chart by eye. Note: you can easily get this as a ready-made scanner, but I want to improve my Python knowledge, and I hope someone will be able to help me with this mission.
My main issue is how to make it look back 40 rows and test forward - I couldn't get my head around the logic itself.
The rules are as follows; let's say we have the table below:
Date         Price   MACD Hist
04/08/2021   30       1
05/08/2021   29       0.7
06/08/2021   28       0.4
07/08/2021   27       0.1
08/08/2021   26      -0.15
09/08/2021   25      -0.70
10/08/2021   26      -0.1
11/08/2021   27       0.2
12/08/2021   28       0.4
13/08/2021   29       0.5
14/08/2021   30       0.55
15/08/2021   31       0.6
16/08/2021   30       0.55
17/08/2021   29       0.5
18/08/2021   28       0.4225
19/08/2021   27       0.4
20/08/2021   26       0.35
21/08/2021   25       0.3
22/08/2021   24       0.25
23/08/2021   23       0.2
24/08/2021   22       0.15
25/08/2021   21       0.1
26/08/2021   20       0
27/08/2021   19       0
28/08/2021   18      -0.05
29/08/2021   17      -0.1
30/08/2021   16      -0.25
I want the code to:
1. Look back 40 days from today; within these 40 days get the lowest point reached in MACDHist and the Price corresponding to it (i.e. the price of 25$ on 09/08/2021 in this example and the MACDHist of -0.7).
2. Compare that with today's Price and MACDHist and report divergence or not based on the 3 rules below (a rough sketch of this check follows the picture description):
   - Today's price < the recorded price from point 1 (16$ < 25$ in this example), AND
   - Today's MACDHist > the recorded MACDHist from point 1, i.e. smaller in absolute terms (ABS(-0.20) < ABS(-0.7)), AND
   - During the same period in which we recorded those Price and MACDHist values (between 09/08/2021 and today), the MACDHist was positive at least once.
I am sorry if my explanation isn't very clear; the picture below might help illustrate the scenario I am after:
A. The lowest MACDHist in the specified period
B. Within the same period, the MACDHist was positive at least once
C. The price is lower than at point A, and the MACDHist is higher than at point A (i.e. lower in ABS terms)
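A rough sketch of the check described above, assuming a pandas DataFrame df with columns 'Price' and 'MACD_Hist' (the column names and the helper name are placeholders, and this is untested):
def bullish_divergence(df, lookback=40):
    window = df.tail(lookback)
    low_idx = window['MACD_Hist'].idxmin()        # point A: lowest MACD histogram in the window
    low_price = window.loc[low_idx, 'Price']
    low_hist = window.loc[low_idx, 'MACD_Hist']
    today = df.iloc[-1]
    rule1 = today['Price'] < low_price                      # price made a lower low
    rule2 = abs(today['MACD_Hist']) < abs(low_hist)         # MACDHist is higher, i.e. smaller in ABS terms
    rule3 = (window.loc[low_idx:, 'MACD_Hist'] > 0).any()   # MACDHist went positive at least once since A
    return rule1 and rule2 and rule3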
In a similar case I have used backtrader. It's a feature-rich Python framework for backtesting and trading, and you can also use it to generate lots of predefined indicators. In addition, with this framework you are able to develop your own custom indicator as shown here. It's very easy to use and it supports lots of data formats, like pandas data frames. Please take a look!
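For instance, a rough sketch of what a custom indicator skeleton could look like there (the class name, line name and 40-bar window are illustrative, not taken from the question):
import backtrader as bt

class MacdHistLowest(bt.Indicator):
    # lowest MACD histogram value over a lookback window, as one possible building block
    lines = ('histlow',)
    params = (('period', 40),)

    def __init__(self):
        macd = bt.indicators.MACDHisto(self.data)
        self.lines.histlow = bt.indicators.Lowest(macd.histo, period=self.p.period)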
I found the answer in this great post. It's not a direct implementation, but the logic is the same, and by replacing the RSI info with MACDHist you reach the same conclusion.
How to implement RSI Divergence in Python
I have a data bank of PDF files that I've downloaded through web scraping. I can extract the tables from these PDF files and visualise them in a Jupyter notebook like this:
import os
import camelot.io as camelot

n = 1
arr = os.listdir(r'D:\Test')  # arr is the list of PDF titles

for item in arr:
    tables = camelot.read_pdf(item, pages='all', split_text=True)
    print(f'''DATENBLATT {n}: {item}
''')
    n += 1
    for tabs in tables:
        print(tabs.df, "\n==============================================================================\n")
In this way I get the results for the two PDF files in the data bank as follows (PDF1, PDF2).
Now I would like to ask how I can get only the specific data from tables that contain, for example, "Voltage" and "Current" info. More specifically, I would like to extract user-defined or targeted info and make charts with these values instead of printing them as a whole.
Thanks in advance.
DATENBLATT 1: HY-Energy-Plus-Peak-Pack-HYP-00-2972-R2.pdf
0 1
0 Part Number HYP-00-2972
1 Voltage Nominal 51.8V
2 Voltage Range Min/Max 43.4V/58.1V
3 Charge Current 160A maximum \nDe-rated by BMS message over CA...
4 Discharge Current 300A maximum \nDe-rated by BMS message over CA...
5 Maximum Capacity 5.76kWh/111.4Ah
6 Maximum Energy Density 164Wh/kg
7 Useable capacity Limited to 90% by BMS to improve cell life
8 Dimensions W: 243 x L: 352 x H: 300.5mm
9 Weight 37kg
10 Mounting Fixtures 4x M8 mounting points for easy secure mounting
11
==============================================================================
0 \
0 Communication Protocol
1 Reported Information
2 Pack Protection Mechanism
3 Balancing Method
4 Multi-Pack Behaviour
5 Compatible Chargers as standard
6 Charger Control
7 Auxiliary Connectors
8 Power connectors
9
1
0 CAN bus at user selectable baud rate (propriet...
1 Cell Temperatures and Voltages, Pack Current, ...
2 Interlock to control external protection devic...
3 Actively controlled dissipative balancing
4 BMS implements a single master and multi-slave...
5 Zivan, Victron, Delta-Q, TC-Charger, SPE. For ...
6 Direct current control based on cell voltage/t...
7 Binder 720-Series 8-way male & female
8 4x Amphenol SurLok Plus 8mm \nWhen using batte...
9
==============================================================================
0 \
0 Max no of packs in series
1 Max Number of Parallel Packs
2 External System Requirements
3
1
0 10
1 127
2 External Protection Device (e.g. Contactor) co...
3
==============================================================================
DATENBLATT 2: HY-Energy-Standard-Pack-HYP-00-2889-R2.pdf
0 1
0 Part Number HYP-00-2889
1 Voltage Nominal 44.4V
2 Voltage Range Min/Max 37.2V/49.8V
3 Charge Current 132A maximum \nDe-rated by BMS message over CA...
4 Discharge Current 132A maximum \nDe-rated by BMS message over CA...
5 Maximum Capacity 4.94kWh/111Ah
6 Maximum Energy Density 152Wh/kg
7 Useable capacity Limited to 90% by BMS to improve cell life
8 Dimensions W: 243 x L: 352 x H: 265mm
9 Weight 32kg
10 Mounting Fixtures 4x M8 mounting points for easy secure mounting
11
==============================================================================
0 \
0 Communication Protocol
1 Reported Information
2 Pack Protection Mechanism
3 Balancing Method
4 Multi-Pack Behaviour
5 Compatible Chargers as standard
6 Charger Control
7 Auxiliary Connectors
8 Power connectors
9
1
0 CAN bus at user selectable baud rate (propriet...
1 Cell Temperatures and Voltages, Pack Current, ...
2 Interlock to control external protection devic...
3 Actively controlled dissipative balancing
4 BMS implements a single master and multi-slave...
5 Zivan, Delta-Q, TC-Charger, SPE, Victron, Bass...
6 Direct current control based on cell voltage/t...
7 Binder 720-Series 8-way male & female
8 4x Amphenol SurLok Plus 8mm \nWhen using batte...
9
==============================================================================
0 \
0 Max no of packs in series
1 Max Number of Parallel Packs
2 External System Requirements
3
1
0 12
1 127
2 External Protection Device (e.g. Contactor) co...
3
==============================================================================
You can define a list of the strings of interest;
then select only the tables which contain at least one of these strings.
import os
import camelot.io as camelot

n = 1
# define your strings of interest
interesting_strings = ["voltage", "current"]

arr = os.listdir(r'D:\Test')  # arr is the list of PDF titles

for item in arr:
    tables = camelot.read_pdf(item, pages='all', split_text=True)
    print(f'''DATENBLATT {n}: {item}
''')
    n += 1
    for tabs in tables:
        # select only tables which contain at least one of the interesting strings
        if any(s in tabs.df.to_string().lower() for s in interesting_strings):
            print(tabs.df, "\n==============================================================================\n")
If you want to search for interesting strings only in specific places (for example, in the first column), you can use pandas DataFrame indexing, such as iloc:
any(s in tabs.df.iloc[:, 0].to_string().lower() for s in interesting_strings)
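If the goal is then to chart specific values, for example the nominal voltage of each datasheet, a rough sketch could look like this (it reuses arr and camelot from above; the row label 'Voltage Nominal' and the number parsing are assumptions based on the tables shown):
import re
import matplotlib.pyplot as plt

voltages = {}
for item in arr:
    tables = camelot.read_pdf(item, pages='all', split_text=True)
    for tabs in tables:
        df = tabs.df
        # look for the row whose first column says "Voltage Nominal"
        hit = df[df[0].str.contains('Voltage Nominal', case=False, na=False)]
        if not hit.empty:
            # pull the first number out of the second column, e.g. "51.8V" -> 51.8
            m = re.search(r'[\d.]+', hit.iloc[0, 1])
            if m:
                voltages[item] = float(m.group())

plt.bar(list(voltages.keys()), list(voltages.values()))
plt.ylabel('Nominal voltage (V)')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()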
I am clumsy but adequate with Python. I have referenced Stack Overflow often, but this is my first question. I have built a decaying average function to act on a pandas data frame with about 10000 rows, but it takes 40 minutes to run. I would appreciate any thoughts on how to speed it up. Here is a sample of actual data, simplified a bit.
import pandas as pd

sub = pd.DataFrame({
    'user_id': [101, 101, 101, 101, 101, 102, 101],
    'class_section': ['Modern Biology - B', 'Spanish Novice 1 - D', 'Modern Biology - B', 'Spanish Novice 1 - D', 'Spanish Novice 1 - D', 'Modern Biology - B', 'Spanish Novice 1 - D'],
    'sub_skill': ['A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'rating': [2.0, 3.0, 3.0, 2.0, 3.0, 2.0, 2.0],
    'date': ['2019-10-16', '2019-09-04', '2019-09-04', '2019-09-04', '2019-09-13', '2019-10-16', '2019-09-05']})
For this data frame:
sub
Out[716]:
user_id class_section sub_skill rating date
0 101 Modern Biology - B A 2.0 2019-10-16
1 101 Spanish Novice 1 - D A 3.0 2019-09-04
2 101 Modern Biology - B B 3.0 2019-09-04
3 101 Spanish Novice 1 - D B 2.0 2019-09-04
4 101 Spanish Novice 1 - D B 3.0 2019-09-13
5 102 Modern Biology - B B 2.0 2019-10-16
6 101 Spanish Novice 1 - D B 2.0 2019-09-05
A decaying average weights the most recent event that meets conditions at full weight and weights each previous event with a multiplier less than one. In this case, the multiplier is 0.667. Previously weighted events are weighted again.
So the decaying average for user 101's rating in Spanish sub_skill B is:
(2.0*0.667^2 + 2.0*0.667^1 + 3.0*0.667^0) / (0.667^2 + 0.667^1 + 0.667^0) = 2.4735
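A quick sanity check of that arithmetic in plain Python (weights listed newest to oldest):
ratings = [3.0, 2.0, 2.0]                 # Spanish sub_skill B, most recent first
weights = [0.667 ** i for i in range(len(ratings))]
decayed = sum(r * w for r, w in zip(ratings, weights)) / sum(weights)
print(round(decayed, 4))                  # 2.4735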
Here is what I tried, after reading a helpful post on weighted averages
sub['date'] = pd.to_datetime(sub.date)

def func(date, user_id, class_section, sub_skill):
    return sub.apply(lambda row: row['date'] > date
                     and row['user_id'] == user_id
                     and row['class_section'] == class_section
                     and row['sub_skill'] == sub_skill, axis=1).sum()
# for some reason this next line of code took about 40 minutes to run on 9000 rows:
sub['decay_count']=sub.apply(lambda row: func(row['date'],row['user_id'], row['class_section'], row['sub_skill']), axis=1)
# calculate decay factor:
sub['decay_weight']=sub.apply(lambda row: 0.667**row['decay_count'], axis=1)
# calculate decay average contributors (still needs to be summed):
g = sub.groupby(['user_id','class_section','sub_skill'])
sub['decay_avg'] = sub.decay_weight / g.decay_weight.transform("sum") * sub.rating
# new dataframe with indicator/course summaries as decaying average (note the sum):
indicator_summary = g.decay_avg.sum().to_frame(name = 'DAvg').reset_index()
I frequently work in pandas and I am used to iterating through large datasets. I would have expected this to take rows-squared time, but it is taking much longer. A more elegant solution or some advice to speed it up would be really appreciated!
Some background on this project: I am trying to automate the conversion from proficiency-based grading into a classic course grade for my school. I have the process of data extraction from our Learning Management System into a spreadsheet that does the decaying average and then posts the information to teachers, but I would like to automate the whole process and extract myself from it. The LMS is slow to implement a proficiency-based system and is reluctant to provide a conversion - for good reason. However, we have to communicate both student proficiencies and our conversion to a traditional grade to parents and colleges since that is a language they speak.
Why not use groupby? The idea here is that you rank the dates within the group in descending order and subtract 1 (because rank starts with 1). That seems to mirror your logic in func above, without having to try to call apply with a nested apply.
sub['decay_count'] = sub.groupby(['user_id', 'class_section', 'sub_skill'])['date'].rank(method='first', ascending=False) - 1
sub['decay_weight'] = sub['decay_count'].apply(lambda x: 0.667 ** x)
Output:
sub.sort_values(['user_id', 'class_section', 'sub_skill', 'decay_count'])
user_id class_section sub_skill rating date decay_count decay_weight
0 101 Modern Biology - B A 2.0 2019-10-16 0.0 1.000000
2 101 Modern Biology - B B 3.0 2019-09-04 0.0 1.000000
1 101 Spanish Novice 1 - D A 3.0 2019-09-04 0.0 1.000000
3 101 Spanish Novice 1 - D B 2.0 2019-09-04 0.0 1.000000
6 101 Spanish Novice 1 - D B 2.0 2019-09-05 1.0 0.667000
4 101 Spanish Novice 1 - D B 3.0 2019-09-13 2.0 0.444889
5 102 Modern Biology - B B 2.0 2019-10-16 0.0 1.000000
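From here the rest of the original pipeline works unchanged on these vectorized weights; a short sketch of the remaining steps, reusing the same grouping as in the question:
# weight each rating by its decay factor and normalize within the group
group_cols = ['user_id', 'class_section', 'sub_skill']
sub['decay_avg'] = (sub['decay_weight']
                    / sub.groupby(group_cols)['decay_weight'].transform('sum')
                    * sub['rating'])
# one decaying average per user/section/skill (note the sum, as in the question)
indicator_summary = (sub.groupby(group_cols)['decay_avg']
                     .sum()
                     .to_frame(name='DAvg')
                     .reset_index())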