Looping over three days - python

In my Postgres table I have users and the time at which each of them took a picture. I would like to define different tracks for the users: to start with, all pictures by the same user within 3 days belong to the same track, and then the next track begins. I would like to add a column to my table storing a unique track number. I have figured out how to find the first and last day for every user, but I am not sure how to divide that range into three-day periods.
from datetime import datetime, timedelta

cur.execute("""SELECT DISTINCT users FROM table;""")
dbrows = cur.fetchall()
for (k,) in dbrows:
    # parameterised query instead of string concatenation
    cur.execute("""SELECT time FROM table WHERE users = %s ORDER BY time;""", (k,))
    dbrows_time = cur.fetchall()
    time_period = []
    for (t,) in dbrows_time:
        new_date_time = datetime.fromtimestamp(int(t)).strftime('%Y-%m-%d %H:%M:%S')
        new_date = new_date_time[0:10]
        time_period.append(new_date)
    period = len(set(time_period))
    beginning = datetime.strptime(time_period[0], "%Y-%m-%d")
    end = datetime.strptime(time_period[-1], "%Y-%m-%d")
    delta = timedelta(days=3)
Here should come some loop along the lines of "for every three days, create a new unique number". Or maybe it should be done in SQL when I insert into the table?
INSERT INTO table_tracks the number of each track
conn.commit()
I can insert some number every three entries like this, but I am not sure how to do the three-day grouping:
cur.execute("""INSERT INTO table_track (id, users, link, tags, time, track, geom)
               SELECT id, users, link, tags, time,
                      ((row_number() OVER (ORDER BY users DESC) - 1) / 3) + 1,
                      geom
               FROM table;""")
The data is basically in this form:
user  time        geometry
1     01.03.2015  geometry
1     02.03.2015  geometry
......
2     01.03.2015  geometry
......
Every user has a unique id, and I convert the time to the datetime format. The desired result is:
user  time        geometry  track_number
1     01.03.2015  geometry  1
1     02.03.2015  geometry  1
......
2     01.03.2015  geometry  15
......
The track_number is unique per user and per three-day window. Say a user takes pictures on the 1st, 2nd and 3rd of March and then on the 4th, 5th and 6th: those would be two different tracks (which may not be ideal in the real world). The especially interesting case is pictures on the 1st and 2nd of March and then on the 4th, 5th and 8th. I would say those should be three tracks: 1st-2nd, 4th-5th, and the 8th. So I mean a timedelta of three days. If there is only one picture within three days, it forms a track on its own. Something like that. Thank you!!
Any ideas or directions are highly appreciated!
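One possible direction, sketched in plain Python under the window rule described above (a track is opened by its first picture and covers three days; the next picture outside that window starts a new track). The function name and the per-user wiring are illustrative, not taken from the question:

```python
from datetime import datetime, timedelta

def assign_tracks(times, window_days=3):
    """Sort one user's picture times and give each one a track number.

    A track is opened by its first picture; any picture at least
    `window_days` days later starts the next track.
    """
    tracks = []
    track_no = 0
    track_start = None
    for t in sorted(times):
        if track_start is None or t - track_start >= timedelta(days=window_days):
            track_no += 1
            track_start = t
        tracks.append((t, track_no))
    return tracks
```

For the example above (pictures on 1, 2, 4, 5 and 8 March) this yields track numbers 1, 1, 2, 2 and 3. To make the numbers unique across users, offset each user's tracks by a running counter before the INSERT, or store (users, track) as a compound key.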

Related

Django query with TruncHour for multiple products, with each group's latest sum. How to do that?

The scenario: product prices in the database are updated every hour. People can add products to a wishlist. The wishlist page has a graph showing price changes over the last 24 hours. I want to show the hourly price change of all the user's wishlist products for the last 24 hours.
I am trying but cannot complete the query. What I have so far is given below:
now = datetime.datetime.now()
last_day = now - datetime.timedelta(hours=23)
x = (
    FloorPrice.objects.filter(
        product__in=vault_products_ids, creation_time__gte=last_day
    )
    .annotate(hour=TruncHour("creation_time"))
    .values("hour", "floor_price")
)
Suppose I add 4 products to the wishlist. Now, for each hour, I want the sum over those 4 products of that hour's latest entry (each product has multiple floor prices within an hour, so we need to pick each product's last entry in that hour).
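No answer was posted in this thread, but the aggregation described - keep each product's latest entry per hour, then sum across products per hour - can be sketched in plain Python once the hour-annotated rows are fetched. All names here are illustrative:

```python
from collections import defaultdict

def hourly_latest_sum(rows):
    """rows: (hour, product_id, creation_time, floor_price) tuples.

    Keeps only the latest entry per (hour, product), then sums the
    surviving prices per hour.
    """
    latest = {}  # (hour, product) -> (creation_time, floor_price)
    for hour, product, created, price in rows:
        key = (hour, product)
        if key not in latest or created > latest[key][0]:
            latest[key] = (created, price)
    totals = defaultdict(float)
    for (hour, _product), (_created, price) in latest.items():
        totals[hour] += price
    return dict(totals)
```

Inside the ORM the pick-latest-per-group step is usually expressed with a Subquery/OuterRef on the annotated hour, but with at most 24 rows per product the post-processing version above is often the simplest.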

Iterating pandas dataframe and changing values

I'm looking to predict the number of customers in a restaurant at a certain time. My data preprocessing is almost finished - I have acquired the arrival time of each customer, stored in acchour. weekday is the day of the week, 0 being Monday and 6 Sunday. Now I'd like to calculate the number of customers in the restaurant at each of those times. I figured I have to loop through the dataframe in reverse and keep adding arriving customers to the customer count at a certain time, while simultaneously keeping track of when previous customers leave. As there is no data on this, we will simply assume every guest stays for an hour.
My sketch looks like this:
exp = []  # keep track of the expiring customers
for row in reversed(df['customers']):  # start from the earliest time
    if row != 1:  # skip the first row
        res = len(exp) + 1  # amount of customers
        for i in range(len(exp) - 1, -1, -1):  # loop over exp sensibly while deleting
            # if the current time is more than an hour past the customer's arrival time
            if df['acchour'] > exp[i] + 1:
                res -= 1
                del exp[i]
        exp.append(df['acchour'])
        row = res
However, I can see that df['acchour'] is not a sensible expression, and I was wondering how to properly reference a different column on the same row. Altogether, if someone can come up with a more convenient way to solve the problem, I'd hugely appreciate it!
So you can get the total customers visiting at a specific time like so:
df.groupby(['weekday','time', 'hour'], as_index=False).sum()
Then maybe you can calculate the difference between each time window you want?
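Alternatively, since a guest arriving at time a is assumed present at time t exactly when t - 1 < a <= t, the count can be computed directly per arrival without the reverse loop. A sketch in plain Python, with acchour values taken as fractional hours (quadratic in the number of arrivals, which is fine for moderate data):

```python
def customers_present(acchours, stay=1.0):
    """For each arrival time, count the guests present at that moment,
    assuming every guest stays exactly `stay` hours."""
    arrivals = sorted(acchours)
    return [sum(1 for a in arrivals if t - stay < a <= t) for t in arrivals]
```

For arrivals at 1.0, 1.5, 2.2 and 3.0 this gives counts 1, 2, 2 and 2: the 1.0 guest has already left by 2.2, and the 1.5 guest by 3.0.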

How to create an ID starting from 1 that, after sorting, starts over each time another (string, time) variable increases by 1 (minute)?

As you can see in the attached Variables overview, I have four variables. Each ID represents a section of road, which has a speed recorded at every timestamp. However, each section has 16 subsections, each with its own speed. These 16 speeds were created as one column with 16 rows instead of 16 columns, so the IDs just run from 1-16 over and over, making them non-unique.
What I need is to create a unique ID starting from one, which means that for each timestamp I have (# of IDs * 16 subsections) IDs.
In other words, if the data is sorted by timestamp, then ID, then subsection, I need it to create an ID from 1 that starts over from 1 every time the timestamp increases by one minute.
I hope some of you can help me with this. It would be greatly appreciated.
Got it, here's an example - you could alter the new_id line if you want a different format.
def make_id(row):
    new_id = (row['ID'] - 1) * 16 + row['Segment']
    return new_id

df['UniqID'] = df.apply(make_id, axis=1)
The output would be 1 for Section 1, Subsection 1, 16 for Section 1, Subsection 16.
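The same mapping can also be written vectorised, which avoids the row-wise apply (the sample frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample: 'ID' is the section, 'Segment' the subsection (1-16).
df = pd.DataFrame({'ID': [1, 1, 2], 'Segment': [1, 16, 1]})
df['UniqID'] = (df['ID'] - 1) * 16 + df['Segment']
# df['UniqID'] is now 1, 16, 17
```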

Python sqlite3 - Truly at a loss at how I should structure my database tables

I'm making an application that allows a user to input diet and exercise information, track it over time and output graphs and charts about the data they put in. Right now I'm trying to come up with a structure for just one part(the exercise part).
The first column would be the date of the entries. Every column after that would be an exercise, along with how much weight was used for that exercise, the number of reps, and how many sets. Here's an example of what a person might enter for one day:
date = '1/13/2014'
entries = {'Bent Over Row (Barbell)': [['1', '6', '135'], ['1', '5', '155']],
           'Deadlift': [['1', '4', '315'], ['1', '2', '315']]}
The exercises would of course change from day to day, and a person wouldn't do all possible exercises in one day.
My Question:
Should I have a column for each exercise in addition to columns for the reps, weight, and sets for that exercise? (4 in total for each exercise.)
Right now I have 20 possible exercises (I intend to add more later), which would be a total of 101 columns in my table, including the date column. This seems like it would be problematic to work with... am I wrong? Or are tables typically this big? (It will probably double soon, to 200+ columns.)
How should I deal with exercises in the table the person didn't do that day? Just put 'NULL' or 'N/A'?
How do I deal with a person doing the same exercise in one day? I'm at a complete loss at this one.
Let's say that a person does two squats at different weights and reps, I feel like I need to enter in a list of lists to the table. Any better way to do this?
Thanks in advance for any reply.
You should try to make your table structures as flat as possible, i.e. definitely not a column for each exercise/set/rep. I would suggest two tables:
Exercises (just ID and Name); and
Events (ID, Date, ExerciseID, Sets, Reps, Weight).
You don't need to have entries in Events for every exercise, only the ones the person does, and you can easily group together entries in the table by e.g. ExerciseID or Date for reporting. If they do the same exercise multiple times in one day that's fine, have multiple entries in Events.
Exercises
---------
ID  Name
1   Dead lift
2   Bent over row (barbell)
...

Events
------
ID  Date       ExerciseID  Weight  Sets  Reps
1   13/1/2014  2           135     1     6
2   13/1/2014  2           155     1     5
3   13/1/2014  1           315     1     4
...
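A minimal sqlite3 sketch of this two-table layout, loaded with the sample data above (table and column names as suggested; storing dates as ISO text is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Exercises (ID INTEGER PRIMARY KEY, Name TEXT)")
cur.execute("""CREATE TABLE Events (
                   ID INTEGER PRIMARY KEY,
                   Date TEXT,
                   ExerciseID INTEGER REFERENCES Exercises(ID),
                   Weight INTEGER, Sets INTEGER, Reps INTEGER)""")
cur.executemany("INSERT INTO Exercises (ID, Name) VALUES (?, ?)",
                [(1, "Dead lift"), (2, "Bent over row (barbell)")])
cur.executemany(
    "INSERT INTO Events (Date, ExerciseID, Weight, Sets, Reps) "
    "VALUES (?, ?, ?, ?, ?)",
    [("2014-01-13", 2, 135, 1, 6),
     ("2014-01-13", 2, 155, 1, 5),
     ("2014-01-13", 1, 315, 1, 4)])

# Reporting by date/exercise is then a plain GROUP BY: here, the number
# of Events entries per exercise on one day.
rows = cur.execute("""SELECT e.Name, COUNT(*)
                      FROM Events ev JOIN Exercises e ON e.ID = ev.ExerciseID
                      WHERE ev.Date = '2014-01-13'
                      GROUP BY e.Name ORDER BY e.Name""").fetchall()
# -> [('Bent over row (barbell)', 2), ('Dead lift', 1)]
```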

SQLAlchemy: How to group by two fields and filter by date

So I have a table with a datestamp and two fields, and I want to make sure the combination of those two fields is unique within the last month.
table.id
table.datestamp
table.field1
table.field2
There should be no duplicate records with the same (field1, field2) compound value within the last month.
The steps in my head are:
Group by the two fields
Look back over the last month's data to make sure this unique grouping doesn't occur.
I've got this far, but I don't think this works:
result = session.query(table).group_by(
    table.field1,
    table.field2,
    func.month(table.timestamp))
But I'm unsure how to do this in SQLAlchemy. Could someone advise me?
Thanks very much!
The following should point you in the right direction; also see the inline comments:
qry = (session.query(
        table.c.field1,
        table.c.field2,
        # strftime for year-month works on sqlite;
        # todo: find the proper function for mysql (as in the question).
        # It is also not clear whether the MONTH part alone is enough, so
        # that May-2001 and May-2009 would be grouped together, or whether
        # YEAR-MONTH must be used.
        func.strftime('%Y-%m', table.c.datestamp),
        func.count(),
    )
    # optionally check only the last 2 months of data (may include partial months)
    .filter(table.c.datestamp > datetime.date.today() - datetime.timedelta(60))
    .group_by(
        table.c.field1,
        table.c.field2,
        func.strftime('%Y-%m', table.c.datestamp),
    )
    # comment this line out to see all the groups
    .having(func.count() > 1)
)
