I am parsing through two separate csv files with the goal of finding matching customerID's and dates to manipulate balance.
In my for loop, at some point there should be a match as I intentionally put duplicate ID's and dates in my csv. However, when parsing and attempting to match data, the matches aren't working properly even though the values are the same.
main.py:
transactions = pd.read_csv(INPUT_PATH, delimiter=',')
accounts = pd.DataFrame(
    columns=['customerID', 'MM/YYYY', 'minBalance', 'maxBalance', 'endingBalance'])
for index, row in transactions.iterrows():
    customer_id = row['customerID']
    date = formatter.convert_date(row['date'])
    minBalance = 0
    maxBalance = 0
    endingBalance = 0
    dict = {
        "customerID": customer_id,
        "MM/YYYY": date,
        "minBalance": minBalance,
        "maxBalance": maxBalance,
        "endingBalance": endingBalance
    }
    print(customer_id in accounts['customerID'] and date in accounts['MM/YYYY'])
    # Returns False
    if (accounts['customerID'].equals(customer_id)) and (accounts['MM/YYYY'].equals(date)):
        # This section never runs
        print("hello")
    else:
        print("world")
        accounts.loc[index] = dict
accounts.to_csv(OUTPUT_PATH, index=False)
Transactions CSV:
customerID,date,amount
1,12/21/2022,500
1,12/21/2022,-300
1,12/22/2022,100
1,01/01/2023,250
1,01/01/2022,300
1,01/01/2022,-500
2,12/21/2022,-200
2,12/21/2022,700
2,12/22/2022,200
2,01/01/2023,300
2,01/01/2023,400
2,01/01/2023,-700
Accounts CSV
customerID,MM/YYYY,minBalance,maxBalance,endingBalance
1,12/2022,0,0,0
1,12/2022,0,0,0
1,12/2022,0,0,0
1,01/2023,0,0,0
1,01/2022,0,0,0
1,01/2022,0,0,0
2,12/2022,0,0,0
2,12/2022,0,0,0
2,12/2022,0,0,0
2,01/2023,0,0,0
2,01/2023,0,0,0
2,01/2023,0,0,0
Expected Accounts CSV
customerID,MM/YYYY,minBalance,maxBalance,endingBalance
1,12/2022,0,0,0
1,01/2023,0,0,0
1,01/2022,0,0,0
2,12/2022,0,0,0
2,01/2023,0,0,0
Where does the problem come from
Your problem comes from the way you compare against a pandas Series. Put simply, when you write:
customer_id in accounts['customerID']
you are checking whether customer_id is an index label of the Series accounts['customerID'], whereas you want to check its values.
In your if statement, you are using the pd.Series.equals method. Here is what the documentation says the method does:
This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.
So equals compares one Series or DataFrame against another, which is different from what you are trying to do: a scalar such as customer_id never compares equal to a whole Series.
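To make the difference concrete, here is a minimal sketch. The two-row accounts frame and its IDs (10 and 20) are made up for illustration, chosen so that the stored values differ from the default index labels 0 and 1.

```python
import pandas as pd

# Hypothetical data: the IDs differ from the default index labels (0, 1).
accounts = pd.DataFrame({"customerID": [10, 20], "MM/YYYY": ["12/2022", "01/2023"]})

# `in` on a Series tests the index labels, not the stored values.
in_index = 10 in accounts["customerID"]            # 10 is not an index label
label_hit = 0 in accounts["customerID"]            # 0 is an index label
in_values = 10 in accounts["customerID"].values    # 10 is a stored value

# Series.equals expects another Series; a scalar is never "equal" to it.
equals_scalar = accounts["customerID"].equals(10)

print(in_index, label_hit, in_values, equals_scalar)
```

This is why the print in the question shows False and the if branch never runs, even when the ID is present in the column.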
One of many solutions
There are multiple ways to achieve what you're trying to do; the easiest is simply to get the values from the Series before doing the comparison:
customer_id in accounts['customerID'].values
Note that accounts['customerID'].values returns a NumPy array of the values of your Series.
So your comparison should be something like this:
print(customer_id in accounts['customerID'].values and date in accounts['MM/YYYY'].values)
And use the same thing in your if statement:
if (customer_id in accounts['customerID'].values and date in accounts['MM/YYYY'].values):
Alternative solutions
You can also use pandas.Series.isin, which takes a list-like of values and returns a boolean Series showing whether each element of the Series is among them. You then only need to check whether that boolean Series contains at least one True value, for example with .any().
Documentation of isin : https://pandas.pydata.org/docs/reference/api/pandas.Series.isin.html
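As a rough sketch of the isin approach, the helper below (has_match is a hypothetical name, not part of pandas) combines the two boolean Series with & so that the ID and the date must match on the same row. That is stricter than testing each column separately, which would also report a match when the ID appears in one row and the date in a different one.

```python
import pandas as pd

# Hypothetical accounts table for illustration.
accounts = pd.DataFrame({
    "customerID": [1, 2],
    "MM/YYYY": ["12/2022", "01/2023"],
})

def has_match(accounts, customer_id, date):
    """Return True if some row holds both this customer ID and this date."""
    same_id = accounts["customerID"].isin([customer_id])   # boolean Series
    same_date = accounts["MM/YYYY"].isin([date])           # boolean Series
    return bool((same_id & same_date).any())

print(has_match(accounts, 1, "12/2022"))  # ID and date sit on the same row
print(has_match(accounts, 1, "01/2023"))  # both exist, but on different rows
```

Whether you need the row-wise check depends on what counts as a duplicate; for a (customer, month) pair it is probably what you want.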
It is not clear from the question what the formatter.convert_date function does, but from the example CSVs you added it seems it should do something like:
def convert_date(mmddyy):
    (mm, dd, yy) = mmddyy.split('/')
    return mm + '/' + yy
In addition, make sure that the data types are also equal (both date fields are strings, and likewise for the customer ID).
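The type point can be checked without pandas. In this small sketch, convert_date follows the guess above about what the asker's formatter does, and normalize_id is a hypothetical helper that forces IDs to a common type before comparing.

```python
def convert_date(mmddyyyy):
    """Turn 'MM/DD/YYYY' into 'MM/YYYY' (assumed behaviour of the asker's formatter)."""
    month, _day, year = mmddyyyy.split("/")
    return f"{month}/{year}"

def normalize_id(customer_id):
    """Hypothetical helper: compare IDs as stripped strings, whatever type they arrive in."""
    return str(customer_id).strip()

print(convert_date("12/21/2022"))
print(1 == "1")                              # an int never equals its string form
print(normalize_id(1) == normalize_id("1"))  # equal once both are normalized
```

A mismatch like this typically appears when one file is read with numeric IDs and the other with string IDs.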
I have been trying to understand a piece of code that includes Try and Except that filters data based on specific/required dates:
required_date = '2021-02-11'
index_for_date = (data_dict['date'] == required_date)
data_filtered_by_date = {}
for key in data_dict.keys():
    try:
        data_filtered_by_date[key] = np.float_(data_dict[key][index_for_date])
    except:
        data_filtered_by_date[key] = data_dict[key][index_for_date]
I do not understand why the try and except would be used or how the whole code functions. I have researched specifics such as np.float_ and why there are two sets of square brackets next to each other (e.g. [key][index_for_date]: why are they together?). Hopefully I can get some clarification on this code, as I am very new to Python and have not been able to find an answer through my own research.
Let's start with an explanation of your code. data_dict is a dictionary of data columns, one of which is the 'date' column on which you want to filter. index_for_date = (data_dict['date'] == required_date) constructs a Boolean index over the rows: an array that is False everywhere except where the date matches the desired one.
You then loop over the columns, each of which is data_dict[key]. For each column, you select the entries matching the date using the Boolean index: data_dict[key][index_for_date]. That is why there are two sets of square brackets: the first is dict indexing, the second is Boolean array indexing.
In the try clause, you try to cast the selected values to floats using np.float_. If this fails (it raises an exception), the except clause falls back to the 'raw', uncast values.
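Here is a small runnable sketch of the same pattern. The sample columns are invented, and two things are deliberately changed from the original: it uses .astype(np.float64) instead of np.float_ (that alias was removed in NumPy 2.0), and it catches only ValueError and TypeError rather than using a bare except.

```python
import numpy as np

# Hypothetical sample data: every column is an array of strings.
data_dict = {
    "date": np.array(["2021-02-10", "2021-02-11", "2021-02-11"]),
    "price": np.array(["1.5", "2.5", "3.5"]),
    "ticker": np.array(["A", "B", "C"]),
}

required_date = "2021-02-11"
# Boolean index: True only for the rows whose date matches.
index_for_date = data_dict["date"] == required_date

data_filtered_by_date = {}
for key, column in data_dict.items():
    selected = column[index_for_date]  # Boolean array indexing
    try:
        # Numeric-looking columns become float arrays.
        data_filtered_by_date[key] = selected.astype(np.float64)
    except (ValueError, TypeError):
        # Non-numeric columns (dates, tickers) keep their raw values.
        data_filtered_by_date[key] = selected

print(data_filtered_by_date)
```

The try/except is simply a way of saying "convert to float where possible, otherwise leave the column alone" without having to know in advance which columns are numeric.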
I want to make a query in Django that groups by day, but I also want the days with no results to be filled with 0. Is that possible?
# I use the following query
(
    AccessLog.objects
    .filter(attempt_time__gte=last_30_days)
    .annotate(day=TruncDay('attempt_time'))
    .values('day', 'username')
    .annotate(c=Count('username'))
    .order_by('day')
)
No, it is not possible with annotations. An annotation has to produce values of a single type. For example, the Coalesce function requires its arguments to have similar types, and mixing datetimes and numbers will result in a database error. The same goes for the Case function: there is only one output field per result.
The function TruncDay returns a DateTime (in this case) with the fields below Day set to their minimum value, so for instance 2015-06-15 14:30:50.000321+00:00 will be converted to 2015-06-15 00:00:00+00:00, as the documentation outlines. An annotated value cannot be an integer in some rows and a datetime object in others.
In such situations, a common way to mark a value as missing ("None") is to set it to the minimum or maximum possible value (assuming a real value can never be equal to it), for instance:
AccessLog.objects.filter(
    attempt_time__gte=last_30_days
).annotate(
    day=Coalesce(TruncDay('attempt_time'), datetime.min)
).values('day', 'username').annotate(
    c=Count('username')
).order_by('day')
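If the goal is only to show a 0 for the days that have no rows, one possible workaround is to fill the gaps in Python after the query has run. The sketch below uses no Django at all: fill_missing_days is a hypothetical helper, and rows stands in for the list of dicts that the .values(...).annotate(...) query returns. As a simplification, it sums the counts of all usernames per day.

```python
from datetime import date, datetime, timedelta

def fill_missing_days(rows, start, end):
    """Return one (day, count) pair per calendar day from start to end inclusive."""
    counts = {}
    for row in rows:
        day = row["day"]
        # TruncDay yields datetimes; reduce them to dates so the keys line up.
        if isinstance(day, datetime):
            day = day.date()
        counts[day] = counts.get(day, 0) + row["c"]

    result = []
    current = start
    while current <= end:
        result.append((current, counts.get(current, 0)))  # 0 for days without rows
        current += timedelta(days=1)
    return result

# Hypothetical query result: nothing was logged on 2 and 4 January.
rows = [
    {"day": datetime(2024, 1, 1), "username": "a", "c": 2},
    {"day": datetime(2024, 1, 1), "username": "b", "c": 1},
    {"day": datetime(2024, 1, 3), "username": "a", "c": 4},
]
filled = fill_missing_days(rows, date(2024, 1, 1), date(2024, 1, 4))
print(filled)
```

With timezone-aware datetimes you may want to convert to the local timezone before calling .date(); that detail is left out here.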
I'm building a booking form, and want to allow users to pick a date of booking from available dates in the next 60 days.
I get the next 60 days by:
base = datetime.date.today()
date_list = [base + datetime.timedelta(days=x) for x in range(60)]
Then I subtract already booked dates which are stored in the db:
bookings = list(Booking.objects.all())
primarykeys = []
unav = []
for b in bookings:
    primarykeys.append(b.pk)
for p in primarykeys:
    unav.append(Booking.objects.get(pk=p).booking_date)
for date in unav:
    if date in date_list:
        date_list.remove(date)
Then I change the result into a tuple for the form (not sure if this is right?):
date_list = tuple(date_list)
Then I pass it into the form field as such:
booking_date = forms.ChoiceField(choices=date_list, required=True)
This gives me an error of cannot unpack non-iterable datetime.date object
And now I am stumped... how can I do this? I have a feeling I'm on completely the wrong path.
Thanks in advance
The docs for Django Form fields say the following:
choices
Either an iterable of 2-tuples to use as choices for this field, or a callable that returns such an iterable. This argument accepts the
same formats as the choices argument to a model field. See the model
field reference documentation on choices for more details. If the
argument is a callable, it is evaluated each time the field’s form is
initialized. Defaults to an empty list.
It looks like what you're passing is a tuple in this format:
(date object, date object, ...)
But you need to be passing something like a list of 2-tuples, with the first element of each tuple being the value stored for each choice, and the second element being the value displayed to the user in the form:
[(date_object, date_string), (date_object, date_string), ...]
Change your code to the following and see if that works for you:
base = datetime.date.today()
date_set = set([base + datetime.timedelta(days=x) for x in range(60)])
booking_dates = set(Booking.objects.all().values_list('booking_date', flat=True))
valid_dates = date_set - booking_dates
date_choices = sorted([(valid_date, valid_date.strftime('%Y-%m-%d')) for valid_date in valid_dates],
key=lambda x: x[0])
I've used sets to make it simpler to ensure unique values and to subtract one collection from the other without multiple for loops. You can use values_list with flat=True to get all the existing booking dates, then create a list of 2-tuples, date_choices, with the actual date object as the value and a string representation, in whatever format you choose using strftime, as the label shown to the user.
The choices are then sorted in ascending date order using sorted, keyed on the first element of each tuple, since sets do not preserve any order.
Then take a look at this question to see how you can pass these choices into the form from your view, as I don't think it's good to try to dynamically set the choices when defining the Form class itself.
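The choice-building step can be tried out without a database. In this sketch, build_date_choices is a hypothetical helper and booked_dates stands in for the result of the values_list query. One assumption differs from the snippet above: the stored value is the ISO date string rather than the date object, since a ChoiceField hands the selected value back as a string anyway.

```python
import datetime

def build_date_choices(base, booked_dates, days=60):
    """Return sorted (value, label) 2-tuples for the unbooked dates in the window."""
    candidates = {base + datetime.timedelta(days=offset) for offset in range(days)}
    available = sorted(candidates - set(booked_dates))  # set difference, then re-sort
    # Value: ISO string (assumption); label: day/month/year for display.
    return [(d.isoformat(), d.strftime("%d/%m/%Y")) for d in available]

# Hypothetical example: a 3-day window with the middle day already booked.
choices = build_date_choices(
    datetime.date(2024, 1, 1),
    booked_dates=[datetime.date(2024, 1, 2)],
    days=3,
)
print(choices)
```

Each element is a 2-tuple, which is the shape that ChoiceField's choices argument expects and what the "cannot unpack" error was complaining about.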
I have model like this:
class Order(models.Model):
    dateTime = models.DateTimeField()
I want to get the object with a specific hour.
How can I do that?
The code below doesn't work:
o=Order.objects.get(dateTime.hour=12)
It fails with this error: keyword can't be an expression
How should I get the Order object with a specific hour?
The following will give you all the objects whose hour value is 12:
o = Order.objects.filter(dateTime__hour=12)
It can be used in place of
o = Order.objects.get(dateTime__hour=12)
which returns a single object and therefore only works if exactly one object has that hour.
If you already know that the hour value is unique, you should use the latter.
https://docs.djangoproject.com/en/1.9/ref/models/querysets/#hour