I have the following data:
-----------------------
| code | name   | qty |
-----------------------
| FZH  | apple  |   3 |
-----------------------
| ZH2  | orange |   7 |
-----------------------
| H26  | mt dew |   5 |
-----------------------
| 6YS  | pear   |   7 |
-----------------------
| LKZ  | coke   |   4 |
-----------------------
Using pandas, I want to sum the apple, orange, and pear quantities and write
| 2DC | FRUIT | 17 |
The actual list of fruit is a lot longer. "Bananas" is sometimes on the list for the day, and I want it summed too, but skipped if it is not found.
I want to do a similar thing with all of the sodas, using my list of possible sodas needed for the day.
It looks like you're trying to filter, then sum. You can use loc, isin, and sum:
fruits = ["apple", "orange", "pear"]
df.loc[df["name"].isin(fruits)].qty.sum()
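Since isin simply ignores names that are not in the frame, "bananas" (or any missing soda) is skipped automatically on days it is absent. Here is a sketch of writing the summary row back, using the sample data above; the concat step and the row layout are my additions, and only the 2DC / FRUIT / 17 row comes from the question:
import pandas as pd

df = pd.DataFrame({"code": ["FZH", "ZH2", "H26", "6YS", "LKZ"],
                   "name": ["apple", "orange", "mt dew", "pear", "coke"],
                   "qty": [3, 7, 5, 7, 4]})

# names that are absent today (e.g. "bananas") are simply not matched by isin
fruits = ["apple", "orange", "pear", "bananas"]
sodas = ["mt dew", "coke"]

fruit_total = df.loc[df["name"].isin(fruits), "qty"].sum()          # 17
fruit_row = pd.DataFrame([{"code": "2DC", "name": "FRUIT", "qty": fruit_total}])
df = pd.concat([df, fruit_row], ignore_index=True)

# the sodas work the same way, with whatever code that summary row should use
soda_total = df.loc[df["name"].isin(sodas), "qty"].sum()            # 9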
Let me begin by noting that this question is very close to this question about getting non-zero values for each column in a pandas dataframe, but in addition to getting the values, I would also like to know the row each value was drawn from. (And, ultimately, I would like to be able to reuse the code to find columns in which a non-zero value occurred some number x of times.)
What I have is a dataframe with a count of words for a given year of documents:
| Year / Term | word1 | word2 | word3 | ... | wordn |
|-------------|-------|-------|-------|-----|-------|
| 2001        |    23 |     0 |     0 |     |     0 |
| 2002        |     0 |     0 |    12 |     |     0 |
| 2003        |     0 |    42 |    34 |     |     0 |
| year(n)     |     0 |     0 |     0 |     |    45 |
So for word1 I would like to get both 23 and 2001 -- this could be as a tuple or as a dictionary. (It doesn't matter so long as I can work through the data.) And ultimately, I would like very much to be able to discover that word3 enjoyed a two-year span of usage.
FTR, the dataframe has only 16 rows, but it has a lot, a lot of columns. If there's an answer to this question already available, revealing the weakness of my search fu, I will take the scorn as my just due.
In your case, melt then groupby:
df.melt('Year / Term').loc[lambda x: x['value'] != 0].groupby('variable')['value'].apply(tuple)
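A slightly longer sketch of the same melt-then-groupby idea, using the sample table above, that also keeps the year each non-zero value came from; the var_name/value_name labels and the zip pairing are my additions:
import pandas as pd

df = pd.DataFrame({"Year / Term": [2001, 2002, 2003, 2004],  # 2004 stands in for year(n)
                   "word1": [23, 0, 0, 0],
                   "word2": [0, 0, 42, 0],
                   "word3": [0, 12, 34, 0],
                   "wordn": [0, 0, 0, 45]})

long_df = df.melt("Year / Term", var_name="word", value_name="count")
nonzero = long_df.loc[long_df["count"] != 0]

# one list of (year, count) tuples per word
pairs = nonzero.groupby("word")[["Year / Term", "count"]].apply(
    lambda g: list(zip(g["Year / Term"], g["count"])))
print(pairs)  # e.g. word3 -> [(2002, 12), (2003, 34)]

# words whose non-zero counts span two or more years
years_used = nonzero.groupby("word")["Year / Term"].nunique()
print(years_used[years_used >= 2])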
I have a problem with one of my projects at school: I am attempting to rearrange my data.
You can see how the data is arranged in this picture, which contains a sample of the data I am referring to.
This is the format I am attempting to reach:
Company name | activity description | year | variable 1 | variable 2 | ......
company 1    |                      | 2011 |            |            |
company 1    |                      | 2012 |            |            |
..... (one row for every year, from 2014 to 2015 inclusive)
company 2    |                      | 2011 |            |            |
company 2    |                      | 2012 |            |            |
..... (one row for every year, from 2014 to 2015 inclusive)
for every single one of the 10 companies. This is a sample of my whole dataset, which contains more than 15,000 companies. I attempted to create a dataframe of the size I want, but I have problems filling it with the data I want and in the format I want. I am fairly new to Python. Could anyone help me, please?
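Without the picture it is hard to know the exact starting layout, but if the yearly values live in columns like variable1_2011, variable1_2012, and so on (an assumption), pandas.wide_to_long gives one row per company per year. A sketch under those assumed column names:
import pandas as pd

# Hypothetical wide layout; the real column names come from the picture.
wide = pd.DataFrame({
    "Company name": ["company 1", "company 2"],
    "activity description": ["retail", "mining"],
    "variable1_2011": [10, 20], "variable1_2012": [11, 21],
    "variable2_2011": [5, 6],   "variable2_2012": [7, 8],
})

long = pd.wide_to_long(
    wide,
    stubnames=["variable1", "variable2"],        # one stub per variable
    i=["Company name", "activity description"],  # identifies each company
    j="year",
    sep="_",
).reset_index()

print(long.sort_values(["Company name", "year"]))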
I have a question about Django and how to use it to solve the problem below.
Suppose you have these two tables:
Products table
-----------------------------------------------
| id | productName | description | id_country |
-----------------------------------------------
| 1  | x           | fzefzzezfz  | 1          |
-----------------------------------------------
| 2  | y           | zoinojnfze  | 1          |
-----------------------------------------------
| 3  | az          | ononbonoj   | 2          |
-----------------------------------------------
country table
-----------------
| id | name   |
-----------------
| 1  | france |
-----------------
| 2  | spain  |
-----------------
and these urls:
http://www.exemple.com/list/ (list all products)
http://www.exemple.com/add/ (add a new product)
http://www.exemple.com/detail/1 (print details about product with id=1)
What I want to do is allow visitors of the website to set a filter for the duration of their navigation, so that every time the product list is displayed, only products from France or Spain are shown, depending on the filter.
I could use french.exemple.com or spain.exemple.com to filter the results, but I don't want to duplicate the code for every subdomain.
How would you handle this problem?
You've said it yourself in the question tags: use the session.
When the user chooses a country, set that value in the request.session dict; then, in each of your views, filter the products by that value.
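A minimal sketch of that, assuming a Product model with a foreign key to Country named country; the view names, URL name, session key, and template path are placeholders:
# views.py
from django.shortcuts import redirect, render

from .models import Product

def set_country(request, country_id):
    # remember the visitor's choice for the rest of their session
    request.session["country_id"] = country_id
    return redirect("product_list")

def product_list(request):
    products = Product.objects.all()
    country_id = request.session.get("country_id")
    if country_id:
        products = products.filter(country_id=country_id)
    return render(request, "products/list.html", {"products": products})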
This is a complicated one, but I suspect there's some principle I can apply to make it simple - I just don't know what it is.
I need to parcel out presentation slots to a class full of students for the semester. There are multiple possible dates, and multiple presentation types. I conducted a survey where students could rank their interest in the different topics. What I'd like to do is get the best (or at least a good) distribution of presentation slots to students.
So, what I have:
List of 12 dates
List of 18 students
CSV file where each student (row) has a rating 1-5 for each date
What I'd like to get:
Each student should have one of presentation type A (intro), one of presentation type B (figures) and 3 of presentation type C (aims)
Each date should have at least 1 of each type of presentation
Each date should have no more than 2 of type A or type B
Try to give students presentations that they rated highly (4 or 5)
I should note that I realize this looks like a homework problem, but it's real life :-). I was thinking that I might make a Student class for each student that contains the dates for each presentation type, but I wasn't sure what the best way to populate it would be. Actually, I'm not even sure where to start.
TL;DR: I think you're giving your students too much choice :D
But I had a shot at this problem anyway. Pretty fun exercise actually, although some of the constraints were a little vague. Most of all, I had to guess what the actual students' preference distribution would look like. I went with uniformly distributed, independent variables, although that's probably not very realistic. Still I think it should work just as well on real data as it does on my randomly generated data.
I considered brute forcing it, but a rough analysis gave me an estimate of over 10^65 possible configurations. That's kind of a lot. And since we don't have a trillion trillion years to consider all of them, we'll need a heuristic approach.
Because of the size of the problem, I tried to avoid doing any backtracking. But this meant that you could get stuck; there might not be a solution where everyone only gets dates they gave 4's and 5's.
I ended up implementing a double-edged Iterative Deepening-like search, where both the best case we're still holding out hope for (i.e., assign students to a date they gave a 5) and the worst case scenario we're willing to accept (some student might have to live with a 3) are gradually lowered until a solution is found. If we get stuck, reset, lower expectations, and try again. Tasks A and B are assigned first, and C is done only after A and B are complete, because the constraints on C are far less stringent.
I also used a weighting factor to model the trade-off between maximizing student happiness and satisfying the types-of-presentations-per-day limits.
Currently it seems to find a solution for pretty much every randomly generated set of preferences. I included an evaluation metric: the ratio between the sum of the preference values of all assigned student/date combos and the sum of every student's ideal (top 3) preference values. For example, if student X had two fives, one four and the rest threes on his list, and is assigned to one of his fives and two threes, he gets 5+3+3=11 but could ideally have gotten 5+5+4=14; he is 11/14 = 78.6% satisfied.
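As a minimal sketch of that metric (assuming each student's ratings are plain lists; the real script works on its Student and Date objects):
def satisfaction(assigned_ratings, all_ratings, n_slots=3):
    # ratio of what the student actually got to the best they could have
    # gotten, i.e. the sum of their n_slots highest-rated dates
    ideal = sum(sorted(all_ratings, reverse=True)[:n_slots])
    return sum(assigned_ratings) / ideal

# student X from the example: assigned 5, 3 and 3 while their top three
# ratings were 5, 5 and 4  ->  11/14, about 78.6% satisfied
print(satisfaction([5, 3, 3], [5, 5, 4, 3, 3, 3]))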
After some testing, it seems that my implementation tends to produce an average student satisfaction of around 95%, a lot better than I expected :) But again, that is with fake data. Real preferences are probably more clumped and harder to satisfy.
Below is the core of the algorithm. The full script is ~250 lines and a bit too long to include here, I think. Check it out on GitHub.
...
# Assign a date for a given task to each student,
# preferring a date that they like and is still free.
def fill(task, lowest_acceptable, spread_weight=0.1, tasks_to_spread="ABC"):
    random_order = list(range(nStudents))  # randomize student order, so everyone
    random.shuffle(random_order)           # has an equal chance to get their first pick
    for i in random_order:
        student = students[i]
        if student.dates[task]:  # student is already assigned for this task?
            continue
        # get available dates ordered by preference and how fully booked they are
        preferred = get_favorite_day(student, lowest_acceptable,
                                     spread_weight, tasks_to_spread)
        for date_nr in preferred:
            date = dates[date_nr]
            if date.is_available(task, student.count, lowest_acceptable == 1):
                date.set_student(task, student.count)
                student.dates[task] = date
                break

# attempt to "fill()" the schedule while gradually lowering expectations
start_at = 5
while start_at > 1:
    lowest_acceptable = start_at
    while lowest_acceptable > 0:
        fill("A", lowest_acceptable, spread_weight, "AAB")
        fill("B", lowest_acceptable, spread_weight, "ABB")
        if lowest_acceptable == 1:
            fill("C", lowest_acceptable, spread_weight_C, "C")
        lowest_acceptable -= 1
    # (in the full script the schedule is checked here; if students are still
    #  unassigned, everything is reset and start_at is lowered before retrying)
And here is an example result as printed by the script:
Date
================================================================================
Student | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
================================================================================
1 | | A | B | | C | | | | | | | |
2 | | | | | A | | | | | B | C | |
3 | | | | | B | | | C | | A | | |
4 | | | | A | | C | | | | | | B |
5 | | | C | | | | A | B | | | | |
6 | | C | | | | | | | A | B | | |
7 | | | C | | | | | B | | | | A |
8 | | | A | | C | | B | | | | | |
9 | C | | | | | | | | A | | | B |
10 | A | B | | | | | | | C | | | |
11 | B | | | A | | C | | | | | | |
12 | | | | | | A | C | | | | B | |
13 | A | | | B | | | | | | | | C |
14 | | | | | B | | | | C | | A | |
15 | | | A | C | | B | | | | | | |
16 | | | | | | A | | | | C | B | |
17 | | A | | C | | | B | | | | | |
18 | | | | | | | C | A | B | | | |
================================================================================
Total student satisfaction: 250/261 = 95.79%