Best practice to store the order of table rows? - python

I have a table with columns like these:

class AgeAndName(models.Model):
    name = m.CharField(max_length=20)
    age = m.IntegerField()

name   age
-----  ---
John    22
Hitch   38
Heiku   25
Taro    36
Cho     40
Now I want to allow the user to sort the rows as he likes, and keep that order. I can think of two ways.

1. Make a new column and keep the order in it:
class AgeAndName(models.Model):
    name = m.CharField(max_length=20)
    age = m.IntegerField()
    order = m.IntegerField()

name   age  order
-----  ---  -----
John    22      1
Hitch   38      5
Heiku   25      3
Taro    36      4
Cho     40      2
2. Make one class-level property on the model and keep the order there:

class AgeAndName(models.Model):
    # class member??? (I am not sure I can have this kind of thing)
    order = (0, 4, 2, 3, 1)
    name = m.CharField(max_length=20)
    age = m.IntegerField()
Which one is best practice for Django? Or is there any other good way?
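For what it's worth, option 1 (an explicit order column) is the commonly recommended pattern, since the order survives restarts and can be used in ordinary queries. The idea can be sketched without Django at all, using the stdlib sqlite3 module; the table and column names below just mirror the model above and are otherwise made up:

```python
import sqlite3

# In-memory table mirroring option 1: an explicit integer "order" column.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE age_and_name (name TEXT, age INTEGER, "order" INTEGER)')
rows = [("John", 22, 1), ("Hitch", 38, 5), ("Heiku", 25, 3),
        ("Taro", 36, 4), ("Cho", 40, 2)]
con.executemany("INSERT INTO age_and_name VALUES (?, ?, ?)", rows)

# Reading back sorted by the stored order; in Django this would be
# AgeAndName.objects.order_by("order"), or Meta.ordering on the model.
ordered = [r[0] for r in con.execute(
    'SELECT name FROM age_and_name ORDER BY "order"')]
print(ordered)  # ['John', 'Cho', 'Heiku', 'Taro', 'Hitch']
```

Note that `order` has to be double-quoted in raw SQL because it is a keyword; Django handles that quoting for you.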


How to save pandas dataframe rows as separate files with the first row fixed for all?

I have a DataFrame with multiple columns and rows. The rows are student names with marks, and the columns are marking criteria. I want to save the first row (the column names) along with each row in separate files, with the student's name as the file name.
Example of my data:

Marking_Rubric | Requirements and Delivery\nWeight 45.00% | Coding Standards\nWeight 10.00% | Documentation\nWeight 25.00% | Runtime - Effectiveness\nWeight 10.00% | Efficiency\nWeight 10.00% | Total | Comments
John Doe       | 54 | 50 | 90 | 45 | 50 | 31 | Limited documentation
Jane Doe       | 23 | 12 | 87 | 10 | 34 | 98 | No comments
Desired output:

File 1:

Marking_Rubric | Requirements and Delivery | Coding Standards | Documentation | Runtime - Effectiveness | Efficiency | Total | Comments
John Doe       | 54 | 50 | 90 | 45 | 50 | 31 | Limited documentation

File 2:

Marking_Rubric | Requirements and Delivery | Coding Standards | Documentation | Runtime - Effectiveness | Efficiency | Total | Comments
Jane Doe       | 23 | 12 | 87 | 10 | 34 | 98 | No comments
Just note that you have to have a unique name to save a file. Otherwise files with the same name will overwrite each other.
# `````````````````````````````````````````````````````````````````````````
### create dummy data
import pandas as pd

column1_list = ['John Doe', 'John Doe', 'Not John Doe', 'special ß ß %&^ character name', 'no special character name again']
column2_list = [53, 23, 100, 0, 10]
column3_list = [50, 12, 200, 0, 10]
df = pd.DataFrame({'Marking_Rubric': column1_list,
                   'Requirements and Delivery': column2_list,
                   'Coding Standards': column3_list})
# `````````````````````````````````````````````````````````````````````````
### create unique identifier that will be used as name of file, otherwise
### you will overwrite files with the same name
df['row_number'] = df.index
df['Marking_Rubric_Rowed'] = df.Marking_Rubric + " " + df.row_number.astype(str)
df
Output 1
# `````````````````````````````````````````````````````````````````````````
### create a loop the length of your dataframe and save each row as a csv
for x in range(0, len(df)):
    ### try to save file
    try:
        ### get your current row of data first, then select the name of your file;
        ### if you want another name just change the column
        df[x:x+1].to_csv(df[x:x+1].Marking_Rubric_Rowed.iloc[0] + '.csv',  #### selecting name for your file here
                         index=False)
    ### catch and print out exception if something went wrong
    except Exception as e:
        print(e)
        ### continue your loop; you could also put "break" to break your loop
        continue
Output 2

Multi-criteria pandas dataframe exceptions reporting

Given the following pandas df -
Holding Account                        | Entity ID | Holding Account Number | % Ownership | Entity ID % | Account # % | Ownership Audit Note
11 West Summit Drive LLC (80008660955) | 3423435   | 54353453454            | 100         | 100         | 100         | NaN
110 Goodwill LLC (91928475)            | 7653453   | 65464565               | 50          | 50          | 50          | Partial Ownership [50.00%]
1110 Webbers St LLC (14219739)         | 1235734   | 12343535               | 100         | 100         | 100         | NaN
120 Goodwill LLC (30271633)            | 9572953   | 96839592               | 55          | 55          | 55          | Inactive Client [10.00%]
Objective - I am trying to create an Exceptions Report and only include rows based on the following logic:

1. % Ownership != 100%, OR
2. (Ownership Audit Note == "-") & (Account # % OR Entity ID % == 100%)
Attempt - I am able to produce the components which make up my required logic, but I can't seem to bring them together:
# This gets me rows which meet 1.
df = df[df['% Ownership'].eq(100)==False]
# Something 'like' this would get me 2.
df = df[df['Ownership Audit Note'] == "-"] & df[df['Account # %'|'Entity ID %'] == "None"]
I am looking for some hints/tips to help me bring all this together in the most pythonic way.
Use:
df = df[df['% Ownership'].ne(100) |
        (df['Ownership Audit Note'].eq("-") &
         (df['Account # %'].eq(100) | df['Entity ID %'].eq(100)))]
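As a runnable sanity check, here is that expression applied to a small frame mimicking the sample rows above (column names assumed to match the question exactly; only a subset of columns is reproduced):

```python
import pandas as pd

# Minimal frame mimicking the question's sample data.
df = pd.DataFrame({
    "Holding Account": ["11 West Summit Drive LLC", "110 Goodwill LLC",
                        "1110 Webbers St LLC", "120 Goodwill LLC"],
    "% Ownership": [100, 50, 100, 55],
    "Entity ID %": [100, 50, 100, 55],
    "Account # %": [100, 50, 100, 55],
    "Ownership Audit Note": [None, "Partial Ownership [50.00%]", None,
                             "Inactive Client [10.00%]"],
})

# Keep rows where ownership != 100, OR where the audit note is "-" while
# one of the percentage columns is exactly 100.
out = df[df["% Ownership"].ne(100) |
         (df["Ownership Audit Note"].eq("-") &
          (df["Account # %"].eq(100) | df["Entity ID %"].eq(100)))]
print(out["Holding Account"].tolist())  # ['110 Goodwill LLC', '120 Goodwill LLC']
```

Only the two partial-ownership rows survive here, since no row has an audit note equal to "-".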

For loop store outcome in variable

I'm learning to web-scrape data so I can use it to practice data visualization, and I am following a tutorial but can't get the same results. The problem is that I have a for loop but can't seem to store the data in a variable: when I run the loop and try to store the results in a variable I get only one result, but when I print directly inside the loop I get all the data.
Can someone explain what I'm doing wrong?
for age in team_riders:
    print(age.find('div', class_='age').text)
Results:
30
28
28
22
34
28
25
30
30
30
34
32
33
32
24
27
23
26
22
27
30
28
24
26
21
26
36
26
27
22
32
30
for age in team_riders:
    age = age.find('div', class_='age').text
print(age)
prints:
30
Define an empty list before your loop and append() the single results from the loop to this list:
lst = []
for age in team_riders:
    lst.append(age.find('div', class_='age').text)
You can also use this one-liner:

lst = [age.find('div', class_='age').text for age in team_riders]
Because you are storing the data in a single variable, you keep only the last record (each cycle overwrites the one before). If you want to store every record, you need a data structure, for example a list, and append()
to add each record to it.
So your code will be:

ages = []
for age in team_riders:
    ages.append(age.find('div', class_='age').text)
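The overwrite behaviour is easy to reproduce without any scraping; in this sketch plain strings stand in for the BeautifulSoup tags:

```python
# Stand-in for the scraped elements: each item plays the role of a rider tag.
team_riders = ["30", "28", "22"]

# Rebinding the loop variable overwrites it on every pass, so after the
# loop only the last value remains -- the question's second snippet does this.
for age in team_riders:
    age = age.strip()
print(age)   # 22

# Appending (here via a list comprehension) keeps every value.
ages = [age.strip() for age in team_riders]
print(ages)  # ['30', '28', '22']
```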

Duplicate Information

I have a df that contains the columns [CPF, name, age].
I need to find the CPFs that are repeated in the base and return each person's name together with the CPF.
So far I've done this:

TrueDuplicat = base.groupby(['CPF']).size().reset_index(name='count')
TrueDuplicat = TrueDuplicat[TrueDuplicat['count']>1]

When I put:

TrueDuplicat = TrueDuplicat[['name','CPF']]

I get the error "['name'] not in index".
How do I get the duplicate CPFs with each person's name?
Example of the DF:

CPF          name   age
38445675455  Alex   15
54785698574  Ana    25
38445675455  Bento  22
65878584558  Caio   33
After your groupby, you do not have a name column in TrueDuplicat. For the example you have posted, TrueDuplicat is:
CPF count
0 38445675455 2
If you're looking for the names corresponding to the CPF values in TrueDuplicat, you can do something like
df[df['CPF'].isin(TrueDuplicat['CPF'].tolist())]
which, for your example, will yield
CPF name age
0 38445675455 Alex 15
2 38445675455 Bento 22
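For reference, pandas' built-in duplicated() with keep=False reaches the same result in one step; this is a sketch on the sample data that swaps the groupby/size approach for the built-in:

```python
import pandas as pd

# The question's sample data.
base = pd.DataFrame({"CPF": [38445675455, 54785698574, 38445675455, 65878584558],
                     "name": ["Alex", "Ana", "Bento", "Caio"],
                     "age": [15, 25, 22, 33]})

# keep=False marks *every* row whose CPF appears more than once,
# not just the second and later occurrences.
dups = base[base.duplicated("CPF", keep=False)]
print(dups["name"].tolist())  # ['Alex', 'Bento']
```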

How to structure complex query

I have three tables in a SQL database that look like this:
D: Location of dealers

dealer  zip    affiliate
AAA     32313  Larry
BBB     32322  John

O: Sales record

customer             affiliate  zip    count
John's Construction  Larry      35331  3
Bill's Sales         John       12424  300
Jim's Searching      Larry      14422  32

Z: Zip distance database

zip1   zip2   dist
35235  35235  20
32355  15553  14
I am trying to look at Database D (a list of dealers and their location), and see how much their estimated sales are. I am doing this by using Database O, which shows all sales to customers, as well as their location. The logic we are working with is, for each dealer, look through the Database O and find the zip that minimizes distance. We will assume that the dealer that was located closest to the sale was the one who made the sale.
I am having a lot of trouble setting up the SQL query to do this, and am wondering if SQL is even the right place to do this. I know a little python, and a good amount of R. Any help is appreciated.
The query I am currently using:
SELECT d.rowid, d.dealer, d.affiliate, o.count, MIN(z.dist)
FROM database D, database O, zip z
WHERE d.Zip = z.zip1 AND o.zip = z.zip2
GROUP BY d.rowid
I have modified your test data to test the SQL query in R, using the sqldf library.
## Your Modified Test Data
LocationOfDealers <- data.frame(dealer = c("AAA", "BBB", "CCC"), zip = c(32313, 32322, 35235), affiliate = c("Larry", "John", "Larry"))
SalesRecord <- data.frame(customer=c("John's Construction", "Bill's Sales", "Jim's Searching", "Tim's Sales"), affiliate = c("Larry", "John", "Larry", "James"), zip = c(35331, 12424, 14422, 35235), count = c(3, 300, 32, 20))
ZipDistance <- data.frame(zip1=c(35235, 32355), zip2=c(35235, 15553), dist = c(20, 14))
#LocationOfDealers
# dealer zip affiliate
#1 AAA 32313 Larry
#2 BBB 32322 John
#3 CCC 35235 Larry
# SalesRecord
# customer affiliate zip count
# 1 John's Construction Larry 35331 3
# 2 Bill's Sales John 12424 300
# 3 Jim's Searching Larry 14422 32
# 4 Tim's Sales James 35235 20
# ZipDistance
# zip1 zip2 dist
# 1 35235 35235 20
# 2 32355 15553 14
## Sql query in R using sqldf
library(sqldf)
sqldf("
  SELECT dealer, MIN(dist) AS Min_Dist, SUM(count) AS dealer_Sold FROM (
    SELECT *
    FROM LocationOfDealers D
    INNER JOIN ZipDistance Z ON D.zip = Z.zip1
    INNER JOIN SalesRecord O ON O.zip = Z.zip2
  ) GROUP BY dealer
")
### There is only one dealer with common Zip between customer and dealers, and its min distance is 20
# dealer Min_Dist dealer_Sold
#1 CCC 20 20
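The same join-then-aggregate shape can also be checked in Python with the stdlib sqlite3 module (the same modified test data as the R answer; sqldf itself runs on SQLite by default, so the SQL carries over almost verbatim):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dealers (dealer TEXT, zip INT, affiliate TEXT);
CREATE TABLE sales   (customer TEXT, affiliate TEXT, zip INT, count INT);
CREATE TABLE zipdist (zip1 INT, zip2 INT, dist INT);

INSERT INTO dealers VALUES ('AAA', 32313, 'Larry'), ('BBB', 32322, 'John'),
                           ('CCC', 35235, 'Larry');
INSERT INTO sales VALUES ('John''s Construction', 'Larry', 35331, 3),
                         ('Bill''s Sales', 'John', 12424, 300),
                         ('Jim''s Searching', 'Larry', 14422, 32),
                         ('Tim''s Sales', 'James', 35235, 20);
INSERT INTO zipdist VALUES (35235, 35235, 20), (32355, 15553, 14);
""")

# Same joins as the sqldf answer: dealer zip -> zip1, sale zip -> zip2.
result = con.execute("""
    SELECT d.dealer, MIN(z.dist) AS min_dist, SUM(o.count) AS dealer_sold
    FROM dealers d
    JOIN zipdist z ON d.zip = z.zip1
    JOIN sales   o ON o.zip = z.zip2
    GROUP BY d.dealer
""").fetchall()
print(result)  # [('CCC', 20, 20)]
```

As in the R version, only dealer CCC shares a zip pair with a sale, so it is the only row returned.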
