How to show only one object per user if there are several in Django? - python

I'm trying to use a list, but it still shows all the values.
allexpelorer1 = Destination.objects.filter(destination=pk).order_by('-pk')
allexpelorer = []
for checkhu in allexpelorer1:
    if Destination.objects.filter(destination=pk, user_pk=checkhu.user_pk) not in allexpelorer:
        allexpelorer.append(checkhu)

From your question, what I have understood:
Tourists write about the experience of the countries that they have visited. So, obviously, tourists can write multiple reviews.
The table structure should look like the following:
CustomerReviewsTable / CountryProfileTable
Tourist_One has written 3 times for both countries.
Tourist_Two has written 1 time for each country.
Here is the query you should use:
from django.db.models import Max

country = 1  # Denmark
max_ids = (Yourmodel.objects
           .filter(country=country)
           .values('tourist', 'country')
           .annotate(max_id=Max('pk'))
           .values('max_id'))
Then query again:
result = Yourmodel.objects.filter(id__in=max_ids)
The above result queryset will return your expected result.
When country 1 is passed:
Denmark | Review1 | Tourist_One
Denmark | Review4 | Tourist_Two
When country 2 is passed:
Italy | Review3 | Tourist_One
Italy | Review5 | Tourist_Two
Now, this will not bring any duplicate comments/reviews.
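The "latest row per group" pattern behind that two-step query can be sketched in plain Python. The rows and names below are hypothetical stand-ins for the CustomerReviews table, not the actual models:

```python
# Sketch: keep only the newest review (highest pk) per tourist for one country.
# Data is made up to mirror the tables described above.
rows = [
    {"pk": 1, "country": 1, "tourist": "Tourist_One", "review": "Review1"},
    {"pk": 2, "country": 2, "tourist": "Tourist_One", "review": "Review2"},
    {"pk": 3, "country": 2, "tourist": "Tourist_One", "review": "Review3"},
    {"pk": 4, "country": 1, "tourist": "Tourist_Two", "review": "Review4"},
    {"pk": 5, "country": 2, "tourist": "Tourist_Two", "review": "Review5"},
]

def latest_reviews(rows, country):
    # First pass: the role of .values('tourist').annotate(max_id=Max('pk')).
    max_ids = {}
    for r in rows:
        if r["country"] == country:
            tourist = r["tourist"]
            if tourist not in max_ids or r["pk"] > max_ids[tourist]:
                max_ids[tourist] = r["pk"]
    # Second pass: the role of .filter(id__in=max_ids).
    return [r for r in rows if r["pk"] in max_ids.values()]

print([r["review"] for r in latest_reviews(rows, country=1)])  # ['Review1', 'Review4']
```

The first loop computes one max id per tourist; the second keeps only those rows, so no tourist appears twice.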

groupby().sum() on a pandas DataFrame is concatenating the hours, not summing them

I'm a beginner Python dev and I'm probably missing something in my code.
I make a GET request to an API and get a JSON response, then I take the elements that I need and append them to a list:
import json
from datetime import datetime, timedelta

import pandas as pd
import requests

def get_report(cookie):
    today = (datetime.today() - timedelta(days=0)).strftime("%Y-%m-%d")
    link_report = f'https://atendimento.sistemainfo.com.br/Report/WorkTimeResultToJsonAsync?StartDate={today}&EndDate={today}&TicketType=&Agents%5B0%5D.Id=16086163&Agents%5B0%5D.ToDelete=False&Agents%5B1%5D.Id=40721217&Agents%5B1%5D.ToDelete=False&Agents%5B2%5D.Id=16810784&Agents%5B2%5D.ToDelete=False&Agents%5B3%5D.Id=19891551&Agents%5B3%5D.ToDelete=False&Agents%5B4%5D.Id=23034581&Agents%5B4%5D.ToDelete=False&Agents%5B5%5D.Id=21459575&Agents%5B5%5D.ToDelete=False&Agents%5B6%5D.Id=28407059&Agents%5B6%5D.ToDelete=False&Agents%5B7%5D.Id=34454555&Agents%5B7%5D.ToDelete=False&Agents%5B8%5D.Id=28492909&Agents%5B8%5D.ToDelete=False&_=1666122828849'
    request_report = requests.get(link_report, cookies=cookie)
    reports = json.loads(request_report.content)['list']
    print(reports)
    agent = []
    appointedTime = []
    for i in reports:
        agent.append(i['agent'])
        appointedTime.append(i['appointedTime'])
    df = pd.DataFrame(reports, columns=['agent', 'appointedTime'])
    df2 = df.groupby('agent').sum()
    print(df2)

get_report(get_cookies())
But my output is:
agent | appointedTime
Person 1 | 00:1000:0400:1500:0300:0200:5100:2100:0700:020
Person 2 | 00:3401:1600:0100:0201:0800:0200:1600:0200:17
Person 3 | 00:3300:17
Person 4 | 01:1300:0100:1900:2200:3000:1300:0900:1800:040...
Person 5 | 00:0400:0100:3600:1800:2100:1600:1201:0100:140...
Person 6 | 00:2100:0100:0100:0200:2300:0100:0100:0100:010...
Person 7 | 01:0600:0100:0100:06
Person 8 | 00:3200:0200:0100:2200:0300:0200:0300:5700:040
Why is the second column concatenated instead of being a sum of the hours? I've tried some parsing but it didn't work.
Can anyone help me?
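The concatenation happens because appointedTime holds strings, and sum() on strings concatenates them. A minimal sketch, assuming the values are "HH:MM" strings (the sample data below is made up), that converts them to timedeltas before grouping:

```python
import pandas as pd

# Sample data in the same shape as the API response (values are made up).
reports = [
    {"agent": "Person 1", "appointedTime": "00:10"},
    {"agent": "Person 1", "appointedTime": "00:04"},
    {"agent": "Person 2", "appointedTime": "00:34"},
]
df = pd.DataFrame(reports, columns=["agent", "appointedTime"])

# "HH:MM" strings concatenate under sum(); timedeltas add arithmetically.
# Append ":00" so pandas can parse the value as hours:minutes:seconds.
df["appointedTime"] = pd.to_timedelta(df["appointedTime"] + ":00")

totals = df.groupby("agent")["appointedTime"].sum()
print(totals)
```

Once the column is a timedelta dtype, groupby().sum() produces real durations instead of glued-together strings.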

Get just one value of a nested list inside a dictionary to create a DataFrame

I am using an API that returns a dictionary with a nested list inside; let's name it coins_best. The result looks like this:
{'bitcoin': [[1603782192402, 13089.646908288987],
[1603865643028, 13712.070136258053]],
'ethereum': [[1603782053064, 393.6741989091851],
[1603865024078, 404.86117057956386]]}
The first value in each pair is a timestamp, while the second is a price in dollars. I want to create a DataFrame with the prices, using the timestamps as index. I tried this code to do it in just one step:
d = pd.DataFrame()
for id, obj in coins_best.items():
    for i in range(0, len(obj)):
        temp = pd.DataFrame({obj[i][1]})
        d = pd.concat([d, temp])
d
This attempt gave me a DataFrame with just one column instead of the two required, because using the columns argument threw errors (TypeError: Index(...) must be called with a collection of some kind, 'bitcoin' was passed) when I tried it with id.
Then I tried comprehensions to preprocess the dictionary and its lists:
for k in coins_best.keys():
    inner_lists = (coins_best[k] for inner_dict in coins_best.values())
    items = (item[1] for ls in inner_lists for item in ls)
I could not obtain both elements of the dictionary this way, just the last one.
I know it is possible to try:
df = pd.DataFrame(coins_best, columns=coins_best.keys())
Which gives me:
bitcoin ethereum
0 [1603782192402, 13089.646908288987] [1603782053064, 393.6741989091851]
1 [1603785693143, 13146.275972229188] [1603785731599, 394.6174435303511]
And then try to remove the first element in every list of every row, but was even harder to me. The required answer is:
bitcoin ethereum
1603782192402 13089.646908288987 393.6741989091851
1603785693143 13146.275972229188 394.6174435303511
Do you know how to process the dictionary before creating the DataFrame in order the get this result?
Is my first question, I tried to be as clear as possible. Thank you very much.
Update #1
The answer by Sander van den Oord also solved the problem of the timestamps and is useful for its purpose. However, the sample code, while correct (it used the info provided), was limited to those two keys. This is the final code that solved the problem for every key in the dictionary:
df_coins = pd.DataFrame()  # start empty so the first concat has something to join
for k in coins_best:
    df_coins1 = pd.DataFrame(data=coins_best[k], columns=['timestamp', k])
    df_coins1['timestamp'] = pd.to_datetime(df_coins1['timestamp'], unit='ms')
    df_coins = pd.concat([df_coins1, df_coins], sort=False)
df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()
Thank you very much for your answers.
I think you shouldn't ignore the fact that values of coins are taken at different times. You could do something like this:
import pandas as pd
import hvplot.pandas
coins_best = {
    'bitcoin': [[1603782192402, 13089.646908288987],
                [1603865643028, 13712.070136258053]],
    'ethereum': [[1603782053064, 393.6741989091851],
                 [1603865024078, 404.86117057956386]],
}
df_bitcoin = pd.DataFrame(data=coins_best['bitcoin'], columns=['timestamp', 'bitcoin'])
df_bitcoin['timestamp'] = pd.to_datetime(df_bitcoin['timestamp'], unit='ms')
df_ethereum = pd.DataFrame(data=coins_best['ethereum'], columns=['timestamp', 'ethereum'])
df_ethereum['timestamp'] = pd.to_datetime(df_ethereum['timestamp'], unit='ms')
df_coins = pd.concat([df_ethereum, df_bitcoin], sort=False)
Your df_coins will now look like this:
+----+----------------------------+------------+-----------+
| | timestamp | ethereum | bitcoin |
|----+----------------------------+------------+-----------|
| 0 | 2020-10-27 07:00:53.064000 | 393.674 | nan |
| 1 | 2020-10-28 06:03:44.078000 | 404.861 | nan |
| 0 | 2020-10-27 07:03:12.402000 | nan | 13089.6 |
| 1 | 2020-10-28 06:14:03.028000 | nan | 13712.1 |
+----+----------------------------+------------+-----------+
Now if you want the values to be on the same line, you could use resampling. Here I resample per day: all values of the same day for a coin type are averaged:
df_coins_resampled = df_coins.set_index('timestamp').resample('d').mean()
df_coins_resampled will look like this:
+---------------------+------------+-----------+
| timestamp | ethereum | bitcoin |
|---------------------+------------+-----------|
| 2020-10-27 00:00:00 | 393.674 | 13089.6 |
| 2020-10-28 00:00:00 | 404.861 | 13712.1 |
+---------------------+------------+-----------+
I like to use hvplot to get an interactive plot of the result:
df_coins_resampled.hvplot.scatter(
    x='timestamp',
    y=['bitcoin', 'ethereum'],
    s=20, padding=0.1,
)
Resulting plot: (interactive scatter of both coin prices over time; image not included)
The timestamps differ between coins, so the correct output looks different from what you presented, but other than that, it's a one-liner (where d is your input dictionary):
pd.concat([pd.DataFrame(val, columns=['timestamp', key]).set_index('timestamp') for key, val in d.items()], axis=1)
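Run against the sample dictionary from the question, that one-liner produces a two-column frame indexed by timestamp, with NaN wherever a coin has no value at that instant:

```python
import pandas as pd

# Sample input from the question.
d = {
    'bitcoin': [[1603782192402, 13089.646908288987],
                [1603865643028, 13712.070136258053]],
    'ethereum': [[1603782053064, 393.6741989091851],
                 [1603865024078, 404.86117057956386]],
}

# One DataFrame per coin, indexed by timestamp, concatenated column-wise;
# the outer join on the index introduces NaN for missing timestamps.
df = pd.concat(
    [pd.DataFrame(val, columns=['timestamp', key]).set_index('timestamp')
     for key, val in d.items()],
    axis=1,
)
print(df)  # 4 rows (one per distinct timestamp), 2 columns
```

From here, pd.to_datetime(df.index, unit='ms') plus daily resampling gives the aligned table shown above.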

Pandas not displaying all columns when writing to CSV

I am attempting to export a dataset that looks like this:
+----------------+--------------+--------------+--------------+
| Province_State | Admin2 | 03/28/2020 | 03/29/2020 |
+----------------+--------------+--------------+--------------+
| South Dakota | Aurora | 1 | 2 |
| South Dakota | Beedle | 1 | 3 |
+----------------+--------------+--------------+--------------+
However, the actual CSV file I am getting looks like this:
+-----------------+--------------+--------------+
| Province_State | 03/28/2020 | 03/29/2020 |
+-----------------+--------------+--------------+
| South Dakota | 1 | 2 |
| South Dakota | 1 | 3 |
+-----------------+--------------+--------------+
Using this code (runnable by calling createCSV(); it pulls data from the COVID government GitHub repo):
import pandas as pd  # csv parser
import requests  # retrieves the CSV from the government data repo

def getFile():
    url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv'
    response = requests.get(url)
    print('Writing file...')
    open('us_deaths.csv', 'wb').write(response.content)

# takes raw data from the link, creates a CSV for each unique state and removes unneeded headings
def createCSV():
    getFile()
    # init data
    data = pd.read_csv('us_deaths.csv', delimiter=',')
    # drop extra columns
    data.drop(['UID'], axis=1, inplace=True)
    data.drop(['iso2'], axis=1, inplace=True)
    data.drop(['iso3'], axis=1, inplace=True)
    data.drop(['code3'], axis=1, inplace=True)
    data.drop(['FIPS'], axis=1, inplace=True)
    #data.drop(['Admin2'], axis=1, inplace=True)
    data.drop(['Country_Region'], axis=1, inplace=True)
    data.drop(['Lat'], axis=1, inplace=True)
    data.drop(['Long_'], axis=1, inplace=True)
    data.drop(['Combined_Key'], axis=1, inplace=True)
    #data.drop(['Province_State'], axis=1, inplace=True)
    data.to_csv('DEBUGDATA2.csv')
    # sets Province_State as primary key; searches based on date and key
    # to create new CSVs in the root directory of the python app
    data = data.set_index('Province_State')
    data = data.iloc[:, 2:].rename(columns=pd.to_datetime, errors='ignore')
    for name, g in data.groupby(level='Province_State'):
        g[pd.date_range('03/23/2020', '03/29/20')] \
            .to_csv('{0}_confirmed_deaths.csv'.format(name))
The reason for the rename is to convert the date columns (everything after the first two) to dates, so that I can select only 03/23/2020 and beyond. If anyone has a better method of doing this, I would love to know.
To verify it works, it prints out all the field names, including Admin2 (county name), Province_State, and the rest of the dates.
However, as you can see in my CSV, Admin2 seems to have disappeared. I am not sure how to make this work; if anyone has any ideas, that'd be great!
I changed
data = data.set_index('Province_State')
to
data = data.set_index(['Province_State', 'Admin2'])
I needed to create a multi-level index to allow the Admin2 column to show. Any smoother tips on the date-range section are welcome.
Thanks for the help, all!
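The effect of the multi-level index can be demonstrated on a toy frame shaped like the question's data (values are made up). The point: once Admin2 is part of the index, positional column slicing like the question's data.iloc[:, 2:] can no longer discard it, and to_csv() writes every index level back out:

```python
import pandas as pd

# Toy frame mirroring the question's layout (values are made up).
data = pd.DataFrame({
    'Province_State': ['South Dakota', 'South Dakota'],
    'Admin2': ['Aurora', 'Beadle'],
    '03/28/2020': [1, 1],
    '03/29/2020': [2, 3],
})

# Single-level index: Admin2 stays an ordinary column, so a positional
# slice (here iloc[:, 1:], since the toy frame is narrower) drops it.
single = data.set_index('Province_State').iloc[:, 1:]

# Multi-level index: Admin2 lives in the index, survives the slice,
# and is written back out as a column by to_csv().
multi = data.set_index(['Province_State', 'Admin2']).iloc[:, 1:]

print('Admin2' in single.to_csv())
print('Admin2' in multi.to_csv())
```

This is exactly why the one-line change above restored the missing column in the exported CSVs.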

How do I select specific data in a data frame based on the contents of other columns?

I'm new to pandas and I'm currently trying to use it on a data set I have on my tablet using qPython (temporary situation, laptop's being fixed). I have a csv file with a set of data organised by country, region, market and item label, with additional columns price, year and month. These are set out in the following manner:
Country | Region | Market | Item Label | ... | Price | Year | Month |
Canada | Quebec | Market No. | Item Name | ... | $$$ | 2002 | 1 |
Canada | Quebec | Market No. | Item Name | ... | $$$ | 2002 | 2 |
Canada | Quebec | Market No. | Item Name | ... | $$$ | 2002 | 3 |
Canada | Quebec | Market No. | Item Name | ... | $$$ | 2002 | 4 |
and so on. I'm looking for a way to plot these prices against time (I've taken to adding month/12 to the year to effectively merge the last two columns).
Originally I had a code to take the csv data and put it in a Dictionary, like so:
{Country_Name: {Region_Name: {Market_Name: {Item_Name: {"Price": price_list, "Time": time_list}}}}}
and used for loops over the keys to access each price and time list.
However, I'm having difficulty using pandas to get a similar result: I've tried a fair few different approaches, such as iloc and chained filters like data[data.Country == "Canada"][data.Region == "Quebec"]..., to filter the data for each country, region, market and item, but all of them were particularly slow. The data set is fairly hefty (approx. 12000 rows by 12 columns), so I wouldn't expect instant results, but is there something obvious I'm missing? Or should I just wait till I have my laptop back?
Edit: to try and provide more context, I'm trying to get the prices over the years and months, to plot how the prices fluctuate. I want to separate them based on country, region, market and item label, so each line plotted will be a different item in a market in a region in a country. So far, I have the following code:
def abs_join_paths(*args):
    return os.path.abspath(os.path.join(*args))

def get_csv_data_frame(*path, memory=True):
    return pandas.read_csv(abs_join_paths(*path[:-1], path[-1] + ".csv"), low_memory=memory)

def get_food_data(*path):
    food_price_data = get_csv_data_frame(*path, memory=False)
    return food_price_data[food_price_data.cm_name != "Fuel (diesel) - Retail"]

food_data = get_food_data(data_path, food_price_file_name)

def plot_food_price_time_data(data, title, ylabel, xlabel, plot_style='k-'):
    plt.clf()
    plt.hold(True)
    data["mp_year"] += data["mp_month"]/12
    for country in data["adm0_name"].unique():
        for region in data[data.adm0_name == country]["adm1_name"].unique():
            for market in data[data.adm0_name == country][data.adm1_name == region]["mkt_name"]:
                for item_label in data[data.adm0_name == country][data.adm1_name == region][data.mkt_name == market]["cm_name"]:
                    current_data = data[data.adm0_name == country][data.adm1_name == region][data.mkt_name == market][data.cm_name == item_label]
                    #year = list(current_data["mp_year"])
                    #month = list(current_data["mp_month"])
                    #time = [float(y) + float(m)/12 for y, m in zip(year, month)]
                    plt.plot(list(current_data["mp_year"]), list(current_data["mp_price"]), plot_style)
                    print(list(current_data["mp_price"]))
    plt.savefig(abs_join_paths(imagepath, title + ".png"))
Edit 2 / tl;dr: I have a bunch of prices and times, one after the other in one long list. How do I use pandas to split them up based on the contents of the other columns?
Cheers!
I hesitate to guess, but it seems that you are probably iterating through rows (you said you were using iloc). This is the slowest operation in pandas; data frames are optimized for column (series) access.
If you're plotting, you can use matplotlib directly with pandas data frames, and use the groupby method to combine data without having to iterate through the rows of your data frame.
Without more information it's difficult to answer your question specifically. Please take a look at the comments on your question.
The groupby function did the trick:
def plot_food_price_time_data(data, title, ylabel, xlabel, plot_style='k-'):
    plt.clf()
    plt.hold(True)
    group_data = data.groupby(["adm0_name", "adm1_name", "mkt_name", "cm_name"])
    for i in range(len(data)):
        print(data.iloc[i, [1, 3, 5, 7]])
        specific_data = group_data.get_group(tuple(data.iloc[i, [1, 3, 5, 7]]))
        plt.plot(specific_data["mp_price"], specific_data["mp_year"] + specific_data["mp_month"]/12)
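A more direct variant of the same idea iterates over the groupby object itself instead of looking every row back up with get_group. The sketch below uses the column names from the question with made-up toy data, and replaces the plt.plot call with collecting each series so it stays self-contained:

```python
import pandas as pd

# Toy data with the question's column names (values are made up).
data = pd.DataFrame({
    'adm0_name': ['Canada'] * 4,
    'adm1_name': ['Quebec'] * 4,
    'mkt_name': ['Market1'] * 4,
    'cm_name': ['Wheat', 'Wheat', 'Rice', 'Rice'],
    'mp_year': [2002, 2002, 2002, 2002],
    'mp_month': [1, 2, 1, 2],
    'mp_price': [10.0, 11.0, 5.0, 6.0],
})

series = {}
# One iteration per (country, region, market, item) group; each grp is
# the sub-frame that the chained-filter approach recomputed every time.
for key, grp in data.groupby(['adm0_name', 'adm1_name', 'mkt_name', 'cm_name']):
    time = grp['mp_year'] + grp['mp_month'] / 12
    # plt.plot(time, grp['mp_price'], 'k-') would go here; we collect instead.
    series[key] = list(zip(time, grp['mp_price']))

print(len(series))  # 2 groups: Rice and Wheat
```

This visits each group exactly once, rather than once per row, which is where the original version lost its time.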

Django: understanding .values() and .values_list() use cases

I'm having trouble understanding under what circumstances .values() or .values_list() are better than just using model instances.
I think the following are all equivalent:
results = SomeModel.objects.all()
for result in results:
    print(result.some_field)

results = SomeModel.objects.all().values()
for result in results:
    print(result['some_field'])

results = SomeModel.objects.all().values_list()
for some_field, another_field in results:
    print(some_field)
Obviously these are contrived examples; could anyone point out a good reason for using .values() / .values_list() over just using model instances directly?
edit :
I did some simple profiling, using a simple model that contained two CharField(max_length=100) fields.
Iterating over just 500 instances to copy 'first' to another variable, taking the average of 200 runs, I got the following results:
Test.objects.all() time: 0.010061947107315063
Test.objects.all().values('first') time: 0.00578328013420105
Test.objects.all().values_list('first') time: 0.005257354974746704
Test.objects.all().values_list('first', flat=True) time: 0.0052023959159851075
Test.objects.all().only('first') time: 0.011166254281997681
So the answer is definitively: performance! (mostly; see knbk's answer below)
When combined with annotations, .values() and .values_list() translate to a GROUP BY query. This means that rows with duplicate values can be grouped into a single result. Say you have a model People with the following data:
+----+---------+-----+
| id | name | age |
+----+---------+-----+
| 1 | Alice | 23 |
| 2 | Bob | 42 |
| 3 | Bob | 23 |
| 4 | Charlie | 30 |
+----+---------+-----+
A plain People.objects.values_list('name', flat=True) still returns one entry per row, duplicates included, just as People.objects.all() returns 4 objects; add .distinct() to get the 3 unique names ['Alice', 'Bob', 'Charlie'].
The grouping becomes especially useful when doing annotations. You can do e.g. People.objects.values('name').annotate(age_sum=Sum('age')), and it will return the following results:
+---------+---------+
| name | age_sum |
+---------+---------+
| Alice | 23 |
| Bob | 65 |
| Charlie | 30 |
+---------+---------+
As you can see, the ages of both Bobs have been summed and are returned in a single row. This is different from distinct(), which only applies after the annotations.
Performance is just a side-effect, albeit a very useful one.
values() and values_list() are both intended as optimizations for a specific use case: retrieving a subset of data without the overhead of creating a model instance. A good explanation is given in the Django documentation.
I use values_list() to create a custom dropdown single-select box for the Django admin, as shown below:
# "admin.py"
from django.contrib import admin
from django import forms
from .models import Favourite, Food, Fruit, Vegetable

class FoodForm(forms.ModelForm):
    # Here
    FRUITS = Fruit.objects.all().values_list('id', 'name')
    fruits = forms.ChoiceField(choices=FRUITS)
    # Here
    VEGETABLES = Vegetable.objects.all().values_list('id', 'name')
    vegetables = forms.ChoiceField(choices=VEGETABLES)

class FoodInline(admin.TabularInline):
    model = Food
    form = FoodForm

@admin.register(Favourite)
class FavouriteAdmin(admin.ModelAdmin):
    inlines = [FoodInline]
