I am trying to build an RSS feed composed of several sources, and I would like the entries sorted by date, newest first, rather than grouped by source. I store all of my news in one Python dictionary, regardless of its source:
feed = None
if sports['nhl'] == 1:
    feed = newsParse('nhl')
    allOff = False
if sports['nba'] == 1:
    feed = newsParse('nba')
    allOff = False
if sports['nfl'] == 1:
    feed = newsParse('nfl')
    allOff = False
if sports['mlb'] == 1:
    feed = newsParse('mlb')
    allOff = False
The function looks like this:
def newsParse(league):
    rss_url = 'https://www.espn.com/espn/rss/' + league + '/news'
    parser = feedparser.parse(rss_url)
    newsInfo = {
        'title': [],
        'link': [],
        'description': [],
        'date': []
    }
    for entry in parser.entries:
        newsInfo['title'].append(entry.title)
        newsInfo['description'].append(entry.description)
        newsInfo['link'].append(entry.links[0].href)
        newsInfo['date'].append(entry.published)
    return newsInfo
If I print out feed, I get all of the titles grouped by source, then all of the descriptions grouped by source, and so on. The values under 'date' look like this:
Fri, 24 Jul 2020 09:35:08 EST
How can I sort all of my values in chronological order while keeping each title, description, and link together?
Why not save the entries as a list of dictionaries?
For example:
def newsParse(league):
    rss_url = 'https://www.espn.com/espn/rss/' + league + '/news'
    parser = feedparser.parse(rss_url)
    newsInfo = []
    for entry in parser.entries:
        # One dictionary per entry keeps title, description, link and date together
        newEntry = {'title': entry.title,
                    'description': entry.description,
                    'link': entry.link,
                    'date': entry.published}  # same field the question reads
        newsInfo.append(newEntry)
    return newsInfo
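newsInfo will be a list of dictionaries, and you can sort that list with this line of code (sorted returns a new list, so keep the result; pass reverse=True to get the newest entries first):
newsInfo = sorted(newsInfo, key=lambda k: k['date'])
The date from the RSS feed is a string like the one shown in the question, and strings sort alphabetically rather than chronologically, so you should convert it to Python's datetime type for the sorting to work.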
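The date string in the question follows the RFC 822 style used by RSS, so one way to do the conversion is the standard library's email.utils.parsedate_to_datetime. The following is a minimal sketch, not a drop-in solution: the sample entries are made up and stand in for whatever newsParse returns.

```python
from email.utils import parsedate_to_datetime

# Hypothetical sample entries; real ones would come from newsParse().
news_info = [
    {'title': 'B', 'date': 'Fri, 24 Jul 2020 09:35:08 EST'},
    {'title': 'A', 'date': 'Thu, 23 Jul 2020 18:00:00 EST'},
    {'title': 'C', 'date': 'Sat, 25 Jul 2020 07:15:30 EST'},
]

# Parse each RFC 822 date string into a datetime, then sort newest first.
newest_first = sorted(
    news_info,
    key=lambda entry: parsedate_to_datetime(entry['date']),
    reverse=True,
)

titles = [entry['title'] for entry in newest_first]
print(titles)
```

feedparser entries usually also carry a pre-parsed published_parsed value when the library can read the date, which may be an alternative sort key if it is present for every entry.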
Edit (answer for comment):
If you need a single list with all the leagues,
you can use this code:
feed = []
if sports['nhl'] == 1:
    feed.extend(newsParse('nhl'))
    allOff = False
if sports['nba'] == 1:
    feed.extend(newsParse('nba'))
    allOff = False
if sports['nfl'] == 1:
    feed.extend(newsParse('nfl'))
    allOff = False
if sports['mlb'] == 1:
    feed.extend(newsParse('mlb'))
    allOff = False
Once feed contains all the data you need, you can sort it by date (assign the result, because sorted does not modify feed in place):
feed = sorted(feed, key=lambda k: k['date'])
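The four near-identical if blocks can also be folded into a loop over the sports dictionary. This is only a sketch: build_feed and fake_news_parse are hypothetical names, and the stub replaces newsParse so the example runs without network access.

```python
def build_feed(sports, fetch):
    """Collect entries for every enabled league into one list."""
    feed = []
    for league, enabled in sports.items():
        if enabled == 1:
            feed.extend(fetch(league))
    return feed

# Hypothetical stand-in for newsParse, so the sketch runs offline.
def fake_news_parse(league):
    return [{'title': league + ' story', 'date': '2020-07-24'}]

sports = {'nhl': 1, 'nba': 0, 'nfl': 1, 'mlb': 0}
feed = build_feed(sports, fake_news_parse)

# Same meaning as the allOff flag in the question: True only if no league is on.
all_off = not any(flag == 1 for flag in sports.values())

feed_titles = [entry['title'] for entry in feed]
print(feed_titles, all_off)
```

In real use, newsParse would be passed as the fetch argument and the combined list sorted afterwards.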
I have a database of scientific articles with their authors, their date of publication on arXiv, and their respective arXiv ids. Now I want to add to this database the number of citations each article received in each year since it was published.
For instance, I would like to retrieve the graph on the right-hand side (example).
Is there an API that could help me?
I could use the OpenCitations API, but I wondered whether there is a more straightforward way using the INSPIRE-HEP data.
I figured out how to do this with the INSPIRE-HEP API. A pause between requests should also be considered, so that the requests are not sent too quickly.
import pandas as pd
import requests
from collections import defaultdict

ihep_search_arxiv = "https://inspirehep.net/api/arxiv/"
ihep_search_article = "https://inspirehep.net/api/literature?sort=mostcited&size=100&page=1&q=refersto%3Arecid%3A"
year = [str(x+1) for x in range(2009,2022)]  # '2010' through '2022'

def count_year(year, input_list):
    # counting the number of citations each year
    year_count = {}
    for y in year:
        if input_list[0] == 'NaN':
            year_count[y] = 0
        else:
            year_count[y] = input_list.count(y)
    return year_count
def get_cnumber():
    # build the citation-search URL for each arXiv id
    citation_url = []
    for aid in arxiv_id:  # renamed from "id" to avoid shadowing the builtin
        inspirehep_url_arxiv = f"{ihep_search_arxiv}{aid}"
        control_number = requests.get(inspirehep_url_arxiv).json()["metadata"]["control_number"]
        citation_url.append(f"{ihep_search_article}{control_number}")
    return citation_url
def get_citations():
    citation_url = get_cnumber()
    citation_date = defaultdict(list)
    for i, url in enumerate(citation_url):
        data_article = requests.get(url).json()
        hits = data_article["hits"]["hits"]
        if len(hits) == 0:
            citation_date[i].append('NaN')
        else:
            for hit in hits:
                # keep only the year of each citing record
                citation_date[i].append(hit["created"][:4])
    # DataFrame.append was removed in pandas 2.0, so collect the rows first
    rows = [count_year(year, citation_date[p]) for p in range(len(citation_url))]
    citation_per_year = pd.DataFrame(rows, columns=year)
    citation_per_year.insert(0, "arxiv_id", arxiv_id, True)
    return citation_per_year
arxiv_id = recollect_data() #list of arxiv ids collected in a separate way
print(get_citations())
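The per-year counting step can also be written with collections.Counter, which avoids the 'NaN' placeholder because an empty list simply yields zeros. A minimal sketch, assuming the citing records expose an ISO-style "created" timestamp as in the code above; count_per_year and the sample timestamps are hypothetical.

```python
from collections import Counter

def count_per_year(created_dates, years):
    """Return {year: number of dates falling in that year} for each requested year."""
    counts = Counter(date[:4] for date in created_dates)
    return {y: counts.get(y, 0) for y in years}

# Hypothetical timestamps standing in for the "created" field of citing records.
sample_created = [
    "2015-03-02T10:00:00+00:00",
    "2015-11-20T08:30:00+00:00",
    "2017-06-01T12:00:00+00:00",
]
years = [str(y) for y in range(2014, 2018)]

result = count_per_year(sample_created, years)
empty_result = count_per_year([], years)
print(result)
```

For the pause between requests mentioned above, a time.sleep call inside the request loops is one simple option; the appropriate delay is not stated in the source.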
lista = [
    {'Identity': 'joe',
     'summary': [
         {'distance': 1, 'time': 2, 'status': 'idle'},
         {'distance': 2, 'time': 5, 'status': 'moving'}]},
    {'unit': 'imperial'}]
I can pull the data easily and put it into pandas. The issue is that if an identity has multiple instances of, say, idle, my code keeps only the last value instead of summing them.
My code:
driverid = {}
zdriverhours = {}
zdistance = {}
zstophours = {}
for driver in resp:
    driverid[driver['AssetID']] = driver['AssetName']
    for value in driver['SegmentSummary']:
        if value['SegmentType'] == 'Motion':
            zdriverhours[driver['AssetID']] = round(value['Time']/3600,2)
        if value['SegmentType'] == 'Stop':
            zstophours[driver['AssetID']] = round(value['IdleTime']/3600,2)
        zdistance[driver['AssetID']] = value['Distance']
To get the total distance for every driver, replace:
zdistance[driver['AssetID']] = value['Distance']
with
if driver['AssetID'] in zdistance:
    zdistance[driver['AssetID']] = zdistance[driver['AssetID']] + value['Distance']
else:
    zdistance[driver['AssetID']] = value['Distance']
The same change applies to zdriverhours and zstophours.
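The same accumulation can be expressed with collections.defaultdict, which removes the explicit membership check. This is a sketch under assumptions: the resp structure below is invented to match the field names in the question, and the variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical response shaped like the data in the question.
resp = [
    {'AssetID': 1, 'AssetName': 'joe', 'SegmentSummary': [
        {'SegmentType': 'Motion', 'Time': 3600, 'Distance': 10},
        {'SegmentType': 'Stop', 'IdleTime': 1800, 'Distance': 0},
        {'SegmentType': 'Motion', 'Time': 7200, 'Distance': 25},
        {'SegmentType': 'Stop', 'IdleTime': 5400, 'Distance': 0},
    ]},
]

drive_seconds = defaultdict(float)
stop_seconds = defaultdict(float)
distance = defaultdict(float)

for driver in resp:
    asset_id = driver['AssetID']
    for segment in driver['SegmentSummary']:
        distance[asset_id] += segment['Distance']
        if segment['SegmentType'] == 'Motion':
            drive_seconds[asset_id] += segment['Time']
        elif segment['SegmentType'] == 'Stop':
            stop_seconds[asset_id] += segment['IdleTime']

# Convert to hours once, after summing, so rounding happens only at the end.
drive_hours = {k: round(v / 3600, 2) for k, v in drive_seconds.items()}
stop_hours = {k: round(v / 3600, 2) for k, v in stop_seconds.items()}
print(drive_hours, stop_hours, dict(distance))
```

Summing the raw seconds and rounding afterwards is a design choice here; rounding each segment first, as in the original code, would give slightly different totals.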
I'm not sure I am approaching this in the right way.
Scenario:
I have two SQL tables that contain rent information. One table contains rent due, and the other contains rent received.
I'm trying to build a rent book which takes the data from both tables for a specific lease and generates a date ordered statement which will be displayed on a webpage.
I'm using Python, Flask and SQLAlchemy.
I am currently learning Python, so I'm not sure if my approach is the best.
I've created a dictionary with the keys 'Date', 'Type' and 'Amount', and under each of these keys I store a list containing the data from my SQL queries. The bit I'm struggling with is how to sort the dictionary by the date key while keeping the values under the other keys aligned with their dates.
lease_id = 5
dates_list = []
type_list = []
amounts_list = []
rentbook_dict = {}
payments_due = Expected_Rent_Model.query.filter(Expected_Rent_Model.lease_id == lease_id).all()
payments_received = Rent_And_Fee_Income_Model.query.filter(Rent_And_Fee_Income_Model.lease_id == lease_id).all()
for item in payments_due:
    dates_list.append(item.expected_rent_date)
    type_list.append('Rent Due')
    amounts_list.append(item.expected_rent_amount)
for item in payments_received:
    dates_list.append(item.payment_date)
    type_list.append(item.payment_type)
    amounts_list.append(item.payment_amount)
rentbook_dict.setdefault('Date',[]).append(dates_list)
rentbook_dict.setdefault('Type',[]).append(type_list)
rentbook_dict.setdefault('Amount',[]).append(amounts_list)
I was then going to use a for loop within the flask template to iterate through each value and display it in a table on the page.
Or am I approaching this in the wrong way?
I managed to get this working using zipped lists. I'm sure there is a better way to accomplish this, but I'm pleased I've got it working.
lease_id = 5
payments_due = Expected_Rent_Model.query.filter(Expected_Rent_Model.lease_id == lease_id).all()
payments_received = Rent_And_Fee_Income_Model.query.filter(Rent_And_Fee_Income_Model.lease_id == lease_id).all()
total_due = 0
for debit in payments_due:
    total_due = total_due + int(debit.expected_rent_amount)
total_received = 0
for income in payments_received:
    total_received = total_received + int(income.payment_amount)
balance = total_received - total_due
if balance < 0:
    arrears = "This account is in arrears"
else:
    arrears = ""
dates_list = []
type_list = []
amounts_list = []
for item in payments_due:
    dates_list.append(item.expected_rent_date)
    type_list.append('Rent Due')
    amounts_list.append(item.expected_rent_amount)
for item in payments_received:
    dates_list.append(item.payment_date)
    type_list.append(item.payment_type)
    amounts_list.append(item.payment_amount)
# Zip the three lists into (date, type, amount) tuples and sort them by date
payment_data = zip(dates_list, type_list, amounts_list)
sorted_payment_data = sorted(payment_data)
# Unzip back into three lists that are now in date order
columns = zip(*sorted_payment_data)
list1, list2, list3 = [list(column) for column in columns]
# This return belongs inside the Flask view function
return render_template('rentbook.html',
                       payment_data=zip(list1, list2, list3),
                       total_due=total_due,
                       total_received=total_received,
                       balance=balance)
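An alternative to three parallel lists is one list of dictionaries, one per statement row, sorted on the date key. The sketch below uses made-up tuples in place of the SQLAlchemy query results, and the names rentbook and types_in_order are hypothetical.

```python
from datetime import date
from operator import itemgetter

# Hypothetical rows standing in for the two SQLAlchemy query results.
payments_due = [(date(2021, 1, 1), 500), (date(2021, 2, 1), 500)]
payments_received = [(date(2021, 1, 5), 'Bank transfer', 500)]

rentbook = []
for due_date, amount in payments_due:
    rentbook.append({'date': due_date, 'type': 'Rent Due', 'amount': amount})
for paid_date, payment_type, amount in payments_received:
    rentbook.append({'date': paid_date, 'type': payment_type, 'amount': amount})

# Sort by the date key only; each row keeps its own type and amount.
rentbook.sort(key=itemgetter('date'))

types_in_order = [row['type'] for row in rentbook]
print(types_in_order)
```

A list like this can be passed to the template as a single variable and iterated row by row, which avoids the unzip and re-zip step.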
I have a list of lists containing company objects:
companies_list = [companies1, companies2]
I have the following function:
def get_fund_amount_by_year(companies_list):
    companies_length = len(companies_list)
    for idx, companies in enumerate(companies_list):
        companies1 = companies.values_list('id', flat=True)
        funding_rounds = FundingRound.objects.filter(company_id__in=companies1).order_by('announced_on')
        amount_per_year_list = []
        for fr in funding_rounds:
            fr_year = fr.announced_on.year
            fr_amount = fr.raised_amount_usd
            if not any(d['year'] == fr_year for d in amount_per_year_list):
                year_amount = {}
                year_amount['year'] = fr_year
                for companies_idx in range(companies_length):
                    year_amount['amount'+str(companies_idx)] = 0
                    if companies_idx == idx:
                        year_amount['amount'+str(companies_idx)] = fr_amount
                amount_per_year_list.append(year_amount)
            else:
                for year_amount in amount_per_year_list:
                    if year_amount['year'] == fr_year:
                        year_amount['amount'+str(idx)] += fr_amount
    return amount_per_year_list
The problem is that the resulting list of dictionaries has only one amount key updated.
As you can see, "amount0" is 0 in every entry:
[{'amount1': 12100000L, 'amount0': 0, 'year': 1999}, {'amount1': 8900000L, 'amount0': 0, 'year': 2000}]
What am I doing wrong?
I was creating the list of dictionaries inside the loop, so each iteration over companies_list reset it and discarded the results of the previous iteration. I moved the initialization above the loop:
def get_fund_amount_by_year(companies_list):
    companies_length = len(companies_list)
    amount_per_year_list = []  # moved out of the loop
    for idx, companies in enumerate(companies_list):
        companies1 = companies.values_list('id', flat=True)
        funding_rounds = FundingRound.objects.filter(company_id__in=companies1).order_by('announced_on')
        # the rest of the loop body is as before, without the old initialization
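The effect of initializing the accumulator once can be seen in a stripped-down version that uses plain (year, amount) pairs instead of Django querysets. This is a sketch only: fund_amount_by_year and the sample data are hypothetical, and a dictionary keyed by year replaces the linear search through the list.

```python
def fund_amount_by_year(groups):
    """groups: one list of (year, amount) pairs per company group."""
    n = len(groups)
    by_year = {}  # created once, outside the loop over groups
    for idx, rounds in enumerate(groups):
        for year, amount in rounds:
            # Create the row for a new year with every amount column set to 0.
            row = by_year.setdefault(
                year, {'year': year, **{'amount%d' % i: 0 for i in range(n)}}
            )
            row['amount%d' % idx] += amount
    return [by_year[y] for y in sorted(by_year)]

# Hypothetical data standing in for the FundingRound querysets.
groups = [
    [(1999, 100), (2000, 50)],
    [(1999, 12100000), (2000, 8900000), (2000, 10)],
]
result = fund_amount_by_year(groups)
print(result)
```

Because by_year survives across groups, the amounts from the first group are still present when the second group is processed.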
I would like to create a bunch of empty lists with names such as:
author1_count = []
author2_count = []
...
...
and so on, but I do not know in advance how many lists I will need to generate.
Answers to similar questions (for example, "How to create multiple (but individual) empty lists in Python?") suggest creating a dictionary or an array of lists. However, I want to append values to the lists, as in:
def search_list(alist, aname):
    count = 0
    author_index = 0
    author_list = alist
    author_name = aname
    for author in author_list:
        if author == author_name:
            author_index = author_list.index(author)+1
            count = 1
    return count, author_index
cehw_list = ["Ford, Eric", "Mustang, Jason", "BMW, James", "Mercedes, Megan"]
author_list = []
for author in authors:
    this_author = author.encode('ascii', 'ignore')
    author_list.append(this_author)
# Find if the author is in the authorlist
for cehw in cehw_list:
    if cehw == cehw_list[0]:
        count0, position0 = search_list(author_list, cehw)
        author1_count.append(count0)
    elif cehw == cehw_list[1]:
        count1, position1 = search_list(author_list, cehw)
        author2_count.append(count1)
    ...
    ...
Any idea how to create such distinct lists? Is there an elegant way to do this?
Dictionaries! You only need to be more specific when appending values, e.g.
author_lists = {}
for i in range(3):
    author_lists['an'+str(i)] = []
author_lists
# {'an0': [], 'an1': [], 'an2': []}
author_lists['an0'].append('foo')
author_lists
# {'an0': ['foo'], 'an1': [], 'an2': []}
You should be able to use a dictionary still.
data = {}
for cehw in cehw_list:
    count0, position0 = search_list(author_list, cehw)
    # Or whatever property on cehw that has the unique identifier
    if cehw in data:
        data[cehw].append(count0)
    else:
        data[cehw] = [count0]
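With collections.defaultdict(list) the if/else above collapses to a single append, and a list is created for each author the first time it is needed. A minimal sketch: the papers data is invented, and search_list is a shortened equivalent of the function in the question.

```python
from collections import defaultdict

def search_list(author_list, author_name):
    """Return (1, 1-based position) if the name is present, else (0, 0)."""
    if author_name in author_list:
        return 1, author_list.index(author_name) + 1
    return 0, 0

cehw_list = ["Ford, Eric", "Mustang, Jason", "BMW, James", "Mercedes, Megan"]

# Hypothetical author lists for two papers.
papers = [
    ["Smith, Anna", "Ford, Eric"],
    ["BMW, James", "Ford, Eric", "Jones, Bob"],
]

counts = defaultdict(list)  # one list per tracked author, created on first use
for author_list in papers:
    for cehw in cehw_list:
        count, _position = search_list(author_list, cehw)
        counts[cehw].append(count)

print(dict(counts))
```

Keying the dictionary by author name means the number of lists never has to be known in advance.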