I need to be able to use a function (get_years) to iterate over a list of reviews such as:
{'rating': 5.0,
'reviewer_name': 'Karen',
'product_id': 'B00004RFRV',
'review_title': 'Bialetti is the Best!',
'review_time': '11 12, 2017',
'images': ['https://images-na.ssl-images-amazon.com/images/I/81+XxFRGyBL._SY88.jpg'],
'styles': {'Size:': ' 12-Cup', 'Color:': ' Silver'}}
{'rating': 3.0,
'reviewer_name': 'Peter DP',
'product_id': 'B00005OTXM',
'review_title': "Mr. Coffee DWX23 12-cup doesn't have the quality feel as my 13 year old nearly identical 12-cup Mr. Coffee",
'review_time': '04 17, 2015',
'images': ['https://images-na.ssl-images-amazon.com/images/I/71sFKwTW9sL._SY88.jpg'],
'styles': {'Style Name:': ' COFFEE MAKER ONLY'}}
{'rating': 5.0,
'reviewer_name': 'B. Laska',
'product_id': 'B00004RFRV',
'review_title': 'Love my Moka pots!',
'review_time': '07 9, 2015',
'images': ['https://images-na.ssl-images-amazon.com/images/I/719NCqw4GML._SY88.jpg'],
'styles': {'Size:': ' 1-Cup', 'Color:': ' Silver'}}
to be able to return:
print(get_years(reviews)) # [2007, 2008, 2009, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
print(type(get_years(reviews))) # <class 'list'>
I have:
def get_years(reviews):
    review_years_set = set()
    for review in reviews:
        # the last four characters of 'review_time' are the year
        review_years_set.add(int(review['review_time'][-4:]))
    review_years_list = list(review_years_set)
    review_years_list.sort()
    return review_years_list
which gives me what I want but it seems like the longer route. Is there a more Pythonic or efficient way to get a sorted list of set values?
Given an iterable of string-formatted dates, e.g.:
dates = ['07 9, 2007', '04 1, 2008', '01 2, 2007', '08 2, 2014', '01 3, 2004', '01 4, 2004']
A concise way to produce a sorted list of unique years is to pass a set comprehension to sorted():
sorted_dates = sorted({int(date[-4:]) for date in dates})
print(sorted_dates)
Output:
[2004, 2007, 2008, 2014]
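Applied to the review dictionaries from the question, the same idea might look like the sketch below. It assumes every 'review_time' follows the 'MM D, YYYY' pattern shown in the question, and the sample `reviews` list is trimmed to two keys purely for illustration.

```python
from datetime import datetime

def get_years(reviews):
    """Return the sorted, de-duplicated review years as integers."""
    # Assumption: every 'review_time' looks like '11 12, 2017'.
    return sorted({
        datetime.strptime(review['review_time'], '%m %d, %Y').year
        for review in reviews
    })

# Trimmed-down sample data based on the reviews in the question.
reviews = [
    {'product_id': 'B00004RFRV', 'review_time': '11 12, 2017'},
    {'product_id': 'B00005OTXM', 'review_time': '04 17, 2015'},
    {'product_id': 'B00004RFRV', 'review_time': '07 9, 2015'},
]

print(get_years(reviews))  # [2015, 2017]
```

Parsing with strptime instead of slicing the last four characters raises a ValueError on a malformed date rather than silently producing a wrong year.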
Try this:
def get_years(reviews):
    # a set comprehension drops duplicate years before sorting
    return sorted({int(review['review_time'][-4:]) for review in reviews})
print(get_years(reviews))
I'm getting info from a page with BeautifulSoup and I obtained the link:
[<span class="field-content">Friday, September 11, 2015</span>]
with the commands
links = soup.find_all('div', attrs={'class':'views-row'})
link = links[0]
link.find('span', attrs={'class':'views-field views-field-created'}).select('span')
but I need to parse the date. How can I get Friday, September 11, 2015 out of this?
I've found it, it's link.find('span', attrs={'class':'views-field views-field-created'}).select_one('span').text
To answer the example in your question, pick the last element of your ResultSet:
link.find('span', attrs={'class':'views-field views-field-created'}).select('span')[-1].text
or shorter:
link.find_all("span")[-1].text
But if you want to extract all the information and store it as structured data, a better approach is to use stripped_strings.
Example
import requests
from bs4 import BeautifulSoup
url = 'https://web.archive.org/web/20150913224145/http://www.newyorksocialdiary.com/party-pictures'
res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")
data = []
for item in soup.select('.view-content div'):
    c = list(item.stripped_strings)
    data.append({
        'title': c[0],
        'date': c[-1],
        'url': item.a['href'].split('/', 3)[-1]
    })
print(data)
Output
[{'title': 'Kicks offs, sing offs, and pro ams', 'date': 'Friday, September 11, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/kicks-offs-sing-offs-and-pro-ams'}, {'title': 'Grand Finale of the Hampton Classic Horse Show', 'date': 'Tuesday, September 1, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/grand-finale-of-the-hampton-classic-horse-show'}, {'title': 'Riders, Spectators, Horses, and More ...', 'date': 'Wednesday, August 26, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/riders-spectators-horses-and-more'}, {'title': 'Artist and Writers (and Designers)', 'date': 'Thursday, August 20, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/artist-and-writers-and-designers'}, {'title': 'Garden Parties Kickoffs and Summer Benefits', 'date': 'Monday, August 17, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/garden-parties-kickoffs-and-summer-benefits'}, {'title': 'The Summer Set', 'date': 'Wednesday, August 12, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/the-summer-set'}, {'title': 'Midsummer Parties', 'date': 'Wednesday, August 5, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/midsummer-parties'}, {'title': 'The Watermill Center and The Parrish', 'date': 'Wednesday, July 29, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/the-watermill-center-and-the-parrish'}, {'title': 'Unconditional Love', 'date': 'Thursday, July 23, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/unconditional-love'}, {'title': "Women's Health, Boys & Girls, Cancer Research, and Just Plain Summer Fun", 'date': 'Friday, July 17, 2015', 'url': 'http://www.newyorksocialdiary.com/party-pictures/2015/womens-health-boys-girls-cancer-research-and-just-plain-summer-fun'},...]
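Since the question asks about parsing the date, the extracted text can then be turned into a datetime object. This is a minimal sketch, assuming the text always has the form 'Friday, September 11, 2015' and that the default (English) locale is in use; `date_text` is a stand-in for the `.text` value obtained above.

```python
from datetime import datetime

date_text = 'Friday, September 11, 2015'  # stand-in for the text taken from the <span>

# %A = weekday name, %B = full month name, %d = day, %Y = four-digit year.
parsed = datetime.strptime(date_text, '%A, %B %d, %Y')

print(parsed.date())       # 2015-09-11
print(parsed.isoformat())  # 2015-09-11T00:00:00
```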
I've only tried
from datetime import datetime
my_dates = ['5-Nov-18', '25-Mar-17', '1-Nov-18', '7 Mar 17']
my_dates.sort(key=lambda date: datetime.strptime(date, "%d-%b-%y"))
print(my_dates)
But how can I make this work for date formats like
my_dates = ['05 Nov 2018', '25 Mar 2017', '01 Nov 2018', '07 Mar 2017']
One inelegant solution which comes to mind is replacing all spaces with dashes as shown below:
from datetime import datetime
my_dates = ['5-Nov-18', '25-Mar-17', '1-Nov-18', '7 Mar 17']
my_dates.sort(key=lambda date: datetime.strptime(date.replace(' ', '-'), "%d-%b-%y"))
print(my_dates)
If all you need is to sort the specific set of dates you have provided, just change the format in strptime():
my_dates = ['05 Nov 2018', '25 Mar 2017', '01 Nov 2018', '07 Mar 2017']
my_dates.sort(key=lambda date: datetime.strptime(date, "%d %b %Y"))
# The separators are now spaces, and the last %y (two-digit year) becomes %Y (four-digit year)
However, if you have varying date formats in your list, you could prepare a list of possibly anticipated string formats and define your own function to parse against each format:
def func(date, formats):
    for frmt in formats:
        try:
            str_date = datetime.strptime(date, frmt)
            return str_date
        except ValueError:
            continue
    # might want to consider handling a scenario when nothing is returned
my_formats = ['%d-%b-%y', '%d %b %y', '%d %b %Y']
my_dates = ['5-Nov-18', '25-Mar-17', '1-Nov-18', '7 Mar 17', '05 Nov 2018', '25 Mar 2017', '01 Nov 2018', '07 Mar 2017']
my_dates.sort(key=lambda date: func(date, my_formats))
print(my_dates)
# ['7 Mar 17', '07 Mar 2017', '25-Mar-17', '25 Mar 2017', '1-Nov-18', '01 Nov 2018', '5-Nov-18', '05 Nov 2018']
The caveat here is that if an unanticipated date format shows up, the function returns None. In Python 3 the sort then fails with a TypeError, because None cannot be compared with datetime objects. If that's a concern, add some handling at the end of func() for the case where all parsing attempts failed. Some devs might also say to avoid try...except, but this is the only way I can come up with.
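One possible way to handle the "nothing matched" case is sketched below. The names `parse_date` and `sort_key` are hypothetical, and pushing unparseable strings to the end of the list is just one policy among several; raising the error to the caller would be equally reasonable.

```python
from datetime import datetime

def parse_date(date, formats):
    """Try each format in turn; raise if none of them matches."""
    for fmt in formats:
        try:
            return datetime.strptime(date, fmt)
        except ValueError:
            continue
    raise ValueError(f'no known format matches {date!r}')

def sort_key(date, formats):
    """Sort parseable dates first; push unparseable strings to the end."""
    try:
        return (0, parse_date(date, formats))
    except ValueError:
        # Assumed policy: keep the bad value, but place it after all real dates.
        return (1, datetime.max)

my_formats = ['%d-%b-%y', '%d %b %y', '%d %b %Y']
my_dates = ['5-Nov-18', 'not a date', '25 Mar 2017', '7 Mar 17']
my_dates.sort(key=lambda date: sort_key(date, my_formats))
print(my_dates)
# ['7 Mar 17', '25 Mar 2017', '5-Nov-18', 'not a date']
```

Because the key is a tuple, every comparison is between values of the same type, so the sort never has to compare None with a datetime.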
Dates can be sorted once they are datetime objects:
from datetime import datetime as dt
my_dates = ['05 Nov 2018', '25 Mar 2017', '01 Nov 2018', '07 Mar 2017']
my_dates = [dt.strptime(date, '%d %b %Y') for date in my_dates]
print(my_dates)
# [datetime.datetime(2018, 11, 5, 0, 0), datetime.datetime(2017, 3, 25, 0, 0), datetime.datetime(2018, 11, 1, 0, 0), datetime.datetime(2017, 3, 7, 0, 0)]
my_dates.sort()
print(my_dates)
# [datetime.datetime(2017, 3, 7, 0, 0), datetime.datetime(2017, 3, 25, 0, 0), datetime.datetime(2018, 11, 1, 0, 0), datetime.datetime(2018, 11, 5, 0, 0)]
my_dates = [ date.strftime('%d %b %Y') for date in my_dates ]
print(my_dates)
# ['07 Mar 2017', '25 Mar 2017', '01 Nov 2018', '05 Nov 2018']
I have this list comprehension function:
def mergesafirmacheta(list1,list2):
    desiredlist = [list2[0][:3] + [n2, list2[0][4]] if n1 == list2[0][1]
                   else [id, n1, dates, n2, 0] for id, n1, dates, n2, n3 in list1]
    return desiredlist
My list1 and list2 look like this:
list1=[['user1', 186, 'Feb 2017, Mar 2017, Apr 2017', 550, 555],
['user2', 282, 'Mai 2017', 3579, 3579],
['user3', 281, 'Mai 2017', 10, 10]]
list2=[['user2', 282, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 100, 1000],
['user1', 186, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 0, 740]]
In the condition if n1 == list2[0][1], I would like to loop over all the lists in list2, not just compare against the first one. Right now the if only ever compares against the 282 in ['user2', 282, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 100, 1000], so I never get to the 186 in the second list. How can I loop over all of them? (list2 could contain more lists.)
Later Edit:
Desired output:
[['user1', 186, 'Feb 2017, Mar 2017, Apr 2017', 550, 740],
['user2', 282, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 3579, 1000],
['user3', 281, 'Mai 2017', 10, 0]]
Add an inner loop over list2 and take the first matching row, falling back to the default row when nothing matches:
def mergesafirmacheta(list1, list2):
    desiredlist = [next((list_2[:3] + [n2, list_2[4]] for list_2 in list2 if n1 == list_2[1]),
                        [id, n1, dates, n2, 0])
                   for id, n1, dates, n2, n3 in list1]
    return desiredlist
I hope this is what you were looking for.
I guess this is what you are looking for:
list1=[['user1', 186, 'Feb 2017, Mar 2017, Apr 2017', 550, 555], ['user2', 282, 'Mai 2017', 3579, 3579], ['user3', 281, 'Mai 2017', 10, 10]]
list2=[['user2', 282, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 100, 1000],['user1', 186, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 0, 740]]
desiredlist = []
for id, n1, dates, n2, n3 in list1:
    counter = 0
    for list_2 in list2:
        if n1 == list_2[1]:
            desiredlist.append(list_2[:3] + [n2, list_2[4]])
        else:
            counter += 1
    if counter == len(list2):
        desiredlist.append([id, n1, dates, n2, 0])
print(desiredlist)
You want to reach the else branch only when no element of list2 matches the n1 of the current list1 row.
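If list2 grows large, the counter loop could be replaced by a dictionary lookup. The sketch below is one possible variant, not the original poster's method: the names `merge_rows` and `by_id` are hypothetical, and it assumes the number at index 1 is unique within list2. Like the loop above, it takes the first three fields from the matching list2 row.

```python
def merge_rows(list1, list2):
    """Merge each list1 row with the list2 row sharing its id (index 1)."""
    # Assumption: the id at index 1 is unique within list2.
    by_id = {row[1]: row for row in list2}
    merged = []
    for user, n1, dates, n2, _n3 in list1:
        match = by_id.get(n1)
        if match is not None:
            # Name, id and dates from list2, n2 from list1, last value from list2.
            merged.append(match[:3] + [n2, match[4]])
        else:
            # No counterpart in list2: keep the list1 row and set the last value to 0.
            merged.append([user, n1, dates, n2, 0])
    return merged

list1 = [['user1', 186, 'Feb 2017, Mar 2017, Apr 2017', 550, 555],
         ['user2', 282, 'Mai 2017', 3579, 3579],
         ['user3', 281, 'Mai 2017', 10, 10]]
list2 = [['user2', 282, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 100, 1000],
         ['user1', 186, 'Feb 2017, Mar 2017, Apr 2017, Mai 2017', 0, 740]]

for row in merge_rows(list1, list2):
    print(row)
```

Each list1 row produces exactly one output row, and the lookup avoids rescanning list2 for every row.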
I have this list of lists:
listoflist = [['BOTOS', 'AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE', 'AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
That was just an example; I will have many more lists in my list of lists, all in the same format.
This will be my desired output:
listoflist = [['BOTOS AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
Basically, in each list I want to join the first item with the second to form a full name in a single item, as in the example.
I need a function that takes a list as input. How can I do this in a simple way? (I don't want an extra library for this.) I use Python 3.5. Thank you so much for your time!
You can iterate through the outer list and then join the slice of the first two items:
def merge_names(lst):
    for l in lst:
        l[0:2] = [' '.join(l[0:2])]
merge_names(listoflist)
print(listoflist)
# [['BOTOS AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
This simple list comprehension should do the trick:
res = [[' '.join(item[0:2]), *item[2:]] for item in listoflist]
It joins the first two items of each list and appends the rest as is.
You can try this:
f = lambda *args: [' '.join(args[:2]), *args[2:]]
listoflist = [['BOTOS', 'AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE', 'AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
final_list = [f(*i) for i in listoflist]
Output:
[['BOTOS AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
You can use a list comprehension as well:
listoflist = [['BOTOS', 'AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE', 'AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
def f(lol):
    # l[2:] keeps everything after the two name parts (l[3:] would drop the 14)
    return [[' '.join(l[0:2])] + l[2:] for l in lol]
listoflist = f(listoflist)
print(listoflist)
# => [['BOTOS AUGUSTIN', 14, 'March 2016', 600, 'ALOCATIA'], ['HENDRE AUGUSTIN', 14, 'February 2015', 600, 'ALOCATIA']]
I have three lists. Currently I am using one list to sort all of them like this:
sorted_lists = sorted(zip(new_date_list, flipkart_sale_list, paytm_sale_list), key=lambda x: x[0])
new_date_list, flipkart_sale_list, paytm_sale_list = [[x[i] for x in sorted_lists] for i in range(3)]
The new date list contains date strings in the format %b %d:
new_date_list = ['Feb 01', 'Feb 02', 'Jan 04', 'Jan 05', 'Jan 06', 'Jan 07']
flipkart_sale_list = [5000,4000,3000,6000,1000,9000]
paytm_sale_list = [2200,2500,3500,5000,4000,1000]
However, the above code does not sort the date strings properly. How can I sort all the lists according to new_date_list?
The other two lists contain numbers.
Expected Result:
new_date_list = ['Jan 04', 'Jan 05', 'Jan 06', 'Jan 07', 'Feb 01', 'Feb 02']
flipkart_sale_list = [3000,6000,1000,9000,5000,4000]
paytm_sale_list = [3500,5000,4000,1000,2200,2500]
Convert the strings to datetime objects before sorting. Try:
import datetime
sorted_lists = sorted(zip(new_date_list, flipkart_sale_list, paytm_sale_list), key=lambda x: datetime.datetime.strptime(x[0], "%b %d"))
new_date_list, flipkart_sale_list, paytm_sale_list = [[x[i] for x in sorted_lists] for i in range(3)]
Result:
>>> new_date_list
['Jan 04', 'Jan 05', 'Jan 06', 'Jan 07', 'Feb 01', 'Feb 02']
>>> flipkart_sale_list
[3000, 6000, 1000, 9000, 5000, 4000]
>>> paytm_sale_list
[3500, 5000, 4000, 1000, 2200, 2500]
As suggested by @StefanPochmann, this
new_date_list, flipkart_sale_list, paytm_sale_list = [[x[i] for x in sorted_lists] for i in range(3)]
can also be done with zip:
new_date_list, flipkart_sale_list, paytm_sale_list = map(list, zip(*sorted_lists))
I did not really get the purpose of the three lists, but if you have one big list, use this trick:
import datetime
sorted(new_date_list, key=lambda x: datetime.datetime.strptime(x, '%b %d'))
Edit: now that you've added the example data for the lists, I'd do the sort three times.
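Another way to keep the three lists aligned is to compute the sorted order of indices once and apply it to each list. This is only a sketch: `date_key` and `order` are hypothetical names, and appending the year 2000 is an assumption that all dates fall within one year (a leap year is used so that 'Feb 29' would still parse).

```python
from datetime import datetime

new_date_list = ['Feb 01', 'Feb 02', 'Jan 04', 'Jan 05', 'Jan 06', 'Jan 07']
flipkart_sale_list = [5000, 4000, 3000, 6000, 1000, 9000]
paytm_sale_list = [2200, 2500, 3500, 5000, 4000, 1000]

def date_key(text):
    # Assumption: all dates belong to the same year, so a fixed year is appended.
    return datetime.strptime(text + ' 2000', '%b %d %Y')

# Indices of new_date_list in chronological order.
order = sorted(range(len(new_date_list)), key=lambda i: date_key(new_date_list[i]))

# Apply the same order to every list so the rows stay aligned.
new_date_list = [new_date_list[i] for i in order]
flipkart_sale_list = [flipkart_sale_list[i] for i in order]
paytm_sale_list = [paytm_sale_list[i] for i in order]

print(new_date_list)       # ['Jan 04', 'Jan 05', 'Jan 06', 'Jan 07', 'Feb 01', 'Feb 02']
print(flipkart_sale_list)  # [3000, 6000, 1000, 9000, 5000, 4000]
print(paytm_sale_list)     # [3500, 5000, 4000, 1000, 2200, 2500]
```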