Why don't I get the desired output from this recursive function? - python

I've been doing this course on Udacity, and this one problem has been stressing me for a while. It keeps coming back to me, and I can't really get an idea of it because I find recursive functions super confusing and complicated.
I would like to find the solution by myself, but I need some help understanding how this works and why it doesn't produce the output I want. Thank you.
# Single Gold Star
# Family Trees
# In the lecture, we showed a recursive definition for your ancestors. For this
# question, your goal is to define a procedure that finds someone's ancestors,
# given a Dictionary that provides the parent relationships.
# Here's an example of an input Dictionary:
ada_family = { 'Judith Blunt-Lytton': ['Anne Isabella Blunt', 'Wilfrid Scawen Blunt'],
'Ada King-Milbanke': ['Ralph King-Milbanke', 'Fanny Heriot'],
'Ralph King-Milbanke': ['Augusta Ada King', 'William King-Noel'],
'Anne Isabella Blunt': ['Augusta Ada King', 'William King-Noel'],
'Byron King-Noel': ['Augusta Ada King', 'William King-Noel'],
'Augusta Ada King': ['Anne Isabella Milbanke', 'George Gordon Byron'],
'George Gordon Byron': ['Catherine Gordon', 'Captain John Byron'],
'John Byron': ['Vice-Admiral John Byron', 'Sophia Trevannion'] }
# Define a procedure, ancestors(genealogy, person), that takes as its first input
# a Dictionary in the form given above, and as its second input the name of a
# person. It should return a list giving all the known ancestors of the input
# person (this should be the empty list if there are none). The order of the list
# does not matter and duplicates will be ignored.
output = []
def ancestors(genealogy, person):
    if person in genealogy:
        for candidate in genealogy[person]:
            output.append(candidate)
            ancestors(genealogy, candidate)
        return output
    else:
        return []
# Here are some examples:
print (ancestors(ada_family, 'Augusta Ada King'))
#>>> ['Anne Isabella Milbanke', 'George Gordon Byron',
# 'Catherine Gordon','Captain John Byron']
print (ancestors(ada_family, 'Judith Blunt-Lytton'))
#>>> ['Anne Isabella Blunt', 'Wilfrid Scawen Blunt', 'Augusta Ada King',
# 'William King-Noel', 'Anne Isabella Milbanke', 'George Gordon Byron',
# 'Catherine Gordon', 'Captain John Byron']
print (ancestors(ada_family, 'Dave'))
#>>> []

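You need to declare output within the function scope. As a module-level list it is never reset, so every call (each recursive call and each later top-level call) keeps appending to the same list, and results from earlier calls leak into later ones.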
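A minimal sketch of that fix, assuming each call should build its own list and merge in what the recursive calls return. The family dictionary here is just a two-entry subset of ada_family so the example runs on its own:

```python
# Two entries copied from the question's ada_family, to keep the demo short.
family = {
    'Augusta Ada King': ['Anne Isabella Milbanke', 'George Gordon Byron'],
    'George Gordon Byron': ['Catherine Gordon', 'Captain John Byron'],
}

def ancestors(genealogy, person):
    output = []  # fresh list on every call, so nothing leaks between calls
    for parent in genealogy.get(person, []):  # unknown person: no parents
        output.append(parent)
        # Merge in whatever the recursive call found for this parent.
        output.extend(ancestors(genealogy, parent))
    return output

first = ancestors(family, 'Augusta Ada King')
second = ancestors(family, 'Augusta Ada King')  # same result, no carry-over
print(first)
print(ancestors(family, 'Dave'))
```

Because the list is local, calling the function twice gives the same answer both times, which is not the case with the module-level list.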

Related

Last item on the reduce Method

I have this list of countries:
countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
I need to solve the following exercise: use reduce to concatenate all the countries and produce this sentence: Estonia, Finland, Sweden, Denmark, Norway, and Iceland are north European countries
from functools import reduce
def sentece(pais, pais_next):
    if pais_next == 'Iceland':
        return pais + ' and ' + pais_next + ' are north European countries'
    else:
        return pais + ', ' + pais_next
countries_reduce = reduce(sentece, countries)
print(countries_reduce)
The code runs perfectly, but if I want to do this in general, how do I know which element is the last one?
The reduce function doesn't have a way to tell it what to do about the last item, only what to do about the initialization.
There are two general ways to go about it:
Just do simple concatenation with a comma and a space, but only on the first n-1 items of the list, then manually append the correct format for the last item.
Change the last item from 'Iceland' to 'and Iceland are north European countries', then do the concatenation for the full list.
Figuring out which is the last element is a bad idea; any solution that would give you that information would be a royal hack.
Normally, you wouldn't use reduce to solve this at all (repeated concatenation is a form of Schlemiel the Painter's Algorithm, involving O(n²) work, where efficient algorithms can be O(n)), so you'd just use ', '.join, e.g.:
countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
countries_str = f'{", ".join(countries[:-1])} and {countries[-1]} are north European countries'
where join is used for the consistent join components, wrapped in an f-string that inserts the last item along with the rest of the formatting.
Or, premodify countries[-1] to reduce the complexity to the point where an f-string isn't necessary (assuming an Oxford comma is okay):
countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
countries[-1] = 'and ' + countries[-1] # Put the "and " prefix in front ahead of time
countries_str = ', '.join(countries) + ' are north European countries'
If you must use reduce, you'd still want to handle the final element separately, either by processing it completely separately at the end, e.g.
from functools import reduce
countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
countries_str = f'{reduce(lambda x, y: f"{x}, {y}", countries[:-1])} and {countries[-1]} are north European countries'
print(countries_str)
or by manually tweaking it ahead of time so it can be used in a consistent manner (assuming you're okay with the Oxford comma):
from functools import reduce
countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
countries[-1] = 'and ' + countries[-1] # Put the "and " prefix in front ahead of time
countries_str = f'{reduce(lambda x, y: f"{x}, {y}", countries)} are north European countries'
print(countries_str)
Again, reduce is a bad solution to the problem. str.join (using ', '.join) is an O(n) solution: on CPython, it pre-scans the items to compute the final length, preallocates the complete final str, and copies each input exactly once. reduce is O(n²), and unlike an actual loop using +=, it can't even benefit from the CPython reference interpreter's implementation detail that sometimes allows concatenation to mutate in place, reducing the number of data copies.
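For the "in general" part of the question, here is a rough sketch of a helper built on str.join. The name join_with_and and its handling of lists with fewer than three items are my own assumptions, not something from the exercise:

```python
def join_with_and(items):
    """Join items as 'a, b, and c' (Oxford comma); short lists are special-cased."""
    items = list(items)  # work on a copy so the caller's list is left alone
    if len(items) <= 1:
        return ''.join(items)  # '' for an empty list, the item itself for one
    if len(items) == 2:
        return f'{items[0]} and {items[1]}'
    # Prefix the last item ahead of time, then join everything consistently.
    items[-1] = 'and ' + items[-1]
    return ', '.join(items)

countries = ['Estonia', 'Finland', 'Sweden', 'Denmark', 'Norway', 'Iceland']
sentence = join_with_and(countries) + ' are north European countries'
print(sentence)
```

The function never needs to ask which element is last while iterating; it only touches items[-1] once, up front.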

Get the members from a Twitter list with python

I am trying to create a Data frame with some data from the European Parliament members. However I am struggling with the data received when using the tweepy package.
api = tweepy.API(auth)
# Iterate through all members of the owner's list
for member in tweepy.Cursor(api.list_members, 'Europarl_EN', 'all-meps-on-twitter').items():
    m = member
    print(member)
The problem is that I do not know how to get a readable table after this. I also tried this, just to get the names:
lel = api.list_members('Europarl_EN', 'all-meps-on-twitter', -10)
for i in lel:
    print(i.name)
And the output is:
Jaromír Kohlíček
István Ujhelyi
Deli Andor
Maria Grapini
Winkler Gyula
LefterisChristoforou
Mircea Diaconu
Maria Heubuch
Daniel Buda
Marijana Petir
Maite Pagazaurtundúa
Janice Atkinson
Andrew Lewer
Martina Michels
Joachim Starbatty
Peter Jahr
Emil Radev
József Nagy
Quisthoudt-Rowohl
Dominique Bilde
All in all, my intention is to transform lel into a DataFrame or, in the worst case, to get the usernames.

Add data_frame column while fulfill condition in dict

I am trying to add a column to a pandas.DataFrame: if the string in a row contains one or more words that are keys in a dict, the new column should hold the corresponding value. But my code gives me an error, and I don't know what went wrong. Could anyone help?
data_frame:
tw_test.head()
tweet
0 living the dream. #cameraman #camera #camerac...
1 justin #trudeau's reasons for thanksgiving. to...
2 #themadape butt…..butt…..we’re allergic to l...
3 2 massive explosions at peace march in #turkey...
4 #mulcair suggests there’s bad blood between hi...
dict:
party={}
{'#mulcair': 'NDP', '#cdnleft': 'liberal', '#LiberalExpress': 'liberal', '#ThankYouStephenHarper': 'Conservative ', '#pmjt': 'liberal'...}
My code:
tw_test["party"]=tw_test["tweet"].apply(lambda x: party[x.split(' ')[1].startswith("#")[0]])
I believe your trouble was due to trying to cram too much into a lambda. A function to do the lookup is pretty straightforward:
Code:
party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal'
}
def party(tweet):
    for tag in [t for t in tweet.split() if t.startswith('#')]:
        if tag in party_tags:
            return party_tags[tag]
Test Code:
import pandas as pd
tw_test = pd.DataFrame([x.strip() for x in u"""
living the dream. #cameraman #camera #camerac
justin #trudeau's reasons for thanksgiving. to
#themadape butt…..butt…..we’re allergic to
2 massive explosions at peace march in #turkey
#mulcair suggests there’s bad blood between
""".split('\n')[1:-1]], columns=['tweet'])
tw_test["party"] = tw_test["tweet"].apply(party)
print(tw_test)
Results:
tweet party
0 living the dream. #cameraman #camera #camerac None
1 justin #trudeau's reasons for thanksgiving. to None
2 #themadape butt…..butt…..we’re allergic to None
3 2 massive explosions at peace march in #turkey None
4 #mulcair suggests there’s bad blood between NDP
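One thing to watch for: the sample tweets are lowercased while some dictionary keys are mixed case, and a token like #trudeau's carries trailing punctuation. A possible variant, assuming you want case-insensitive matching and want to ignore anything after the hashtag word (normalized_tags is a name I made up for this sketch):

```python
import re

party_tags = {
    '#mulcair': 'NDP',
    '#cdnleft': 'liberal',
    '#LiberalExpress': 'liberal',
    '#ThankYouStephenHarper': 'Conservative ',
    '#pmjt': 'liberal',
}

# Assumption: tags match regardless of case, and stray spaces in values are trimmed.
normalized_tags = {tag.lower(): name.strip() for tag, name in party_tags.items()}

def party(tweet):
    # re.findall extracts '#word' tokens, so "#mulcair's" still yields '#mulcair'.
    for tag in re.findall(r'#\w+', tweet.lower()):
        if tag in normalized_tags:
            return normalized_tags[tag]
    return None  # no known hashtag in this tweet

print(party('#mulcair suggests bad blood'))
print(party('on the road with #liberalexpress!'))
```

It plugs into the same tw_test["tweet"].apply(party) call as before.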

Good ways to sort a queryset? - Django

what I'm trying to do is this:
get the 30 Authors with highest score ( Author.objects.order_by('-score')[:30] )
order the authors by last_name
Any suggestions?
What about
import operator
auths = Author.objects.order_by('-score')[:30]
ordered = sorted(auths, key=operator.attrgetter('last_name'))
In Django 1.4 and newer you can order by providing multiple fields.
Reference: https://docs.djangoproject.com/en/dev/ref/models/querysets/#order-by
order_by(*fields)
By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.
Example:
ordered_authors = Author.objects.order_by('-score', 'last_name')[:30]
The result above will be ordered by score descending, then by last_name ascending. The negative sign in front of "-score" indicates descending order. Ascending order is implied.
I just wanted to illustrate that the built-in solutions (SQL-only) are not always the best ones. At first I thought that because Django's QuerySet.order_by method accepts multiple arguments, you could easily chain them:
ordered_authors = Author.objects.order_by('-score', 'last_name')[:30]
But it does not work as you would expect. Case in point: first, here is a list of presidents sorted by score (selecting the top 5 for easier reading):
>>> auths = Author.objects.order_by('-score')[:5]
>>> for x in auths: print(x)
...
James Monroe (487)
Ulysses Simpson (474)
Harry Truman (471)
Benjamin Harrison (467)
Gerald Rudolph (464)
Using Alex Martelli's solution, which correctly provides the top 5 people sorted by last_name:
>>> for x in sorted(auths, key=operator.attrgetter('last_name')): print(x)
...
Benjamin Harrison (467)
James Monroe (487)
Gerald Rudolph (464)
Ulysses Simpson (474)
Harry Truman (471)
And now the combined order_by call:
>>> myauths = Author.objects.order_by('-score', 'last_name')[:5]
>>> for x in myauths: print(x)
...
James Monroe (487)
Ulysses Simpson (474)
Harry Truman (471)
Benjamin Harrison (467)
Gerald Rudolph (464)
As you can see, it is the same result as the first one: last_name is only used to break ties between authors with equal scores, so it doesn't work as you would expect.
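The same difference can be reproduced without a database. This is a plain-Python stand-in, not Django code: the tuples are the five rows from the listings above, and the tuple layout is my own choice for the sketch.

```python
from operator import itemgetter

# (first_name, last_name, score) rows taken from the listings above.
authors = [
    ('Harry', 'Truman', 471),
    ('James', 'Monroe', 487),
    ('Gerald', 'Rudolph', 464),
    ('Ulysses', 'Simpson', 474),
    ('Benjamin', 'Harrison', 467),
]

# Like order_by('-score', 'last_name'): last_name only breaks score ties.
combined = sorted(authors, key=lambda a: (-a[2], a[1]))

# Two steps: take the top scorers first, then re-sort that slice by last_name.
top = sorted(authors, key=itemgetter(2), reverse=True)[:5]
by_last_name = sorted(top, key=itemgetter(1))

print([a[1] for a in combined])
print([a[1] for a in by_last_name])
```

With no tied scores, the combined key gives exactly the score ordering, while the two-step version gives the alphabetical one.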
Here's a way that allows for ties for the cut-off score.
author_count = Author.objects.count()
cut_off_score = Author.objects.order_by('-score').values_list('score', flat=True)[min(30, author_count) - 1]
top_authors = Author.objects.filter(score__gte=cut_off_score).order_by('last_name')
You may get more than 30 authors in top_authors this way, and the min(30, author_count) is there in case you have fewer than 30 authors.

How to classify users into different countries, based on the Location field

Most web applications have a Location field, in which users may enter a location of their choice.
How would you classify users into different countries based on the location entered?
For example, I used the Stack Overflow dump of users.xml and extracted users' names, reputation and location:
['Jeff Atwood', '12853', 'El Cerrito, CA']
['Jarrod Dixon', '1114', 'Morganton, NC']
['Sneakers OToole', '200', 'Unknown']
['Greg Hurlman', '5327', 'Halfway between the boardwalk and Six Flags, NJ']
['Power-coder', '812', 'Burlington, Ontario, Canada']
['Chris Jester-Young', '16509', 'Durham, NC']
['Teifion', '7024', 'Wales']
['Grant', '3333', 'Georgia']
['TimM', '133', 'Alabama']
['Leon Bambrick', '2450', 'Australia']
['Coincoin', '3801', 'Montreal']
['Tom Grochowicz', '125', 'NJ']
['Rex M', '12822', 'US']
['Dillie-O', '7109', 'Prescott, AZ']
['Pete', '653', 'Reynoldsburg, OH']
['Nick Berardi', '9762', 'Phoenixville, PA']
['Kandis', '39', '']
['Shawn', '4248', 'philadelphia']
['Yaakov Ellis', '3651', 'Israel']
['redwards', '21', 'US']
['Dave Ward', '4831', 'Atlanta']
['Liron Yahdav', '527', 'San Rafael, CA']
['Geoff Dalgas', '648', 'Corvallis, OR']
['Kevin Dente', '1619', 'Oakland, CA']
['Tom', '3316', '']
['denny', '573', 'Winchester, VA']
['Karl Seguin', '4195', 'Ottawa']
['Bob', '4652', 'US']
['saniul', '2352', 'London, UK']
['saint_groceon', '1087', 'Houston, TX']
['Tim Boland', '192', 'Cincinnati Ohio']
['Darren Kopp', '5807', 'Woods Cross, UT']
using the following Python script:
from xml.etree import ElementTree
root = ElementTree.parse('SO Export/so-export-2009-05/users.xml').getroot()
items = ['DisplayName', 'Reputation', 'Location']
def loop1():
    for count, i in enumerate(root):
        det = [i.get(x) for x in items]
        print(det)
        if count > 30: break
loop1()
What is the simplest way to classify people into different countries? Are there any ready lookup tables available that provide me an output saying X location belongs to Y country?
The lookup table need not be totally accurate. Reasonably accurate answers are obtained by querying the location string on Google, or better still, Wolfram Alpha.
Your best bet is to use a geocoding API, for example through geopy.
The Google Geocoding API, for example, will return the country in the CountryNameCode-field of the response.
With just this one location field the number of false matches will probably be relatively high, but maybe it is good enough.
If you had server logs, you could also try to look up the users' IP addresses with an IP geocoder (more information and pointers on Wikipedia).
Force users to specify their country; otherwise you'll have to deal with ambiguities. This would be the right way.
If that's not possible, at least make your best-guess in conjunction with their IP address.
For example, ['Grant', '3333', 'Georgia']
Is this Georgia, USA?
Or is this the Republic of Georgia?
If their IP address suggests somewhere in Central Asia or Eastern Europe, then chances are it's the Republic of Georgia. If it's North America, chances are pretty good they mean Georgia, USA.
Note that mappings from IP address to country aren't 100% accurate, and the database needs to be updated regularly. In my opinion, far too much trouble.
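If a rough lookup table is good enough, something along these lines might be a starting point. Everything here is hypothetical: the tables are tiny hand-picked samples, not a real dataset, and guess_country is a name invented for this sketch. It looks only at the last comma-separated part of the string and refuses to guess on ambiguous names such as Georgia.

```python
# Hypothetical, deliberately tiny lookup tables; a real one would be far larger.
US_STATES = {'CA', 'NC', 'NJ', 'AZ', 'OH', 'PA', 'VA', 'OR', 'TX', 'UT'}
COUNTRIES = {
    'us': 'United States',
    'uk': 'United Kingdom',
    'canada': 'Canada',
    'australia': 'Australia',
    'israel': 'Israel',
}
AMBIGUOUS = {'georgia'}  # US state or country: needs extra evidence such as IP

def guess_country(location):
    """Return a country name, or None if the string is empty, unknown or ambiguous."""
    # The last comma-separated part is usually the most general one.
    last = location.split(',')[-1].strip()
    if not last or last.lower() in AMBIGUOUS:
        return None
    if last in US_STATES:  # exact two-letter, upper-case state code
        return 'United States'
    return COUNTRIES.get(last.lower())

for loc in ['El Cerrito, CA', 'Burlington, Ontario, Canada', 'London, UK', 'Georgia', '']:
    print(repr(loc), '->', guess_country(loc))
```

Strings like 'Montreal' or 'Cincinnati Ohio' fall through to None here, which is where a geocoding API or the IP-based guess would have to take over.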
