Scraping data with a lack of classes / ids on elements


I'm trying to scrape data to build an object that looks like this:
{
    "artist": "Oasis",
    "albums": {
        "Definitely Maybe": [
            "Rock n Roll Star",
            "Shakermaker",
            ...
        ],
        "(What's The Story) Morning Glory": [
            "Hello",
            "Roll With It",
            ...
        ],
        ...
    }
}
The page's HTML is shown in the edit below.
I'm currently scraping the song titles like so:
data = []
for div in soup.find_all("div", {"id": "listAlbum"}):
    links = div.find_all("a")
    for a in links:
        text = a.text.strip()
        # Skip empty anchors such as <a id="1368"></a>
        if text:
            data.append(text)
Likewise, grabbing the album names is straightforward:
for div in soup.find_all("div", {"class": "album"}):
    titles = div.find_all("b")
    for t in titles:
        ...
My problem is how to use the two loops above to build an object like the one at the top. How can I make sure the songs from album X go into the correct album? If each song had an album attribute, it would be clear to me, but with the HTML structured the way it is, I'm at a bit of a loss.
EDIT: Here is the HTML:
<div id="listAlbum">
<a id="1368"></a>
<div class="album">album: <b>"Definitely Maybe"</b> (1994)</div>
Rock 'n' Roll Star<br>
Shakermaker<br>
Live Forever<br>
Up In The Sky<br>
Columbia<br>
Supersonic<br>
Bring It On Down<br>
Cigarettes & Alcohol<br>
Digsy's Diner<br>
Slide Away<br>
Married With Children<br>
Sad Song<br>
<a id="1366"></a>
<div class="album">album: <b>"(What's The Story) Morning Glory"</b> (1995)</div>
Hello<br>
Roll With It<br>
Wonderwall<br>
Don't Look Back In Anger<br>
Hey Now<br>
Some Might Say<br>
Cast No Shadow<br>
She's Electric<br>
Morning Glory<br>
Champagne Supernova<br>
Bonehead's Bank Holiday<br>

You can do this using find_next_siblings().
Code:
from bs4 import BeautifulSoup

oasis = {
    'artist': 'Oasis',
    'albums': {}
}
soup = BeautifulSoup(html, 'lxml')  # where html is the HTML you've provided
all_albums = soup.find('div', id='listAlbum')
first_album = all_albums.find('div', class_='album')
album_name = first_album.b.text
songs = []
for tag in first_album.find_next_siblings(['a', 'div']):
    # If tag is a <div>, store the previous album and start a new one.
    if tag.name == 'div':
        oasis['albums'][album_name] = songs
        songs = []
        album_name = tag.b.text
    # If tag is an <a>, append the song to the current list.
    else:
        songs.append(tag.text)
# Add the last album
oasis['albums'][album_name] = songs
print(oasis)
Output:
{
'artist': 'Oasis',
'albums': {
'"Definitely Maybe"': ["Rock 'n' Roll Star", 'Shakermaker', 'Live Forever', 'Up In The Sky', 'Columbia', 'Supersonic', 'Bring It On Down', 'Cigarettes & Alcohol', "Digsy's Diner", 'Slide Away', 'Married With Children', 'Sad Song', ''],
'"(What\'s The Story) Morning Glory"': ['Hello', 'Roll With It', 'Wonderwall', "Don't Look Back In Anger", 'Hey Now', 'Some Might Say', 'Cast No Shadow', "She's Electric", 'Morning Glory', 'Champagne Supernova', "Bonehead's Bank Holiday"]
}
}
EDIT:
After checking the website, I've made a few changes to the code.
First, you need to skip the empty <a id="6910"></a> anchor that sits at the end of each album's song list; otherwise it adds a song with an empty name. Second, the "other songs:" header is not inside a <b> tag, so album_name = tag.b.text raises an AttributeError there (tag.b is None).
With the following changes you get exactly what you need:
for tag in first_album.find_next_siblings(['a', 'div']):
    if tag.name == 'div':
        oasis['albums'][album_name] = songs
        songs = []
        album_name = tag.text if tag.text == 'other songs:' else tag.b.text
        continue
    if tag.get('id'):
        continue
    songs.append(tag.text)
Final output:
{
'artist': 'Oasis',
'albums': {
'"Definitely Maybe"': ["Rock 'n' Roll Star", 'Shakermaker', 'Live Forever', 'Up In The Sky', 'Columbia', 'Supersonic', 'Bring It On Down', 'Cigarettes & Alcohol', "Digsy's Diner", 'Slide Away', 'Married With Children', 'Sad Song'],
'"(What\'s The Story) Morning Glory"': ['Hello', 'Roll With It', 'Wonderwall', "Don't Look Back In Anger", 'Hey Now', 'Some Might Say', 'Cast No Shadow', "She's Electric", 'Morning Glory', 'Champagne Supernova', "Bonehead's Bank Holiday"],
'"Be Here Now"': ["D'You Know What I Mean?", 'My Big Mouth', 'Magic Pie', 'Stand By Me', 'I Hope, I Think, I Know', 'The Girl In The Dirty Shirt', 'Fade In-Out', "Don't Go Away", 'Be Here Now', 'All Around The World', "It's Getting Better (Man!!)"],
'"The Masterplan"': ['Acquiesce', 'Underneath The Sky', 'Talk Tonight', 'Going Nowhere', 'Fade Away', 'I Am The Walrus (Live)', 'Listen Up', "Rockin' Chair", 'Half The World Away', "(It's Good) To Be Free", 'Stay Young', 'Headshrinker', 'The Masterplan'],
'"Standing On The Shoulder Of Giants"': ["Fuckin' In The Bushes", 'Go Let It Out', 'Who Feels Love?', 'Put Yer Money Where Yer Mouth Is', 'Little James', 'Gas Panic!', 'Where Did It All Go Wrong?', 'Sunday Morning Call', 'I Can See A Liar', 'Roll It Over'],
'"Heathen Chemistry"': ['The Hindu Times', 'Force Of Nature', 'Hung In A Bad Place', 'Stop Crying Your Heart Out', 'Song Bird', 'Little By Little', '(Probably) All In The Mind', 'She Is Love', 'Born On A Different Cloud', 'Better Man'],
'"Don\'t Believe The Truth"': ['Turn Up The Sun', 'Mucky Fingers', 'Lyla', 'Love Like A Bomb', 'The Importance Of Being Idle', 'The Meaning Of Soul', "Guess God Thinks I'm Abel", 'Part Of The Queue', 'Keep The Dream Alive', 'A Bell Will Ring', 'Let There Be Love'],
'"Dig Out Your Soul"': ['Bag It Up', 'The Turning', 'Waiting For The Rapture', 'The Shock Of The Lightning', "I'm Outta Time", '(Get Off Your) High Horse Lady', 'Falling Down', "To Be Where There's Life", "Ain't Got Nothin'", 'The Nature Of Reality', 'Soldier On', 'I Believe In All'],
'other songs:': ["(As Long As They've Got) Cigarettes In Hell", '(I Got) The Fever', 'Alice', 'Alive', 'Angel Child', 'Boy With The Blues', 'Carry Us All', 'Cloudburst', 'Cum On Feel The Noize', "D'Yer Wanna Be A Spaceman", 'Eyeball Tickler', 'Flashbax', 'Full On', 'Helter Skelter', 'Heroes', 'I Will Believe', "Idler's Dream", 'If We Shadows', "It's Better People", 'Just Getting Older', "Let's All Make Believe", 'My Sister Lover', 'One Way Road', 'Round Are Way', 'Step Out', 'Street Fighting Man', 'Take Me', 'Take Me Away', 'The Fame', 'Whatever', "You've Got To Hide Your Love Away"]
}
}
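The answer's loop can also be wrapped into a self-contained function. This is a sketch under an assumption: as the answer's code implies, each song on the live page is an <a> link, so the markup below is a hypothetical, reduced version of the posted HTML with made-up href values. Instead of storing the previous album when the next header appears, it creates each album's list as soon as its header is seen, so no separate "add the last album" step is needed. parse_albums is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical, reduced markup: assumes each song is an <a> link (as the answer's
# code implies); the href values are made up for illustration.
html = """
<div id="listAlbum">
<a id="1368"></a>
<div class="album">album: <b>"Definitely Maybe"</b> (1994)</div>
<a href="/song/1">Rock 'n' Roll Star</a><br>
<a href="/song/2">Shakermaker</a><br>
<a id="1366"></a>
<div class="album">album: <b>"(What's The Story) Morning Glory"</b> (1995)</div>
<a href="/song/3">Hello</a><br>
<div class="album">other songs:</div>
<a href="/song/4">Alice</a><br>
</div>
"""


def parse_albums(html, artist):
    soup = BeautifulSoup(html, "html.parser")
    result = {"artist": artist, "albums": {}}
    songs = None  # list of the album currently being filled
    container = soup.find("div", id="listAlbum")
    # Only direct children: album headers (<div>) and song/anchor links (<a>)
    for tag in container.find_all(["a", "div"], recursive=False):
        if tag.name == "div":
            # Album title sits in <b>, except for the "other songs:" header
            title = tag.b.text.strip('"') if tag.b else tag.text.strip()
            songs = result["albums"].setdefault(title, [])
        elif not tag.get("id") and songs is not None:
            # Skip the empty <a id="..."> anchors between albums
            songs.append(tag.text.strip())
    return result


print(parse_albums(html, "Oasis"))
```

Unlike the answer's output, the album keys here have the surrounding double quotes stripped; drop .strip('"') to keep them exactly as they appear on the page.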

Related

Getting only a part of the page source using selenium webdriver

I'm trying to get the HTML of Billboard's top 100 chart, but I keep getting only about half of the page.
I tried getting the page source using this code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
url = "https://www.billboard.com/charts/hot-100/"
driver.get(url)
driver.implicitly_wait(10)
print(driver.page_source)
But the page source it returns always stops at around the 53rd song on the chart (I've tried increasing the implicit wait and nothing changed).

Not sure why you are getting only 53 elements; using an explicit wait, I get all 100.
Use WebDriverWait as an explicit wait:
driver.get('https://www.billboard.com/charts/hot-100/')
elements=WebDriverWait(driver,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "ul.lrv-a-unstyle-list h3#title-of-a-story")))
print(len(elements))
print([item.text for item in elements])
You need the following imports:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
Output:
100
['Anti-Hero', 'Lavender Haze', 'Maroon', 'Snow On The Beach', 'Midnight Rain', 'Bejeweled', 'Question...?', "You're On Your Own, Kid", 'Karma', 'Vigilante Shit', 'Unholy', 'Bad Habit', 'Mastermind', 'Labyrinth', 'Sweet Nothing', 'As It Was', 'I Like You (A Happier Song)', "I Ain't Worried", 'You Proof', "Would've, Could've, Should've", 'Bigger Than The Whole Sky', 'Super Freaky Girl', 'Sunroof', "I'm Good (Blue)", 'Under The Influence', 'The Great War', 'Vegas', 'Something In The Orange', 'Wasted On You', 'Jimmy Cooks', 'Wait For U', 'Paris', 'High Infidelity', 'Tomorrow 2', 'Titi Me Pregunto', 'About Damn Time', 'The Kind Of Love We Make', 'Late Night Talking', 'Cuff It', 'She Had Me At Heads Carolina', 'Glitch', 'Me Porto Bonito', 'Die For You', 'California Breeze', 'Dear Reader', 'Forever', 'Hold Me Closer', 'Just Wanna Rock', '5 Foot 9', 'Unstoppable', 'Thank God', 'Fall In Love', 'Rock And A Hard Place', 'Golden Hour', 'Heyy', 'Half Of Me', 'Until I Found You', 'Victoria’s Secret', 'Son Of A Sinner', "Star Walkin' (League Of Legends Worlds Anthem)", 'Romantic Homicide', 'What My World Spins Around', 'Real Spill', "Don't Come Lookin'", 'Monotonia', 'Stand On It', 'Never Hating', 'Wishful Drinking', 'No Se Va', 'Free Mind', 'Not Finished', 'Music For A Sushi Restaurant', 'Pop Out', 'Staying Alive', 'Poland', 'Whiskey On You', 'Billie Eilish.', 'Betty (Get Money)', '2 Be Loved (Am I Ready)', 'Wait In The Truck', 'All Mine', 'La Bachata', 'Last Last', 'Glimpse Of Us', 'Freestyle', 'Calm Down', 'Gatubela', 'Evergreen', 'Pick Me Up', 'Gotta Move On', 'Perfect Timing', 'Country On', 'Snap', 'She Likes It', 'Made You Look', 'Dark Red', 'From Now On', 'Forget Me', 'Miss You', 'Despecha']

driver.implicitly_wait(10) is not a pause command!
It only sets how long find_element/find_elements calls keep polling for an element before giving up, so it has no effect on the next line, print(driver.page_source).
If you want to wait for the page to load completely, wait for a specific element on it to become visible, using WebDriverWait with expected_conditions. Or just add a hard-coded pause, time.sleep(10), instead of driver.implicitly_wait(10).

How to stop a letter repeating itself in Python

I am writing code that takes in a jumbled word and returns the unjumbled word. data.json contains a list of words; I take each word one by one, check whether it contains all the characters of the jumbled word, and then check whether the lengths are the same. The problem is that when I enter a word such as helol, the l is checked twice, and I get some other outputs besides the correct one (hello). I know why it happens, but I can't find a fix.
import json

with open("data.json") as val:
    val1 = json.load(val)  # loads the list of words
a = input("Enter a Jumbled word ")  # takes a word from the user
a = list(a)  # changes it into a list to iterate
for x in val1:  # iterates over the words in the list
    for somethin in a:  # iterates over the letters of the jumbled word
        if somethin in list(x):  # checks if the letter is in the current word
            continue
        else:
            break
    else:  # runs only if the loop didn't break (every letter was found)
        if len(a) != len(list(x)):  # checks if it has the same number of letters
            continue  # different length: skip to the next word
        else:
            print(x)  # print the match and keep looping to find more
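To see why the loop above prints extra words: somethin in list(x) only checks that a letter occurs somewhere in x, so the second l of helol is satisfied by the same l as the first. A sketch contrasting that presence test with collections.Counter, which compares how many times each letter occurs (the function names are illustrative):

```python
from collections import Counter


def contains_all_letters(jumbled, word):
    # The original check: every letter of the jumbled word appears somewhere in word.
    # It tests presence only, so a repeated letter is satisfied by a single occurrence.
    return all(letter in word for letter in jumbled)


def is_anagram(jumbled, word):
    # Counter compares letter counts, so "helol" needs exactly two l's.
    return Counter(jumbled) == Counter(word)


print(contains_all_letters("helol", "holed"))  # True: same length, but not an anagram
print(is_anagram("helol", "holed"))            # False
print(is_anagram("helol", "hello"))            # True
```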
EDIT: Many people asked for the JSON file, so here it is:
['Torres Strait Creole', 'good bye', 'agon', "queen's guard", 'animosity', 'price list', 'subjective', 'means', 'severe', 'knockout', 'life-threatening', 'entry into the war', 'dominion', 'damnify', 'packsaddle', 'hallucinate', 'lumpy', 'inception', 'Blankenese', 'cacophonous', 'zeptomole', 'floccinaucinihilipilificate', 'abashed', 'abacterial', 'ableism', 'invade', 'cohabitant', 'handicapped', 'obelus', 'triathlon', 'habitue', 'instigate', 'Gladstone Gander', 'Linked Data', 'seeded player', 'mozzarella', 'gymnast', 'gravitational force', 'Friedelehe', 'open up', 'bundt cake', 'riffraff', 'resourceful', 'wheedle', 'city center', 'gorgonzola', 'oaf', 'auf', 'oafs', 'galoot', 'imbecile', 'lout', 'moron', 'news leak', 'crate', 'aggregator', 'cheating', 'negative growth', 'zero growth', 'defer', 'ride back', 'drive back', 'start back', 'shy back', 'spring back', 'shrink back', 'shy away', 'abderian', 'unable', 'font manager', 'font management software', 'consortium', 'gown', 'inject', 'ISO 639', 'look up', 'cross-eyed', 'squinting', 'health club', 'fitness facility', 'steer', 'sunbathe', 'combatives', 'HTH', 'hope that helps', 'How The Hell', 'distributed', 'plum cake', 'liberalization', 'macchiato', 'caffè macchiato', 'beach volley', 'exult', 'jubilate', 'beach volleyball', 'be beached', 'affogato', 'gigabyte', 'terabyte', 'petabyte', 'undressed', 'decameter', 'sensual', 'boundary marker', 'poor man', 'cohabitee', 'night sleep', 'protruding ears', 'three quarters of an hour', 'spermophilus', 'spermophilus stricto sensu', "devil's advocate", 'sacred king', 'sacral king', 'myr', 'million years', 'obtuse-angled', 'inconsolable', 'neurotic', 'humiliating', 'mortifying', 'theological', 'rematch', 'varıety', 'be short', 'ontological', 'taxonomic', 'taxonomical', 'toxicology testing', 'on the job training', 'boulder', 'unattackable', 'inviolable', 'resinous', 'resiny', 'ionizing radiation', 'citrus grove', 'comic book shop', 'preparatory measure', 'written account', 
'brittle', 'locker', 'baozi', 'bao', 'bau', 'humbow', 'nunu', 'bausak', 'pow', 'pau', 'yesteryear', 'fire drill', 'rotted', 'putto', 'overthrow', 'ankle monitor', 'somewhat stupid', 'a little stupid', 'semordnilap', 'pangram', 'emordnilap', 'person with a sunlamp tan', 'tittle', 'incompatible', 'autumn wind', 'dairyman', 'chesty', 'lacustrine', 'chronophotograph', 'chronophoto', 'leg lace', 'ankle lace', 'ankle lock', 'Babelfy', 'ventricular', 'recurrent', 'long-lasting', 'long-standing', 'long standing', 'sea bass', 'reap', 'break wind', 'chase away', 'spark', 'speckle', 'take back', 'Westphalian', 'Aeolic Greek', 'startup', 'abseiling', 'impure', 'bottle cork', 'paralympic', 'work out', 'might', 'ice-cream man', 'ice cream man', 'ice cream maker', 'ice-cream maker', 'traveling', 'special delivery', 'prizefighter', 'abs', 'ab', 'churro', 'pilfer', 'dehumanize', 'fertilize', 'inseminate', 'digitalize', 'fluke', 'stroke of luck', 'decontaminate', 'abandonware', 'manzanita', 'tule', 'jackrabbit', 'system administrator', 'system admin', 'springtime lethargy', 'Palatinean', 'organized religion', 'bearing puller', 'wheel puller', 'gear puller', 'shot', 'normalize', 'palindromic', 'lancet window', 'terminological', 'back of head', 'dragon food', 'barbel', 'Central American Spanish', 'basis', 'birthmark', 'blood vessel', 'ribes', 'dog-rose', 'dreadful', 'freckle', 'free of charge', 'weather verb', 'weather sentence', 'gipsy', 'gypsy', 'glutton', 'hump', 'low voice', 'meek', 'moist', 'river mouth', 'turbid', 'multitude', 'palate', 'peak of mountain', 'poetry', 'pure', 'scanty', 'spicy', 'spicey', 'spruce', 'surface', 'infected', 'copulate', 'dilute', 'dislocate', 'grow up', 'hew', 'hinder', 'infringe', 'inhabit', 'marry off', 'offend', 'pass by', 'brother of a man', 'brother of a woman', 'sister of a man', 'sister of a woman', 'agricultural farm', 'result in', 'rebel', 'strew', 'scatter', 'sway', 'tread', 'tremble', 'hog', 'circuit breaker', 'Southern Quechua', 'safety 
pin', 'baby pin', 'college student', 'university student', 'pinus sibirica', 'Siberian pine', 'have lunch', 'floppy', 'slack', 'sloppy', 'wishi-washi', 'turn around', 'bogeyman', 'selfish', 'Talossan', 'biomembrane', 'biological membrane', 'self-sufficiency', 'underevaluation', 'underestimation', 'opisthenar', 'prosody', 'Kumhar Bhag Paharia', 'psychoneurotic', 'psychoneurosis', 'levant', "couldn't-care-less attitude", 'noctambule', 'acid-free paper', 'decontaminant', 'woven', 'wheaten', 'waste-ridden', 'war-ridden', 'violence-ridden', 'unwritten', 'typewritten', 'spoken', 'abiogenetically', 'rasp', 'abstractly', 'cyclically', 'acyclically', 'acyclic', 'ad hoc', 'spare tire', 'spare wheel', 'spare tyre', 'prefabricated', 'ISO 9000', 'Barquisimeto', 'Maracay', 'Ciudad Guayana', 'San Cristobal', 'Barranquilla', 'Arequipa', 'Trujillo', 'Cusco', 'Callao', 'Cochabamba', 'Goiânia', 'Campinas', 'Fortaleza', 'Florianópolis', 'Rosario', 'Mendoza', 'Bariloche', 'temporality', 'papyrus sedge', 'paper reed', 'Indian matting plant', 'Nile grass', 'softly softly', 'abductive reasoning', 'abductive inference', 'retroduction', 'Salzburgian', 'cymotrichous', 'access point', 'wireless access point', 'dynamic DNS', 'IP address', 'electrolyte', 'helical', 'hydrometer', 'intranet', 'jumper', 'MAC address', 'Media Access Control address', 'nickel–cadmium battery', 'Ni-Cd battery', 'oscillograph', 'overload', 'photovoltaic', 'photovoltaic cell', 'refractor telescope', 'autosome', 'bacterial artificial chromosome', 'plasmid', 'nucleobase', 'base pair', 'base sequence', 'chromosomal deletion', 'deletion', 'deletion mutation', 'gene deletion', 'chromosomal inversion', 'comparative genomics', 'genomics', 'cytogenetics', 'DNA replication', 'DNA repair', 'DNA sequence', 'electrophoresis', 'functional genomics', 'retroviral', 'retroviral infection', 'acceptance criteria', 'batch processing', 'business rule', 'code review', 'configuration management', 'entity–relationship model', 'lifecycle', 
'object code', 'prototyping', 'pseudocode', 'referential', 'reusability', 'self-join', 'timestamp', 'accredited', 'accredited translator', 'certify', 'certified translation', 'computer-aided design', 'computer-aided', 'computer-assisted', 'management system', 'computer-aided translation', 'computer-assisted translation', 'machine-aided translation', 'conference interpreter', 'freelance translator', 'literal translation', 'mother-tongue', 'whispered interpreting', 'simultaneous interpreting', 'simultaneous interpretation', 'base anhydride', 'binary compound', 'absorber', 'absorption coefficient', 'attenuation coefficient', 'active solar heater', 'ampacity', 'amorphous semiconductor', 'amorphous silicon', 'flowerpot', 'antireflection coating', 'antireflection', 'armored cable', 'electric arc', 'breakdown voltage','casing', 'facing', 'lining', 'assumption of Mary', 'auscultation']
This is just a sample; the real file contains many more entries.

As I understand it, you are trying to identify all possible matches for the jumbled string in your list. You could sort the letters of the jumbled word and compare the result with the sorted letters of each word in your data file.
sorted_jumbled_word = sorted(a)
for word in val1:
    if len(sorted_jumbled_word) == len(word) and sorted(word) == sorted_jumbled_word:
        print(word)
Checking the length first avoids unnecessary sorting. If you do this repeatedly, you might want to build a dictionary of the words in the data file keyed by their sorted letters, so you don't have to sort them again every time.
Some terms in your word list contain spaces and punctuation. If you want the comparison to ignore spaces, remove them from both the jumbled word and the words in the list, e.g. with word = word.replace(" ", "").
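The two suggestions above (a precomputed dictionary and ignoring spaces) can be combined. A sketch, assuming you also want to ignore case and punctuation; normalize, build_index and unjumble are illustrative names, and the word list is a few entries from the posted file plus hello:

```python
from collections import defaultdict


def normalize(text):
    # Lower-case and keep only letters/digits, so "Good bye" and "goodbye" compare equal.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def build_index(words):
    # Map each word's sorted, normalized letters to every word that shares them.
    # Built once; afterwards each lookup is a single dict access instead of a scan.
    index = defaultdict(list)
    for word in words:
        index["".join(sorted(normalize(word)))].append(word)
    return index


def unjumble(jumbled, index):
    # .get does not insert missing keys, unlike index[key] on a defaultdict
    return index.get("".join(sorted(normalize(jumbled))), [])


words = ["good bye", "agon", "means", "oaf", "auf", "oafs", "hello"]
index = build_index(words)
print(unjumble("helol", index))     # ['hello']
print(unjumble("ybe doog", index))  # ['good bye']
print(unjumble("afo", index))       # ['oaf']
```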

Find the anagram pairs from 2 lists and create a list of tuples of the anagrams

Say I have two lists:
list_A = [ 'Tar', 'Arc', 'Elbow', 'State', 'Cider', 'Dusty', 'Night', 'Inch', 'Brag', 'Cat', 'Bored', 'Save', 'Angel','bla', 'Stressed', 'Dormitory', 'School master','Awesoame', 'Conversation', 'Listen', 'Astronomer', 'The eyes', 'A gentleman', 'Funeral', 'The Morse Code', 'Eleven plus two', 'Slot machines', 'Fourth of July', 'Jim Morrison', 'Damon Albarn', 'George Bush', 'Clint Eastwood', 'Ronald Reagan', 'Elvis', 'Madonna Louise Ciccone', 'Bart', 'Paris', 'San Diego', 'Denver', 'Las Vegas', 'Statue of Liberty']
and
list_B = ['Cried', 'He bugs Gore', 'They see', 'Lives', 'Joyful Fourth', 'The classroom', 'Diagnose', 'Silent', 'Taste', 'Car', 'Act', 'Nerved', 'Thing', 'A darn long era', 'Brat', 'Twelve plus one', 'Elegant man', 'Below', 'Robed', 'Study', 'Voices rant on', 'Chin', 'Here come dots', 'Real fun', 'Pairs', 'Desserts', 'Moon starer', 'Dan Abnormal', 'Old West action', 'Built to stay free', 'One cool dance musician', 'Dirty room', 'Grab', 'Salvages', 'Cash lost in me', "Mr. Mojo Risin'", 'Glean', 'Rat', 'Vase']
I want to find the anagram pairs between list_A and list_B and create a list of tuples of the anagrams.
For one list I can do the following and group the anagrams; for two lists, however, I need some assistance. Thanks in advance for the help!
What I have tried for one list:
from collections import defaultdict
anagrams = defaultdict(list)
for w in list_A:
    anagrams[tuple(sorted(w))].append(w)

You can use a nested for loop, outer for the first list, inner for the second (also, use str.lower to make it case-insensitive):
anagram_pairs = []  # (w_1 from list_A, w_2 from list_B)
for w_1 in list_A:
    for w_2 in list_B:
        if sorted(w_1.lower()) == sorted(w_2.lower()):
            anagram_pairs.append((w_1, w_2))
print(anagram_pairs)
Output:
[('Tar', 'Rat'), ('Arc', 'Car'), ('Elbow', 'Below'), ('State', 'Taste'), ('Cider', 'Cried'), ('Dusty', 'Study'), ('Night', 'Thing'), ('Inch', 'Chin'), ('Brag', 'Grab'), ('Cat', 'Act'), ('Bored', 'Robed'), ('Save', 'Vase'), ('Angel', 'Glean'), ('Stressed', 'Desserts'), ('School master', 'The classroom'), ('Listen', 'Silent'), ('The eyes', 'They see'), ('A gentleman', 'Elegant man'), ('The Morse Code', 'Here come dots'), ('Eleven plus two', 'Twelve plus one'), ('Damon Albarn', 'Dan Abnormal'), ('Elvis', 'Lives'), ('Bart', 'Brat'), ('Paris', 'Pairs'), ('Denver', 'Nerved')]

You are quite close with your current attempt. All you need to do is repeat the same process on list_B:
from collections import defaultdict
anagrams = defaultdict(list)
list_A = [ 'Tar', 'Arc', 'Elbow', 'State', 'Cider', 'Dusty', 'Night', 'Inch', 'Brag', 'Cat', 'Bored', 'Save', 'Angel','bla', 'Stressed', 'Dormitory', 'School master','Awesoame', 'Conversation', 'Listen', 'Astronomer', 'The eyes', 'A gentleman', 'Funeral', 'The Morse Code', 'Eleven plus two', 'Slot machines', 'Fourth of July', 'Jim Morrison', 'Damon Albarn', 'George Bush', 'Clint Eastwood', 'Ronald Reagan', 'Elvis', 'Madonna Louise Ciccone', 'Bart', 'Paris', 'San Diego', 'Denver', 'Las Vegas', 'Statue of Liberty']
list_B = ['Cried', 'He bugs Gore', 'They see', 'Lives', 'Joyful Fourth', 'The classroom', 'Diagnose', 'Silent', 'Taste', 'Car', 'Act', 'Nerved', 'Thing', 'A darn long era', 'Brat', 'Twelve plus one', 'Elegant man', 'Below', 'Robed', 'Study', 'Voices rant on', 'Chin', 'Here come dots', 'Real fun', 'Pairs', 'Desserts', 'Moon starer', 'Dan Abnormal', 'Old West action', 'Built to stay free', 'One cool dance musician', 'Dirty room', 'Grab', 'Salvages', 'Cash lost in me', "Mr. Mojo Risin'", 'Glean', 'Rat', 'Vase']
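for w in list_A:
    anagrams[tuple(sorted(w))].append(w)
for w in list_B:
    anagrams[tuple(sorted(w))].append(w)
# Note: sorted(w) is case-sensitive, so e.g. 'Tar' and 'Rat' land in different groups
result = [b for b in anagrams.values() if len(b) > 1]
print(result)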
Output:
[['Cider', 'Cried'], ['The eyes', 'They see'], ['Damon Albarn', 'Dan Abnormal'], ['Bart', 'Brat'], ['Paris', 'Pairs']]

Another solution using a dictionary:
out = {}
for word in list_A:
    out.setdefault(tuple(sorted(word.lower())), []).append(word)
for word in list_B:
    word_s = tuple(sorted(word.lower()))
    if word_s in out:
        out[word_s].append(word)
print(list(tuple(v) for v in out.values() if len(v) > 1))
Prints:
[
("Tar", "Rat"),
("Arc", "Car"),
("Elbow", "Below"),
("State", "Taste"),
("Cider", "Cried"),
("Dusty", "Study"),
("Night", "Thing"),
("Inch", "Chin"),
("Brag", "Grab"),
("Cat", "Act"),
("Bored", "Robed"),
("Save", "Vase"),
("Angel", "Glean"),
("Stressed", "Desserts"),
("School master", "The classroom"),
("Listen", "Silent"),
("The eyes", "They see"),
("A gentleman", "Elegant man"),
("The Morse Code", "Here come dots"),
("Eleven plus two", "Twelve plus one"),
("Damon Albarn", "Dan Abnormal"),
("Elvis", "Lives"),
("Bart", "Brat"),
("Paris", "Pairs"),
("Denver", "Nerved"),
]
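None of the approaches above match phrases whose spacing or punctuation differs, such as 'Dormitory' and 'Dirty room'. A sketch, assuming you want to ignore case, spaces and punctuation: it indexes list_B once by a normalized sorted-letters key and returns one tuple per match, in list_A order. anagram_key and anagram_pairs are illustrative names, and the lists are shortened for the example:

```python
from collections import defaultdict


def anagram_key(text):
    # Case-insensitive; spaces and punctuation are dropped so phrases can match
    return "".join(sorted(ch for ch in text.lower() if ch.isalnum()))


def anagram_pairs(list_a, list_b):
    by_key = defaultdict(list)
    for word in list_b:
        by_key[anagram_key(word)].append(word)
    # One (a, b) tuple per match; .get avoids inserting empty keys
    return [(a, b) for a in list_a for b in by_key.get(anagram_key(a), [])]


list_A = ["Tar", "Dormitory", "Jim Morrison", "Astronomer", "bla"]
list_B = ["Rat", "Dirty room", "Mr. Mojo Risin'", "Moon starer"]
print(anagram_pairs(list_A, list_B))
```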

Sum values from specific column in DataFrame in duplicate rows

I have a DataFrame of books from which I removed and reworked some information. However, some values in the "bookISBN" column are duplicated, and I want to merge all those rows into one.
I plan to make a new DataFrame where I keep the first values for the url, the ISBN, the title and the genre, but I want to sum the values of the column "genreVotes" in order to create the merge. How can I do this?
Original dataframe:
In [23]: network = data[["bookTitle", "bookISBN", "highestVotedGenre", "genreVotes"]]
network.head().to_dict("list")
Out [23]:
{'bookTitle': ['The Hunger Games',
'Twilight',
'The Book Thief',
'Animal Farm',
'The Chronicles of Narnia'],
'bookISBN': ['9780439023481',
'9780316015844',
'9780375831003',
'9780452284241',
'9780066238500'],
'highestVotedGenre': ['Young Adult',
'Young Adult',
'Historical-Historical Fiction',
'Classics',
'Fantasy'],
'genreVotes': [103407, 80856, 59070, 73590, 26376]}
Duplicates:
In [24]: duplicates = network[network.duplicated(subset=["bookISBN"], keep=False)]
duplicates.loc[(duplicates["bookISBN"] == "9780439023481") | (duplicates["bookISBN"] == "9780375831003")]
Out [24]:
{'bookTitle': ['The Hunger Games',
'The Book Thief',
'The Hunger Games',
'The Book Thief',
'The Book Thief'],
'bookISBN': ['9780439023481',
'9780375831003',
'9780439023481',
'9780375831003',
'9780375831003'],
'highestVotedGenre': ['Young Adult',
'Historical-Historical Fiction',
'Young Adult',
'Historical-Historical Fiction',
'Historical-Historical Fiction'],
'genreVotes': [103407, 59070, 103407, 59070, 59070]}
(In this example the votes were all the same but in some cases the values are different).
Expected output:
{'bookTitle': ['The Hunger Games',
'Twilight',
'The Book Thief',
'Animal Farm',
'The Chronicles of Narnia'],
'bookISBN': ['9780439023481',
'9780316015844',
'9780375831003',
'9780452284241',
'9780066238500'],
'highestVotedGenre': ['Young Adult',
'Young Adult',
'Historical-Historical Fiction',
'Classics',
'Fantasy'],
'genreVotes': [206814, 80856, 177210, 73590, 26376]}
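A sketch of the merge described above, using DataFrame.groupby with named aggregation: "first" for the columns to keep and "sum" for genreVotes. The frame is a small hand-built sample from the rows shown in the question. If the real frame also has a url column, adding url=("url", "first") to the agg call should keep it the same way (assuming the column is literally named url).

```python
import pandas as pd

# Rows taken from the question's duplicates example, plus one non-duplicated book
network = pd.DataFrame({
    "bookTitle": ["The Hunger Games", "The Book Thief", "The Hunger Games",
                  "The Book Thief", "The Book Thief", "Twilight"],
    "bookISBN": ["9780439023481", "9780375831003", "9780439023481",
                 "9780375831003", "9780375831003", "9780316015844"],
    "highestVotedGenre": ["Young Adult", "Historical-Historical Fiction", "Young Adult",
                          "Historical-Historical Fiction", "Historical-Historical Fiction",
                          "Young Adult"],
    "genreVotes": [103407, 59070, 103407, 59070, 59070, 80856],
})

# One row per ISBN: first title and genre, summed votes.
# sort=False keeps the ISBNs in order of first appearance.
merged = network.groupby("bookISBN", as_index=False, sort=False).agg(
    bookTitle=("bookTitle", "first"),
    highestVotedGenre=("highestVotedGenre", "first"),
    genreVotes=("genreVotes", "sum"),
)
print(merged)
```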

Recommendation System by using Euclidean Distance (TypeError: unsupported operand type(s) for -: 'str' and 'str')

I have a problem implementing a recommendation system using Euclidean distance.
I want to list games that are close to a search query, based on game title and genre.
Here is my project link: Link
After calling the function, it throws the error shown below.
How can I fix it?
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-bda255afa9da> in <module>
1 search_game=input('Please enter The name of the wiigame :')
2 number=int(input('Please enter the number of recommendations you want: '))
----> 3 recommendationSystembyEuclideanDistance(wii_df,search_game,number)
<ipython-input-27-c8bfb3378f18> in recommendationSystembyEuclideanDistance(data, game, number)
17 count=0
18 for i in df.values:
---> 19 p.append([distance.euclidean(x,i),count])
20 count+=1
21 p.sort()
~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in euclidean(u, v, w)
618
619 """
--> 620 return minkowski(u, v, p=2, w=w)
621
622
~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in minkowski(u, v, p, w)
510 if p < 1:
511 raise ValueError("p must be at least 1")
--> 512 u_v = u - v
513 if w is not None:
514 w = _validate_weights(w)
TypeError: unsupported operand type(s) for -: 'str' and 'str'
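The last frame of the traceback shows the cause: minkowski computes u - v, and here u and v are rows containing strings, which cannot be subtracted. Distances have to be computed on numeric features, for example by one-hot encoding the genre. The sketch below is hypothetical: wii_df's real columns are not shown, so the "title" and "genre" column names and the genre labels are assumptions, and it uses NumPy's np.linalg.norm in place of scipy's distance.euclidean (which gives the same value for each pair of rows).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for wii_df: the real column names and genres are not shown,
# so "title", "genre" and the genre labels are assumptions for this sketch.
games = pd.DataFrame({
    "title": ["Bayonetta", "Bayonetta 2", "Axiom Verge", "Baila Latino"],
    "genre": ["Action", "Action", "Platformer", "Music"],
})


def recommend(df, game, number):
    # Euclidean distance needs numeric vectors, so turn the string genre column
    # into 0/1 indicator columns; the title is used only to find the query row.
    features = pd.get_dummies(df["genre"]).astype(float).to_numpy()
    # Position of the searched game (raises IndexError if the title is not found)
    pos = int(np.flatnonzero(df["title"].to_numpy() == game)[0])
    # Distance from the searched game to every game
    dists = np.linalg.norm(features - features[pos], axis=1)
    order = np.argsort(dists, kind="stable")  # closest first; ties keep table order
    return [df["title"].iloc[i] for i in order if i != pos][:number]


print(recommend(games, "Bayonetta", 1))
```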
Here are my game titles:
array(['007 Legends', '1001 Spikes', '140', '153 Hand Video Poker',
'3Souls', '6-Hand Video Poker', '6180 the moon', '8Bit Hero',
'99Moves', '99Seconds', 'Absolutely Unstoppable MineRun', 'Abyss',
'Ace: Alien Cleanup Elite', 'Ace of Seafood',
'Act It Out! A Game of Charades',
'Adventure Party: Cats and Caverns',
"Adventure Time: Explore the Dungeon Because I Don't Know!",
'Adventure Time: Finn & Jake Investigations', 'Adventures of Pip',
'Aenigma Os', 'Affordable Space Adventures', 'Alice in Wonderland',
'Alphadia Genesis', 'The Amazing Spider-Man: Ultimate Edition',
'The Amazing Spider-Man 2', 'Angry Birds Star Wars',
'Angry Birds Trilogy', 'Angry Bunnies: Colossal Carrot Crusade',
'Angry Video Game Nerd Adventures',
'Animal Crossing: Amiibo Festival', 'Animal Gods', 'Annihilation',
'Another World: 20th Anniversary Edition', 'Aperion Cyberstorm',
'Aqua Moto Racing Utopia', 'Aqua TV', 'Arc Style: Baseball! SP',
'Archery by Thornbury Software', 'Ark Rush', 'Armikrog', 'Armillo',
'Armored Acorns: Action Squirrel Squad', 'Arrow Time U',
'Art of Balance', 'Ascent of Kings', 'Asdivine Hearts',
"Assassin's Creed III", "Assassin's Creed IV: Black Flag",
'Asteroid Quarry', 'Astral Breakers',
'Ava and Avior Save the Earth', 'Avoider', 'Axiom Verge',
'Azure Snake', 'B3 Game Expo for Bees', 'Back to Bed',
'Badland: Game of the Year Edition', 'Baila Latino',
'Ballpoint Universe: Infinite',
'Barbie and her Sisters: Puppy Rescue', 'Barbie: Dreamhouse Party',
'Batman: Arkham City – Armored Edition', 'Batman: Arkham Origins',
'Batman: Arkham Origins Blackgate – Deluxe Edition', 'Bayonetta',
'Bayonetta 2', 'Beatbuddy: Tale of the Guardians',
"The Beggar's Ride", 'Ben 10: Omniverse', 'Ben 10: Omniverse 2',
"Bigley's Revenge", 'The Binding of Isaac: Rebirth',
'Bird Mania Party', 'Bit Dungeon Plus',
'Bit.Trip Presents... Runner2: Future Legend of Rhythm Alien',
'Blackjack 21', 'Blasting Agent: Ultimate Edition', 'Blek', 'Bloc',
'Blockara', 'Block Zombies!', 'Blocky Bot', 'Blok Drop U',
'Blok Drop X: Twisted Fusion', 'Blue-Collar Astronaut',
'Bombing Bastards', 'The Book of Unwritten Tales 2', 'Booty Diver',
'Box Up', 'Brave Tank Hero', 'Breakout Defense',
'Breakout Defense 2', 'Breezeblox', 'BrickBlast U!',
'Brick Breaker', 'Brick Race', 'The Bridge',
'Bridge Constructor Playground', 'Brunswick Pro Bowling',
'Bubble Gum Popper', 'Buddy & Me: Dream Edition', 'Buta Medal',
"Cabela's Big Game Hunter: Pro Hunts",
"Cabela's Dangerous Hunts 2013",
'Cake Ninja 3: The Legend Continues', 'Call of Duty: Black Ops II',
'Call of Duty: Ghosts', 'Call of Nightmare', 'Candy Hoarder',
'Canvaleon', 'Captain Toad: Treasure Tracker',
'Cars 3: Driven to Win', 'CastleStorm', 'The Cave', 'Chariot',
'ChariSou Ultra DX: Sekai Tour', 'Chasing Aurora', 'Chasing Dead',
"Chests o' Booty", 'Child of Light', 'Chimpuzzle Pro',
'Chompy Chomp Chomp Party',
'Christmas Adventure of Rocket Penguin', 'Chroma Blast',
'Chronicles of Teddy: Harmony of Exidus', 'Chubbins',
'Citadale: Gate of Souls', 'Citadale: The Legends Trilogy',
'Citizens of Earth', 'Cloudberry Kingdom', 'Coaster Crazy Deluxe',
'Cocoto Magic Circus 2', 'Collateral Thinking', 'Color Bombs',
'Color Cubes', 'Color Symphony 2', 'Color Zen', 'Color Zen Kids',
'Coqui The Game', 'Cosmophony', 'Costume Quest 2',
'Crab Cakes Rescue', 'The Croods: Prehistoric Party!',
'Crush Insects', 'Crystorld', 'Cube Blitz',
'Cube Life: Island Survival', 'Cube Life: Pixel Action Heroes',
'Cubemen 2', 'Cubeshift', 'Cubit The Hardcore Platformer Robot HD',
'Cup Critters', 'Cutie Clash', 'Cutie Pets Go Fishing',
'Cutie Pets Jump Rope', 'Cutie Pets Pick Berries',
'Cycle of Eternity: Space Anomaly',
'D.M.L.C. Death Match Love Comedy', 'Daikon Set',
'Dare Up Adrenaline', 'Darksiders II',
'Darksiders: Warmastered Edition', 'Darts Up',
'A Day at the Carnival', 'The Deer God', 'Defend Your Crypt',
'Defense Dome', 'Demonic Karma Summoner',
"Deus Ex: Human Revolution – Director's Cut", "Devil's Third",
'Dinox', 'Discovery', 'Disney Epic Mickey 2: The Power of Two',
'Disney Infinity', 'Disney Infinity 3.0',
'Disney Infinity: Marvel Super Heroes', 'Disney Planes',
'Disney Planes: Fire & Rescue', "Disney's DuckTales: Remastered",
'Dodge Club Party', 'DokiDoki Tegami Relay', 'Dolphin Up',
"Don't Crash", "Don't Starve: Giant Edition",
"Don't Touch Anything Red", 'Donkey Kong Country: Tropical Freeze',
'Dot Arcade', 'Double Breakout', 'Double Breakout II', 'Dr. Luigi',
"Dracula's Legacy", 'Dragon Fantasy: The Black Tome of Ice',
'Dragon Fantasy: The Volumes of Westeria',
'Dragon Quest X: 5000-nen no Harukanaru Kokyou e Online',
'Dragon Quest X: Inishie no Ryuu no Denshou Online',
'Dragon Quest X: Mezameshi Itsutsu no Shuzoku Online',
'Dragon Quest X: Nemureru Yūsha to Michibiki no Meiyū Online',
'Dragon Skills', 'Draw 2 Survive', 'Draw a Stickman: Epic 2',
"A Drawing's Journey", 'Dream Pinball 3D II', 'Dreamals',
'Dreamals: Dream Quest', 'Dreii', 'Drop It: Block Paradise!',
'Dual Core', 'Dungeons & Dragons: Chronicles of Mystara',
'Dungeon Hearts DX', 'Dying is Dangerous',
'Earthlock: Festival of Magic', 'Eba & Egg: A Hatch Trip',
'Ectoplaza', 'Edge', 'Educational Pack of Kids Games',
'Electronic Super Joy', 'Electronic Super Joy: Groove City',
'Elliot Quest', 'El Silla: Arcade Edition',
'Emojikara: A Clever Emoji Match Game', 'Endless Golf',
'Epic Dumpster Bear', 'Escape from Flare Industries',
'ESPN Sports Connection', 'Evofish', "Exile's End", 'Explody Bomb',
'Extreme Exorcism', 'F1 Race Stars: Powered Up Edition',
'Factotum', 'Fake Colors', 'The Fall', 'Falling Skies: The Game',
'Family Party: 30 Great Games Obstacle Arcade', 'Family Tennis SP',
'Fast & Furious: Showdown', 'Fast Racing Neo', 'Fat City',
'Fat Dragons', 'Fatal Frame: Maiden of Black Water', 'FIFA 13',
'Fifteen', 'Finding Teddy II', "Fire: Ungh's Quest",
"Fist of the North Star: Ken's Rage 2", 'Fit Music for Wii U',
'Flapp & Zegeta', 'Flight of Light',
"Flowerworks HD: Follie's Adventure", 'Forced', 'Forest Escape',
'Forma.8', 'Frag doch mal...die Maus!',
'Frankenstein: Master of Death', 'Frederic: Resurrection of Music',
'Free Balling', 'Freedom Planet', 'FreezeME', 'Frenchy Bird',
'Fujiko F. Fujio Characters: Daishuugou! SF Dotabata Party!',
'FullBlast', 'Funk of Titans', "Funky Barn: It's Farming",
'Funky Physics', 'Futuridium EP Deluxe', 'Gaiabreaker',
'Galaxy Blaster', 'Game & Wario', 'Game Party Champions',
'Games for Toddlers', 'Gear Gauntlet', 'The Gem Collector',
'Gemology', 'Geom', 'GetClose: A game for Rivals',
'Ghost Blade HD', 'Giana Sisters: Twisted Dreams',
'Giana Sisters: Twisted Dreams – Owltimate Edition',
'Ginsei Shogi: Kyoutendotou Fuuraijin', 'The Girl and the Robot',
'Girls Like Robots',
'Gotouchi Tetsudou: Gotouchi Kyara to Nihon Zenkoku no Tabi',
'GravBlocks+', 'Gravity+', 'Gravity Badgers', 'The Great Race',
'Grumpy Reaper', "Guac' a Mole",
'Guacamelee!: Super Turbo Championship Edition',
'Guitar Hero Live', 'Gunman Clive HD Collection',
'Hello Kitty Kruisers', 'Heptrix', 'High Strangeness', 'Hive Jump',
'Hold Your Fire: A Game About Responsibility', 'Horror Stories',
'Hot Rod Racer', "Hot Wheels World's Best Driver",
'How to Survive', 'How to Train Your Dragon 2',
'Human Resource Machine', 'Humanitarian Helicopter',
"Hunter's Trophy 2 Europa", 'HurryUp! Bird Hunter',
'Hyrule Warriors', 'I C Redd', "I've Got to Run!",
'Ice Cream Surfer', 'Infinity Runner', 'Injustice: Gods Among Us',
'Insect Planet TD', 'Inside My Radio', 'Internal Invasion',
'Invanoid', 'IQ Test', 'Island Flight Simulator', 'Ittle Dew',
'Jackpot 777', 'Jeopardy!', 'Jett Tailfin', 'Jewel Quest',
'Jikan Satansa', 'Job the Leprechaun', "Joe's Diner",
'Jolt Family Robot Racer', 'Jones on Fire',
'Jotun: Valhalla Edition', 'Journey of a Special Average Balloon',
'Just Dance 4', 'Just Dance 2014', 'Just Dance 2015',
'Just Dance 2016', 'Just Dance 2017', 'Just Dance 2018',
'Just Dance 2019', 'Just Dance: Disney Party 2',
'Just Dance Kids 2014', 'Just Dance Wii U',
'Kamen Rider: Battride War II', 'Kamen Rider: SummonRide',
'Kemono Dash', 'Kick & Fennick', 'KickBeat: Special Edition',
'Kirby and the Rainbow Curse', 'Koi DX', 'Knytt Underground',
'Kung Fu Fight!', 'Kung Fu Panda: Showdown of Legendary Legends',
'Kung Fu Rabbit', 'Land It Rocket', 'Laser Blaster',
'Last Soldier', 'Legend of Kay Anniversary',
'The Legend of Zelda: Breath of the Wild',
'The Legend of Zelda: Twilight Princess HD',
'The Legend of Zelda: The Wind Waker HD',
'Lego Batman 2: DC Super Heroes', 'Lego Batman 3: Beyond Gotham',
'Lego City Undercover', 'Lego Dimensions', 'Lego Jurassic World',
'Lego Marvel Super Heroes', "Lego Marvel's Avengers",
'The Lego Movie Videogame', 'Lego Star Wars: The Force Awakens',
'Lego The Hobbit', 'The Letter', 'Letter Quest Remastered',
"Level 22, Gary's Misadventures", 'Life of Pixel',
'Little Inferno', "Lone Survivor: The Director's Cut",
'Lost Reavers', 'Lovely Planet', 'Lucadian Chronicles',
'Lucentek Beyond', 'Lucentek: Activate',
'Luv Me Buddies Wonderland', 'Madden NFL 13', 'Mahjong',
'Mahjong Deluxe 3', 'Manabi Getto!',
'Mario & Sonic at the Rio 2016 Olympic Games',
'Mario & Sonic at the Sochi 2014 Olympic Winter Games',
'Mario Kart 8', 'Mario Party 10', 'Mario Tennis: Ultra Smash',
'Mario vs. Donkey Kong: Tipping Stars',
'Marvel Avengers: Battle for Earth',
'Mass Effect 3: Special Edition', 'Masked Forces', 'Master Reboot',
'Maze', 'Maze Break', 'Mega Maze', 'Meine Ersten Mitsing-Lieder',
'Meme Run', 'Michiko Jump!', 'Midnight', 'Midnight 2',
'Midtown Crazy Race', 'Mighty No. 9',
'Mighty Switch Force!: Hyper Drive Edition',
'Mighty Switch Force! 2', 'Miko Mole', 'MikroGame: Rotator',
'Minecraft: Story Mode - The Complete Adventure',
'Minecraft: Wii U Edition',
'Mini Mario & Friends: Amiibo Challenge',
'Mini-Games Madness Volume #1: Hello World!',
'Minna de Uchū Tour: ChariSou DX 2',
'The Misshitsukara no Dasshutsu: Subete no Hajimari 16 no Nazo',
'The Misshitsukara no Dasshutsu 2: Kesareta 19 no Kioku',
'Molly Maggot', 'Momonga Pinball Adventures',
'Mon Premier Karaoké', 'Monkey Pirates', 'Monster High: 13 Wishes',
'Monster High: New Ghoul in School', 'Monster Hunter 3 Ultimate',
'Monster Hunter: Frontier G', 'Monster Hunter: Frontier G Genuine',
'Monster Hunter: Frontier G5', 'Monster Hunter: Frontier G6',
'Monster Hunter: Frontier G7', 'Monster Hunter: Frontier G8',
'Monster Hunter: Frontier G9', 'Monster Hunter: Frontier G10',
'Monster Hunter: Frontier Z', 'Mop: Operation Cleanup',
'Mortar Melon', 'Mountain Peak Battle Mess',
'Mr. Pumpkin Adventure', 'Mutant Alien Moles of the Dead',
'Mutant Mudds Deluxe', 'Mutant Mudds Super Challenge',
'My Arctic Farm', 'My Exotic Farm', 'My Farm', 'My First Songs',
'My Jurassic Farm', 'My Style Studio: Hair Salon',
'The Mysterious Cities of Gold: Secret Paths', 'Nano Assault Neo',
'NBA 2K13', 'Near Earth Objects', 'Need for Speed: Most Wanted U',
'Neon Battle', 'NES Remix', 'NES Remix 2', 'Never Alone',
'New Super Luigi U', 'New Super Mario Bros. U', 'Nihilumbra',
"Ninja Gaiden 3: Razor's Edge", 'Ninja Pizza Girl',
'Ninja Strike: Dangerous Dash',
'Nintendo Game Seminar 2013 Jukousei Sakuhin', 'Nintendo Land',
'Noitu Love: Devolution', 'Nova-111', 'Now I know my ABCs',
'Octocopter: Super Sub Squid Escape', 'Octodad: Dadliest Catch',
"Oddworld: Abe's Oddysee – New 'n' Tasty!",
"Ohayou! Beginner's Japanese", 'OlliOlli', 'Olympia Rising',
'One Piece: Unlimited World Red', 'Orbit', 'Othello',
'Outside The Realm', 'Overworld Defender Remix',
'Pac-Man and the Ghostly Adventures',
'Pac-Man and the Ghostly Adventures 2', 'Panda Love', 'Paparazzi',
'Paper Mario: Color Splash', 'Paper Monsters Recut',
'Paranautical Activity',
"The Peanuts Movie: Snoopy's Grand Adventure", 'Peg Solitaire',
'Penguins of Madagascar', 'Pentapuzzle', "Percy's Predicament",
'Perpetual Blast', 'The Perplexing Orb', 'Petite Zombies',
'Phineas and Ferb: Quest for Cool Stuff', 'Piano Teacher',
'Pic-a-Pix Color', 'PictoParty',
'Pier Solar and the Great Architects', 'Pikmin 3', 'Pinball',
'The Pinball Arcade', 'Pinball Breakout', 'Ping 1.5+',
'Pirate Pop Plus', 'Pixel Slime U', 'PixelJunk Monsters',
'PixlCross', 'Placards', 'Plantera', 'Plenty of Fishies',
'Pokémon Rumble U', 'Poker Dice Solitaire Future',
'Pokkén Tournament', 'Poncho',
'Preston Sterling and the Legend of Excalibur', 'Prism Pets',
'Psibo', 'Psyscrolr', 'Puddle', 'Pumped BMX', 'Pure Chess',
'Pushmo World', 'Puyo Puyo Tetris', 'Puzzle Monkeys',
'Quadcopter Pilot Challenge', "Q.U.B.E.: Director's Cut",
'Queens Garden', 'Quest of Dungeons', 'The Quiet Collection',
'Rabbids Land', 'Race the Sun', 'Radiantflux: Hyperfractal',
'Rainbow Snake', 'Rakoo & Friends', 'Rapala Pro Bass Fishing',
'Rayman Legends', 'Rayman Legends Challenges', 'Red Riding Hood',
'Regina & Mac', 'Replay: VHS is Not Dead', 'Reptilian Rebellion',
'Resident Evil: Revelations', 'Retro Road Rumble', 'Revenant Saga',
'Rise of the Guardians: The Video Game',
'The Rivers of Alice: Extended Version',
"Rock 'N Racing Grand Prix", "Rock 'N Racing Off Road",
"Rock 'N Racing Off Road DX", 'Rock Zombie',
'Rodea the Sky Soldier', 'Romance of the Three Kingdoms 12',
'Rorrim', 'Roving Rogue', 'RTO', 'RTO 2', "Rubik's Cube",
'Run Run and Die', 'Runbow', 'Rush',
"Rynn's Adventure: Trouble in the Enchanted Forest",
'Ryū ga Gotoku 1 & 2: HD Edition', 'Sanatory Hallways',
'Santa Factory', 'Schlag den Star: Das Spiel',
'Scoop! Around the World in 80 Spaces',
'Scram Kitty and His Buddy on Rails', 'Scribble',
'Scribblenauts Unlimited',
'Scribblenauts Unmasked: A DC Comics Adventure',
'Secret Files: Tunguska', 'Severed', 'Shadow Archer',
'Shadow Archery', 'Shadow Puppeteer', 'Shakedown: Hawaii',
"Shantae and the Pirate's Curse", 'Shantae: Half-Genie Hero',
"Shantae: Risky's Revenge - Director's Cut", 'Shapes of Gray',
'Shiftlings', 'Shiny the Firefly', 'SHMUP Collection',
'Shooting Range by Thornbury Software', 'Shoot the Ball',
'Shooty Space', 'Shovel Knight', 'Shut the Box', 'Shütshimi',
'Shuttle Rush', 'Sing Party', 'Sinister Assistant',
'Six Sides of the World', 'Skeasy', 'Sketch Wars', 'Skorb',
"Skunky B's Super Slots Saga #1", 'Sky Force Anniversary',
'Skylanders: Giants', 'Skylanders: Imaginators',
"Skylanders: Spyro's Adventure", 'Skylanders: SuperChargers',
'Skylanders: Swap Force', 'Skylanders: Trap Team',
'Slender: The Arrival', "Slots: Pharaoh's Riches",
'Smart Adventures Mission Math: Sabotage at the Space Station',
'The Smurfs 2', 'Snake Den', 'Sniper Elite V2', 'Snowball',
'Solitaire', 'Solitaire Dungeon Escape',
'Sonic & All-Stars Racing Transformed',
'Sonic Boom: Rise of Lyric', 'Sonic Lost World', 'Soon Shine',
'Soul Axiom', 'Space Hulk', 'Space Hunted',
'Space Hunted: The Lost Levels', 'Space Intervention',
'SpaceRoads', "Spellcaster's Assistant", 'Sphere Slice',
'SphereZor', 'Spheroids', 'Spikey Walls',
"Spin the Bottle: Bumpie's Party", 'Splashy Duck', 'Splatoon',
"SpongeBob SquarePants: Plankton's Robotic Revenge", 'Sportsball',
'Spot the Differences! Party', 'Spy Chameleon', 'Squids Odyssey',
'Star Fox Guard', 'Star Fox Zero', 'Star Ghost', 'Star Sky',
'Star Sky 2', 'Star Splash: Shattered Star', 'Star Wars Pinball',
'Starwhal', 'Stealth Inc. 2: A Game of Clones', 'SteamWorld Dig',
'SteamWorld Heist', 'Steel Lords', 'Steel Rivals',
'Stick It to the Man!', 'Stone Shire', 'The Stonecutter',
'Sudoku and Permudoku', 'Sudoku Party', 'Super Destronaut',
'Super Destronaut 2: Go Duck Yourself', 'Super Hero Math',
'Super Mario 3D World', 'Super Mario Maker', 'Super Meat Boy',
'Super Robo Mouse', 'Super Smash Bros. for Wii U',
'Super Toy Cars', 'Super Ultra Star Shooter',
"Surfin' Sam: Attack of the Aqualites",
'Suspension Railroad Simulator', 'Swap Blocks', 'Swap Fire',
'The Swapper', 'Sweetest Thing', 'The Swindle',
'Swords & Soldiers', 'Swords & Soldiers II', 'Tabletop Gallery',
'Tachyon Project', 'Tadpole Treble',
'Taiko no Tatsujin: Atsumete ☆ Tomodachi Daisakusen!',
'Taiko no Tatsujin: Tokumori!', 'Taiko no Tatsujin: Wii U Version',
'Tallowmere', 'Tank SP', 'Tank! Tank! Tank!', 'Tap Tap Arcade',
'Tap Tap Arcade 2', 'Tekken Tag Tournament 2: Wii U Edition',
'Temple of Yog', 'Tengami', 'Terraria', 'Teslagrad', 'Teslapunk',
'Test Your Mind', 'Tested with Robots!', 'Tetrobot and Co.',
'Tetraminos', 'The First Skunk Bundle', 'Thomas Was Alone',
'Tilelicious: Delicious Tiles', 'Tiny Galaxy!', 'Tiny Thief',
'Titans Tower', 'TNT Racers: Nitro Machines Edition',
'Toby: The Secret Mine', 'Togabito no Senritsu', 'Toki Tori',
'Toki Tori 2+', 'Tomeling in Trouble', 'Tokyo Mirage Sessions ♯FE',
"Tom Clancy's Splinter Cell: Blacklist", 'Toon Tanks', 'Toon War',
'TorqueL', 'Toss n Go', 'Totem Topple', 'Toto Temple Deluxe',
'Touch Battle Tank SP', 'Touch Selections',
'Transformers Prime: The Game',
'Transformers: Rise of the Dark Spark', 'Trine: Enchanted Edition',
"Trine 2: Director's Cut", 'Tri-Strip', 'Triple Breakout',
'Tumblestone', 'Turbo: Super Stunt Squad', 'Turtle Tale',
'Twin Robots', 'Twisted Fusion', 'Typoman', 'U Host', 'Ultratron',
'Unalive', 'Underground', 'Unepic', 'Use Your Words', 'uWordsmith',
'Vaccine', 'Vector Assault', 'Vektor Wars',
'The Voice: I Want You', 'Volcanic Field 2', 'Volgarr the Viking',
'VRog', 'The Walking Dead: Survival Instinct', 'Wall Ball',
'Warriors Orochi 3 Hyper', 'Watch Dogs', 'Wheel of Fortune',
'Whispering Willows', 'Wicked Monsters Blast! HD Plus',
'Wii Fit U', 'Wii Karaoke U', 'Wii Party U', 'Wii Sports Club',
'Wind-up Knight 2', 'Wings of Magloryx', 'WinKings', 'Wipeout 3',
'Wipeout: Create & Crash', 'Woah Dave!', 'The Wonderful 101',
"Wooden Sen'SeY", 'Word Party', 'Word Logic by Powgi',
'Word Puzzles by Powgi', 'Word Search by Powgi',
'WordsUp! Academy', 'A World of Keflings', 'Xavier',
'Xenoblade Chronicles X', 'Xeodrifter', 'XType Plus', 'Y.A.S.G',
'Yakuman Hō-ō', 'Year Walk',
'Yo-kai Watch Dance: Just Dance Special Version',
"Yoshi's Woolly World", 'Your Shape: Fitness Evolved 2013',
"ZaciSa's Last Stand", 'Zen Pinball 2', 'Ziggurat', 'Zombeer',
'Zombie Brigade: No Brain No Gain', 'Zombie Defense', 'ZombiU',
'Zumba Fitness: World Party'], dtype=object)
Here is the function that finds the closest matching title:
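def find_word(word, words):
    # Normalise the search term (drop surrounding spaces, ignore case)
    word = word.strip().lower()
    scores = []
    for count, title in enumerate(words):
        if word in title.lower():
            # Score = fraction of the title covered by the search term
            scores.append([len(word) / len(title), count])
        else:
            scores.append([0, count])
    scores.sort(reverse=True)
    # Return the title with the highest score
    return words[scores[0][1]]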
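To see how this scoring behaves, here is a condensed, self-contained copy of `find_word` run on a small hypothetical subset of the titles. One quirk worth knowing: when the search term matches nothing, every score is 0 and the reverse sort breaks the tie on the index, so the function silently returns the last title in the list.

```python
def find_word(word, words):
    # Fraction of the title covered by the search term, 0 if it is absent
    word = word.strip().lower()
    scores = []
    for count, title in enumerate(words):
        if word in title.lower():
            scores.append([len(word) / len(title), count])
        else:
            scores.append([0, count])
    scores.sort(reverse=True)
    return words[scores[0][1]]

# Hypothetical three-title subset of the list above
titles = ['Mario Kart 8', 'Mario Party 10', 'Super Mario Maker']
print(find_word('mario', titles))  # Mario Kart 8 (5/12 beats 5/14 and 5/17)
print(find_word('zelda', titles))  # Super Mario Maker (no match: last index wins the tie)
```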
Here is my recommendation system function.
from scipy.spatial import distance

def recommendationSystembyEuclideanDistance(data, game, number):
    data.drop_duplicates(inplace=True)
    games = data['Title'].values
    # Map the user's search term to the closest matching title
    best = find_word(game, games)
    print('The game closest to your search is :', best)
    # Only compare against games of the same genre
    genre = data[data['Title'] == best]['Genre(s)'].values[0]
    df = data[data['Genre(s)'] == genre].copy()
    games_names = df['Title'].values
    df.drop(columns=['Genre(s)', 'Title'], inplace=True)
    df = df.fillna(df.mean())
    # Feature vector of the chosen game (first matching row, as a 1-D array)
    x = df[games_names == best].values[0]
    # Distance from the chosen game to every game in the genre
    p = []
    for count, row in enumerate(df.values):
        p.append([distance.euclidean(x, row), count])
    p.sort()
    # p[0] is normally the chosen game itself (distance 0), so start at 1
    for i in range(1, number + 1):
        print(games_names[p[i][1]])
Here is the main part:
search_game=input('Please enter The name of the wiigame :')
number=int(input('Please enter the number of recommendations you want: '))
recommendationSystembyEuclideanDistance(wii_df,search_game,number)
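The ranking idea behind `recommendationSystembyEuclideanDistance` works as long as every feature is numeric. A minimal sketch under that assumption: the feature table is hypothetical (not real data), `nearest_titles` is a made-up helper, and `math.dist` (Python 3.8+) stands in for scipy's `distance.euclidean` so the example needs no third-party packages.

```python
import math

# Hypothetical numeric features per title (e.g. two review scores)
features = {
    "Game A": [8.0, 7.5],
    "Game B": [7.9, 7.4],
    "Game C": [5.0, 6.0],
    "Game D": [8.5, 8.0],
}

def nearest_titles(target, features, number):
    """Return the `number` titles closest to `target` by Euclidean distance."""
    x = features[target]
    scored = sorted(
        (math.dist(x, vec), title)
        for title, vec in features.items()
        if title != target  # exclude the target itself by name
    )
    return [title for _, title in scored[:number]]

print(nearest_titles("Game A", features, 2))  # ['Game B', 'Game D']
```

Excluding the target by name, rather than skipping the first sorted entry, avoids assuming that the chosen game always sorts first.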

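The issue is that you are using Euclidean distance to compare strings. `distance.euclidean` subtracts the two vectors element by element, so it fails with a `TypeError` as soon as the rows contain string values. For strings, consider Levenshtein distance or a similar edit distance, which is designed for that purpose. NLTK provides one as `nltk.edit_distance`, or you can implement it yourself.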
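If you would rather not depend on NLTK, here is a minimal sketch of Levenshtein distance using the standard two-row dynamic-programming table. The helper name `levenshtein` and the sample titles are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3

# Picking the closest title to a search term (hypothetical subset)
titles = ['Mario Kart 8', 'Mario Party 10', 'Super Mario Maker']
closest = min(titles, key=lambda t: levenshtein("mario kart", t.lower()))
print(closest)  # Mario Kart 8
```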