Reading a text file into a list of data class objects - python

I'm trying to write a program that reads a file of movie characters and creates a list of the movie characters (as data class objects). However, I'm having some problems with it.
So far I've come up with this:
import os
from dataclasses import dataclass

#dataclass
class objects():
    lst = []
    fname: str = 'starwars.txt'

lst = []
char = objects()
with open(char.fname, "r") as path:  # Reads the open file
    for line in path:
        x = line[:-1]
        lst.append(x)
print(lst)
But I know I'm doing something wrong, because I'm getting this output:
['Qui-Gon Jinn, Human, Coruscant', 'Han Solo, Human, Corellia', 'Leia Organa, Human, Alderaan', 'Luke Skywalker, Human, Tatooine', 'Chewbacca, Wookiee, Kashyyyk', 'Cassian Andor, Human, Kenari', 'Jar Jar Binks, Gungan, Naboo', 'Ahsoka Tano, Togruta, Shili', 'Plo Koon, Kel Dor, Dorin', 'Din Djarin, Human, Aq Vetina', 'Cad Bane, Duro, Duros', 'Max Rebo, Ortolan, Orto', 'Boba Fett, Human, Kamino', 'Jabba the Hutt, Hutt, Nal Hutta', 'Rey Skywalker, Human, Jakku']
when I'm supposed to get this output:
Qui-Gon Jinn Human Coruscant
Han Solo Human Corellia
Leia Organa Human Alderaan
Luke Skywalker Human Tatooine
Chewbacca Wookiee Kashyyyk
Cassian Andor Human Kenari
Jar Jar Binks Gungan Naboo
Ahsoka Tano Togruta Shili
Plo Koon Kel Dor Dorin
Din Djarin Human Aq Vetina
Cad Bane Duro Duros
Max Rebo Ortolan Orto
Boba Fett Human Kamino
Jabba the Hutt Hutt Nal Hutta
Rey Skywalker Human Jakku
I don't really know what I'm doing wrong, or whether I'm even reading the file and creating a list of the movie characters as data class objects. I would really appreciate the help.

This works if your txt file is delimited by \n:
import os
from dataclasses import dataclass

@dataclass
class objects():
    fname: str = 'starwars.txt'

lst = []
char = objects()
with open(char.fname, "r") as file:  # Reads the open file
    for line in file:
        tmp = line.strip().split(',')
        lst.append(tmp)
for line in lst:
    print(f'{line[0]}\t\t{line[1]}\t\t{line[2]}')

import pandas as pd
pd.DataFrame(lst)
# print list output
Qui-Gon Jinn Human Coruscant
Han Solo Human Corellia
Leia Organa Human Alderaan
Luke Skywalker Human Tatooine
Chewbacca Wookiee Kashyyyk
Cassian Andor Human Kenari
Jar Jar Binks Gungan Naboo
Ahsoka Tano Togruta Shili
Plo Koon Kel Dor Dorin
Din Djarin Human Aq Vetina
Cad Bane Duro Duros
Max Rebo Ortolan Orto
Boba Fett Human Kamino
Jabba the Hutt Hutt Nal Hutta
Rey Skywalker Human Jakku
# dataframe output
0 1 2
0 Qui-Gon Jinn Human Coruscant
1 Han Solo Human Corellia
2 Leia Organa Human Alderaan
3 Luke Skywalker Human Tatooine
4 Chewbacca Wookiee Kashyyyk
5 Cassian Andor Human Kenari
6 Jar Jar Binks Gungan Naboo
7 Ahsoka Tano Togruta Shili
8 Plo Koon Kel Dor Dorin
9 Din Djarin Human Aq Vetina
10 Cad Bane Duro Duros
11 Max Rebo Ortolan Orto
12 Boba Fett Human Kamino
13 Jabba the Hutt Hutt Nal Hutta
14 Rey Skywalker Human Jakku
pandas is a simple and intuitive tool for handling tabular data.
If your txt file isn't delimited by \n, then I would suggest posting a data sample to better represent an MRE (minimal reproducible example).
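As an aside, since the file is effectively a headerless CSV, pandas can also load it in one call. A sketch, assuming the same starwars.txt layout as above (the column names are my own choice; the StringIO buffer just stands in for the real file path):

```python
import io
import pandas as pd

# Stand-in for open('starwars.txt'); pass the filename in the real case.
data = io.StringIO(
    "Han Solo, Human, Corellia\n"
    "Chewbacca, Wookiee, Kashyyyk\n"
)

# header=None because the file has no header row;
# skipinitialspace=True drops the space after each comma.
df = pd.read_csv(data, header=None,
                 names=['name', 'species', 'origin'],
                 skipinitialspace=True)
```

This avoids the manual split/append loop entirely when you only need the table, not data class objects.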
If you prefer to do it in an OO fashion:
@dataclass
class character():
    name: str
    species: str
    origin: str

lst = []
with open('starwars.txt', "r") as file:  # Reads the open file
    for line in file:
        tmp = line.strip().split(',')
        individual = character(name=tmp[0], species=tmp[1], origin=tmp[2])
        lst.append(individual)
Output:
[character(name='Qui-Gon Jinn', species=' Human', origin=' Coruscant'),
character(name='Han Solo', species=' Human', origin=' Corellia'),
character(name='Leia Organa', species=' Human', origin=' Alderaan'),
character(name='Luke Skywalker', species=' Human', origin=' Tatooine'),
character(name='Chewbacca', species=' Wookiee', origin=' Kashyyyk'),
character(name='Cassian Andor', species=' Human', origin=' Kenari'),
character(name='Jar Jar Binks', species=' Gungan', origin=' Naboo'),
character(name='Ahsoka Tano', species=' Togruta', origin=' Shili'),
character(name='Plo Koon', species=' Kel Dor', origin=' Dorin'),
character(name='Din Djarin', species=' Human', origin=' Aq Vetina'),
character(name='Cad Bane', species=' Duro', origin=' Duros'),
character(name='Max Rebo', species=' Ortolan', origin=' Orto'),
character(name='Boba Fett', species=' Human', origin=' Kamino'),
character(name='Jabba the Hutt', species=' Hutt', origin=' Nal Hutta'),
character(name='Rey Skywalker', species=' Human', origin=' Jakku')]

In nltk wordnet, wn.synsets.definition(lang="lang") shows English and Japanese, but not other languages

wn.synsets.definition(lang="lang") shows English and Japanese results, but not other languages.
wn.synset('word').lemma_names shows the other languages too, though.
Do I need an extra download? Or is there a difference between languages?
The documentation says that it does lazy downloading, so I tried a few times, but the result didn't change.
I played around a bit, and the first thing I found out is that definitions are available for more languages than just English and Japanese. See the table below for the definitions of a few words, including your example word, for all the languages available from wn.langs() after downloading the NLTK omw-1.4 corpus. 'dog' has definitions in 7 languages, 'house' in 9, and 'person' in 11.
Regarding the missing definitions for certain languages, I think the data just isn't present in the corresponding wordnets. The NLTK wordnet documentation states:
This module also allows you to find lemmas in languages other than English from the Open Multilingual Wordnet (https://omwn.org/)
If you go to https://omwn.org/ and follow the links for the respective wordnets, you'll find for example this page where you can search for words in a few languages. Searching 'casa' in Spanish, you'll find the definition reverts to the English definition for 'house', but for Italian there is a definition in Italian - which is consistent with the table below.
Hope this helps!
Definitions by language (dog / house / person):

eng
  dog: a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
  house: a dwelling that serves as living quarters for one or more families
  person: a human being
als
  house: Ndërtesë për të banuar (zakonisht për një familje a për familje të një gjaku); banesë; apartament ku banon një familje.
  person: të qënurit njeri
bul
  dog: Вид домашно животно от семейство хищни бозайници, с различна големина, цвят на козината и различни породи, което лае и често се използва като пазач на дома и имота, за лов, може да бъде дресирано и обучавано за различни служебни цели.
  house: Сграда, помещение за постоянно живеене на отделно семейство или човек.
  person: Отделен човек, който със своите неповторими качества се отличава, различава от другите хора.
ell
  dog: σκύλος του γένους Canis familiaris που συνήθως προέρχεται από τον κοινό λύκο και έχει εξημερωθεί από τους προϊστορικούς χρόνους
  house: το τμήμα οικήματος (λ.χ. το διαμέρισμα πολυκατοικίας) στο οποίο διαμένει κανείς
  person: το έμβιο ον, κάθε άτομο, άνθρωπος ανεξαρτήτως φύλου και ηλικίας
heb
  house: מבנה המשמש כמקום מגורים למשפחה אחת או יותר
  person: מישהו דופק בדלת
ita
  dog: mammifero domestico dei canidi, molto comune, diffuso in tutto il mondo, con attitudini varie a seconda della razza
  house: edificio destinato ad abitazione
  person: entità umana considerata in quanto tale, senza caratterizzazioni di sesso, età, provenienza, ecc.
ita_iwn
  dog: animale domestico molto comune, diffuso in tutto il mondo, usato per la caccia, la difesa, nella pastorizia, e come animale da compagnia
  person: essere distinto da ogni altro della medesima specie
jpn
  dog: 有史以前から人間に家畜化されて来た(おそらく普通のオオカミを先祖とする)イヌ属の動物
  house: 1家族以上のための居住棟として機能する住居
  person: 一人の人間
ind
  person: seseorang yang dipandang tinggi
ron
  dog: Animal mamifer carnivor domesticit, folosit pentru pază, vânătoare etc.
  house: construcție destinată pentru a servi de locuință uneia sau mai multor familii
  person: Individ al speciei umane, om considerat prin totalitatea însușirilor sale fizice și psihice

No definitions returned for: arb, cmn, dan, fin, fra, hrv, isl, cat, eus, glg, spa, zsm, nld, nno, nob, pol, por, lit, slk, slv, swe, tha

Totals: dog 7, house 9, person 11
Code used to generate the above table (in Google Colab):
import nltk
from nltk.corpus import wordnet as wn
nltk.download('wordnet')
nltk.download('omw-1.4')

import pandas as pd

defs = pd.DataFrame()
for lang in wn.langs():
    for word in ['dog', 'house', 'person']:
        def_ = wn.synsets(word)[0].definition(lang=lang)
        defs.at[lang, word] = def_[0] if isinstance(def_, list) else def_
        defs[word] = defs[word].astype('object')
for word in defs.columns:
    defs_present = len([def_ for def_ in defs[word].to_list() if def_ is not None])
    defs.at['total', word] = defs_present
defs

How to filter strings using regex?

I have a list of strings which I have to filter in Python.
list = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
        "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
        "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
        "Gurav, Pune, Maharashtra,",
        "411027",
        "www"]
The output I want:
list = ["Address: sr no94/1B/1/2/3",
        "ashatvinayak chal, jay bhavani",
        "411027 nagar, near bus stop, Pimple",
        "Gurav, Pune, Maharashtra,",
        "411027",
        "www"]
My code:
regex = re.compile("[^a-zA-Z0-9!##$&()\\-`.+,/\"]+")
for i in list:
    print(" ".join(regex.sub(' ', i).split()))
My output:
Himanshu Address sr no94/1B/1/2/3
, foo boo, , ashatvinayak chal, jay bhavani
, , , 411027 nagar, near bus stop, Pimple
Gurav, Pune, Maharashtra,
411027
www
I want to remove 'Himanshu' when it comes between non-English characters (e.g. पत्ता स नं Himanshu अष्टविनायक).
Try with this code:
import re
list = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
        "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
        "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
        "पिं Gurav, Pune, Maharashtra,",
        "411027",
        "www"]
list2 = []
pattern = "[^a-zA-Z0-9!#\s:#$&()\\-`.+,/\"]+[, ]*(?!.*[^a-zA-Z0-9!#\s:#$&()\\-`.+,/\"]+[, ]*)"
for i in list:
    st = re.findall(pattern, i)
    if st:
        list2.append(i[i.index(st[0]) + len(st[0]):])
    else:
        list2.append(i)
print(list2)
Output:
['Address: sr no94/1B/1/2/3', 'ashatvinayak chal, jay bhavani', '411027 nagar, near bus stop, Pimple', 'Gurav, Pune, Maharashtra,', '411027', 'www']
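An alternative that avoids enumerating the allowed punctuation: since every unwanted prefix ends at the last non-ASCII character, you can cut there directly. A sketch (the function name is my own; the test strings are the ones from the question):

```python
import re

def strip_non_english_prefix(s: str) -> str:
    # Greedy ^.* backtracks to the LAST non-ASCII character in the
    # string; any comma/space separators right after it are consumed
    # too, so only the trailing English portion survives.
    return re.sub(r'^.*[^\x00-\x7f][, ]*', '', s)
```

Strings with no non-ASCII characters at all ("411027", "www") are returned unchanged, because the pattern simply doesn't match.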

I can't print horizontally

I want to make a book catalog; the output should print horizontally and start a new line after every 3 books. I understand that we can print horizontally by using:
end = ""
BUT that only works for one line. Since my output has 3 lines (title, ISBN, price), using end = "" can't get it done.
Below is my code:
line_format = "{:50s} \n{:6s} - {:13s} \n{:11s}"
books = db.get_books(kel)
for book in books:
    print(line_format.format(str(book.title),
                             str(book.isbn),
                             "Rp. {:,}".format(book.price).replace(",", ".")))
What I got is:
Deaver - Never Game A/UK
9780008303778
Rp. 161.000
Poirot - DEATH ON THE NILE (Exp]
9780008328948
Rp. 28.000
Alchemist - 25th Anniv ed
9780062355300
Rp. 160.000
Finn- Woman in the Window [MTI]
9780062906137
Rp. 162.000
Mahurin- Blood & Honey
9780063041172
Rp. 62.000
What I want for the output is:
Deaver - Never Game DEATH ON THE NILE (Exp] Alchemist
9780008303778 9780008328948 9780062355300
Rp. 161.000 Rp. 28.000 Rp. 160.000
Woman in the Window Blood & Honey
9780062906137 9780063041172
Rp. 162.000 Rp. 62.000

Convert in utf16

I am crawling several websites and extracting the names of the products. Some names contain errors like this:
Malecon 12 Jahre 0,05 ltr.<br>Reserva Superior
Bols Watermelon Lik\u00f6r 0,7l
Hayman\u00b4s Sloe Gin
Ron Zacapa Edici\u00f3n Negra
Havana Club A\u00f1ejo Especial
Caol Ila 13 Jahre (G&M Discovery)
How can I fix that?
I am using xpath and re.search to get the names.
In every Python file, this is the first line: # -*- coding: utf-8 -*-
Edit:
This is the source code showing how I get the information:
if '"articleName":' in details:
    closer_to_product = details.split('"articleName":', 1)[1]
    closer_to_product_2 = closer_to_product.split('"imageTitle', 1)[0]
    if debug_product == 1:
        print('product before try:' + repr(closer_to_product_2))
    try:
        found_product = re.search('"(.*?)",', closer_to_product_2).group(1)
    except AttributeError:
        found_product = ''
    if debug_product == 1:
        print('cleared product: ', '>>>' + repr(found_product) + '<<<')
    if not found_product:
        print(product_detail_page, found_product)
        items['products'] = 'default'
    else:
        items['products'] = found_product
Details
product_details = information.xpath('/*').extract()
product_details = [details.strip() for details in product_details]
Where is the problem (Python 3.8.3)?
import html

strings = [
    'Bols Watermelon Lik\u00f6r 0,7l',
    'Hayman\u00b4s Sloe Gin',
    'Ron Zacapa Edici\u00f3n Negra',
    'Havana Club A\u00f1ejo Especial',
    'Caol Ila 13 Jahre (G&M Discovery)',
    'Old Pulteney \\u00b7 12 Years \\u00b7 40% vol',
    'Killepitsch Kr\\u00e4uterlik\\u00f6r 42% 0,7 L']

for s in strings:
    print(html.unescape(s)
          .encode('raw_unicode_escape')
          .decode('unicode_escape'))
Bols Watermelon Likör 0,7l
Hayman´s Sloe Gin
Ron Zacapa Edición Negra
Havana Club Añejo Especial
Caol Ila 13 Jahre (G&M Discovery)
Old Pulteney · 12 Years · 40% vol
Killepitsch Kräuterlikör 42% 0,7 L
Edit: use .encode('raw_unicode_escape').decode('unicode_escape') for doubled reverse solidi (literal \\uXXXX sequences); see Python Specific Encodings.
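A minimal, self-contained check of that chain as a reusable helper (the function name is my own; the `<br>` case from the question would still need separate tag stripping):

```python
import html

def fix_name(name: str) -> str:
    # Undo HTML entities first, then turn literal \uXXXX sequences into
    # real characters. raw_unicode_escape leaves existing non-ASCII
    # characters in the Latin-1 range intact while keeping backslashes
    # as-is, so unicode_escape can then decode the \uXXXX sequences.
    return html.unescape(name).encode('raw_unicode_escape').decode('unicode_escape')

print(fix_name('Bols Watermelon Lik\\u00f6r 0,7l'))   # Bols Watermelon Likör 0,7l
print(fix_name('Havana Club A\\u00f1ejo Especial'))   # Havana Club Añejo Especial
```

Note that unicode_escape decodes bytes as Latin-1, so this trick is only safe while the product names stay within that range, which holds for the German/Spanish examples here.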

BeautifulSoup - how to arrange data and write to txt?

New to Python, and I have a simple problem. I am pulling some data from Yahoo Fantasy Baseball into a text file, but my code doesn't work properly:
from bs4 import BeautifulSoup
import urllib2

teams = ("http://baseball.fantasysports.yahoo.com/b1/2282/players?status=A&pos=B&cut_type=33&stat1=S_S_2015&myteam=0&sort=AR&sdir=1")
page = urllib2.urlopen(teams)
soup = BeautifulSoup(page, "html.parser")
players = soup.findAll('div', {'class':'ysf-player-name Nowrap Grid-u Relative Lh-xs Ta-start'})
playersLines = [span.get_text('\t',strip=True) for span in players]
with open('output.txt', 'w') as f:
    for line in playersLines:
        line = playersLines[0]
        output = line.encode('utf-8')
        f.write(output)
In the output file there is only one player, repeated 25 times. Any ideas how to get a result like this?
Pedro Álvarez Pit - 1B,3B
Kevin Pillar Tor - OF
Melky Cabrera CWS - OF
etc
Try removing:
line = playersLines[0]
Also, append a newline character to the end of your output so the players are written to separate lines in the output.txt file:
from bs4 import BeautifulSoup
import urllib2

teams = ("http://baseball.fantasysports.yahoo.com/b1/2282/players?status=A&pos=B&cut_type=33&stat1=S_S_2015&myteam=0&sort=AR&sdir=1")
page = urllib2.urlopen(teams)
soup = BeautifulSoup(page, "html.parser")
players = soup.findAll('div', {'class':'ysf-player-name Nowrap Grid-u Relative Lh-xs Ta-start'})
playersLines = [span.get_text('\t',strip=True) for span in players]
with open('output.txt', 'w') as f:
    for line in playersLines:
        output = line.encode('utf-8')
        f.write(output + '\n')
Results:
Pedro Álvarez Pit - 1B,3B
Kevin Pillar Tor - OF
Melky Cabrera CWS - OF
Ryan Howard Phi - 1B
Michael A. Taylor Was - OF
Joe Mauer Min - 1B
Maikel Franco Phi - 3B
Joc Pederson LAD - OF
Yangervis Solarte SD - 1B,2B,3B
César Hernández Phi - 2B,3B,SS
Eddie Rosario Min - 2B,OF
Austin Jackson Sea - OF
Danny Espinosa Was - 1B,2B,3B,SS
Danny Valencia Oak - 1B,3B,OF
Freddy Galvis Phi - 3B,SS
Jimmy Paredes Bal - 2B,3B
Colby Rasmus Hou - OF
Luis Valbuena Hou - 1B,2B,3B
Chris Young NYY - OF
Kevin Kiermaier TB - OF
Steven Souza TB - OF
Jace Peterson Atl - 2B,3B
Juan Lagares NYM - OF
A.J. Pierzynski Atl - C
Khris Davis Mil - OF
