I have a dictionary that I use as a lookup table: when a function recognizes a string, it should return the matching dictionary key.
The dict is "myDict", and it is used together with the list "lookup": for each item in the list, find a word that exists in the dict's values, return the dictionary key for it, and append that key to the end of the item.
My code:
myDict = {'NESTLE':
              ['NESFIT', 'NESCAU', 'DOIS FRADES', 'MOCA', 'KIT KAT', 'NESQUIK', 'ALPINO', 'LEITE CONDENSADO MOCA',
               'CHAMYTO', 'NINHO', 'MOLICO', 'CHAMBINHO', 'CHANDELLE', 'BONO', 'PASSATEMPO', 'KITKAT', 'CLASSIC'],
          'all_names': ['rodis', 'Stone', 'abx']}
def search(myDict, lookup):
    for key, value in myDict.items():
        for v in value:
            if lookup in v:
                return key
lookup = ['NESCAU', 'Confeito coberto de chocolate ao leite NESCAU Ball', 'NESCAU EM PO', 'Stone']
lookup2 = [str(x).split(' ') for x in lookup]
for x in range(0, len(lookup)):
    finder = search(myDict, lookup[x])
    print(finder)
NESTLE
None
None
all_names
I got this far, but I was unable to produce output with the dictionary's key at the end of each list item, and the search does not find every item that contains one of the words. This is the output I wanted:
output_lookup = ['NESCAU NESTLE', 'Confeito coberto de chocolate ao leite NESCAU Ball NESTLE', 'NESCAU EM PO NESTLE', 'Stone all_names']
I changed your code.
I added a split of the lookup string.
myDict = {'NESTLE':
              ['NESFIT', 'NESCAU', 'DOIS FRADES', 'MOCA', 'KIT KAT', 'NESQUIK', 'ALPINO', 'LEITE CONDENSADO MOCA',
               'CHAMYTO', 'NINHO', 'MOLICO', 'CHAMBINHO', 'CHANDELLE', 'BONO', 'PASSATEMPO', 'KITKAT', 'CLASSIC'],
          'all_names': ['rodis', 'Stone', 'abx']}
def search(myDict, lookup):
    for key, value in myDict.items():
        for v in value:
            # if lookup in v:
            if v in lookup.split(' '):
                return key
lookup = ['NESCAU', 'Confeito coberto de chocolate ao leite NESCAU Ball', 'NESCAU EM PO', 'Stone']
lookup2 = [str(x).split(' ') for x in lookup]
for x in lookup:
    finder = search(myDict, x)
    print(f"{x} {finder}")
Here is the output that I got:
NESCAU NESTLE
Confeito coberto de chocolate ao leite NESCAU Ball NESTLE
NESCAU EM PO NESTLE
Stone all_names
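A minimal sketch of the same idea that builds the output_lookup list the question asks for instead of printing it. The helper names find_key and tag_with_key are hypothetical, the dictionary is shortened, and matching on word boundaries with re is an assumption added here so that multi-word entries such as 'KIT KAT' can match as well:

```python
import re

my_dict = {
    'NESTLE': ['NESCAU', 'KIT KAT', 'MOCA'],
    'all_names': ['rodis', 'Stone', 'abx'],
}

def find_key(mapping, text):
    """Return the first key with a term that occurs in text as whole words, else None."""
    for key, terms in mapping.items():
        for term in terms:
            # The \b anchors stop 'MOCA' from matching inside a longer word.
            if re.search(r'\b' + re.escape(term) + r'\b', text):
                return key
    return None

def tag_with_key(mapping, items):
    """Append the matching key to each item; items without a match stay unchanged."""
    tagged = []
    for item in items:
        key = find_key(mapping, item)
        tagged.append(f"{item} {key}" if key is not None else item)
    return tagged

lookup = ['NESCAU', 'Confeito coberto de chocolate ao leite NESCAU Ball',
          'NESCAU EM PO', 'Stone', 'Chocolate KIT KAT 41g', 'unknown']
output_lookup = tag_with_key(my_dict, lookup)
print(output_lookup)
```

Leaving unmatched items unchanged is a design choice of this sketch; the code above prints "None" for them instead.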
I would like to know how I can find a capitalised word that is followed by another word whose first letter is also capitalised.
For example:
ID Testo
141 Vivo in una piccola città
22 Gli Stati Uniti sono una grande nazione
153 Il Regno Unito ha votato per uscire dall'Europa
64 Hugh Laurie ha interpretato Dr. House
12 Mi piace bere birra.
My expected output would be:
ID Testo Estratte
141 Vivo in una piccola città []
22 Gli Stati Uniti sono una grande nazione [Gli Stati, Stati Uniti]
153 Il Regno Unito ha votato per uscire dall'Europa [Il Regno, Regno Unito]
64 Hugh Laurie ha interpretato Dr. House [Hugh Laurie, Dr House]
12 Mi piace bere birra. []
To extract the capitalised words I do:
df['Estratte'] = df['Testo'].str.findall(r'\b([A-Z][a-z]*)\b')
However, this column collects only single words, since the pattern does not look at the next word.
Could you please tell me which condition I should add so that it looks at the next word?
Regex is not always the best tool, so let us try split with explode:
s=df.Testo.str.split(' ').explode()
s2=s.groupby(level=0).shift(-1)
assign=(s + ' ' + s2)[s.str.istitle() & s2.str.istitle()].groupby(level=0).agg(list)
Out[244]:
1 [Gli Stati, Stati Uniti]
2 [Il Regno, Regno Unito]
3 [Hugh Laurie, Dr. House]
Name: Testo, dtype: object
df['New']=assign
# note that after the assignment, rows with no match contain NaN
Maybe you could use my code below
def getCapitalize(myStr):
    words = myStr.split()
    for i in range(0, len(words) - 1):
        if (words[i][0].isupper() and words[i+1][0].isupper()):
            yield f"{words[i]} {words[i+1]}"
This function creates a generator, so you will have to convert the result to a list or whatever you need.
import re
import pandas as pd
x = {141: 'Vivo in una piccola città', 22: 'Gli Stati Uniti sono una grande nazione',
     153: 'Il Regno Unito ha votato per uscire dall\'Europa', 64: 'Hugh Laurie ha interpretato Dr. House', 12: 'Mi piace bere birra.'}
df = pd.DataFrame(x.items(), columns=['id', 'testo'])
caps = []
vals = df.testo
for string in vals:
    string = string.split(' ')
    string = string[1:]  # drop the first word of the sentence
    string = ' '.join(string)
    caps.append(re.findall(r'([A-Z][a-z]+)', string))
df['Estratte'] = caps
Why not match a word that starts with a capital letter but is not at the start of the line?
df.Testo.str.findall(r'(?<!^)([A-Z]\w+)')
or
df.Testo.str.findall(r'(?<!^)[A-Z][a-z]+')
0 []
1 [Stati, Uniti]
2 [Regno, Unito, Europa]
3 [Laurie, Dr, House]
4 []
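I think the simplest is to use a regex that searches for (pattern, space, pattern) with overlapping matches. This needs the third-party regex module, because the standard re module has no overlapped option: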
import regex as re
df['Estratte'] = df.Testo.apply(lambda x: re.findall('[A-Z][a-z]+[ ][A-Z][a-z]+', x, overlapped=True))
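If installing the regex module is not an option, overlapping pairs can also be collected with the standard re module by wrapping the pattern in a lookahead. This is only a sketch: the name capitalised_pairs is hypothetical, and like the pattern above it handles plain ASCII letters only, so accented capitals and abbreviations such as "Dr." followed by a name are not matched.

```python
import re

# The lookahead consumes no characters, so pairs that share a word are all reported.
PAIR = re.compile(r'(?=\b([A-Z][a-z]+ [A-Z][a-z]+)\b)')

def capitalised_pairs(text):
    """Return every pair of adjacent capitalised words, including overlapping pairs."""
    return PAIR.findall(text)

rows = [
    'Vivo in una piccola città',
    'Gli Stati Uniti sono una grande nazione',
    "Il Regno Unito ha votato per uscire dall'Europa",
    'Hugh Laurie ha interpretato Dr. House',
]
for row in rows:
    print(capitalised_pairs(row))
```

Because the pattern has a single capture group, the same pattern string should also be usable with df.Testo.str.findall.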
I have a list that contains pair of keywords ('k1', 'k2'). Here's a sample:
print (word_pairs)
--->[('salaire', 'dépense'), ('gratuité', 'argent'), ('causesmwedemwelamwemort', 'cadres'), ('caractèresmwedumwedispositif', 'historique'), ('psychomotricienmwediplôme', 'infirmier'), ('impôtmwesurmwelesmweréunionsmwesportives', 'compensation'), ('affichage', 'affichagemweopinion'), ('délaimweprorogation', 'défaillance'), ('créancemwenotion', 'généralités')]
I have a text file r_isa.txt (205MB) that contain words that share an "isa" relationship. Here's a sample, where \t represents a literal tab character:
égalité de Parseval\tformule_0.9333\tégalité_1.0
filiation illégitime\tfiliation_1.0
Loi reconnaissant l'égalité\tloi_1.0
égalité entre les sexes\tégalité_1.0
liberté égalité fraternité\tliberté_1.0
This basically means that "égalité de Parseval" isa "formule" with a score of 0.9333 and isa "égalité" with a score of 1, and so on.
Based on the r_isa file, I want to know whether the keyword k1 isa k2 or k2 isa k1. In the output file, I want to write one line for each pair of words that does have the isa relationship.
Here's what I did:
import ast
# Reading data as list
keywords = [line for line in open('version_final_PMI_espace.txt', encoding='utf8')]
keywords = ast.literal_eval(keywords[0])
word_pairs = []
for k, v in keywords.items():
    if v:
        word_pairs.append((k, v[0][0]))
len(list(set(word_pairs)))
#####
with open("r_isa.txt", encoding="utf-8") as readfile, open('Hyperonymy_file_pair.txt', 'w') as writefile:
    for line in readfile:
        firstfield = line.split('\t')[0].lower()
        for w in word_pairs:
            if w[0] == firstfield:
                if w[1] in line:
                    writefile.write("".join(w[0]) + "\t" + "".join(w[1]) + "\n")
This returns seemingly random pairs, for example:
salaire\targent
dépense\tcadres
instead of (in case of an existing isa relationship):
salaire\tdépense
causesmwedemwelamwemort\tcadres
Where did I go wrong ?
Updated Answer
The statement if w[1] in line: is highly suspect: it is a substring test against the whole line, so it also succeeds when w[1] merely appears inside the first field or inside a longer word. See the following code for what I believe the logic should be. Since I don't have access to your files, I have turned readfile into a list of strings for testing purposes, and instead of writing output to writefile I am just printing some results. I have added some values to word_pairs and readfile so that I get some results. Also note that if you are converting the input file to lower case, then your word pairs must also be lower case.
This code checks if k1 isa k2 and if not, then checks if k2 isa k1.
word_pairs = [('égalité de parseval', 'égalité'), ('salaire', 'dépense'), ('gratuité', 'argent'), ('causesmwedemwelamwemort', 'cadres'), ('caractèresmwedumwedispositif', 'historique'), ('psychomotricienmwediplôme', 'infirmier'), ('impôtmwesurmwelesmweréunionsmwesportives', 'compensation'), ('affichage', 'affichagemweopinion'), ('délaimweprorogation', 'défaillance'), ('créancemwenotion', 'généralités')]
word_pairs2 = [(pair[1], pair[0]) for pair in word_pairs] # reverse the words
word_dict = dict(word_pairs) # create a dictionary for fast searching
word_dict2 = dict(word_pairs2)
readfile = [
'égalité de Parseval\tformule_0.9333\tégalité_1.0',
'filiation illégitime\tfiliation_1.0',
'Loi reconnaissant l\'égalité\tloi_1.0',
'égalité entre les sexes\tégalité_1.0',
'liberté égalité fraternité\tliberté_1.0',
'dépense\tsalaire_.9'
]
for line in readfile:
    fields = line.lower().split('\t')
    first_word = fields.pop(0)
    isa_word = word_dict.get(first_word, word_dict2.get(first_word))  # check k2 isa k1 if k1 isa k2 is false
    if isa_word is not None:
        for field in fields:  # check each one
            fields2 = field.rsplit('_', 1)  # split on the last underscore only
            second_word, score = fields2
            if second_word == isa_word:
                print(first_word, second_word, score)
Prints:
égalité de parseval égalité 1.0
dépense salaire .9
If it is possible that k1 isa k2 and k2 isa k1, then you need the more general (but more complicated) code:
word_pairs = [('égalité de parseval', 'égalité'), ('salaire', 'dépense'), ('gratuité', 'argent'), ('causesmwedemwelamwemort', 'cadres'), ('caractèresmwedumwedispositif', 'historique'), ('psychomotricienmwediplôme', 'infirmier'), ('impôtmwesurmwelesmweréunionsmwesportives', 'compensation'), ('affichage', 'affichagemweopinion'), ('délaimweprorogation', 'défaillance'), ('créancemwenotion', 'généralités')]
word_pairs2 = [(pair[1], pair[0]) for pair in word_pairs] # reverse the words
word_dict = dict(word_pairs) # create a dictionary for fast searching
word_dict2 = dict(word_pairs2)
readfile = [
'égalité de Parseval\tformule_0.9333\tégalité_1.0',
'filiation illégitime\tfiliation_1.0',
'Loi reconnaissant l\'égalité\tloi_1.0',
'égalité entre les sexes\tégalité_1.0',
'liberté égalité fraternité\tliberté_1.0',
'salaire\tdépense_1.0',
'dépense\tsalaire_.9'
]
for line in readfile:
    fields = line.lower().split('\t')
    first_word = fields.pop(0)
    # k1 isa k2?
    isa_word = word_dict.get(first_word)
    if isa_word is not None:
        for field in fields:  # check each one
            fields2 = field.rsplit('_', 1)  # split on the last underscore only
            second_word, score = fields2
            if second_word == isa_word:
                print(first_word, second_word, score)
    # k2 isa k1?
    isa_word = word_dict2.get(first_word)
    if isa_word is not None:
        for field in fields:  # check each one
            fields2 = field.rsplit('_', 1)  # split on the last underscore only
            second_word, score = fields2
            if second_word == isa_word:
                print(first_word, second_word, score)
Prints:
égalité de parseval égalité 1.0
salaire dépense 1.0
dépense salaire .9
kw = [('salaire', 'dépense'),
('gratuité', 'argent'),
('causesmwedemwelamwemort', 'cadres'),
('caractèresmwedumwedispositif', 'historique'),
('psychomotricienmwediplôme', 'infirmier'),
('impôtmwesurmwelesmweréunionsmwesportives', 'compensation'),
('affichage', 'affichagemweopinion'),
('délaimweprorogation', 'défaillance'),
('créancemwenotion', 'généralités')]
lines_from_file = ['égalité de Parseval\tformule_0.9333\tégalité_1.0',
'filiation illégitime\tfiliation_1.0',
'Loi reconnaissant l\'égalité\tloi_1.0',
'égalité entre les sexes\tégalité_1.0',
'liberté égalité fraternité\tliberté_1.0',
'créancemwenotion\tgénéralités_1.0',
'généralités\tcréancemwenotion_1.0']
who_is_who_dict = {}
for line in lines_from_file:
    words = line.split('\t')
    key = words[0]
    other_words = [w.split('_')[0] for w in words[1:]]
    if key in who_is_who_dict:
        who_is_who_dict[key] = who_is_who_dict[key] + other_words
    else:
        who_is_who_dict[key] = other_words
pairs_to_write = []
for kw1, kw2 in kw:
    if (kw1 in who_is_who_dict and kw2 in who_is_who_dict[kw1]
            and kw2 in who_is_who_dict and kw1 in who_is_who_dict[kw2]):
        pairs_to_write.append((kw1, kw2))
print(pairs_to_write)
Output:
[('créancemwenotion', 'généralités')]
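For a file as large as r_isa.txt it may be preferable to index the keyword pairs (which are few) rather than the file, and to stream the file line by line. A rough sketch under those assumptions; find_isa_pairs is a hypothetical name, the keywords are assumed to be lower case already, and the sample lines reuse the test data from the answers above:

```python
def find_isa_pairs(lines, word_pairs):
    """Yield (word, isa_word, score) for each keyword pair found in either direction."""
    # Map each keyword to the set of keywords it is paired with, in both directions.
    partners = {}
    for k1, k2 in word_pairs:
        partners.setdefault(k1, set()).add(k2)
        partners.setdefault(k2, set()).add(k1)

    for line in lines:
        fields = line.rstrip('\n').lower().split('\t')
        wanted = partners.get(fields[0])
        if not wanted:
            continue  # the first field is not one of our keywords
        for field in fields[1:]:
            # rpartition splits on the last underscore, so the score is always the tail.
            word, _, score = field.rpartition('_')
            if word in wanted:
                yield fields[0], word, score

pairs = [('salaire', 'dépense'), ('égalité de parseval', 'égalité')]
sample_lines = [
    'égalité de Parseval\tformule_0.9333\tégalité_1.0\n',
    'filiation illégitime\tfiliation_1.0\n',
    'dépense\tsalaire_.9\n',
]
results = list(find_isa_pairs(sample_lines, pairs))
for first, second, score in results:
    print(first, second, score, sep='\t')
```

An open file object can be passed as lines, so the whole file never has to be held in memory.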
Link to the original text file:
https://medusa.ugent.be/en/exercises/187053144/description/wM6YaQUbWdHKPhQX/media/ICD.txt
This is what I got:
given_string = 'You are what you eat.'
dictionary ={'D89.1': 'Cryoglobulinemia', 'M87.332': 'Other secondary osteonecrosis of left radius', 'M25.57': 'Pain in ankle and joints of foot', 'H59.111': 'Intraoperative hemorrhage and hematoma of right eye and adnexa complicating an ophthalmic procedure', 'I82.5Z9': 'Chronic embolism and thrombosis of unspecified deep veins of unspecified distal lower extremity', 'T38.3X': 'Poisoning by, adverse effect of and underdosing of insulin and oral hypoglycemic [antidiabetic] drugs', 'H95.52': 'Postprocedural hematoma of ear and mastoid process following other procedure', 'Q90.1': 'Trisomy 21, mosaicism (mitotic nondisjunction)', 'X83.8': 'Intentional self-harm by other specified means', 'H02.145': 'Spastic ectropion of left lower eyelid', 'M67.341': 'Transient synovitis, right hand', 'P07.32': 'Preterm newborn, gestational age 29 completed weeks', 'R44.8': 'Other symptoms and signs involving general sensations and perceptions', 'R03.1': 'Nonspecific low blood-pressure reading', 'Q03': 'Congenital hydrocephalus', 'C11.0': 'Malignant neoplasm of superior wall of nasopharynx', 'C44.4': 'Other and unspecified malignant neoplasm of skin of scalp and neck', 'N48.5': 'Ulcer of penis', 'T50.2X1': 'Poisoning by carbonic-anhydrase inhibitors, benzothiadiazides and other diuretics, accidental (unintentional)', 'V92.13': 'Drowning and submersion due to being thrown overboard by motion of other powered watercraft', 'D30.0': 'Benign neoplasm of kidney', 'M08.06': 'Unspecified juvenile rheumatoid arthritis, knee', 'T41.5X4': 'Poisoning by therapeutic gases, undetermined', 'T59.3X2': 'Toxic effect of lacrimogenic gas, intentional self-harm', 'S84.91': 'Injury of unspecified nerve at lower leg level, right leg', 'Z80.4': 'Family history of malignant neoplasm of genital organs', 'M05.34': 'Rheumatoid heart disease with rheumatoid arthritis of hand', 'Y36.531': 'War operations involving thermal radiation effect of nuclear weapon, civilian', 'H59.88': 'Other intraoperative complications 
of eye and adnexa, not elsewhere classified', 'R29.91': 'Unspecified symptoms and signs involving the musculoskeletal system', 'M71.139': 'Other infective bursitis, unspecified wrist', 'S00.441': 'External constriction of right ear', 'V04': 'Pedestrian injured in collision with heavy transport vehicle or bus', 'C92.1': 'Chronic myeloid leukemia, BCR/ABL-positive', 'I82.60': 'Acute embolism and thrombosis of unspecified veins of upper extremity', 'I75.89': 'Atheroembolism of other site', 'S51.031': 'Puncture wound without foreign body of right elbow', 'Z01.110': 'Encounter for hearing examination following failed hearing screening', 'I06.8': 'Other rheumatic aortic valve diseases', 'Z68.25': 'Body mass index (BMI) 25.0-25.9, adult', 'A66': 'Yaws', 'S78.921': 'Partial traumatic amputation of right hip and thigh, level unspecified', 'F44': 'Dissociative and conversion disorders', 'O87.8': 'Other venous complications in the puerperium', 'K04.3': 'Abnormal hard tissue formation in pulp', 'V38.7': 'Person on outside of three-wheeled motor vehicle injured in noncollision transport accident in traffic accident', 'V36.1': 'Passenger in three-wheeled motor vehicle injured in collision with other nonmotor vehicle in nontraffic accident', 'B94.9': 'Sequelae of unspecified infectious and parasitic disease', 'K50.911': "Crohn's disease, unspecified, with rectal bleeding", 'S00.52': 'Blister (nonthermal) of lip and oral cavity', 'T43.1': 'Poisoning by, adverse effect of and underdosing of monoamine-oxidase-inhibitor antidepressants', 'B99.8': 'Other infectious disease', 'S97.12': 'Crushing injury of lesser toe(s)', 'S02.69': 'Fracture of mandible of other specified site', 'V29.10': 'Motorcycle passenger injured in collision with unspecified motor vehicles in nontraffic accident', 'Z68.35': 'Body mass index (BMI) 35.0-35.9, adult', 'A81.2': 'Progressive multifocal leukoencephalopathy', 'V44.4': 'Person boarding or alighting a car injured in collision with heavy transport vehicle 
or bus', 'M62.51': 'Muscle wasting and atrophy, not elsewhere classified, shoulder', 'M62.151': 'Other rupture of muscle (nontraumatic), right thigh', 'V52.2': 'Person on outside of pick-up truck or van injured in collision with two- or three-wheeled motor vehicle in nontraffic accident', 'E09.622': 'Drug or chemical induced diabetes mellitus with other skin ulcer', 'S43.492': 'Other sprain of left shoulder joint', 'M08.212': 'Juvenile rheumatoid arthritis with systemic onset, left shoulder', 'R00.0': 'Tachycardia, unspecified', 'G21.8': 'Other secondary parkinsonism', 'W58.01': 'Bitten by alligator', 'D46.1': 'Refractory anemia with ring sideroblasts', 'H61.32': 'Acquired stenosis of external ear canal secondary to inflammation and infection', 'H95.0': 'Recurrent cholesteatoma of postmastoidectomy cavity', 'Z72.4': 'Inappropriate diet and eating habits', 'Z68.41': 'Body mass index (BMI) 40.0-44.9, adult', 'S20.172': 'Other superficial bite of breast, left breast', 'I63.232': 'Cerebral infarction due to unspecified occlusion or stenosis of left carotid arteries', 'M14.811': 'Arthropathies in other specified diseases classified elsewhere, right shoulder', 'E13.41': 'Other specified diabetes mellitus with diabetic mononeuropathy', 'H02.53': 'Eyelid retraction', 'V95.49': 'Other spacecraft accident injuring occupant', 'D74.0': 'Congenital methemoglobinemia', 'D60.1': 'Transient acquired pure red cell aplasia', 'T52.1X2': 'Toxic effect of benzene, intentional self-harm', 'O71.2': 'Postpartum inversion of uterus', 'M08.439': 'Pauciarticular juvenile rheumatoid arthritis, unspecified wrist', 'M01.X72': 'Direct infection of left ankle and foot in infectious and parasitic diseases classified elsewhere', 'H95.3': 'Accidental puncture and laceration of ear and mastoid process during a procedure', 'C74.92': 'Malignant neoplasm of unspecified part of left adrenal gland', 'G00': 'Bacterial meningitis, not elsewhere classified', 'M19.011': 'Primary osteoarthritis, right 
shoulder', 'G72.49': 'Other inflammatory and immune myopathies, not elsewhere classified', 'Z68.34': 'Body mass index (BMI) 34.0-34.9, adult', 'V86.64': 'Passenger of military vehicle injured in nontraffic accident', 'L20.9': 'Atopic dermatitis, unspecified', 'S65.51': 'Laceration of blood vessel of other and unspecified finger', 'B67.1': 'Echinococcus granulosus infection of lung', 'S08.81': 'Traumatic amputation of nose', 'Z36.5': 'Encounter for antenatal screening for isoimmunization', 'S59.22': 'Salter-Harris Type II physeal fracture of lower end of radius', 'M66.359': 'Spontaneous rupture of flexor tendons, unspecified thigh', 'I69.919': 'Unspecified symptoms and signs involving cognitive functions following unspecified cerebrovascular disease', 'I25.700': 'Atherosclerosis of coronary artery bypass graft(s), unspecified, with unstable angina pectoris', 'V24.0': 'Motorcycle driver injured in collision with heavy transport vehicle or bus in nontraffic accident', 'S53.025': 'Posterior dislocation of left radial head', 'Q72.819': 'Congenital shortening of unspecified lower limb', 'G44.82': 'Headache associated with sexual activity', 'M93.2': 'Osteochondritis dissecans', 'V44.6': 'Car passenger injured in collision with heavy transport vehicle or bus in traffic accident', 'O90.89': 'Other complications of the puerperium, not elsewhere classified', 'T83.518': 'Infection and inflammatory reaction due to other urinary catheter', 'Z02.9': 'Encounter for administrative examinations, unspecified', 'S55.091': 'Other specified injury of ulnar artery at forearm level, right arm'}
Each character of the string must be replaced by a Hippocrates-code chosen at random among all the codes that encode that character. Each code in the result consists of the ICD code whose description contains the character, followed by the index of the character in that description.
So this is the kind of answer that I am supposed to get:
A66.0 M62.51.29 V44.6.68 H95.3.70 M08.06.26 S51.031.39 V92.13.17 V95.49.25 P07.32.46 C11.0.44 V04.45 E13.41.30 G21.8.5 R00.0.4 V52.2.54 B67.1.38 V24.0.43 M01.X72.10 C74.92.35 G72.49.35 Z68.41.24
And this is the answer that I got:
F44.6.4 S78.922.3 W36.1.17 S93.121.2 E10.32.39 A00.1.12 S90.464.3 T37.1X.9 T43.2.17 W24.0.3 Q60.3.5 V59.9.14 S66.911.5 W93.42 V14.1.34 Y92.139.14 T21.06.12 T65.89.6 Q95.3.4 S85.161.16 S93.121.7 T37.1X.18 V49.60.23 T37.1X5.7 F98.29.16 J10.89.14
To get that, I wrote the following code:
import re
import random
class Hippocrates:
    def __init__(self, code):
        self.code = code
    def description(self, x):
        line_list = []
        split_point = []
        k = []
        v = []
        with open(self.code) as f:
            for line in f:
                for i in line:
                    if i == " ":
                        split_point.append(line.find(i))
        with open(self.code) as f:
            for line in f:
                line_list.append(line.rstrip())
        for i in line_list:
            a = i.split(" ", 1)
            k.append(a[0])
            v.append(a[1])
        d = dict(zip(k, v))
        for key, value in d.items():
            if x == key:
                return d[key]
        else:
            raise ValueError('invalid ICD-code')
    def character(self, numb):
        line_list = []
        split_point = []
        k = []
        v = []
        with open(self.code) as f:
            for line in f:
                for i in line:
                    if i == " ":
                        split_point.append(line.find(i))
        with open(self.code) as f:
            for line in f:
                line_list.append(line.rstrip())
        for i in line_list:
            a = i.split(" ", 1)
            k.append(a[0])
            v.append(a[1])
        d = dict(zip(k, v))
        rev = numb[::-1]
        revs = rev.split('.',1)
        r1 =(revs[1][::-1])
        r2 = (revs[0][::-1])
        for key, value in d.items():
            if r1 == key:
                answer = d[key]
                result = answer[int(r2)]
                return result
        else:
            raise ValueError('invalid Hippocrates-code')
    def codes(self, char):
        line_list = []
        split_point = []
        k = []
        v = []
        r_v = []
        code_result = []
        des_result = []
        des_result2 = []
        location = []
        final = []
        with open(self.code) as f:
            for line in f:
                for i in line:
                    if i == " ":
                        split_point.append(line.find(i))
        with open(self.code) as f:
            for line in f:
                line_list.append(line.rstrip())
        for i in line_list:
            a = i.split(" ", 1)
            k.append(a[0])
            v.append(a[1])
        d = dict(zip(k, v))
        for i in v:
            for x in i:
                if x == char:
                    r_v.append(i)
        for key, value in d.items():
            for i in r_v:
                if i == value:
                    code_result.append(key)
        for key in d.keys():
            for i in code_result:
                if i == key:
                    des_result.append(d[i])
        for i in des_result:
            if i not in des_result2:
                des_result2.append(i)
        for i in des_result2:
            regex = re.escape(char)
            a = [m.start() for m in re.finditer(regex,i)]
            location.append(a)
        location = (sum(location,[]))
        for i in range(len(code_result)):
            answer = (str(code_result[i]) +'.'+ str(location[i]))
            final.append(answer)
        return (set(final))
    def encode(self, plaintxt):
        line_list = []
        split_point = []
        # keys of the dictionary
        k = []
        # values of the dictionary
        v = []
        # descriptions that contain a character, with its index
        r = []
        # list of possible choices
        t = []
        # randomly chosen results from t
        li_di = []
        # descriptions
        des = []
        # index of char in description
        index_char = []
        # answer to print
        resul = []
        dictlist = []
        answers = []
        with open(self.code) as f:
            for line in f:
                for i in line:
                    if i == " ":
                        split_point.append(line.find(i))
        with open(self.code) as f:
            for line in f:
                line_list.append(line.rstrip())
        for i in line_list:
            a = i.split(" ", 1)
            k.append(a[0])
            v.append(a[1])
        d = dict(zip(k, v))
        print(d)
        for key, value in d.items():
            for i in plaintxt:
                if i in value:
                    answer = d[key] +':'+ str(d[key].index(i))
                    r.append(answer)
        print(r)
        a = len(plaintxt)
        b=0
        for i in range(len(r)):
            t.append(r[b::a])
            b+=1
            if b == len(plaintxt):
                break
        for i in t:
            li_di.append(random.choice(i))
        for i in li_di:
            sep = i.split(":", 1)
            des.append(sep[0])
            index_char.append(sep[1])
        print(index_char)
        for i in des:
            for key, value in d.items():
                if i == value:
                    resul.append(key)
        print(resul)
        for i in range(len(resul)):
            answers.append(resul[i]+'.'+index_char[i]+'')
        return(" ".join(answers))
The codes that represent the characters of given_string should be in the same order as the characters of the original string, but I messed that up. How can I fix this?
This should work for your encode function:
def encode(self, plaintxt):
    code_map = {}
    codes = []
    with open(self.code) as f:
        for line in f:
            line = line.rstrip().split(' ', 1)
            code_map[line[0]] = line[1]
    for ch in plaintxt:
        matches = []
        for key, value in code_map.items():
            pos = -1
            while True:
                pos = value.find(ch, pos + 1)
                if pos != -1:
                    matches.append((key, pos))
                else:
                    break
        if not matches:
            raise ValueError(f'Character {ch} cannot be encoded as there are no matches')
        code_tuple = random.choice(matches)
        code, idx = code_tuple
        codes.append(f'{code}.{idx}')
    return ' '.join(codes)
Edit: I updated this to make it more space-efficient, by getting rid of char_map and appending codes as it goes
First, it creates a dict of keys as codes and values as the corresponding strings. Then it iterates through the given plaintxt string, and searches all of the values of the dict for matches (including multiple matches in a single value), and adds this to a matches list of tuples, where each tuple contains a suitable code and the index of the match. If there are no matches, it raises a ValueError as soon as it runs into an issue. It chooses randomly from each list of tuples to choose some code and index pair, and appends this to a list on the fly, and then at the end it joins this list to make your encoded string.
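To check that the order is preserved, the approach described above can be tried without the file. The following is only a sketch: build_code_map, encode and decode are hypothetical stand-alone functions rather than methods of the Hippocrates class, and the four sample lines are taken from the dictionary in the question. Decoding the result again shows whether each code lands in the same position as its character.

```python
import random

def build_code_map(lines):
    """Parse 'CODE description' lines into a dict, skipping lines without a space."""
    code_map = {}
    for line in lines:
        line = line.rstrip('\n')
        if ' ' not in line:
            continue
        code, description = line.split(' ', 1)
        code_map[code] = description
    return code_map

def encode(plaintext, code_map, rng=random):
    """Replace each character, in order, by a randomly chosen 'code.index' pair."""
    codes = []
    for ch in plaintext:
        matches = [(code, idx)
                   for code, description in code_map.items()
                   for idx, c in enumerate(description) if c == ch]
        if not matches:
            raise ValueError(f'Character {ch!r} cannot be encoded')
        code, idx = rng.choice(matches)
        codes.append(f'{code}.{idx}')
    return ' '.join(codes)

def decode(encoded, code_map):
    """Inverse of encode: the part after the last dot is the index into the description."""
    chars = []
    for token in encoded.split(' '):
        code, _, idx = token.rpartition('.')
        chars.append(code_map[code][int(idx)])
    return ''.join(chars)

sample = ['A66 Yaws', 'W58.01 Bitten by alligator',
          'R00.0 Tachycardia, unspecified', 'Q03 Congenital hydrocephalus']
code_map = build_code_map(sample)
encoded = encode('You eat', code_map)
print(encoded)
print(decode(encoded, code_map))
```

The encoded string differs from run to run because of the random choice, but decoding it should always give back the original text.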
If memory is not a problem, I think you should build an index of possible choices of each character from the dictionary. Here is an example code:
import random
def build_char_codes(d):
    result = {}
    for key, val in d.items():
        for i in range(len(val)):
            ch = val[i]
            if ch not in result:
                result[ch] = {key: [i]}
            else:
                result[ch][key] = result[ch].get(key, []) + [i]
    return result
def get_code(ch, char_codes):
    # random.sample() needs a sequence on Python 3.11+, so pick from a list of the keys
    key = random.choice(list(char_codes[ch]))
    char_pos = random.choice(char_codes[ch][key])
    code = '{}.{}'.format(key, char_pos)
    return code
char_codes = build_char_codes(dictionary)
given_string = 'You are what you eat.'
codes = [get_code(ch, char_codes) for ch in given_string]
print(' '.join(codes))
Notes:
char_codes indexes all possible choices for each character in the dictionary.
get_code first samples a key of the dictionary (uniformly at random) and then samples a position in that key's description (uniformly at random). This is not the same as sampling uniformly among all the possible choices for a character.
In preparation for the transformation, you could create a dictionary with each letter in the ICD description mapping to a list of codes that contain it at various indexes.
Then, the transformation process would simply be a matter of picking one of the code.index from the entry in the dictionary for each letter in the given string:
Preparation:
with open(fileName, 'r') as f:
    # skip blank lines (e.g. after a trailing newline) so every entry unpacks into (code, description)
    icd = [line.split(" ", 1) for line in f.read().split("\n") if " " in line]
icdLetters = dict()  # list of ICD codes with index for each possible letter
for code, description in icd:
    for i, letter in enumerate(description):
        icdLetters.setdefault(letter, []).append(f"{code}.{i}")
Transformation:
import random
given_string = 'You are what you eat.'
result = [random.choice(icdLetters.get(c, ["-"])) for c in given_string]
output:
print(result)
['A66.0', 'T80.22.35', 'S53.136.34', 'C40.90.33', 'S53.136.43', 'Z96.621.12', 'B57.30.24', 'H59.121.55', 'V14.1.43', 'S93.121.47', 'H59.121.9', 'V04.92.17', 'T80.22.80', 'O16.1.22', 'T25.61.10', 'S53.136.34', 'F44.6.32', 'M67.232.29', 'M89.771.34', 'S93.121.7', 'Z68.36.29']
If you want to save some memory, your dictionary could store indexes in the main list of icd codes and descriptions instead of the formatted values:
with open(fileName, 'r') as f:
    icd = [line.split(" ", 1) for line in f.read().split("\n") if " " in line]
icdLetters = dict()
for codeIndex, (code, description) in enumerate(icd):
    for letterIndex, letter in enumerate(description):
        icdLetters.setdefault(letter, []).append((codeIndex, letterIndex))
import random
def letterToCode(letter):
    if letter not in icdLetters:
        return "-"
    codeIndex, letterIndex = random.choice(icdLetters[letter])
    return f"{icd[codeIndex][0]}.{letterIndex}"
given_string = 'You are what you eat.'
result = [letterToCode(c) for c in given_string]
I'm trying to add on values to a key after making a dictionary.
This is what I have so far:
movie_list = "movies.txt"  # using a file that contains this order on first line: Title, year, genre, director, actor
in_file = open(movie_list, 'r')
in_file.readline()
def list_maker(in_file):
    movie1 = str(input("Enter in a movie: "))
    movie2 = str(input("Enter in another movie: "))
    d = {}
    for line in in_file:
        l = line.split(",")
        title_year = (l[0], l[1])  # only then making the tuple ('Title', 'year')
        for i in range(4, len(l)):
            d = {title_year: l[i]}
            if movie1 or movie2 == l[0]:
                print(d.values())
The output I get is:
Enter in a movie: 13 B
Enter in another movie: 1920
{('13 B', '(2009)'): 'R. Madhavan'}
{('13 B', '(2009)'): 'Neetu Chandra'}
{('13 B', '(2009)'): 'Poonam Dhillon\n'}
{('1920', '(2008)'): 'Rajneesh Duggal'}
{('1920', '(2008)'): 'Adah Sharma'}
{('1920', '(2008)'): 'Anjori Alagh\n'}
{('1942 A Love Story', '(1994)'): 'Anil Kapoor'}
{('1942 A Love Story', '(1994)'): 'Manisha Koirala'}
{('1942 A Love Story', '(1994)'): 'Jackie Shroff\n'}
... and so on and so forth. I get the whole list of movies.
How would I go about it if I wanted to enter those two movies (any two movies) and get, for each key (title, year), all of its values combined?
Example:
{('13 B', '(2009)'): 'R. Madhavan', 'Neetu Chandra', 'Poonam Dhillon'}
{('1920', '(2008)'): 'Rajneesh Duggal', 'Adah Sharma', 'Anjori Alagh'}
Sorry if the output isn't completely what you want, but here's how you should do it:
d = {}
for line in in_file:
    l = line.rstrip("\n").split(",")  # strip the newline so the last name is clean
    title_year = (l[0], l[1])
    people = []
    for i in range(4, len(l)):
        people.append(l[i])  # we append items to the list...
    d = {title_year: people}  # ...and then make the dict so that the list is in it.
    if l[0] in (movie1, movie2):  # `movie1 or movie2 == l[0]` is true for any non-empty movie1
        print(d.values())
Basically, what we are doing here is that we are making a list, and then setting the list to a key inside of the dict.
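Here is a small self-contained sketch of the same idea that collects everything into one dictionary instead of printing per line. The function name actors_by_movie is hypothetical, and since the question does not show the genre and director columns, the sample lines use the placeholder strings "genre" and "director" there:

```python
def actors_by_movie(lines, wanted_titles):
    """Map (title, year) to the list of actors, for the requested titles only."""
    result = {}
    for line in lines:
        fields = [field.strip() for field in line.split(',')]
        # Columns assumed: title, year, genre, director, then one or more actors.
        if len(fields) < 5 or fields[0] not in wanted_titles:
            continue
        result.setdefault((fields[0], fields[1]), []).extend(fields[4:])
    return result

sample_lines = [
    '13 B,(2009),genre,director,R. Madhavan,Neetu Chandra,Poonam Dhillon\n',
    '1920,(2008),genre,director,Rajneesh Duggal,Adah Sharma,Anjori Alagh\n',
    '1942 A Love Story,(1994),genre,director,Anil Kapoor,Manisha Koirala,Jackie Shroff\n',
]
movies = actors_by_movie(sample_lines, {'13 B', '1920'})
for title_year, actors in movies.items():
    print(title_year, actors)
```

In the real program the open file object could be passed as lines and the two input() answers as wanted_titles.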
My list is formatted like:
gymnastics_school,participant_name,all-around_points_earned
I need to divide it up by schools but keep the scores.
import collections
def main():
    names = ["gymnastics_school", "participant_name", "all_around_points_earned"]
    Data = collections.namedtuple("Data", names)
    data = []
    with open('state_meet.txt','r') as f:
        for line in f:
            line = line.strip()
            items = line.split(',')
            items[2] = float(items[2])
            data.append(Data(*items))
These are examples of how they're set up:
Lanier City Gymnastics,Ben W.,55.301
Lanier City Gymnastics,Alex W.,54.801
Lanier City Gymnastics,Sky T.,51.2
Lanier City Gymnastics,William G.,47.3
Carrollton Boys,Cameron M.,61.6
Carrollton Boys,Zachary W.,58.7
Carrollton Boys,Samuel B.,58.6
La Fayette Boys,Nate S.,63
La Fayette Boys,Kaden C.,62
La Fayette Boys,Cohan S.,59.1
La Fayette Boys,Cooper J.,56.101
La Fayette Boys,Avi F.,53.401
La Fayette Boys,Frederic T.,53.201
Columbus,Noah B.,50.3
Savannah Metro,Levi B.,52.801
Savannah Metro,Taylan T.,52
Savannah Metro,Jacob S.,51.5
SAAB Gymnastics,Dawson B.,58.1
SAAB Gymnastics,Dean S.,57.901
SAAB Gymnastics,William L.,57.101
SAAB Gymnastics,Lex L.,52.501
Suwanee Gymnastics,Colin K.,57.3
Suwanee Gymnastics,Matthew B.,53.201
After processing it should look like:
Lanier City Gymnastics:participants(4)
as it own list
Carrollton Boys(3)
as it own list
La Fayette Boys(6)
etc.
I would recommend putting them in dictionaries:
data = {}
with open('state_meet.txt','r') as f:
    for line in f:
        line = line.strip()
        items = line.split(',')
        items[2] = float(items[2])
        if items[0] in data:
            data[items[0]].append(items[1:])
        else:
            data[items[0]] = [items[1:]]
Then the participants of a school can be accessed in the following way:
>>> data['Lanier City Gymnastics']
[['Ben W.', 55.301], ['Alex W.', 54.801], ['Sky T.', 51.2], ['William G.', 47.3]]
EDIT:
Assuming you need the whole dataset as a list first and then want to divide it into smaller lists, you can generate the dictionary from the list:
data = []
with open('state_meet.txt','r') as f:
    for line in f:
        line = line.strip()
        items = line.split(',')
        items[2] = float(items[2])
        data.append(items)
# perform median or other operation on your data
nested_data = {}
for items in data:
    if items[0] in nested_data:
        nested_data[items[0]].append(items[1:])
    else:
        nested_data[items[0]] = [items[1:]]
nested_data['Lanier City Gymnastics']
When you need to get a subset of a list you can use slicing:
mylist[start:stop:step]
where start, stop and step are optional (see link for more comprehensive introduction)
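The grouping by school can also be sketched with collections.defaultdict, which removes the need for the "if key in dict" check. The function name group_by_school is hypothetical, and the sample lines are a few rows copied from the question rather than read from state_meet.txt:

```python
from collections import defaultdict

def group_by_school(lines):
    """Group 'school,name,points' lines into {school: [(name, points), ...]}."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        school, name, points = line.split(',')
        groups[school].append((name, float(points)))
    return dict(groups)

sample_lines = [
    'Lanier City Gymnastics,Ben W.,55.301',
    'Lanier City Gymnastics,Alex W.,54.801',
    'Lanier City Gymnastics,Sky T.,51.2',
    'Lanier City Gymnastics,William G.,47.3',
    'Carrollton Boys,Cameron M.,61.6',
    'Carrollton Boys,Zachary W.,58.7',
    'Carrollton Boys,Samuel B.,58.6',
    'Columbus,Noah B.,50.3',
]
schools = group_by_school(sample_lines)
for school, members in schools.items():
    print(f"{school}: participants({len(members)})")
```

Each value is an ordinary list, so slicing such as schools['Carrollton Boys'][:2] works on it as described above.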