data = [
"Andromeda - Shrub",
"Bellflower - Flower",
"China Pink - Flower",
"Daffodil - Flower",
"Evening Primrose - Flower",
"French Marigold - Flower",
"Hydrangea - Shrub",
"Iris - Flower",
"Japanese Camellia - Shrub",
"Lavender - Shrub",
"Lilac - Shrub",
"Magnolia - Shrub",
"Peony - Shrub",
"Queen Anne's Lace - Flower",
"Red Hot Poker - Flower",
"Snapdragon - Flower",
"Sunflower - Flower",
"Tiger Lily - Flower",
"Witch Hazel - Shrub",
]
flowers = []
shrubs = []
for plant in data:
    if "- Flower" in plant:
        flowers.append(plant)
    else:
        shrubs.append(plant)
print(flowers)
print(shrubs)
Split each string on " - " and keep the first part:
for plant in data:
    plant_name, category = plant.split(' - ', 1)
    if category == "Flower":
        flowers.append(plant_name)
    else:
        shrubs.append(plant_name)
You could use:
for plant in data:
    to_append = plant.split(" - ")[0]
    if "- Flower" in plant:
        flowers.append(to_append)
    else:
        shrubs.append(to_append)
This splits each plant string on " - ", which produces a list, and then stores the first element of that list in the to_append variable.
For example, "Andromeda - Shrub" is split into the list ["Andromeda", "Shrub"]. Its element at index 0 is "Andromeda", which you then append to the new list.
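To see what split returns in isolation, here is a minimal sketch (the variable names entry and parts are only for illustration):

```python
# Splitting one entry on the " - " separator yields a two-element list.
entry = "Andromeda - Shrub"
parts = entry.split(" - ")

print(parts)     # ['Andromeda', 'Shrub']
print(parts[0])  # Andromeda
```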
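There are several ways to do this. One of them is to use the str.replace() method, like this (note that the leading space is part of the suffix, so it has to be removed as well):
for plant in data:
    if " - Flower" in plant:
        name = plant.replace(" - Flower", "")
        flowers.append(name)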
To omit the " - Flower" suffix from the strings when printing, you can use a list comprehension:
print([flower[:flower.index(' - Flower')] for flower in flowers])
List comprehensions are very useful; I highly recommend you check them out eventually.
flowers = [x.split(' - ')[0] for x in data if x.split(' - ')[1] == 'Flower']
shrubs = [x.split(' - ')[0] for x in data if x.split(' - ')[1] == 'Shrub']
print(flowers)
print('--------')
print(shrubs)
Result:
['Bellflower', 'China Pink', 'Daffodil', 'Evening Primrose', 'French Marigold', 'Iris', "Queen Anne's Lace", 'Red Hot Poker', 'Snapdragon', 'Sunflower', 'Tiger Lily']
--------
['Andromeda', 'Hydrangea', 'Japanese Camellia', 'Lavender', 'Lilac', 'Magnolia', 'Peony', 'Witch Hazel']
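The comprehensions above split every string twice and walk the list once per category. As an alternative sketch (not something the answers above proposed), you could group the names in a single pass with a dictionary keyed by category. The groups name and the shortened data list here are only for illustration:

```python
from collections import defaultdict

# Shortened sample of the list from the question.
data = [
    "Andromeda - Shrub",
    "Bellflower - Flower",
    "Queen Anne's Lace - Flower",
    "Lilac - Shrub",
]

# Map each category to the plant names that belong to it.
groups = defaultdict(list)
for entry in data:
    name, category = entry.split(" - ", 1)
    groups[category].append(name)

print(groups["Flower"])  # ['Bellflower', "Queen Anne's Lace"]
print(groups["Shrub"])   # ['Andromeda', 'Lilac']
```

This also keeps working unchanged if a third category is ever added to the data.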
The z3py authors have provided an example solution here: https://github.com/0vercl0k/z3-playground/blob/master/einstein_riddle_z3.py . However, compared to this one, https://artificialcognition.github.io/who-owns-the-zebra , that solution is rather complicated, long and ugly. I do not really want to switch libraries, as z3py seems more advanced and better maintained. So I started to work on my own version, but I fail to declare some parts (is it my lack of knowledge, or is it not possible?). Here is what I have and where I get stuck (see the 2 comments):
from z3 import *
color = Int('color')
nationality = Int('nationality')
beverage = Int('beverage')
cigar = Int('cigar')
pet = Int('pet')
house = Int('house')
color_variations = Or(color==1, color==2, color==3, color==4, color==5)
nationality_variations = Or(nationality==1, nationality==2, nationality==3, nationality==4, nationality==5)
beverage_variations = Or(beverage==1, beverage==2, beverage==3, beverage==4, beverage==5)
cigar_variations = Or(cigar==1, cigar==2, cigar==3, cigar==4, cigar==5)
pet_variations = Or(pet==1, pet==2, pet==3, pet==4, pet==5)
house_variations = Or(house==1, house==2, house==3, house==4, house==5)
s = Solver()
s.add(color_variations)
s.add(nationality_variations)
s.add(beverage_variations)
s.add(cigar_variations)
s.add(pet_variations)
s.add(house_variations)
# This is not right
#s.add(Distinct([color, nationality, beverage, cigar, pet]))
s.add(And(Implies(nationality==1, color==1), Implies(color==1, nationality==1))) #the Brit (nationality==1) lives in the red (color==1) house
s.add(And(Implies(nationality==2, pet==1), Implies(pet==1, nationality==2))) #the Swede (nationality==2) keeps dogs (pet==1) as pets
s.add(And(Implies(nationality==3, beverage==1), Implies(beverage==1, nationality==3))) #the Dane (nationality==3) drinks tea (beverage=1)
s.add(And(Implies(color==2, beverage==2), Implies(beverage==2, color==2))) #the green (color==2) house's owner drinks coffee (beverage==2)
s.add(And(Implies(cigar==1, pet==2), Implies(pet==2, cigar==1))) #the person who smokes Pall Mall (cigar==1) rears birds ([pet==2])
s.add(And(Implies(color==4, cigar==2), Implies(cigar==2, color==4))) #the owner of the yellow (color==4) house smokes Dunhill (cigar==2)
s.add(And(Implies(house==3, beverage==3), Implies(beverage==3, house==3))) #the man living in the center (house==3) house drinks milk (beverage==3)
s.add(And(Implies(nationality==4, house==1), Implies(house==1, nationality==4))) #the Norwegian (nationality==4) lives in the first house (house==1)
s.add(And(Implies(cigar==3, beverage==4), Implies(beverage==4, cigar==3))) #the owner who smokes BlueMaster (cigar==3) drinks beer (beverage==4)
s.add(And(Implies(nationality==5, cigar==4), Implies(cigar==4, nationality==5))) #the German (nationality==5) smokes Prince (cigar==4)
# I can't figure out how to express this part in a way that keeps it short and efficient
# the green (color==2) house is on the left of the white (color==3) house
I am currently looking in the direction of ForAll and Functions.
You should use an enumeration for the different kinds of things here. Also, you can't just get away with having one color variable: After all, each house has a different color, and you want to track it separately. A better idea is to make color, nationality, etc., all uninterpreted functions; mapping numbers to colors, countries, etc., respectively.
Here's the Haskell solution for this problem, using the SBV library which uses z3 via the SMTLib interface, following the strategy I described: https://hackage.haskell.org/package/sbv-8.8/docs/src/Documentation.SBV.Examples.Puzzles.Fish.html
Translating this strategy to Python, we have:
from z3 import *
# Sorts of things we have
Color , (Red , Green , White , Yellow , Blue) = EnumSort('Color' , ('Red' , 'Green' , 'White' , 'Yellow' , 'Blue'))
Nationality, (Briton , Dane , Swede , Norwegian, German) = EnumSort('Nationality', ('Briton' , 'Dane' , 'Swede' , 'Norwegian', 'German'))
Beverage , (Tea , Coffee , Milk , Beer , Water) = EnumSort('Beverage' , ('Tea' , 'Coffee' , 'Milk' , 'Beer' , 'Water'))
Pet , (Dog , Horse , Cat , Bird , Fish) = EnumSort('Pet' , ('Dog' , 'Horse' , 'Cat' , 'Bird' , 'Fish'))
Sport , (Football, Baseball, Volleyball, Hockey , Tennis) = EnumSort('Sport' , ('Football', 'Baseball', 'Volleyball', 'Hockey' , 'Tennis'))
# Uninterpreted functions to match "houses" to these sorts. We represent houses by regular symbolic integers.
c = Function('color', IntSort(), Color)
n = Function('nationality', IntSort(), Nationality)
b = Function('beverage', IntSort(), Beverage)
p = Function('pet', IntSort(), Pet)
s = Function('sport', IntSort(), Sport)
S = Solver()
# Create a new fresh variable. We don't care about its name
v = 0
def newVar():
    global v
    i = Int("v" + str(v))
    v = v + 1
    S.add(1 <= i, i <= 5)
    return i
# Assert a new fact. This is just a synonym for add, but keeps everything uniform
def fact0(f):
    S.add(f)
# Assert a fact about a new fresh variable
def fact1(f):
    i = newVar()
    S.add(f(i))
# Assert a fact about two fresh variables
def fact2(f):
    i = newVar()
    j = newVar()
    S.add(i != j)
    S.add(f(i, j))
# Assert two houses are next to each other
def neighbor(i, j):
    return (Or(i == j+1, j == i+1))
fact1 (lambda i : And(n(i) == Briton, c(i) == Red)) # The Briton lives in the red house.
fact1 (lambda i : And(n(i) == Swede, p(i) == Dog)) # The Swede keeps dogs as pets.
fact1 (lambda i : And(n(i) == Dane, b(i) == Tea)) # The Dane drinks tea.
fact2 (lambda i, j: And(c(i) == Green, c(j) == White, i == j-1)) # The green house is left to the white house.
fact1 (lambda i : And(c(i) == Green, b(i) == Coffee)) # The owner of the green house drinks coffee.
fact1 (lambda i : And(s(i) == Football, p(i) == Bird)) # The person who plays football rears birds.
fact1 (lambda i : And(c(i) == Yellow, s(i) == Baseball)) # The owner of the yellow house plays baseball.
fact0 ( b(3) == Milk) # The man living in the center house drinks milk.
fact0 ( n(1) == Norwegian) # The Norwegian lives in the first house.
fact2 (lambda i, j: And(s(i) == Volleyball, p(j) == Cat, neighbor(i, j))) # The man who plays volleyball lives next to the one who keeps cats.
fact2 (lambda i, j: And(p(i) == Horse, s(j) == Baseball, neighbor(i, j))) # The man who keeps the horse lives next to the one who plays baseball.
fact1 (lambda i : And(s(i) == Tennis, b(i) == Beer)) # The owner who plays tennis drinks beer.
fact1 (lambda i : And(n(i) == German, s(i) == Hockey)) # The German plays hockey.
fact2 (lambda i, j: And(n(i) == Norwegian, c(j) == Blue, neighbor(i, j))) # The Norwegian lives next to the blue house.
fact2 (lambda i, j: And(s(i) == Volleyball, b(j) == Water, neighbor(i, j))) # The man who plays volleyball has a neighbor who drinks water.
# Determine who owns the fish
fishOwner = Const("fishOwner", Nationality)
fact1 (lambda i: And(n(i) == fishOwner, p(i) == Fish))
r = S.check()
if r == sat:
    m = S.model()
    print(m[fishOwner])
else:
    print("Solver said: %s" % r)
When I run this, I get:
$ python a.py
German
This shows that the fish owner is German. I think your original problem had a different but similar set of constraints; you can easily use the same strategy to solve it.
It's also instructive to look at the output of:
print(m)
in the sat case. This prints:
[v5 = 4,
v9 = 1,
v16 = 2,
v12 = 5,
v14 = 1,
v2 = 2,
v0 = 3,
v10 = 2,
v18 = 4,
v15 = 2,
v6 = 3,
v7 = 1,
v4 = 5,
v8 = 2,
v17 = 1,
v11 = 1,
v1 = 5,
v13 = 4,
fishOwner = German,
v3 = 4,
nationality = [5 -> Swede,
2 -> Dane,
1 -> Norwegian,
4 -> German,
else -> Briton],
color = [5 -> White,
4 -> Green,
1 -> Yellow,
2 -> Blue,
else -> Red],
pet = [3 -> Bird,
1 -> Cat,
2 -> Horse,
4 -> Fish,
else -> Dog],
beverage = [4 -> Coffee,
3 -> Milk,
5 -> Beer,
1 -> Water,
else -> Tea],
sport = [1 -> Baseball,
2 -> Volleyball,
5 -> Tennis,
4 -> Hockey,
else -> Football]]
Ignore all the assignments to the vN variables; those are the ones we used internally for modeling purposes. But you can see how z3 mapped each of the uninterpreted functions to the corresponding values. Each function maps the number of a house to the value that satisfies the puzzle constraints for that house. You can programmatically extract a full solution to the puzzle as needed by using the information contained in this model.
link for original txt file
https://medusa.ugent.be/en/exercises/187053144/description/wM6YaQUbWdHKPhQX/media/ICD.txt
This is what I got:
given_string = 'You are what you eat.'
dictionary ={'D89.1': 'Cryoglobulinemia', 'M87.332': 'Other secondary osteonecrosis of left radius', 'M25.57': 'Pain in ankle and joints of foot', 'H59.111': 'Intraoperative hemorrhage and hematoma of right eye and adnexa complicating an ophthalmic procedure', 'I82.5Z9': 'Chronic embolism and thrombosis of unspecified deep veins of unspecified distal lower extremity', 'T38.3X': 'Poisoning by, adverse effect of and underdosing of insulin and oral hypoglycemic [antidiabetic] drugs', 'H95.52': 'Postprocedural hematoma of ear and mastoid process following other procedure', 'Q90.1': 'Trisomy 21, mosaicism (mitotic nondisjunction)', 'X83.8': 'Intentional self-harm by other specified means', 'H02.145': 'Spastic ectropion of left lower eyelid', 'M67.341': 'Transient synovitis, right hand', 'P07.32': 'Preterm newborn, gestational age 29 completed weeks', 'R44.8': 'Other symptoms and signs involving general sensations and perceptions', 'R03.1': 'Nonspecific low blood-pressure reading', 'Q03': 'Congenital hydrocephalus', 'C11.0': 'Malignant neoplasm of superior wall of nasopharynx', 'C44.4': 'Other and unspecified malignant neoplasm of skin of scalp and neck', 'N48.5': 'Ulcer of penis', 'T50.2X1': 'Poisoning by carbonic-anhydrase inhibitors, benzothiadiazides and other diuretics, accidental (unintentional)', 'V92.13': 'Drowning and submersion due to being thrown overboard by motion of other powered watercraft', 'D30.0': 'Benign neoplasm of kidney', 'M08.06': 'Unspecified juvenile rheumatoid arthritis, knee', 'T41.5X4': 'Poisoning by therapeutic gases, undetermined', 'T59.3X2': 'Toxic effect of lacrimogenic gas, intentional self-harm', 'S84.91': 'Injury of unspecified nerve at lower leg level, right leg', 'Z80.4': 'Family history of malignant neoplasm of genital organs', 'M05.34': 'Rheumatoid heart disease with rheumatoid arthritis of hand', 'Y36.531': 'War operations involving thermal radiation effect of nuclear weapon, civilian', 'H59.88': 'Other intraoperative complications 
of eye and adnexa, not elsewhere classified', 'R29.91': 'Unspecified symptoms and signs involving the musculoskeletal system', 'M71.139': 'Other infective bursitis, unspecified wrist', 'S00.441': 'External constriction of right ear', 'V04': 'Pedestrian injured in collision with heavy transport vehicle or bus', 'C92.1': 'Chronic myeloid leukemia, BCR/ABL-positive', 'I82.60': 'Acute embolism and thrombosis of unspecified veins of upper extremity', 'I75.89': 'Atheroembolism of other site', 'S51.031': 'Puncture wound without foreign body of right elbow', 'Z01.110': 'Encounter for hearing examination following failed hearing screening', 'I06.8': 'Other rheumatic aortic valve diseases', 'Z68.25': 'Body mass index (BMI) 25.0-25.9, adult', 'A66': 'Yaws', 'S78.921': 'Partial traumatic amputation of right hip and thigh, level unspecified', 'F44': 'Dissociative and conversion disorders', 'O87.8': 'Other venous complications in the puerperium', 'K04.3': 'Abnormal hard tissue formation in pulp', 'V38.7': 'Person on outside of three-wheeled motor vehicle injured in noncollision transport accident in traffic accident', 'V36.1': 'Passenger in three-wheeled motor vehicle injured in collision with other nonmotor vehicle in nontraffic accident', 'B94.9': 'Sequelae of unspecified infectious and parasitic disease', 'K50.911': "Crohn's disease, unspecified, with rectal bleeding", 'S00.52': 'Blister (nonthermal) of lip and oral cavity', 'T43.1': 'Poisoning by, adverse effect of and underdosing of monoamine-oxidase-inhibitor antidepressants', 'B99.8': 'Other infectious disease', 'S97.12': 'Crushing injury of lesser toe(s)', 'S02.69': 'Fracture of mandible of other specified site', 'V29.10': 'Motorcycle passenger injured in collision with unspecified motor vehicles in nontraffic accident', 'Z68.35': 'Body mass index (BMI) 35.0-35.9, adult', 'A81.2': 'Progressive multifocal leukoencephalopathy', 'V44.4': 'Person boarding or alighting a car injured in collision with heavy transport vehicle 
or bus', 'M62.51': 'Muscle wasting and atrophy, not elsewhere classified, shoulder', 'M62.151': 'Other rupture of muscle (nontraumatic), right thigh', 'V52.2': 'Person on outside of pick-up truck or van injured in collision with two- or three-wheeled motor vehicle in nontraffic accident', 'E09.622': 'Drug or chemical induced diabetes mellitus with other skin ulcer', 'S43.492': 'Other sprain of left shoulder joint', 'M08.212': 'Juvenile rheumatoid arthritis with systemic onset, left shoulder', 'R00.0': 'Tachycardia, unspecified', 'G21.8': 'Other secondary parkinsonism', 'W58.01': 'Bitten by alligator', 'D46.1': 'Refractory anemia with ring sideroblasts', 'H61.32': 'Acquired stenosis of external ear canal secondary to inflammation and infection', 'H95.0': 'Recurrent cholesteatoma of postmastoidectomy cavity', 'Z72.4': 'Inappropriate diet and eating habits', 'Z68.41': 'Body mass index (BMI) 40.0-44.9, adult', 'S20.172': 'Other superficial bite of breast, left breast', 'I63.232': 'Cerebral infarction due to unspecified occlusion or stenosis of left carotid arteries', 'M14.811': 'Arthropathies in other specified diseases classified elsewhere, right shoulder', 'E13.41': 'Other specified diabetes mellitus with diabetic mononeuropathy', 'H02.53': 'Eyelid retraction', 'V95.49': 'Other spacecraft accident injuring occupant', 'D74.0': 'Congenital methemoglobinemia', 'D60.1': 'Transient acquired pure red cell aplasia', 'T52.1X2': 'Toxic effect of benzene, intentional self-harm', 'O71.2': 'Postpartum inversion of uterus', 'M08.439': 'Pauciarticular juvenile rheumatoid arthritis, unspecified wrist', 'M01.X72': 'Direct infection of left ankle and foot in infectious and parasitic diseases classified elsewhere', 'H95.3': 'Accidental puncture and laceration of ear and mastoid process during a procedure', 'C74.92': 'Malignant neoplasm of unspecified part of left adrenal gland', 'G00': 'Bacterial meningitis, not elsewhere classified', 'M19.011': 'Primary osteoarthritis, right 
shoulder', 'G72.49': 'Other inflammatory and immune myopathies, not elsewhere classified', 'Z68.34': 'Body mass index (BMI) 34.0-34.9, adult', 'V86.64': 'Passenger of military vehicle injured in nontraffic accident', 'L20.9': 'Atopic dermatitis, unspecified', 'S65.51': 'Laceration of blood vessel of other and unspecified finger', 'B67.1': 'Echinococcus granulosus infection of lung', 'S08.81': 'Traumatic amputation of nose', 'Z36.5': 'Encounter for antenatal screening for isoimmunization', 'S59.22': 'Salter-Harris Type II physeal fracture of lower end of radius', 'M66.359': 'Spontaneous rupture of flexor tendons, unspecified thigh', 'I69.919': 'Unspecified symptoms and signs involving cognitive functions following unspecified cerebrovascular disease', 'I25.700': 'Atherosclerosis of coronary artery bypass graft(s), unspecified, with unstable angina pectoris', 'V24.0': 'Motorcycle driver injured in collision with heavy transport vehicle or bus in nontraffic accident', 'S53.025': 'Posterior dislocation of left radial head', 'Q72.819': 'Congenital shortening of unspecified lower limb', 'G44.82': 'Headache associated with sexual activity', 'M93.2': 'Osteochondritis dissecans', 'V44.6': 'Car passenger injured in collision with heavy transport vehicle or bus in traffic accident', 'O90.89': 'Other complications of the puerperium, not elsewhere classified', 'T83.518': 'Infection and inflammatory reaction due to other urinary catheter', 'Z02.9': 'Encounter for administrative examinations, unspecified', 'S55.091': 'Other specified injury of ulnar artery at forearm level, right arm'}
Each character of the string must be replaced by a Hippocrates-code chosen at random from all the Hippocrates-codes that encode that character. A Hippocrates-code consists of an ICD code whose description contains the character, followed by the index of the character in that description.
So this is the kind of answer that I am supposed to get:
A66.0 M62.51.29 V44.6.68 H95.3.70 M08.06.26 S51.031.39 V92.13.17 V95.49.25 P07.32.46 C11.0.44 V04.45 E13.41.30 G21.8.5 R00.0.4 V52.2.54 B67.1.38 V24.0.43 M01.X72.10 C74.92.35 G72.49.35 Z68.41.24
And this is the answer that I got:
F44.6.4 S78.922.3 W36.1.17 S93.121.2 E10.32.39 A00.1.12 S90.464.3 T37.1X.9 T43.2.17 W24.0.3 Q60.3.5 V59.9.14 S66.911.5 W93.42 V14.1.34 Y92.139.14 T21.06.12 T65.89.6 Q95.3.4 S85.161.16 S93.121.7 T37.1X.18 V49.60.23 T37.1X5.7 F98.29.16 J10.89.14
To get that, I wrote code like this:
import re
import random
class Hippocrates:
def __init__(self, code):
self.code = code
def description(self, x):
line_list = []
split_point = []
k = []
v = []
with open(self.code) as f:
for line in f:
for i in line:
if i == " ":
split_point.append(line.find(i))
with open(self.code) as f:
for line in f:
line_list.append(line.rstrip())
for i in line_list:
a = i.split(" ", 1)
k.append(a[0])
v.append(a[1])
d = dict(zip(k, v))
for key, value in d.items():
if x == key:
return d[key]
else:
raise ValueError('invalid ICD-code')
def character(self, numb):
line_list = []
split_point = []
k = []
v = []
with open(self.code) as f:
for line in f:
for i in line:
if i == " ":
split_point.append(line.find(i))
with open(self.code) as f:
for line in f:
line_list.append(line.rstrip())
for i in line_list:
a = i.split(" ", 1)
k.append(a[0])
v.append(a[1])
d = dict(zip(k, v))
rev = numb[::-1]
revs = rev.split('.',1)
r1 =(revs[1][::-1])
r2 = (revs[0][::-1])
for key, value in d.items():
if r1 == key:
answer = d[key]
result = answer[int(r2)]
return result
else:
raise ValueError('invalid Hippocrates-code')
def codes(self, char):
line_list = []
split_point = []
k = []
v = []
r_v = []
code_result = []
des_result = []
des_result2 = []
location = []
final = []
with open(self.code) as f:
for line in f:
for i in line:
if i == " ":
split_point.append(line.find(i))
with open(self.code) as f:
for line in f:
line_list.append(line.rstrip())
for i in line_list:
a = i.split(" ", 1)
k.append(a[0])
v.append(a[1])
d = dict(zip(k, v))
for i in v:
for x in i:
if x == char:
r_v.append(i)
for key, value in d.items():
for i in r_v:
if i == value:
code_result.append(key)
for key in d.keys():
for i in code_result:
if i == key:
des_result.append(d[i])
for i in des_result:
if i not in des_result2:
des_result2.append(i)
for i in des_result2:
regex = re.escape(char)
a = [m.start() for m in re.finditer(regex,i)]
location.append(a)
location = (sum(location,[]))
for i in range(len(code_result)):
answer = (str(code_result[i]) +'.'+ str(location[i]))
final.append(answer)
return (set(final))
def encode(self, plaintxt):
line_list = []
split_point = []
#key of dictionary
k = []
#value of dictionary
v = []
#description that contain character with index
r = []
#list of possible choice
t = []
#randomly choosen result from t
li_di = []
#descriptoin
des = []
#index of char in description
index_char = []
#answer to print
resul = []
dictlist = []
answers = []
with open(self.code) as f:
for line in f:
for i in line:
if i == " ":
split_point.append(line.find(i))
with open(self.code) as f:
for line in f:
line_list.append(line.rstrip())
for i in line_list:
a = i.split(" ", 1)
k.append(a[0])
v.append(a[1])
d = dict(zip(k, v))
print(d)
for key, value in d.items():
for i in plaintxt:
if i in value:
answer = d[key] +':'+ str(d[key].index(i))
r.append(answer)
print(r)
a = len(plaintxt)
b=0
for i in range(len(r)):
t.append(r[b::a])
b+=1
if b == len(plaintxt):
break
for i in t:
li_di.append(random.choice(i))
for i in li_di:
sep = i.split(":", 1)
des.append(sep[0])
index_char.append(sep[1])
print(index_char)
for i in des:
for key, value in d.items():
if i == value:
resul.append(key)
print(resul)
for i in range(len(resul)):
answers.append(resul[i]+'.'+index_char[i]+'')
return(" ".join(answers))
The codes that represent the characters of given_string should be in the same order as the characters of the original string, but I messed that up. How can I fix this?
This should work for your encode function:
def encode(self, plaintxt):
    code_map = {}
    codes = []
    with open(self.code) as f:
        for line in f:
            line = line.rstrip().split(' ', 1)
            code_map[line[0]] = line[1]
    for ch in plaintxt:
        matches = []
        for key, value in code_map.items():
            pos = -1
            while True:
                pos = value.find(ch, pos + 1)
                if pos != -1:
                    matches.append((key, pos))
                else:
                    break
        if not matches:
            raise ValueError(f'Character {ch} cannot be encoded as there are no matches')
        code_tuple = random.choice(matches)
        code, idx = code_tuple
        codes.append(f'{code}.{idx}')
    return ' '.join(codes)
Edit: I updated this to make it more space-efficient, by getting rid of char_map and appending codes as it goes
First, it builds a dict that maps each code to its description. It then iterates through the characters of plaintxt in order and searches every description for matches (including multiple matches within a single description), collecting them in a list of (code, index) tuples. If a character has no matches, it raises a ValueError straight away. Otherwise it picks one tuple at random, formats it as code.index, and appends it to a list; at the end it joins this list to make your encoded string. Because the outer loop runs over the characters of plaintxt, the codes come out in the same order as the original string.
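One way to convince yourself that the order is preserved is a round-trip check: encode a string, decode it again, and compare. The sketch below is a stand-alone variant of the approach described above. It uses a tiny hard-coded code_map instead of the ICD file, and the free functions encode and decode are hypothetical helpers rather than the methods of the Hippocrates class.

```python
import random

# A tiny stand-in for the ICD dictionary (the real one is read from the file).
code_map = {
    'A66': 'Yaws',
    'Q03': 'Congenital hydrocephalus',
    'W58.01': 'Bitten by alligator',
}

def encode(plaintext, code_map):
    """Replace each character by a randomly chosen 'code.index' token."""
    codes = []
    for ch in plaintext:
        matches = [(code, idx)
                   for code, text in code_map.items()
                   for idx, c in enumerate(text) if c == ch]
        if not matches:
            raise ValueError(f'Character {ch!r} cannot be encoded')
        code, idx = random.choice(matches)
        codes.append(f'{code}.{idx}')
    return ' '.join(codes)

def decode(encoded, code_map):
    """Look each token up again; ICD codes contain dots, so split on the last one."""
    chars = []
    for token in encoded.split(' '):
        code, idx = token.rsplit('.', 1)
        chars.append(code_map[code][int(idx)])
    return ''.join(chars)

message = 'Yoga'
encoded = encode(message, code_map)
print(encoded)
print(decode(encoded, code_map))  # Yoga
```

The encoded string differs from run to run because of the random choice, but decoding it should always give back the original message.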
If memory is not a problem, I think you should build an index of the possible choices for each character in the dictionary. Here is some example code:
import random
def build_char_codes(d):
    result = {}
    for key, val in d.items():
        for i in range(len(val)):
            ch = val[i]
            if ch not in result:
                result[ch] = {key: [i]}
            else:
                result[ch][key] = result[ch].get(key, []) + [i]
    return result
def get_code(ch, char_codes):
    # pick from a list of the keys; newer Python versions reject dict views in random.sample()
    key = random.choice(list(char_codes[ch]))
    char_pos = random.choice(char_codes[ch][key])
    code = '{}.{}'.format(key, char_pos)
    return code
char_codes = build_char_codes(dictionary)
given_string = 'You are what you eat.'
codes = [get_code(ch, char_codes) for ch in given_string]
print(' '.join(codes))
Notes:
char_codes indexes all the possible choices for each character in the dictionary.
It first samples a key uniformly at random from the codes whose description contains the character, and then samples a position within that description uniformly at random. This is not the same as sampling uniformly among all the possible (code, position) choices for a character.
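If you do want every (code, position) pair for a character to be equally likely, one option is to store the pairs in a single flat list per character and choose from that list directly. This is only a sketch: build_flat_index and the two-entry sample dictionary are made up for the illustration.

```python
import random

def build_flat_index(d):
    """Map each character to a flat list of every (code, position) where it occurs."""
    index = {}
    for code, text in d.items():
        for pos, ch in enumerate(text):
            index.setdefault(ch, []).append((code, pos))
    return index

sample = {'A66': 'Yaws', 'N48.5': 'Ulcer of penis'}
flat = build_flat_index(sample)

print(flat['s'])  # [('A66', 3), ('N48.5', 13)]

# random.choice over the flat list gives each pair the same probability.
code, pos = random.choice(flat['s'])
print(f'{code}.{pos}')
```

The trade-off is memory: the flat list holds one tuple per character occurrence in the whole dictionary.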
In preparation for the transformation, you could create a dictionary with each letter in the ICD description mapping to a list of codes that contain it at various indexes.
Then, the transformation process would simply be a matter of picking one of the code.index from the entry in the dictionary for each letter in the given string:
Preparation:
with open(fileName,'r') as f:
    # skip blank lines (e.g. after a trailing newline), which would break the unpacking below
    icd = [line.split(" ",1) for line in f.read().split("\n") if " " in line]
icdLetters = dict() # list of ICD codes with index for each possible letter
for code,description in icd:
    for i,letter in enumerate(description):
        icdLetters.setdefault(letter,[]).append(f"{code}.{i}")
Transformation:
import random
given_string = 'You are what you eat.'
result = [ random.choice(icdLetters.get(c,["-"])) for c in given_string ]
output:
print(result)
['A66.0', 'T80.22.35', 'S53.136.34', 'C40.90.33', 'S53.136.43', 'Z96.621.12', 'B57.30.24', 'H59.121.55', 'V14.1.43', 'S93.121.47', 'H59.121.9', 'V04.92.17', 'T80.22.80', 'O16.1.22', 'T25.61.10', 'S53.136.34', 'F44.6.32', 'M67.232.29', 'M89.771.34', 'S93.121.7', 'Z68.36.29']
If you want to save some memory, your dictionary could store indexes in the main list of icd codes and descriptions instead of the formatted values:
with open(fileName,'r') as f:
    # skip blank lines (e.g. after a trailing newline), which would break the unpacking below
    icd = [line.split(" ",1) for line in f.read().split("\n") if " " in line]
icdLetters = dict()
for codeIndex,(code,description) in enumerate(icd):
    for letterIndex,letter in enumerate(description):
        icdLetters.setdefault(letter,[]).append((codeIndex,letterIndex))
import random
def letterToCode(letter):
    if letter not in icdLetters: return "-"
    codeIndex,letterIndex = random.choice(icdLetters[letter])
    return f"{icd[codeIndex][0]}.{letterIndex}"
given_string = 'You are what you eat.'
result = [ letterToCode(c) for c in given_string ]
Dataset:
> df
Id Clean_Data
1918916 Luxury Apartments consisting 11 towers Well equipped gymnasium Swimming Pool Toddler Pool Health Club Steam Room Sauna Jacuzzi Pool Table Chess Billiards room Carom Table Tennis indoor games
1495638 near medavakkam junction calm area near global hospital
1050651 No Pre Emi No Booking Amount No Floor Rise Charges No Processing Fee HLPROJECT HIGHLIGHTS
Below is the code which is successfully returning the matching words in ngrams from the list of values in Category.py
df['one_word_tokenized_text'] =df["Clean_Data"].str.split()
df['bigram'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 2)))
df['trigram'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 3)))
df['four_words'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 4)))
token=pd.Series(df["one_word_tokenized_text"])
Lid=pd.Series(df["Id"])
matches= token.apply(lambda x: pd.Series(x).str.extractall("|".join(["({})".format(cat) for cat in Categories.HealthCare])))
match_list= [[m for m in match.values.ravel() if isinstance(m, str)] for match in matches]
match_df = pd.DataFrame({"ID":Lid,"jc1":match_list})
def match_word(feature, row):
    categories = []
    for bigram in row.bigram:
        joined = ' '.join(bigram)
        if joined in feature:
            categories.append(joined)
    for trigram in row.trigram:
        joined = ' '.join(trigram)
        if joined in feature:
            categories.append(joined)
    for fourwords in row.four_words:
        joined = ' '.join(fourwords)
        if joined in feature:
            categories.append(joined)
    return categories
match_df['Health1'] = df.apply(partial(match_word, HealthCare), axis=1)
match_df['HealthCare'] = match_df[match_df.columns[[1,2]]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)
Category.py
category = [('steam room','IN','HealthCare'),
('sauna','IN','HealthCare'),
('Jacuzzi','IN','HealthCare'),
('Aerobics','IN','HealthCare'),
('yoga room','IN','HealthCare'),]
HealthCare= [e1 for (e1, rel, e2) in category if e2=='HealthCare']
Output:
ID HealthCare
1918916 Jacuzzi
1495638
1050651 Aerobics, Jacuzzi, yoga room
Here, if I write the features in the category list in exactly the same letter case as in the dataset, the code identifies them and returns the value; otherwise it doesn't.
So I want my code to be case-insensitive, so that it also picks up "Steam Room" and "Sauna" under the health category. I tried the .lower() function, but I am not sure how to apply it.
edit 2: only category.py is updated
Category.py
category = [('steam room','IN','HealthCare'),
('sauna','IN','HealthCare'),
('jacuzzi','IN','HealthCare'),
('aerobics','IN','HealthCare'),
('Yoga room','IN','HealthCare'),
('booking','IN','HealthCare'),
]
category1 = [value[0].capitalize() for index, value in enumerate(category)]
category2 = [value[0].lower() for index, value in enumerate(category)]
test = []
test2 =[]
for index, value in enumerate(category1):
    test.append((value, category[index][1],category[index][2]))
for index, value in enumerate(category2):
    test2.append((value, category[index][1],category[index][2]))
category = category + test + test2
HealthCare = [e1 for (e1, rel, e2) in category if e2=='HealthCare']
Your unaltered dataset
import pandas as pd
from nltk import ngrams, word_tokenize
import Categories
from Categories import *
from functools import partial
data = {'Clean_Data':['Luxury Apartments consisting 11 towers Well equipped gymnasium Swimming Pool Toddler Pool Health Club Steam Room Sauna Jacuzzi Pool Table Chess Billiards room Carom Table Tennis indoor games',
'near medavakkam junction calm area near global hospital',
'No Pre Emi No Booking Amount No Floor Rise Charges No Processing Fee HLPROJECT HIGHLIGHTS '],
'Id' : [1918916, 1495638,1050651]}
df = pd.DataFrame(data)
df['one_word_tokenized_text'] =df["Clean_Data"].str.split()
df['bigram'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 2)))
df['trigram'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 3)))
df['four_words'] = df['Clean_Data'].apply(lambda row: list(ngrams(word_tokenize(row), 4)))
token=pd.Series(df["one_word_tokenized_text"])
Lid=pd.Series(df["Id"])
matches= token.apply(lambda x: pd.Series(x).str.extractall("|".join(["({})".format(cat) for cat in Categories.HealthCare])))
match_list= [[m for m in match.values.ravel() if isinstance(m, str)] for match in matches]
match_df = pd.DataFrame({"ID":Lid,"jc1":match_list})
def match_word(feature, row):
    categories = []
    for bigram in row.bigram:
        joined = ' '.join(bigram)
        if joined in feature:
            categories.append(joined)
    for trigram in row.trigram:
        joined = ' '.join(trigram)
        if joined in feature:
            categories.append(joined)
    for fourwords in row.four_words:
        joined = ' '.join(fourwords)
        if joined in feature:
            categories.append(joined)
    return categories
match_df['Health1'] = df.apply(partial(match_word, HealthCare), axis=1)
match_df['HealthCare'] = match_df[match_df.columns[[1,2]]].apply(lambda x: ','.join(x.dropna().astype(str)),axis=1)
Output
print(match_df)
+--------+----------------+-------------+------------------------------------+
|ID      |jc1             |Health1      |HealthCare                          |
+--------+----------------+-------------+------------------------------------+
|1918916 |[sauna, jacuzzi]|             |['sauna', 'jacuzzi'],['steam room'] |
+--------+----------------+-------------+------------------------------------+
|1495638 |                |             |                                    |
+--------+----------------+-------------+------------------------------------+
|1050651 |[Booking]       |             |['Booking'],[]                      |
+--------+----------------+-------------+------------------------------------+
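Instead of listing every case variant of each category, another option is to lower-case both the text and the category before comparing them. The sketch below shows the idea in plain Python, without pandas or nltk. The find_features helper is hypothetical and compares whole-word sequences, which plays the same role as the n-gram columns above.

```python
def find_features(text, features):
    """Return the features that occur in text as whole-word sequences, ignoring case."""
    words = text.lower().split()
    found = []
    for feature in features:
        target = feature.lower().split()
        n = len(target)
        # Slide a window of n words over the text and compare it with the feature.
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            found.append(feature)
    return found

health_care = ['steam room', 'sauna', 'Jacuzzi', 'Aerobics', 'yoga room']
text = 'Health Club Steam Room Sauna Jacuzzi Pool Table'

print(find_features(text, health_care))  # ['steam room', 'sauna', 'Jacuzzi']
```

In the pandas version, the same effect could be had by lower-casing Clean_Data and the HealthCare list once, before tokenizing.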
How can I put the values of all the Rows into cells?
rows = db().select(i.INV_ITEMCODE, n.INV_NAME, orderby=i.INV_ITEMCODE, join=n.on(i.POS_TASKCODE == n.POS_TASKCODE))
for r in rows:
    code = str(r.db_i_item.INV_ITEMCODE)
    desc = str(r.db_i_name.INV_NAME)
    row = [dict(rows=rows)]
    cell = [code, desc]
    row = [dict(cell=cell, id=str(+1))]
    records = []
    total = []
    result = None
    result = dict(records=str(total), total='1', row=row , page='1') #records should get the total cell
    return result
The result contains only ONE cell value:
dict: {'records': '[]', 'total': '1', 'page': '1', 'row': [{'cell': ['LUBS001', 'Hav. Fully Synthetic 1L'], 'id': '1'}]}
but the Rows object returned by the query contains all of these:
Rows: db_i_item.INV_ITEMCODE,db_i_name.INV_NAME
LUBS001,Hav. Fully Synthetic 1L
LUBS002,Hav. Formula 1L
LUBS003,Hav. SF 1L
LUBS004,Hav. Plus 2T 200ML
LUBS005,Havoline Plus 2T 1L
LUBS006,Havoline Super 4T 1L
LUBS007,Havoline EZY 4T 1L
LUBS008,Delo Sports 1L
LUBS009,Delo Gold Multigrade 1L
LUBS010,Delo Gold Monograde 1L
LUBS011,Delo Silver 1L
LUBS012,Super Diesel 1L
LUBS013,Brake Fluid 250ML
LUBS014,Brake Fluid 500ML
LUBS015,Brake Fluid 1L
LUBS016,Texamatic ATF 1L
LUBS020,Coolant
LUBS21,Delo
PET001,DIESEL
PET002,SILVER
PET003,GOLD
PET004,REGULAR
PET005,KEROSENE
got it :D
items = db(q1 & q2 & search).select(i.INV_ITEMCODE, n.INV_NAME, m.INV_KIND, p.INV_PRICE, m.INV_DRTABLE, p.INV_PRICECODE, p.INV_ITEMCODEX, orderby=o)
ri = 0
rows = []
for ri, r in enumerate(items):
    if r.db_i_matrix.INV_KIND == 'W':
        kind = 'Wet'
    else:
        kind = 'Dry'
    cell = [ str(ri + 1), str(r.db_i_item.INV_ITEMCODE), str(r.db_i_name.INV_NAME), str(kind), str(r.db_i_price.INV_PRICE),
             str(r.db_i_matrix.INV_DRTABLE), str(r.db_i_price.INV_PRICECODE), str(r.db_i_price.INV_ITEMCODEX)]
    records = ri + 1
    rows += [dict(id=str(ri + 1), cell=cell)]
ikind = dict(records=records, totals='1', rows=rows)
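The key change is that the list of rows is created once before the loop and grows inside it, and the result is built only after the loop has finished. Here is the same pattern as a plain-Python sketch, with tuples standing in for the database rows. The build_grid function and the two sample items are illustrative only.

```python
def build_grid(items):
    """Collect one cell per item, then build the result after the loop."""
    rows = []
    for number, (code, name) in enumerate(items, start=1):
        rows.append(dict(id=str(number), cell=[code, name]))
    return dict(records=len(rows), total='1', page='1', rows=rows)

items = [
    ('LUBS001', 'Hav. Fully Synthetic 1L'),
    ('LUBS002', 'Hav. Formula 1L'),
]
grid = build_grid(items)

print(grid['records'])  # 2
print(grid['rows'][1])  # {'id': '2', 'cell': ['LUBS002', 'Hav. Formula 1L']}
```

Taking the record count from len(rows) also gives a sensible 0 when the query returns nothing.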