String.strip is skipping a character on the second for loop iteration - python

After the second iteration of the loop over my array, the code skips a character.
I think the problem is here:
for word in range(int(len(ShortArray))):
    localString = LongArray[word]
    #print(word)
    if localString[:2] == ShortArray[word]:
        print(LongArray[word])
        print(word)
Here is the full code:
kleuren = ["Rood","Geel","Groen","Blauw","Wit","Paars","Oranje","Zwart"]
KleurenShort = []

def splitArray(string):
    for lenght in range(int(len(string) / 2)):
        KleurenShort.append(string[:2])
        print(KleurenShort)
        string = string.strip(string[:2])
    return KleurenShort

def tekst_naar_kleur(string):
    return 0

def matchFirst2Letters(ShortArray,LongArray):
    for word in range(int(len(ShortArray))):
        localString = LongArray[word]
        #print(word)
        if localString[:2] == ShortArray[word]:
            print(LongArray[word])
            print(word)

matchFirst2Letters(splitArray("RoGeGrBl"),kleuren)
The outcome is:
['Ro']
['Ro', 'Ge']
['Ro', 'Ge', 'rB']
['Ro', 'Ge', 'rB', 'l']
when it should be:
['Ro']
['Ro', 'Ge']
['Ro', 'Ge', 'Gr']
['Ro', 'Ge', 'Gr', 'Bl']

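The problem is the use of the string.strip() method. It does not remove a prefix; it treats its argument as a set of characters and strips any of them from both ends of the string.
'aaaaaabcdb'.strip('ab')
gives 'cd' because every leading and trailing 'a' and 'b' is removed, not just the first two characters. In your loop, "GeGrBl".strip("Ge") therefore returns 'rBl', which is where the 'rB' comes from. You can simply drop the first two letters of the input string by slicing:
'abcde'[2:] will give 'cde'.
Implemented in your code, the corrected version is: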
kleuren = ["Rood","Geel","Groen","Blauw","Wit","Paars","Oranje","Zwart"]
KleurenShort = []

def splitArray(string):
    for lenght in range(int(len(string) / 2)):
        KleurenShort.append(string[:2])
        print(KleurenShort)
        string = string[2:]
    return KleurenShort

def tekst_naar_kleur(string):
    return 0

def matchFirst2Letters(ShortArray,LongArray):
    for word in range(int(len(ShortArray))):
        localString = LongArray[word]
        #print(word)
        if localString[:2] == ShortArray[word]:
            print(LongArray[word])
            print(word)

matchFirst2Letters(splitArray("RoGeGrBl"),kleuren)
which outputs
['Ro']
['Ro', 'Ge']
['Ro', 'Ge', 'Gr']
['Ro', 'Ge', 'Gr', 'Bl']
Rood
0
Geel
1
Groen
2
Blauw
3
With the slicing approach suggested in the comments, your splitArray function simply becomes:
def splitArray(string):
    return [string[i:i+2] for i in range(0, len(string), 2)]
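To make the difference concrete, here is a small sketch comparing strip() with slicing on one of the intermediate strings from the question. The variable names are only illustrative, and str.removeprefix assumes Python 3.9 or newer.

```python
remaining = "GeGrBl"

# strip() treats "Ge" as the character set {'G', 'e'} and trims both ends,
# so the leading 'G', 'e', 'G' are all removed
stripped = remaining.strip("Ge")

# slicing drops exactly the first two characters
sliced = remaining[2:]

# removeprefix() (Python 3.9+) removes the literal prefix "Ge" once
without_prefix = remaining.removeprefix("Ge")

# chunking the whole input in steps of two, as in the one-line splitArray
chunks = ["RoGeGrBl"[i:i + 2] for i in range(0, len("RoGeGrBl"), 2)]

print(stripped, sliced, without_prefix, chunks)
```

Slicing or removeprefix() keeps the rest of the string intact, which is what the loop in the question relies on.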

Related

I am working on a semantic word search in Python using cosine similarity. It gives me "Process finished with exit code 0" without showing any output?

Using cosine similarity, I am trying to compare words semantically. I have posted the code below for reference. In the code, I have added the stopwords, which are the words I don't want to be found during the search. I open the text file against which the reference words (also given below) are to be compared. I also set a minimum length of 3, meaning any word shorter than three characters is treated as a stop word. When I run the code, it reports "Process finished with exit code 0" and I get no output. I would really appreciate some help. Thank you in advance.
import math
import re

stopwords = set(["is", "a", "about", "above", "above", "across", "after", "afterwards", "again", "against", "all", "almost",
                 "alonll", "with", "within", "without", "would", "yet", "you", "your",
                 "yours", "yourself", "yourselves", "the"])

with open("ref.txt", "r") as f:
    lines = f.readlines()

def build_frequency_vector(content: str) -> dict[str, int]:
    vector = {}
    word_seq = re.split("[ ,;.!?]+", content)
    for words in word_seq:
        if words not in stopwords and len(words) >= 3:
            words = words.lower()
            if words in vector:
                vector[words] = vector[words] + 1
            else:
                vector[words] = 1
    return vector

refWords = ['spain', 'anchovy',
            'france', 'internet', 'china', 'mexico', 'fish', 'industry', 'agriculture', 'fishery', 'tuna', 'transport',
            'italy', 'web', 'communication', 'labour', 'fish', 'cod']
refWordsDict = {}
for refWord in refWords:
    refWordsDict[refWord] = {}
    for line in lines:
        line = line.lower()
        temp = build_frequency_vector(line)
        if refWord not in temp:
            continue
        for word in temp:
            if word not in stopwords and len(word) >= 3 and word != refWord:
                refWordsDict[refWord][word] = refWordsDict[refWord].get(word, 0) + temp[word]

def product(v1: dict[str, int], v2: dict[str, int]) -> float:
    sp = 0.0
    for word in v1:
        sp += v1[word] * v2.get(word, 0)
    return sp

def cosineSimilarity(s1: str, s2: str) -> float:
    d1 = build_frequency_vector(word1)
    d2 = build_frequency_vector(word2)
    return product(d1, d2) / (math.sqrt(product(d1, d1) * product(d2, d2)))

bests = {}
for word1 in refWords:
    bestSimilarity = 0
    for word2 in refWords:
        if word1 != word2:
            similarity: float = cosineSimilarity(refWordsDict[word1], refWordsDict[word2])
            if similarity > bestSimilarity:
                bestSimilarity = similarity
                bests[word1] = (word2, bestSimilarity)
for item in bests:
    print(item, "->", bests[item])
I am very new to python and not able to find a solution.
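One possible reading of the code above, offered as an assumption rather than a confirmed diagnosis: cosineSimilarity ignores its parameters s1 and s2 and rebuilds vectors from the global loop variables word1 and word2. Each vector then holds a single word, every similarity comes out as 0, and bests stays empty, so nothing is printed. A minimal sketch of a cosine similarity that works directly on the frequency dictionaries (the names dot and cosine_similarity are illustrative, not from the original):

```python
import math

def dot(v1, v2):
    # scalar product of two sparse word-count vectors stored as dicts
    return sum(count * v2.get(word, 0) for word, count in v1.items())

def cosine_similarity(v1, v2):
    # operate on the dictionaries that were passed in, not on global names
    norm = math.sqrt(dot(v1, v1) * dot(v2, v2))
    if norm == 0:
        # at least one vector is empty, so the similarity is undefined; report 0
        return 0.0
    return dot(v1, v2) / norm

# hypothetical co-occurrence counts for two reference words
tuna = {"fish": 2, "sea": 1}
cod = {"fish": 1, "boat": 1}
print(cosine_similarity(tuna, cod))
```

The zero-norm guard also avoids a ZeroDivisionError when a reference word never appears in the file.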

Split each line in a file based on delimiters

This is the sample data in a file. I want to split each line in the file and add it to a dataframe. Some parents have more than one child; whenever that happens, a new set of columns (child2 name and DOB) has to be added.
(P322) Rashmika Chadda 15/05/1995 – Rashmi C 12/02/2024
(P324) Shiva Bhupati 01/01/1994 – Vinitha B 04/08/2024
(P356) Karthikeyan chandrashekar 22/02/1991 – Kanishka P 10/03/2014
(P366) Kalyani Manoj 23/01/1975 - Vandana M 15/05/1995 - Chandana M 18/11/1998
This is the code I have tried but this splits only by taking "-" into consideration
with open("text.txt") as read_file:
    file_contents = read_file.readlines()
content_list = []
temp = []
for each_line in file_contents:
    temp = each_line.replace("–", " ").split()
    content_list.append(temp)
print(content_list)
Current output:
[['(P322)', 'Rashmika', 'Chadda', '15/05/1995', 'Rashmi', 'Chadda', 'Teega', '12/02/2024'], ['(P324)', 'Shiva', 'Bhupati', '01/01/1994', 'Vinitha', 'B', 'Sahu', '04/08/2024'], ['(P356)', 'Karthikeyan', 'chandrashekar', '22/02/1991', 'Kanishka', 'P', '10/03/2014'], ['(P366)', 'Kalyani', 'Manoj', '23/01/1975', '-', 'Vandana', 'M', '15/05/1995', '-', 'Chandana', 'M', '18/11/1998']]
Final output should be like below
Code | Parent_Name | DOB | Child1_Name | DOB | Child2_Name | DOB
P322 | Rashmika Chadda | 15/05/1995 | Rashmi C | 12/02/2024 | |
P324 | Shiva Bhupati | 01/01/1994 | Vinitha B | 04/08/2024 | |
P356 | Karthikeyan chandrashekar | 22/02/1991 | Kanishka P | 10/03/2014 | |
P366 | Kalyani Manoj | 23/01/1975 | Vandana M | 15/05/1995 | Chandana M | 18/11/1998
I'm not sure if you want it as a list or something else.
To get lists:
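import re

# read the lines of the file (file name taken from the question)
with open("text.txt", encoding="utf-8") as read_file:
    text = read_file.readlines()

result = []
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split on the dash between people (en dash or hyphen, surrounded by spaces)
    t = re.split(r"\s+[–-]\s+", t)
    # reconstruct
    for i, person in enumerate(t):
        person = person.split(" ")
        # the first chunk starts with the code: take it off and start the row
        if i == 0:
            res = [person.pop(0)]
        # assumes a two-word name followed by the DOB
        res.extend([" ".join(person[:2]), person[2]])
    result.append(res)
print(result)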
Which would give the below output:
[['P322', 'Rashmika Chadda', '15/05/1995', 'Rashmi C', '12/02/2024'], ['P324', 'Shiva Bhupati', '01/01/1994', 'Vinitha B', '04/08/2024'], ['P356', 'Karthikeyan chandrashekar', '22/02/1991', 'Kanishka P', '10/03/2014'], ['P366', 'Kalyani Manoj', '23/01/1975', 'Vandana M', '15/05/1995', 'Chandana M', '18/11/1998']]
You can organise the data a bit more using a dictionary:
import re
from pprint import pprint

# text is the list of lines read from the file, e.g. via readlines()
result = {}
for t in text:
    # remove the \n at the end of each line
    t = t.strip()
    # remove the parentheses you don't want
    t = t.replace("(", "").replace(")", "")
    # split on the dash between people (en dash or hyphen, surrounded by spaces)
    t = re.split(r"\s+[–-]\s+", t)
    for i, person in enumerate(t):
        # split name
        person = person.split(" ")
        if i == 0:
            # the first chunk is the parent; take the code off first
            code = person.pop(0)
            result[code] = {"parent_name": " ".join(person[:2]), "parent_DOB": person[2], "children": []}
        else:
            result[code]['children'].append({f"child{i}_name": " ".join(person[:2]), f"child{i}_DOB": person[2]})
pprint(result)
Which would give this output:
{'P322': {'children': [{'child1_DOB': '12/02/2024',
                        'child1_name': 'Rashmi C'}],
          'parent_DOB': '15/05/1995',
          'parent_name': 'Rashmika Chadda'},
 'P324': {'children': [{'child1_DOB': '04/08/2024',
                        'child1_name': 'Vinitha B'}],
          'parent_DOB': '01/01/1994',
          'parent_name': 'Shiva Bhupati'},
 'P356': {'children': [{'child1_DOB': '10/03/2014',
                        'child1_name': 'Kanishka P'}],
          'parent_DOB': '22/02/1991',
          'parent_name': 'Karthikeyan chandrashekar'},
 'P366': {'children': [{'child1_DOB': '15/05/1995',
                        'child1_name': 'Vandana M'},
                       {'child2_DOB': '18/11/1998', 'child2_name': 'Chandana M'}],
          'parent_DOB': '23/01/1975',
          'parent_name': 'Kalyani Manoj'}}
In the end, to get an actual table you would need to use pandas, but that requires fixing the maximum number of children in advance so that the empty cells can be padded.
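The padding step mentioned above could look roughly like this. It is a sketch using only the standard library; pad_rows and make_header are hypothetical helper names, the column labels are assumptions based on the table in the question, and the two rows are copied from the list output so the block runs on its own.

```python
# two rows in the shape produced by the list-based snippet
rows = [
    ["P322", "Rashmika Chadda", "15/05/1995", "Rashmi C", "12/02/2024"],
    ["P366", "Kalyani Manoj", "23/01/1975", "Vandana M", "15/05/1995",
     "Chandana M", "18/11/1998"],
]

def pad_rows(rows, fill=""):
    # extend every row to the length of the longest one
    width = max(len(row) for row in rows)
    return [row + [fill] * (width - len(row)) for row in rows]

def make_header(width):
    # three fixed columns, then a name/DOB pair per child
    header = ["Code", "Parent_Name", "Parent_DOB"]
    for k in range(1, (width - 3) // 2 + 1):
        header += [f"Child{k}_Name", f"Child{k}_DOB"]
    return header

padded = pad_rows(rows)
header = make_header(len(padded[0]))
print(header)
for row in padded:
    print(row)
```

If pandas is installed, pandas.DataFrame(padded, columns=header) should then build the table with one column pair per child.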

can't return a list with desired order from a python function

I want to write code that adds the items of my order to burgerlist in the following order: the first and last elements of burgerlist should be bread, the second and second-to-last elements should be mayonnaise (if it is among the arguments passed to the function), then beef / chicken, then vegetables.
Please help me understand what to change here.
def my_odrer(*g):
    ingredients = [['long_bread', 'circle_bread'], ['mayonnaise', 'ketchup'], ['beef', 'chicken'],
                   ['cucumber', 'tomato', 'onion']]
    burgerlist = []
    for i in g:
        if i in ingredients[0]:
            burgerlist.insert(0, i)
        elif i in ingredients[1]:
            burgerlist.insert(1, i)
        elif i in ingredients[2]:
            burgerlist.append(i)
        elif i in ingredients[3]:
            burgerlist.append(i)
    if burgerlist[1] == 'mayonnaise':
        burgerlist.append(burgerlist[1])
    burgerlist.append(burgerlist[0])
    return burgerlist

print(my_odrer('circle_bread', 'beef', 'tomato', 'mayonnaise', 'ketchup'))
the output is: ['circle_bread', 'ketchup', 'mayonnaise', 'beef', 'tomato', 'circle_bread']
but I want to get: ['circle_bread', 'mayonnaise', 'ketchup', 'beef', 'tomato','mayonnaise', 'circle_bread']
Create 3 lists containing the ingredients that should be at the beginning, middle, and end. Then concatenate them to produce the final result.
def my_odrer(*g):
    breads = {'long_bread', 'circle_bread'}
    condiments = {'ketchup'}  # mayonnaise not included, since it's handled specially
    meats = {'beef', 'chicken'}
    vegetables = {'cucumber', 'tomato', 'onion'}
    beginning = []
    middle = []
    end = []
    for item in g:
        if item in breads:
            beginning.append(item)
            end.append(item)
    if "mayonnaise" in g:
        beginning.append("mayonnaise")
        end.insert(-1, "mayonnaise")
    for item in g:
        if item in condiments:
            middle.append(item)
    for item in g:
        if item in meats:
            middle.append(item)
    for item in g:
        if item in vegetables:
            middle.append(item)
    return beginning + middle + end
It works after I added an additional condition in your final if statement.
def my_odrer(*g):
    ingredients = [['long_bread', 'circle_bread'], ['mayonnaise', 'ketchup'], ['beef', 'chicken'],
                   ['cucumber', 'tomato', 'onion']]
    burgerlist = []
    for i in g:
        if i in ingredients[0]:
            burgerlist.insert(0, i)
        elif i in ingredients[1]:
            burgerlist.insert(1, i)
        elif i in ingredients[2]:
            burgerlist.append(i)
        elif i in ingredients[3]:
            burgerlist.append(i)
    if burgerlist[1] == 'mayonnaise':
        burgerlist.append(burgerlist[1])
    elif burgerlist[2] == 'mayonnaise':
        burgerlist[1], burgerlist[2] = burgerlist[2], burgerlist[1]
        burgerlist.append(burgerlist[1])
    burgerlist.append(burgerlist[0])
    return burgerlist
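Both answers above fix the order with explicit branches. As a more compact variation on the same idea, here is a sketch that builds the filling by walking the categories in the required order; build_burger is a hypothetical name, and the ingredient groups are copied from the question.

```python
def build_burger(*items):
    breads = {"long_bread", "circle_bread"}
    # filling categories in the order they should appear
    filling_order = [
        {"ketchup"},
        {"beef", "chicken"},
        {"cucumber", "tomato", "onion"},
    ]
    bread = [item for item in items if item in breads]
    # mayonnaise sits right inside the bread on both sides, if it was ordered
    mayo = ["mayonnaise"] if "mayonnaise" in items else []
    filling = [item for group in filling_order for item in items if item in group]
    return bread + mayo + filling + mayo + bread[::-1]

print(build_burger("circle_bread", "beef", "tomato", "mayonnaise", "ketchup"))
```

Adding a new category then only means adding one more set to filling_order.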

Generate strings using translations of several characters, mapped to several others

I'm facing quite a tricky problem in my Python code. I looked around and was not able to find anyone with a similar problem.
I'd like to generate strings in which certain characters of the original are each replaced (translated) by several different alternatives.
What I'm looking to do is something like this:
text = "hi there"
translations = {"i":["b", "c"], "r":["e","f"]}
result = magicfunctionHere(text,translations)
print(result)
> [
    "hb there",
    "hc there",
    "hi theee",
    "hi thefe",
    "hb theee",
    "hb thefe",
    "hc theee",
    "hc thefe"
]
The result contains any combination of the original text with 'i' and 'r' replaced respectively by 'b' and 'c', and 'e' and 'f'.
I don't see how to do that, using itertools and functions like permutations, product etc...
I hope I'm clear enough, it is quite a specific problem !
Thank you for your help !
def magicfunction(ret, text, alphabet_location, translations):
    if len(alphabet_location) == 0:
        ret.append(text)
        return ret
    index = alphabet_location.pop()
    for w in translations[text[index]]:
        ret = magicfunction(ret, text[:index] + w + text[index + 1:], alphabet_location, translations)
    alphabet_location.append(index)
    return ret

def magicfunctionHere(text, translations):
    alphabet_location = []
    for key in translations.keys():
        alphabet_location.append(text.find(key))
        translations[key].append(key)
    ret = []
    ret = magicfunction(ret, text, alphabet_location, translations)
    ret.pop()
    return ret

text = "hi there"
translations = {"i":["b", "c"], "r":["e","f"]}
result = magicfunctionHere(text,translations)
print(result)
One crude way to go would be to use a nested loop construct in 2 steps (functions), as shown in the snippet below:
def rearrange_characters(str_text, dict_translations):
    tmp_result = []
    for key, value in dict_translations.items():
        if key in str_text:
            for replacer in value:
                str_temp = str_text.replace(key, replacer, 1)
                if str_temp not in tmp_result:
                    tmp_result.append(str_temp)
    return tmp_result

def get_rearranged_characters(str_text, dict_translations):
    lst_result = rearrange_characters(str_text, dict_translations)
    str_joined = ','.join(lst_result)
    for str_part in lst_result:
        str_joined = "{},{}".format(str_joined, ','.join(rearrange_characters(str_part, dict_translations)))
    return set(str_joined.split(sep=","))

text = "hi there"
translations = {"i": ["b", "c"], "r":["e","f"]}
result = get_rearranged_characters(text, translations)
print(result)
## YIELDS: {
    'hb theee',
    'hc thefe',
    'hc there',
    'hi thefe',
    'hb thefe',
    'hi theee',
    'hc theee',
    'hb there'
}
See also: https://eval.in/960803
Another equally convoluted approach would be to use a single function with nested loops like so:
def process_char_replacement(str_text, dict_translations):
    tmp_result = []
    for key, value in dict_translations.items():
        if key in str_text:
            for replacer in value:
                str_temp = str_text.replace(key, replacer, 1)
                if str_temp not in tmp_result:
                    tmp_result.append(str_temp)
    str_joined = ','.join(tmp_result)
    for str_part in tmp_result:
        tmp_result_2 = []
        for key, value in dict_translations.items():
            if key in str_part:
                for replacer in value:
                    str_temp = str_part.replace(key, replacer, 1)
                    if str_temp not in tmp_result_2:
                        tmp_result_2.append(str_temp)
        str_joined = "{},{}".format(str_joined, ','.join(tmp_result_2))
    return set(str_joined.split(sep=","))

text = "hi there"
translations = {"i": ["b", "c"], "r":["e","f"]}
result = process_char_replacement(text, translations)
print(result)
## YIELDS: {
    'hb theee',
    'hc thefe',
    'hc there',
    'hi thefe',
    'hb thefe',
    'hi theee',
    'hc theee',
    'hb there'
}
Refer to: https://eval.in/961602
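Since the question mentions itertools, here is a sketch of how itertools.product could cover the same ground. The function name expand_translations is hypothetical. Unlike the answers above, which replace only the first occurrence of each key, this version treats every occurrence of a key character independently; for the example text the results coincide because 'i' and 'r' each appear once.

```python
from itertools import product

def expand_translations(text, translations):
    # for each position, the options are the original character
    # plus any replacements listed for it
    choices = [[ch] + list(translations.get(ch, [])) for ch in text]
    # product() picks one option per position, covering every combination
    candidates = ("".join(combo) for combo in product(*choices))
    # leave out the unchanged original string
    return [candidate for candidate in candidates if candidate != text]

text = "hi there"
translations = {"i": ["b", "c"], "r": ["e", "f"]}
result = expand_translations(text, translations)
print(result)
```

The translations dictionary is left unmodified, and no recursion or string joining is needed.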

Python detect if string contains specific length substring of chars from specific set

So given a set of chars and length
s = set('abc')
l = 5
How can I ensure that a string doesn't contain substrings like
abcab
aaaaa
Length needs to be around 60 so I can't just generate all substrings.
You can iterate through each character of the string and keep a running count of the consecutive characters that are elements of s. The count resets whenever a character outside s appears; once it reaches l, a forbidden substring exists.
def hasSubstring(string, s, l):
    length = 0
    for c in string:
        if c in s:
            length += 1
        else:
            length = 0
        if length >= l:
            return True
    return False
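The same check can also be phrased as a regular expression: a character class built from the set, repeated l times. This is a sketch with a hypothetical function name, has_run; re.escape is applied to each character so that symbols such as ']' or '-' stay literal inside the class.

```python
import re

def has_run(text, chars, length):
    if not chars:
        # an empty set cannot form a run (and "[]" is not a valid class)
        return False
    char_class = "".join(re.escape(c) for c in sorted(chars))
    # e.g. "[abc]{5}" matches five consecutive characters from the set
    pattern = "[" + char_class + "]{" + str(length) + "}"
    return re.search(pattern, text) is not None

s = set("abc")
l = 5
print(has_run("xxabcabxx", s, l))
print(has_run("abcaxbcab", s, l))
```

Both versions scan the string once, so a length of around 60 is not a problem.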
What about using product and a list comprehension?
from itertools import product
s = set('abc')
l = 5
omit = ['abcab','aaaaa']
def sorter(s, l, omit):
    s = ''.join(list(s))
    unsrted = [''.join(it) for it in list(product(s, repeat=l))]
    filrted = [value for value in unsrted if value not in omit]  # just filter here based on the list omit
    return filrted
print(sorter(s, l, omit))
Output-
['aaaac', 'aaaab', 'aaaca', 'aaacc', 'aaacb', 'aaaba', 'aaabc', 'aaabb', 'aacaa', 'aacac', 'aacab', 'aacca', 'aaccc', 'aaccb', 'aacba', 'aacbc', 'aacbb', 'aabaa', 'aabac', 'aabab', 'aabca', 'aabcc', 'aabcb', 'aabba', 'aabbc', 'aabbb', 'acaaa', 'acaac', 'acaab', 'acaca', 'acacc', 'acacb', 'acaba', 'acabc', 'acabb', 'accaa', 'accac', 'accab', 'accca', 'acccc', 'acccb', 'accba', 'accbc', 'accbb', 'acbaa', 'acbac', 'acbab', 'acbca', 'acbcc', 'acbcb', 'acbba', 'acbbc', 'acbbb', 'abaaa', 'abaac', 'abaab', 'abaca', 'abacc', 'abacb', 'ababa', 'ababc', 'ababb', 'abcaa', 'abcac', 'abcca', 'abccc', 'abccb', 'abcba', 'abcbc', 'abcbb', 'abbaa', 'abbac', 'abbab', 'abbca', 'abbcc', 'abbcb', 'abbba', 'abbbc', 'abbbb', 'caaaa', 'caaac', 'caaab', 'caaca', 'caacc', 'caacb', 'caaba', 'caabc', 'caabb', 'cacaa', 'cacac', 'cacab', 'cacca', 'caccc', 'caccb', 'cacba', 'cacbc', 'cacbb', 'cabaa', 'cabac', 'cabab', 'cabca', 'cabcc', 'cabcb', 'cabba', 'cabbc', 'cabbb', 'ccaaa', 'ccaac', 'ccaab', 'ccaca', 'ccacc', 'ccacb', 'ccaba', 'ccabc', 'ccabb', 'cccaa', 'cccac', 'cccab', 'cccca', 'ccccc', 'ccccb', 'cccba', 'cccbc', 'cccbb', 'ccbaa', 'ccbac', 'ccbab', 'ccbca', 'ccbcc', 'ccbcb', 'ccbba', 'ccbbc', 'ccbbb', 'cbaaa', 'cbaac', 'cbaab', 'cbaca', 'cbacc', 'cbacb', 'cbaba', 'cbabc', 'cbabb', 'cbcaa', 'cbcac', 'cbcab', 'cbcca', 'cbccc', 'cbccb', 'cbcba', 'cbcbc', 'cbcbb', 'cbbaa', 'cbbac', 'cbbab', 'cbbca', 'cbbcc', 'cbbcb', 'cbbba', 'cbbbc', 'cbbbb', 'baaaa', 'baaac', 'baaab', 'baaca', 'baacc', 'baacb', 'baaba', 'baabc', 'baabb', 'bacaa', 'bacac', 'bacab', 'bacca', 'baccc', 'baccb', 'bacba', 'bacbc', 'bacbb', 'babaa', 'babac', 'babab', 'babca', 'babcc', 'babcb', 'babba', 'babbc', 'babbb', 'bcaaa', 'bcaac', 'bcaab', 'bcaca', 'bcacc', 'bcacb', 'bcaba', 'bcabc', 'bcabb', 'bccaa', 'bccac', 'bccab', 'bccca', 'bcccc', 'bcccb', 'bccba', 'bccbc', 'bccbb', 'bcbaa', 'bcbac', 'bcbab', 'bcbca', 'bcbcc', 'bcbcb', 'bcbba', 'bcbbc', 'bcbbb', 'bbaaa', 'bbaac', 'bbaab', 'bbaca', 'bbacc', 'bbacb', 'bbaba', 'bbabc', 
'bbabb', 'bbcaa', 'bbcac', 'bbcab', 'bbcca', 'bbccc', 'bbccb', 'bbcba', 'bbcbc', 'bbcbb', 'bbbaa', 'bbbac', 'bbbab', 'bbbca', 'bbbcc', 'bbbcb', 'bbbba', 'bbbbc', 'bbbbb']
