How to combine 3 validation checks into one loop? - python

For example, I have a tuple such as:
tup = [['P Y T F EY EN', 'p y t h o n'], ['R O K', 'r o x']]
I then separate the tuple into lists such as:
lst1 = [['P', 'Y', 'T', 'F', 'EY', 'EN'], ['R', 'O', 'K']]
lst2 = [['p', 'y', 't', 'h', 'o', 'n'], ['r', 'o', 'x']]
The three conditions I have are as follows:
First, the length of the first element in the tuple must be equal to that of the second:
for i in tup:
    if not len(tup[0].split()) == len(tup[1].split()) :
        count +=1
        break
The second condition is that, for every element in lst1, each character in the element must appear in another document, such as a CSV file:
for i in lst1:
    for j in i:
        if j not in file:
            count+=1
            break
The third condition is that, for every element in lst2, each character must also appear in another document:
for i in lst2:
    for j in i:
        if j not in other_file:
            count+=1
            break
As you can see, I want the count to increase each time one of these conditions is broken. I also don't want the counts to overlap: if a condition is broken, the count should increase once and the loop should skip to the next row.

Maybe this will help. I am assuming the files are small enough to be read all at once:
with open('doc1.csv', 'r') as f:  # read all of doc1.csv now
    doc1 = f.read()
with open('doc2.csv', 'r') as f:  # read all of doc2.csv now
    doc2 = f.read()
count = 0  # count of all rows that are invalid
for item in tup:
    l1 = item[0].split()  # split the first string into a list of tokens
    l2 = item[1].split()  # split the second string into a list of tokens
    # a row is invalid if the lengths differ, if any token in l1 is not in doc1,
    # or if any token in l2 is not in doc2; `or` short-circuits, so each row counts at most once
    if len(l1) != len(l2) or not all(char in doc1 for char in l1) or not all(char in doc2 for char in l2):
        count += 1
print(count)
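The answer above tests `char in doc1`, which on a string is a substring test, so a token like 'EY' would also match inside a word such as 'KEY'. A minimal sketch, assuming the two files have already been parsed into sets of allowed tokens (how to do that depends on the CSV layout, which the question does not show), that keeps the single loop but uses exact set membership. The function and variable names are illustrative:

```python
def count_invalid_rows(rows, allowed_first, allowed_second):
    """Count rows that break at least one of the three checks.

    allowed_first / allowed_second are sets of permitted tokens, a
    hypothetical pre-parsed stand-in for the contents of the two files.
    """
    count = 0
    for first, second in rows:
        first_tokens = first.split()
        second_tokens = second.split()
        # `or` short-circuits: once one check fails, the row is counted
        # and the remaining checks for that row are skipped.
        if (len(first_tokens) != len(second_tokens)
                or any(tok not in allowed_first for tok in first_tokens)
                or any(tok not in allowed_second for tok in second_tokens)):
            count += 1
    return count


tup = [['P Y T F EY EN', 'p y t h o n'], ['R O K', 'r o x']]
# Hypothetical token sets; in practice they would be built from the files.
allowed_first = {'P', 'Y', 'T', 'F', 'EY', 'EN', 'R', 'O'}
allowed_second = set('pythonrox')
print(count_invalid_rows(tup, allowed_first, allowed_second))  # 1 ('K' is not allowed)
```

Because the three checks are joined with `or` inside one `if`, a row that breaks all three still adds only 1 to the count.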

First of all, there are two issues with your example:
1) tup is a list, not a tuple;
2) tup[0] = ['P Y T F EY EN', 'p y t h o n'] and tup[1] = ['R O K', 'r o x'];
both of them are lists, so you cannot call split() on them.
If you would like to calculate the total count, you could do it in one statement like the following:
print(sum([not len(i[0].split()) == len(i[1].split()) for i in tup] +
          [j not in file for i in lst1 for j in i] +
          [j not in other_file for i in lst2 for j in i]))
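In a comprehension, several `for` clauses are nested in the order they are written, so the loop over rows has to come before the loop over the tokens in each row. Note also that summing booleans this way counts every failing token, while the question asks for each row to be counted at most once. A small illustration, using a hypothetical `allowed` set in place of the file contents:

```python
lst1 = [['P', 'Y', 'T', 'F', 'EY', 'EN'], ['R', 'O', 'K']]

# The for clauses read left to right like nested loops: outer loop first.
flat = [j for i in lst1 for j in i]
print(flat)  # ['P', 'Y', 'T', 'F', 'EY', 'EN', 'R', 'O', 'K']

# Hypothetical set of allowed tokens standing in for the file contents.
allowed = {'P', 'Y', 'T', 'F', 'R', 'O'}

# Counts every failing token: 'EY', 'EN' and 'K' -> 3
per_token = sum(j not in allowed for i in lst1 for j in i)
# Counts each row at most once: both rows contain a failing token -> 2
per_row = sum(any(j not in allowed for j in i) for i in lst1)
print(per_token, per_row)  # 3 2
```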

How to separate alternating digits and characters in a string into a dict or list?

'L134e2t1C1o1d1e1'
The original string is "LeetCode", but I need to separate the letters from the digits. The numbers are not always single digits; they can also have 3-4 digits, like 345.
My code needs to build a dict of key-value pairs, where the keys are the characters and the values are the numbers right after each character. It should also create 2 separate lists: one with only the numbers and one with only the letters.
Expected output:
letters = ['L', 'e', 't', 'C', 'o', 'd', 'e']
digits = [134,2,1,1,1,1,1]
This code does not process it properly:
def f(s):
    d = dict()
    letters = list()
    # letters = list(filter(lambda x: not x.isdigit(), s))
    i = 0
    while i < len(s):
        print('----------------------')
        if not s[i].isdigit():
            letters.append(s[i])
        else:
            j = i
            temp = ''
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            print(substr)
        i += 1
    print('----END -')
    print(letters)
With the following modification, your function separates the letters from the digits in s:
def f(s):
    letters = list()
    digits = list()
    i = 0
    while i < len(s):
        if not s[i].isdigit():
            letters.append(s[i])
            i += 1
        else:
            j = i
            # advance j past every consecutive digit
            while j < len(s) and s[j].isdigit():
                j += 1
            substr = s[i:j]
            i = j  # continue right after the number
            digits.append(substr)
    print(letters)
    print(digits)
f('L134e2t1C1o1d1e1')
As I said in my comments, you didn't update i after the inner loop terminated, so i only advanced by one and the next iteration re-processed digits that the inner loop had already consumed.
If I were not allowed to use regex, I would do it the following way:
text = 'L134e2t1C1o1d1e1'
letters = [i for i in text if i.isalpha()]
digits = ''.join(i if i.isdigit() else ' ' for i in text).split()
print(letters)
print(digits)
Output:
['L', 'e', 't', 'C', 'o', 'd', 'e']
['134', '2', '1', '1', '1', '1', '1']
Explanation: for letters, I use a simple list comprehension with a condition; .isalpha() is a str method that checks whether a string (in this case consisting of one character) is alphabetic. For digits (which should rather be called numbers), I replace every non-digit with a single space, join the result into a string using ''.join, and then use .split() (which splits on runs of one or more whitespace characters). Note that digits is now a list of strs rather than ints; if ints are desired, add the following line:
digits = list(map(int,digits))
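To make the join/split trick concrete, here are the intermediate values for the question's string:

```python
text = 'L134e2t1C1o1d1e1'

# Step 1: replace every non-digit with a space, keeping digits as they are.
spaced = ''.join(i if i.isdigit() else ' ' for i in text)
print(repr(spaced))  # ' 134 2 1 1 1 1 1'

# Step 2: split() with no argument splits on runs of whitespace and drops
# the empty string that the leading space would otherwise produce.
digits = spaced.split()
print(digits)  # ['134', '2', '1', '1', '1', '1', '1']

# Step 3 (optional): convert to integers.
print(list(map(int, digits)))  # [134, 2, 1, 1, 1, 1, 1]
```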
Your string only had two e's, so I've added one more to complete the example. This is one way you could do it:
import re
t = 'L1e34e2t1C1o1d1e1'
print(re.sub('[^a-zA-Z]', '', t))
Result:
LeetCode
I know you cannot use regex, but to complete this answer, I'll just add a solution:
def f(s):
    d = re.findall('[0-9]+', s)  # runs of one or more digits
    l = re.findall('[a-zA-Z]', s)  # single letters
    print(d)
    print(l)
f('L134e2t1C1o1d1e1')
Result:
['134', '2', '1', '1', '1', '1', '1']
['L', 'e', 't', 'C', 'o', 'd', 'e']
You edited your question and I got a bit confused, so here is some really exhaustive code that gives you the list of letters, the list of numbers, the dict with the numbers associated with each letter, and finally the sentence with the corresponding number of characters:
def f(s):
    letters = []
    numbers = []
    for c in s:
        if c.isalpha():
            letters.append(c)
            numbers.append("")  # start a new number for this letter
        elif c.isdigit():
            numbers[-1] += c  # multi-digit numbers such as '134' stay together
    mydict = {}
    for letter, number in zip(letters, numbers):
        mydict.setdefault(letter, []).append(number)
    sentence = ""
    for i in range(len(letters)):
        count = int(numbers[i])
        while count > 0:
            sentence += letters[i]
            count -= 1
    print(letters)
    print(numbers)
    print(mydict)
    print(sentence)
letters = []
digits = []
dig = ""
for letter in 'L134e2t1C1o1d1e1':
    if letter.isalpha():
        # do not add an empty string to the list
        if dig:
            # append dig to the list of digits
            digits.append(dig)
            dig = ""
        letters.append(letter)
        # it is an actual letter, so continue with the next character
        continue
    # add digits to `dig`
    dig = dig + letter
# append the last number, which is not followed by a letter
if dig:
    digits.append(dig)
Try this. The idea is to skip all actual letters and add the digits to dig.
I know there's an accepted answer, but I'll throw this one in anyway:
letters = []
digits = []
dct = {}
lc = 'L134e2t1C1o1d1e1'
n = None
for c in lc:
    if c.isalpha():
        if n is not None:
            digits.append(n)
            n = None
        letters.append(c)
    else:
        # build the number one digit at a time: 1 -> 13 -> 134
        if n is None:
            n = int(c)
        else:
            n *= 10
            n += int(c)
if n is not None:
    digits.append(n)
for k, v in zip(letters, digits):
    dct.setdefault(k, []).append(v)
print(letters)
print(digits)
print(dct)
Output:
['L', 'e', 't', 'C', 'o', 'd', 'e']
[134, 2, 1, 1, 1, 1, 1]
{'L': [134], 'e': [2, 1], 't': [1], 'C': [1], 'o': [1], 'd': [1]}
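One more option, which none of the answers above uses: itertools.groupby with str.isdigit as the key splits the string into alternating runs of letters and digits, so multi-digit numbers come out whole. A sketch, assuming (as in the question) that each letter is immediately followed by its number; since 'e' appears twice, the dict maps each letter to a list of numbers, as the last answer does. The function name is illustrative:

```python
from itertools import groupby


def split_letters_digits(s):
    """Split a string like 'L134e2' into letters, numbers and a letter -> numbers dict."""
    letters, digits = [], []
    # groupby yields runs of consecutive characters that share the same key,
    # so '134' comes out as a single run of digits.
    for is_digit, run in groupby(s, key=str.isdigit):
        chunk = ''.join(run)
        if is_digit:
            digits.append(int(chunk))
        else:
            letters.extend(chunk)  # adds each letter of the run separately
    mapping = {}
    for letter, number in zip(letters, digits):
        mapping.setdefault(letter, []).append(number)
    return letters, digits, mapping


letters, digits, mapping = split_letters_digits('L134e2t1C1o1d1e1')
print(letters)  # ['L', 'e', 't', 'C', 'o', 'd', 'e']
print(digits)   # [134, 2, 1, 1, 1, 1, 1]
print(mapping)  # {'L': [134], 'e': [2, 1], 't': [1], 'C': [1], 'o': [1], 'd': [1]}
```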

Replace each character with the first greater character after it

I am trying to replace each character with the next greater character after it, and if there is no greater character after it, replace it with '#'.
If I have a circular list of characters [K, M, Y, R, E, J, A],
the output should be [M, Y, #, Y, J, K, K].
M is greater than K, so K is replaced with M; Y is greater than M, so M is replaced with Y; and so on ("greater" here means it comes later in the alphabet).
This is the code that I have tried, but it gives the wrong output:
def que1(input_list):
    for i in range (len (input_list)-1) :
        j=i+1
        for j in range (len(input_list)-1):
            if input_list[i]<input_list[j]:
                input_list[i]=input_list[j]
            if input_list[i]>input_list[j]:
                input_list[i]='#'
    return input_list
Another way to do it:
def que1(i_list):
    # doubling the list lets the search wrap around past the end
    input_list=i_list*2
    output_list=[]
    for i in range (len (i_list)) :
        found=False
        for j in range (i+1,len(input_list)-1):
            if input_list[i]<input_list[j]:
                output_list.append(input_list[j])
                found=True
                break
        if not found:
            output_list.append('#')
    return output_list
Output:
['M', 'Y', '#', 'Y', 'J', 'K', 'K']
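The inner search loop with its `found` flag can also be written with the built-in next(), which returns the first item of an iterator or a default value when the iterator is exhausted. A sketch (the function name is illustrative):

```python
def first_greater(chars):
    out = []
    for i, c in enumerate(chars):
        # everything after c, then wrap around to everything before it
        rotated = chars[i + 1:] + chars[:i]
        # next() returns the first greater item, or the default '#' if none exists
        out.append(next((x for x in rotated if x > c), '#'))
    return out


print(first_greater(['K', 'M', 'Y', 'R', 'E', 'J', 'A']))
# ['M', 'Y', '#', 'Y', 'J', 'K', 'K']
```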
The following code should do what you want.
def que1(input_list):
    output_list = []
    # loop through the input list with index
    for index, c in enumerate(input_list):
        # split list at c and create new list
        # [all greater characters after c, all greater characters before c]
        tmp = [i for i in input_list[index:] if i > c] + [i for i in input_list[:index] if i > c]
        # if list empty there is no greater character
        if len(tmp) == 0:
            output_list.append('#')
        # else choose the first greater character
        else:
            output_list.append(tmp[0])
    return output_list
Input: ['K', 'M', 'Y', 'R', 'E', 'J', 'A']
Output: ['M', 'Y', '#', 'Y', 'J', 'K', 'K']
Here is another way to do it.
def next_greatest(data):
    out = []
    for idx in range(len(data)):
        char = data[idx]
        # iterate over the other characters, wrapping around after the end
        cs = iter(data[idx + 1:] + data[:idx])
        while True:
            try:
                curr = next(cs)
            except StopIteration:
                out.append("#")
                break
            if curr > char:
                out.append(curr)
                break
    return out
chars = ['K', 'M', 'Y', 'R', 'E', 'J', 'A']
print(next_greatest(chars))
# ['M', 'Y', '#', 'Y', 'J', 'K', 'K']
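Each of the answers above rescans the list for every character, which takes O(n²) time. A different technique, not used in any of the answers, is the standard "next greater element" approach with a stack of indices: walking the list twice handles the wrap-around, and because each index is pushed once and popped at most once, it runs in O(n). A sketch (function and parameter names are illustrative):

```python
def next_greater_circular(data, missing='#'):
    """For each item, return the first strictly greater item found by
    scanning to the right and wrapping around; `missing` if none exists."""
    n = len(data)
    result = [missing] * n
    stack = []  # indices still waiting for their next greater item
    # Scanning 2 * n positions lets items near the end "see" the start.
    for k in range(2 * n):
        current = data[k % n]
        # Every waiting item smaller than `current` has found its answer.
        while stack and data[stack[-1]] < current:
            result[stack.pop()] = current
        if k < n:  # each index is pushed only during the first pass
            stack.append(k)
    return result


print(next_greater_circular(['K', 'M', 'Y', 'R', 'E', 'J', 'A']))
# ['M', 'Y', '#', 'Y', 'J', 'K', 'K']
```

The stack always holds indices whose values are non-increasing from bottom to top, so checking only the top is enough to know whether any waiting item is resolved by `current`.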

How to get a certain number of letters from a list?

I have a list of 26 numbers. I want to print out a list of letters according to those numbers. For example, I have this list (26 numbers read from input):
[0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
I would like the output to be like this:
[e,e,l,s]
'e' appears in the output 2 times because index 4 corresponds to 'e' in the English alphabet and the number at index 4 is 2. It's the same for 'l', since it is at index 11 and its number is 1, and likewise for 's' (index 18). The other letters don't appear because their numbers are zero.
As another example, here is a second 26-number input:
[1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
The output should be:
[a,b,b,c,c,d,d,d,e,e,e,e,g,g,g,h,h,h,h,i,i,i,i,j,k,k,k,l,m,m,m,m,n,n,n,n,o,u,u,u,u,v,v,w,w,w,x,x,y,y,z]
Is there any way to do this in Python 3?
You can use chr(97 + index) to get the corresponding letter and then multiply it by the number at that index:
In [40]: [j * chr(97 + i) for i, j in enumerate(lst) if j]
Out[40]: ['ee', 'l', 's']
If you want them as separate items, you can use the itertools module:
In [44]: from itertools import repeat, chain
In [45]: list(chain.from_iterable(repeat(chr(97 + i), j) for i, j in enumerate(lst) if j))
Out[45]: ['e', 'e', 'l', 's']
Yes, it is definitely possible in Python 3.
Firstly, define an example list of numbers (as you did) and an empty list to store the resulting letters.
The logic that links each index to a letter is chr(97 + index): ord("a") is 97, so conversely chr(97) is "a". The first index is 0, so it maps to 97 ("a"), and as the index increases, so does the letter.
Next, use a for loop over the list of numbers, with an inner for loop that appends the same letter as many times as the number at that index says.
We could instead do result.append(chr(97 + i) * my_list[i]) in the outer loop, but that would not yield every letter separately ([a,b,b,c,c,d,d,d...]); the result would look like [a,bb,cc,ddd...] instead.
my_list = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
result = []
for i in range(len(my_list)):
    if my_list[i] > 0:
        for j in range(my_list[i]):
            result.append(chr(97 + i))
print(result)
An alternative to the wonderful answer by @Kasramvd:
import string
n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res) # -> ['ee', 'l', 's']
Your second example produces:
['a', 'bb', 'cc', 'ddd', 'eeee', 'ggg', 'hhhh', 'iiii', 'j', 'kkk', 'l', 'mmmm', 'nnnn', 'o', 'uuuu', 'vv', 'www', 'xx', 'yy', 'z']
Splitting the strings ('bb' into 'b', 'b') can be done with the standard nested-comprehension pattern:
[x for y in something for x in y]
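Applied to the result above, the pattern looks like this; iterating over a string such as 'ee' yields its characters one by one. The last line is a variant (not from the answer) that merges both steps into one comprehension by repeating each letter with range():

```python
import string

n = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
res = [i * c for i, c in zip(n, string.ascii_lowercase) if i]
print(res)  # ['ee', 'l', 's']

# Flatten with the [x for y in something for x in y] pattern.
flat = [x for y in res for x in y]
print(flat)  # ['e', 'e', 'l', 's']

# Single comprehension: repeat each letter i times; i == 0 yields nothing.
direct = [c for i, c in zip(n, string.ascii_lowercase) for _ in range(i)]
print(direct)  # ['e', 'e', 'l', 's']
```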
Using a slightly different approach with NumPy, which gives the characters individually as in your example:
import string
import numpy as np
a = [0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]
alphabet_lookup = np.repeat(np.arange(len(a)), a)  # each index repeated by its count: [4 4 11 18]
letter_lookup = np.array(list(string.ascii_lowercase))
res = letter_lookup[alphabet_lookup]  # pick the letter for each repeated index
print(res)
This prints:
['e' 'e' 'l' 's']
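A standard-library variant without NumPy: collections.Counter.elements() repeats each key as many times as its count, skips keys whose count is zero or less, and returns elements in the order they were first inserted. A sketch for both examples from the question (the function name is illustrative):

```python
from collections import Counter
from string import ascii_lowercase


def letters_from_counts(counts):
    # Map 'a'..'z' to the 26 counts; elements() yields each letter `count` times.
    return list(Counter(dict(zip(ascii_lowercase, counts))).elements())


print(letters_from_counts([0,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0]))
# ['e', 'e', 'l', 's']

second = [1,2,2,3,4,0,3,4,4,1,3,1,4,4,1,0,0,0,0,0,4,2,3,2,2,1]
print(''.join(letters_from_counts(second)))
```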

Split a string into a list of sublists of words and characters

No imports allowed (it's a school assignment).
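I want to split an arbitrary string into a list of sublists. Each word goes into its own sublist (one item per letter), and every other character (including whitespace) goes into a sublist containing only that one item. Does anyone have advice on how to do this? (In the expected output below, _ stands for a space.)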
part = "Hi! Goodmorning, I'm fine."
list = [[H,i],[!],[_],[G,o,o,d,m,o,r,n,i,n,g],[,],[_],[I],['],[m],[_],[f,i,n,e],[.]]
This does the trick:
globalList = []
letters = "abcdefghijklmnopqrstuvwxyz"
message = "Hi! Goodmorning, I'm fine."
sublist = []
for char in message:
    # if the character is in the list of letters, append it to the current sublist
    if char.lower() in letters:
        sublist.append(char)
    else:
        # add the previous sublist (aka word) to globalList, if it is not empty
        if sublist:
            globalList.append(sublist)
        # add the single non-letter character to globalList
        globalList.append([char])
        # start a fresh new sublist
        sublist = []
# add the last word if the string ends with a letter
if sublist:
    globalList.append(sublist)
print(globalList)
# output is [['H', 'i'], ['!'], [' '], ['G', 'o', 'o', 'd', 'm', 'o', 'r', 'n', 'i', 'n', 'g'], [','], [' '], ['I'], ["'"], ['m'], [' '], ['f', 'i', 'n', 'e'], ['.']]
Try this out:
part = "Hi! Goodmorning, I'm fine."
n = part.count(" ")
part = part.split()
k = 0
# Add spaces to the list
for i in range(1, n+1):
    part.insert(i+k, "_")
    k += 1
new = []  # list to return
for s in part:
    new.append([letter for letter in s])
Another approach:
part = "Hi! Goodmorning, I'm fine."
a = []
b = []
c = 0  # 1 if the previous character was not a letter
for i in part:
    if i.isalpha():
        if c == 1:
            a.append(b)
            b = []
            b.append(i)
            c = 0
        else:
            b.append(i)
    else:
        a.append(b)
        b = []
        b.append(i)
        c = 1
a.append(b)
print(a)
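Since imports are not allowed in the assignment, here is the same idea as a small built-ins-only function. It uses str.isalpha() (which, unlike the letters string in the first answer, also accepts non-ASCII letters such as 'é') and flushes the last word after the loop so that a string ending in a letter does not lose its final word. The function name is illustrative:

```python
def split_words_and_chars(text):
    """Split text into sublists: one sublist of letters per word,
    and a one-item sublist for every other character (including spaces)."""
    result = []
    word = []
    for ch in text:
        if ch.isalpha():
            word.append(ch)          # keep building the current word
        else:
            if word:                 # close the word that just ended
                result.append(word)
                word = []
            result.append([ch])      # non-letter gets its own sublist
    if word:                         # the text may end in the middle of a word
        result.append(word)
    return result


print(split_words_and_chars("Hi! Goodmorning, I'm fine."))
```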

Read all possible sequential substrings in Python

If I have a list of letters, such as:
word = ['W','I','N','E']
and I need to get every possible sequence of substrings of length 3 or less, e.g.:
W I N E, WI N E, WI NE, W IN E, WIN E etc.
What is the most efficient way to go about this?
Right now, I have:
word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)
This just gives me the following, rather than the subsequences:
W
I
W I
N
I N
W I N
E
N E
I N E
I just can't figure out how to create every possible sequence.
Try this recursive algorithm:
def segment(word):
    def sub(w):
        if len(w) == 0:
            yield []
        # take the first 1, 2 or 3 items as one piece, then segment the rest
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))
# And if you want a list of strings:
def str_segment(word):
    return [' '.join(w) for w in segment(word)]
Output:
>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
As there can either be a space or not in each of the three positions (after W, after I and after N), you can think of this as bits being 1 or 0 in the binary representation of a number ranging from 1 to 2^3 - 1. (0, meaning no spaces at all, is skipped because "WINE" itself is longer than 3 characters.)
input_word = "WINE"
for variation_number in range(1, 2 ** (len(input_word) - 1)):
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        # bit number `position` decides whether a space follows this letter
        if variation_number >> position & 1:
            output += ' '
    print(output)
Edit: To include only variations whose pieces are 3 characters or shorter (in the general case where input_word may be longer than 4 characters), we can exclude cases where the binary representation contains 3 zeroes in a row. (We also start the range from a higher number, because bin() drops leading zeros: starting at 2 ** (len(input_word) - 4) excludes the numbers whose three highest bits would all be 0.)
for variation_number in range(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
    if '000' not in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print(output)
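The same "each gap is a bit" idea can be written with itertools.product, which enumerates the break/no-break choice for every gap directly and checks the piece lengths explicitly instead of searching bin() output for '000'. A sketch (the function name and the max_len parameter are illustrative); it works on a list of items like the question's `word`, so multi-character items stay intact:

```python
from itertools import product


def segmentations(seq, max_len=3):
    """Yield every split of `seq` into consecutive pieces of at most max_len items.

    Each of the len(seq) - 1 gaps either gets a break (True) or not (False),
    mirroring the bits of variation_number above.
    """
    if not seq:
        yield []
        return
    for breaks in product((False, True), repeat=len(seq) - 1):
        pieces, start = [], 0
        for gap, is_break in enumerate(breaks, start=1):
            if is_break:
                pieces.append(seq[start:gap])
                start = gap
        pieces.append(seq[start:])
        if all(len(p) <= max_len for p in pieces):
            yield pieces


word = ['W', 'I', 'N', 'E']
for pieces in segmentations(word):
    # join the items inside each piece, then separate pieces with spaces
    print(' '.join(''.join(p) for p in pieces))
```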
My implementation for this problem:
#!/usr/bin/env python
# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools
word = 'WINE'
# this loop generates all possible partition COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):
        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in range(len(partitions)-1):
            diffs[i+1] -= partitions[i]
        # first, the whole word is up for taking by partitions
        splits = [word]
        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])
        # print the result
        print(splits)
As an alternative answer, you can use the itertools module: groupby groups your list, combinations creates the list of index pairs used in the grouping key (i <= word.index(x) <= j), and finally set removes the duplicates.
Also note that you could make the index pairs unique up front: two pairs (i1, j1) and (i2, j2) with i1 == 0, j2 == len(word) - 1 (3 here) and j1 + 1 == i2, such as (0, 1) and (2, 3), produce the same slices, so one of them can be removed.
All in one list comprehension:
from itertools import combinations, groupby
subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set([' '.join(j) for j in subs]) # set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
Detailed demo:
>>> from itertools import combinations, groupby
>>> cl=list(combinations(range(len(word)),2))
>>> cl
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
>>> new_l
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
>>> last=[[''.join(i) for i in j] for j in new_l]
>>> last
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
>>> set([' '.join(j) for j in last])
set(['WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'])
>>> for i in set([' '.join(j) for j in last]):
...     print(i)
...
WIN E
W IN E
W INE
WI NE
WINE
>>>
I think it can be done like this:
def substrings(word):
    myList = []
    for i in range(1, len(word) + 1):
        myList.append(word[:i])
        # add the substrings word[j:i] that end at i and start after index 0
        for j in range(1, i):
            myList.append(word[j:i])
    print(myList)
    # remove duplicates while keeping the original order
    print(sorted(set(myList), key=myList.index))
    return myList
substrings("ABCDE")
