Splitting a string in python at the colon character - python

This is my code to split a string list at the colon:
this is more info to maybe help with the question
my_file = open("Accounts.txt", "r")
rawAccounts = my_file.read()
Accounts = []
b = 0
j = 0
x = 0
size = 0
dummy= "c"
lessrawAccounts = rawAccounts.split("\n")
while x != 100000:
    size = len(lessrawAccounts[j])
    if lessrawAccounts[j[b]] != ":":
        Accounts[j[b]] = lessrawAccounts[j[b]]
        b = b + 1
    else:
        j = j + 1
        while b <= size:
            Accounts[j[b]] = lessrawAccounts[j[b]]
            b = b + 1

If you want to store only the emails from your list, splitting each entry at the colon, you can use this:
lessrawAccounts = ['JohnDoe#gmail.com:userpass']
Accounts = []
passwords = []
for line in lessrawAccounts:
    # Split once, at the first colon only, so a colon inside the password is kept
    account, password = line.split(":", 1)
    Accounts.append(account)
    passwords.append(password)
print(Accounts,passwords)

It would be clearer if you gave examples of the strings you want to split.
To answer your "question", the reader has to parse your code to work out what you want to do.
Your question is titled, more or less, "how to split a string at the : character".
before,_,after = "before:after".partition(":")
The str.partition method splits a string at the first occurrence of a separator (which can be more than one character). It returns three values: the text before the separator, the separator itself, and the text after it. I have discarded the middle value, since it is just the separator.
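As a rough sketch of how partition could be applied to the file from the question, assuming each line has the form name:password. The sample lines below are made up for illustration, and split_account is a hypothetical helper name, not something from the question:

```python
def split_account(line):
    """Split one 'name:password' line at the first colon."""
    name, separator, password = line.strip().partition(":")
    if not separator:
        # No colon on this line; report it rather than guess.
        raise ValueError(f"no ':' found in {line!r}")
    return name, password


# Hypothetical sample data standing in for the contents of Accounts.txt.
raw_accounts = "alice:secret1\nbob:pa:ss\n"

accounts = []
passwords = []
for line in raw_accounts.splitlines():
    if not line:
        continue  # skip blank lines
    name, password = split_account(line)
    accounts.append(name)
    passwords.append(password)

print(accounts)   # ['alice', 'bob']
print(passwords)  # ['secret1', 'pa:ss']
```

Because partition only splits at the first separator, a colon inside the password is preserved.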

Related

insert open and close tags at the positions corresponding to a list of tups

I have to mark up a text by inserting tags into a string, as follows:
mystring = '123456789'
positions = [(2,4),(6,8)]
result = '12<tag1>34</tag1>56<tag2>78</tag2>9'
positions is a list of tuples that give the start and end of each tag.
(In principle the spans of the tags do not overlap.)
I started by assuming that I would need all the tag positions flattened out:
import itertools
pos2 = list(itertools.chain(*positions))
which gives:
[2, 4, 6, 8]
That allows me to do this:
stringWithTags = ''
opentag = True
nrtag = 1
for j, character in enumerate(mystring):
    i = j + 1
    if i in pos2:
        if opentag:
            toadd = character + '<tag' + str(nrtag) + '>'
        else:
            toadd = character + '</tag' + str(nrtag) + '>'
            nrtag = nrtag + 1
        stringWithTags = stringWithTags + toadd
        opentag = not(opentag)
    else:
        stringWithTags = stringWithTags + character
stringWithTags
That works, but it's quite horrible code, and incredibly verbose.
This problem should be well known, and there might be out-of-the-box solutions I am not aware of.
Any suggestions?
EDIT 1: I also consider my code not watertight, because the moment the spans overlap this will not fly; I would prefer a solution that is valid in both cases.
I think the easy way is to split the string into a list and work with it by index: split the string, loop over the list of positions, prefix the element at each start or end index with the opening or closing tag, and finally join the list into a new string.
mystring = '123456789'
positions = [(2,4),(6,8)]
str_list = list(mystring)
for span in positions:
    # is_end is 0 for the start index of the span and 1 for its end index
    for is_end, index in enumerate(span):
        str_list[index] = ('</tag>{}' if is_end else '<tag>{}').format(str_list[index])
new_string = ''.join(str_list)
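Another option is to insert the tags into a list of characters, keeping track of how far the earlier insertions have shifted the later positions:
mystring = list('123456789')
positions = [(2,4),(6,8)]
shift = 0
for pos in positions:
    tag = "<tag>"
    mystring.insert(pos[0] + shift, tag)
    shift += 1
    mystring.insert(pos[1] + shift, tag.replace('<', '</'))
    shift += 1
print(''.join(mystring))
Output: 12<tag>34</tag>56<tag>78</tag>9
You can extend the code that builds the tag variable (for example, to number the tags); this is just a sample.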
Assuming the tags are listed in order and do not cross each other, what you could do is:
mystring = '123456789'
positions = [(2,4),(6,8)]
nb_tags = len(positions)
# Work from the last span to the first so the earlier indexes stay valid
for tup in reversed(positions):
    mystring = mystring[:tup[1]]+f"</tag{nb_tags}>"+mystring[tup[1]:]
    mystring = mystring[:tup[0]]+f"<tag{nb_tags}>"+mystring[tup[0]:]
    nb_tags-=1
After this, mystring == result (with result as defined in the question) returns True.
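The approaches above can also be combined into one pass that never mutates the string while indexing into it. The following is only a sketch: insert_tags is a name made up here, and it assumes every span satisfies start < end. If spans overlap, the tags are still inserted at the requested positions, but the resulting markup will not be properly nested.

```python
def insert_tags(text, spans):
    """Wrap each (start, end) span of text in numbered tags."""
    # Collect (index, order, markup) insertion events.
    events = []
    for number, (start, end) in enumerate(spans, start=1):
        events.append((start, 1, f"<tag{number}>"))
        events.append((end, 0, f"</tag{number}>"))
    # Sort by index; at the same index a closing tag (order 0)
    # is emitted before an opening tag (order 1).
    events.sort(key=lambda event: (event[0], event[1]))

    pieces = []
    previous = 0
    for index, _, markup in events:
        pieces.append(text[previous:index])  # untouched text up to the tag
        pieces.append(markup)
        previous = index
    pieces.append(text[previous:])  # remainder after the last tag
    return "".join(pieces)


print(insert_tags("123456789", [(2, 4), (6, 8)]))
# 12<tag1>34</tag1>56<tag2>78</tag2>9
```

Building a list of pieces and joining once avoids recomputing shifted indexes after every insertion.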

Adding comma every 3 digit problem, in python, what's different using list or using format

I am practicing an algorithm on a website.
I want to add a comma (,) to a number every 3 digits.
But 'a', the variable I built myself, is not the correct answer,
while 'b', which uses an approach I found by searching, is the correct answer.
Can you tell me why 'a' is not the same as 'b'?
length = 8
data = "12421421"
inv_result = []
for index in range(length):
    if index % 3 == 0:
        inv_result.append(',')
        inv_result.append(str(data[index]))
    else:
        inv_result.append(str(data[index]))
result = inv_result[::-1]
#first comma delete
result.pop()
a = ''.join(result)
b = format(int(data),",")
print(a)
print(b)
print(a == b)
result is
12,412,421
12,421,421
False
Your problem is that you didn't reverse the data in the beginning. The following (slightly cleaned up) code works:
length = 8
data = "12421421"
inv_data = data[::-1]
inv_result = []
for index in range(length):
    if index % 3 == 0:
        inv_result.append(',')
    inv_result.append(str(inv_data[index]))
result = inv_result[::-1]
#first comma delete
result.pop()
a = ''.join(result)
b = format(int(data),",")
print(a)
print(b)
print(a == b)
That is because you reverse the result with this line:
result = inv_result[::-1]
If you didn't reverse the order, you would have the right order:
result = inv_result
result.pop(0) # remove first character which is a comma
But this only works if the number of digits is a multiple of three. For example, if your digits were 1234, then doing it this way would result in 123,4 instead of the desired 1,234.
So you have to reverse the string in the beginning or go through it in reverse order. Then keep the later inversion and pop() as you had them.
for index in range(length):
    if index % 3 == 0:
        inv_result.append(',')
    inv_result.append(str(data[-1-index])) # count from -1 to more negative, equivalent to going backwards through string
result = inv_result[::-1]
#first comma delete
result.pop()
A solution with comprehension:
data = "12421421"
len_data = len(data)
triplets_num = len_data // 3
remainder = len_data % 3
triplets = [data[:remainder]] if remainder else []
triplets += [data[remainder+i*3:remainder+3+i*3] for i in range(triplets_num)]
result = ','.join(triplets)
print(result)
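The reverse-then-group idea from the answers above can also be written as a short function. This is only a sketch: add_commas is a name chosen here for illustration, and it assumes the input is a non-empty string of digits with no sign or decimal point.

```python
def add_commas(digits):
    """Insert a comma between groups of three digits, counting from the right."""
    reversed_digits = digits[::-1]
    # Cut the reversed string into chunks of three, so that any short
    # group ends up at the front once the result is reversed back.
    groups = [reversed_digits[i:i + 3] for i in range(0, len(reversed_digits), 3)]
    return ",".join(groups)[::-1]


for sample in ("12421421", "1234", "123"):
    # Compare against the built-in thousands-separator formatting.
    print(add_commas(sample), add_commas(sample) == format(int(sample), ","))
```

Each line printed should end in True, since format(int(sample), ",") groups digits the same way.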

python operators = and == counter

I wrote a .txt file containing the operators = and ==, and some code to count the number of = and == in it, but I don't get the correct counts.
lexicalClass = file.readlines()
for lex in lexicalClass:
    newList = re.findall('\S+', lex)
    for element in newList:
        if len(re.findall('[a-z]+|[0-9]+', element)):
            identifiers.append(re.findall('[a-z]+|[0-9]+', element))
            num = len(re.findall('\=', element))
            if int(num):
                if int(num) % 2 == 1:
                    for i in range(int((num-1)/2)):
                        equal.append('==')
                    assignment.append('=')
                else:
                    for i in range(int(num/2)):
                        equal.append('==')
print(str(len(equal)))
print(str(len(assignment)))
My .txt file: a == b a = b c = d
As you can see, my output should be 1 and 2, but I'm getting 0 for both.
You could probably do this with lookahead and lookbehind assertions:
import re
one_equals = r"(?<!=)=(?!=)" # a "=" not followed or preceded by a =
two_equals = r"(?<!=)==(?!=)" # "==" not followed or preceded by a =
assignment = 0
equals = 0
with open("yourfilename.txt") as f:
    for line in f:
        assignment += len(re.findall(one_equals, line))
        equals += len(re.findall(two_equals, line))
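A small self-contained check of the lookaround patterns, using an in-memory string in place of the file. The sample text is the one from the question, assumed here to be three separate lines; the variable names are chosen for this sketch.

```python
import re

# "=" that is neither preceded nor followed by another "="
ONE_EQUALS = re.compile(r"(?<!=)=(?!=)")
# "==" that is neither preceded nor followed by another "="
TWO_EQUALS = re.compile(r"(?<!=)==(?!=)")

# Assumed layout of the question's file: one statement per line.
sample = "a == b\na = b\nc = d\n"

assignments = 0
equalities = 0
for line in sample.splitlines():
    assignments += len(ONE_EQUALS.findall(line))
    equalities += len(TWO_EQUALS.findall(line))

print(equalities)   # 1
print(assignments)  # 2
```

Note that the single-equals pattern would also count the = inside operators such as !=, <= and >=, so this is only reliable for input that contains nothing but = and ==.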
If this is Python source code, the correct way to do this is with the ast module, using ast.walk() and counting instances of the ast.Assign and ast.Eq nodes:
import ast
with open("yourfilename.txt") as f:
    parsed_source = ast.parse(f.read())
nodes = list(ast.walk(parsed_source))
equals = sum(isinstance(n, ast.Eq) for n in nodes)
assignments = sum(isinstance(n, ast.Assign) for n in nodes)
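To try the ast approach without a file, the same counting can be run on a string. This sketch assumes the text is valid Python source with one statement per line, which is how the question's sample is interpreted here:

```python
import ast

# Assumed layout of the question's file: one statement per line.
source = "a == b\na = b\nc = d\n"

nodes = list(ast.walk(ast.parse(source)))
equals = sum(isinstance(node, ast.Eq) for node in nodes)
assignments = sum(isinstance(node, ast.Assign) for node in nodes)

print(equals, assignments)  # 1 2
```

Keep in mind that this counts syntax nodes rather than characters: a chained assignment such as x = y = 0 is a single ast.Assign node, and an augmented assignment such as x += 1 is an ast.AugAssign node, not an ast.Assign.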
If you don't really care about the efficiency of the algorithm, this is a fairly simple solution:
file = open("asd.txt")
total_double_eq_count = 0
total_single_eq_count = 0
#iterate over the lines of file
for line in file:
    #count of '=='s in the line
    double_eq_count = line.count("==")
    #count of '='s which are not part of an '=='
    single_eq_count = line.count("=") - 2*double_eq_count
    total_double_eq_count += double_eq_count
    total_single_eq_count += single_eq_count
file.close()
print(total_double_eq_count)
print(total_single_eq_count)
Even so, it is relatively fast compared with equivalent hand-written Python code, since it uses built-in methods for the string processing, at least on small inputs.

combinations with python

I am trying to generate combinations of an ID.
Input: cid = SPARK
Output: a list of all the combinations shown below; the position of each element should stay constant. I am a beginner in Python, and any help here is much appreciated.
'S****'
'S***K'
'S**R*'
'S**RK'
'S*A**'
'S*A*K'
'S*AR*'
'S*ARK'
'SP***'
'SP**K'
'SP*R*'
'SP*RK'
'SPA**'
'SPA*K'
'SPAR*'
'SPARK'
I tried the code below, but I need something dynamic:
cid = 'SPARK'
# print(cid.replace(cid[1],'*'))
# cu_len = length of cid [SPARK] here which is 5
# com_stars = how many stars i.e '*' or '**'
def cubiod_combo_gen(cu_len, com_stars, j_ite, i_ite):
    cubiodList = []
    crange = cu_len
    i = i_ite #2 #3
    j = j_ite #1
    # com_stars = ['*','**','***','****']
    while( i <= crange):
        # print(j,i)
        if len(com_stars) == 1:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
        elif len(com_stars) == 2:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
        elif len(com_stars) == 3:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
    return cubiodList
    #print(i)
    #print(n_cid)
# for item in cubiodList:
#     print(item)
print(cubiod_combo_gen(5,'*',1,2))
print(cubiod_combo_gen(5,'**',1,3))
You can represent each candidate as a binary string with one bit per character of the input: a 1 for a character that stays the same and a 0 for a character to replace with an asterisk.
def cubiod_combo_gen(string, count_star):
    str_list = [char0 for char0 in string] # a list with the characters of the string
    itercount = 2 ** (len(str_list)) # 2 to the power of the length of the input string
    results = []
    for config in range(itercount):
        # get config as a string in binary representation
        binary_repr = bin(config)[2:]
        while len(binary_repr) < len(str_list):
            binary_repr = '0' + binary_repr # add padding
        # construct a list with asterisks
        i = -1
        result_list = str_list.copy() # shallow copy, this made me spend like 10 minutes debugging lol
        for char in binary_repr:
            i += 1
            if char == '0':
                result_list[i] = '*'
            if char == '1':
                result_list[i] = str_list[i]
        # now we have a possible string value
        if result_list.count('*') == count_star:
            # convert back to string and add to list of accepted strings
            result = ''
            for i in result_list:
                result = result + i
            results.append(result)
    return results
# this function returns the value, so you have to use `print(cubiod_combo_gen(args))`
# comment this stuff out if you don't want an interactive user prompt
string = input('Enter a string : ')
count_star = input('Enter number of stars : ')
print(cubiod_combo_gen(string, int(count_star)))
For a 16-character input it takes about 4 seconds, and for 18 characters about 17 seconds. Also, you made a typo in "cuboid", but I kept the original spelling.
Enter a string : DPSCT
Enter number of stars : 2
['**SCT', '*P*CT', '*PS*T', '*PSC*', 'D**CT', 'D*S*T', 'D*SC*', 'DP**T', 'DP*C*', 'DPS**']
As a side effect of this binary counting, the list is ordered by the asterisks, where the earliest asterisk takes precedence, with next earliest asterisks breaking ties.
If you want the results for several star counts, like 1, 4, 5, and 6 asterisks from for example "ABCDEFG", you can use something like
star_counts = (1, 4, 5, 6)
string = 'ABCDEFG'
for i in star_counts:
    print(cubiod_combo_gen(string, i))
If you want the nice formatting you have in your question, try adding this block at the end of your code:
def formatted_cuboid(string, count_star):
    values = cubiod_combo_gen(string, count_star)
    for value in values:
        print(value)
I honestly do not know what your j_ite and i_ite are, but it seems like they have no use so this should work. If you still want to pass these arguments, change the first line to def cubiod_combo_gen(string, count_star, *args, **kwargs):
I am not sure what com_stars does, but the following code produces your sample output.
def cuboid_combo(cid):
    fill_len = len(cid)-1
    items = []
    for i in range(2 ** fill_len):
        # i as a zero-padded binary string, one bit per character after the first
        binary = f'{i:0{fill_len}b}'
        #print(binary, 'binary', 'num', i)
        s = cid[0]
        for idx, bit in enumerate(binary,start=1):
            if bit == '0':
                s += '*'
            else: # bit == '1'
                s += cid[idx]
        items.append(s)
    return items
#cid = 'ABCDEFGHI'
cid = 'DPSCT'
result = cuboid_combo(cid)
for item in result:
    print(item)
Prints:
D****
D***T
D**C*
D**CT
D*S**
D*S*T
D*SC*
D*SCT
DP***
DP**T
DP*C*
DP*CT
DPS**
DPS*T
DPSC*
DPSCT
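Both approaches can also be expressed with the standard itertools module. This is a sketch rather than a drop-in replacement; the function names mask_with_stars and mask_keep_first are made up for this example.

```python
from itertools import combinations, product


def mask_with_stars(text, star_count):
    """Return every variant of text with exactly star_count characters starred."""
    results = []
    # Choose which positions receive an asterisk.
    for star_positions in combinations(range(len(text)), star_count):
        chars = list(text)
        for position in star_positions:
            chars[position] = "*"
        results.append("".join(chars))
    return results


def mask_keep_first(text):
    """Return every variant where each character after the first is kept or starred."""
    # For each later position, the choice is either '*' or the original character.
    choices = [("*", char) for char in text[1:]]
    return [text[0] + "".join(combo) for combo in product(*choices)]


print(mask_with_stars("DPSCT", 2))
for item in mask_keep_first("SPARK"):
    print(item)
```

mask_with_stars only generates the strings it returns, instead of testing all 2**n bit patterns and filtering them, which matters for longer inputs.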

Python - Split a string into list after a certain number of special characters

I have a Python program which makes a SOAP request to a server, and it works fine:
I get the answer from the server, parse it, clean it, and when I am done, I end up with a string like this:
name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|...
Basically, it is a string with values delimited by "|". I also know the structure of the database I am querying, so I know that it has 6 columns and multiple rows. I need to split the string after every 6th "|" character, to obtain something like:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|...
Can you tell me how to do that in Python? Thank you!
Here's a functional-style solution.
s = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
for row in map('|'.join, zip(*[iter(s.split('|'))] * 6)):
    print(row + '|')
Output:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
For info on how zip(*[iter(seq)] * rowsize) works, please see the links at Splitting a list into even chunks.
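To illustrate what zip(*[iter(seq)] * n) does: the list holds n references to the same iterator, so each tuple that zip builds consumes n consecutive items. A small sketch with made-up data:

```python
fields = ["a", "b", "c", "d", "e", "f", "g"]

shared = iter(fields)
# Three references to ONE iterator: zip pulls from it three times per tuple.
rows = list(zip(shared, shared, shared))
print(rows)  # [('a', 'b', 'c'), ('d', 'e', 'f')]

# The compact spelling is equivalent.
compact = list(zip(*[iter(fields)] * 3))
print(compact == rows)  # True
```

The leftover item 'g' is dropped because zip stops at the first incomplete group. That is also why the empty string produced by the trailing "|" in the original data does not create an extra row. If incomplete groups must be kept, itertools.zip_longest can be used instead.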
data = "name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|"
splits = data.split('|')
splits = list(filter(None, splits)) # Filter empty strings
row_len = 6
rows = ['|'.join(splits[i:i + row_len]) + '|' for i in range(0, len(splits), row_len)]
print(rows)
>>> ['name|value|value_name|default|seq|last_modify|', 'record_type|1|Detail|0|0|20150807115904|', 'zero_out|0|No|0|0|20150807115911|', 'out_ind|1|Partially ZeroOut|0|0|20150807115911|']
How about this:
a = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
b = a.split('|')
c = [b[6*i:6*(i+1)] for i in range(len(b)//6)] # this is a very workable form of data storage
print('\n'.join('|'.join(i) for i in c)) # produces your desired output
# prints:
# name|value|value_name|default|seq|last_modify
# record_type|1|Detail|0|0|20150807115904
# zero_out|0|No|0|0|20150807115911
# out_ind|1|Partially ZeroOut|0|0|20150807115911
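Here is a flexible generator approach:
def splitOnNth(s,d,n, keep = False):
    # yield pieces of s, cutting at every n-th occurrence of delimiter d;
    # keep=True leaves the delimiter at the end of each piece
    i = s.find(d)
    j = 1
    while True:
        # advance i to the n-th delimiter (i >= 0 so a delimiter at index 0 counts)
        while i >= 0 and j%n != 0:
            i = s.find(d,i+1)
            j += 1
        if i < 0:
            yield s
            return #end generator
        else:
            yield s[:i+1] if keep else s[:i]
            s = s[i+1:]
            i = s.find(d)
            j = 1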
#test runs, showing `keep` in action:
test = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
for s in splitOnNth(test,'|',6,True): print(s)
print('')
for s in splitOnNth(test,'|',6): print(s)
Output:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
name|value|value_name|default|seq|last_modify
record_type|1|Detail|0|0|20150807115904
zero_out|0|No|0|0|20150807115911
out_ind|1|Partially ZeroOut|0|0|20150807115911
There are really many ways to do it. Even with a loop:
a = 'name|value|value_name|default|seq|last_modify|record_type|1|Detail|0|0|20150807115904' \
'|zero_out|0|No|0|0|20150807115911|out_ind|1|Partially ZeroOut|0|0|20150807115911|'
new_a = []
ind_start, ind_end = 0, 0
for _ in range(a.count('|') // 6):
    # advance ind_end to the 6th '|' after the start of the current row
    for _ in range(6):
        ind_end = a.index('|', ind_end+1)
    print(a[ind_start:ind_end + 1])
    new_a.append(a[ind_start:ind_end+1])
    ind_start = ind_end+1
The print is only there to show the results; you can remove it:
name|value|value_name|default|seq|last_modify|
record_type|1|Detail|0|0|20150807115904|
zero_out|0|No|0|0|20150807115911|
out_ind|1|Partially ZeroOut|0|0|20150807115911|
