Insert value at multiple specified positions into a Python string/array

I would like to insert values at multiple specified positions into a python string/array.
e.g. for my input string: SARLSAMLVPVTPEVKPK
at specified positions: 1,5,12
the desired output: S*ARLS*AMLVPVT*PEVKPK
I tried:
seq="SARLSAMLVPVTPEVKPK" #string
pos=[1,5,12] #positions
arr=list(seq) #convert string to array
arr.insert(pos,"*") # does NOT work!
arr.insert(pos[0],"*")
print(''.join(arr))
It seems I can only insert at one position at a time, so after each insert the indices of the remaining positions would have to change.
Is there an elegant way of doing this, or would I have to loop through the insert positions, adding +1 for each insert already made?
I hope this makes sense!
Many thanks,
Curly.

Just insert them in reverse order:
seq="SARLSAMLVPVTPEVKPK" #string
pos=[1,5,12] #positions
arr = list(seq)
for idx in sorted(pos, reverse=True):
    arr.insert(idx,"*")
print(''.join(arr))
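For comparison, the forward-order idea from the question also works: sort the positions, then add the number of inserts already made to each index (enumerate() supplies that count). A minimal sketch using the question's seq and pos:

```python
seq = "SARLSAMLVPVTPEVKPK"
pos = [1, 5, 12]
arr = list(seq)
# enumerate() counts the inserts already made; each one shifts later indices by 1
for offset, idx in enumerate(sorted(pos)):
    arr.insert(idx + offset, "*")
print("".join(arr))  # S*ARLS*AMLVPVT*PEVKPK
```

Inserting in reverse order avoids the offset bookkeeping entirely, which is why it is usually the simpler choice.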

Something like this would do (pos[::-1] assumes pos is sorted in ascending order):
seq="SARLSAMLVPVTPEVKPK" #string
pos=[1,5,12] #positions
arr=list(seq) #convert string to array
list(map(lambda k: arr.insert(k, "*"), pos[::-1]))  # list() forces the lazy map() in Python 3
print(''.join(arr))
or
seq="SARLSAMLVPVTPEVKPK" #string
pos=[1,5,12] #positions
arr=list(seq) #convert string to array
for k in pos[::-1]:
    arr.insert(k, "*")
print(''.join(arr))

Simple Way:
temp = ""
temp += seq[:pos[0]]
temp += "*"
for i in range(1,len(pos)):
    temp += seq[pos[i-1]:pos[i]]
    temp += "*"
temp += seq[pos[-1]:]
print (temp) # 'S*ARLS*AMLVPVT*PEVKPK'
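The same slicing idea can be written more compactly with str.join: pair each position with the next one, slice the string between them, and join the slices with "*". A sketch that, like the loop above, assumes pos is sorted in ascending order:

```python
seq = "SARLSAMLVPVTPEVKPK"
pos = [1, 5, 12]
# Pair each cut point with the next one: (0, 1), (1, 5), (5, 12), (12, None)
bounds = zip([0] + pos, pos + [None])
result = "*".join(seq[start:end] for start, end in bounds)
print(result)  # S*ARLS*AMLVPVT*PEVKPK
```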

Related

How to convert numeric string from a sublist in Python

I'm a freshie. I would like to convert the numeric strings in my sublists into ints in Python, but I'm not getting the right results. 😔
countitem = 0
list_samp = [['1','2','blue'],['1','66','green'],['1','88','purple']]
for list in list_samp:
    countitem =+1
    for element in list:
        convert_element = int(list_samp[countitem][0])
        list_samp[countitem][1] = convert_element
You can do it like this:
list_samp = [['1','2','blue'],['1','66','green'],['1','88','purple']]
me = [[int(u) if u.isdecimal() else u for u in v] for v in list_samp]
print(me)
The correct way to do it:
list_samp = [['1','2','blue'],['1','66','green'],['1','88','purple']]
list_int = [[int(i) if i.isdecimal() else i for i in l] for l in list_samp]
print(list_int)
Let's go through the process step by step:
list_samp = [['1','2','blue'],['1','66','green'],['1','88','purple']]
# Let's traverse through the list
for sublist in list_samp:  # gives each sublist
    for i in range(len(sublist)):  # get the index of each element in the sublist
        if sublist[i].isnumeric():  # check whether all characters in the string are numeric
            sublist[i] = int(sublist[i])  # store the converted integer at index i
print(list_samp)
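The isdecimal() and isnumeric() checks above differ on edge cases: '-3'.isdecimal() is False, so a negative number stays a string, and '²'.isnumeric() is True even though int('²') raises ValueError. A sketch that lets int() itself decide, using try/except (to_int_if_possible is a made-up helper name, and it assumes every element is a string):

```python
def to_int_if_possible(value):
    """Return int(value) when value parses as an integer, otherwise value unchanged."""
    try:
        return int(value)
    except ValueError:
        return value

list_samp = [['1', '2', 'blue'], ['-3', '66', 'green'], ['1', '²', 'purple']]
converted = [[to_int_if_possible(u) for u in sublist] for sublist in list_samp]
print(converted)  # [[1, 2, 'blue'], [-3, 66, 'green'], [1, '²', 'purple']]
```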

specific characters printing with Python

Given a string like the one below,
"[xyx],[abc].[cfd],[abc].[dgr],[abc]"
how can I print it as shown below?
1.[xyx]
2.[cfd]
3.[dgr]
The original string will always maintain the above-mentioned format.
I did not realize you had periods and commas... that adds a bit of trickery. You have to split on the periods too.
I would use something like this...
list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 0
for i in list_to_parse.split('.'):
    # each piece looks like "[xyx],[abc]"; [:1] keeps only the group before the comma
    for j in i.split(',')[:1]:
        count += 1
        print(str(count) + "." + j)
Another option is to split on the left bracket, re-add it with enumerate, and then strip the trailing commas and periods. This is probably a tiny bit faster too, since it's not a loop inside a loop:
list_to_parse = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
# split('[') gives ['', 'xyx],', 'abc].', 'cfd],', ...]; [1::2] skips the blank first item and every [abc]
for index, i in enumerate(list_to_parse.split('[')[1::2], 1):
    print(str(index) + ".[" + i.rstrip(',.'))
Also, the argument to rstrip() is really "which characters to remove", not a specific pattern: you can list any characters you want removed from the right, and it keeps stripping until it hits a character that isn't in that set. There are also lstrip() and strip().
String manipulation can always get tricky, so pay attention: split('[') produces a blank first item, which has to be skipped. Always practice and learn your needs :D
You can use the split() method:
a = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
desired_strings = [i.split(',')[0] for i in a.split('.')]
for i,string in enumerate(desired_strings):
    print(f"{i+1}.{string}")
This is just a fun way to solve it:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
count = 1
var = 1
for char in range(0, len(lst), 6):
    if var % 2:
        print(f"{count}.{lst[char:char + 5]}")
        count += 1
    var += 1
output:
1.[xyx]
2.[cfd]
3.[dgr]
Explanation: "[" appears at indexes 0, 6, 12, and so on. var is used to skip every other group (the [abc] ones), and count is the counter for the printed lines.
We can condense the above code by using a list comprehension and slicing instead of those flag variables, which is more Pythonic:
lst = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
lst = [lst[i:i+5] for i in range(0, len(lst), 6)][::2]
res = (f"{i}.{item}" for i, item in enumerate(lst, 1))
print("\n".join(res))
You can use a regex:
import re  # the standard-library re module is enough; the third-party regex package would also work
pattern=r"(\[[a-zA-Z]*\])\,\[[a-zA-Z]*\]\.?"
results=re.findall(pattern, '[xyx],[abc].[cfd],[abc].[dgr],[abc]')
print(results)
Using re.findall:
import re
s = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
print('\n'.join(f'{i+1}.{x}' for i,x in
                enumerate(re.findall(r'(\[[^]]+\])(?=,)', s))))
Output:
1.[xyx]
2.[cfd]
3.[dgr]
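The two slicing answers rely on every group being exactly five characters wide. A sketch that drops that assumption, assuming instead that the wanted group is always the first of each pair (so the wanted groups sit at even positions in the list of all groups):

```python
import re

s = "[xyx],[abc].[cfd],[abc].[dgr],[abc]"
# Grab every bracketed group, whatever its length, then keep every other one
groups = re.findall(r"\[[^\]]*\]", s)
wanted = groups[::2]
for i, group in enumerate(wanted, 1):
    print(f"{i}.{group}")
```

This prints the same three lines as the answers above.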

How to separate different input formats from the same text file with Python

I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same text file. For example, let's say I have an input file like this, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
The format is N followed by N lines of Data1, then M followed by M lines of Data2. I tried opening the file, reading it line by line, and storing everything in a single list, but I'm not sure how to produce two lists, Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove it from the list, and store the next i values in a separate list, then repeat when I reached the next integer. However, this would destroy my initial list. Is there a better way to separate the two data formats into different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]
print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []
with open("data.txt", "r") as f:
    lines = list(f)  # read every line, newline characters included
originalData = lines  # same list object, so the edits below modify it in place
i = 0
while i < len(lines):  # Iterate through every line
    try:
        n = int(lines[i])  # See if line can be cast to an integer
        originalData[i] = n  # Change string to int in original
        formattedData.append([])
        for j in range(n):
            i += 1
            item = lines[i].replace('\n', '')
            originalData[i] = item  # Remove newline char in original
            formattedData[-1].append(item)
    except ValueError:
        print("File has incorrect format")
    i += 1
print(originalData)
print(formattedData)
The following code produces a list, results, that is equal to [Data1, Data2].
The code assumes that each count is exactly the number of entries that follow it, so it will not work for a file like this:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()
results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
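Another option is to walk a single iterator: read a count line, then take exactly that many lines with itertools.islice. A minimal sketch, where read_sections is a hypothetical helper name; because a file object is itself an iterator over lines, the same function should work on an open file without reading it all into memory first. It assumes every section starts with a count line, and a count that doesn't match the data makes int() or the length check raise ValueError instead of silently misreading the file:

```python
import io
from itertools import islice

def read_sections(lines):
    """Yield one list per section: a count line followed by that many data lines."""
    it = iter(lines)  # a file object is already an iterator; iter() just returns it
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between sections
        count = int(header)  # ValueError if the line isn't a number
        # islice pulls exactly `count` lines from the same iterator the for loop uses
        section = [line.rstrip("\n") for line in islice(it, count)]
        if len(section) != count:
            raise ValueError(f"expected {count} lines, got {len(section)}")
        yield section

text = """5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
# io.StringIO stands in for open("data.txt") in this example
Data1, Data2 = read_sections(io.StringIO(text))
print(Data1)
print(Data2)
```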

Add string in a certain interval position in Python

I have a string like this: 718868538ddwe. I want to insert a backslash ("\") at intervals of 3.
I need output like this: 718\868\538\ddw\e.
You can use str.join with a list comprehension:
x = '718868538ddwe'
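# step through the string 3 characters at a time; no empty chunk is produced when len(x) is a multiple of 3
res = '\\'.join([x[i:i + 3] for i in range(0, len(x), 3)])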
print(res)
# 718\868\538\ddw\e
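def chunks(input_str):
    # yield successive 3-character slices until the string is used up
    current = input_str
    while current:
        chunk, current = current[:3], current[3:]
        yield chunk
input_str = '718868538ddwe'
result = '\\'.join(chunks(input_str))
print(result)  # 718\868\538\ddw\e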
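Both answers above cut the string into fixed-size pieces and join them; a regular expression can do the cutting too. A sketch, where insert_every is a hypothetical helper name and sep and n default to the question's backslash and interval of 3:

```python
import re

def insert_every(s, sep="\\", n=3):
    # ".{1,n}" greedily takes up to n characters, so only the final chunk can be
    # shorter; re.DOTALL lets "." match newline characters as well
    chunks = re.findall(".{1,%d}" % n, s, re.DOTALL)
    return sep.join(chunks)

print(insert_every("718868538ddwe"))  # 718\868\538\ddw\e
```

Because findall never returns an empty chunk, a string whose length is a multiple of n gets no trailing separator.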

Concatenate strings if they have an overlapping region

I am trying to write a script that will find strings that share an overlapping region of 5 letters at the beginning or end of each string (shown in the example below).
facgakfjeakfjekfzpgghi
pgghiaewkfjaekfjkjakjfkj
kjfkjaejfaefkajewf
I am trying to create a new string which concatenates all three, so the output would be:
facgakfjeakfjekfzpgghiaewkfjaekfjkjakjfkjaejfaefkajewf
Edit:
This is the input:
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
The list is not ordered.
What I've written so far (but it is not correct):
def findOverlap(seq)
    i = 0
    while i < len(seq):
        for x[i]:
            #check if x[0:5] == [:5] elsewhere
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
findOverlap(x)
Create a dictionary mapping the first 5 characters of each string to its tail:
strings = {s[:5]: s[5:] for s in x}
and a set of all the suffixes:
suffixes = set(s[-5:] for s in x)
Now find the string whose prefix does not match any suffix:
prefix = next(p for p in strings if p not in suffixes)
Now we can follow the chain of strings:
result = [prefix]
while prefix in strings:
    result.append(strings[prefix])
    prefix = strings[prefix][-5:]
print("".join(result))
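The same dictionary idea can be packaged as a self-contained function. A sketch, where chain_overlaps and the overlap length k are illustrative names; it assumes the strings form a single chain with exactly one starting string, and it keeps whole strings in the dictionary so it doesn't depend on every tail being at least 5 characters long:

```python
def chain_overlaps(parts, k=5):
    """Join strings whose last k characters equal the next string's first k."""
    by_prefix = {s[:k]: s for s in parts}
    suffixes = {s[-k:] for s in parts}
    # the first string is the one whose prefix is not any string's suffix
    current = next(s for s in parts if s[:k] not in suffixes)
    pieces = [current]
    used = {current}
    # follow the chain, guarding against cycles with the `used` set
    while current[-k:] in by_prefix and by_prefix[current[-k:]] not in used:
        current = by_prefix[current[-k:]]
        pieces.append(current[k:])  # drop the overlapping k characters
        used.add(current)
    return "".join(pieces)

x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
print(chain_overlaps(x))
```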
A brute-force approach: try every ordering (permutation) and return the first one in which each string's last 5 characters match the next string's first 5:
from itertools import permutations
def solution(x):
    for perm in permutations(x):
        # keep each string minus its last 5 characters when it links to the next one
        linked = [perm[i][:-5] for i in range(len(perm) - 1)
                  if perm[i][-5:] == perm[i + 1][:5]]
        if len(perm) - 1 == len(linked):  # every adjacent pair linked
            return "".join(linked) + perm[-1]
    return None
x = ('facgakfjeakfjekfzpgghi', 'kjfkjaejfaefkajewf', 'pgghiaewkfjaekfjkjakjfkj')
print(solution(x))
Loop over each pair of candidates, reverse the second string and use the answer from here
