pattern finding in a string python - python

I'm trying to create a modified LZW that finds patterns of words inside a string. My problems are that the 1st element of the list is '' and that the last element is never checked against the list. I took the pseudo-code from here: https://www.cs.duke.edu/csed/curious/compression/lzw.html . Here is my script for compression:
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string.split():
    print (c)
    print (x)
    #x = x + " " + c
    if x in diction:
        x += " " + c
        #print("debug")
    else:
        #print(x)
        diction.append(x)
        x = c
        count +=1
        #print(count)
print (diction)
I tried to fix the 2nd problem by 'appending' a random word to the end of the string but I don't think that's the best solution.
For the 1st problem I tried defining the variable "x" as str or None, but then I get <class 'str'> inside the list.

The link deals with characters, whereas splitting a string gives a list of words.
To keep the empty string out of the dictionary and to process the last element, test x + " " + c instead of x:
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string.split():
    print (c)
    if x+" "+c in diction:
        x += " " + c
    else:
        diction.append(x+" "+c)
        x = c
        count +=1
print (diction)
But perhaps you would like something that works on characters, like the link does:
string = 'this is a test a test this is pokemon'
diction = []
x = ""
count = 0
for c in string:
    print (c)
    if x+c in diction:
        x += c
    else:
        diction.append(x+c)
        x = c
        count +=1
print (diction)

I'm not sure what the code is meant to do, but to fix the issues that you mentioned I think you could do this:
string = 'this is a test a test this is pokemon'
diction = []
x = None
count = 0
for c in string.split():
    if x in diction:
        x += " " + c
    else:
        if x: diction.append(x)
        x = c
        count += 1
if x not in diction: diction.append(x)
print (diction)
The output for that code would be:
['this', 'is', 'a', 'test', 'a test', 'this is', 'pokemon']
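For comparison, here is a minimal sketch of the same idea wrapped in a function, with words as the symbols. It is not the linked page's exact algorithm: classic LZW preloads every single symbol into the dictionary, which this sketch does not do. The name build_phrase_dictionary is made up for illustration.

```python
def build_phrase_dictionary(text):
    """Collect word phrases LZW-style.

    Assumption: words (not characters) are the symbols, and the
    dictionary starts empty instead of being preloaded.
    """
    phrases = []      # phrases in the order they were first seen
    seen = set()      # same phrases, for fast membership tests
    current = ""      # phrase being extended
    for word in text.split():
        # Avoid a leading space when there is no current phrase yet.
        candidate = f"{current} {word}" if current else word
        if candidate in seen:
            # Known phrase: keep extending it with the next word.
            current = candidate
        else:
            # New phrase: record it and restart from this word.
            seen.add(candidate)
            phrases.append(candidate)
            current = word
    return phrases


print(build_phrase_dictionary("this is a test a test this is pokemon"))
```

Because the candidate phrase is tested instead of the current one, the list never contains an empty string, and the last word is always part of the final candidate.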

Related

How can I fix this error for popping a word in a list/string? (Python 3.x)

I'm not exactly the kind of guy you'd call "good" at coding. In this particular scenario, on line 13, I'm trying to pop the first word in the list until I'm done, but I keep getting the error "'str' object cannot be interpreted as an integer".
What am I doing wrong here?
n = n.split(" ")
N = n[0]
K = n[1]
f1 = input()
f1 = f1.split(" ")
f1 = list(f1)
current = 0
for x in f1:
    while current <= 7:
        print(x)
        f1 = list(f1.pop()[0])
        current = current + len(x)
    if current > 7:
        print("\n")
        current = 0
According to your comments, this program will split the text into lines of at most K characters (spaces are not counted):
K = 7
s = "hello my name is Bessie and this is my essay"
out, cnt = [], 0
for word in s.split():
    l = len(word)
    if cnt + l <= K:
        cnt += l
        if not out:
            out.append([word])
        else:
            out[-1].append(word)
    else:
        cnt = l
        out.append([word])
print("\n".join(" ".join(line) for line in out))
Prints:
hello my
name is
Bessie
and this
is my
essay
You could try splitting the string on the index, and inserting a newline there. Each time you do this your string gets one character longer, so we can use enumerate (which starts counting at zero) to add a number to our slice indexes.
s = 'Thanks for helping me'
new_line_index = [7,11, 19]
for i, x in enumerate(new_line_index):
    s = s[:x+i] + '\n' + s[x+i:]
print(s)
Output
Thanks
for
helping
me
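The word-by-word wrapping from the first answer can also be written as a reusable function. This is only a sketch: the name wrap_words and the count_spaces switch are made up for illustration and do not come from the question. With count_spaces=False it follows the first answer's rule (only letters count towards the limit); with count_spaces=True the separating spaces count too.

```python
def wrap_words(text, max_chars, count_spaces=False):
    """Greedily pack words into lines of at most max_chars characters.

    Assumption: a word longer than max_chars simply gets a line of its own.
    """
    lines = []
    current = []   # words on the line being built
    used = 0       # characters used so far on that line
    for word in text.split():
        # Cost of adding this word, plus one separator if spaces count.
        extra = len(word) + (1 if count_spaces and current else 0)
        if current and used + extra > max_chars:
            # The word does not fit: close the line and start a new one.
            lines.append(" ".join(current))
            current, used = [word], len(word)
        else:
            current.append(word)
            used += extra
    if current:
        lines.append(" ".join(current))
    return lines


essay = "hello my name is Bessie and this is my essay"
print("\n".join(wrap_words(essay, 7)))
```

If spaces should count and no custom rule is needed, the standard library's textwrap module is usually the simpler choice.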

Python: how to replace a substring with the number of its occurrences?

Let's say I have a string presented in the following fashion:
st = 'abbbccccaaaAAbccc'
The task is to encode it so that each character is followed by the number of its consecutive occurrences:
st = 'a1b3c4a3A2b1c3'
I know one possible solution but it's too bulky and primitive.
s = str(input())
l = len(s)-1
c = 1
t = ''
if len(s)==1:
    t = t +s+str(c)
else:
    for i in range(0,l):
        if s[i]==s[i+1]:
            c +=1
        elif s[i]!=s[i+1]:
            t = t + s[i]+str(c)
            c = 1
    for j in range(l,l+1):
        if s[-1]==s[-2]:
            t = t +s[j]+str(c)
        elif s[-1]!=s[-2]:
            t = t +s[j]+str(c)
            c = 1
print(t)
Is there a shorter, more elegant way to solve this?
P.S.: I'm an inexperienced Python user and a new StackOverflow member, so I beg your pardon if the question is asked incorrectly.
Take advantage of the standard library:
from itertools import groupby
st = "abbbccccaaaAAbccc"
print("".join("{}{}".format(key, len(list(group))) for key, group in groupby(st)))
Output:
a1b3c4a3A2b1c3
Just loop through and count. There are more graceful snippets, but this will get the job done and is clear:
count = 1
char = st[0]
new_st = []
for c in st[1:]:
    if c == char:
        count += 1
    else:
        new_st.append(char + str(count))
        char = c
        count = 1
new_st.append(char + str(count))
s2= "".join(new_st)
print(s2) # 'a1b3c4a3A2b1c3'
If you want a fancy recursive solution:
def compose(s):
    if not s:
        return ""
    count = 1
    for char in s[1:]:
        if s[0] != char:
            break
        count += 1
    return s[0] + str(count) + compose(s[count:])
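To check that an encoding like this is reversible, a decoder can be paired with the groupby version. This is a sketch under one loud assumption: the original text contains no digits, otherwise the counts become ambiguous. The names rle_encode and rle_decode are made up for illustration.

```python
import re
from itertools import groupby


def rle_encode(text):
    """Encode each run of equal characters as <char><run length>."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded):
    """Invert rle_encode. Assumes the original text contained no digits."""
    # Each match is one non-digit character followed by its decimal count.
    pairs = re.findall(r"(\D)(\d+)", encoded)
    return "".join(char * int(count) for char, count in pairs)


st = "abbbccccaaaAAbccc"
encoded = rle_encode(st)
print(encoded)               # a1b3c4a3A2b1c3
print(rle_decode(encoded))   # abbbccccaaaAAbccc
```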

How to concatenate line space string in python

I have text in Python that is spread over separate lines, and I want to turn it into a single string with a literal "\n" after each item. When I write this code
txt="""a
b
c
d"""
txt = str.join(" ", txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\ n"
print(s)
it gives me the response I expect, but only because I put a space between \ and n:
a\ nb\ nc\ nd\ n
But if I remove the space between \ and n, I get this response back:
a
b
c
d
I want the response as one blob: a single line of text with a literal \n after each item.
Great, thank you. I just had to use two backslashes (\\) and it worked.
txt="""a
b
c
d"""
txt = str.join(" ", txt.splitlines())
x = txt.split()
s = ""
for item in x:
    s += item + "\\n"
print(s)
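The same literal backslash-n can also be produced with a raw string and str.join. This is a small sketch, not the asker's exact code: unlike the loop above, join puts the separator only between items, so there is no trailing \n.

```python
txt = "a\nb\nc\nd"          # same four lines as above, written with real newlines
items = txt.splitlines()     # ['a', 'b', 'c', 'd']

# "\\n" and r"\n" are the same two-character string: a backslash and an n.
literal = "\\n".join(items)
raw = r"\n".join(items)

print(literal)               # a\nb\nc\nd
print(literal == raw)        # True
```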

How to use a dictionary for the following string?

I am developing a script that counts the elements of a given sequence. I already found a way to do this, but I was wondering whether it is possible to use a dictionary, and how to report the letters in the string that are not nucleotides.
For instance:
sequence = str(input('Enter DNA sequence:'))
print ('Your sequence contain:',len(sequence), 'bases', 'with the following structure:')
adenine = sequence.count("A") + sequence.count("a")
thymine = sequence.count("T") + sequence.count("t")
cytosine = sequence.count("C") + sequence.count("c")
guanine = sequence.count ("G") + sequence.count("g")
print("adenine =", adenine)
print("thymine=", thymine)
print("cytosine=", cytosine)
print("guanine=", guanine)
I was thinking in a dictionary like this:
dicc = {adenine:["A","a"], thymine:["T" ,"t"],
cytosine:["C","c"], guanine:["G","g"]
}
But I don't know how to print the letters that are not nucleotides when they appear in the sequence. For instance, for the following sequence the result should be something like this:
sequence = AacGTtxponwxs:
your sequence contain 13 bases with the following structure:
adenine = 2
thymine = 2
cytosine = 1
guanine = 1
p is not a DNA value
x is not a DNA value
o is not a DNA value
n is not a DNA value
w is not a DNA value
s is not a DNA value
Using collections.Counter (which is a dict-like class), you can be more DRY:
from collections import Counter
sequence = 'AacGTtxponwxs'
s = sequence.lower()
bases = ['adenine', 'thymine', 'cytosine', 'guanine']
non_bases = [x for x in s if x not in (b[0] for b in bases)]
c = Counter(s)
for base in bases:
    print('{} = {}'.format(base, c[base[0]]))
# adenine = 2
# thymine = 2
# cytosine = 1
# guanine = 1
for n in non_bases:
    print('{} is not a DNA value'.format(n))
# x is not a DNA value
# p is not a DNA value
# o is not a DNA value
# n is not a DNA value
# w is not a DNA value
# x is not a DNA value
# s is not a DNA value
Try this out:
sequence = 'AacGTtxponwxs'
adenine = 0
thymine = 0
cytosine = 0
guanine = 0
outputstring = []
for elem in sequence:
    if elem in ('a','A'):
        adenine += 1
    elif elem in ('T','t'):
        thymine += 1
    elif elem in ('C','c'):
        cytosine += 1
    elif elem in ('G','g'):
        guanine += 1
    else:
        outputstring.append('{} is not a DNA value'.format(elem))
print ('your sequence contain {} bases with the following structure:'.format(len(sequence)))
print ('adenine =', adenine)
print ('thymine =', thymine)
print ('cytosine =', cytosine)
print ('guanine =', guanine)
print ("\n".join(outputstring))
Output:
your sequence contain 13 bases with the following structure:
adenine = 2
thymine = 2
cytosine = 1
guanine = 1
x is not a DNA value
p is not a DNA value
o is not a DNA value
n is not a DNA value
w is not a DNA value
x is not a DNA value
s is not a DNA value
# Are you studying bioinformatics at HAN? I remember this as my assignment lol
# 3 years ago
sequence = str(input('Enter DNA sequence:'))
sequence = sequence.lower()  # lower() returns a new string, so keep the result
count_sequence = 0
countA = 0
countT = 0
countG = 0
countC = 0
countNotDNA = 0
for char in sequence:
    count_sequence+=1
    # elif chain, so that only non-ACGT characters reach the final else
    if char == 'a':
        countA +=1
    elif char == 't':
        countT +=1
    elif char == 'g':
        countG +=1
    elif char == 'c':
        countC +=1
    else:
        countNotDNA+=1
print("sequence is", count_sequence, "characters long containing:","\n", countA, "Adenine","\n", countT, "Thymine","\n", countG, "Guanine","\n", countC, "Cytosine","\n", countNotDNA, "junk bases")
There you go :)
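Since the question asked specifically about a dictionary, here is a minimal sketch of that variant. The names BASES and summarize are made up for illustration; the dictionary maps each lowercase letter to its base name, which is the reverse of the layout proposed in the question.

```python
from collections import Counter

# Hypothetical lookup table: lowercase letter -> base name.
BASES = {"a": "adenine", "t": "thymine", "c": "cytosine", "g": "guanine"}


def summarize(sequence):
    """Return (counts per base name, list of characters that are not bases)."""
    counts = Counter(sequence.lower())
    base_counts = {name: counts[letter] for letter, name in BASES.items()}
    # Keep the original characters, in order, including repeats.
    non_bases = [ch for ch in sequence if ch.lower() not in BASES]
    return base_counts, non_bases


base_counts, non_bases = summarize("AacGTtxponwxs")
for name, count in base_counts.items():
    print("{} = {}".format(name, count))
for ch in non_bases:
    print("{} is not a DNA value".format(ch))
```

To report each stray letter only once, the non_bases list could be deduplicated before printing.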

Python: Converting word to list of letters, then returning indexes of the letters against lower case alphabet

I have already completed the task, but only in its most basic form. I'm looking for help shortening it so that it applies to any word, not just one with eight letters. Here's what I've got so far (a bit long for what it does):
alpha = map(chr, range(97, 123))
word = "computer"
word_list = list(word)
one = word[0]
two = word[1]
three = word[2]
four = word[3]
five = word[4]
six = word[5]
seven = word[6]
eight = word[7]
one_index = str(alpha.index(one))
two_index = str(alpha.index(two))
three_index = str(alpha.index(three))
four_index = str(alpha.index(four))
five_index = str(alpha.index(five))
six_index = str(alpha.index(six))
seven_index = str(alpha.index(seven))
eight_index = str(alpha.index(eight))
print (one + "=" + one_index)
print (two + "=" + two_index)
print (three + "=" + three_index)
print (four + "=" + four_index)
print (five + "=" + five_index)
print (six + "=" + six_index)
print (seven + "=" + seven_index)
print (eight + "=" + eight_index)
What you are probably looking for is a for-loop.
Using a for-loop your code could look like this:
word = "computer"
for letter in word:
    index = ord(letter)-97
    if (index<0) or (index>25):
        print ("'{}' is not in the lowercase alphabet.".format(letter))
    else:
        print ("{}={}".format(letter, str(index+1))) # +1 to make a=1
If you use
for letter in word:
    #code
the indented code will be executed for every letter in the word (or for every element of word if word is, for example, a list).
A good place to start learning more about loops is here: https://en.wikibooks.org/wiki/Python_Programming/Loops
You can find tons of resources on the internet covering this topic.
Use a for loop:
alpha = list(map(chr, range(97, 123)))  # list() so that .index() also works on Python 3
word = "computer"
for l in word:
    print('{} = {}'.format(l, alpha.index(l.lower())))
Result
c = 2
o = 14
m = 12
p = 15
u = 20
t = 19
e = 4
r = 17
Start with a dict that maps each letter to its number.
import string
d = dict((c, ord(c)-ord('a')) for c in string.ascii_lowercase)
Then pair each letter of your string to the appropriate index.
result = [(c, d[c]) for c in word]
Thanks for the help. I managed to solve it myself in a different way, using a function and a while loop. It's not as short, but it will work for all lowercase words:
alpha = list(map(chr, range (97,123)))  # list() so that .index() also works on Python 3
word = "computer"
count = 0
y = 0
def indexfinder (number):
    o = word[number]
    i = str(alpha.index(o))
    print (o + "=" + i)
while count < len(word):
    count = count + 1
    indexfinder (y)
    y = y+1
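For Python 3, the lookup-table idea from the dict answer can be sketched with enumerate and string.ascii_lowercase. The names LETTER_INDEX and letter_indexes are made up for illustration, and returning None for characters outside a-z is an assumption about the wanted behaviour.

```python
import string

# Map each lowercase letter to its 0-based position: a -> 0, ..., z -> 25.
LETTER_INDEX = {letter: i for i, letter in enumerate(string.ascii_lowercase)}


def letter_indexes(word):
    """Pair every character with its alphabet index (None if it is not a-z)."""
    return [(ch, LETTER_INDEX.get(ch.lower())) for ch in word]


for ch, idx in letter_indexes("computer"):
    print("{} = {}".format(ch, idx))
```

This works for words of any length, and the dictionary lookup avoids scanning the alphabet for every letter.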
