Finding Multiple Sequences in a File by Position after Specifically Repeating Characters

I am trying to write this code so that I can get the sequences of different samples in a file, after line breaks, by position. The output is always blank for some reason. Can you help me?
import readline
count = 0
brk = 0
with open("file.txt") as f:
    while (count < 35):
        l = f.readline()[brk + 2]
        sp = raw_input ("Starting Position:")
        sp = int(sp)
        rl = sp + 6
        print(l[sp:rl])
        print(l[-30:0])
        count = count + 1
        brk = brk + 2
print ("Done")

In the line l = f.readline()[brk + 2], indexing the line with [brk + 2] stores a single character in l. Slicing that one-character string in print(l[sp:rl]) and print(l[-30:0]) therefore usually prints empty lines, which is the expected result.
To see this for yourself, add print(l) right after the assignment to l.
It seems that you are trying to read every second line of the file (the lines at indices 2, 4, 6, and so on). To do that, you can do something like this:
count = 0
with open("file.txt") as f:
    f.readline()
    f.readline()  # skip the first two lines
    while count < 35:
        l = f.readline()
        f.readline()  # skip the next line
        sp = raw_input("Starting Position:")
        sp = int(sp)
        rl = sp + 6
        print(l[sp:rl])
        print(l[-30:0])
        count = count + 1
Also, print(l[-30:0]) always prints an empty line, because a slice that starts 30 characters from the end and stops at index 0 is empty. You probably want print(l[-30:]), which prints the last 30 characters of the string l.
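The same idea can be sketched for Python 3 as a small function. This is only an illustration: the function name, the sample lines, and the assumption that the wanted sequences sit at 0-based line indices 2, 4, 6, ... are not from the original post.

```python
from itertools import islice


def slices_from_alternate_lines(lines, start, width=6, tail=30):
    """Return (window, last_chars) pairs for the lines at index 2, 4, 6, ..."""
    results = []
    # islice(lines, 2, None, 2) yields the lines at 0-based index 2, 4, 6, ...
    # It accepts a list of lines or an open file object.
    for line in islice(lines, 2, None, 2):
        line = line.rstrip("\n")
        # line[start:start + width] is the window of `width` characters;
        # line[-tail:] is the last `tail` characters of the line.
        results.append((line[start:start + width], line[-tail:]))
    return results


# Hypothetical sample data standing in for file.txt
sample = ["header\n", "\n", "ACGTACGTACGT\n", "\n", "TTTTGGGGCCCC\n"]
for window, last_chars in slices_from_alternate_lines(sample, start=2, tail=4):
    print(window, last_chars)
```

Passing the starting position as an argument, rather than calling input() inside the loop, keeps the slicing logic easy to test on its own.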

Related

I am struggling a little on getting this code to work in python

Honestly, I think this could be entirely wrong, as I don't really know what I am doing and just kind of threw some stuff together, so help would be appreciated.
This is the code I got, including starting code that cannot be changed.
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    # Delete the following line, then implement the function as indicated
    line = line.upper()
    line = line.strip()
    letters = []
    for char in line:
        if char.isalpha():
            if char not in letters.count(line):
                letters[char] = 1
            else:
                letters[char] += 1
    for everyLetter in letters:
        print("{0} {1}".format(everyLetter, letters[everyLetter]))
The .txt file it uses just contains:
Csc.565
Magee, Mississippi
A stitch in time saves nine.
These are the instructions I have been given. Note that .count is what needs to be used, as shown in my code.
The manipulate_text() function accepts one string as input. The function should do the following with the string parameter:
⦁ Convert all the letters of the string to uppercase, strip the leading and trailing whitespace, and output the string.
⦁ Count and display the frequency of each letter in the string. Ignore all non-alpha characters.
For example, if this is the contents of strings.txt:
Csc.565
Magee, Mississippi
A stitch in time saves nine.
This would be the output of your program:
CSC.565
C 2
S 1
MAGEE, MISSISSIPPI
M 2
A 1
G 1
E 2
I 4
S 4
P 2
A STITCH IN TIME SAVES NINE.
A 2
S 3
T 3
I 4
C 1
H 1
N 3
M 1
E 3
V 1

Here's the code you wanted:
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    line = line.upper()
    line = line.strip()
    letters = {} # Dict[Char: No. of occurrences]
    print(line)
    for char in line:
        if char.isalpha():
            if char not in list(letters.keys()): # If char not in our dict
                letters[char] = 1 # One occurrence
            else:
                letters[char] += 1 # Add one occurrence
    for i in letters:
        print("{0} {1}".format(i, letters[i]))
main() # Call main
Output:
CSC.565
C 2
S 1
MAGEE, MISSISSIPPI
M 2
A 1
G 1
E 2
I 4
S 4
P 2
A STITCH IN TIME SAVES NINE.
A 2
S 3
T 3
I 4
C 1
H 1
N 3
M 1
E 3
V 1
In response to your comment:
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    line = line.upper()
    line = line.strip()
    letters = {} # Dict[Char: No. of occurrences]
    print(line)
    for char in line:
        if char.isalpha():
            if list(letters.keys()).count(char) == 0: # If char not in our dict
                letters[char] = 1 # One occurrence
            else:
                letters[char] += 1 # Add one occurrence
    for i in letters:
        print("{0} {1}".format(i, letters[i]))
main() # Call main
In response to your second comment:
Use one of these instead of manipulate_text().
If you don't care about ordering:
def manipulate_text(line):
    line = [i for i in line.upper() if i.isalpha()] # List comprehension!
    for i in set(line): # set() changes it to all unique keys, loses order
        print(i, line.count(i)) # .count()
If you care about ordering:
def manipulate_text(line):
    line = [i for i in line.upper() if i.isalpha()] # List comprehension!
    uniques = []
    for i in line:
        if i not in uniques:
            print(i, line.count(i)) # .count()
            uniques += [i]
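The two shorter versions above no longer print the uppercased line itself. A minimal sketch that does both steps and still relies on .count() might look like this; the helper name letter_counts and its return shape are illustrative, not part of the assignment:

```python
def letter_counts(line):
    """Return the cleaned-up line and (letter, count) pairs in first-seen order."""
    text = line.strip().upper()
    seen = []    # letters already reported, in order of first appearance
    counts = []
    for char in text:
        if char.isalpha() and char not in seen:
            seen.append(char)
            counts.append((char, text.count(char)))  # str.count() does the tally
    return text, counts


text, counts = letter_counts("Magee, Mississippi\n")
print(text)
for letter, count in counts:
    print(letter, count)
```

Inside manipulate_text() the same loop could print directly instead of building a list; returning the values simply makes the logic easier to check.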

Print data between positions within a loop

I have one file, File1, which has 3 columns. The data are tab separated.
File1:
2 4 Apple
6 7 Samsung
Let's say I run a loop of 10 iterations. If the iteration value lies between column 1 and column 2 of a line in File1 (inclusive), print the corresponding 3rd column from File1; otherwise print "0".
The lines may or may not be sorted, but the 2nd column is always greater than the 1st. The ranges of values on different lines do not overlap.
The output Result should look like this.
Result:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
My program in python is here:
chr5_1 = [[]]
for line in file:
    line = line.rstrip()
    line = line.split("\t")
    chr5_1.append([line[0],line[1],line[2]])
# Here I store all position information in chr5_1 list in list
chr5_1.pop(0)
for i in range (1,10):
    for listo in chr5_1:
        L1 = " ".join(str(x) for x in listo[:1])
        L2 = " ".join(str(x) for x in listo[1:2])
        L3 = " ".join(str(x) for x in listo[2:3])
        if int(L1) <= i and int(L2) >= i:
            print(L3)
            break
        else:
            print ("0")
            break
I am confused about the loop iteration and where it should break.

Try this:
chr5_1 = dict()
for line in file:
    line = line.rstrip()
    _from, _to, value = line.split("\t")
    for i in range(int(_from), int(_to) + 1):
        chr5_1[i] = value
for i in range(1, 11):  # 1 through 10 inclusive
    print(chr5_1.get(i, "0"))

I think this is a job for the else clause of a for loop:
position_information = []
with open('file1', 'r') as f:
    for line in f:
        position_information.append(line.strip().split('\t'))
for i in range(1, 11):
    for start, through, value in position_information:
        if i >= int(start) and i <= int(through):
            print(value)
            # No need to continue searching for something to print on this line
            break
    else:
        # We never found anything to print on this line, so print 0 instead
        print(0)
This gives the result you're looking for:
0
Apple
Apple
Apple
0
Samsung
Samsung
0
0
0
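The for ... else lookup can also be written as a function that returns the labels instead of printing them. This is a sketch under the assumption that the ranges have already been parsed into (start, end, label) tuples with integer bounds; the function name is made up for illustration.

```python
def labels_for_positions(ranges, limit):
    """Return one label per position 1..limit, or "0" where no range matches."""
    result = []
    for i in range(1, limit + 1):
        for start, end, label in ranges:
            if start <= i <= end:
                result.append(label)
                break  # found the matching range, stop searching
        else:
            # the inner loop finished without break: no range contains i
            result.append("0")
    return result


ranges = [(2, 4, "Apple"), (6, 7, "Samsung")]
print("\n".join(labels_for_positions(ranges, 10)))
```

The else branch belongs to the inner for, not to the if, which is why it runs only when no range matched.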

Setup:
import io
s = '''2 4 Apple
6 7 Samsung'''
# Python 2.x
f = io.BytesIO(s)
# Python 3.x
#f = io.StringIO(s)
If the lines of the file are not sorted by the first column:
import csv
reader = csv.reader(f, delimiter = ' ', skipinitialspace = True)
f = list(reader)
# compare the first column as a number, so that "10" sorts after "6"
f.sort(key = lambda row: int(row[0]))
Read each line, work out what to print and how many times to print it, print, and move on to the next line:
def print_stuff(thing, n):
    while n > 0:
        print(thing)
        n -= 1
limit = 10
prev_end = 0  # last position printed so far
for line in f:
    # if iterating over a file, separate the columns
    begin, end, text = line.strip().split()
    # if iterating over the sorted list of lines
    #begin, end, text = line
    begin, end = map(int, (begin, end))
    # don't exceed the limit
    if begin > limit:
        break
    # how many zeros? (positions between the previous range and this one)
    gap = begin - prev_end - 1
    print_stuff('0', gap)
    # don't exceed the limit
    end = end if end < limit else limit
    # how many words?
    span = (end - begin) + 1
    print_stuff(text, span)
    prev_end = end
# any more zeros?
gap = limit - prev_end
print_stuff('0', gap)

Ciphering through the non letters

So here's my problem. I'm trying to make it so that any character that is not part of the alphabet isn't shifted in the text file, but still "takes up" a shift amount. For example, if I wanted to apply 2 shift amounts (3 and 4) to the text "alarm clock", the a would be shifted by 3, the l by 4, the second a by 3, the r by 4, and the m by 3. The space wouldn't be shifted but would "take up" a shift, so the c would then be shifted by 3. Here's what I have so far:
import sys
file = input("Enter input file: ")
shifts = input("Enter shift amounts: ").split()
code = input("Encode (E) or Decode (D)?")
if code == "E":
    with open(file) as f:
        lines = f.read().replace("e","zw").splitlines()
    for line in lines:
        mid = len(line) // 2
        line_list = list(line)
        line_list.insert(mid,'hokie')
        new_line = ('hokie' + (''.join(line_list) + 'hokie'))
        for index, char in enumerate(new_line):
            index = index % len(shifts)
            print(chr(ord(char)+int(shifts[index])),end='')
        print()
example input:
file text:
Though Birnam Wood be come to Dunsinane,
And thou opposed, being of no woman born,
Yet I will try the last. Before my body
I throw my warlike shield. Lay on, Macduff;
And damned be him that first cries "Hold, enough!"
-- MacBeth
shift:
3 4 5 6 7
result:
kspolWltank Goyqer Drsi icltqpha ivpdb ar Iauvmsguca,nvnmj
kspolDri aksz vsttygzh, icltqphantn sk ur butdr hvur,nvnmj
kspolBdbz L boso yxf xmfd pmurlifya. Hgzjtxgz re esieoronk
kspolL ynyra sf afxsloec vlnnvnmjfdoh. Oed vq, Shfhzlm;ltqph
kspolDri kdqsfdg gfd lns wlfz immurliwya gwogzw "Orpi, casubjl!"oronk
kspol-- ShkspolfFecakltqph

You have to add an if; here is my solution. For the punctuation, see "Best way to strip punctuation from a string in Python".
import string
exclude = set(string.punctuation)
shifts = [3,4]
lines = ["alarm clock", "alarm ';.clock"]
for line in lines:
    mid = len(line) // 2
    line_list = list(line)
    line_list.insert(mid,'hokie')
    new_line = ('hit' + (''.join(line_list) + 'hit'))
    index = 0
    for char in new_line:
        index = index % len(shifts)
        if(char in exclude):
            # punctuation is printed unchanged (note: it does not use up a shift here)
            print(char,end='')
            continue
        print(chr(ord(char)+int(shifts[index])),end='')
        index += 1
    print()  # end the output line before moving on to the next input line
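The behaviour described in the question, where a non-letter stays unchanged but still uses up a shift, can be sketched by taking the shift from the character's position rather than from a separate counter. The function name shift_text is hypothetical, and like the code above this sketch does not wrap around at the end of the alphabet.

```python
def shift_text(text, shifts):
    """Shift letters by the cycling shift amounts; other characters pass through."""
    out = []
    for index, char in enumerate(text):
        # Every position consumes a shift, whether or not it is applied.
        shift = shifts[index % len(shifts)]
        if char.isalpha():
            out.append(chr(ord(char) + shift))  # no wrap-around past 'z' or 'Z'
        else:
            out.append(char)
    return "".join(out)


print(shift_text("alarm clock", [3, 4]))  # prints dpdvp fprgn
```

Because the index comes from enumerate(), the space at position 5 takes the shift of 4 and the following c gets 3, as in the example from the question.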

Python: read line after string is found

I have a file which contains blocks of lines that I would like to separate. Each block contains a number identifier in the block's header: "Block X" is the header line for the X-th block of lines. Like this:
Block X
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
Block Y
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
I can use "enumerate" to find the header line of the block as follows:
with open(filename,'r') as indata:
    for num, line in enumerate(indata):
        if 'Block X' in line:
            startblock=num
            print startblock
This will yield the line number of the first line of block #X.
However, my problem is identifying the last line of the block. To do that, I could find the next occurrence of a header line (i.e., the next block) and subtract a few numbers.
My question: how can I find the line number of the next occurrence of a condition (i.e., right after a certain condition was met)?
I tried using enumerate again, this time indicating the starting value, like this:
with open(filename,'r') as indata:
    for num, line in enumerate(indata,startblock):
        if 'Block Y ' in line:
            endscan=num
            break
print endscan
That doesn't work, because it still begins reading the file from line 0, NOT from line number "startblock". Starting the "enumerate" counter from a different number only offsets the resulting value of the counter, in this case "endscan", by the amount "startblock".
Please help! How can I tell Python to disregard the lines before "startblock"?

If you want the groups using Block as the delimiter for each section, you can use itertools.groupby:
from itertools import groupby
with open('test.txt') as f:
    grp = groupby(f,key=lambda x: x.startswith("Block "))
    for k,v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))
Output:
['Block X\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n']
['Block Y\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264']
If Block can appear elsewhere but you want it only when followed by a space and a single char:
import re
from itertools import groupby
with open('test.txt') as f:
    r = re.compile(r"^Block \w$")  # raw string so that \w reaches the regex engine
    grp = groupby(f, key=lambda x: r.search(x))
    for k, v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))
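If the goal is simply to get at the lines of each block, a plain loop that collects them into a dictionary is another option. This is a sketch, not the groupby approach above; the function name split_blocks and the shortened sample data are assumptions made for illustration.

```python
def split_blocks(lines, prefix="Block "):
    """Map each header line to the list of lines that follow it."""
    blocks = {}
    current = None  # header of the block being filled, if any
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(prefix):
            current = line
            blocks[current] = []
        elif current is not None:
            blocks[current].append(line)
    return blocks


# Hypothetical, shortened stand-in for the file in the question
sample = ["Block X\n", "#L E\n", "1 2\n", "Block Y\n", "#L E\n", "3 4\n"]
for header, body in split_blocks(sample).items():
    print(header, body)
```

Lines that appear before the first header are ignored, and repeated headers with identical text would overwrite each other, so this only suits files where every header is unique.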

You can use the .tell() and .seek() methods of file objects to move around. So for example:
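with open(filename, 'rb') as infile:  # binary mode, so offsets are plain byte counts
    start = infile.tell()
    while True:
        pos = infile.tell()  # offset of the line we are about to read
        # readline() keeps tell() accurate; "for line in infile" does not
        line = infile.readline()
        if (line.startswith(b'Block') or not line) and pos > start:
            # a new header (or end of file) closes the block that began at start
            infile.seek(start)
            # print all the bytes in the block
            print(infile.read(pos - start).decode())
            # we are back at pos; skip the header line we already looked at
            infile.readline()
            # we finished a block, mark the start of the next one
            start = pos
        if not line:
            break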
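To get the first and last line numbers of one block, which is what the question asks for, a single pass with enumerate is enough: remember where the wanted header was seen and stop at the next header. The function name block_bounds and the sample data are illustrative assumptions.

```python
def block_bounds(lines, header, prefix="Block "):
    """Return (first, last) 0-based line numbers of the block, or None."""
    start = None
    number = -1
    for number, line in enumerate(lines):
        if start is None:
            if line.strip() == header:
                start = number          # header of the wanted block
        elif line.startswith(prefix):
            return start, number - 1    # the next header ends the block
    if start is None:
        return None                     # the header was never found
    return start, number                # the block runs to the end of the input


# Hypothetical, shortened stand-in for the file in the question
sample = ["Block X\n", "#L E\n", "1 2\n", "Block Y\n", "#L E\n", "3 4\n"]
print(block_bounds(sample, "Block X"))
print(block_bounds(sample, "Block Y"))
```

Because the search for the end continues in the same loop, there is no need to reopen the file or to skip the lines before the start of the block.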

If the number of lines between header lines is uniform throughout the file, you can use that distance to step the index.
file1 = open('file_name','r')
lines = file1.readlines()
file1.close()
numlines = len(lines)
i=0
for line in lines:
    # strip the trailing newline before comparing
    if line.strip() == 'specific header 1':
        line_num1 = i
    if line.strip() == 'specific header 2':
        line_num2 = i
    i+=1
diff = line_num2 - line_num1
Now that we know the difference between the line numbers, we use for loops to collect the data.
import numpy as np
k=0
# dtype=object so that each cell can hold a line of text
array = np.zeros([numlines, diff], dtype=object)
for i in range(numlines):
    if k % diff == 0:
        for j in range(diff):
            array[i][j] = lines[i+j]
    k+=1
% is the mod operator. k % diff is 0 only when k is a multiple of the distance between the two header lines, which happens only when the current line is a header line (assuming the first header is on the first line of the file). Once that line is found, the second for loop fills the array, so that we end up with a matrix with numlines rows and diff columns. The nonzero rows contain the data between the header lines.
I have not tried this out; I am just writing off the top of my head. Hopefully it helps!

python increment counter if character in list and then if character not in list

Just wondering if there is a better method to accomplish the following.
I want to:
1) check whether line[0] is a digit character; if so, set counter = 1
2) go to the next line and check line[0] again; if it is not a digit, keep incrementing the counter UNTIL A DIGIT IS FOUND IN line[0]
3) if a digit is found in line[0], restart the counter, i.e. set counter = 1
below is a sample of the file:
555 xxxxxxxxxxxxxxx
r
a
n
d
o
m
data
888 xxxxxxxxxxxxxxx
r
a
n
d
o
m
data
my results are (and should be) :
1 555 xxxxxxxxxxxxxxx
2 r
3 a
4 n
5 d
6 o
7 m
8 data
1 888 xxxxxxxxxxxxxxx
2 r
3 a
4 n
5 d
6 o
7 m
8 data
Here is my code, and I hope someone can improve it. In other words, how would you do it? Thanks.
list1 = ['1','2','3','4','5','6','7','8','9','0']
counter = {}
with open('O:/py_files/countlines.txt', 'rb') as infile,\
     open('O:/py_files/countlines_out.txt', 'wb') as outfile:
    for line in infile:
        if line[0] in list1:
            counter = 1
            outfile.writelines(str(counter) + ' ' + line)
            print str(counter) + ' ' + line
        elif line[0] not in list1:
            counter = counter + 1
            outfile.writelines(str(counter) + ' ' + line)
            print str(counter) + ' ' + line
        else:
            outfile.writelines('something went wrong...')
            print 'something went wrong...'

for line in infile.read().splitlines():
    counter = 1 if line.split()[0].isdigit() else counter + 1

A single if statement can be used to simplify the whole code.
with open('O:/py_files/countlines.txt', 'rb') as infile,\
     open('O:/py_files/countlines_out.txt', 'wb') as outfile:
    counter = 1
    for line in infile:
        if line[0].isdigit():
            counter = 1
        out = str(counter)+' '+line
        outfile.write(out)
        counter += 1
The isdigit() method of a string checks whether the string consists only of digits.
>>> "123".isdigit()
True
>>> "hello".isdigit()
False
counter is intended to be an integer, so counter = {} is useless. If you want, you can initialize it with counter = 1.
That is not strictly necessary, because the first line of the input starts with a digit, so the if evaluates to True there and initializes the counter for you.
The program is intended to write every line it reads, so the write statement can be kept outside the if.
The else statement is useless: line[0] in list1 and line[0] not in list1 are mutually exclusive, so the program will never reach the else.

You can use isdigit() as shown below:
if line[0].isdigit():
    # Your Code

counter = 0
for line in infile:
    if line[0].isdigit():
        counter = 1
    else:
        counter += 1
    out = str(counter)+' '+line
    outfile.write(out)
If line[0] is a digit, the counter is set to one; otherwise the counter is incremented.

str.isdigit() Return true if all characters in the string are digits and there is at least one character, false otherwise. For 8-bit strings, this method is locale-dependent.
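Putting the answers together, the renumbering could be sketched for Python 3 as a function over any iterable of text lines. The name number_lines and the shortened sample are assumptions for illustration; line[:1] is used instead of line[0] so that an empty line does not raise an IndexError.

```python
def number_lines(lines):
    """Prefix each line with a counter that restarts on lines starting with a digit."""
    numbered = []
    counter = 0
    for line in lines:
        # line[:1] is "" for an empty line, and "".isdigit() is False
        counter = 1 if line[:1].isdigit() else counter + 1
        numbered.append("{0} {1}".format(counter, line.rstrip("\n")))
    return numbered


# Hypothetical, shortened version of the sample file
sample = ["555 xxx\n", "r\n", "a\n", "888 xxx\n", "r\n"]
for row in number_lines(sample):
    print(row)
```

With files, the same function can be fed an open text-mode file object, and the returned rows written out with a trailing newline each.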
