How to use Enumerate with Variable data properly? - python

I am trying to use enumerate with data held in a variable, but the data gets enumerated character by character as a single string. How can I get output in the format below?
The expected output comes when I use a with statement:
with open("sample.txt") as file:
    for num, line in enumerate(file):
        print(num, line)
output
0 sdasd
1 adad
2 adadf
But when I use a string variable:
data = "adklkahdjsa saljdahsd \nsjdksd"
for num, line in enumerate(data):
    print(num, line)
output
0 a
1 d
2 k
3 l
4 k
5 a
6 h
7 d
8 j
9 s
10 a
11
12 s ... so on

enumerate expects an iterable. In your example it takes the string as the iterable and iterates over each character.
It seems what you want is to iterate over each word in the text. To do that, first split the string into words.
Example:
data.split(' ') # split on single space characters
Full Example:
data = "adklkahdjsa saljdahsd \nsjdksd"
for num, line in enumerate(data.split(' ')):
    print(num, line)
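If the goal is to mirror the file loop from the question (one item per line rather than per word), splitting on line breaks may be closer to what was asked. A minimal sketch, assuming the string uses "\n" as its line separator; the name numbered is only for illustration:

```python
data = "adklkahdjsa saljdahsd \nsjdksd"

# splitlines() breaks the string at line boundaries and drops the "\n"
numbered = list(enumerate(data.splitlines()))

for num, line in numbered:
    print(num, line)
```

Note that the trailing space on the first line is kept; call line.strip() if it should go.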

Related

I am struggling a little on getting this code to work in python

Honestly, I think this could be entirely wrong, as I don't really know what I am doing and just kind of threw some stuff together, so help would be appreciated.
This is the code I got, including starting code that cannot be changed.
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    # Delete the following line, then implement the function as indicated
    line = line.upper()
    line = line.strip()
    letters = []
    for char in line:
        if char.isalpha():
            if char not in letters.count(line):
                letters[char] = 1
            else:
                letters[char] += 1
    for everyLetter in letters:
        print("{0} {1}".format(everyLetter, letters[everyLetter]))
The .txt file it uses just contains:
Csc.565
Magee, Mississippi
A stitch in time saves nine.
And these are the instructions I have been given. Also, .count is what needs to be used, as shown in my code.
The manipulate_text() function accepts one string as input. The function should do the following with the string parameter:
⦁ Convert all the letters of the string to uppercase, strip the leading and trailing whitespace, and output the string.
⦁ Count and display the frequency of each letter in the string. Ignore all non-alpha characters.
For example, if this is the contents of strings.txt:
Csc.565
Magee, Mississippi
A stitch in time saves nine.
This would be the output of your program:
CSC.565
C 2
S 1
MAGEE, MISSISSIPPI
M 2
A 1
G 1
E 2
I 4
S 4
P 2
A STITCH IN TIME SAVES NINE.
A 2
S 3
T 3
I 4
C 1
H 1
N 3
M 1
E 3
V 1
Here's the code you wanted:
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    line = line.upper()
    line = line.strip()
    letters = {} # Dict[Char: No. of occurrences]
    print(line)
    for char in line:
        if char.isalpha():
            if char not in letters: # If char not in our dict
                letters[char] = 1 # One occurrence
            else:
                letters[char] += 1 # Add one occurrence
    for i in letters:
        print("{0} {1}".format(i, letters[i]))
main() # Call main
Output:
CSC.565
C 2
S 1
MAGEE, MISSISSIPPI
M 2
A 1
G 1
E 2
I 4
S 4
P 2
A STITCH IN TIME SAVES NINE.
A 2
S 3
T 3
I 4
C 1
H 1
N 3
M 1
E 3
V 1
In response to your comment:
# DO NOT CHANGE ANY CODE IN THE MAIN FUNCTION
def main():
    input_file = open('strings.txt', 'r') # Open a file for reading
    for line in input_file: # Use a for loop to read each line in the file
        manipulate_text(line)
        print()
def manipulate_text(line):
    line = line.upper()
    line = line.strip()
    letters = {} # Dict[Char: No. of occurrences]
    print(line)
    for char in line:
        if char.isalpha():
            if list(letters.keys()).count(char) == 0: # If char not in our dict
                letters[char] = 1 # One occurrence
            else:
                letters[char] += 1 # Add one occurrence
    for i in letters:
        print("{0} {1}".format(i, letters[i]))
main() # Call main
In response to your second comment:
Use one of these instead of manipulate_text():
If you don't care about ordering:
def manipulate_text(line):
    line = [i for i in line.upper() if i.isalpha()] # List comprehension!
    for i in set(line): # set() changes it to all unique keys, loses order
        print(i, line.count(i)) # .count()
If you care about ordering:
def manipulate_text(line):
    line = [i for i in line.upper() if i.isalpha()] # List comprehension!
    uniques = []
    for i in line:
        if i not in uniques:
            print(i, line.count(i)) # .count()
            uniques.append(i)
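Putting the pieces together, a minimal sketch that follows the assignment (output the cleaned line, then use .count for each letter in first-seen order) might look like this. The helper name letter_counts is an assumption introduced for illustration; it is not part of the original assignment.

```python
def letter_counts(line):
    """Return (cleaned_line, [(letter, count), ...]) in first-seen order."""
    cleaned = line.upper().strip()
    counts = []
    seen = set()
    for char in cleaned:
        # Only count letters, and only the first time each one is met
        if char.isalpha() and char not in seen:
            seen.add(char)
            counts.append((char, cleaned.count(char)))  # str.count()
    return cleaned, counts


def manipulate_text(line):
    cleaned, counts = letter_counts(line)
    print(cleaned)
    for letter, count in counts:
        print(letter, count)


manipulate_text("Magee, Mississippi\n")
```

Separating the counting from the printing keeps manipulate_text compatible with the unchanged main() while making the logic easy to check on its own.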

Split array by rows

I have a very basic problem. I have written code that opens a .txt file containing the numbers 1 2 3 4 5 6 7 8 9, squares each of them, and writes the result to another file.
Now I want to add a step that splits these numbers into rows when writing them, like this:
1 4 9
16 25 36
49 64 81
My code so far:
n=[]
dane = open("num.txt", "r")
for i in dane:
    i = i.replace('\n','')
    for j in i.split(' '):
        j = int(j)
        j = j**2
        n.append(j)
nowy = open("newnum.txt","w")
nowy.write(str(n))
nowy.close()
The code you have written works fine except for the writing part. For that, you need to change the last three lines of code to:
nowy = open("newnum.txt","w")
for i in range(0,len(n),3):
    nowy.write("{} {} {}\n".format(n[i],n[i+1],n[i+2]))
nowy.close()
The for loop can be explained as follows:
loop through the list n that you have generated, three items at a time, by using the third argument to the range function, which is called step.
write out the values three at a time into the file, each row terminated by a newline character
The output after changing the lines of code is as expected
1 4 9
16 25 36
49 64 81
Ref: the Python documentation for str.format and range.
As a complement to @Bhargav's answer: according to the docs, "[a] possible idiom for clustering a data series into n-length groups [is] using zip(*[iter(s)]*n)".
You can also use the star to unpack a list/tuple as arguments to the format function call.
All this leads to a more Pythonic (or rather, crypto-Pythonic?) version of the writing part:
with open("newnum.txt","w") as nowy:
    for sublist in zip(*[iter(n)]*3):
        nowy.write("{} {} {}\n".format(*sublist))
Please note the use of a context manager (with statement) to ensure the file is properly closed in all cases when exiting the block. While the other changes are open to discussion, the latter is a must, and you should definitely get into the habit of using it.
(BTW, have you noticed you never closed the dane file? A simple mistake that would have been avoided by the use of a context manager to manage that resource...)
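Both the indexing version and the zip idiom assume that the number of values is a multiple of three: n[i+2] raises IndexError otherwise, and zip silently drops the leftover values. A sketch that slices the list instead keeps a shorter final row. The function name format_rows and the width parameter are made up for this example.

```python
def format_rows(numbers, width=3):
    """Return the numbers as text, `width` values per line."""
    rows = []
    for start in range(0, len(numbers), width):
        chunk = numbers[start:start + width]  # the last chunk may be shorter
        rows.append(" ".join(str(value) for value in chunk))
    return "\n".join(rows) + "\n"


squares = [value ** 2 for value in range(1, 10)]
print(format_rows(squares), end="")
```

The returned string can be passed straight to nowy.write() inside a with block.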
You can try this:
strNew = ''
dane = open("num.txt", "r")
row = 0
for i in dane:
    i = i.replace('\n','')
    for j in i.split(' '):
        row += 1
        j = int(j)
        j = j**2
        if (row % 3) == 0:
            strNew += str(j)+'\n'
        else:
            strNew += str(j) + ' ' # it can be ' ' or '\t'
nowy = open("newnum.txt","w")
nowy.write(strNew)
nowy.close()
The result is:
1 4 9
16 25 36
49 64 81
n=[]
dane = open("num.txt", "r")
for i in dane:
    i = i.replace('\n','')
    row = []
    for j in i.split(' '):
        j = int(j)
        j = j**2
        # new code added
        # why str(j)? Because str.join() below only accepts strings, so each square is converted before it is stored
        row.append(str(j))
    # new code added: one output row per input line (outer loop),
    # so this gives three rows only if num.txt already has three numbers per line
    n.append(' '.join(row) + "\n")
dane.close()
nowy = open("newnum.txt","w")
nowy.write(''.join(n))
nowy.close()

assign a variable to each data in a line separated by whitespaces

Basically, I want to use Python to read each value in the last two lines of the following file into a separate variable.
The file is of the following form:
a b c
10
10 0 0
2 5
xyz
10 12 13
11 12 12.4
1 34.5 10.8
I want the output to be the following:
d=11, e=12, f=12.4
g=1, h=34.5, i=10.8
How can I loop over the lines if I have, say, 100 lines after xyz, each with three values, and I only need to read, say, the last 3 lines?
The following is what I did, but it doesn't seem to get anywhere.
p1=open('aaa','r')
im=open('bbb','w')
t=open('test','w')
lines=p1.readlines()
i=0
for line in lines:
    Nj=[]
    Nk=[]
    Cx=Cy=Cz=Nx=Ny=Nz=0
    i=i+1
    if line.strip():
        if i==1:
            t.write(line)
            dummy=line.strip().split()
            a1=dummy[0]
            a2=dummy[1]
            a3=dummy[2]
            print("The atoms present are %s, %s and %s" %(a1, a2,a3))
        if i==2:
            t.write(line)
        if i==3:
            t.write(line)
        if i==4:
            t.write(line)
        if i==5:
            t.write(line)
        if i==6:
            t.write(line)
            dummy=line.strip().split()
            Na1=dummy[0]
            Na2=dummy[1]
            Na3=dummy[2]
            import string
            N1=string.atoi(Na1)
            N2=string.atoi(Na2)
            N3=string.atoi(Na3)
            print("number of %s atoms= %d "%(a1,N1))
            print("number of %s atoms= %d "%(a2,N2))
            print("number of %s atoms= %d "%(a3,N3))
        if i==7:
            t.write(line)
        if i==8:
            t.write(line)
for i, line in enumerate(p1):
    if i==8:
        dummy=line.strip().split()
        Njx=dummy[0]
        Njy=dummy[1]
        Njz=dummy[2]
        import string
        Njx=string.atof(Njx)
        Njy=string.atof(Njy)
        Njz=string.atof(Njz)
        Nj = [Njx, Njy, Njz]
    elif i==9:
        dummy=line.strip().split()
        Nkx=dummy[0]
        Nky=dummy[1]
        Nkz=dummy[2]
        import string
        Nkx=string.atof(Nkx)
        Nky=string.atof(Nky)
        Nkz=string.atof(Nkz)
        Nk = [Nkx, Nky, Nkz]
        break
You can read the file's last two lines with
f = open(file, "r")
lines = f.readlines()[-2:] # change this if you want more than the last two lines
f.close()
split1 = lines[0].strip().split(' ') # In the example below: lines[0] = "4 5 6\n"
split2 = lines[1].strip().split(' ') # lines[1] = "7 8 9"
Then, you can assign those values to your variables:
d,e,f = [int(x) for x in split1]
g,h,i = [int(x) for x in split2]
This will assign the three values of each line to d, e, f, g, h, i. Use float(x) instead of int(x) if the values can contain decimals, such as 12.4 in your file. For example:
(your file)
...
1 2 3
4 5 6
7 8 9
(result)
d = 4
e = 5
f = 6
g = 7
h = 8
i = 9
Here you go
with open("text.txt", "r") as f:
    # Get the last two lines and remove the trailing '\n'
    # (rstrip is safe even if the last line has no trailing newline)
    contents = map(lambda s : s.rstrip('\n'), f.readlines()[-2:])
    # Get the last three values of each line
    [[d,e,f],[g,h,i]] = map(lambda s : map(float, s.split(" ")[-3:]), contents)
    # Check the result
    print (d,e,f,g,h,i)
Explanation:
with open("text.txt", "r") as f: is the recommended way of working with files in Python; see the file I/O tutorial to see why.
contents = map(lambda s : s.rstrip('\n'), f.readlines()[-2:]) loads the contents of f into a list of strings using readlines(), takes the last two using [-2:], and removes the trailing '\n' by mapping lambda s : s.rstrip('\n').
At this point, contents holds the last two lines.
The expression map(lambda s : map(float, s.split(" ")[-3:]), contents) splits each of the two lines on " " and converts the pieces to floats, which are then unpacked into [[d,e,f],[g,h,i]]. The [-3:] keeps only the last three pieces, dropping any empty strings caused by leading spaces.
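For a file with many data lines, a bounded deque can keep just the tail without slicing a full list. This is only a sketch: the function name last_rows and the inline sample text are assumptions for illustration, and it assumes every kept line holds whitespace-separated numbers.

```python
from collections import deque


def last_rows(lines, n=2):
    """Parse the last n non-empty lines into lists of floats."""
    # deque(maxlen=n) discards older lines as newer ones arrive
    tail = deque((ln for ln in lines if ln.strip()), maxlen=n)
    return [[float(x) for x in ln.split()] for ln in tail]


sample = """a b c
10
10 0 0
2 5
xyz
10 12 13
11 12 12.4
1 34.5 10.8
"""

(d, e, f), (g, h, i) = last_rows(sample.splitlines())
print(d, e, f, g, h, i)
```

With a real file, pass the open file object itself (for example last_rows(fh, n=3) inside a with block), since it yields lines one at a time.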

Splicing through a line of a textfile using python

I am trying to create genetic signatures. I have a text file full of DNA sequences. I want to read in each line from the text file, then add its 4mers (runs of 4 consecutive bases) to a dictionary.
For example, take this sample sequence:
ATGATATATCTATCAT
What I want to add is ATGA, TGAT, GATA, etc. to a dictionary, with IDs that increment by 1 as each 4mer is added.
So the dictionary will hold...
Genetic signature, ID
ATGA, 1
TGAT, 2
GATA, 3
Here is what I have so far...
import sys
def main ():
    readingFile = open("signatures.txt", "r")
    my_DNA=""
    DNAseq = {} #creates dictionary
    for char in readingFile:
        my_DNA = my_DNA+char
    for char in my_DNA:
        index = 0
        DnaID=1
        seq = my_DNA[index:index+4]
        if (DNAseq.has_key(seq)): #checks if the key is in the dictionary
            index= index +1
        else :
            DNAseq[seq] = DnaID
            index = index+1
            DnaID= DnaID+1
    readingFile.close()
if __name__ == '__main__':
    main()
Here is my output:
ACTC
ACTC
ACTC
ACTC
ACTC
ACTC
This output suggests that it is not iterating through each character in string... please help!
You need to move your index and DnaID declarations before the loop, otherwise they will be reset on every loop iteration:
index = 0
DnaID=1
for char in my_DNA:
    #... rest of loop here
Once you make that change you will have this output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
CAT 12
AT 13
T 14
In order to avoid the last 3 items, which are not the correct length, you can modify your loop:
for i in range(len(my_DNA)-3):
    #... rest of loop here
This doesn't loop through the last 3 characters, making the output:
ATGA 1
TGAT 2
GATA 3
ATAT 4
TATA 5
ATAT 6
TATC 6
ATCT 7
TCTA 8
CTAT 9
TATC 10
ATCA 10
TCAT 11
This should give you the desired effect.
from collections import defaultdict
with open("signatures.txt", "r") as f:
    readingFile = f.read()
DNAseq = defaultdict(int) # maps each 4mer to the number of times it occurs
window = 4
for i in range(len(readingFile)):
    current_4mer = readingFile[i:i+window]
    if len(current_4mer) == window: # skip the short slices at the end
        DNAseq[current_4mer] += 1
print(DNAseq)
index is being reset to 0 each time through the loop that starts with for char in my_DNA:.
Also, I think the loop condition should be something like while index <= len(my_DNA)-4: to be consistent with the loop body.
Your index counters reset themselves since they are inside the for loop.
May I make some further suggestions? My solution would look like this:
readingFile = open("signatures.txt", "r")
my_DNA=""
DNAseq = {} #creates dictionary
for line in readingFile:
    line = line.strip()
    my_DNA = my_DNA + line
readingFile.close()
ID = 1
index = 0
# Slicing never raises IndexError, so stop explicitly at the last full 4mer
while index + 4 <= len(my_DNA):
    seq = my_DNA[index:index+4]
    if seq not in DNAseq.values(): # skip 4mers that are already stored
        DNAseq[ID] = seq
        ID += 1
    index += 1 # slide the window by one base
But what do you want to do with duplicates? E.g., if a sequence like ATGC appears twice, should both be added under different IDs, for example {...1:'ATGC', ... 200:'ATGC',...}, or should the repeats be omitted?
If I'm understanding correctly, you are counting how often each sequential string of 4 bases occurs? Try this:
def split_to_4mers(filename):
    dna_dict = {}
    with open(filename, 'r') as f:
        # assuming only the first line of the file contains the dna string
        dna_string = f.readline().strip()
    for idx in range(len(dna_string)-3):
        seq = dna_string[idx:idx+4]
        count = dna_dict.get(seq, 0)
        dna_dict[seq] = count+1
    return dna_dict
output on a file that contains only "ATGATATATCTATCAT":
{'TGAT': 1, 'ATCT': 1, 'ATGA': 1, 'TCAT': 1, 'TATA': 1, 'TATC': 2, 'CTAT': 1, 'ATCA': 1, 'ATAT': 2, 'GATA': 1, 'TCTA': 1}
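The question itself asks for incrementing IDs rather than counts. A minimal sketch of that variant, assuming repeated 4mers should keep the ID they received first (the question does not say); the function name kmer_ids and the parameter k are made up for this example:

```python
def kmer_ids(sequence, k=4):
    """Map each distinct k-mer to an ID in order of first appearance."""
    ids = {}
    # range stops at the last position where a full k-mer still fits
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if kmer not in ids:
            ids[kmer] = len(ids) + 1  # next unused ID, starting at 1
    return ids


signatures = kmer_ids("ATGATATATCTATCAT")
for kmer, kmer_id in signatures.items():
    print(kmer, kmer_id)
```

Using len(ids) + 1 as the next ID avoids a separate counter that could be reset by accident inside the loop.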

Parsing text in python

I've got a text file that is structured like so
1\t 13249\n
2\t 3249\n
3\t 43254\n
etc...
It's a very simple list. I've got the file opened and I can read the lines. I have the following code:
count = 0
for x in open(filename):
    count += 1
return count
What I want to do is to assign the first number of each line to a variable (say xi) and to assign the second number of each line to another variable (yi). The goal is to be able to run some statistics on these numbers.
Many thanks in advance.
No need to reinvent the wheel:
import numpy as np
for xi, yi in np.loadtxt('blah.txt'):
    print(xi)
    print(yi)
count = 0
for x in open(filename):
    # rstrip removes all whitespace on the right (so the newline in this case)
    # split breaks the string into parts at the passed separator
    xi, yi = x.rstrip().split("\t") # multiple values can be assigned at once
    # xi and yi are still strings here; convert them with int() before doing statistics
    count += 1
return count
>>> with open('blah.txt') as f:
...     for i,xi,yi in ([i]+list(map(int,p.split())) for i,p in enumerate(f)):
...         print(i,xi,yi)
...
0 1 13249
1 2 3249
2 3 43254
Note that int(' 23\n') == 23, so the surrounding whitespace does no harm.
This is clearer, since enumerate already yields a counter along with each line:
>>> with open('blah.txt') as f:
...     for count,p in enumerate(f):
...         xi,yi=map(int,p.split()) # you might prefer (int(i) for i in p.split())
...         print(count,xi,yi)
...
0 1 13249
1 2 3249
2 3 43254
With a regular expression:
import re
def FUNC(path):
    xi=[]
    yi=[]
    with open(path) as fh:
        f=fh.read().split("\n") # splitting the file's content into a list of lines
    # optional whitespace, the first number, a tab, optional whitespace,
    # then the second number and any other characters
    patt=re.compile(r"^\s*(\d+)\t\s*(\d+).*")
    for line in f:
        t=patt.findall(line)
        if t: # skip lines that do not match, e.g. the empty last line
            xi.append(t[0][0])
            yi.append(t[0][1])
    print(xi,yi)
#-----------------------------
if __name__=="__main__":
    FUNC("C:\\data.txt")
Simpler code:
def FUNC(path):
    x=[]
    y=[]
    with open(path) as fh:
        f=fh.read().split("\n")
    for i in f:
        i=i.split() # split on any whitespace (the tab and the spaces)
        if len(i) >= 2: # skip empty or incomplete lines
            x.append(i[0])
            y.append(i[1])
    print(x,y)
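Since the stated goal is to run statistics on the two columns, it may help to collect them as integers first. A minimal sketch using only the standard library; the function name read_columns and the in-memory sample lines are assumptions standing in for the real file:

```python
import statistics


def read_columns(lines):
    """Return two lists of ints: the first and second number of each line."""
    xs, ys = [], []
    for line in lines:
        parts = line.split()  # splits on the tab and any extra spaces
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        xs.append(int(parts[0]))
        ys.append(int(parts[1]))
    return xs, ys


sample = ["1\t 13249\n", "2\t 3249\n", "3\t 43254\n"]
xs, ys = read_columns(sample)
print(statistics.mean(ys), statistics.median(ys))
```

An open file object can be passed in place of sample, because iterating over a file also yields one line at a time.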
