Python reading 2 strings from the same line - python

How can I read, one at a time, two strings from a txt file that are written on the same line?
e.g.
francesco 10

# out is your file
out.readline().split() # result is ['francesco', '10']
This assumes that your two strings are separated by whitespace. To split on another separator (comma, colon, etc.), pass it to split(), e.g. split(',').

Why not just read the line and split it up later? Otherwise you'd have to read byte-by-byte and look for the space character, which is very inefficient. It is better to read the entire line and then split the resulting string on the space, giving you two strings.

'francesco 10'.split()
will give you ['francesco', '10'].

for line in fi:
    fields = line.split()  # e.g. ['francesco', '10']
It's ideal to just iterate over the file object.
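As a minimal sketch of the answers above (Python 3): the file contents and the names `sample` and `pairs` are made up for illustration, and `io.StringIO` stands in for a real file opened with `open()`.

```python
import io

# Stand-in for a real file; with a file on disk you would use open("data.txt").
sample = io.StringIO("francesco 10\nmaria 7\n")

pairs = []
for line in sample:
    # split() with no argument splits on any run of whitespace,
    # so the trailing newline is discarded as well.
    name, number = line.split()
    pairs.append((name, number))

print(pairs)
```

Note that the two-name unpacking raises ValueError if a line does not contain exactly two whitespace-separated fields, so real input may need a check first.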

Related

Replacing a character in a line of string

I have a .txt file with 20 lines. Each line contains 10 zeroes separated by commas.
0,0,0,0,0,0,0,0,0,0
I want to replace every fifth 0 with 1 in each line. I tried the .replace function, but I know there must be an easier way in Python.
You can split the text string with the command below.
text_string = "0,0,0,0,0,0,0,0,0,0"
string_list = text_string.split(",")
Then you can replace every fifth element in the list string_list by assigning to its index.
for i in range(4, len(string_list), 5):
    string_list[i] = "1"
After this, join the elements of the list using the join method.
output = ",".join(string_list)
The output for this will be:
'0,0,0,0,1,0,0,0,0,1'
This is one way of doing it.
If the text in this file follows a regular format, you can parse it as a CSV file, change every fifth field, and write the result to a new file.
But if you want to modify the existing text file in place, for example to replace a character, you can use seek(); refer to "How to modify a text file?".
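A hedged sketch of one way to handle a single line: split it into fields, overwrite every fifth field, and join the fields again. The function name `replace_every_fifth` and its `sep` parameter are hypothetical, chosen only for this example.

```python
def replace_every_fifth(line, sep=","):
    """Return the line with every fifth field replaced by "1"."""
    fields = line.rstrip("\n").split(sep)
    # Indices 4, 9, 14, ... are the 5th, 10th, 15th, ... fields.
    for i in range(4, len(fields), 5):
        fields[i] = "1"
    return sep.join(fields)


result = replace_every_fifth("0,0,0,0,0,0,0,0,0,0\n")
print(result)
```

To process the whole file, one option is to call this function on each line read from the input and write each result, plus a newline, to a new output file.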

Treat '^M' as regular characters in Python

I have a file that contains text to generate LaTeX mathematical expressions, one per line. This file should contain exactly 103,559 lines. But some lines contain the character sequence '^M' (CTRL-v CTRL-m) either at the end or interspersed within the lines, possibly multiple times. As a result, when I try to read the lines from the file using Python, the number of lines returned is greater than expected (actually returns 104,654 lines).
How do I tell Python to not generate a newline on each occurrence of the sequence '^M'? Thank you.
Use the newline argument to open().
Nearly a duplicate of "Don't convert newline when reading a file", from which I got this solution:
import sys
with open(sys.argv[1], 'r', newline='\n') as fh:
    for i, line in enumerate(fh):
        print(i, line)
(Be aware that, when printing as in this example, the ^M ('\r') character moves the cursor back to the start of the line, so later characters overwrite earlier ones.)
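To see the difference the newline argument makes, here is a small self-contained sketch (Python 3). The file contents are invented for the demonstration, and the temporary file only stands in for the real data file.

```python
import os
import tempfile

# Write the test data in binary mode so the bytes are stored exactly.
# The first line contains an embedded carriage return ('\r', shown as ^M).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"x^2 \r+ y\nz\n")
    path = tmp.name

# Default text mode: universal newlines, so '\r' also ends a line.
with open(path, "r") as fh:
    default_lines = fh.readlines()

# newline='\n': only '\n' ends a line, and '\r' is kept as ordinary data.
with open(path, "r", newline="\n") as fh:
    strict_lines = fh.readlines()

os.remove(path)

print(len(default_lines), len(strict_lines))
```

With the default settings the embedded '\r' splits the first line in two; with newline='\n' it stays inside the line and can be stripped or replaced afterwards if needed.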

Why doesn't .rstrip('\n') work?

Let's say doc.txt contains
a
b
c
d
and that my code is
f = open('doc.txt')
doc = f.read()
doc = doc.rstrip('\n')
print doc
Why do I get the same output?
str.rstrip() removes the trailing newline, not all the newlines in the middle. You have one long string, after all.
Use str.splitlines() to split your document into lines without newlines; you can rejoin it if you want to:
doclines = doc.splitlines()
doc_rejoined = ''.join(doclines)
but now doc_rejoined will have all lines running together without a delimiter.
Because you read the whole document into one string that looks like:
'a\nb\nc\nd\n'
When you call rstrip('\n') on that string, only the trailing \n will be removed, leaving all the others untouched, so the string would look like:
'a\nb\nc\nd'
The solution would be to split the file into lines and then right strip every line. Or just replace all the newline characters with nothing: s.replace('\n', ''), which gives you 'abcd'.
rstrip strips trailing characters (here, newlines) from the end of the whole string. If you were expecting it to work on individual lines, you'd need to split the string into lines first using something like doc.split('\n').
Try this instead:
with open('doc.txt') as f:
    for line in f:
        print line,
Explanation:
The recommended way to open a file is using with, which takes care of closing the file at the end
You can iterate over each line in the file using for line in f
There's no need to call rstrip() now, because we're reading and printing one line at a time
Consider using replace and replacing each instance of '\n' with ''. This would get rid of all the new line characters in the input text.
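The three behaviours discussed in these answers can be compared side by side. This is only a sketch (Python 3 syntax), with the string `doc` standing in for what f.read() would return for the example doc.txt.

```python
doc = "a\nb\nc\nd\n"  # stand-in for the contents of doc.txt

stripped = doc.rstrip("\n")          # only trailing newlines are removed
lines = doc.splitlines()             # one string per line, newlines dropped
no_newlines = doc.replace("\n", "")  # every newline removed

print(repr(stripped))
print(lines)
print(repr(no_newlines))
```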

Eliminating extra commas

I am having trouble replacing three commas with one comma in a text file of data.
I am processing a large text file to put it into comma delimited format so I can query it using a database.
I do the following at the command prompt and it works:
>>> import re
>>> line = 'one,,,two'
>>> line=re.sub(',+',',',line)
>>> print line
one,two
>>>
Below is my actual code:
with open("dmis8.txt", "r") as ifp:
    with open("dmis7.txt", "w") as ofp:
        for line in ifp:
            #join lines by removing a line ending.
            line=re.sub('(?m)(MM/ANGDEC)[\r\n]+$','',line)
            #various replacements of text with nothing. This removes the text
            line=re.sub('IDENTIFIER','',line)
            line=re.sub('PART','50-1437',line)
            line=re.sub('Eval','',line)
            line=re.sub('Feat','',line)
            line=re.sub('=','',line)
            #line=re.sub('r"++++"','',line)
            line=re.sub('r"----|"',' ',line)
            line=re.sub('Nom','',line)
            line=re.sub('Act',' ',line)
            line=re.sub('Dev','',line)
            line=re.sub('LwTol','',line)
            line=re.sub('UpTol','',line)
            line=re.sub(':','',line)
            line=re.sub('(?m)(Trend)[\r\n]*$',' ',line)
            #Remove spaces replace with semicolon
            line=re.sub('[ \v\t\f]+', ',', line)
            #no worky line=re.sub(r",,,",',',line)
            line=re.sub(',+',',',line)
            #line=line.replace(",+", ",")
            #line=line.replace(",,,", ",")
            ofp.write(line)
This is what I get from the code above:
There are several commas together. I don't understand why they won't get replaced down to one comma.
Never mind that I don't see how the extra commas got there in the first place.
50-1437,d
2012/05/01
00/08/27
232_PD_1_DIA,PED_HL1_CR,,,12.482,12.478,-0.004,-0.021,0.020,----|++++
232_PD_2_DIA_TOP,PED_HL2_TOP,,12.482,12.483,0.001,-0.021,0.020,----|++++
232_PD_2_DIA,PED_HL2_CR,,12.482,12.477,-0.005,-0.021,0.020,----|++++
232_PD_2_DIA_BOT,PED_HL2_BOT,,12.482,12.470,-0.012,-0.021,0.020,--|--++++
raw data for reference:
PART IDENTIFIER : d
2012/05/01
00/08/27
232_PD_1_DIA Eval Feat = PED_HL1_CR MM/ANGDEC
Nom Act Dev LwTol UpTol Trend
12.482 12.478 -0.004 -0.021 0.020 ----|++++
232_PD_2_DIA_TOP Eval Feat = PED_HL2_TOP MM/ANGDEC
12.482 12.483 0.001 -0.021 0.020 ----|++++
232_PD_2_DIA Eval Feat = PED_HL2_CR MM/ANGDEC
12.482 12.477 -0.005 -0.021 0.020 ----|++++
Can someone kindly point out what I am doing wrong?
thanks in advance...
Your regex is working fine. The problem is that you concatenate the lines (by write()ing them) after you scrub them with your regex.
Instead, use "".join() on all of your lines, run re.sub() on the whole thing, and then write() it all to the file at once.
I think your problem is caused by the fact that removing line endings does not join lines, in combination with the fact that write does not add newlines to the end of each string. So you have multiple input lines that look like a single line in the output.
Looking at the comments, you seem to think that replacing the end of a line with an empty string will append the next line to it, but that is not what happens. The three commas you're seeing are not replaced by your re.sub call because they never appear together in one line. They come from multiple input lines (which, after all the replacements, are empty except for commas) that end up on a single output line, because you stripped their '\n' characters and write (unlike print) doesn't automatically add '\n' to the end of each written string.
To debug your code, just put print line after each line of code, to see what each "line" actually is - that should help you see what's going wrong.
In general, reading file formats where each "record" spans multiple lines requires more complicated methods than just a for line in file loop.
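The explanation in the answers above can be illustrated with a short sketch. The `pieces` list is an assumption: it shows roughly what three consecutive input lines might look like after the per-line clean-up, using values from the sample output.

```python
import re

# Hypothetical result of the per-line clean-up for three input lines:
# the first lost its newline, the second was reduced to a single comma.
pieces = ["232_PD_1_DIA,PED_HL1_CR,", ",", ",12.482,12.478\n"]

# Collapsing commas piece by piece cannot see across piece boundaries,
# so the run of three commas survives once the pieces are written out.
per_piece = "".join(re.sub(",+", ",", p) for p in pieces)

# Joining first and collapsing afterwards handles runs that span pieces.
joined_first = re.sub(",+", ",", "".join(pieces))

print(per_piece)
print(joined_first)
```

The second variant corresponds to the suggestion to join the lines, run re.sub() on the whole text, and write the result once.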

Removing tab delimited spaces from a text file using for loop

For my python class, I am working on opening a .tsv file and taking 15 rows of data, broken down in 4 columns, and turning it into lists for each line. To do this, I must remove the tabs in between each column.
I've been advised to use a for loop and loop through each line. This makes sense but I can't figure out how to remove the tabs.
Any help?
To read lines from a file, and split each line on the tab delimiter, you can do this:
rows = []
for line in open('file.tsv', 'rb'):
    rows.append(line.strip().split('\t'))
Properly, this should be done using the Python CSV module (as mentioned in another answer) as this will handle escaped separators, quoted values etc.
In the more general sense, this can be done with a list comprehension:
rows = [line.split('\t') for line in file]
And, as suggested in the comments, in some cases a generator expression would be a better choice:
rows = (line.split('\t') for line in file)
See Generator Expressions vs. List Comprehensions for some discussion on when to use each.
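A small sketch of both forms in Python 3 follows. The data and the names `tsv_file`, `rows`, and `lazy_rows` are made up, and `io.StringIO` stands in for a file opened in text mode. Unlike the one-liners above, it strips the trailing newline before splitting so that the last column is clean.

```python
import io

# Stand-in for an open .tsv file (hypothetical contents).
tsv_file = io.StringIO("name\tage\nann\t30\nbob\t25\n")

# List comprehension: builds the whole list of rows immediately.
rows = [line.rstrip("\n").split("\t") for line in tsv_file]

tsv_file.seek(0)  # rewind so the same data can be read again

# Generator expression: rows are produced lazily, one per iteration.
lazy_rows = (line.rstrip("\n").split("\t") for line in tsv_file)
first_row = next(lazy_rows)

print(rows)
print(first_row)
```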
You should use Python's stdlib csv module, particularly the csv.reader function.
rows = [row for row in csv.reader(open('yourfile.tsv', 'rb'), delimiter='\t')]
There's also a dialect parameter that can take excel-tab to conform to Microsoft Excel's tab-delimited format.
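For Python 3, a hedged sketch of the csv approach might look like this. The sample text is invented, and `io.StringIO` stands in for a real file, which would be opened in text mode with open(path, newline="") rather than 'rb'.

```python
import csv
import io

# Hypothetical data: the second row has a quoted field containing a tab.
tsv_text = 'id\tcomment\n1\t"contains a\ttab"\n'

# Explicit delimiter.
reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
rows = list(reader)

# The built-in "excel-tab" dialect also uses a tab as the delimiter.
dialect_reader = csv.reader(io.StringIO(tsv_text, newline=""), dialect="excel-tab")
rows_dialect = list(dialect_reader)

print(rows)
```

The quoted field keeps its embedded tab, which a plain split('\t') would have broken into two columns.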
Check out the built-in string functions. split() should do the job.
>>> line = 'word1\tword2\tword3'
>>> line.split('\t')
['word1', 'word2', 'word3']
