Issue reading text file python - python

I am new to Python and stuck on an issue that is probably easy for a Python expert. I am trying to read a text file in Python, but I am not getting the desired output using an f-string.
print(f'{lines[0]} {lines[2]}')
The output is printed on two lines, although I didn't use \n:
Hello
I am testing!
Expected output:
Hello I am testing!

This happens because every line read from a file ends with a newline character. Even if you print only one line, you will get a blank line after the text. You can solve it with the strip method:
print(f'{lines[0].strip()} {lines[2].strip()}')

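As you can see in your text file, the text is on separate lines, so each line you read ends with a newline character \n. You just have to strip that character. Before Python 3.12, an f-string expression cannot reuse the outer quote type or contain a backslash, so strip first and format afterwards:
first, third = lines[0].strip('\n'), lines[2].strip('\n')
print(f'{first} {third}')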
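To make the answers above concrete, here is a minimal sketch. The sample text and the use of io.StringIO in place of a real file are assumptions for illustration, since the question does not show the file itself.

```python
import io

# Stand-in for the text file in the question (contents assumed).
sample_file = io.StringIO("Hello\nsecond line\nI am testing!\n")

# readlines() keeps the trailing '\n' on every line.
lines = sample_file.readlines()

# Strip the line endings before formatting.
first = lines[0].rstrip("\n")
third = lines[2].rstrip("\n")
joined = f"{first} {third}"
print(joined)  # Hello I am testing!
```

rstrip("\n") removes only the line ending, whereas strip() with no argument also removes leading and trailing spaces, which may or may not be wanted.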

Related

Appending string to list adds a '\r' to every element

I am trying to append a string to a list (seqs). The strings look something like random letters which I am reading line by line from a file. Printing out the string shows no sign of a '\r'. But printing out the list does.
seqs.append(seq)
print('seq is', seq)
gives: seq is CSMKMTIGSGTKTLHRWAFNPTQQTCVTFVYTGAAGNQNNFLTRNDCVNTC
print(seqs)
gives: 'CSMKMTIGSGTKTLHRWAFNPTQQTCVTFVYTGAAGNQNNFLTRNDCVNTC\r'
I have tried changing the file a few different ways, even writing an example file of just lines of text so I am fairly certain I am not adding a '\r' to the file. Any help would be much appreciated.
Edit: I printed the sequence differently and I do see the carriage return now. Is there any way I can remove it? For now, when I parse through the sequences, I just skip a line with continue if it is '\r'.
It's reading a gzip file and opening it with:
with gzip.open(filepath, 'rb') as fin:
    file_content = fin.read().decode(encoding)
lines = file_content.split('\n')
for line in lines: #separate them into a list
EDIT EDIT I just did file_content.split('\r\n')...Seemed to fix the problem, not sure why I did not think of it sooner. Thanks everyone!
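As a possible alternative to splitting on '\r\n' explicitly, str.splitlines() handles both Unix and Windows line endings. The sketch below uses a short made-up string in place of the decoded gzip content. It also hints at why print(seq) hid the problem: printing a string sends the carriage return straight to the terminal, while printing a list shows each element's repr, where '\r' is visible.

```python
# Hypothetical decoded content with Windows-style (CRLF) line endings.
file_content = "CSMKMT\r\nAFNPTQ\r\n"

# Splitting on '\n' alone leaves the '\r' attached to each line and
# produces a trailing empty string.
naive = file_content.split("\n")

# str.splitlines() recognises '\n', '\r\n' and '\r', and does not
# produce a trailing empty string.
seqs = file_content.splitlines()

print(naive)
print(seqs)
```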

How does the code know when to split into a line?

So I was learning how to download files from the web using Python, but got a bit thrown by one part of the code.
Here is the code:
from urllib import request
def download_stock_data(csv_url):
    response = request.urlopen(csv_url)
    csv = response.read()
    csv_str = str(csv)
    lines = csv_str.split("\\n")
    dest_url = r"stock.csv"
    fx = open(dest_url, "w")
    for line in lines:
        fx.write(line + "\n")
    fx.close()
I don't quite understand the code in the variable lines. How does it know when to split into a new line in a CSV file?
A CSV file is essentially just a text file with comma-separated data, but it also contains new lines (via the newline ASCII character).
If a CSV file consisted of one long comma-separated line, then for line in lines: would only see that single line.
You can open it in Notepad++ or a similar editor to see the raw .csv file. Excel puts data separated by commas into cells, and data on a new line into a new row.
"\n" is where the instruction to create a new line comes from.
In the code you have presented, you are telling python to split the string you received based upon "\n". So you get a list of strings split into lines.
When you write to fx, you are inserting a newline character onto every line you write by appending "\n". If you didn't do this, then you would just get one very long line.
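One detail worth spelling out: in Python 3, response.read() returns bytes, and str() on a bytes object produces its repr text (starting with b'), in which every newline appears as the two characters backslash and n. That is why the question's code splits on "\\n" rather than "\n". The sketch below uses a made-up byte string instead of a real download, and the UTF-8 encoding is an assumption.

```python
# Hypothetical response body, as bytes, like response.read() returns.
csv_bytes = b"date,close\n2020-01-02,10.5\n"

# str() on bytes gives the repr text "b'...'", in which each newline
# appears as the two characters backslash and 'n'.
as_repr = str(csv_bytes)
repr_lines = as_repr.split("\\n")

# Decoding gives real text with real newline characters.
# UTF-8 is an assumption about the server's encoding.
csv_text = csv_bytes.decode("utf-8")
lines = csv_text.splitlines()

print(repr_lines)
print(lines)
```

Decoding first avoids the stray b' prefix on the first line and the lone quote at the end that the str() approach leaves behind.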

Reading Regular Expressions from a text file

I'm currently trying to write a function that takes two inputs:
1 - The URL for a web page
2 - The name of a text file containing some regular expressions
My function should read the text file line by line (each line being a different regex) and then execute each regex on the web page source code. However, I've run into trouble doing this:
example
Suppose I want the address contained on a Yelp page with URL = http://www.yelp.com/biz/liberty-grill-cork
where the regex is \<address\>\s*([^<]*)\\b\s*<. In Python, I then run:
address = re.search('\<address\>\s*([^<]*)\\b\s*<', web_page_source_code)
The above will work, however, if I just write the regex in a text file as is, and then read the regex from the text file, then it won't work. So reading the regex from a text file is what is causing the problem, how can I rectify this?
EDIT: This is how I'm reading the regexes from the text file:
with open("test_file.txt","r") as file:
    for regex in file:
        address = re.search(regex, web_page_source_code)
Just to add, the reason I want to read regexes from a text file is so that my function code can stay the same and I can alter my list of regexes easily. If anyone can suggest any other alternatives that would be great.
Your string has some backslashes and other characters escaped to avoid their special meaning in a Python string literal, not only for the regex itself.
You can easily verify what happens by printing the string you load from the file. If the printed backslashes are doubled, the file contents are wrong.
The text you want in the file is:
File
\<address\>\s*([^<]*)\b\s*<
Here's how you can check it
In [1]: a = open('testfile.txt')
In [2]: line = a.readline()
-- this is the line as you'd see it in python code when properly escaped
In [3]: line
Out[3]: '\\<address\\>\\s*([^<]*)\\b\\s*<\n'
-- this is what it actually means (what re will use)
In [4]: print(line)
\<address\>\s*([^<]*)\b\s*<
OK, I managed to get it working. For anyone who wants to read regular expressions from text files, you need to do the following:
Ensure that regex in the text file is entered in the right format (thanks to MightyPork for pointing that out)
You also need to remove the newline '\n' character at the end
So overall, your code should look something like:
a = open("test_file.txt","r")
line = a.readline()
line = line.strip('\n')
result = re.search(line,page_source_code)
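The two points above can be sketched as follows. The page source, the file name patterns.txt, and the temporary directory are assumptions for illustration. With the trailing '\n' left on, the pattern would also require a newline right after the final <, which is why the unstripped line fails to match.

```python
import os
import re
import tempfile

# Hypothetical page source and pattern file, for illustration only.
page_source = "<address>\n  12 Example Street\n</address>"
pattern_text = r"\<address\>\s*([^<]*)\b\s*<" + "\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "patterns.txt")
    with open(path, "w") as f:
        f.write(pattern_text)

    matches = []
    with open(path) as f:
        for raw in f:
            pattern = raw.rstrip("\n")  # drop the line ending only
            if not pattern:
                continue  # skip blank lines
            matches.append(re.search(pattern, page_source))

print(matches[0].group(1))
```

Using rstrip("\n") rather than strip() keeps any intentional leading or trailing spaces in a pattern.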

Python write to a file r

I am trying to write some basic lines to a text file using Python 3.3.2 (complete beginner here). I am not sure why a number is returned after the write command. The number seems to be the length of the string. The string does get stored in the new text file and everything else seems to be okay.
Also :
>>> f=open('testfile.txt','w')
>>> f.write('this is line 1\n')
15
So, the number '15'.. not sure what it means. Every line I write would return an integer.
From docs:
f.write(string) writes the contents of string to the file, returning
the number of characters written.
A better way to write this would be
print('this is line 1', file=f)
Because it is more flexible (it takes any input type, not just str) and it automatically adds the newline character. As an added bonus, it won't echo anything in the python shell.
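A small sketch comparing the two calls, with io.StringIO standing in for the opened file (an assumption, so that nothing is written to disk). In the interactive shell, the number appears only because the shell echoes any non-None return value.

```python
import io

# io.StringIO stands in for a file opened in text mode.
f = io.StringIO()

count = f.write("this is line 1\n")  # returns characters written
result = print("this is line 2", file=f)  # print() returns None

print(count)
print(repr(f.getvalue()))
```

Assigning the return value to a name, as in count = f.write(...), also keeps the shell from echoing it.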

Music note appended to newlines Python

Specifically I have exported a csv file from Google Adwords.
I read the file line by line and change the phone numbers.
Here is the literal script:
for line in open('ads.csv', 'r'):
    newdata = changeNums(line)
    sys.stdout.write(newdata)
And changeNums() just performs some string replaces and returns the string.
The problem is at the end of the printed newlines is a musical note.
The original CSV does not have this note at the end of lines. Also, I cannot copy-paste the note.
Is this some kind of encoding issue or what's going on?
Try opening with universal line support:
for line in open('ads.csv', 'rU'):
    # etc
Either:
the original file has some extra characters in it (and they are being shown as this symbol in the terminal)
changeNums is creating those characters
stdout.write is sending some uninterpreted newline symbol, which again is being shown by the terminal as this symbol; in that case, change this line to print(newdata)
My guess: changeNums is adding it.
Best debugging commands:
print([ord(x) for x in line])
print([ord(x) for x in newdata])
print(line == newdata)
And check for the character values present in the string.
You can strip out the newlines by:
for line in open('ads.csv', 'r'):
    line = line.rstrip('\n')
    newdata = changeNums(line)
    sys.stdout.write(newdata)
An odd "note" character at the end is usually a CR/LF newline issue between *nix and *dos/*win environments.
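If a stray carriage return is the cause, it can be confirmed and removed along these lines. This is only a sketch: the CSV bytes are made up, and io.BytesIO with io.TextIOWrapper stands in for opening ads.csv. Passing newline="" turns off newline translation so the '\r' stays visible; in Python 3, the default text mode already translates '\r\n' to '\n'.

```python
import io

# Hypothetical CSV data with Windows-style line endings. newline=""
# tells TextIOWrapper not to translate them, so the '\r' is visible.
raw = io.BytesIO(b"name,phone\r\nAda,555-0100\r\n")
reader = io.TextIOWrapper(raw, encoding="utf-8", newline="")

cleaned = []
for line in reader:
    # Code points 13 and 10 at the end of a line mean CR + LF.
    codes = [ord(ch) for ch in line[-2:]]
    cleaned.append(line.rstrip("\r\n"))

print(codes)
print(cleaned)
```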
