Trouble concatenating two strings - python

I am having trouble concatenating two strings. This is my code:
info = infl.readline()
while True:
    line = infl.readline()
    outfl.write(info + line)
    print info + line
The trouble is that the output appears on two different lines. For example, output text looks like this:
490250633800 802788.0 953598.2
802781.968872 953674.839355 193.811523 1 0.126805 -999.000000 -999.000000 -999.000000
I want both strings on the same line.

There must be a '\n' character at the end of info. You can remove it with:
info = infl.readline().rstrip()

You should remove the line breaks from both the line and info variables, like this:
line=line.replace("\n","")

readline returns the line with its trailing "\n" in almost every case (only the last line of a file may lack one). You can get around this by calling rstrip on the result.
info = infl.readline().rstrip()
while True:
    # put it in both places!
    line = infl.readline().rstrip()
    outfl.write(info + line)
    print info + line
readline's docs:
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line)...
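A minimal Python 3 sketch of the fix suggested in the answers above. The io.StringIO objects are stand-ins for the asker's infl and outfl files, the second data line is made up for illustration, and the single space used as a separator is an assumption. Iterating over the file object stops at end of file, which the `while True` loop in the question does not do.

```python
import io

# Stand-ins for the asker's infl/outfl file objects (hypothetical sample data).
infl = io.StringIO(
    "490250633800 802788.0 953598.2\n"
    "802781.968872 953674.839355 193.811523 1 0.126805 -999.000000 -999.000000 -999.000000\n"
    "802782.100000 953675.200000 193.900000 1 0.130000 -999.000000 -999.000000 -999.000000\n"
)
outfl = io.StringIO()

# Strip the trailing newline from the header line once.
info = infl.readline().rstrip("\n")

# Iterating over the file yields the remaining lines and ends at EOF.
for line in infl:
    # Strip the newline from each data line, then add exactly one back.
    outfl.write(info + " " + line.rstrip("\n") + "\n")

result = outfl.getvalue()
print(result, end="")
```

Each output line now holds the header followed by one data line, with a single newline at the end.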

Related

how to replace (or delete) a part of string from txt file in python

I am very new to Python (and programming in general), and here is my issue. I would like to replace (or delete) a part of a string from a txt file which contains hundreds or thousands of lines. Each line starts with the very same string, which I want to delete.
I have not found a method to delete it, so I tried to replace it with an empty string, but for some reason it doesn't work.
Here is what I have written:
file = "C:/Users/experimental/Desktop/testfile siera.txt"
siera_log = open(file)
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
for each_line in siera_log:
    new_line = each_line.replace("text_to_replace", " ")
    print(new_line)
When I print it to check whether it worked, I can see that the lines are as they were before. No change was made.
Can anyone help me find out why?
Quoting the question: "each line starts with the very same string which I want to delete."
The problem is you're passing a string "text_to_replace" rather than the variable text_to_replace.
But, for this specific problem, you could just remove the first n characters from each line:
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"
n = len(text_to_replace)
for each_line in siera_log:
    new_line = each_line[n:]
    print(new_line)
If you quote a variable it becomes a string literal and won't be evaluated as a variable.
Change your line for replacement to:
new_line = each_line.replace(text_to_replace, " ")
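To make the difference concrete, here is a small sketch comparing the quoted name, the variable, and the slicing approach. The sample lines are hypothetical stand-ins for the log file, and the prefix is replaced with an empty string (rather than a space) on the assumption that it should disappear entirely.

```python
text_to_replace = "Chart: Bar Backtest: NQU8-CME [CB] 1 Min #1 | Study: free dll = 0 |"

# Hypothetical sample lines standing in for the contents of the log file.
lines = [
    text_to_replace + " first entry\n",
    text_to_replace + " second entry\n",
]

# Quoted name: searches for the literal text "text_to_replace", which never occurs.
unchanged = [line.replace("text_to_replace", "") for line in lines]

# Variable: searches for the value stored in text_to_replace.
replaced = [line.replace(text_to_replace, "") for line in lines]

# Slicing: drop the first n characters, but only when the line really starts with the prefix.
n = len(text_to_replace)
sliced = [line[n:] if line.startswith(text_to_replace) else line for line in lines]

print(unchanged == lines)
print(replaced)
print(sliced)
```

The startswith check is a small safety net for lines that do not carry the prefix; the answer above slices unconditionally.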

Python DNA sequence slice gives \n as wrong content in slice result

I am surprised by this: I am using Python to slice a long DNA sequence (4699673 characters) into substrings of a specific length. It works, but there is a problem in the result: after 71 good results, \n starts to appear in the result for a few slices, then the slices are correct again, and so on for the whole file.
the code:
import sys
filename = open("out_filePU.txt",'w')
sys.stdout = filename
my_file = open("GCF_000005845.2_ASM584v2_genomic_edited.fna")
st = my_file.read()
length = len(st)
print ( 'Sequence Length is, :' ,length)
for i in range(0,len(st[:-9])):
    print(st[i:i+9], i)
(The original post included a figure showing the error in the result file.)
Please, I need advice on this.
Your sequence file contains multiple lines, and at the end of each line there is a line break \n. You can remove them with st = my_file.read().replace("\n", "").
Try st = re.sub('\\s', '', my_file.read()) to replace any newlines or other whitespace (you'll need to add import re at the top of your script).
Then use for i in range(0,len(st[:-9]),9): to step through your data in increments of nine characters. Otherwise you only advance by one character each time, which is why you can see the diagonal patterns in your output.
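A short sketch of both slicing styles after the whitespace has been removed. The three-line sample sequence is made up and stands in for the contents of the .fna file; which of the two styles the asker wants is not stated, so both are shown. The overlapping variant uses range(len(st) - k + 1) so that the final window is included, whereas the loop in the question stops one window short.

```python
import re

# Hypothetical sample standing in for the multi-line sequence file.
raw = "AGCTTTTCATTCTGACTGCA\nACGGGCAATATGTCTCTGTG\nTGGATTAAAA\n"

# Remove newlines and any other whitespace before slicing.
st = re.sub(r"\s", "", raw)

k = 9  # slice length used in the question

# Overlapping windows: advance one character at a time.
overlapping = [st[i:i + k] for i in range(len(st) - k + 1)]

# Non-overlapping chunks: advance k characters at a time (the last chunk may be shorter).
chunks = [st[i:i + k] for i in range(0, len(st), k)]

print(len(st), len(overlapping), len(chunks))
print(chunks)
```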

To add a new line before a set of characters in a line using python

I have a very long line of characters in which a set of characters keeps repeating. The line is: qwethisistheimportantpartqwethisisthesecondimportantpart
There are no spaces in the string. I want to add a new line before the string 'qwe' so that I can distinguish every important part from the other.
Output :
qwethisistheimportantpart
qwethisisthesecondimportantpart
I tried using
for line in infile:
    if line.startswith("qwe"):
        line="\n" + line
and it doesn't seem to work
str.replace() can do what you want:
line = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
line = line.replace('qwe', '\nqwe')
print(line)
You can use re.split() and then join with \nqwe:
import re
s = "qwethisistheimportantpartqwethisisthesecondimportantpart"
print '\nqwe'.join(re.split('qwe', s))
Output:
qwethisistheimportantpart
qwethisisthesecondimportantpart
I hope this will help you
string = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
split_factor = 'qwe'
a , b , c = map(str,string.split(split_factor))
print split_factor + b
print split_factor + c
Implemented in Python 2.7. The three-way unpacking assumes the string contains exactly two 'qwe' parts.
This yields the same output as the one you mentioned.
Output:
qwethisistheimportantpart
qwethisisthesecondimportantpart
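For a line with any number of 'qwe' parts, one possible variation (not taken from the answers above) is a zero-width regular expression that inserts the newline before each 'qwe' except one at the very start of the string. This sketch targets Python 3.7 or later.

```python
import re

line = "qwethisistheimportantpartqwethisisthesecondimportantpart"

# (?=qwe) matches the empty position just before each "qwe";
# (?<!^) rejects the position at the very start of the string,
# so no blank line is produced before the first part.
result = re.sub(r"(?<!^)(?=qwe)", "\n", line)

print(result)
```

Unlike str.replace('qwe', '\nqwe'), this does not put an empty line in front of the first part.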

regular expressions in python using quotes

I am attempting to create a regular expression pattern for strings similar to the ones below, which are stored in a file. The aim is to get any column for any row; the rows need not be on a single line. So for example, consider the following file:
"column1a","column2a","column
3a,", #entity 1
"column\"this is, a test\"4a"
"column1b","colu
mn2b,","column3b", #entity 2
"column\"this is, a test\"4b"
"column1c,","column2c","column3c", #entity 3
"column\"this is, a test\"4c"
Each entity consists of four columns: column 4 for entity 2 would be "column\"this is, a test\"4b", and column 2 for entity 3 would be "column2c". Each column begins with a quote and closes with a quote; however, you must be careful because some columns have escaped quotes. Thanks in advance!
You could do it like this:
1. Read the whole file.
2. Split the input at each newline character that is not preceded by a comma.
3. Iterate over the split elements and split each one again at every comma (plus an optional following newline character) that is preceded and followed by double quotes.
Code:
import re
with open(file) as f:
    fil = f.read()
    m = re.split(r'(?<!,)\n', fil.strip())
    for i in m:
        print(re.split('(?<="),\n?(?=")', i))
Output:
['"column1a"', '"column2a"', '"column3a,"', '"column\\"this is, a test\\"4a"']
['"column1b"', '"column2b,"', '"column3b"', '"column\\"this is, a test\\"4b"']
['"column1c,"', '"column2c"', '"column3c"', '"column\\"this is, a test\\"4c"']
Here is the check..
$ cat f
"column1a","column2a","column3a,",
"column\"this is, a test\"4a"
"column1b","column2b,","column3b",
"column\"this is, a test\"4b"
"column1c,","column2c","column3c",
"column\"this is, a test\"4c"
$ python3 f.py
['"column1a"', '"column2a"', '"column3a,"', '"column\\"this is, a test\\"4a"']
['"column1b"', '"column2b,"', '"column3b"', '"column\\"this is, a test\\"4b"']
['"column1c,"', '"column2c"', '"column3c"', '"column\\"this is, a test\\"4c"']
f is the input file name and f.py is the file-name which contains the python script.
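As a different angle on the stated aim of fetching any column of any entity, here is a hedged sketch that collects every quoted column with one pattern and indexes into the result. It is not the answerer's method: the get_column helper and the COLUMNS_PER_ENTITY constant are hypothetical names, and the sketch assumes every entity has exactly four columns and that line breaks inside a column should be dropped.

```python
import re

# Sample data copied from the question. The "#entity" notes are not quoted,
# so the pattern below never matches them.
text = r'''"column1a","column2a","column
3a,", #entity 1
"column\"this is, a test\"4a"
"column1b","colu
mn2b,","column3b", #entity 2
"column\"this is, a test\"4b"
"column1c,","column2c","column3c", #entity 3
"column\"this is, a test\"4c"'''

COLUMNS_PER_ENTITY = 4  # assumption taken from the question

# A quoted column: an opening quote, then any mix of ordinary characters
# (including newlines) and backslash-escaped characters, then a closing quote.
column_re = re.compile(r'"(?:[^"\\]|\\.)*"')


def get_column(data, entity, column):
    """Return column `column` of entity `entity` (both 1-based)."""
    # Collect all columns in file order, dropping line breaks inside a column.
    columns = [m.replace("\n", "") for m in column_re.findall(data)]
    index = (entity - 1) * COLUMNS_PER_ENTITY + (column - 1)
    return columns[index]


print(get_column(text, 2, 4))
print(get_column(text, 3, 2))
```

If an entity could have a different number of columns, the fixed-size indexing would no longer line up, and a row-aware approach such as the split-based one above would be needed.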
Your problem is terribly similar to what I have to deal with three times every month :) Except I'm not using Python to solve it, but I can 'translate' what I usually do:
text = r'''"column1a","column2a","column
3a,",
"column\"this is, a test\"4a"
"column1a2","column2a2","column3a2","column4a2"
"column1b","colu
mn2b,","column3b",
"column\"this is, a test\"4b"
"column1c,","column2c","column3c",
"column\"this is, a test\"4c"'''
import re
# Number of columns one line is supposed to have
columns = 4
# Temporary variable to hold partial lines
buffer = ""
# Our regex to check for each column
check = re.compile(r'"(?:[^"\\]*|\\.)*"')
# Read the file line by line
for line in text.split("\n"):
    # If there's no stored partial line, this is a new line
    if buffer == "":
        # Check if we get 4 columns and print, if not, put the line
        # into buffer so we store a partial line for later
        matches = check.findall(line)
        if len(matches) == columns:
            print(matches)
        else:
            # use line.strip() if you need to trim whitespaces
            buffer = line
    else:
        # Update the variable (containing a partial line) with the
        # next line and recheck if we get 4 columns
        # use line.strip() if you need to trim whitespaces
        buffer = buffer + line
        matches = check.findall(buffer)
        # If we indeed get 4, our line is complete and print
        # We must not forget to empty buffer now that we got a whole line
        if len(matches) == columns:
            print(matches)
            buffer = ""
        # Optional; always good to have a safety backdoor though
        # If there is a problem with the csv itself like a weird unescaped
        # quote, you send it somewhere else
        elif len(matches) > columns:
            print("Error: cannot parse line:\n" + buffer)
            buffer = ""

How to append two strings in Python?

I have done this operation millions of times, just using the + operator! I have no idea why it is not working this time, it is overwriting the first part of the string with the new one! I have a list of strings and just want to concatenate them in one single string! If I run the program from Eclipse it works, from the command-line it doesn't!
The list is:
["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
I want to discard the first and the last elements, the code is:
ediMsg = ""
count = 1
print "extract_the_info, lineList ",lineList
print "extract_the_info, len(lineList) ",len(lineList)
while (count < (len(lineList)-1)):
    temp = ""
    # ediMsg = ediMsg+str(lineList[count])
    # print "Count "+str(count)+" ediMsg ",ediMsg
    print "line value : ",lineList[count]
    temp = lineList[count]
    ediMsg += " "+temp
    print "ediMsg : ",ediMsg
    count += 1
    print "count ",count
Look at the output:
extract_the_info, lineList ["UNH+1+XYZ:08:2:1A+%CONVID%'&\r", "ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&\r", "DUM'&\r"]
extract_the_info, len(lineList) 8
line value : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
ediMsg : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 2
line value : DUM'&
DUM'& : ORG+1A+77499505:ABC+++A+FR:EUR++123+1A'&
count 3
Why is it doing so!?
While the two answers are correct (use " ".join()), your problem (besides very ugly python code) is this:
Your strings end in "\r", which is a carriage return. Everything is fine, but when you print to the console, "\r" will make printing continue from the start of the same line, hence overwrite what was written on that line so far.
You should use the following and forget about this nightmare:
''.join(list_of_strings)
The problem is not with the concatenation of the strings (although that could use some cleaning up), but in your printing. The \r in your string has a special meaning and will overwrite previously printed strings.
Use repr(), as such:
...
print "line value : ", repr(lineList[count])
temp = lineList[count]
ediMsg += " "+temp
print "ediMsg : ", repr(ediMsg)
...
to print out your result. That will make sure any special characters don't mess up the output.
'\r' is the carriage return character. When you're printing out a string, a '\r' will cause the next characters to go at the start of the line.
Change this:
print "ediMsg : ",ediMsg
to:
print "ediMsg : ",repr(ediMsg)
and you will see the embedded \r values.
And while your code works, please change it to the one-liner:
ediMsg = ' '.join(lineList[1:-1])
Your problem is printing, and it is not string manipulation. Try using '\n' as last char instead of '\r' in each string in:
lineList = [
    "UNH+1+TCCARQ:08:2:1A+%CONVID%'&\r",
    "ORG+1A+77499505:PARAF0103+++A+FR:EUR++11730788+1A'&\r",
    "DUM'&\r",
    "FPT+CC::::::::N'&\r",
    "CCD+CA:5132839000000027:0450'&\r",
    "CPY+++AF'&\r",
    "MON+712:1.00:EUR'&\r",
    "UNT+8+1'\r"
]
I just gave it a quick look. It seems your problem arises when you are printing the text. I haven't done such things for a long time, but probably you only get the last line when you print. If you check the actual variable, I'm sure you'll find that the value is correct.
By last line, I'm talking about the \r you got in the text strings.
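To tie the answers together, here is a minimal Python 3 sketch using the eight-element list from the answer above. It shows that the joined string is intact when inspected with repr(), and that stripping the trailing "\r" before joining gives a string that prints normally. Whether the carriage returns should be removed at all is an assumption; the answers only require repr() to see them.

```python
line_list = [
    "UNH+1+TCCARQ:08:2:1A+%CONVID%'&\r",
    "ORG+1A+77499505:PARAF0103+++A+FR:EUR++11730788+1A'&\r",
    "DUM'&\r",
    "FPT+CC::::::::N'&\r",
    "CCD+CA:5132839000000027:0450'&\r",
    "CPY+++AF'&\r",
    "MON+712:1.00:EUR'&\r",
    "UNT+8+1'\r",
]

# Discard the first and last elements and join the rest with spaces.
joined = " ".join(line_list[1:-1])

# repr() shows the embedded carriage returns instead of letting the
# terminal act on them.
print(repr(joined))

# Optional clean-up: strip the trailing "\r" from each element before joining.
cleaned = " ".join(item.rstrip("\r") for item in line_list[1:-1])
print(cleaned)
```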
