Python removing all \r\n within quotes in CSV file


I have a CSV file that has some data in it. I want to replace all the newlines inside double quotes with some character, but the newlines outside the quotes should stay. What is the best way to achieve this?
import sys, getopt
def main(argv):
    inputfile = ''
    outputfile = ''
    print(argv[0:])
    inputfile = argv[0:]
    file_object = open(argv[0:], "r")
    print(file_object)
    data = file_object.read()  # read the whole file into one string
    strings = data.split('"')[1::2]
    for string in strings:
        string.replace("\r", "")
        string.replace("\n", "")
        print(string)
    f = open("output.csv", "w")
    for string in strings:
        string = string.replace("\r", "")
        string = string.replace("\n", "")
        f.write(string)
    f.close()
if __name__ == "__main__":
    main(sys.argv[1])
This does not quite work, since the double quotes get lost, as well as the commas.
Expected input:
"dssdlkfjsdfj \r\n ashdiowuqhduwqh \r\n",
"3"
Expected output:
"dssdlkfjsdfj ashdiowuqhduwqh",
"3"

A real sample would help, but given in.csv:
"multi
line
data","more data"
"more multi
line data","other data"
The following will replace newlines in quotes:
import csv
with open('in.csv', newline='') as fin:
    with open('out.csv', 'w', newline='') as fout:
        r = csv.reader(fin)
        w = csv.writer(fout)
        for row in r:
            row = [col.replace('\r\n', '**') for col in row]
            w.writerow(row)
out.csv:
multi**line**data,more data
more multi**line data,other data
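A variant of the csv approach above, as a sketch: it replaces every line-break style inside fields (\r\n, \r or \n) instead of only \r\n, so it also works when the file uses Unix line endings. The function name replace_field_breaks and the "**" default are illustrative choices, not part of the original answer. It takes file-like objects, so you can pass the fin/fout handles opened with newline='' as above.

```python
import csv
import io
import re

# Matches a Windows (\r\n), old-Mac (\r) or Unix (\n) line break.
LINE_BREAK = re.compile(r"\r\n|\r|\n")

def replace_field_breaks(src, dst, replacement="**"):
    """Copy CSV rows from src to dst, replacing line breaks inside fields.

    src and dst are text file objects; real files should be opened with
    newline='' so the csv module sees the embedded breaks unchanged.
    Record separators are written by csv.writer, so they are not touched.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow([LINE_BREAK.sub(replacement, col) for col in row])

sample = '"multi\nline\ndata","more data"\n"more multi\nline data","other data"\n'
out = io.StringIO()
replace_field_breaks(io.StringIO(sample), out)
print(out.getvalue())
```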

The problem got solved in a very easy way: create an output file and read the input file character by character. Write each character to the output file, but toggle a replace mode with the ~ operator whenever a " appears. While in replace mode, replace every \r\n with '' (nothing).
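The character-by-character approach described above could look roughly like this. It is a sketch, not the poster's actual code: it toggles a plain boolean with not, because ~True is -2, which is still truthy (~ only flips cleanly between the integers 0 and -1). A doubled quote ("") inside a field flips the state twice, so the scan stays in sync. The replacement parameter is an illustrative addition.

```python
def strip_quoted_newlines(text, replacement=""):
    """Remove (or replace) line breaks that fall inside double quotes.

    Line breaks outside quotes, i.e. the record separators, are kept.
    Inside quotes, carriage returns are dropped and each line feed becomes
    `replacement`, so a CR LF pair yields a single replacement.
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # entering or leaving a quoted field
            out.append(ch)
        elif in_quotes and ch == "\r":
            continue  # drop the carriage return
        elif in_quotes and ch == "\n":
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)

print(repr(strip_quoted_newlines('"a\r\nb",\r\n"3"\r\n')))
```

To use it on a file, read and write with newline='' (for example open('in.csv', newline='') and open('out.csv', 'w', newline='')) so Python does not translate the \r\n pairs before the function sees them.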

Related

Removing quotes when writing list items to CSV

What I am trying to do is remove the quotes while writing the data to a new CSV file.
I have tried using .split() and .replace() with no luck. Can you guys point me in the right direction?
Current Code:
import csv
import os
def createParam():
    with open('testcsv.csv', 'r') as f:
        reader = csv.reader(f)
        csvList = list(reader)
    for item in csvList:
        os.mkdir(r"C:\Users\jefhill\Desktop\Test Path\\" + item[0])
        with open(r"C:\Users\jefhill\Desktop\Test Path\\" + item[0] + r"\prm.263", "w+") as f:
            csv.writer(f).writerow(item[1:])
        f.close
Data within testcsv.csv:
0116,"139,data1"
0123,"139,data2"
0130,"35,data678"
Data output when script is ran (in each individual file):
"139,data1"
"139,data2"
"35,data678"
Data I would like:
139,data1
139,data2
35,data678

You can use str.replace to replace the " (double quotes) with '' (an empty string).
Then split each line once, on the first comma, and write everything after it.
with open('outputfile.csv', 'w') as outfile:  # open the result file to be written
    with open('testcsv.csv', 'r') as infile:  # open the input file
        for line in infile:  # iterate through each line in input file
            newline = line.replace('"', '')  # remove the double quotes
            outfile.write(newline.split(',', maxsplit=1)[1])  # write the part after the first comma
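You don't need f.close() when you use with open(...): the file is closed automatically when the block ends. (As written, f.close without parentheses doesn't call the method anyway.)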
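If you want to keep the csv module, one option (a sketch, assuming the quoted field always holds the rest of the line, as in the sample data) is to split that field back into its columns before writing it. csv.writer only adds the quotes because item[1:] is a single field that contains a comma; with the comma-separated parts as separate fields there is nothing to protect. The helper name param_line is hypothetical.

```python
import csv
import io

def param_line(row):
    """Return the CSV text for the columns packed into row[1].

    For ['0116', '139,data1'] this gives '139,data1' plus the writer's
    line terminator. Splitting the field on commas gives csv.writer plain
    fields, so it has no reason to add quotes.
    """
    buf = io.StringIO()
    csv.writer(buf).writerow(row[1].split(","))
    return buf.getvalue()

sample = '0116,"139,data1"\n0123,"139,data2"\n0130,"35,data678"\n'
for row in csv.reader(io.StringIO(sample)):
    print(row[0], repr(param_line(row)))
```

In the original loop this corresponds to replacing csv.writer(f).writerow(item[1:]) with csv.writer(f).writerow(item[1].split(",")), and, in Python 3, opening the output file with newline=''.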

python3.5.2 deleting all matching characters from a file

Given the following example, how can I remove all "a" characters from a file with the following content:
asdasdasd \n d1233sss \n aaa \n 123
I wrote the following solution but it does not work:
with open("testfisier", "r+") as file:
    for line in file:
        for index in range(len(line)):
            if line[index] == "a": line[index].replace("a", "")

There weren't any changes because you didn't write the result back to the file.
with open("testfisier", "r+") as file:
    lines = file.readlines()
    # Rebuild each line without its "a" characters.
    new_lines = ["".join(ch for ch in line if ch != "a") for line in lines]
    # Write the changes back over the old content.
    file.seek(0)
    file.writelines(new_lines)
    file.truncate()
Or:
with open("testfisier") as f:
    text = f.read()
with open("testfisier", "w") as f:
    f.write(text.replace("a", ""))

Try using regexp substitution. For instance, assuming you have read in the string and named it a_string:
import re
a_string = re.sub('a', '', a_string)
This would be one of many possible solutions.
Hope this helps!

You can try this:
import re
with open("testfisier") as f:
    data = f.read()
final_data = re.sub('a+', '', data)

You can call replace on a long string. No need to call it on single chars. Also, replace does not change a string, but returns a new one:
with open("testfisier", "r+") as file:
    text = file.read()
    text = text.replace("a", "")  # replace a's in the entire text
    file.seek(0)  # move file pointer back to start
    file.write(text)
    file.truncate()  # drop the leftover tail, since the new text is shorter
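To delete several different characters in one pass, str.translate with a table built by str.maketrans is an alternative to chaining replace calls: the third argument of str.maketrans lists characters to map to None, which deletes them. A minimal sketch; remove_chars is a hypothetical helper name.

```python
def remove_chars(text, chars):
    """Return text with every character in chars deleted."""
    # maketrans("", "", chars) maps each character of chars to None.
    return text.translate(str.maketrans("", "", chars))

content = "asdasdasd \n d1233sss \n aaa \n 123"
print(repr(remove_chars(content, "a")))
print(repr(remove_chars(content, "a1")))
```

In the r+ answer above, it would take the place of text.replace("a", "").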

python: write 'backslash double-quote' string into file

I've got an input file with some strings containing double quotes, and I want to generate a C-style header file with Python.
Say,
input file: Hello "Bob"
output file: Hello \"Bob\"
I can't get my code to produce such a file; here's what I've tried so far:
key = 'key'
val = 'Hello "Bob"'
buf_list = list()
...
val = val.replace('"', b'\x5c\x22')
# also tried: val = val.replace('"', r'\"')
# also tried: val = val.replace('"', '\\"')
buf_list.append((key + '="' + val + '";\n').encode('utf-8'))
...
for keyval in buf_list:
    lang_file.write(keyval)
lang_file.close()
The output file always contains:
Hello \\\"Bob\\\"
I had no problems writing \n, \t strings into the output file.
It seems I can only write zero or two backslashes. Can someone help, please?

You need to escape both the double-quote and the backslash. The following works for me (using Python 2.7):
with open('temp.txt', 'r') as f:
    data = f.read()
with open('temp2.txt', 'w') as g:
    g.write(data.replace('\"', '\\\"'))

Using a raw string as the replacement should do it:
a='Hello "Bob"'
print(a.replace('"', r'\"'))
The above will give you:
Hello \"Bob\"
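If the input may also contain backslashes, those need escaping too, and the order matters: escape backslashes first, then quotes, otherwise the backslashes added in front of the quotes get doubled. A minimal sketch for building one line in the key="value"; format from the question; c_escape and header_line are hypothetical helper names, and only backslashes and double quotes are handled (not newlines, tabs or other control characters).

```python
def c_escape(value):
    """Escape backslashes and double quotes for use inside a C string literal."""
    # Backslashes first, so the ones added in front of quotes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"')

def header_line(key, value):
    return key + '="' + c_escape(value) + '";\n'

line = header_line("key", 'Hello "Bob"')
print(line, end="")  # key="Hello \"Bob\"";
```

Writing the result in text mode, e.g. open(path, 'w', encoding='utf-8'), avoids encoding each line to bytes yourself.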

Remove special characters from csv file using python

There seems to be something on this topic already (How to replace all those Special Characters with white spaces in python?), but I can't figure this simple task out for the life of me.
I have a .CSV file with 75 columns and almost 4000 rows. I need to replace all the 'special characters' ($ # & * etc.) with '_' and write to a new file. Here's what I have so far:
import csv
input = open('C:/Temp/Data.csv', 'rb')
lines = csv.reader(input)
output = open('C:/Temp/Data_out1.csv', 'wb')
writer = csv.writer(output)
conversion = '-"/.$'
text = input.read()
newtext = '_'
for c in text:
    newtext += '_' if c in conversion else c
    writer.writerow(c)
input.close()
output.close()
All this does is write everything to the output file as a single column, producing over 65K rows. Additionally, the special characters are still present!
Sorry for the redundant question.
Thank you in advance!

I might do something like
import csv
with open("special.csv", "r", newline="") as infile, open("repaired.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    conversion = set('_"/.$')
    for row in reader:
        newrow = [''.join('_' if c in conversion else c for c in entry) for entry in row]
        writer.writerow(newrow)
which turns
$ cat special.csv
th$s,2.3/,will-be
fixed.,even.though,maybe
some,"shoul""dn't",be
(note that I have a quoted value) into
$ cat repaired.csv
th_s,2_3_,will-be
fixed_,even_though,maybe
some,shoul_dn't,be
Right now, your code reads the entire text into one big string:
text = input.read()
Starting from a _ character:
newtext = '_'
Looping over every single character in text:
for c in text:
Add the corrected character to newtext (very slowly):
newtext += '_' if c in conversion else c
And then write the original character (?), as a column, to a new csv:
writer.writerow(c)
.. which is unlikely to be what you want. :^)

This doesn't seem to require handling CSVs in particular (as long as the special characters aren't your column delimiters).
lines = []
with open('C:/Temp/Data.csv', 'r') as input:
    lines = input.readlines()
conversion = '-"/.$'
newtext = '_'
outputLines = []
for line in lines:
    temp = line[:]
    for c in conversion:
        temp = temp.replace(c, newtext)
    outputLines.append(temp)
with open('C:/Temp/Data_out1.csv', 'w') as output:
    for line in outputLines:
        output.write(line)  # each line still ends with its original newline

In addition to the bug pointed out by @Nisan.H and the valid point made by @dckrooney that you may not need to treat the file in a special way just because it is a CSV file (but see my comment below):
writer.writerow() should take a sequence of strings, each of which is written out as a separate field, separated by commas. In your case you are passing a single string; a string is itself a sequence of characters, so each character becomes its own field.
This code is setting up to read from 'C:/Temp/Data.csv' in two ways - through input and through lines but it only actually reads from input (therefore the code does not deal with the file as a CSV file anyway).
The code appends characters to newtext and writes out each version of that variable. Thus, the first version of newtext would be 1 character long, the second 2 characters long, the third 3 characters long, etc.
Finally, given that a CSV file can have quote marks in it, it may actually be necessary to deal with the input file specifically as a CSV to avoid replacing quote marks that you want to keep, e.g. quote marks that are there to protect commas that exist within fields of the CSV file. In that case, it would be necessary to process each field of the CSV file individually, then write each row out to the new CSV file.

Maybe try
chars = ('$', '%', '^', '*')  # etc.
with open('myfile.csv', 'r') as in_file:
    s = in_file.read()
for c in chars:
    s = '_'.join(s.split(c))
with open('myfile_new.csv', 'w') as out_file:
    out_file.write(s)
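str.translate can do all the substitutions in a single pass instead of one split/join per character. A sketch, assuming the special characters are $#& and * (adjust SPECIAL to your data; clean is a hypothetical name). As pointed out above, replacing " or , on the raw text would also hit the CSV quoting and delimiters, so keep those out of the set or use the csv-module approach.

```python
SPECIAL = "$#&*"  # assumed set of characters to replace; adjust as needed
# Map every special character to "_" (both strings must be the same length).
TABLE = str.maketrans(SPECIAL, "_" * len(SPECIAL))

def clean(text):
    """Return text with every character in SPECIAL replaced by '_'."""
    return text.translate(TABLE)

print(clean("th$s,2#3,a&b*c"))  # th_s,2_3,a_b_c
```

For a whole file, read it with open(..., newline=''), call clean() on the contents, and write the result to the new file.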

How to read a text file into a string variable and strip newlines?

I have a text file that looks like:
ABC
DEF
How can I read the file into a single-line string without newlines, in this case creating a string 'ABCDEF'?
For reading the file into a list of lines, but removing the trailing newline character from each line, see How to read a file without newlines?.

You could use:
with open('data.txt', 'r') as file:
    data = file.read().replace('\n', '')
Or, if the file content is guaranteed to be a single line:
with open('data.txt', 'r') as file:
    data = file.read().rstrip()
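A note on Windows line endings: in text mode (the default), Python 3 translates \r\n and \r to \n while reading, so replace('\n', '') is enough even for files written on Windows. A small self-contained check, using a temporary file in place of data.txt; read_joined is a hypothetical helper name.

```python
import os
import tempfile

def read_joined(path):
    """Read a text file and return its contents with all newlines removed."""
    # Text mode with the default newline=None turns \r\n and \r into \n on read.
    with open(path, "r") as file:
        return file.read().replace("\n", "")

# Stand-in for data.txt: write the sample as raw bytes with Windows line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(b"ABC\r\nDEF\r\n")
try:
    result = read_joined(path)
finally:
    os.remove(path)
print(result)  # ABCDEF
```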

In Python 3.5 or later, using pathlib you can copy text file contents into a variable and close the file in one line:
from pathlib import Path
txt = Path('data.txt').read_text()
and then you can use str.replace to remove the newlines:
txt = txt.replace('\n', '')

You can read from a file in one line:
text = open('very_Important.txt', 'r').read()
Please note that this does not close the file explicitly.
CPython will close the file when the file object is garbage-collected.
But other Python implementations may not do so promptly. To write portable code, it is better to use with or close the file explicitly. Short is not always better. See https://stackoverflow.com/a/7396043/362951

To join all lines into a string and remove newlines, I normally use the following (note that it puts a space between the lines):
with open('t.txt') as f:
    s = " ".join([l.rstrip("\n") for l in f])

with open("data.txt") as myfile:
    data = "".join(line.rstrip() for line in myfile)
join() concatenates an iterable of strings, and rstrip() with no arguments trims whitespace, including newlines, from the end of each string.
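One difference worth knowing between the variants in these answers: rstrip() with no argument removes all trailing whitespace (spaces and tabs as well as \r and \n), while rstrip('\n') removes only the newline. A quick comparison on in-memory lines (the sample lines are made up):

```python
lines = ["ABC  \n", "DEF\n"]  # hypothetical lines, the first with trailing spaces

all_ws = "".join(line.rstrip() for line in lines)        # strips spaces too
nl_only = "".join(line.rstrip("\n") for line in lines)   # strips only "\n"
print(repr(all_ws))   # 'ABCDEF'
print(repr(nl_only))  # 'ABC  DEF'
```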

This can be done using the read() method:
text_as_string = open('Your_Text_File.txt', 'r').read()
Or, since the default mode is 'r' (read), simply use:
text_as_string = open('Your_Text_File.txt').read()

I'm surprised nobody mentioned splitlines() yet.
with open("data.txt", "r") as myfile:
    data = myfile.read().splitlines()
Variable data is now a list that looks like this when printed:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
Note there are no newlines (\n).
At that point, it sounds like you want to print back the lines to console, which you can achieve with a for loop:
for line in data:
    print(line)
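splitlines() also behaves differently from split('\n') at the end of the file: a trailing newline produces an empty last element with split('\n') but not with splitlines(), which additionally recognizes \r\n and \r as line breaks:

```python
text = "ABC\nDEF\n"
by_split = text.split("\n")
by_splitlines = text.splitlines()
mixed = "ABC\r\nDEF\r".splitlines()

print(by_split)                # ['ABC', 'DEF', '']
print(by_splitlines)           # ['ABC', 'DEF']
print(mixed)                   # ['ABC', 'DEF']
print("".join(by_splitlines))  # ABCDEF
```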

It's hard to tell exactly what you're after, but something like this should get you started:
with open("data.txt", "r") as myfile:
    data = ' '.join([line.replace('\n', '') for line in myfile.readlines()])

I have fiddled around with this for a while and prefer to use read() in combination with rstrip(). Without rstrip("\n"), the string keeps the newline at the end of the file's last line, which in most cases is not very useful.
with open("myfile.txt") as f:
    file_content = f.read().rstrip("\n")
print(file_content)

Here are four options to choose from:
with open("my_text_file.txt", "r") as file:
    data = file.read().replace("\n", "")
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().split("\n"))
or
with open("my_text_file.txt", "r") as file:
    data = "".join(file.read().splitlines())
or
with open("my_text_file.txt", "r") as file:
    data = "".join(line.rstrip("\n") for line in file)

You can compress this into two lines of code (note that this version replaces each newline with a space):
content = open('filepath','r').read().replace('\n',' ')
print(content)
if your file reads:
hello how are you?
who are you?
blank blank
Python output:
hello how are you? who are you? blank blank

You can also strip each line and concatenate the results into a final string.
with open("data.txt", "r") as myfile:
    lines = myfile.readlines()
data = ""
for line in lines:
    data = data + line.strip()
This would also work out just fine.

This is a one-line, copy-pasteable solution that also closes the file object:
_ = open('data.txt', 'r'); data = _.read(); _.close()

f = open('data.txt', 'r')
string = ""
while True:
    line = f.readline()
    if not line:
        break
    string += line.rstrip("\n")
f.close()
print(string)

python3: Google "list comprehension" if the square bracket syntax is new to you.
with open('data.txt') as f:
    lines = [line.strip('\n') for line in list(f)]

One-liner:
List: "".join([line.rstrip('\n') for line in open('file.txt')])
Generator: "".join((line.rstrip('\n') for line in open('file.txt')))
A list comprehension is usually faster than a generator expression but heavier on memory; a generator is lighter on memory because it yields the lines one at a time. With "".join() both work well, since CPython's join collects its argument into a sequence before joining anyway. Drop the "".join() call to get the list or the generator itself.
Note: neither version closes the file explicitly; they rely on garbage collection to do it.

Have you tried this?
x = "yourfilename.txt"
y = open(x, 'r').read()
print(y)

To remove line breaks using Python, you can use the string's replace method.
This example removes all three types of line break (\n, \r\n and \r):
my_string = open('lala.json').read()
print(my_string)
my_string = my_string.replace("\r","").replace("\n","")
print(my_string)
Example file is:
{
"lala": "lulu",
"foo": "bar"
}
You can try it in this repl:
https://repl.it/repls/AnnualJointHardware

I don't feel that anyone addressed the [ ] part of your question. When you read the lines into your variable, you ended up with a list, because there were multiple lines before you replaced the \n with ''. If you have a variable x and display it just by
x
or print(x)
or str(x)
you will see the entire list with the brackets. If you access a single element of the list,
x[0]
then the brackets are omitted. If you print() that element, you will see just the data, without the surrounding quotes either.
print(x[0])

Maybe you could try this? I use this in my programs.
with open('data.txt', 'r') as f:
    data = f.readlines()
for i in range(len(data)):
    data[i] = data[i].strip() + ' '
data = ''.join(data).strip()

Regular expression works too:
import re
with open("depression.txt") as f:
    l = re.split(' ', re.sub('\n', ' ', f.read()))[:-1]
    print(l)
['I', 'feel', 'empty', 'and', 'dead', 'inside']

with open('data.txt', 'r') as file:
    data = [line.strip('\n') for line in file.readlines()]
data = ''.join(data)

from pathlib import Path
line_lst = Path("to/the/file.txt").read_text().splitlines()
This is the best way to get all the lines of a file: the '\n' characters are already stripped by splitlines(), which recognizes Windows, classic Mac and Unix line endings.
But if you nonetheless want to strip each line:
line_lst = [line.strip() for line in Path("to/the/file.txt").read_text().splitlines()]
strip() is just a useful example; you can process each line as you please.
In the end, do you just want concatenated text?
txt = ''.join(Path("to/the/file.txt").read_text().splitlines())

This works:
Change your file to:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE
Then:
with open("file.txt") as file:
    line = file.read()
words = line.split()
This creates a list named words that equals:
['LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN', 'GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE']
That got rid of the "\n". To answer the part about the brackets getting in your way, just do this:
for word in words:  # Assuming words is the list above
    print(word)  # Prints each word in file on a different line
Or:
print(words[0] + ",", words[1])  # "+" concatenates with no space in between;
# print() separates its arguments with a space
This returns:
LLKKKKKKKKMMMMMMMMNNNNNNNNNNNNN, GGGGGGGGGHHHHHHHHHHHHHHHHHHHHEEEEEEEE

with open(player_name, 'r') as myfile:
    data = myfile.readline()
words = data.split(" ")
word = words[0]
This code reads the first line and splits it on spaces, so the words of the first line are stored in a list.
Then you can easily access any word, or even store it in a string.
You can also do the same thing with using a for loop.

with open("myfile.txt", "r") as file:
    lines = file.readlines()
text = ''  # string declaration
for i in range(len(lines)):
    text += lines[i].rstrip('\n') + ' '
print(text)

Try the following:
with open('data.txt', 'r') as myfile:
    data = myfile.read()
sentences = data.split('\\n')
for sentence in sentences:
    print(sentence)
Caution: It does not remove the \n. It is just for viewing the text as if there were no \n
