Keeping format of text (.txt) files when reading and rewriting - python

I have a .txt file that contains formatting such as \n line breaks. I want to read it and write its contents, up to a specific line, to a new .txt file. My code looks like this:
with open(filename) as f:
    content = f.readlines()
with open("lf.txt", "w") as file1:
    # the with block closes the file, so no explicit close() call is needed
    file1.write(str(content))
The output file lf.txt is created, but the formatting of the input file is lost. Is there a way to keep the formatting of the input file when writing it to a new file?

You converted content to a string, while it's really a list of strings (lines).
Use join to convert the lines back to a string:
file1.write(''.join(content))
join is a string method; in this example it is called on an empty string. The string that join is called on is used as the separator between the joined strings. Here no separator is needed, because every line returned by readlines() still ends with its own \n.
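A minimal sketch of the whole round trip, including the "up to a specific line" part of the question. The helper name copy_until and its stop_line parameter are hypothetical, introduced only for illustration, and the sketch assumes the file is small enough to hold in memory:

```python
import os
import tempfile


def copy_until(src_path, dst_path, stop_line=None):
    """Copy the first stop_line lines of src_path to dst_path.

    stop_line is a hypothetical parameter; None copies the whole file.
    """
    with open(src_path) as src:
        lines = src.readlines()  # each element still ends with its "\n"
    with open(dst_path, "w") as dst:
        # Joining with an empty separator restores the original text.
        dst.write("".join(lines[:stop_line]))


# Demonstration with a temporary input file.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "input.txt")
dst_path = os.path.join(workdir, "lf.txt")
with open(src_path, "w") as f:
    f.write("first\nsecond\nthird\n")

copy_until(src_path, dst_path, stop_line=2)
with open(dst_path) as f:
    result = f.read()
```

Slicing the list before joining keeps the line breaks of every retained line, whereas str(lines) would write the list's printed representation, brackets and escaped \n included.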

Related

How to remove erroneous tabs/newlines from a .vcf file?

I am working with a vcf file. I try to extract information from this file, but the file has errors in the format.
In this file there is a column that contains long character strings. The error is that a number of tabs and a newline character are erroneously placed within some rows of this column. So when I try to read in this tab-delimited file, all columns are messed up.
I have an idea how to solve this, but don't know how to execute it in code. The string is DNA, so always has ATCG. Basically, if one could look for a number of tabs and a newline within characters ATCG and remove them, then the file is fixed:
ACTGCTGA\t\t\t\t\nCTGATCGA would become:
ACTGCTGACTGATCGA
So one would need to look into this file, look for [ACTG] followed by tabs or newlines, followed by more [ACTG], and then replace this with nothing. Any idea how to do this?
with open('file.vcf', 'r') as f:
    lines = [l for l in f if not l.startswith('##')]
Here's one way with regex:
First read the file in:
import re
with open('file.vcf', 'r') as file:
    dnafile = file.read()
Then write a new file with the changes:
with open('fileNew.vcf', 'w') as file:
    file.write(re.sub("(?<=[ACTG]{2})((\\t)*(\\n)*)(?=[ACTG]{2})", "", dnafile))
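The same idea can be checked on the sample string from the question before touching a real file. This is only a sketch: the names STRAY_BREAK and remove_stray_breaks are hypothetical, and the pattern is a slightly tightened variant of the one above that matches only non-empty runs of tabs and newlines. One assumption to be aware of: any two adjacent columns that both hold two or more bases (for example multi-base REF and ALT fields) would be joined as well, so it is worth running this on a copy and inspecting the result.

```python
import re

# A run of tabs/newlines with at least two bases directly on each side.
STRAY_BREAK = re.compile(r"(?<=[ACGT]{2})[\t\n]+(?=[ACGT]{2})")


def remove_stray_breaks(text):
    """Delete tab/newline runs that sit inside a stretch of DNA bases."""
    return STRAY_BREAK.sub("", text)


# The broken sequence from the question is stitched back together.
fixed = remove_stray_breaks("ACTGCTGA\t\t\t\t\nCTGATCGA")

# Ordinary column separators around single bases are left alone.
untouched = remove_stray_breaks("chr1\t100\tA\tG\n")
```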

Reading an Excel spreadsheet of Regular expressions

I'm creating a program to parse data. My dictionary is growing quite long. Therefore, I'd like to save it as a file that can be read in. Preferably xlsx, but a txt file will work too. Besides cleaning up the program, this will also allow me to call different dictionaries depending on what data is to be extracted.
The dictionary looks like this:
import re
import pandas as pd
my_Dict = {
    'cat': re.compile(r'CAT (?P<cat>.*)\n'),
    'dog': re.compile(r'DOG (?P<dog>.*)\n'),
    'mouse': re.compile(r'MOUSE (?P<mouse>.*)\n'),
}
What's the best way to store this in xlsx or txt form so that it is easy to read? And how do I then read it back in to use as a dictionary?
I've been able to write this dictionary to a file, but it never reads back in the way I wrote it.
Thanks!
I would recommend a Comma Separated Value (.csv) file. You can treat it as a plain text file or open it in Excel without much difficulty.
Your dict would look like:
cat, CAT (?P<cat>.*)\n
dog, DOG (?P<dog>.*)\n
mouse, MOUSE (?P<mouse>.*)\n
As far as reading it, you would just need to loop over the lines and separate them at the comma, using the first part as the key and the second as the value.
import re

my_dict = {}
with open(filename) as f:
    for line in f:
        # Split the line on the comma
        split_line = line.split(',')
        # .strip() removes either the specified characters or, if no argument
        # is given, leading and trailing whitespace
        my_dict[split_line[0].strip()] = re.compile(split_line[1].strip())
However, if your regexes or names contain commas, this will break. In that case, a Tab Separated Value (.tsv) file would probably work: split on '\t' instead of ','.
If neither of these works, you can split on just about any other character, but MS Excel readily recognizes and opens both .csv and .tsv files.
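As a sketch of the tab-separated variant, the csv module can do the splitting. The function name load_patterns is hypothetical, and the sketch assumes each line has exactly two tab-separated fields (name, then pattern); anything else is skipped.

```python
import csv
import re


def load_patterns(lines):
    """Build {name: compiled regex} from 'name<TAB>pattern' lines."""
    patterns = {}
    # QUOTE_NONE keeps any quote characters inside a regex as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        name, pattern = row
        patterns[name.strip()] = re.compile(pattern.strip())
    return patterns


# In practice `lines` would be an open file; a list keeps the sketch
# self-contained. The raw strings hold a literal backslash followed by n,
# exactly as it would be typed into the file.
sample = [
    "cat\t" + r"CAT (?P<cat>.*)\n" + "\n",
    "dog\t" + r"DOG (?P<dog>.*)\n" + "\n",
]
my_dict = load_patterns(sample)
match = my_dict["cat"].search("CAT Tom\nDOG Rex\n")
```

With a real file, load_patterns(open(filename, newline='')) would be the equivalent call, ideally inside a with block.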

strings converted to floats in a csv file

When I write a certain fraction in a csv file it gets automatically calculated whereas my requirement is to keep it as it is.
This is my try:
import csv
ft = "-1/-1.5" #or -1/-1.5 (removing the quotes)
print(ft)
with open("outputfile.csv","w",newline="") as infile:
    writer = csv.writer(infile)
    writer.writerow([ft])
Console prints it when in quotes:
-1/-1.5
However, when I write the same value to a csv file, it shows up as follows, whether or not I use quotes:
0.666666667
How can I write the same in a csv file like -1/-1.5?
See the image below (this is what I'm getting right now):
If I type a ' in a cell before entering the value, the output is what I need. Can I do this programmatically?
As the OP mentioned in the comments, use:
writer.writerow([f"'{ft}"])
This prefixes the value with an apostrophe, so Microsoft Excel displays the contents of ft as text and keeps the original string format.
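A quick way to see what actually lands in the file is to write to an in-memory buffer instead of to disk. This sketch (the helper name to_csv_line is hypothetical) suggests that the csv module writes the string unchanged in both cases, so any calculation happens in the program that later opens the file, not in Python:

```python
import csv
import io


def to_csv_line(value):
    """Return the exact text csv.writer emits for a one-cell row."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([value])
    return buffer.getvalue()


ft = "-1/-1.5"
plain = to_csv_line(ft)          # the value as written in the question
as_text = to_csv_line(f"'{ft}")  # the value with the leading apostrophe
```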

python csv to tsv: in case record has comma inside

I'm trying to convert csv format to tsv.
I just used this to change comma to tab.
tsv = re.sub(',', '\t', csv)
But I can't deal with the string with comma inside, for example:
dept_no,dt,hello
1,20180411,hello its me
2,20180412,here has tab
3,20180412,"here, is, commas"
Is there any way to convert to tsv without affecting comma inside of the string?
Try the following; the csv module should take care of inline commas.
import csv
with open("input.csv", newline="") as src, open("output.tsv", "w", newline="") as dst:
    csv.writer(dst, delimiter='\t').writerows(csv.reader(src))
EDIT-1
[Added open method as per discussion]
The first version of this answer called the file constructor directly. file exists only in Python 2; open is the preferred way to open files, so the code above uses open. The result is the same.
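To see the effect on the sample data from the question without creating files, the same reader/writer pairing can be run on in-memory text. This is a sketch; the function name csv_to_tsv is hypothetical, and lineterminator="\n" is chosen here only to keep the output easy to read.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to TSV text, respecting quoted fields."""
    source = io.StringIO(csv_text)
    target = io.StringIO()
    writer = csv.writer(target, delimiter="\t", lineterminator="\n")
    # csv.reader parses quoted fields, so commas inside them survive.
    writer.writerows(csv.reader(source))
    return target.getvalue()


sample = (
    "dept_no,dt,hello\n"
    "1,20180411,hello its me\n"
    '3,20180412,"here, is, commas"\n'
)
tsv = csv_to_tsv(sample)
```

The quoted field keeps its commas, and the writer drops the quotes because a comma is no longer special once the delimiter is a tab.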

How can I read a text file without any newline or tab characters?

I have a rather long sql query that I want to use in Python.
Rather than type the sql query, or paste it, I'd like to read it from a text file.
However, when I do something like
with open(file,'r') as f:
    s = f.read()
The resulting string has newline characters. How can I read in the string without any of the formatting characters?
Note: I could just do s.replace('\n',' ' ), but suppose the file is too large to read through and pick out which characters occur.
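One possible approach, offered only as a sketch: str.split() with no argument splits on any run of whitespace (spaces, tabs, newlines), so there is no need to know in advance which characters occur. The function name collapse_whitespace is hypothetical. Note the assumption that the query has no whitespace-sensitive parts: this would also change whitespace inside quoted string literals and would break -- line comments.

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with a single space."""
    # split() with no argument splits on any whitespace and drops
    # leading and trailing whitespace as well.
    return " ".join(text.split())


query = collapse_whitespace("SELECT *\n\tFROM users\n  WHERE id = 1\n")
```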
