With the following code I want to: open a file, read the contents and strip the non-required lines, then write the kept lines to a new file, and also read that file back for downstream analyses.
with open("chr2_head25.gtf", 'r') as f,\
open('test_output.txt', 'w+') as f2:
for lines in f:
if not lines.startswith('#'):
f2.write(lines)
f2.close()
Now I want to read the f2 data and do further processing in pandas or other modules, but I am running into a problem while reading the data (f2).
data = f2  # doesn't work
print(data)  # gives
<_io.TextIOWrapper name='test_output.txt' mode='w+' encoding='UTF-8'>
data = io.StringIO(f2)  # doesn't work
# Error message
Traceback (most recent call last):
File "/home/everestial007/PycharmProjects/stitcher/pHASE-Stitcher-Markov/markov_final_test/phase_to_vcf.py", line 64, in <module>
data = io.StringIO(f2)
TypeError: initial_value must be str or None, not _io.TextIOWrapper
The file is already closed (when the previous with block finishes), so you cannot do anything more with it. To read the data again, open the file in another with statement and call its read() method.
with open('test_output.txt', 'r') as f2:
    data = f2.read()
print(data)
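Since the goal is downstream analysis in pandas, here is a minimal sketch of that step; it even skips the intermediate file, because pandas can drop the '#' header lines itself (this assumes the GTF is tab-separated, and the column names below are the standard GTF fields, not taken from your data):

import pandas as pd

# Read the GTF directly, skipping every line that starts with '#'.
data = pd.read_csv(
    'chr2_head25.gtf',
    sep='\t',
    comment='#',
    header=None,
    names=['seqname', 'source', 'feature', 'start', 'end',
           'score', 'strand', 'frame', 'attribute'],
)
print(data.head())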
Related
I am trying to create a new file called my_file.txt using open with mode 'x', write a line into it, and then read the file. I am getting a traceback on the first line, stating that my_file.txt already exists, when it does not. Your help is appreciated!
new_file = open("my_file.txt", 'x')
new_file.close()
new_file = open("my_file.txt", 'w')
new_file.write('Let us be happy')
new_file.close()
new_file = open('my_file.txt', 'r')
for line in new_file:
line = line.strip()
print(line)
'x' means open for exclusive creation, failing if the file already exists, so the traceback means my_file.txt was already present when the first line ran (for example, left over from a previous run). Also, you don't have to close the file after each step; just use the with keyword and the file is closed automatically.
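A minimal sketch of the same create/write/read flow using with, including handling for the case where my_file.txt is left over from an earlier run (the FileExistsError handling is an addition, not part of the original code):

try:
    # 'x' raises FileExistsError if the file is already there,
    # e.g. left over from a previous run of the script.
    with open('my_file.txt', 'x') as new_file:
        new_file.write('Let us be happy')
except FileExistsError:
    print('my_file.txt already exists; delete it or open it with mode "w" instead.')

with open('my_file.txt', 'r') as new_file:
    for line in new_file:
        print(line.strip())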
I am trying to convert a large text file (5 GB+) but got a MemoryError. From this post, I managed to convert the encoding of a text file into a readable format with this:
path = 'path/to/file'
des_path = 'path/to/store/file'

for filename in os.listdir(path):
    with open('{}/{}'.format(path, filename), 'r+', encoding='iso-8859-11') as f:
        t = open('{}/{}'.format(des_path, filename), 'w')
        string = f.read()
        t.write(string)
        t.close()
The problem is that when I try to convert a large text file (5 GB+), I get this error:
Traceback (most recent call last):
File "Desktop/convertfile.py", line 12, in <module>
string = f.read()
File "/usr/lib/python3.6/encodings/iso8859_11.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
MemoryError
which I understand means it cannot read a file this large into memory at once. From several links I found that I can do it by reading line by line. So, how can I adapt my code to read line by line? What I understand about reading line by line here is that I need to read a line from f and write it to t until the end of the file, right?
You can iterate over the lines of an open file.
for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:
        out.write(line)
    inp.close()
    out.close()
Note that I've hidden the complexity of the different paths, encodings and modes in an open_files function that I suggest you actually write... Regarding buffering, i.e. reading/writing larger chunks of the text: Python does its own buffering under the hood, so this shouldn't be noticeably slower than a more complex solution.
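As a minimal sketch of what that could look like, assuming the same path, des_path and iso-8859-11 encoding from the question (the open_files helper here is only an illustration):

import os

path = 'path/to/file'
des_path = 'path/to/store/file'

def open_files(filename):
    # Source is opened with the original encoding, destination with the
    # platform default; adjust the destination encoding if you need a specific one.
    inp = open(os.path.join(path, filename), 'r', encoding='iso-8859-11')
    out = open(os.path.join(des_path, filename), 'w')
    return inp, out

for filename in os.listdir(path):
    inp, out = open_files(filename)
    for line in inp:   # one line at a time, so the 5 GB file never sits in memory
        out.write(line)
    inp.close()
    out.close()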
I am making a call to the AWS API using boto3 and Python, and I am writing the JSON response to a JSON file. I am then trying to convert the JSON file to a CSV file. When I attempt to do this with the csv writer() method, I get the error below and I am not sure why.
Code:
def ResponseConvert():
    dynamo = boto3.client('dynamodb')
    response = dynamo.scan(
        TableName='XXXX'
    )

    with open('vuln_data.json', 'w') as outfile:
        json.dump(response, outfile, indent=4)

    f = open('vuln_data.json')
    data = json.load(f)
    f.close()

    f = csv.writer(open('vuln_data.csv', 'wb+'))
    f.writerow(data.keys())

    for row in data:
        f.writerow(row.values())

ResponseConvert()
Traceback:
Traceback (most recent call last):
File "response_convert.py", line 21, in <module>
ResponseConvert()
File "response_convert.py", line 19, in ResponseConvert
f.writerow(row.values())
AttributeError: 'unicode' object has no attribute 'values'
CSV writers expect a file handle, not a filename.
with open('filename.csv', 'w') as f:
    writer = csv.writer(f)
    ...
You probably want a DictWriter instead, by the way. Don't rely on the order of keys and values matching up.
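A minimal sketch of that, assuming the records you want are in the response's 'Items' list; note that a raw boto3 scan stores each attribute as a small type/value mapping, so you may want to unwrap those before writing:

import csv
import json

with open('vuln_data.json') as f:
    data = json.load(f)

items = data.get('Items', [])   # a DynamoDB scan puts the records under 'Items'
fieldnames = sorted({key for item in items for key in item})

with open('vuln_data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for item in items:
        writer.writerow(item)   # values are written as-is; unwrap attribute maps if needed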
I need to clean *.txt files from a folder. Some files do not have the right data format and Python throws a traceback. How do I skip the improper files and carry on with the others, without having to remove them and re-run the program?
for filename in glob.glob('datafolder/*.txt'):
    inputfile = open(filename, 'r')
    npv = []
    for i, line in enumerate(inputfile):
        npv.append(int(line[34:36]))  # the site of the fault
The error is:
Traceback (most recent call last):
File "dataprep.py", line 51, in <module>
npv.append(int(line[34:36]))
ValueError: invalid literal for int() with base 10: ''
I want to drop the current 'filename' and go ahead with the next 'filename'.
Catch the ValueError and break out of the inner for-loop; the outer loop then carries on with the next file.
for filename in glob.glob('datafolder/*.txt'):
    inputfile = open(filename, 'r')
    npv = []
    for i, line in enumerate(inputfile):
        try:
            npv.append(int(line[34:36]))  # the site of the fault
        except ValueError:
            break
Alternatively, you could skip just the offending line (with continue) and keep reading the other lines in the file, if that makes sense for your task, as in the sketch below.
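A minimal sketch of that variant, assuming (as in your traceback) that the slice is converted with int() and that an empty or non-numeric slice is the failure you want to skip:

import glob

for filename in glob.glob('datafolder/*.txt'):
    npv = []
    with open(filename, 'r') as inputfile:
        for line in inputfile:
            try:
                npv.append(int(line[34:36]))
            except ValueError:
                continue   # skip just this line and keep reading the same file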
My Python program gets mail log data via SSH. When I try to go over it line by line with
with text as f:
    for line in f:
        try:
            .... regex stuff....
I get the error:
Traceback (most recent call last):
File "xxxxxxxxxxxxxxxxxxxxxxxx", line 90, in <module>
start()
File "xxxxxxxxxxxxxxxxxxxxxxxx", line 64, in start
with text as f:
AttributeError: __exit__
That doesn't work. The only solution that works for me is the following, where I save the text to a file and open it again. But the file is about 1.24 MB, which slows the program down unnecessarily. Does anyone know how I can get rid of the extra save?
....
stdin, stdout, stderr = ssh.exec_command('sudo cat /var/log/mailing.log')
text = ''.join(stdout.readlines())

text_file = open('mail.log', 'w')
text_file.write(text)
text_file.close()
ssh.close()

with open('mail.log') as f:
    for line in f:
        try:
            .... regex stuff....
text is a string with the data, not a file object, so you can't use it in a with statement. Instead of saving and reopening it, you should just loop over its lines:
for line in text.splitlines():
    try:
        .... regex stuff....
You probably want to look at io.StringIO from the standard library, which makes a string look more or less like a file.
Or, you could just say
for line in text.split('\n'):
    try:
        <regex stuff>
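For completeness, a minimal sketch of the io.StringIO approach mentioned above, assuming text already holds the whole log as a string (as returned by the ssh command in the question) and using a placeholder pattern for the regex:

import io
import re

pattern = re.compile(r'placeholder')   # stand-in for the real regex

# text is the string built from the ssh command's output in the question
with io.StringIO(text) as f:           # StringIO behaves like a file, including in a with statement
    for line in f:
        if pattern.search(line):
            pass                       # .... regex stuff....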