I'm confused as to why this isn't writing to an outfile.
I extract data from a txt file using np.loadtxt(), then try to write it to an existing file, but I get this error: 'numpy.float64' object is not iterable. I'm not looping over a single float value; I'm looping over each element of the array and writing it to an existing file.
Here's my code
mass = np.loadtxt('C:/Users/Luis/Desktop/data.txt', usecols=0)
with open('test', 'w') as outfile:
    for i in range(len(mass)):
        outfile.writelines(mass[i])
Could it be that open() as a context manager doesn't work with NumPy arrays?
Thanks
with open() is a context manager used to work with files (it works with any file, NumPy arrays included). The writelines() method takes str as its input argument, so you do have to convert each element: writelines(str(mass[i])). Also, indexing with i isn't necessary here; you can iterate over the array elements directly. I think this is what you want.
mass = np.loadtxt('/content/sample_data/st.txt', usecols=0)
with open('test.txt', 'w') as outfile:
    for line in mass:
        outfile.writelines(str(line))
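Note that writelines() does not add newlines, so the values above all land on one line. A minimal sketch that writes one value per line, using a small placeholder array in place of the np.loadtxt() result (the filenames are illustrative):

```python
import numpy as np

# Placeholder array standing in for the np.loadtxt(...) result
mass = np.array([1.5, 2.25, 3.0])

with open('test.txt', 'w') as outfile:
    for value in mass:
        # write() needs a string; add the newline explicitly
        outfile.write(f"{value}\n")
```

For this particular job, np.savetxt('test.txt', mass) does the conversion and newline handling in a single call, with formatting control via its fmt parameter.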
Related
I have a zip file which contains a text file (with millions of lines). I need to read it line by line, apply some transformation to each line, write the result to a new file, and zip it.
with zipfile.ZipFile("orginal.zip") as zf, zipfile.ZipFile("new.zip", "w") as new_zip:
    with io.TextIOWrapper(zf.open("orginal_file.txt"), encoding="UTF-8") as fp, open("new.txt", "w") as new_txt:
        for line in fp:
            new_txt.write(f"{line} - NEW")  # Some transformation
    new_zip.writestr("new.txt", new_txt)
But I am getting the following error from new_zip.writestr("new.txt", new_txt):
TypeError: object of type '_io.TextIOWrapper' has no len()
If I do the transformation using the above method, will there be any out-of-memory issue (since the file can have millions of lines)?
How do I identify the first line (since the first line is a header record)?
When I write using new_txt.write(f"{line} - NEW"), - NEW ends up at the start of the next line (for example, if the line is 003000000011000000, the output is - NEW003000000011000000).
How can we ensure file integrity (for example, to ensure that all lines are written to the new file)?
What causes the TypeError: object of type '_io.TextIOWrapper' has no len() error?
Thank You.
When you're doing:
new_zip.writestr("new.txt", new_txt)
you are trying to write the object new_txt as data (text or the equivalent) into the zip archive as the file "new.txt". But new_txt is a file object, not its contents. That's what gives you the error TypeError: object of type '_io.TextIOWrapper' has no len() - writestr() is expecting some content, but getting a file object.
From the docs:
Write a file into the archive. The contents is data, which may be either a str or a bytes instance;
Instead, what you probably want is write(filename):
new_zip.write("new.txt")
which writes the file "new.txt" from disk into the zip archive. Make sure new_txt is closed (or at least flushed) before you do this, so that all of its contents are actually on disk.
Regarding your other questions:
If I do the transformation using the above method, will there be any out-of-memory issue (since the file can have millions of lines)?
Everything is processed line by line through file objects, so probably not.
How do I identify the first line (since the first line is a header record)?
Use a flag that is set in the first iteration of the line loop, or skip the header with next(fp) before the loop.
When I write using new_txt.write(f"{line} - NEW"), - NEW ends up at the start of the next line (for example, if the line is 003000000011000000, the output is - NEW003000000011000000).
Each line read from the file still ends with a newline \n, so your suffix lands after it and shows up at the start of the next line. Strip the existing newline from the input string and append your own, e.g. f"{line.rstrip()} - NEW\n".
How can we ensure file integrity (for example, to ensure that all lines are written to the new file)?
Count the lines? Ideally, unless some error occurs, all lines will be written without you having to worry about it.
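Putting the pieces together, a sketch of the whole pipeline. The filenames come from the question; the sample contents, the header skipping via enumerate, and the transformation are illustrative:

```python
import io
import zipfile

# Build a small input zip so the sketch is self-contained
with zipfile.ZipFile("orginal.zip", "w") as zf:
    zf.writestr("orginal_file.txt",
                "HEADER\n003000000011000000\n003000000011000001\n")

with zipfile.ZipFile("orginal.zip") as zf:
    with io.TextIOWrapper(zf.open("orginal_file.txt"), encoding="UTF-8") as fp, \
            open("new.txt", "w") as new_txt:
        for i, line in enumerate(fp):
            if i == 0:
                continue  # skip the header record
            # strip the existing newline, transform, then add our own newline
            new_txt.write(f"{line.rstrip()} - NEW\n")

# new.txt is closed (and flushed) here, so it is safe to add it to the archive
with zipfile.ZipFile("new.zip", "w") as new_zip:
    new_zip.write("new.txt")
```

Because both the read and the write go line by line, only one line is held in memory at a time regardless of the file's size.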
Here is the code, modified from the previous code.
But, I got this error:
TypeError: must be str not list in f1.write(head)
This is the part of code that is producing this error:
from itertools import islice
with open("input.txt") as myfile:
    head = list(islice(myfile, 3))
f1.write(head)
f1.close()
Well, you have it right: islice(myfile, n) gets you the first n lines of the open file myfile. The problem is what happens when you try to write those lines to another file.
The error is pretty intuitive (I've added the full error one receives in this case):
TypeError: write() argument must be str, not list
This is because f.write() accepts strings as parameters, not list types.
So, instead of dumping the list as is, write the contents of it in your other file using a for loop:
from itertools import islice

with open("input.txt", "r") as myfile:
    head = list(islice(myfile, 3))

# always remember, use files in a with statement
with open("output.txt", "w") as f2:
    for item in head:
        f2.write(item)
Granted that the contents of the list are all of type str, this works like a charm; if not, just wrap each item in an str() call inside the for loop to make sure it is converted to a string.
If you want an approach that doesn't require a loop, you could always consider using f.writelines() instead of f.write() (and, take a look at Jon's comment for another tip with writelines).
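A sketch of that loop-free version. Since islice already yields the lines with their trailing newlines intact, writelines() can take the list directly (the input file here is a placeholder built inline so the sketch runs on its own):

```python
from itertools import islice

# Create a small placeholder input file so the sketch is self-contained
with open("input.txt", "w") as f:
    f.write("line1\nline2\nline3\nline4\n")

with open("input.txt") as myfile:
    head = list(islice(myfile, 3))  # first three lines, newlines included

with open("output.txt", "w") as f2:
    f2.writelines(head)  # accepts any iterable of strings
```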
I have the following code:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.write(vals)
ea.close()
However I am getting the following error:
TypeError: expected a character buffer object
How do I buffer the output and what data type is the variable vals?
vals is a list. If you want to write a list of strings to a file, as opposed to an individual string, use writelines:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines(vals)
ea.close()
Note that this will not insert newlines for you (although in your specific case your strings already end in newlines, as pointed out in the comments). If you need to add newlines you could do the following as an example:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines([line+'\n' for line in vals])
ea.close()
The write function only handles strings or bytes. To write arbitrary objects, use Python's pickle library: write them with pickle.dump() and read them back with pickle.load().
But if what you're really after is writing something in the same format as your input, you'll have to write out the matrix values and newlines yourself.
for line in vals:
    ea.write(line)
ea.close()
You've now written a file that looks like abc.txt, except that the first row and first character from each line has been removed. (You dropped those when constructing vals.)
Somehow I doubt this is what you intended, since you chose to name it abc_format.txt, but anyway this is how you write out a list of lines of text.
You cannot "write" objects to files. Rather, use the pickle module:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
# pickling begins!
import pickle
f = open("abc_format.txt", "wb")  # pickle needs a file opened for binary writing
pickle.dump(vals, f)  # call with (object, file)
f.close()
Then read it like this:
import pickle
f = open("abc_format.txt", "rb")  # open for binary reading
vals = pickle.load(f)  # exactly the same list
f.close()
You can do this with any kind of object, your own or built-in. Note that you can only write strings and bytes to files directly; Python's open() function just opens the file, much as opening it in Notepad would.
To answer your first question: vals is a list, because anything of the form [operation(i) for i in iterated_over] is a list comprehension, and list comprehensions produce lists. To see the type of any object, just use the type() function, e.g. type([1,4,3]).
Examples: https://repl.it/qKI/3
Documentation here:
https://docs.python.org/2/library/pickle.html and https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
First of all, instead of opening and closing the files separately, you can use a with statement, which does that job automatically. As for the error: as it says, the write method only accepts a character buffer object, so you need to convert your list to a string.
For example, you can use the join function, which joins the items of an iterable with a specific delimiter and returns the concatenated string.
with open("abc.txt", "rU") as f, open("abc_format.txt", 'w') as out:
    matrix = f.readlines()
    keys = matrix[0]
    vals = [line[1:] for line in matrix[1:]]
    out.write('\n'.join(vals))
Also, as a more Pythonic approach: since file objects are iterators, you can grab the first line by calling next(f) and pass the rest of the file straight to join:
with open("abc.txt", "rU") as f, open("abc_format.txt", 'w') as out:
    matrix = next(f)
    out.write('\n'.join(f))
I understand what these functions do, but what is the practical use of reading/writing a file as a list? And what is the use of writing a list to a file, if you could just use write() and then later use readlines() to view it as a list?
The practical upside of writing data to disk is that you can drop it from your system memory and come back to it later.
.writelines() accepts any iterable that produces strings. It could be implemented as:
def writelines(self, lines):
    for line in lines:
        self.write(line)
If you call file.writelines(["abc", "def"]) then the file will contain abcdef, and file.readlines() would return ["abcdef"] if you read it back. As you can see, the round trip produces a different result.
If you call file.write(["abc", "def"]) then you get TypeError: must be string or buffer, not list. Again, it is different.
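A quick runnable sketch of that asymmetry (the filename is a placeholder):

```python
# writelines() concatenates the strings verbatim; readlines() splits on newlines
with open("demo.txt", "w") as f:
    f.writelines(["abc", "def"])       # no separators inserted

with open("demo.txt") as f:
    print(f.readlines())               # → ['abcdef']  (one merged line)

with open("demo.txt", "w") as f:
    f.writelines(["abc\n", "def\n"])   # include the newlines yourself

with open("demo.txt") as f:
    print(f.readlines())               # → ['abc\n', 'def\n']
```

Including the newlines in the strings you pass to writelines() is what makes the write/read round trip symmetric.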
I have a file which I read into memory as a list, then split into several lists based on some rule, say list1, list2, ..., listn. Now I want to get the size of each list, where "size" means the file size when that list is written to a file. The following is the code I have; the file name is 'temp', whose size is 744 bytes.
from os import stat
from sys import getsizeof

print(stat('temp').st_size)  # we get exactly 744 here

# Now read the file into a list and use the getsizeof() function:
with open('temp', 'r') as f:
    chunks = f.readlines()
print(getsizeof(chunks))  # here I get 240, which is quite different from 744

Since I can't use getsizeof() to get the size a list would have on disk, once I have the split lists I have to write each one to a temporary file:

open('tmp', 'w').write("".join(list1))
print(stat('tmp').st_size)  # Here is the value I want.
os.remove('tmp')

This solution is very slow and requires a lot of writing/reading to disk. Is there a better way to do this? Thanks a lot!
Instead of writing a series of bytes to a file and then looking at the file length¹, you could just check the length of the string that you would have written to the file:
print(len("".join(list1)))
Here, I'm assuming that your list contains byte strings. If it doesn't, you can always encode a byte string from your unicode string:
print(len("".join(list1).encode(your_codec)))
which I think you would need anyway for write to work properly in your original solution.
¹ Your original code could also give flaky (wrong!) results, since you never close the file: because of buffering, it is not guaranteed that all of the string's contents have been written to the file by the time you call os.stat on it.
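A sketch of the in-memory approach, using a short list of plain strings as a stand-in for one of the split chunks. It cross-checks the computed size against an actual (properly closed) write:

```python
import os

list1 = ["first line\n", "second line\n"]  # placeholder chunk

# Size the data would occupy on disk, without touching the disk
joined = "".join(list1)
in_memory_size = len(joined.encode("utf-8"))

# Cross-check against a real write; newline="" disables newline translation,
# and the with block closes (flushes) the file before os.stat looks at it
with open("tmp", "w", newline="") as f:
    f.write(joined)
print(in_memory_size == os.stat("tmp").st_size)  # → True
os.remove("tmp")
```

The newline="" argument matters on Windows, where text mode would otherwise write \r\n for each \n and make the on-disk size larger than the encoded length.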