Recently I was dealing with CSV files and tried to replace the NULL bytes in a CSV file with empty strings to make the CSV reader work. I referred to this answer but decided to do it in a different way.
The original solution is like this:
with open(file) as f:
    reader = csv.reader(x.replace('\0','') for x in f)
    print([x for x in reader])
But I decided to do it like this:
with open(file) as f:
    for line in f:
        line.replace('\0','')
    f.seek(0)
    reader = csv.reader(f)
    print([x for x in reader])
My approach doesn't seem to work the way the original one does; I wonder why that is?
Thank you for your time!
Take a look at the official documentation of the replace function in Python:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
In your implementation, you are calling replace but not capturing the returned replaced line anywhere.
You could instead replace the whole file and store it in another variable, or, if the file is large, perform the replacement inside the for loop itself.
However, the reference implementation you showed first looks better: it uses a generator that yields replaced lines as you need them, so you should stick with that.
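For illustration, a minimal sketch of the "store it in another variable" approach might look like this (cleaned_lines is a hypothetical name, and file is assumed to hold the filename as in your snippet):

import csv

with open(file) as f:
    # capture the return value of replace(); strings are immutable,
    # so replace() never modifies a line in place
    cleaned_lines = [line.replace('\0', '') for line in f]

reader = csv.reader(cleaned_lines)
print([row for row in reader])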
I have a csv which contains text like
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
when I run the code like below:
rows = csv.reader(csvfile)
for row in rows:
    print(" ".join('%s' % row for row in rows))
it prints the following:
['AAABBBBCCCDDDDDDD']
['EEEFFFRRRTTTHHHYY']
But I want to display as a series of words like below:
AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
Is there anything wrong with the code?
Your example looks like you simply need
with open(csvfile) as inputfile:  # misnomer; not really proper CSV
    for row in inputfile:
        print(row.rstrip('\n'), end='')
The example you provided doesn't look like a csv file. It looks like a simple text file. Then you could have something as simple as:
Input.txt
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
Solution.py
input_filename = "Input.txt"
with open(input_filename) as input_file:
    print("".join(x.rstrip('\n') for x in input_file))
This is taking advantage of:
A file object can be iterated on. This will give you a new line from each iteration.
Every line received from the file will have a newline character at its end. Since you do not want it, we use the method .rstrip() to remove it.
The .join() method can accept any iterable, even a...
Generator expression, which helps us create an iterable that will be accepted by .join(), using .rstrip() to format every line coming from the input file.
EDIT: OK, let's decompose my answer further:
When you open a file you can iterate over it. Put in the simplest way, that means you can loop over it (for line in input_file: ...).
But not only that: with an iterator you can create another iterator by transforming each element. This is what a list comprehension or, in the case I have chosen, a generator expression does. So the expression (x.rstrip() for x in input_file) will be an iterator that takes every element of input_file and applies .rstrip() to it.
The string method .join() will glue together the elements provided by an iterator, using that string as a separator. Since I use an empty string here, there won't be a separator. I have used the iterator defined before for this.
I then print() the string provided by the .join() operation explained before.
I made a minor correction to my answer because of an edge case: if there are space or tab characters at the end of a line in the input file, they would have been removed had I used x.rstrip() instead of x.rstrip('\n').
You could start with an empty string, and for every row read from the csv file, remove the newline at the end and add the contents to the empty string.
joined = ""
with open(csvfile) as f:
    for row in f:
        joined = joined + row.replace("\n", "")
print(joined)
Output:
>> AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
So I have to take a .csv file (which can be downloaded by clicking this: http://ge.tt/7lx5Boj2/) and I want to convert it into a 2D list.
My code currently does so, but with one problem.
Each element of the nested list is being read as a big string rather than a list of elements because an apostrophe is being added at the beginning and end of each nested list.
For example, rather than:
["ID","Name","Type 1","Type 2","Generation","Legendary"]
I am getting:
['"ID","Name","Type 1","Type 2","Generation","Legendary"']
To resolve this, I tried to make a nested for loop to replace every apostrophe in the list with an empty string, but my code doesn't do anything. It just prints the exact same string as if the replace operation never happened.
def read_info_file(filename):
    opened_infocsv = open(filename, 'r')  # opens the .csv file passed as an argument (INFO format)
    linebylinelist = [line.splitlines() for line in opened_infocsv]  # converts the entire .csv into a 2D list
    opened_infocsv.close()
    print(linebylinelist)
    print('\n')
    for i in linebylinelist:
        for l in i:
            l.replace("'","")
    print(linebylinelist)

read_info_file('info_file5.csv')
Any ideas on fixing this? NOTE: I am not allowed to import CSV
EDIT: I tried changing .replace to .strip and it still doesn't work. I honestly have no idea how to fix this.
I believe the root of the problem has to do with the way in which I converted the CSV into a 2d list using list comprehension. Maybe it is possible to convert a CSV into a 2d list without converting the lines to strings first.
str.replace does not change the current string; it returns a copy of the string with all occurrences of substring old replaced by new. You should assign the result of the function back to the current list item.
for i in linebylinelist:
    for kk, ss in enumerate(i):
        i[kk] = ss.replace("'", "")
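Note that the apostrophes you see are just Python's repr of a one-element list of strings, so stripping them will not split the row into columns. If the goal is one list element per column without importing csv, a minimal sketch (assuming the data contains no commas inside quoted fields) could split each line instead:

with open('info_file5.csv', 'r') as opened_infocsv:
    # split each line on commas and strip the surrounding double quotes from each field
    linebylinelist = [[field.strip('"') for field in line.rstrip('\n').split(',')]
                      for line in opened_infocsv]
print(linebylinelist)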
Use the csv module to read a csv file.
Also, to open a file, use a context manager. As an example, see the code below.
import csv

filename = 'info_file5.csv'
with open(filename, 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
I have the following code:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.write(vals)
ea.close()
However I am getting the following error:
TypeError: expected a character buffer object
How do I buffer the output and what data type is the variable vals?
vals is a list. If you want to write a list of strings to a file, as opposed to an individual string, use writelines:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines(vals)
ea.close()
Note that this will not insert newlines for you (although in your specific case your strings already end in newlines, as pointed out in the comments). If you need to add newlines you could do the following as an example:
ea=open("abc_format.txt",'w')
ea.seek(0)
ea.writelines([line+'\n' for line in vals])
ea.close()
The write function will only handle characters or bytes. To write arbitrary objects, use Python's pickle library. Write with pickle.dump(), read them back with pickle.load().
But if what you're really after is writing something in the same format as your input, you'll have to write out the matrix values and newlines yourself.
for line in vals:
    ea.write(line)
ea.close()
You've now written a file that looks like abc.txt, except that the first row and the first character of each line have been removed. (You dropped those when constructing vals.)
Somehow I doubt this is what you intended, since you chose to name it abc_format.txt, but anyway this is how you write out a list of lines of text.
You cannot "write" objects to files. Rather, use the pickle module:
matrix_file = open("abc.txt", "rU")
matrix = matrix_file.readlines()
keys = matrix[0]
vals = [line[1:] for line in matrix[1:]]
#pickling begins!
import pickle
f = open("abc_format.txt", 'wb')  # must be opened for writing (binary) so pickle.dump() can write to it
pickle.dump(vals, f) #call with (object, file)
f.close()
Then read it like this:
import pickle
f = open("abc_format.txt", 'rb')  # open in binary read mode to match how it was written
vals = pickle.load(f) #exactly the same list
f.close()
You can do this with any kind of object, your own or built-in. You can only write strings and bytes to files; Python's open() function just opens the file, much like opening it in Notepad would.
To answer your first question, vals is a list, because anything in [operation(i) for i in iterated_over] is a list comprehension, and list comprehensions make lists. To see what the type of any object is, just use the type() function; e.g. type([1,4,3])
Examples: https://repl.it/qKI/3
Documentation here:
https://docs.python.org/2/library/pickle.html and https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions
First of all, instead of opening and closing the files separately, you can use a with statement, which does the job automatically. As for the error: as it says, the write method only accepts a character buffer object, so you need to convert your list to a string.
For example, you can use the join function, which joins the items within an iterable object with a specific delimiter and returns a concatenated string.
with open("abc.txt", "rU") as f, open("abc_format.txt", 'w') as out:
    matrix = f.readlines()
    keys = matrix[0]
    vals = [line[1:] for line in matrix[1:]]
    out.write('\n'.join(vals))
Also, as a more Pythonic way, since file objects are iterators you can do it as in the following code: get the first line by calling next() on the file, then pass the rest to the join function:
with open("abc.txt", "rU") as f, open("abc_format.txt", 'w') as out:
    matrix = next(f)
    out.write('\n'.join(f))
I am very new to programming. I have the following problem.
I want to take some floats from a .txt file, and add them to a Python list as strings, with a comma between them, like this:
.TXT:
194220.00 38.4397984 S 061.1720742 W 0.035
194315.00 38.4398243 S 061.1721378 W 0.036
Python:
myList = ('38.4397984,061.1720742','38.4398243,061.1721378')
Does anybody know how to do this? Thank you!
There are three key pieces you'll need to do this. You'll need to know how to open files, you'll need to know how to iterate through the lines with the file open, and you'll need to know how to split each line.
Once you know all these things, it's as simple as concatenating the pieces you want and adding them to your list.
my_list = []

with open('path/to/my/file.txt') as f:
    for line in f:
        words = line.split()
        my_list.append(words[1] + ',' + words[3])  # join the two coordinates with a comma

print(my_list)
Python has a built-in function open(fileName, mode) that returns a file object.
fileName is a string with the name of the file.
mode is another string that states how the file will be used, e.g. 'r' for reading and 'w' for writing.
f = open('file.txt', 'r')
This will create a file object in the variable f. f now has different methods you can use to read the data in the file. The most common is f.read(size), where size is optional.
text = f.read()
Will save the data in the variable text.
Now you want to split the string. A string is an object and has a method called split() that creates a list of words from the string, separated by whitespace.
myList = text.split()
In your code you gave us a tuple which, judging from the variable name, I am not sure is what you were looking for. Make sure to read up on the difference between a tuple and a list; the procedure for building a tuple is a bit different.
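Putting these pieces together, a minimal sketch of the whole task might look like the following (the filename is a placeholder, splitlines() is used instead of a plain split() so that line boundaries are kept, and the column positions are taken from the sample lines above):

with open('data.txt', 'r') as f:  # hypothetical filename
    text = f.read()

myList = []
for line in text.splitlines():
    words = line.split()                      # e.g. ['194220.00', '38.4397984', 'S', '061.1720742', 'W', '0.035']
    myList.append(words[1] + ',' + words[3])  # latitude and longitude joined with a comma

print(myList)  # ['38.4397984,061.1720742', '38.4398243,061.1721378']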
I have a very large (~8 gb) text file that has very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. In fact my question is very similar to this and this but I keep getting stuck when I try to select a range of lines instead of a single line.
So far this is the only approach I have gotten to work:
lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))
However, this gives me a list, and I would like to output a file with a format identical to the input file (one line per row).
You can call join on the ranges.
lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
Might I suggest not storing the entire file in memory (since it is large), as per one of your links?
f = open('file')
n = open('newfile', 'w')

for i, text in enumerate(f):
    if 4 < i < 67:        # same lines as lines[5:67]
        n.write(text)
    elif 88 < i < 111:    # same lines as lines[89:111]
        n.write(text)
    else:
        pass
I'd also recommend using 'with' instead of opening and closing the files manually, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(.
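For reference, a hedged sketch of the same loop using with (assuming a Python version that allows multiple context managers in one statement) might look like this:

# same logic as above, but both files are closed automatically
with open('file') as f, open('newfile', 'w') as n:
    for i, text in enumerate(f):
        if 4 < i < 67:
            n.write(text)
        elif 88 < i < 111:
            n.write(text)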
The first thing you should think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() will do that, so that specific method should be avoided.
Luckily, we have an excellent standard library in Python, itertools. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, generator, file-like object, etc.) and returns an iterator over the range specified:
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step.
Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line).
Using this information, together with the str.join method, you can e.g. extract lines 10-19 by using this simple code:
from itertools import islice
# open the file for reading ('r' is the default mode)
with open('huge_data_file.txt') as data_file:
    txt = ''.join(islice(data_file, 10, 20))
Note that when looping over the file object, each line keeps its trailing newline character, so joining with an empty string preserves the original one-line-per-row layout.
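Applied to the ranges from the original question, a hedged sketch (the file names are assumptions) could chain two slices. Note that once the first islice is exhausted, the file iterator has already advanced to the line just past the first range, so the second slice is given relative to that position:

from itertools import islice

with open('huge_data_file.txt') as readin, \
     open('out1.txt', 'w') as out1, \
     open('out2.txt', 'w') as out2:
    # lines with indices 5..66, i.e. the same lines as lines[5:67]
    out1.writelines(islice(readin, 5, 67))
    # the iterator now stands at index 67, so slice relative to that point
    out2.writelines(islice(readin, 89 - 67, 111 - 67))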
(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:
lines = readin.readlines()
for each in lines[5:67]:
    out1.write(each)

for each in lines[89:111]:
    out2.write(each)
path = "c:\\someplace\\"
Open two text files, one for reading and one for writing:
f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')
Go through each line of the input file:
for line in f_in:
    if i_want_to_write_this_line:
        f_out.write(line)
Close the files when done:
f_in.close()
f_out.close()