Open a csv file without separating the content automatically

I have a csv which contains text like
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
When I run the code below:
rows = csv.reader(csvfile)
for row in rows:
    print(" ".join('%s' %row for row in rows))
it prints the following:
['AAABBBBCCCDDDDDDD']
['EEEFFFRRRTTTHHHYY']
But I want the output as a single string, like this:
AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
Is there anything wrong with the code?

Your example looks like you simply need
with open(csvfile) as inputfile:  # misnomer; not really proper CSV
    for row in inputfile:
        print(row.rstrip('\n'), end='')

The example you provided doesn't look like a CSV file. It looks like a simple text file. In that case you could use something as simple as:
Input.txt
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
Solution.py
input_filename = "Input.txt"
with open(input_filename) as input_file:
    print("".join(x.rstrip('\n') for x in input_file))
This takes advantage of the following:
A file object can be iterated over. Each iteration gives you the next line.
Every line read from the file has a newline character at its end. Since you don't seem to want it, we use the .rstrip() method to remove it.
The .join() method accepts any iterable, even a generator expression.
The generator expression builds the iterable that .join() consumes, applying .rstrip() to every line coming from the input file.
EDIT: OK let's decompose further my answer:
When you open a file you can iterate over it. Put simply, that means you can loop over it (for line in input_file: ...).
Beyond that, from one iterator you can create another iterator by transforming each element. This is what a list comprehension or, in the case I have chosen, a generator expression does. So the expression (x.rstrip() for x in input_file) is an iterator that takes every element of input_file and applies .rstrip() to it.
The string method .join() glues together the elements provided by an iterator, using the string it is called on as the separator. Since I use an empty string here, there is no separator. I pass it the iterator defined above.
I then print() the string returned by the .join() operation.
I did a minor correction on my answer because there is the edge case that if there are space or tab characters at the end of a line in the input file they would have been removed if I use x.rstrip() instead of x.rstrip('\n')
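To make the steps above concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for the opened file (an assumption, so that the snippet runs without Input.txt on disk). The sample text is the one from the question, with a trailing space added to the first line to show the edge case mentioned in the correction.

```python
import io

# Stand-in for open("Input.txt"); the first line has a trailing space on purpose.
sample = "AAABBBBCCCDDDDDDD \nEEEFFFRRRTTTHHHYY\n"

# rstrip('\n') removes only the newline, so the trailing space survives.
joined_keep = "".join(line.rstrip('\n') for line in io.StringIO(sample))

# rstrip() with no argument removes all trailing whitespace, the space included.
joined_strip = "".join(line.rstrip() for line in io.StringIO(sample))

print(repr(joined_keep))
print(repr(joined_strip))
```

Which variant is right depends on whether trailing spaces or tabs in the input carry meaning.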

You could start with an empty string, and for every row read from the csv file, remove the newline at the end and add the contents to the empty string.
joined = ""
with open(csvfile) as f:
    for row in f:
        joined = joined + row.replace("\n","")
print(joined)
Output:
>> AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY

Related

Question about python list comprehension and file reading

Recently I was dealing with CSV files and tried to replace the NULL bytes in a CSV file with empty strings to make the CSV reader work. I referred to this answer but decided to do it in a different way.
The original solution is like this:
with open(file) as f:
    reader = csv.reader(x.replace('\0','') for x in f)
    print([x for x in reader])
But I decided to do it like this:
with open(file) as f:
    for line in f:
        line.replace('\0','')
    f.seek(0)
    reader = csv.reader(f)
    print([x for x in reader])
My approach does not work the way the original one does, and I wonder why.
Thank you for your time!

Take a look at the official doc of the replace function in python:
str.replace(old, new[, count])
Return a copy of the string with all
occurrences of substring old replaced by new. If the optional argument
count is given, only the first count occurrences are replaced.
In your implementation, you are calling replace but not capturing the returned line anywhere, so the contents of the file are never changed.
You could instead run replace on the whole file contents and store the result in another variable or, if the file is large, do your processing inside the for loop itself.
However, the reference implementation you showed before looks better: It uses a generator that will yield replaced lines as you need them, you should stick with that.
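A small sketch of both points, using an in-memory io.StringIO and made-up sample rows in place of the real file (both are assumptions for illustration only):

```python
import csv
import io

# Hypothetical sample data with a stray NUL byte in the first row.
raw = "a,b\0,c\n1,2,3\n"

line = "a,b\0,c\n"
line.replace('\0', '')            # the returned copy is discarded
unchanged = line                  # line still contains the NUL byte
cleaned = line.replace('\0', '')  # keep the returned copy instead

# The generator cleans each line lazily, as csv.reader asks for it.
f = io.StringIO(raw)
rows = list(csv.reader(x.replace('\0', '') for x in f))
print(rows)
```

csv.reader accepts any iterable of strings, which is why it can consume the generator directly.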

How to replace a string character in place in a 2D list?

So I have to take a .csv file (which can be downloaded clicking this: http://ge.tt/7lx5Boj2/) and I want to convert it into a 2D list.
My code currently does so, but with one problem.
Each element of the nested list is being read as a big string rather than a list of elements because an apostrophe is being added at the beginning and end of each nested list.
For example, rather than:
["ID","Name","Type 1","Type 2","Generation","Legendary"]
I am getting:
['"ID","Name","Type 1","Type 2","Generation","Legendary"']
To resolve this, I tried to make a nested for loop to replace every apostrophe in the list with an empty character but my code doesn't do anything. It just prints the exact same string as if the replace operation never happened.
def read_info_file(filename):
    opened_infocsv = open(filename, 'r')  # opens the given .csv file in INFO format
    linebylinelist = [line.splitlines() for line in opened_infocsv]  # converts the entire .csv into a 2D list
    opened_infocsv.close()
    print(linebylinelist)
    print('\n')
    for i in linebylinelist:
        for l in i:
            l.replace("'","")
    print(linebylinelist)
read_info_file('info_file5.csv')
Any ideas on fixing this? NOTE: I am not allowed to import CSV
EDIT : I tried changing .replace to .strip and it still doesn't work. I honestly have no idea how to fix this.
I believe the root of the problem has to do with the way in which I converted the CSV into a 2d list using list comprehension. Maybe it is possible to convert a CSV into a 2d list without converting the lines to strings first.

str.replace does not change the current string; it returns a copy of the string with all occurrences of substring old replaced by new. You should assign the result of the function back to the current list item.
for i in linebylinelist:
    for kk,ss in enumerate(i):
        i[kk] = ss.replace("'","")
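Note that the apostrophes in the printed list are only how Python displays a string; they are not characters inside it. A possible way to get one list of fields per line without the csv module is to split each line on commas and strip the double quotes. This is a sketch that assumes no field contains a comma of its own; parse_line and the second sample row are hypothetical, and io.StringIO stands in for the opened file.

```python
import io

# Hypothetical sample in the same shape as the header shown in the question.
sample = (
    '"ID","Name","Type 1","Type 2","Generation","Legendary"\n'
    '"1","Foo","A","B","1","FALSE"\n'
)

def parse_line(line):
    # Drop the newline, split on commas, then strip the surrounding double quotes.
    return [field.strip('"') for field in line.rstrip('\n').split(',')]

table = [parse_line(line) for line in io.StringIO(sample)]
print(table[0])
```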

Use the csv module to read a CSV file.
Also, use a context manager to open the file. For example:
import csv
filename = 'info_file5.csv'
with open(filename, 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

CSV.Reader importing a list of lists

I am running the following on a csv of UIDs:
with open('C:/uid_sample.csv',newline='') as f:
    reader = csv.reader(f,delimiter=' ')
    uidlist = list(reader)
but the list returned is actually a list of lists:
[['27465307'], ['27459855'], ['27451353']...]
I'm using this workaround to get individual strings within one list:
for r in reader:
    print(' '.join(r))
i.e.
['27465307','27459855','27451353',...]
Am I missing something where I can't do this automatically with the csv.reader or is there an issue with the formatting of my csv perhaps?

A CSV file is a file where each line, or row, contains columns that are usually delimited by commas. In your case, you told csv.reader() that your columns are delimited by a space. Since there aren't any spaces in any of the lines, each row of the csv.reader object has only one item. The problem here is that you aren't looking for a row with a single column; you are looking for a single item.
Really, you just want a list of the lines in the file. You could use f.readlines(), but that would include the newline character in each line. That actually isn't a problem if all you need to do with each line is convert it to an integer, but you might want to remove those characters. That can be done quite easily with a list comprehension:
newlist = [line.strip() for line in f]
If you are merely iterating through the lines (with a for loop, for example), you probably don't need a list. If you don't mind the newline characters, you can iterate through the file object directly:
for line in f:
    uid = int(line)
    print(uid)
If the newline characters need to go, you could either take them out per line:
for line in f:
    line = line.strip()
    ...
or create a generator object:
uids = (line.strip() for line in f)
Note that reading a file is like reading a book: you can't read it again until you turn back to the first page, so remember to use f.seek(0) if you want to read the file more than once.
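The points above can be tried out with a short sketch. Here io.StringIO stands in for the opened file (an assumption, so the snippet runs without the CSV on disk), and the UIDs are the three shown in the question.

```python
import io

# Stand-in for the opened file of UIDs (values taken from the question).
f = io.StringIO("27465307\n27459855\n27451353\n")

uids = [line.strip() for line in f]   # the first pass consumes the file
second_pass = f.readlines()           # nothing left: we are at the end
f.seek(0)                             # "turn back to the first page"
as_ints = [int(line) for line in f]   # int() tolerates the trailing newline

print(uids)
print(second_pass)
print(as_ints)
```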

Replacing cell, not string

I have the following code.
import fileinput
import sys
map_dict = {'*':'999999999', '**':'999999999'}
for line in fileinput.FileInput("test.txt",inplace=1):
    for old, new in map_dict.items():
        line = line.replace(old, new)
    sys.stdout.write(line)
I have a txt file
1\tab*
*1\tab**
Then running the python code generates
1\tab999999999
9999999991\tab999999999
However, I want to replace "cell" (sorry if this is not standard terminology in python. I am using the terminology of Excel) not string.
The second cell is
*
So I want to replace it.
The third cell is
1*
This is not *. So I don't want to replace it.
My desired output is
1\tab999999999
*1\tab999999999
How can I do this? The user will tell the program which delimiter is in use, but the program should replace only whole cells, not substrings.
Also, how can I write to a separate output txt file rather than overwriting the input?

Open a file for writing, and write to it.
Since you want to replace only exact, complete values (and not touch 1*, for example), do not use replace. Instead, to examine each value, split your lines on the tab character ('\t').
You must also remove the end-of-line characters, as they may prevent the last cell in a row from matching.
Which gives
MAPS = (('*','999999999'),('**','999999999'))
with open('output.txt','w') as out_file, open("test.txt",'r') as in_file:
    for line in in_file:
        out_list = []
        # strip the newline, then split the line into cells on tabs
        for inp_cell in line.rstrip('\n').split('\t'):
            out_cell = inp_cell
            for old, new in MAPS:
                # replace only when the whole cell matches
                if out_cell == old:
                    out_cell = new
            out_list.append(out_cell)
        out_file.write( "\t".join(out_list) + "\n" )
There are more condensed/compact/optimized ways to do it, but I detailed each step on purpose, so that you may adapt to your needs (I was not sure this is exactly what you ask for).

The csv module can help:
#!python3
import csv
map_dict = {'*':'999999999','**':'999999999'}
with open('test.txt',newline='') as inf, open('test2.txt','w',newline='') as outf:
    w = csv.writer(outf,delimiter='\t')
    for line in csv.reader(inf,delimiter='\t'):
        line = [map_dict[item] if item in map_dict else item for item in line]
        w.writerow(line)
Notes:
with will automatically close files.
csv.reader parses and splits lines on a delimiter.
A list comprehension translates line items in the dictionary into a new line.
csv.writer writes the line back out.
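To check the behaviour without touching any files, the same idea can be sketched in memory. io.StringIO stands in for test.txt and test2.txt, dict.get is used as a shorter form of the lookup, and lineterminator='\n' is my own choice here (the csv module writes '\r\n' by default); all three are assumptions for illustration.

```python
import csv
import io

map_dict = {'*': '999999999', '**': '999999999'}

# In-memory stand-ins for test.txt and test2.txt; the input matches the question.
inf = io.StringIO("1\t*\n*1\t**\n", newline='')
outf = io.StringIO(newline='')

writer = csv.writer(outf, delimiter='\t', lineterminator='\n')
for row in csv.reader(inf, delimiter='\t'):
    # dict.get returns the cell itself when it is not a key in map_dict.
    writer.writerow([map_dict.get(cell, cell) for cell in row])

result = outf.getvalue()
print(result)
```

The cell *1 is left alone because the lookup compares whole cells, not substrings.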

Selecting and printing specific rows of text file

I have a very large (~8 gb) text file that has very long lines. I would like to pull out lines in selected ranges of this file and put them in another text file. In fact my question is very similar to this and this but I keep getting stuck when I try to select a range of lines instead of a single line.
So far this is the only approach I have gotten to work:
lines = readin.readlines()
out1.write(str(lines[5:67]))
out2.write(str(lines[89:111]))
However this gives me a list and I would like to output a file with a format identical to the input file (one line per row)

You can call join on the ranges.
lines = readin.readlines()
out1.write(''.join(lines[5:67]))
out2.write(''.join(lines[89:111]))
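A tiny sketch of the difference, with a made-up three-line list in place of the real file contents (an assumption for illustration):

```python
# Hypothetical result of readlines(): each element keeps its newline.
lines = ["a\n", "b\n", "c\n"]

as_repr = str(lines[0:2])      # the list's display form, brackets and quotes included
as_text = ''.join(lines[0:2])  # the original text, one line per row

print(as_repr)
print(as_text)
```

Because every element already ends with a newline, joining on an empty string reproduces the input format.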

Might I suggest not storing the entire file in memory (since it is large), as per one of your links?
f = open('file')
n = open('newfile', 'w')
for i, text in enumerate(f):
    # same zero-based ranges as the slices lines[5:67] and lines[89:111]
    if 5 <= i < 67:
        n.write(text)
    elif 89 <= i < 111:
        n.write(text)
f.close()
n.close()
I'd also recommend using 'with' instead of opening and closing the files by hand, but unfortunately I am not allowed to upgrade to a new enough version of Python for that here :(.

The first thing to think of when facing a problem like this is to avoid reading the entire file into memory at once. readlines() does exactly that, so that method should be avoided here.
Luckily, Python has an excellent standard library, which includes the itertools module. itertools has a lot of useful functions, and one of them is islice. islice iterates over an iterable (such as a list, a generator, or a file-like object) and returns an iterator over the range specified:
itertools.islice(iterable, start, stop[, step])
Make an iterator that returns selected elements from the iterable. If start is non-zero,
then elements from the iterable are skipped until start is reached.
Afterward, elements are returned consecutively unless step is set
higher than one which results in items being skipped. If stop is None,
then iteration continues until the iterator is exhausted, if at all;
otherwise, it stops at the specified position. Unlike regular slicing,
islice() does not support negative values for start, stop, or step.
Can be used to extract related fields from data where the internal
structure has been flattened (for example, a multi-line report may
list a name field on every third line)
Using this information, together with the str.join method, you can, for example, extract the lines at zero-based indexes 10-19 (the 11th through 20th lines) with this simple code:
from itertools import islice
# Open the file for reading; a 'w' or 'wb' mode would truncate it
with open('huge_data_file.txt', 'r') as data_file:
    txt = ''.join(islice(data_file, 10, 20))
Note that when looping over the file object, each line keeps its trailing newline character, so the joining string should be empty.
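Here is a runnable sketch of the islice approach. io.StringIO with 30 generated lines stands in for the huge file (an assumption, so nothing needs to exist on disk). It also shows a point that is easy to miss when pulling several ranges from one open file: the second islice call counts from wherever the first one stopped.

```python
import io
from itertools import islice

# Stand-in for the large input file: 30 numbered lines, each ending in "\n".
data_file = io.StringIO("".join("line %d\n" % i for i in range(30)))

# Lines keep their newline, so join with an empty string.
txt = "".join(islice(data_file, 10, 20))

# The file position is now at index 20, so a later islice on the same
# handle counts from there: this takes what were originally indexes 25..27.
more = "".join(islice(data_file, 5, 8))

print(txt)
print(more)
```

If absolute line numbers are easier to reason about, calling seek(0) on the file before each islice is one option, at the cost of re-reading from the start.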

(Partial Answer) In order to make your current approach work you'll have to write line by line. For instance:
lines = readin.readlines()
for each in lines[5:67]:
    out1.write(each)
for each in lines[89:111]:
    out2.write(each)

path = "c:\\someplace\\"
Open 2 text files. One for reading and one for writing
f_in = open(path + "temp.txt", 'r')
f_out = open(path + output_name, 'w')
go through each line of the input file
for line in f_in:
    if i_want_to_write_this_line == True:
        f_out.write(line)
close the files when done
f_in.close()
f_out.close()
