CSV.Reader importing a list of lists - python

I am running the following on a csv of UIDs:
with open('C:/uid_sample.csv',newline='') as f:
reader = csv.reader(f,delimiter=' ')
uidlist = list(reader)
but the list returned is actually a list of lists:
[['27465307'], ['27459855'], ['27451353']...]
I'm using this workaround to get individual strings within one list:
for r in reader:
print(' '.join(r))
i.e.
['27465307','27459855','27451353',...]
Am I missing something where I can't do this automatically with the csv.reader or is there an issue with the formatting of my csv perhaps?

A CSV file is a file where each line, or row, contains columns that are usually delimited by commas. In your case, you told csv.reader() that your columns are delimited by a space. Since there aren't any spaces in any of the lines, each row of the csv.reader object has only one item. The problem here is that you aren't looking for a row with a single column; you are looking for a single item.
Really, you just want a list of the lines in the file. You could use f.readlines(), but that would include the newline character in each line. That actually isn't a problem if all you need to do with each line is convert it to an integer, but you might want to remove those characters. That can be done quite easily with a list comprehension:
newlist = [line.strip() for line in f]
If you are merely iterating through the lines (with afor loop, for example), you probably don't need a list. If you don't mind the newline characters, you can iterate through the file object directly:
for line in f:
uid = int(line)
print(uid)
If the newline characters need to go, you could either take them out per line:
for line in f:
line = line.strip()
...
or create a generator object:
uids = (line.strip() for line in f)
Note that reading a file is like reading a book: you can't read it again until you turn back to the first page, so remember to use f.seek(0) if you want to read the file more than once.

Related

Open a csv file without separating the content automatically

I have a csv which contains text like
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
when I run the code like below:
rows = csv.reader(csvfile)
for row in rows:
print(" ".join('%s' %row for row in rows))
it will project as follow:
['AAABBBBCCCDDDDDDD']
['EEEFFFRRRTTTHHHYY']
But I want to display as a series of words like below:
AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY
Is there anything wrong in the code?
Your example looks like you simply need
with open(csvfile) as inputfile: # misnomer; not really proper CSV
for row in inputfile:
print(row.rstrip('\n'), end='')
The example you provided doesn't look like a csv file. It looks like a simple text file. The you could have something as simple as :
Input.txt
AAABBBBCCCDDDDDDD
EEEFFFRRRTTTHHHYY
Solution.py
input_filename = "Input.txt"
with open(input_filename) as input_file:
print("".join(x.rstrip('\n') for x in input_file))
This is taking advantage of:
A file object can be iterated on. This will give you a new line from each iteration
Every line received from the file will have newline character at its end. Since you seem to not want it we use the method .rstrip() to remove it
The .join() method can accept any iterable even a...
Generator expression which will help us create an iterable that will accepted by .join() using .rstrip() to format every line coming from the input file.
EDIT: OK let's decompose further my answer:
When you open a file you can iterate over it. In the most simple way to explain it, let's say that it means that you do a loop over it (for line in input_file: ...).
But not only that, but with an iterator you can create another iterator by transforming each element. This is what a list comprehension or, in the case I have chosen, a generator expression does. So the expression (x.rstrip() for x in input_file) will be a iterator that takes every element of input_file and applies to it .rstrip()
The string method .join() will glue together the elements provided by an iterator using that string as a separator. Since I use here an empty string there won't be a seperator. I have used the iterator defined before for this.
I then print() the string provided by the .join() operation explained before.
I did a minor correction on my answer because there is the edge case that if there are space or tab characters at the end of a line in the input file they would have been removed if I use x.rstrip() instead of x.rstrip('\n')
You could start with an empty string, and for every row read from the csv file, remove the newline at the end and add the contents to the empty string.
joined = ""
with open(csvfile) as f:
for row in f:
joined = joined + row.replace("\n","")
print(joined)
Output:
>> AAABBBBCCCDDDDDDDEEEFFFRRRTTTHHHYY

converting a string to a list from a text file

with open('task3randomtext.txt', 'r') as f:
text = f.readlines()
this is the code I use to take text from a text file and read it into my program but it reads it in as a list an will not let me split it to convert it to a array. Is there any other way or splitting it or reading it in.
If you look at the Python documentation (which you should!) you can see that readlines() returns a list containing every line in the file. To then split each line, you could do something along the lines of:
with open('times.txt') as f:
for line in f.readlines():
print line.split()

Converting tsv to tsv in python

I have a tsv-file (tab-seperated) and would like to filter out a lot of data using python before I import it into a postgresql database.
My problem is that I can't find a way to keep the format of the original file which is mandatory because otherwise the import processes won't work.
The web suggested that I should use the csv library, but no matter what delimter I use I always end up with files in a different format than the origin, e. g. files, that contain a comma after every character or files, that contain a tab after every character or files that have all data in one row.
Here is my code:
import csv
import glob
# create a list of all tsv-files in one directory
liste = glob.glob("/some_directory/*.tsv")
# go thru all the files
for item in liste:
#open the tsv-file for reading and a file for writing
with open(item, 'r') as tsvin, open('/some_directory/new.tsv', 'w') as csvout:
tsvin = csv.reader(tsvin, delimiter='\t')
# I am not sure if I have to enter a delimter here for the outfile. If I enter "delimter='\t'" like for the In-File, the outfile ends up with a tab after every character
writer = csv.writer(csvout)
# go thru all lines of the input tsv
for row in tsvin:
# do some filtering
if 'some_substring1' in row[4] or 'some_substring2' in row[4]:
#do some more filtering
if 'some_substring1' in str(row[9]) or 'some_substring1' in str(row[9]):
# now I get lost...
writer.writerow(row)
Do you have any idea what I am doing wrong? The final file has to have a tab between every field and some kind of line break at the end.
Somehow you are passing a string to w.writerow(), not a list as expected.
Remember that strings are iterable; each iteration returns a single character from the string. writerow() simply iterates over its argument writing each item separated by the delimiter character (by default a comma). So if you pass a string to writerow() it will write each character from the string separated by the delimiter.
How is it that row is a string? It could be that the delimiter for the input file is incorrect - perhaps the file does not use tabs but has fixed field widths using runs of spaces as the delimiter.
You can check whether the reader is correctly parsing your file by printing out the value of row:
for row in tsvin:
print(row)
...
If the file is being correctly parsed, expect to see that row is a list, and that each element of the list corresponds to a column/field from the file.
If it is not parsing correctly then you might see that row is a string, or that it's a list but the fields are empty and/or out of place.
It would be helpful if you added a sample of your input file to the question.

List the first words per line from a text file in Python

I need to select the first word on each line and make a list from them from a text file:
I would copy the text but it's the formatting is quite screwed up. will try
All the other text is unnecessary.
I have tried
string=[]
for line in f:
String.append(line.split(None, 1)[0]) # add only first word
from another solution, but it keeps returning a "Index out of bounds" error.
I can get the first word from the first line using string=text.partition(' ')[0]
but I do not know how to repeat this for the other lines.
I am still new to python and to the site, I hope my formatting is bearable! (when opened, I encode the text to accept symbols, like so
wikitxt=open('racinesPrefixesSuffixes.txt', 'r', encoding='utf-8')
could this be the issue?)
The reason it's raising an IndexError is because the specific line is empty.
You can do this:
words = []
for line in f:
if line.strip():
words.append(line.split(maxsplit=1)[0])
Here line.strip() is checking if the line consists of only whitespace. If it does only consist of whitespace, it will simply skip the line.
Or, if you like list comprehension:
words = [line.split(maxsplit=1)[0] for line in f if line.strip()]

Keep the new line symbols in the string when writing in a text file Python

I have a list of strings, and some of the strings contain '\n's. I want to write this list of strings into a text file and then later on read it back and store it to a list by using readlines(). I have to keep the original text; meaning not removing the new lines from the text.
If I don't remove all these new lines then of course readlines() will return a larger number of strings than the original list.
How can I achieve this? Or there's really no way and I should write in other formats instead. Thanks.
The following:
from __future__ import print_function
strings = ["asd", "sdf\n", "dfg"]
with open("output.txt", "w") as out_file:
for string in strings:
print(repr(string), file=out_file)
with open("output.txt") as in_file:
for line in in_file:
print(line.strip())
prints
'asd'
'sdf\n'
'dfg'
To print it normally (without the quotes), you can use ast.literal_eval: print(ast.literal_eval(line.strip()))

Categories

Resources