Read Tuple from csv in Python - python

I am trying to read a row in a csv, which I have previously written.
The row looks like this when written: ['New York', (30,40)]
and like this when read back: ['New York', '(30,40)'] (the tuple is converted to a string).
I need to read each item of the tuple to work with the ints, but I can't while it is read as a string: if I do something like tuple[0], what I get is '(', the first character of the string.
Maybe this is a question about how I write and read the rows, which is currently done this way:
def writeCSV(data,name):
    fileName = name+'.csv'
    with open(fileName, 'a') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(data)
def readCSV(filename):
    allRows = []
    with open(filename, 'rb') as f:
        reader = csv.reader(f, delimiter=' ')
        for row in reader:
            allRows.append(row)
    return allRows
What I want is to read that tuple from the row not as a string but as a tuple, so I can operate on each item afterwards.
Is it possible?

You need to use ast.literal_eval() on your tuple string:
>>> my_file_line = ['New York', '(30,40)']
>>> import ast
>>> my_tuple = ast.literal_eval(my_file_line[1])
>>> my_tuple[0]
30
This is needed because the element at index 1 of the list you get after reading the file is a string that holds a valid tuple literal. ast.literal_eval converts that string into a tuple object, and you can then access its items by index.
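As a minimal sketch of how this could fit into reading the file, the conversion can be applied to each row as it comes out of csv.reader. The helper name parse_row and the in-memory sample are assumptions made for illustration, and the sketch assumes the tuple always sits in the second column.

```python
import ast
import csv
import io

def parse_row(row):
    """Return [name, tuple] from a row such as ['New York', '(30,40)']."""
    # literal_eval only evaluates Python literals; unlike eval it does not run arbitrary code
    return [row[0], ast.literal_eval(row[1])]

# In-memory stand-in for the csv file (hypothetical sample data)
sample = io.StringIO('New York,"(30,40)"\n')
rows = [parse_row(row) for row in csv.reader(sample)]

city, coords = rows[0]
print(coords[0] + coords[1])  # the items are ints again, so arithmetic works
```

With a real file, the same list comprehension can be run over csv.reader(f) inside the with block.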

Since you're producing the file yourself, why not make it right from the start:
import csv
data = ['New York', (30,40)]
with open("out.csv","w",newline="") as f:
    cw=csv.writer(f)
    cw.writerow(data) # wrong
    cw.writerow(data[:1]+list(data[1])) # okay
The first writerow call writes each item of data, converting each one with str. The tuple is therefore written as its string representation, which a different layout would avoid:
New York,"(30, 40)"
which explains the need to evaluate it afterwards.
The second writerow call writes 3 elements (now 3 columns) and keeps the values separate (the ints are still written as text, as everything in a csv file is):
New York,30,40
In that case, a simple string-to-integer conversion on the rows does the job.
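A minimal sketch of that conversion, assuming the three-column layout shown above. The in-memory sample stands in for the real file, and the variable names are chosen only for illustration.

```python
import csv
import io

# In-memory stand-in for out.csv (hypothetical sample data)
sample = io.StringIO("New York,30,40\n")

for row in csv.reader(sample):
    city = row[0]
    # every field comes back as a string, so convert the numeric columns explicitly
    numbers = tuple(int(value) for value in row[1:])
    print(city, numbers)
```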
Note that csv isn't the best way to serialize mixed data; json is a better fit for that.
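As a rough sketch of the json route, using the same data as above: json keeps the ints as ints, but it has no tuple type, so the tuple comes back as a list. The variable names are assumptions for illustration.

```python
import json

data = ['New York', (30, 40)]

text = json.dumps(data)       # the tuple is serialized as a JSON array
restored = json.loads(text)   # JSON arrays are loaded as Python lists

city, (x, y) = restored
print(x + y)                  # x and y are ints, no string conversion needed
```

If a real tuple is needed afterwards, tuple(restored[1]) converts the list back.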

Related

How to split tsv file into smaller tsv file based on row values

I have a tsv file in.txt which I would like to split into a smaller tsv file called out.txt.
I would like to copy into out.txt only the rows of in.txt which contain the string value My String Value in column 6.
import csv
# r is textmode
# rb is binary mode
# binary mode is faster
with open('in.txt','rb') as tsvIn, open('out.txt', 'w') as tsvOut:
    tsvIn = csv.reader(tsvIn, delimiter='\t')
    tsvOut = csv.writer(tsvOut)
    for row in tsvIn:
        if "My String Value" in row:
            tsvOut.writerows(row)
My output looks like this.
D,r,a,m,a
1,9,6,1,-,0,4,-,1,3
H,y,u,n, ,M,o,k, ,Y,o,o
B,e,o,m,-,s,e,o,n, ,L,e,e
M,u,-,r,y,o,n,g, ,C,h,o,i,",", ,J,i,n, ,K,y,u, ,K,i,m,",", ,J,e,o,n,g,-,s,u,k, ,M,o,o,n,",", ,A,e,-,j,a, ,S,e,o
A, ,p,u,b,l,i,c, ,a,c,c,o,u,n,t,a,n,t,',s, ,s,a,l,a,r,y, ,i,s, ,f,a,r, ,t,o,o, ,s,m,a,l,l, ,f,o,r, ,h,i,m, ,t,o, ,e,v,e,n, ,g,e,t, ,a, ,c,a,v,i,t,y, ,f,i,x,e,d,",", ,l,e,t, ,a,l,o,n,e, ,s,u,p,p,o,r,t, ,h,i,s, ,f,a,m,i,l,y,., ,H,o,w,e,v,e,r,",", ,h,e, ,m,u,s,t, ,s,o,m,e,h,o,w, ,p,r,o,v,i,d,e, ,f,o,r, ,h,i,s, ,s,e,n,i,l,e,",", ,s,h,e,l,l,-,s,h,o,c,k,e,d, ,m,o,t,h,e,r,",", ,h,i,s, ,.,.,.
K,o,r,e,a,n,",", ,E,n,g,l,i,s,h
S,o,u,t,h, ,K,o,r,e,a
It should look like this with tab separated values
Drama Hyn Mok Yoo A public accountant's salary is far to small for him...etc
There are a few things wrong with your code. Let's look at it line by line.
import csv
Import module csv. Ok.
with open('in.txt','rb') as tsvIn, open('out.txt', 'w') as tsvOut:
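With auto-closed binary file read handle tsvIn from in.txt, and text write handle tsvOut from out.txt, do... (Note: in Python 2 you probably want to use mode wb instead of mode w; see this post. In Python 3 the csv module works on text files, so open both files in text mode with newline='' instead.)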
tsvIn = csv.reader(tsvIn, delimiter='\t')
Let tsvIn be the result of the call of function reader in module csv with arguments tsvIn and delimiter='\t'. Ok.
tsvOut = csv.writer(tsvOut)
Let tsvOut be the result of the call of function writer in module csv with argument tsvOut. You probably want to add another argument, delimiter='\t', too.
for row in tsvIn:
For each element in tsvIn as row, do...
if "My String Value" in row:
If string "My String Value" is present in row. You mentioned that you wanted to show only those rows whose sixth element was equal to the string, thus you should use something like this instead...
if len(row) >= 6 and row[5] == "My String Value":
This means: If the length of row is at least 6, and the sixth element of row is equal to "My String Value", do...
tsvOut.writerows(row)
Call method writerows of object tsvOut with argument row. Remember that in Python a string is just a sequence of characters, and a character is a single-element string, so a character is itself a sequence. According to the docs, row is a list of strings, each representing one column of the row. The writerows method, however, expects a list of rows, that is, a list of lists of strings. So it interprets each element of row as a row of its own, although it is actually a string, and each character of that string as a field (as characters are strings!). All this means that you get a messy, character-by-character output. You should try this instead...
tsvOut.writerow(row)
Method writerow expects a single row as an argument, not a list of rows, thus this will yield the expected result.
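Putting the points above together, a minimal sketch could look like the following. The helper name matching_rows and the sample rows are assumptions for illustration, and in-memory buffers stand in for in.txt and out.txt so the sketch runs on its own.

```python
import csv
import io

def matching_rows(rows, value, column=5):
    """Yield only the rows whose field at index `column` equals `value`."""
    for row in rows:
        # guard against short rows before indexing the sixth field
        if len(row) > column and row[column] == value:
            yield row

# In-memory stand-ins for in.txt and out.txt (hypothetical sample data)
tsv_in = io.StringIO(
    "Drama\t1961\tA\tB\tC\tMy String Value\n"
    "Comedy\t1970\tA\tB\tC\tOther\n"
)
tsv_out = io.StringIO()

reader = csv.reader(tsv_in, delimiter='\t')
writer = csv.writer(tsv_out, delimiter='\t')
# writerows is appropriate here because it receives an iterable of whole rows
writer.writerows(matching_rows(reader, "My String Value"))
print(tsv_out.getvalue())
```

With real files in Python 3, the two buffers would be replaced by open('in.txt', newline='') and open('out.txt', 'w', newline='').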
Try this:
import csv
# Python 3: open both files in text mode; newline='' lets the csv module handle line endings
with open('in.txt', 'r', newline='') as tsvIn, open('out.txt', 'w', newline='') as tsvOut:
    reader = csv.reader(tsvIn, delimiter='\t')
    writer = csv.writer(tsvOut, delimiter='\t')
    for row in reader:
        # keep only the rows that contain the value as one of their fields
        if "My String Value" in row:
            writer.writerow(row)

Use python to parse values from ping output into csv

I wrote some code that uses a regular expression to look for "time=" and save the value that follows in a string. Then I use the csv.writer method writerow, but each number is interpreted as a column, and this gives me trouble later. Unfortunately there is no 'writecolumn' method. Should I save the values as an array instead of a string and write every row separately?
import re
import csv
inputfile = open("ping.txt")
teststring = inputfile.read()
values = re.findall(r'time=(\d+.\d+)', teststring)
with open('parsed_ping.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(values)
EDIT: I understood that "values" is already a list. I tried to iterate it and write a row for each item with
for item in values:
    writer.writerow(item)
Now I get a space after each character, like
4 6 . 6
4 7 . 7
EDIT2: The spaces are the delimiters. If I change the delimiter to a comma, I get commas between digits. I just don't get why it's interpreting each digit as a separate column.
If your csv file only contains one column, it's not really a "comma-separated file" anymore, is it?
Just write the list to the file directly:
import re
with open("ping.txt") as inputfile:
    teststring = inputfile.read()
values = re.findall(r'time=(\d+\.\d+)', teststring)
with open('parsed_ping.csv', 'w') as csvfile:
    csvfile.write("\n".join(values))
I solved this. I just needed to use square brackets in the writer.
for item in values:
    writer.writerow([item])
This gives me the correct output.
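For reference, a self-contained sketch of the same idea. The two ping lines are made-up sample data, and an in-memory buffer stands in for parsed_ping.csv.

```python
import csv
import io
import re

# Hypothetical ping output; only the time= values matter here
ping_output = (
    "64 bytes from 192.0.2.1: icmp_seq=1 ttl=54 time=46.6 ms\n"
    "64 bytes from 192.0.2.1: icmp_seq=2 ttl=54 time=47.7 ms\n"
)
values = re.findall(r'time=(\d+\.\d+)', ping_output)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
# wrap each value in a one-element list so it is written as one field per row
writer.writerows([value] for value in values)
print(buffer.getvalue())
```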

Parsing CSV in Python 101

I'm trying to understand/visualise the process of parsing a raw csv data file in Python from dataquest.io's training course.
I understand that rows = data.split('\n') splits the long string of csv file into rows based on where the line break is. ie:
day1, sunny, \n day2, rain \n
becomes
day1, sunny
day2, rain
I thought the for loop would further break the data into something like:
day 1
sunny
day 2
rain
Instead the course seems to imply it would actually become a list of lists. I don't understand: why does that happen?
weather_data = []
f = open("la_weather.csv", 'r')
data = f.read()
rows = data.split('\n')
for row in rows:
    split_row = row.split(",")
    weather_data.append(split_row)
I'm ignoring the CSV stuff and concentrating just on your list misunderstanding. When you split the text at the line breaks, it becomes a list of strings. That is, rows becomes: ["day1, sunny","day2, rain"].
The for statement, applied to a list, iterates through the elements of that list. So, on the first time through row will be "day1, sunny", the second time through it will be "day2, rain", etc.
Inside each iteration of the for loop, it creates a new list, by splitting row at the commas into, eg, ["day1"," sunny"]. All of these lists are added to the weather_data list you created at the start. You end up with a list of lists, ie [['day1', ' sunny'], ['day2', ' rain']]. If you wanted ['day1', ' sunny', 'day2', ' rain'], you could do:
for row in rows:
    split_row = row.split(",")
    for ele in split_row:
        weather_data.append(ele)
That code does make it a list of lists.
As you say, the first split converts the data into a list, one element per line.
Then, for each line, the second split converts it into another list, one element per column.
And then the second list is appended, as a single item, to the weather_data list - which is now, as the instructions say, a list of lists.
Note that this code isn't very good: quite apart from the fact that you would normally use the csv module, as others have pointed out, you would not do f.read() and then split the result. You would just write for line in f, which iterates over the lines one at a time.
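A minimal sketch of iterating over the file object directly. An in-memory buffer with made-up content stands in for la_weather.csv.

```python
import io

# In-memory stand-in for la_weather.csv (hypothetical sample data)
f = io.StringIO("day1,sunny\nday2,rain\n")

weather_data = []
for line in f:  # a file object yields one line at a time, newline included
    weather_data.append(line.rstrip("\n").split(","))

print(weather_data)
```

Each appended item is itself a list, which is why the result is a list of lists.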
As a more Pythonic and flexible way of dealing with csv files, you can use the csv module instead of reading the file as raw text:
import csv
# Python 3: open in text mode with newline='' so the csv module handles line endings
with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    for row in spamreader:
        print(row)  # do stuff with each row here
Here spamreader is a reader object, and looping over it gives you each row as a list of strings.
And if you want all of the rows in one list, you can just convert spamreader to a list:
with open("la_weather.csv", newline='') as f:
    spamreader = csv.reader(f, delimiter=',')
    print(list(spamreader))

Extracting variable names and data from csv file using Python

I have a csv file that has each line formatted with the line name followed by 11 pieces of data. Here is an example of a line.
CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64
There are 12 lines in total, each with a unique name and data.
What I would like to do is extract the first cell from each line and use that to name the corresponding data, either as a variable equal to a list containing that line's data, or maybe as a dictionary, with the first cell being the key.
I am new to working with inputting files, so the farthest I have gotten is to read the file in using the stock solution in the documentation
import csv
path = r'data.csv'
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile,delimiter=' ')
    for row in reader:
        print(row[0])
I am failing to figure out how to assign each row to a new variable, especially when I am not sure what the variable names will be (this is because the csv file will be created by a user other than myself).
The destination for this data is a tool that I have written. It accepts lists as input such as...
CW1 = [0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64]
so this would be the ideal end solution. If it is easier, and considered better to have the output of the file read be in another format, I can certainly re-write my tool to work with that data type.
As Scironic said in their answer, it is best to use a dict for this sort of thing.
However, be aware that before Python 3.7 dict objects did not guarantee any "order", so the order of the rows could be lost if you use one. If this is a problem, you can use an OrderedDict instead (which is just what it sounds like: a dict that "remembers" the order of its contents):
import csv
from collections import OrderedDict as od
path = r'data.csv'
data = od() # ordered dict object remembers the order in the csv file
with open(path, newline='') as csvFile:
    reader = csv.reader(csvFile, delimiter=',') # the sample line is comma-separated
    for row in reader:
        data[row[0]] = row[1:] # Slice the row up into 0 (first item) and 1: (remaining)
Now if you go looping through your data object, the contents will be in the same order as in the csv file:
for d in data.values():
    myspecialtool(*d)
You need to use a dict for these kinds of things (dynamic variables):
import csv
path = r'data.csv'
data = {}
with open(path, newline='') as csvFile:
    reader = csv.reader(csvFile, delimiter=',') # the sample line is comma-separated
    for row in reader:
        data[row[0]] = row[1:]
dicts are especially useful for dynamic variables and are the best way to store things like this. To access a row you just need to use:
data['CW1']
This solution also means that if you add any extra rows in with new names, you won't have to change anything.
If you are desperate to have the variable names in the global namespace and not within a dict, use exec (N.B. IF ANY OF THIS USES INPUT FROM OUTSIDE SOURCES, USING EXEC/EVAL CAN BE HIGHLY DANGEROUS (rm * level) SO MAKE SURE ALL INPUT IS CONTROLLED AND UNDERSTOOD BY YOURSELF).
with open(path, newline='') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        exec("{} = {}".format(row[0], row[1:]))
In python, you can use slicing: row[1:] will contain the row, except the first element, so you could do:
>>> import csv
>>> d={}
>>> with open("f") as f:
...     c = csv.reader(f, delimiter=',')
...     for r in c:
...         d[r[0]]=list(map(int,r[1:]))
...
>>> d
{'var1': [1, 3, 1], 'var2': [3, 0, -1]}
Regarding variable variables, check How do I do variable variables in Python? or How to get a variable name as a string in Python?. I would stick to dictionary though.
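A minimal sketch of the dictionary approach for the data in the question, converting the values to float so they match the list the tool expects. The sample lines are shortened, the CW2 line is made up for illustration, and an in-memory buffer stands in for data.csv.

```python
import csv
import io

# In-memory stand-in for data.csv (shortened; the CW2 line is hypothetical)
sample = io.StringIO(
    "CW1,0,-0.38,2.04\n"
    "CW2,1,0.50,-1.25\n"
)

data = {}
for row in csv.reader(sample):
    if row:  # skip blank lines
        # first cell is the name, the remaining cells are the numbers
        data[row[0]] = [float(value) for value in row[1:]]

print(data['CW1'])
```

Because the names are only dictionary keys, a file written by another user with different names needs no code changes.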
An alternative to using the proper csv library could be as follows:
path = r'data.csv'
with open(path, "r") as csvFile:
    csvRows = csvFile.readlines()
dataRows = [[float(col) for col in row.rstrip("\n").split(",")[1:]] for row in csvRows]
for dataRow in dataRows: # Where dataRow is a list of numbers
    print(dataRow)
You could then call your function where the print statement is.
This reads the whole file in and produces a list of lines with trailing newlines. It then removes each newline, splits each row into a list of strings, skips the initial column and calls float() on each remaining entry, resulting in a list of lists. Whether this is suitable depends on how important the first column is.

Entire string in For loop, not just character by character

Using CSV writer, I am trying to write a list of strings to a file.
Each string should occupy a separate row.
sectionlist = ["cat", "dog", "frog"]
When I implement the following code:
with open('pdftable.csv', 'wt') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for i in sectionlist:
        writer.writerow(i)
I create
c,a,t
d,o,g
f,r,o,g
when I want
cat
dog
frog
Why does the for loop parse each character separately and how can I pass the entire string into csv.writer together so each can be written?
It doesn't look like you even need to use csv writer.
l = ["cat", "dog", "frog"] # Don't name your variable list!
with open('pdftable.csv', 'w') as csvfile:
    for word in l:
        csvfile.write(word + '\n')
Or as @GP89 suggested
with open('pdftable.csv', 'w') as csvfile:
    # writelines does not add line endings itself, so append them here
    csvfile.writelines(word + '\n' for word in l)
I think that what you need is:
with open('pdftable.csv', 'wt') as csvfile:
    writer = csv.writer(csvfile, delimiter=',')
    for i in sectionlist:
        writer.writerow([i]) # note the square brackets here
writerow treats its argument as an iterable, so if you pass a string, it will see it as if each character is one element in the row; however, you want the whole string to be an item, so you must enclose it in a list or a tuple.
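The difference can be seen in a small sketch that writes the same word both ways. It uses an in-memory buffer instead of pdftable.csv, and lineterminator="\n" is set only to keep the output easy to read.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',', lineterminator="\n")

writer.writerow("cat")    # a string is iterated character by character
writer.writerow(["cat"])  # a one-element list is written as a single field

print(buffer.getvalue())
```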
PS: That said, if your particular case is not any more complex than what you are posting, you may not need csv.writer at all, as suggested by other answers.
The problem is that i represents a string (word), not a list (row). Strings are also iterable sequences (of characters) in Python, so the CSV function accepts the object without error, even though the results are "strange".
Fix sectionlist such that it is a list of lists of strings (rows) so i will be a list of strings, wrap each word in a list when used as a writerow parameter, or simply don't use writerow which expects a list of strings.
Trivially, the following structure would be saved correctly:
sectionlist = [
    ["cat", "meow"],
    ["dog"],
    ["frog", "hop", "pond"]
]
