Extracting variable names and data from csv file using Python

I have a csv file that has each line formatted with the line name followed by 11 pieces of data. Here is an example of a line.
CW1,0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64
There are 12 lines in total, each with a unique name and data.
What I would like to do is extract the first cell from each line and use that to name the corresponding data, either as a variable equal to a list containing that line's data, or maybe as a dictionary, with the first cell being the key.
I am new to working with inputting files, so the farthest I have gotten is to read the file in using the stock solution in the documentation
import csv
path = r'data.csv'
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')  # the sample line is comma-separated
    for row in reader:
        print(row[0])
I am failing to figure out how to assign each row to a new variable, especially when I am not sure what the variable names will be (this is because the csv file will be created by a user other than myself).
The destination for this data is a tool that I have written. It accepts lists as input such as...
CW1 = [0,-0.38,2.04,1.34,0.76,1.07,0.98,0.81,0.92,0.70,0.64]
so this would be the ideal end solution. If it is easier, and considered better to have the output of the file read be in another format, I can certainly re-write my tool to work with that data type.

As Scironic said in their answer, it is best to use a dict for this sort of thing.
However, be aware that in the Python versions this question targets, dict objects do not have any "order" - the order of the rows will be lost if you use one (plain dicts only started preserving insertion order in Python 3.7). If this is a problem, you can use an OrderedDict instead (which is just what it sounds like: a dict that "remembers" the order of its contents):
import csv
from collections import OrderedDict as od

data = od()  # ordered dict object remembers the order in the csv file
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        data[row[0]] = row[1:]  # row[0] is the name, row[1:] holds the remaining values
Now if you go looping through your data object, the contents will be in the same order as in the csv file:
for d in data.values():
    myspecialtool(*d)
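One detail that applies to every csv-based approach here: csv.reader hands every field back as a string, so if the tool expects numbers the values still need converting, e.g.:
CW1 = [float(x) for x in data['CW1']]  # fields come back as strings; convert before use
myspecialtool(*CW1)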

You need to use a dict for these kinds of things (dynamic variables):
import csv
path = r'data.csv'
data = {}
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        data[row[0]] = row[1:]
dicts are especially useful for dynamic variables and are the best method to store things like this. To access a row you just need to use:
data['CW1']
This solution also means that if you add any extra rows in with new names, you won't have to change anything.
If you are desperate to have the variable names in the global namespace and not within a dict, use exec (N.B. IF ANY OF THIS USES INPUT FROM OUTSIDE SOURCES, USING EXEC/EVAL CAN BE HIGHLY DANGEROUS (rm * level) SO MAKE SURE ALL INPUT IS CONTROLLED AND UNDERSTOOD BY YOURSELF).
with open(path,'rb') as csvFile:
    reader = csv.reader(csvFile, delimiter=',')
    for row in reader:
        exec("{} = {}".format(row[0], row[1:]))  # defines a variable named after row[0] (the values will still be strings)

In Python, you can use slicing: row[1:] will contain the whole row except the first element, so you could do:
>>> import csv
>>> d = {}
>>> with open("f") as f:
...     c = csv.reader(f, delimiter=',')
...     for r in c:
...         d[r[0]] = map(int, r[1:])
...
>>> d
{'var1': [1, 3, 1], 'var2': [3, 0, -1]}
Regarding variable variables, check How do I do variable variables in Python? or How to get a variable name as a string in Python?. I would stick to a dictionary, though.
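For the record, the "variable variables" approach those links discuss amounts to writing into a namespace dictionary such as globals(); a sketch of what that looks like (discouraged for the same reasons as the exec approach above):
import csv
with open("f") as f:
    c = csv.reader(f, delimiter=',')
    for r in c:
        globals()[r[0]] = map(int, r[1:])  # creates e.g. a top-level variable named var1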

An alternative to using the proper csv library could be as follows:
path = r'data.csv'
csvRows = open(path, "r").readlines()
dataRows = [[float(col) for col in row.rstrip("\n").split(",")[1:]] for row in csvRows]
for dataRow in dataRows:  # where dataRow is a list of numbers
    print dataRow
You could then call your function where the print statement is.
This reads the whole file in and produces a list of lines with trailing newlines. It then strips each newline and splits each row into a list of strings, skipping the initial column and calling float() on each remaining entry, resulting in a list of lists. Whether that is acceptable depends on how important the first column is to you.
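If the names in the first column do matter, the same no-csv-module approach can build a dict keyed on them instead; a sketch along the same lines:
path = r'data.csv'
dataByName = {}
with open(path, "r") as f:
    for row in f:
        cells = row.rstrip("\n").split(",")
        dataByName[cells[0]] = [float(col) for col in cells[1:]]  # name -> list of numbers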

Related

Creating a function to concatenate strings based on len(array)

I am trying to concatenate a string to send a message via Python to Telegram.
My plan is for the function to be modular.
It first imports lines from a .txt file and, based on how many lines there are, creates two different arrays,
array1[] and array2[]: array1 receives the values from the list as strings, and array2 receives user-generated information that complements what is stored at the same position in array1[pos], roughly like this:
while (k<len(list)):
    array2[k] = str(input(array1[k]+": "))
    k += 1
I want to create a single string to send as a single message, in such a way that my whole list ends up inside the same string:
string1 = array1[pos]+": "+array2[pos]+"\n"
I have tried using a while loop to compare against the length, but I kept rebuilding and rewriting my string again and again.
It looks like what you're looking for is to have one list that comes directly from your text file. There are lots of ways to do that, but you most likely won't want to create a list iteratively by index position. I would say to just append items to your list.
The accepted answer on this post has a good reference, which is basically the following:
import csv
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something
Which, in your case would mean something like this:
import csv

actual_text_list = []
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        actual_text_list.append(row)

user_input_list = []
for actual_text in actual_text_list:
    the_users_input = input(f'What is your response to {actual_text}? ')
    user_input_list.append(the_users_input)
This creates two lists, one with the actual text and the other with the user's input, which I think is what you're trying to do.
Alternatively, if the list in your text file will not have duplicates, you could consider using a dict, which is just a dictionary: a key-value data store. You would make the key the actual_text from the file, and the value the user_input. As another technique, you could make a list of lists.
import csv

actual_text_list = []
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        actual_text_list.append(row[0])  # each row is a list; keep the text in its first cell so it can be used as a dict key

dictionary = dict()
for actual_text in actual_text_list:
    the_users_input = input(f'What is your response to {actual_text}? ')
    dictionary[actual_text] = the_users_input
Then you could use that data like this:
for actual_text, user_input in dictionary.items():
    print(f'In response to {actual_text}, you specified {user_input}.')
list_of_strings_from_txt = ["A","B","C"]
modified_list = [f"{w}: {input(f'{w}:')}" for w in list_of_strings_from_txt]
I guess? maybe?
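If the goal is one Telegram message containing everything, that list can then be collapsed into a single string with join (building on the line above; send_message is only a placeholder for whatever call actually sends the message):
single_message = "\n".join(modified_list)  # one "name: answer" pair per line
# send_message(single_message)            # placeholder: substitute the real send call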

Check for key in csv file if key is matched then add data into different rows of matched column using python

I have a CSV file like the one below.
I need to search for a key, and then some values should be added into that key's column. For example, I need to search for "folder" and add some values into the folder column; in the same way, I need to search for "name" and add some values into the name column.
So the final output should look like the example below.
I have tried the approach below, but it doesn't work for me:
import csv
list1 = [['ab', 'cd', 'ed']]
with open('1.csv', 'a') as f_csv:
    data_to_write_list1 = zip(*list1)
    writer = csv.writer(f_csv, delimiter=',', dialect='excel')
    writer.writerows(data_to_write_list1)
If you want to only use built-in methods, you can get the first row of a file (in the case of a CSV file like yours, the headers) like this:
>>> with open('file_you_need.csv', 'r') as f:
...     file = f.readline()
In your case the variable file would then be (supposing the delimiter is ","):
folder,name,service
You can now do file.split(",") (replacing "," with whatever your delimiter actually is) and you'll get back a list of headers. You can then create a list of lists where each inner list is a row of your file, and either write back to the file or use a dictionary to link new entries to each header. Depending on your choice you would then write back to the file in different ways; for example, supposing you go with a list of lists:
with open('file_you_need.csv', 'w') as f:
    for row_list in listoflists:
        row = ""
        for i, el in enumerate(row_list):     # enumerate yields (index, element)
            if i != len(row_list) - 1:
                row += el + ","
            else:
                row += el
        f.write(row + "\n")                   # don't forget the newline between rows
As others have mentioned, you could also use Pandas and DataFrames to make this cleaner, but I don't think the plain approach is too hard to grasp.
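Since Pandas keeps coming up, here is a rough sketch of what that route might look like, assuming the intent is to append the new values under the matched column and leave the other columns empty (the file name and values are taken from the question):
import pandas as pd

df = pd.read_csv('1.csv')
key = 'folder'                                          # the column header to search for
if key in df.columns:
    new_rows = pd.DataFrame({key: ['ab', 'cd', 'ed']})  # values from the question
    df = pd.concat([df, new_rows], ignore_index=True)   # other columns are left empty (NaN)
    df.to_csv('1.csv', index=False)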

Unpacking values from a file read and assigning to a list

In the following short program:
data = []
f = open('C:/tsg3.txt', 'r').read().split("\t")
for i in range(0, len(f)-1):
    GeneID, Sym, Alias, Xref, Chromo, Cyto, Full_name, Gene_type, Desc, Nuc_seq, Pro_seq = f[i]
I am getting a ValueError (need more than 4 values to unpack).
Obviously, I am doing something wrong since I am relatively new to Python.
Any help would be appreciated. I'm using Python 3.3.2.
Thanks.
You split the whole file by tabs, resulting in a single list of strings.
You then loop over that list, assigning f[i] (individual strings) to a long list of variables. From your error message, you are trying to unpack a 4-character string into those variables; unpacking a string assigns its individual characters, and this fails because the number of characters doesn't match the number of variables.
Most likely, you want to process a tab-delimited file. Use the csv module for such tasks:
import csv
with open('C:/tsg3.txt', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        # `row` is a list of columns.
Because the file has headers, you can also use a csv.DictReader and use dictionaries instead (keyed with the headers):
with open('C:/tsg3.txt', 'rb') as f:
    reader = csv.DictReader(f, delimiter='\t')
    for row in reader:
        # `row` is a dictionary of columns.
Not all rows have all values; some appear to be missing Nucleotide_Sequence and Protein_Sequence columns.
For future reference, you can loop directly over a Python list; there is no need to use indices with range():
for i in f:
    # do something with the individual elements of `f`, assigned to `i` each iteration.
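To get back to the unpacking the question was attempting: that works per row once each row really has 11 columns. A sketch that skips the header and pads short rows (the variable names are the question's; the padding is my assumption about how to treat the missing trailing columns):
import csv

with open('C:/tsg3.txt', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)  # skip the header row
    for row in reader:
        row += [''] * (11 - len(row))  # pad rows that are missing trailing columns
        (GeneID, Sym, Alias, Xref, Chromo, Cyto,
         Full_name, Gene_type, Desc, Nuc_seq, Pro_seq) = row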

Trouble with Python order of operations/loop

I have some code that is meant to convert CSV files into tab delimited files. My problem is that I cannot figure out how to write the correct values in the correct order. Here is my code:
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
Now, since both my write statements are in the for row in data loop, my headers are being written multiple times over. If I outdent the first write statement, I'll have an obvious formatting error. If I move the second write statement above the first and then outdent, my data will be out of order. What can I do to make sure that the first write statement gets written once as a header, and the second gets written for each line in the CSV file? How do I extract the first 'write' statement outside of the loop without breaking the dictionary? Thanks!
The csv module contains methods for writing as well as reading, making this pretty trivial:
import csv
with open("test.csv") as file, open("test_tab.csv", "w") as out:
reader = csv.reader(file)
writer = csv.writer(out, dialect=csv.excel_tab)
for row in reader:
writer.writerow(row)
No need to do it all yourself. Note my use of the with statement, which should always be used when working with files in Python.
Edit: Naturally, if you want to select specific values, you can do that easily enough. You appear to be making your own dictionary to select the values - again, the csv module provides DictReader to do that for you:
import csv
with open("test.csv") as file, open("test_tab.csv", "w") as out:
reader = csv.DictReader(file)
writer = csv.writer(out, dialect=csv.excel_tab)
for row in reader:
writer.writerow([row["name"], row["order_num"], ...])
As kirelagin points out in the comments, writer.writerows() could also be used, here with a generator expression:
writer.writerows([row["name"], row["order_num"], ...] for row in reader)
Extract the code that writes the headers outside the main loop, in such a way that it only gets written exactly once at the beginning.
Also, consider using the CSV module for writing CSV files (not just for reading), don't reinvent the wheel!
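In pattern form, that means exactly one write for the header before the row loop and one write per row inside it. A rough sketch reusing the names from the question (only the two fields visible before the ... are shown, so extend the per-row line as needed):
with open(new_file, 'w') as tab_file:
    tab_file.write('\t'.join(fields) + '\n')   # header line, written exactly once
    for row in data:
        item = dict((name, value.strip()) for name, value in zip(fields, row))
        tab_file.write(item['name'] + '\t' + item['order_num'] + '\n')  # one data line per row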
Ok, so I figured it out, but it's not the most elegant solution. Basically, I just ran the first loop, wrote to the file, then ran it a second time and appended the results. See my code below. I would love any input on a better way to accomplish what I've done here. Thanks!
for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write(item['name']+'\t'+item['order_num']...)
    tab_file.close()

for file in import_dir:
    data = csv.reader(open(file))
    fields = data.next()
    new_file = export_dir+os.path.basename(file)
    tab_file = open(export_dir+os.path.basename(file), 'a+')
    for row in data:
        items = zip(fields, row)
        item = {}
        for (name, value) in items:
            item[name] = value.strip()
        tab_file.write('\n'+item['amt_due']+'\t'+item['due_date']...)
    tab_file.close()

Python csvreader separate lines

I am using the csv module for Python. I have had a good look at the CSV File Reading and Writing guide. I want to write a loop that runs through each row in the CSV file and assigns each row to a different variable. Does anyone have any ideas on this?
I am aware that there are .next() and .line_num; I didn't think these would be suitable in this case, although I might be wrong.
Currently I have the following code, which prints out the whole CSV file:
print_csv = csv.reader(open(csv_name, 'rb'), delimiter=' ', quotechar='|')
for row in print_csv:
    print ', '.join(row)
[EDIT]
I am now aware, from this question thread, that the best way to do this will depend on what the first line is going to be used for.
What I want to do with the first line of the CSV file is to check whether it is in the correct format. This would involve:
checking to see whether it has the expected number of columns
checking to see whether the column headers have the correct name
checking to see whether the columns are in the correct order.
1.- Fast Answer
Instead of setting different independent variables you could do:
mydict = {}
for idx, item in enumerate(reader):
    mydict['var%i' % idx] = item
then you call your var like:
mydict['var0']
Or still shorter in py3k:
mydict = {'var%i' %idx : item for idx, item in enumerate(reader)}
But this doesn't make much sense applied this way.
As a commenter said, this is no different from simply doing:
mylist = list(reader)
and then
mylist[0] # instead of 'var0'
and this option is much better.
The dictionary strategy is best suited when you extract the dictionary key from the very same reader line. For example, if the key were at position 0:
mydict = {item[0] : item for item in reader}
2.- The Proper Answer
But if what you want is simply to check the format of the first line (maybe to calculate the space you need for printing), the method could be:
line = reader.next()
like_this = check_how_is_my(line)
if like_this == 'something_long':
    spaces = 23
else:
    spaces = 0

while True:
    try:
        print_with_spaces(line, spaces)
        line = reader.next()   # advance to the next line (otherwise the loop reprints the same one)
    except StopIteration:
        break
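For the format check described in the question (column count, header names, column order), that same first next() call is all you need; a minimal sketch, where expected_header is a hypothetical list of the headers you expect, in order:
expected_header = ['col_a', 'col_b', 'col_c']   # hypothetical: replace with your real header names
header = reader.next()
if len(header) != len(expected_header):
    raise ValueError('unexpected number of columns')
if header != expected_header:
    raise ValueError('column headers missing, misnamed or out of order')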
Well, you can obviously do:
var1 = reader.next()
var2 = reader.next()
var3 = reader.next()
var4 = reader.next()
var5 = reader.next()
or any variation thereof. This is not my favorite coding style, but it works.
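If you know in advance how many rows you want, the same idea can be written a little more compactly (a sketch, assuming the file really has at least five rows):
var1, var2, var3, var4, var5 = [reader.next() for _ in range(5)]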
