.split() returning empty results - Python

I am trying to split a list that I have converted with str(), but I don't seem to be returning any results?
My code is as follows:
import csv

def csv_read(file_obj):
    reader = csv.DictReader(file_obj, delimiter=',')
    for line in reader:
        unique_id.append(line["LUSERFLD4"])
        total_amt.append(line["LAMOUNT1"])
        luserfld10.append(line["LUSERFLD10"])
        break
    bal_fwd, junk, sdc, junk2, est_read = str(luserfld10).split(' ')

if __name__ == "__main__":
    with open("UT_0004A493.csv") as f_obj:
        csv_read(f_obj)
    print(luserfld10)
    print(bal_fwd)
    print(sdc)
    print(est_read)
print(luserfld10) returns ['N | N | Y'], which is correct. (Due to system limitations when creating the csv file, this field holds three separate values.)
All variables have been defined and I'm not getting any errors, but my last three print commands return empty lists.
I've tried dedenting the .split() line, but then I can unpack only one value.
How do I get them to each return N or Y?
Why isn't it working as it is?
I'm sure it's obvious, but this is my first week of coding and I haven't been able to find the answer anywhere here. Any help (with explanations please) would be appreciated :)
Edit: all defined variables are as follows:
luserfld10=[]
bal_fwd=[]
sdc=[]
est_read=[]
etc.
I'm not certain how to show the file contents; I hope this is okay:
LACCNBR,LAMOUNT1,LUSERFLD4,LUSERFLD5,LUSERFLD6,LUSERFLD8,LUSERFLD9,LUSERFLD10
1290,-12847.28,VAAA0022179,84889.363,Off Peak - nil,5524.11,,N | N | N
2540255724,12847.28,VAAA0022179,84889.363,Off Peak - nil,5524.11,,N | N | N

If the luserfld10 is ['N | N | Y']
then,
luserfld10[0].replace('|', '').split()
Result:
['N', 'N', 'Y']

Even if you fix the .split() call in
bal_fwd, junk, sdc, junk2, est_read = str(luserfld10).split(' ')
it won't do what you want, because it assigns the results of the split to local names bal_fwd, sdc, etc. that exist only inside the csv_read function, not to the names you defined outside the function in the global scope.
You could use global statements to tell Python to assign those values to the global names, but it's generally best to avoid using the global statement unless you really need it. Also, merely using a global statement won't put the string data into your bal_fwd list. Instead, it will bind the global name to your string data and discard the list. If you want to put the string into the list you need to .append it, like you did with unique_id. You don't need global for that, since you aren't performing an assignment, you're just modifying the existing list object.
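The scoping difference is easy to demonstrate with a tiny standalone sketch (illustrative names, not from the original code):

```python
results = []  # a global list

def bad():
    # Assignment creates a *local* name; the global list is untouched
    results = ['local']

def good():
    # Mutating the existing list object works without a global statement
    results.append('appended')

bad()
print(results)   # []
good()
print(results)   # ['appended']
```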
Here's a repaired version of your code, tested with the data sample you posted.
import csv

unique_id = []
total_amt = []
luserfld10 = []
bal_fwd = []
sdc = []
est_read = []

def csv_read(file_obj):
    for line in csv.DictReader(file_obj, delimiter=','):
        unique_id.append(line["LUSERFLD4"])
        total_amt.append(line["LAMOUNT1"])
        fld10 = line["LUSERFLD10"]
        luserfld10.append(fld10)
        t = fld10.split(' | ')
        bal_fwd.append(t[0])
        sdc.append(t[1])
        est_read.append(t[2])

if __name__ == "__main__":
    with open("UT_0004A493.csv") as f_obj:
        csv_read(f_obj)
    print('id', unique_id)
    print('amt', total_amt)
    print('fld10', luserfld10)
    print('bal', bal_fwd)
    print('sdc', sdc)
    print('est_read', est_read)
output
id ['VAAA0022179', 'VAAA0022179']
amt ['-12847.28', '12847.28']
fld10 ['N | N | N', 'N | N | N']
bal ['N', 'N']
sdc ['N', 'N']
est_read ['N', 'N']
I should mention that using t = fld10.split(' | ') is a bit fragile: if the separator isn't exactly ' | ' then the split will fail. So if there's a possibility that there might not be exactly one space either side of the pipe (|) then you should use a variation of Jinje's suggestion:
t = fld10.replace('|', ' ').split()
This replaces all pipe chars with spaces, and then splits on runs of white space, so it's guaranteed to split the subfields correctly, assuming there's at least one space or pipe between each subfield (Jinje's original suggestion will fail if both spaces are missing on either side of a pipe).
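A quick made-up example shows why the replace-then-split version is more robust:

```python
messy = 'N| N |Y'  # inconsistent spacing around the pipes

# split(' | ') needs the exact separator, so nothing gets split here
print(messy.split(' | '))               # ['N| N |Y']

# replacing pipes with spaces and splitting on whitespace still works
print(messy.replace('|', ' ').split())  # ['N', 'N', 'Y']
```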
Breaking your data up into separate lists may not be a great strategy: you have to be careful to keep the lists synchronised, so it's tricky to sort them or to add or remove items. And it's tedious to manipulate all the data as a unit when you have it spread out over half a dozen named lists.
One option is to put your data into a dictionary of lists:
import csv
from pprint import pprint

def csv_read(file_obj):
    data = {
        'unique_id': [],
        'total_amt': [],
        'bal_fwd': [],
        'sdc': [],
        'est_read': [],
    }
    for line in csv.DictReader(file_obj, delimiter=','):
        data['unique_id'].append(line["LUSERFLD4"])
        data['total_amt'].append(line["LAMOUNT1"])
        fld10 = line["LUSERFLD10"]
        t = fld10.split(' | ')
        data['bal_fwd'].append(t[0])
        data['sdc'].append(t[1])
        data['est_read'].append(t[2])
    return data

if __name__ == "__main__":
    with open("UT_0004A493.csv") as f_obj:
        data = csv_read(f_obj)
    pprint(data)
output
{'bal_fwd': ['N', 'N'],
'est_read': ['N', 'N'],
'sdc': ['N', 'N'],
'total_amt': ['-12847.28', '12847.28'],
'unique_id': ['VAAA0022179', 'VAAA0022179']}
Note that csv_read doesn't directly modify any global variables. It creates a dictionary of lists and passes it back to the code that calls it. This makes the code more modular; trying to debug large programs that use globals can become a nightmare because you have to keep track of every part of the program that modifies those globals.
Alternatively, you can put the data into a list of dictionaries, one per row.
def csv_read(file_obj):
    data = []
    for line in csv.DictReader(file_obj, delimiter=','):
        luserfld10 = line["LUSERFLD10"]
        bal_fwd, sdc, est_read = luserfld10.split(' | ')
        # Put the desired data into a new dictionary
        row = {
            'unique_id': line["LUSERFLD4"],
            'total_amt': line["LAMOUNT1"],
            'bal_fwd': bal_fwd,
            'sdc': sdc,
            'est_read': est_read,
        }
        data.append(row)
    return data

if __name__ == "__main__":
    with open("UT_0004A493.csv") as f_obj:
        data = csv_read(f_obj)
    pprint(data)
output
[{'bal_fwd': 'N',
'est_read': 'N',
'sdc': 'N',
'total_amt': '-12847.28',
'unique_id': 'VAAA0022179'},
{'bal_fwd': 'N',
'est_read': 'N',
'sdc': 'N',
'total_amt': '12847.28',
'unique_id': 'VAAA0022179'}]

Related

Adding a single list into a dictionary

I am hoping someone can help me here. I am having some serious trouble adding a single list from a text file into a dictionary. The list in the text file appears as:
20
Gunsmoke
30
The Simpsons
10
Will & Grace
14
Dallas
20
Law & Order
12
Murder, She Wrote
What I need is for each entry, one line at a time, to become the key and then value. For example it should look like {20:Gunsmoke, etc...}
I have to use the file.readlines() method according to my instructor. Currently my code looks like this:
# Get the user input
inp = input()
# creating file object.
open = open(inp)
# read the file into separate lines.
mylist = open.readlines()
# put the contents into a dictionary.
mydict = dict.fromkeys(mylist)
print(mydict)
The output looks like this:
file1.txt
{'20\n': None, 'Gunsmoke\n': None, '30\n': None, 'The Simpsons\n': None, '10\n': None, 'Will & Grace\n': None, '14\n': None, 'Dallas\n': None, 'Law & Order\n': None, '12\n': None, 'Murder, She Wrote\n': None}
Process finished with exit code 0
There is more to this problem, but I am not here for someone to do my homework, I just can't figure out how to add this in properly. I have to be missing something and I am betting it's simple. Thank you for your time.
# Get the user input
inp = input()
# creating file object.
f = open(inp)
# read the file into separate lines.
mylist = f.readlines()
# determine the total number of key/value pairs
total_items = len(mylist)//2
# put the contents into a dictionary.
# note: strip() takes off the \n characters
mydict = {mylist[i*2].strip(): mylist[i*2+1].strip() for i in range(0,total_items)}
print(mydict)
First, you can read the files without newlines using read().splitlines(). Then split the list into 2 lists containing every other word. Then zip those 2 lists together and create a dictionary from that:
inp = input()
with open(inp, 'r') as f:
    mylist = f.read().splitlines()
mydict = dict(zip(mylist[::2], mylist[1::2]))
Note also: using with to automatically close the file when done.
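As an aside, another common idiom for pairing consecutive lines is to zip an iterator with itself; just a sketch, using made-up data:

```python
mylist = ['20', 'Gunsmoke', '30', 'The Simpsons']

# zip pulls two items from the same iterator for each pair
it = iter(mylist)
mydict = dict(zip(it, it))
print(mydict)  # {'20': 'Gunsmoke', '30': 'The Simpsons'}
```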

python script not joining strings as expected

I have a list of lists of sequences, and a corresponding list of lists of names.
testSequences = [
['aaaa', 'cccc'],
['tt', 'gg'],
['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]
testNames = [
['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
['>xx_redFish |zxx', '>xx_blueFish |zxx'],
['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]
I also have a list of all the identifying parts of the names:
taxonNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
I am trying to produce a new list, where each item in the list will correspond to one of the "identifying parts of the names", and the string will be made up of all the sequences for that name.
If a name and its sequence do not appear in one of the inner lists (i.e. no redFish or blueFish in the first list of testNames), I want to add a string of hyphens the same length as the sequences in that list. This would give me this output:
['aaaa--AAAAAAA', 'cccc--CCCCCC', '----ttTTTTTT', '----ggGGGGGG']
I have this piece of code to do this.
complete = [''] * len(taxonNames)
for i in range(len(testSequences)):
    for j in range(len(taxonNames)):
        sequenceLength = len(testSequences[i][0])
        for k in range(len(testSequences[i])):
            if taxonNames[j] in testNames[i][k]:
                complete[j].join(testSequences[i][k])
            if taxonNames[j] not in testNames[i][k]:
                hyphenString = "-" * sequenceLength
                complete[j].join(hyphenString)
print complete
"complete" should give my final output as explained above, but it comes out looking like this:
['', '', '', '']
How can I fix my code to give me the correct answer?
The main issue with your code, which makes it very hard to understand, is that you're not really leveraging the language elements that make Python so strong.
Here's a solution to your problem that works:
test_sequences = [
    ['aaaa', 'cccc'],
    ['tt', 'gg'],
    ['AAAAAAA', 'CCCCCC', 'TTTTTT', 'GGGGGG']]

test_names = [
    ['>xx_oneFish |xzx', '>xx_twoFish |zzx'],
    ['>xx_redFish |zxx', '>xx_blueFish |zxx'],
    ['>xx_oneFish |xzx', '>xx_twoFish |xzx', '>xz_redFish |xxx', '>zx_blueFish |xzz']]

taxon_names = ['oneFish', 'twoFish', 'redFish', 'blueFish']

def get_seqs(taxon_name, sequences_list, names_list):
    for seqs, names in zip(sequences_list, names_list):
        found_seq = None
        for seq, name in zip(seqs, names):
            if taxon_name in name:
                found_seq = seq
                break
        yield found_seq if found_seq else '-' * len(seqs[0])

result = [''.join(get_seqs(taxon_name, test_sequences, test_names))
          for taxon_name in taxon_names]
print(result)
The generator get_seqs pairs up lists from test_sequences and test_names and for each pair, tries to find the sequence (seq) for the name (name) that matches and yields it, or yields a string of the right number of hyphens for that list of sequences.
The generator (a function that yields multiple values) has code that quite literally follows the explanation above.
The result is then simply a matter of, for each taxon_name, getting all the resulting sequences that match in order and joining them together into a string, which is the result = ... line.
You could make it work with list indexing loops and string concatenation, but this is not a PHP question, now is it? :)
Note: for brevity, you could just access the global test_sequences and test_names instead of passing them in as parameters, but I think that would come back to haunt you if you were to actually use this code. Also, I think it makes semantic sense to change the order of names and sequences in the entire example, but I didn't to avoid further deviating from your example.
Here is a solution that may do what you want. It begins, not with your data structures from this post, but with the three example files from your previous post (which you used to build this post's data structures).
The only thing I couldn't figure out was how many hyphens to use for a missing sequence from a file.
differentNames = ['oneFish', 'twoFish', 'redFish', 'blueFish']
files = ['f1.txt', 'f2.txt', 'f3.txt']
data = [[] for _ in range(len(differentNames))]
final = []

for file in files:
    d = dict()
    with open(file, 'r') as fin:
        for line in fin:
            line = line.rstrip()
            if line.startswith('>'):  # for ex., >xx_oneFish |xxx
                underscore = line.index('_')
                space = line.index(' ')
                key = line[underscore+1:space]
            else:
                d[key] = line
    for i, key in enumerate(differentNames):
        data[i].append(d.get(key, '-' * 4))

for array in data:
    final.append(''.join(array))
print(final)
Prints:
['AAAAAAAaaaa----', 'CCCCCCcccc----', 'TTTTTT----tt', 'GGGGGG----gg']

Parsing a CSV file to store as a dictionary with nested array values. Best approach?

I am trying to take this csv file and parse it and store it in the form of a dictionary (sorry if I use the terms incorrectly, I am currently learning). The first element is my key and the rest will be values in the form of nested arrays.
targets_value,11.4,10.5,10,10.8,8.3,10.1,10.7,13.1
targets,Cbf1,Sfp1,Ino2,Opi1,Cst6,Stp1,Met31,Ino4
one,"9.6,6.3,7.9,11.4,5.5",N,"8.4,8.1,8.1,8.4,5.9,5.9",5.4,5.1,"8.1,8.3",N,N
two,"7.0,11.4,7.0","4.8,5.3,7.0,8.1,9.0,6.1,4.6,5.0,4.6","6.3,5.9,5.9",N,"4.3,4.8",N,N,N
three,"6.0,9.7,11.4,6.8",N,"11.8,6.3,5.9,5.9,9.5","5.4,8.4","5.1,5.1,4.3,4.8,5.1",N,N,11.8
four,"9.7,11.4,11.4,11.4",4.6,"6.2,7.9,5.9,5.9,6.3","5.6,5.5","4.8,4.8,8.3,5.1,4.3",N,7.9,N
five,7.9,N,"8.1,8.4",N,"4.3,8.3,4.3,4.3",N,N,N
six,"5.7,11.4,9.7,5.5,9.7,9.7","4.4,7.0,7.7,7.5,6.9,4.9,4.6,4.9,4.6","7.9,5.9,5.9,5.9,5.9,6.3",6.7,"5.1,4.8",N,7.9,N
seven,"6.3,11.4","5.2,4.7","6.3,6.0",N,"8.3,4.3,4.8,4.3,5.1","9.8,9.5",N,8.4
eight,"11.4,11.4,5.9","4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9","6.3,6.3,5.9,5.9,6.6,6.6","5.3,5.2,7.0","8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1","9.2,7.4","9.4,9.3,7.9",N
nine,"9.7,9.7,11.4,9.7","5.2,4.6,5.5,6.5,4.5,4.6,5.5","6.3,5.9,5.9,9.5,6.5",N,"4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8",8.0,8.6,N
ten,"9.7,9.7,9.7,11.4,7.9","5.2,4.6,5.5,6.5,4.5,4.6,5.5","6.3,5.9,5.9,9.5,6.5",5.7,"4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8",8.0,8.6,N
YPL250C_Icy2,"11.4,6.1,11.4",N,"6.3,6.0,6.6,7.0,10.0,6.5,9.5,7.0,10.0",7.1,"4.3,4.3",9.2,"10.7,9.5",N
,,,,,,,,
,,,,,,,,
The issue is that in each line, some columns are quoted because of multiple values per cell, while others have only a single unquoted entry, and cells with no value were filled in with an N. So there is a mixture of quoted and unquoted fields, and of numbers and non-numbers.
Wanted the output to look something like this:
{'eight': ['11.4,11.4,5.9', '4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9', '6.3,6.3,5.9,5.9,6.6,6.6', '5.3,5.2,7.0', '8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1', '9.2,7.4', '9.4,9.3,7.9', 'N'],
'ten': ['9.7,9.7,9.7,11.4,7.9', '5.2,4.6,5.5,6.5,4.5,4.6,5.5', '6.3,5.9,5.9,9.5,6.5', '5.7', '4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8', '8.0', '8.6', 'N'],
'nine': ['9.7,9.7,11.4,9.7', '5.2,4.6,5.5,6.5,4.5,4.6,5.5', '6.3,5.9,5.9,9.5,6.5', 'N', '4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8', '8.0', '8.6', 'N']
}
I wrote a script to clean it and store it, but was not sure if my script was "too long for no reason". Any tips?
import re

motif_dict = {}
with open(filename, "r") as file:
    data = file.readlines()

for line in data:
    if ',,,,,,,,' in line:
        continue
    else:
        quoted_holder = re.findall(r'"(\d.*?\d)"', line)
        # reverse the order of the elements contained in the array
        quoted_holder = quoted_holder[::-1]
        new_line = re.sub(r'"\d.*?\d"', 'h', line).split(',')
        for position, element in enumerate(new_line):
            if element == 'h':
                new_line[position] = quoted_holder.pop()
        motif_dict[new_line[0]] = new_line[1:]
There's a csv module which makes working with csv files much easier. In your case, your code becomes
import csv

with open("motif.csv", "r", newline="") as fp:
    reader = csv.reader(fp)
    data = {row[0]: row[1:] for row in reader if row and row[0]}
where the if row and row[0] lets us skip rows which are empty or have an empty first element. This produces (newlines added)
>>> data["eight"]
['11.4,11.4,5.9', '4.4,6.3,6.0,5.6,7.6,7.1,5.1,5.3,5.1,4.9',
'6.3,6.3,5.9,5.9,6.6,6.6', '5.3,5.2,7.0',
'8.3,4.3,4.3,4.8,4.3,4.3,8.3,4.8,8.3,5.1',
'9.2,7.4', '9.4,9.3,7.9', 'N']
>>> data["ten"]
['9.7,9.7,9.7,11.4,7.9', '5.2,4.6,5.5,6.5,4.5,4.6,5.5',
'6.3,5.9,5.9,9.5,6.5', '5.7', '4.3,4.3,4.3,5.1,8.3,8.3,4.3,4.3,4.3,4.8',
'8.0', '8.6', 'N']
In practice, for processing, I think you'd want to replace 'N' with None or some other object as a missing marker, and make every value a list of floats (even if it's only got one element), but that's up to you.
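That post-processing step might look something like this sketch (the helper name clean is hypothetical):

```python
def clean(value):
    # Treat 'N' as a missing-value marker; otherwise parse the
    # comma-separated numbers into a list of floats
    if value == 'N':
        return None
    return [float(x) for x in value.split(',')]

row = ['7.9', 'N', '8.1,8.4']
print([clean(v) for v in row])  # [[7.9], None, [8.1, 8.4]]
```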

Python - Iterating through a list of list with a specifically formatted output; file output

Sorry to ask such a trivial question but I can't find the answer anywhere, and it's my first day using Python (need it for work). I think my problem is trying to use Python like C. Anyway, here is what I have:
for i in data:
    for j in i:
        print("{}\t".format(j))
Which gives me data in the form of
elem[0][0]
elem[1][0]
elem[2][0]
...
elem[0][1]
elem[1][1]
...
i.e. all at once. What I really want to do is access each element directly so I can output the list-of-lists data to a file with the elements separated by tabs, not commas.
Here's my bastardised Python code for outputting the array to a file:
k = 0
with open("Output.txt", "w") as text_file:
    for j in data:
        print("{}".format(data[k]), file=text_file)
        k += 1
So basically, I have a list of lists which I want to save to a file in tab-delimited/separated format, but currently it comes out comma-separated. My approach would involve iterating through the lists again, element by element, and saving the output by forcing in the tabs.
Here's data excerpts (though changed to meaningless values)
data
['a', 'a', 306518, ' 1111111', 'a', '-', .... ]
['a', 'a', 306518, ' 1111111', 'a', '-', .... ]
....
text_file
a a 306518 1111111 a -....
a a 306518 1111111 a -....
....
for i in data:
    print("\t".join(i))
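One caveat: str.join only accepts strings, and the sample data contains a number (306518), so you may need to convert each element first; a sketch:

```python
data = [['a', 'a', 306518, ' 1111111', 'a', '-']]

for i in data:
    # convert every element to str before joining, or join() raises TypeError
    print("\t".join(str(x) for x in i))
```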
if data is something like this '[[1,2,3],[2,3,4]]'
for j in data:
    text_file.write('%s\n' % '\t'.join(str(x) for x in j))
I think this should work:
with open(somefile, 'w') as your_file:
    for values in data:
        print("\t".join(values), file=your_file)
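Alternatively, the standard csv module can produce tab-separated output directly; a sketch assuming the same data layout (it also converts non-string cells for you):

```python
import csv

# hypothetical rows matching the excerpt above
data = [['a', 'a', 306518, ' 1111111', 'a', '-'],
        ['a', 'a', 306518, ' 1111111', 'a', '-']]

with open("Output.txt", "w", newline="") as text_file:
    writer = csv.writer(text_file, delimiter='\t')
    writer.writerows(data)  # each cell is converted with str() as needed
```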

How do I format the output of a list of lists into a text file properly?

I am really new to python and now I am struggling with some problems while working on a student project. Basically I try to read data from a text file which is formatted in columns. I store the data in a list of lists, sort and manipulate the data, and write it into a file again. My problem is aligning the written data in proper columns. I found some approaches like
"%i, %f, %e" % (1000, 1000, 1000)
but I don't know how many columns there will be. So I wonder if there is a way to set all columns to a fixed width.
This is how the input data looks like:
2 232.248E-09 74.6825 2.5 5.00008 499.482
5 10. 74.6825 2.5 -16.4304 -12.3
This is how I store the data in a list of list:
filename = getInput('MyPath', workdir)
lines = []
f = open(filename, 'r')
while 1:
    line = f.readline()
    if line == '':
        break
    splitted = line.split()
    lines.append(splitted)
f.close()
To write the data, I first put all the row elements of the list of lists into one string with a fixed free space between the elements. But instead I need a fixed total space including the element. Also, I don't know the number of columns in the file.
for k in xrange(len(lines)):
    stringlist = ""
    for i in lines[k]:
        stringlist = stringlist + str(i) + ' '
    lines[k] = stringlist + '\n'

f = open(workdir2, 'w')
for i in range(len(lines)):
    f.write(lines[i])
f.close()
This code works basically, but sadly the output isn't formatted properly.
Thank you very much in advance for any help on this issue!
You are absolutely right about being able to format widths as you have above using string formatting. But as you correctly point out, the tricky bit is doing this for a variable-sized output list. Instead, you could use the join() function:
output = ['a', 'b', 'c', 'd', 'e']
# format each column (len(output) of them) with a width of 10 spaces
width = [10] * len(output)

# write it out, using the join() function
with open('output_example', 'w') as f:
    f.write(''.join('%*s' % i for i in zip(width, output)))
will write out:
'         a         b         c         d         e'
As you can see, the length of the format array width is determined by the length of the output, len(output). This is flexible enough that you can generate it on the fly.
Hope this helps!
String formatting might be the way to go:
>>> print("%10s%9s" % ("test1", "test2"))
     test1    test2
Though you might want to first create strings from those numbers and then format them as I showed above.
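With str.format (or an f-string) the field width can itself be a variable, which helps when the number of columns isn't known in advance; a sketch with a made-up row and an assumed width of 14:

```python
row = ['2', '232.248E-09', '74.6825', '2.5']

width = 14  # assumed: wide enough for the longest cell
# {:>{w}} right-aligns each cell in a field whose width is the variable w
line = ''.join('{:>{w}}'.format(cell, w=width) for cell in row)
print(line)
```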
I cannot fully comprehend your writing code, but try working on it somehow like that:
with open(workdir2, 'w') as datei:
    for key, item in enumerate(zeilen):  # enumerate is built in, no import needed
        line = "%4i %s\n" % (key, item)
        datei.write(line)
