I'm constructing a dictionary in Python from many elements, some of which are NaNs, and I don't want to add them to the dictionary at all (because I'll be inserting it into a database and I don't want fields that don't make sense).
At the moment I'm doing something like this:
data = pd.read_csv("data.csv")
for i in range(len(data)):
    mydict = OrderedDict([("type", "mydata"), ("field2", data.iloc[i, 2]), ("field5", data.iloc[i, 5])])
    if not math.isnan(data.iloc[i, 3]):
        mydict['field3'] = data.iloc[i, 3]
    if not math.isnan(data.iloc[i, 4]):
        mydict['field4'] = data.iloc[i, 4]
    if not math.isnan(data.iloc[i, 8]):
        mydict['field8'] = data.iloc[i, 8]
etc....
Can it be done in a flatter structure, i.e., defining an array of field names and field numbers I'd like to conditionally insert?
>>> fields = [float('nan'),2,3,float('nan'),5]
>>> {"field%d"%i:v for i,v in enumerate(fields) if not math.isnan(v)}
{'field2': 3, 'field1': 2, 'field4': 5}
Or an ordered dict:
>>> OrderedDict(("field%d"%i,v) for i,v in enumerate(fields) if not math.isnan(v))
OrderedDict([('field1', 2), ('field2', 3), ('field4', 5)])
Is this what you were looking for?
data = pd.read_csv("data.csv")
for i in range(len(data)):
    mydict = OrderedDict([("type", "mydata"), ("field2", data.iloc[i, 2]), ("field5", data.iloc[i, 5])])
    # column numbers to insert only when they are not NaN
    fields = [3, 4, 8]
    for f in fields:
        if not math.isnan(data.iloc[i, f]):
            mydict['field' + str(f)] = data.iloc[i, f]
conditional_fields = ((3, 'field3'), (4, 'field4'), (8, 'field8'))
for i in range(len(data)):
    mydict = OrderedDict([("type", "mydata"), ("field2", data.iloc[i, 2]), ("field5", data.iloc[i, 5])])
    for (index, fieldname) in conditional_fields:
        if not math.isnan(data.iloc[i, index]):
            mydict[fieldname] = data.iloc[i, index]
I am assuming the actual field names are not literally 'field8' etc.
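If the real column-to-field mapping is known, the whole loop can be driven from a single dict. A minimal sketch, assuming placeholder field names and using pd.notna (which also treats None as missing); note that here every field except "type" is skipped when its value is missing:
import pandas as pd
from collections import OrderedDict

data = pd.read_csv("data.csv")

# column index -> field name; these names are placeholders
field_names = {2: "field2", 3: "field3", 4: "field4", 5: "field5", 8: "field8"}

records = []
for i in range(len(data)):
    mydict = OrderedDict([("type", "mydata")])
    for col, name in field_names.items():
        value = data.iloc[i, col]
        if pd.notna(value):  # skip NaN (and None) values entirely
            mydict[name] = value
    records.append(mydict)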
I want to store key-value pairs, but I don't know how to do it.
What I want to achieve is a variable that would store the different value pairs.
What I would want is something like this:
dic = {}
valuepair = (2,3), "cell1"
Each value pair is unique and I would want to be able to do something like this:
dic[(2,3)] = "cell1"
dic["cell1"] = (2,3)
Is there a way to achieve something like that for many different unique value pairs?
If you're asking whether you can use a tuple as a key: yes, for example:
dic[(2,3)] = "cell1"
print(dic[(2,3)])
would show cell1
or create an inverse dict like this:
inverse_dic = {value: key for key, value in dic.items()}
A key-value pair means a key mapped to a value. What you are doing is right, but if you have the key, you can get the value from it, so you need not store the value ("cell1") again as a key when it is already a value. Sorry if I misunderstood your question. Or you can do this too:
x = [("k1","v1"),("k2,"v2")]
d = dict(x)
print(d)
OUTPUT: {'k1': 'v1', 'k2': 'v2'}
You can always do that, though why you would need it is another question.
valuepairs = [[(2,3), "cell1"], [(4,5), "cell2"]]
dic = {}
for x, y in valuepairs:
dic[x] = y
dic[y] = x
print(dic)
# {(2, 3): 'cell1', 'cell1': (2, 3), (4, 5): 'cell2', 'cell2': (4, 5)}
I wanted to learn how to use dictionary comprehension and decided to use one for the previously solved task. I need to assign multiple values to the same key. I was wondering if there's a better way to achieve what I'm trying to do than with the code I've written so far.
graph = {(x1, y1): [(c, d) for a, b, c, d in data if a == x1 and b == y1] for x1, y1, x2, y2 in data}
For example I have this:
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
The first two values should create a key and the remaining two should be added as a value of a key.
With the given example I would like to return:
{(1, 2): [(1, 5), (7, 2)], (1, 5): [(4, 7)], (4, 7): [(7, 5)]}
Is there an easier way to do it than iterate through the entire data just to find the matching values?
Using this dict comprehension isn’t an efficient way here. It loops over the same input data repeatedly.
It's more Pythonic to just use a simple for loop, iterating the data only once:
from collections import defaultdict
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
output = defaultdict(list)
for a, b, c, d in data:
    output[a, b].append((c, d))
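To get a plain dict for the final result, the defaultdict can be converted at the end; for the sample data this produces the expected grouping (key order and the order within each list may vary, since data is a set):
result = dict(output)
print(result)
# e.g. {(1, 2): [(1, 5), (7, 2)], (1, 5): [(4, 7)], (4, 7): [(7, 5)]}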
Your code is neat but the time complexity is O(n^2), which can be reduced to O(n).
data = {(1,2,1,5),(1,2,7,2),(1,5,4,7),(4,7,7,5)}
result = dict()
for item in data:
    key = (item[0], item[1])
    value = result.setdefault(key, [])
    value.append((item[2], item[3]))
    result[key] = value
print result
In my opinion, using a for loop makes the code more comprehensible.
I don't know if it is the best answer, but I would do something like this:
m_dict = {}
for val in data:
    key = (val[0], val[1])
    if key in m_dict:
        m_dict[key].append((val[2], val[3]))
    else:
        m_dict[key] = [(val[2], val[3])]
Or more concisely using setdefault:
m_dict = {}
for val in data:
    key = (val[0], val[1])
    obj = m_dict.setdefault(key, [])
    obj.append((val[2], val[3]))
In this instance, I would use itertools.groupby. For your example:
from itertools import groupby
dict(groupby(data, lambda t: (t[0], t[1])))
This will produce a dict with the keys (1, 2), (1, 5), and (4, 7). Note that the values are groupby's group iterators, which yield the full tuples such as (1, 2, 1, 5) and (1, 2, 7, 2), and which must be materialized (e.g. with list) before the outer groupby iterator advances. You can post-process the grouped lists, if need be.
Note that groupby requires sorted data. As such, you will need to sort before grouping and will probably want to convert each group iterator to a list:
first_two = lambda tup: (tup[0], tup[1])
groups = groupby(sorted(data, key=first_two), first_two)
target = {k: list(g) for k, g in groups}
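If the exact target shape is needed (values holding only the last two elements of each tuple), the grouped tuples can be trimmed while building the dict; a small sketch along the same lines, sorting the full tuples so the result is deterministic:
from itertools import groupby

data = {(1, 2, 1, 5), (1, 2, 7, 2), (1, 5, 4, 7), (4, 7, 7, 5)}
groups = groupby(sorted(data), key=lambda t: t[:2])
target = {k: [t[2:] for t in g] for k, g in groups}
print(target)  # {(1, 2): [(1, 5), (7, 2)], (1, 5): [(4, 7)], (4, 7): [(7, 5)]}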
I have two lists of tuples:
t1 = [ ('a',3,4), ('b',3,4), ('c',4,5) ]
t2 = [ ('a',4,6), ('c',3,4), ('b',3,6), ('d',4,5) ]
Such that:
the order of the tuples may not be the same, and
the lists may not contain the same number of tuples.
My goal is to compare the two lists such that if the string elements match, I compare the last integer element of each tuple and return a list containing -1 if t1's value is less than t2's, 0 if they are equal, and 1 if it is greater.
I've tried different variations, but the problem I have is finding a way to match the strings so I can do a proper comparison.
return [diff_unique(x[2],y[2]) for x,y in zip(new_list,old_list) ]
Where diff_unique does the aforementioned comparison of the integers, and new_list is t1 and old_list is t2.
I've also tried this:
return [diff_unique(x[2], y[2]) for x, y in zip(new_list, old_list) if x[0] == y[0]]
What I intend to do is use the returned list and create a new list of four-tuples with the original t1 values along with the difference from the matching t2 tuple, i.e.:
inc_dec_list = compare_list(new,old)
final_list = [ (f,r,u,chge) for (f,r,u), chge in zip(new,inc_dec_list)]
Where new = t1 and old = t2. This may have been an important detail, sorry I missed it.
Any help in the right direction?
Edit: I have added my test case program that mimics my original intent, for those who want to help. Thank you all.
import os
import sys

old = [('a',10,1),('b',10,2),('c',100,4),('d',200,4),('f',45,2)]
new = [('a',10,2),('c',10,2),('b',100,2),('d',200,6),('e',233,4),('g',45,66)]

def diff_unique(a, b):
    print "a:{} = b:{}".format(a, b)
    if a < b:
        return -1
    elif a == b:
        return 0
    else:
        return 1

def compare_list(new_list, old_list):
    a = {t[0]: t[1:] for t in new_list}
    b = {t[0]: t[1:] for t in old_list}
    common = list(set(a.keys()) & set(b.keys()))
    return [diff_unique(a[key][1], b[key][1]) for key in common]

#get common tuples
#common = [x for x,y in zip(new_list,old_list) if x[0] == y[0] ]
#compare common to old list
#return [diff_unique(x[2],y[2]) for x,y in zip(new_list,old_list) ]

inc_dec_list = compare_list(new, old)
print inc_dec_list

final_list = [(f, r, u, chge) for (f, r, u), chge in zip(new, inc_dec_list)]
print final_list
To match the tuples by string from different lists, you can use dict comprehension (order inside the tuples is preserved):
a = {t[0]:t[1:] for t in t1} # {'a': (3, 4), 'c': (4, 5), 'b': (3, 4)}
b = {t[0]:t[1:] for t in t2} # {'a': (4, 6), 'c': (3, 4), 'b': (3, 6), 'd': (4, 5)}
Then you can iterate over the keys of both dictionaries and do the comparison. Assuming you only want to do the comparison for keys/tuples present in t1 and t2, you can join the keys using sets:
common_keys = list(set(a.keys())&set(b.keys()))
And finally compare the dictionary's items and create the list you want like this:
return [diff_unique(a[key][1],b[key][1]) for key in common_keys ]
If you need the output in the order of the alphabetically sorted characters, use the sorted function on the keys:
return [diff_unique(a[key][1],b[key][1]) for key in sorted(common_keys) ]
If you want all keys to be considered, you can do the following:
all_keys = list(set(a.keys()) | set(b.keys()))
l = list()
for key in sorted(all_keys):
    try:
        l.append(diff_unique(a[key][1], b[key][1]))
    except KeyError:
        l.append("whatever you want")
return l
With the new information about what values should be returned in what order, the solution would be this:
ordered_keys = [t[0] for t in t1]
a = {t[0]:t[1:] for t in t1} # {'a': (3, 4), 'c': (4, 5), 'b': (3, 4)}
b = {t[0]:t[1:] for t in t2} # {'a': (4, 6), 'c': (3, 4), 'b': (3, 6), 'd': (4, 5)}
l = list()
for key in ordered_keys:
    try:
        l.append(diff_unique(a[key][1], b[key][1]))
    except KeyError:
        l.append(0)  # default value
return l
First, build a default dictionary from each list, with the default value for a nonexistent key being a tuple whose last element is the smallest possible value for a comparison.
SMALL = (-float('inf'),)
from collections import defaultdict
d1 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t1])
d2 = defaultdict(lambda: SMALL, [(t[0], t[1:]) for t in t2])
Next, iterate over the combined keys of both dictionaries (the combined key set is easily created with itertools.chain). You probably want to sort the keys for the resulting list to have any meaning (otherwise, how do you know which keys produced which of -1/0/1?):
from itertools import chain
all_keys = set(chain(d1, d2))
result = [cmp(d1[k][-1], d2[k][-1]) for k in sorted(all_keys)]
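Note that cmp() only exists in Python 2; on Python 3 the same three-way comparison can be written with a small helper (a sketch of one common replacement, not part of the original answer):
def cmp(a, b):
    # three-way comparison: -1 if a < b, 0 if a == b, 1 if a > b
    return (a > b) - (a < b)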
Here is a simple solution to your problem. It is not a one-liner as you tried, but I hope it still helps:
result = []
for a in t1:
    for b in t2:
        if a[0] != b[0]:
            continue
        result.append(cmp(a[-1], b[-1]))
In Python 3.x, you can compare two lists of tuples a and b thus:
import operator
a = [(1,2),(3,4)]
b = [(3,4),(1,2)]
# convert both lists to sets before calling the eq function
print(operator.eq(set(a),set(b))) #True
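Keep in mind that converting to sets discards order and duplicates, so this only checks that the two lists contain the same distinct tuples; a quick illustration:
import operator

a = [(1, 2), (1, 2)]
b = [(1, 2)]
print(operator.eq(set(a), set(b)))  # True, even though the lists differ in length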
Assume you have a data set, something like a CSV file, that contains mildly sensitive information, like who passed a note to whom in a 12th-grade English class. While it's not a crisis if this data got out, it would be nice to strip out the identifying information so the data could be made public, shared with collaborators, etc. The data looks something like this:
Giver, Recipient:
Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe
How would you process through this list, assign each name a unique but arbitrary identifier, then strip out the names and replace them with said identifier in Python such that you end up with something like:
1,2
1,3
3,4
4,2
You can use hash() to generate an arbitrary identifier; it will always return the same integer for a particular string within a single run of the interpreter:
with open("data1.txt") as f:
lis=[x.split(",") for x in f]
items=[map(lambda y:hash(y.strip()),x) for x in lis]
for x in items:
print ",".join(map(str,x))
-1319295970,1155173045
-1319295970,-1963774321
-1963774321,-1499251772
-1499251772,1155173045
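Note that in Python 3, string hashes are randomized per interpreter process, so the numbers above would change between runs. If a run-stable identifier is needed, a hashlib digest is one option (a sketch, not part of the original answer):
import hashlib

def stable_id(name):
    # deterministic across runs and machines, unlike hash() on Python 3
    return hashlib.sha1(name.strip().encode("utf-8")).hexdigest()[:8]

print(stable_id("Anna"))  # prints the same 8-character value every run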
or you can also use itertools.count (together with itertools.chain; both are imported earlier in this session):
In [80]: c=count(1)
In [81]: with open("data1.txt") as f:
    lis = [map(str.strip, x.split(",")) for x in f]
    dic = {}
    for x in set(chain(*lis)):
        dic.setdefault(x.strip(), next(c))
    for x in lis:
        print ",".join(str(dic[y.strip()]) for y in x)
....:
3,2
3,4
4,1
1,2
Or, improving my previous answer with the unique_everseen recipe from the itertools documentation, you can get the exact expected output:
In [84]: c=count(1)
In [85]: def unique_everseen(iterable, key=None):
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
....:
In [86]: with open("data1.txt") as f:
    lis = [map(str.strip, x.split(",")) for x in f]
    dic = {}
    for x in unique_everseen(chain(*lis)):
        dic.setdefault(x.strip(), next(c))
    for x in lis:
        print ",".join(str(dic[y.strip()]) for y in x)
....:
1,2
1,3
3,4
4,2
names = """
Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe
"""
nameset = set((",".join(names.strip().splitlines())).split(","))
for i, name in enumerate(nameset):
    names = names.replace(name, str(i))
print names
2,1
2,3
3,0
0,1
You could use hash to get a unique ID for each name, or you could use a dictionary mapping names to numbers (if you want the numbers to be as in your example):
data = [("Anna", "Joe"), ("Anna", "Mark"), ("Mark", "Mindy"), ("Mindy", "Joe")]
names = {}
def anon(name):
    if name not in names:
        names[name] = len(names) + 1
    return names[name]

result = []
for n1, n2 in data:
    result.append((anon(n1), anon(n2)))
print names
print result
Will give when run:
{'Mindy': 4, 'Joe': 2, 'Anna': 1, 'Mark': 3}
[(1, 2), (1, 3), (3, 4), (4, 2)]
First, read your file into a list of rows:
import csv
with open('myFile.csv') as f:
    rows = [row for row in csv.reader(f)]
At this point, you could build a dict to hold the mapping:
nameSet = set()
for row in rows:
    for name in row:
        nameSet.add(name)
mapping = dict((name, i) for i, name in enumerate(nameSet))
Alternatively, you could build the dict directly:
nextID = 0
mapping = {}
for row in rows:
    for name in row:
        if name not in mapping:
            mapping[name] = nextID
            nextID += 1
Either way, you go through the rows again and apply the mapping:
output = [[mapping[name] for name in row] for row in rows]
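If you also want to write the anonymized rows back out, csv.writer handles that; a small Python 3 sketch (the output filename is just an example):
import csv

with open('myFile_anonymized.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(output)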
To genuinely anonymize the data, you need random aliases for the names. Hashes are good for that, but if you just want to map each name to an integer, you could do something like this:
from random import shuffle
data = [("Anna", "Joe"), ("Anna", "Mark"), ("Mark", "Mindy"), ("Mindy", "Joe")]
names = list(set(x for pair in data for x in pair))
shuffle(names)
aliases = dict((k, v) for v, k in enumerate(names))
munged = [(aliases[a], aliases[b]) for a, b in data]
That'll give you something like:
>>> data
[('Anna', 'Joe'), ('Anna', 'Mark'), ('Mark', 'Mindy'), ('Mindy', 'Joe')]
>>> names
['Mindy', 'Joe', 'Anna', 'Mark']
>>> aliases
{'Mindy': 0, 'Joe': 1, 'Anna': 2, 'Mark': 3}
>>> munged
[(2, 1), (2, 3), (3, 0), (0, 1)]
You can then (if you need to) get the name from the alias, and vice versa:
>>> aliases["Joe"]
1
>>> names[2]
'Anna'
One way to manually persist a dictionary to a database is to flatten it into a sequence of sequences and pass the sequence as an argument to cursor.executemany().
The opposite is also useful, i.e. reading rows from a database and turning them into dictionaries for later use.
What's the best way to go from myseq to mydict and from mydict to myseq?
>>> myseq = ((0,1,2,3), (4,5,6,7), (8,9,10,11))
>>> mydict = {0: (1, 2, 3), 8: (9, 10, 11), 4: (5, 6, 7)}
mydict = dict((s[0], s[1:]) for s in myseq)
myseq = tuple(sorted((k,) + v for k, v in mydict.items()))
>>> mydict = dict((t[0], t[1:]) for t in myseq)
>>> myseq = tuple(((key,) + values) for (key, values) in mydict.items())
The ordering of tuples in myseq is not preserved, since dictionaries are unordered (prior to Python 3.7; from 3.7 on, dicts preserve insertion order).
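For context, a minimal sketch of the full round trip described in the question, using sqlite3 with an in-memory database and a hypothetical four-column table:
import sqlite3

myseq = ((0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11))

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c INTEGER)")

# dict -> sequence of sequences -> database
mydict = dict((s[0], s[1:]) for s in myseq)
rows = tuple(sorted((k,) + v for k, v in mydict.items()))
cur.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# database -> rows -> dict
cur.execute("SELECT id, a, b, c FROM t")
restored = dict((r[0], tuple(r[1:])) for r in cur.fetchall())
print(restored)  # {0: (1, 2, 3), 4: (5, 6, 7), 8: (9, 10, 11)}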