I have a list of data of the form:
[line1,a]
[line2,c]
[line3,b]
I want to apply a mapping of a=5, c=15, b=10 to get:
[line1,5]
[line2,15]
[line3,10]
I am trying to use the following code, which I know is incorrect. Can someone guide me on how best to achieve this?
mapping = {"a": 5, "b": 10, "c": 15}
applyMap = [line[1] = 'a' for line in data]
Thanks
EDIT:
Just to clarify, this is the transformation for one line; I want the mapping applied to every line in the file:
Input: ["line1","a"]
Output: ["line1",5]
You could try a list comprehension.
lines = [
["line1", "much_more_items1", "a"],
["line2", "much_more_items2", "c"],
["line3", "much_more_items3", "b"],
]
mapping = {"a": 5, "b": 10, "c": 15}
# here I assume the key you need to map is in the last position of each item
result = [line[0:-1] + [mapping[line[-1]]] for line in lines]
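For the sample lines above, result would be:
[['line1', 'much_more_items1', 5], ['line2', 'much_more_items2', 15], ['line3', 'much_more_items3', 10]]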
Try something like this:
data = [
['line1', 'a'],
['line2', 'c'],
['line3', 'b'],
]
mapping = {"a": 5, "b": 10, "c": 15}
applyMap = [[line[0], mapping[line[1]]] for line in data]
print(applyMap)
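Output:
[['line1', 5], ['line2', 15], ['line3', 10]]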
>>> data = [["line1", "a"], ["line2", "b"], ["line3", "c"]]
>>> mapping = { "a": 5, "b": 10, "c": 15}
>>> [[line[0], mapping[line[1]]] for line in data]
[['line1', 5], ['line2', 10], ['line3', 15]]
lineMap = {'line1': 'a', 'line2': 'b', 'line3': 'c'}
cha2num = {'a': 5, 'b': 10, 'c': 15}
result = [[key,cha2num[lineMap[key]]] for key in lineMap]
print(result)
What you need is a mapping from each letter to its value, e.g. 'a' -> 5.
I am looking to combine two dictionaries by grouping elements that share common keys, but I would also like to account for keys that are not shared between the two dictionaries. For instance, given the following two dictionaries:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
The intended code would output
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[0,44] ,'e':[5,0]}
Or some array like:
df = [[a,1,11] , [b,2,22] , [c,3,33] , [d,0,44] , [e,5,0]]
The fact that I used 0 specifically to denote a missing entry is not important per se; any placeholder for the missing value will do.
I have tried using the following code:
from collections import defaultdict

df = defaultdict(list)
for d in (d1, d2):
    for key, value in d.items():
        df[key].append(value)
But I get the following result:
df = {'a':[1,11] ,'b':[2,22] ,'c':[3,33] ,'d':[44] ,'e':[5]}
This does not tell me which dict was missing the entry. I could go back and look through both of them, but I was looking for a more elegant solution.
You can use a dict comprehension like so:
d1 = {'a':1, 'b':2, 'c': 3, 'e':5}
d2 = {'a':11, 'b':22, 'c': 33, 'd':44}
res = {k: [d1.get(k, 0), d2.get(k, 0)] for k in set(d1).union(d2)}
print(res)
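Output (key order may vary, since sets are unordered):
{'a': [1, 11], 'b': [2, 22], 'c': [3, 33], 'd': [0, 44], 'e': [5, 0]}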
Another solution:
d1 = {"a": 1, "b": 2, "c": 3, "e": 5}
d2 = {"a": 11, "b": 22, "c": 33, "d": 44}
df = [[k, d1.get(k, 0), d2.get(k, 0)] for k in sorted(d1.keys() | d2.keys())]
print(df)
Prints:
[['a', 1, 11], ['b', 2, 22], ['c', 3, 33], ['d', 0, 44], ['e', 5, 0]]
If you do not want sorted results, leave the sorted() out.
I want to convert a pandas DataFrame to a key-value dictionary by combining the index and column name as the key. Is there an easy way to do it?
Before:
        T1   T2
apple    5    1
pear     2  1.5
banana  10   12
After:
{'apple_T1': 5,
'apple_T2': 1,
'pear_T1': 2,
...
'banana_T2':12
}
Thanks a lot!
In one step (assuming df is the DataFrame shown above):
{f"{row}_{k}": v for row, data in df.iterrows() for k, v in data.items()}
Use DataFrame.to_dict followed by a dictionary comprehension:
import pandas as pd
data = [[5, 1], [2, 1.5], [10, 12]]
df = pd.DataFrame(data=data, columns=["T1", "T2"], index=["apple", "pear", "banana"])
result = {f"{kout}_{kin}": value for kout, d in df.to_dict("index").items() for kin, value in d.items()}
print(result)
Output
{'apple_T1': 5, 'apple_T2': 1.0, 'pear_T1': 2, 'pear_T2': 1.5, 'banana_T1': 10, 'banana_T2': 12.0}
You can use df.to_dict(orient='index')
Set up your DataFrame:
df = pd.DataFrame(
columns=["T1", "T2"],
data=[[5,1], [2, 1.5], [10, 12]],
index=["apple", "pear", "banana"]
)
Apply the .to_dict method:
d = df.to_dict(orient='index')
result = {f"{k1}_{k2}": v for k1 in d for k2, v in d[k1].items()}
Result:
{'apple_T1': 5,
'apple_T2': 1,
'banana_T1': 10,
'banana_T2': 12,
'pear_T1': 2,
'pear_T2': 1.5}
A direct way is to loop over the index and columns and define the dictionary items one by one:
df = pd.DataFrame([[5, 1], [2, 1.5], [10, 12]],
columns=['T1', 'T2'],
index=['apple', 'pear', 'banana'])
new_dict = {}
for i in df.index:
    for j in df.columns:
        new_dict[i + '_' + j] = df.loc[i, j]
print(new_dict)
Output is
{'apple_T1': 5,
'apple_T2': 1.0,
'pear_T1': 2,
'pear_T2': 1.5,
'banana_T1': 10,
'banana_T2': 12.0}
You can use df.iterrows(), but you need to be careful to get what you want:
>>> {f'{row}_{k}': v for row, col in df.iterrows() for k, v in col.items()}
{'apple_T1': 5,
'apple_T2': 1.0,
'pear_T1': 2,
'pear_T2': 1.5,
'banana_T1': 10,
'banana_T2': 12.0}
I have JSON like this:
json_text = """{
    "b": 22,
    "x": 12,
    "a": 2,
    "c": 4
}"""
When I generate an Excel file from this JSON like this:
import pandas as pd
df = pd.read_json(json_text)
file_name = 'test.xls'
file_path = "/tmp/" + file_name
df.to_excel(file_path, index=False)
print("path to excel " + file_path)
Pandas reorders the keys in the Excel file, like this:
pandas_json = {
"a": 2,
"b": 22,
"c": 4,
"x": 12
}
I don't want this. I need the ordering that exists in the JSON. Please give me some advice on how to do this.
UPDATE:
If I have JSON like this:
json = [
{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},
{"b": 22, "x":12, "a": 4, "c": 4},
]
pandas will generate its own ordering like this:
pandas_json = [
{"a": 2, "b":22, "c": 4, "x": 12},
{"a": 2, "b":22, "c": 2, "x": 12},
{"a": 4, "b":22, "c": 4, "x": 12},
]
How can I make pandas preserve my own ordering?
You can read the JSON as an OrderedDict, which will retain the original order:
import json
from collections import OrderedDict

import pandas as pd
json_ = """{
"b": 22,
"x": 12,
"a": 2,
"c": 4
}"""
data = json.loads(json_, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data,orient='index')
    0
b  22
x  12
a   2
c   4
Edit: the updated JSON also works:
j="""[{"b": 22, "x":12, "a": 2, "c": 4},
{"b": 22, "x":12, "a": 2, "c": 2},{"b": 22, "x":12, "a": 4, "c": 4}]"""
data = json.loads(j, object_pairs_hook=OrderedDict)
pd.DataFrame.from_dict(data).to_json(orient='records')
'[{"b":22,"x":12,"a":2,"c":4},{"b":22,"x":12,"a":2,"c":2},
{"b":22,"x":12,"a":4,"c":4}]'
Is there a method of logically merging multiple dictionaries when they have common strings between them, even if a common string appears as a value in one dict and as a key in another?
I see a lot of similar questions on SO, but none that seem to address my specific issue of relating keys in "lower-level" dicts to keys/values in higher-level ones (level1dict).
Say we have:
level1dict = { '1':[1,3], '2':2 }
level2dict = { '1':4, '3':[5,9], '2':10 }
level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
finaldict = level1dict
When I say logically, I mean: in level1dict, 1 = 1,3; in level2dict, 1 = 4 and 3 = 5,9; so overall (so far) 1 = 1,3,4,5,9 (sorting not important).
The result I would like to get is:
#.update or .append or .default?
finaldict = {'1':[1,3,4,5,9,6,8,11,12,14,15,16,17], '2':[2,10,18,19,20]}
Answered: Thank you Ashwini Chaudhary and Abhijit for the networkx module.
This is a problem of finding connected component subgraphs, and it is best solved with networkx. Here is a solution to your problem:
>>> import networkx as nx
>>> level1dict = { '1':[1,3], '2':2 }
>>> level2dict = { '1':4, '3':[5,9], '2':10 }
>>> level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
>>> G=nx.Graph()
>>> for lvl in (level1dict, level2dict, level3dict):
...     for key, value in lvl.items():
...         key = int(key)
...         try:
...             for node in value:
...                 G.add_edge(key, node)
...         except TypeError:
...             G.add_edge(key, value)
...
>>> for sg in nx.connected_component_subgraphs(G):
...     print sg.nodes()
...
[1, 3, 4, 5, 6, 8, 9, 11, 12, 14, 15, 16, 17]
[2, 10, 13, 18, 19, 20]
>>>
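Note: connected_component_subgraphs was removed in networkx 2.4; on newer versions you can use (G.subgraph(c) for c in nx.connected_components(G)) instead.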
Here is how you visualize it
>>> import matplotlib.pyplot as plt
>>> nx.draw(G)
>>> plt.show()
A couple of notes:
It's not convenient that some values are numbers and some are lists. Try converting numbers to 1-item lists first.
If the order is not important, you'll be better off using sets instead of lists. They have methods for all sorts of "logical" operations.
Then you can do:
In [1]: dict1 = {'1': {1, 3}, '2': {2}}
In [2]: dict2 = {'1': {4}, '2': {10}, '3': {5, 9}}
In [3]: dict3 = {'1': {6, 8, 11}, '2': {13}, '4': {12}}
In [4]: {k: set.union(*(d[k] for d in (dict1, dict2, dict3)))
for k in set.intersection(*(set(d.keys()) for d in (dict1, dict2, dict3)))}
Out[4]: {'1': set([1, 3, 4, 6, 8, 11]), '2': set([2, 10, 13])}
In [106]: level1dict = { '1':[1,3], '2':2 }
In [107]: level2dict = { '1':4, '3':[5,9], '2':10 }
In [108]: level3dict = { '1':[6,8,11], '4':12, '2':13, '3':[14,15], '5':16, '9':17, '10':[18,19,20]}
In [109]: keys = set(level2dict) & set(level1dict) & set(level3dict)  # {'1', '2'}
In [110]: dic={}
In [111]: for key in keys:
   .....:     dic[key] = []
   .....:     for x in (level1dict, level2dict, level3dict):
   .....:         if isinstance(x[key], int):
   .....:             dic[key].append(x[key])
   .....:         elif isinstance(x[key], list):
   .....:             dic[key].extend(x[key])
   .....:
In [112]: dic
Out[112]: {'1': [1, 3, 4, 6, 8, 11], '2': [2, 10, 13]}
# now iterate over `dic` again to get the values related to the items present
# in the keys `'1'` and `'2'`.
In [122]: for x in dic:
   .....:     for y in dic[x]:
   .....:         for z in (level1dict, level2dict, level3dict):
   .....:             if str(y) in z and str(y) not in dic:
   .....:                 if isinstance(z[str(y)], (int, str)):
   .....:                     dic[x].append(z[str(y)])
   .....:                 elif isinstance(z[str(y)], list):
   .....:                     dic[x].extend(z[str(y)])
   .....:
In [123]: dic
Out[123]:
{'1': [1, 3, 4, 6, 8, 11, 5, 9, 14, 15, 12, 16, 17],
'2': [2, 10, 13, 18, 19, 20]}
Is there a way to see how many items in a dictionary share the same value in Python?
Let's say that I have a dictionary like:
{"a": 600, "b": 75, "c": 75, "d": 90}
I'd like to get a resulting dictionary like:
{600: 1, 75: 2, 90: 1}
My first naive attempt would be to use a nested for loop, iterating over the dictionary again for each value. Is there a better way to do this?
You could use itertools.groupby for this.
import itertools
x = {"a": 600, "b": 75, "c": 75, "d": 90}
[(k, len(list(v))) for k, v in itertools.groupby(sorted(x.values()))]
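With the example above this yields a list of (value, count) pairs; pass them to dict() if you want a dictionary:
>>> dict((k, len(list(v))) for k, v in itertools.groupby(sorted(x.values())))
{75: 2, 90: 1, 600: 1}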
When Python 2.7 comes out you can use its collections.Counter class; otherwise see the Counter recipe.
Under Python 2.7a3
from collections import Counter
items = {"a": 600, "b": 75, "c": 75, "d": 90}
c = Counter(items.values())
print( dict( c.items() ) )
output is
{600: 1, 90: 1, 75: 2}
>>> a = {"a": 600, "b": 75, "c": 75, "d": 90}
>>> b = {}
>>> for k,v in a.iteritems():
... b[v] = b.get(v,0) + 1
...
>>> b
{600: 1, 90: 1, 75: 2}
>>>
Use Counter (2.7+; recipes exist for older versions) along with dict.values().
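For example, a minimal sketch of that approach:
>>> from collections import Counter
>>> a = {"a": 600, "b": 75, "c": 75, "d": 90}
>>> dict(Counter(a.values()))
{600: 1, 75: 2, 90: 1}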
>>> a = {"a": 600, "b": 75, "c": 75, "d": 90}
>>> d={}
>>> for v in a.values():
... if v not in d: d[v] = 1
... else: d[v] += 1
...
>>> d
{600: 1, 90: 1, 75: 2}