How to input data from Excel to Python dictionary? - python

My code:
import openpyxl
workbook = openpyxl.load_workbook('master.xlsx')
worksheet = workbook.worksheets[0]
result = {}
for k, v in zip(worksheet['A'], worksheet['B']):
    result[k.internal_value] = v.internal_value
print(result)
The output I get:
{'PPPPP': '22', 'bbbbb': '20', 'ccccc': '30', 'ddddd': '40', 'eeeee': '50'}
Excel file:
The output I want:
{'PPPPP': ['22','10'], 'bbbbb': ['20','30'], 'ccccc': ['30','30'], 'ddddd': '40', 'eeeee': '50'}

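You can do it using pandas:
import pandas as pd

# Read the first sheet; the file has no header row, so name the columns explicitly
df = pd.read_excel('master.xlsx', sheet_name=0, header=None, names=['A', 'B'])
result = {}
for x, y in zip(df['A'], df['B']):
    if x in result:
        # Promote a single value to a list on the first repeat, then append
        if not isinstance(result[x], list):
            result[x] = [result[x]]
        result[x].append(str(y))
    else:
        result[x] = str(y)
print(result)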
{'ppp': ['10', '22'], 'bbb': ['20', '30'], 'ccc': ['30', '30'], 'ddd': '40', 'eee': '50'}

Use a defaultdict, with an empty list as the default, and append each new value:
from collections import defaultdict
import openpyxl
workbook = openpyxl.load_workbook('master.xlsx')
worksheet = workbook.worksheets[0]
result = defaultdict(list)
for k, v in zip(worksheet['A'], worksheet['B']):
    result[k.internal_value].append(v.internal_value)
print(result)
Here EVERY value will be a list, even when a key has only one value, e.g. you will get 'ddddd': ['40']. In return, you can handle all key-value pairs consistently.
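If the mixed shape from the question is really needed (a bare string for single values, a list for repeats), the grouped result can be collapsed afterwards. This is only a minimal sketch: `rows` is a hypothetical stand-in for the (column A, column B) pairs read from the worksheet, and `collapse_singletons` is an illustrative helper name, not part of any library.

```python
from collections import defaultdict

# Hypothetical stand-in for the (column A, column B) pairs from the worksheet
rows = [("PPPPP", "22"), ("bbbbb", "20"), ("PPPPP", "10"),
        ("ddddd", "40"), ("bbbbb", "30")]

def collapse_singletons(pairs):
    """Group values by key, then unwrap lists that hold a single value."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {k: (v[0] if len(v) == 1 else v) for k, v in grouped.items()}

result = collapse_singletons(rows)
print(result)
```

Keeping every value as a list is usually easier to work with downstream, so the collapsing step is best left to the very end.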

Related

Create dictionary of dictionaries of lists in one line

Any way to create dictionary of dictionaries of lists in one line?
names = ['alex', 'ben', 'coby']
age = ['20', '30', '40', '50']
name_age = {n: {} for n in names}
for n in names:
    name_age[n] = {a: {} for a in age}
Nesting such as below does not work.
name_age = {n: {{a: {} for a in age}} for n in names}
A straightforward way of constructing a flat dictionary from two lists is to iterate over them together with zip. zip stops at the shorter list, so the extra age '50' is dropped:
names = ['alex', 'ben', 'coby']
age = ['20', '30', '40', '50']
name_age = dict(zip(names, age))
print(name_age)
{'alex': '20', 'ben': '30', 'coby': '40'}
names = ['alex', 'ben', 'coby']
ages = ['20', '30', '40', '50']
result = { names[index]:ages[index] for index in range(len(names)) }
print(result)
{'alex': '20', 'ben': '30', 'coby': '40'}
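For the nested structure the question actually asks about, one comprehension can be placed inside another. The attempt with doubled braces fails because `{{...}}` is a set literal containing a dict, and dicts are unhashable, so Python raises a TypeError. A minimal sketch using the question's own lists:

```python
names = ['alex', 'ben', 'coby']
age = ['20', '30', '40', '50']

# The outer comprehension builds one entry per name; the inner one is
# re-evaluated for each name, so the inner dicts are not shared.
name_age = {n: {a: {} for a in age} for n in names}

print(name_age['ben'])
```

If the innermost values should be lists rather than dicts, replace the inner `{}` with `[]`.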

Best way to merge two lists and combine double values?

What is the best way to merge two lists into one and also combine double values? For example:
list_01 = [['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-06'],
['10', '20', '30', '40']]
list_02 = [['2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'],
['10', '20', '30', '40']]
The final list should look like this:
list_03 = [['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'],
['10', '20', '40', '30', '70', '40']]
Whenever the dates have matched, the integer-values in the second column have been summed together.
Right now, my only real solution is to pass both lists through several loops, but I wonder if there might be a better solution.
Thanks and a great evening for all of you.
Your "integers" should really be ints, not strings, and your lists should probably be Counters, as you seem to be counting things per day. Then you can simply add them:
from collections import Counter
list_01 = [['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-06'],
['10', '20', '30', '40']]
list_02 = [['2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'],
['10', '20', '30', '40']]
def to_counter(lst):
    return Counter(dict(zip(lst[0], map(int, lst[1]))))
counter = to_counter(list_01) + to_counter(list_02)
for item in counter.items():
    print(item)
Prints:
('2020-01-02', 10)
('2020-01-03', 20)
('2020-01-04', 40)
('2020-01-06', 70)
('2020-01-05', 20)
('2020-01-07', 40)
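To get from the summed Counter back to the two-list layout of the question, the items can be sorted by date and unzipped. A minimal sketch reusing the same data; one thing to be aware of is that adding Counters with `+` drops any key whose total is zero or negative, which does not matter here but could for other data.

```python
from collections import Counter

list_01 = [['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-06'],
           ['10', '20', '30', '40']]
list_02 = [['2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'],
           ['10', '20', '30', '40']]

def to_counter(lst):
    return Counter(dict(zip(lst[0], map(int, lst[1]))))

counter = to_counter(list_01) + to_counter(list_02)

# ISO-formatted dates sort chronologically as plain strings
dates, totals = zip(*sorted(counter.items()))
list_03 = [list(dates), list(totals)]
print(list_03)
```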
Try this: build a dictionary from each list, then sum the values per date. Let me know if this isn't what you want or is confusing.
dict_01 = {list_01[0][i]:int(list_01[1][i]) for i in range(len(list_01[0]))}
dict_02 = {list_02[0][i]:int(list_02[1][i]) for i in range(len(list_02[0]))}
dates = list(set(list_01[0] + list_02[0]))
dates.sort()
list_03 = [dates, [dict_01.get(date, 0) + dict_02.get(date, 0) for date in dates]]
@Tomerikoo points out a more elegant way to form the dictionaries:
dict_01 = dict(zip(*list_01))
dict_02 = dict(zip(*list_02))
As @HeapOverflow points out, if you do this the values stay strings, so you should convert them when summing:
list_03 = [dates, [int(dict_01.get(date, 0)) + int(dict_02.get(date, 0)) for date in dates]]
This returns
[['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'], [10, 20, 40, 20, 70, 40]]
I think this is right: the value for 2020-01-05 should be 20, not the 30 shown in the question.
The best way to do it would probably be to use defaultdict, which provides a default value to each key, even if you've never introduced that key to the dictionary before. Then all you have to do is add whatever value belongs to that key (which is the date) from both lists. Then when you have this dictionary, just get the key-value pairs as items and unzip it into two lists.
from collections import defaultdict
mydict = defaultdict(int) # default values are 0
list_01 = [['2020-01-02', '2020-01-03', '2020-01-04', '2020-01-06'], ['10', '20', '30', '40']]
list_02 = [['2020-01-04', '2020-01-05', '2020-01-06', '2020-01-07'], ['10', '20', '30', '40']]
for t in [list_01, list_02]:
    for key, value in zip(t[0], t[1]):
        mydict[key] += int(value)
print(list(zip(*sorted(mydict.items()))))
This prints:
[('2020-01-02', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-06','2020-01-07'),
(10, 20, 40, 20, 70, 40)]

How to read 1st column from csv and separate into multidimensional array

I am trying to separate a column that I read from a .csv file into a multidimensional array. So, if the first column is read into a single array and looks like this:
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']
How do I write the code in Python so that, after each value like '90-0066', the numbers that follow are collected into an array until the next value containing a '-'? I would like the result to look like:
t = [['24', '33', '34'], ['22', '33'], ['23', '32']]
Thanks!
You can use itertools.groupby in a list comprehension:
from itertools import groupby
t = [list(g) for k, g in groupby(t, key=str.isdigit) if k]
t becomes:
[['24', '33', '34'], ['22', '33'], ['23', '32']]
If the numbers are possibly floating points, you can use regex instead:
import re
t = [list(g) for k, g in groupby(t, key=lambda s: bool(re.match(r'\d+(?:\.\d+)?$', s))) if k]
Or zip longest with two list comprehensions:
>>> from itertools import zip_longest
>>> l=[i for i,v in enumerate(t) if not v.isdigit()]
>>> [t[x+1:y] for x,y in zip_longest(l,l[1:])]
[['24', '33', '34'], ['22', '33'], ['23', '32']]
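If it is useful to remember which ID each group belongs to, a plain loop can collect the values into a dictionary instead. This is only a sketch, under the assumption that the ID entries are exactly the ones containing a '-'; the name `groups` is illustrative.

```python
t = ['90-0066', '24', '33', '34', '91-0495', '22', '33', '92-6676', '23', '32']

groups = {}
current = None
for item in t:
    if '-' in item:               # assumption: only IDs contain '-'
        current = item
        groups[current] = []      # start a new group for this ID
    elif current is not None:     # skip any values before the first ID
        groups[current].append(item)

print(groups)
print(list(groups.values()))
```

Note that a repeated ID would reset its group in this sketch; append to an existing list instead if IDs can recur.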

Python script to write list of dict with multiple values to multiple files

I have a list of dicts as shown below. I wish to write the dicts to multiple Excel or CSV files depending on the keys: dicts with the same key should go in one file.
my_list_of_dicts:
[{'john': ['0', '100']}, {'john': ['4', '101']}, {'john': ['0', '102']}, {'mary': ['2', '100']}, {'mary': ['5', '101']}, {'mary': ['4', '102']}, {'mary': ['1', '103']}, {'sam': ['4', '100']}, {'sam': ['3', '101']}, {'sam': ['12', '102']}, {'paul': ['2', '100']}, {'hay': ['2', '100']}, {'hay': ['1', '102']}, {'mercy': ['4', '101']}]
My code so far:
x = []
i = 0
for ii, line in enumerate(my_list_of_dicts):
    with open("out_%s.csv" % i, 'w+') as f:
        if line.keys() not in x:
            x.append(line.keys())
            i += 1
            pd.DataFrame.from_dict(data=line, orient='index').to_csv(f, header=False)
        else:
            pd.DataFrame.from_dict(data=line, orient='index').to_csv(f, header=False)
Result:
I am getting the desired number of files but not the content.
Expectation:
I expect to get files corresponding to each key, i.e. john, mary, sam, paul, hay, and mercy, with the below content. Using john as an example:
john, 0, 100
john, 4, 101
john, 0, 102
I am not sure how to proceed or if I even need enumerate. Thank you
A better idea is to aggregate your data into a single dataframe and then iterate a groupby object:
import pandas as pd

# construct dataframe from list of dictionaries
df = pd.DataFrame([[k, *v] for dct in my_list_of_dicts for k, v in dct.items()])
df[[1, 2]] = df[[1, 2]].apply(pd.to_numeric)
# iterate groupby object and export to separate CSV files
for key, df_key in df.groupby(0):
    df_key.to_csv(f'{key}.csv', index=False, header=False)
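The code in the question loses content because it reopens a file in 'w+' mode on every iteration, which truncates whatever was written before. The same "group first, write once per key" idea also works without pandas. Below is a minimal sketch using only the standard library; the shortened sample list and the temporary output directory `out_dir` are assumptions made for illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

my_list_of_dicts = [{'john': ['0', '100']}, {'john': ['4', '101']},
                    {'mary': ['2', '100']}, {'paul': ['2', '100']}]

# Collect all rows for each key first, so each file is opened only once
rows_by_key = defaultdict(list)
for dct in my_list_of_dicts:
    for key, values in dct.items():
        rows_by_key[key].append([key, *values])

out_dir = Path(tempfile.mkdtemp())  # hypothetical output location
for key, rows in rows_by_key.items():
    with open(out_dir / f'{key}.csv', 'w', newline='') as f:
        csv.writer(f).writerows(rows)

print((out_dir / 'john.csv').read_text())
```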

Create a list of dictionaries from two lists of tuples

I have a set of tuples:
users = {("test@a.com","password"),("test@b.com","password")}
(this could be simplified to a set of usernames) and a list of tuples:
licences = [("test@a.com","22"),("test@a.com","23"),("test@b.com","12")]
For every entry of the list, the username can be repeated with different "licence" values.
I need to build a list of dictionaries like this:
[{"user":"test@a.com", "licences":["22","23"]},{"user":"test@b.com", "licences":["12"]}]
What I've done so far is this:
licenzadiz = []
for num,user in enumerate(users):
    licenzadiz.append({'user': user[0], 'licences': []})
    for num2,licence in enumerate(licences):
        if user[0] == licence[0]:
            licenzadiz[num]['licences'].append(licence[1])
That works well, but I wonder if there is a more elegant solution to my problem.
You can get fancy with nested default dicts:
from collections import defaultdict
items = [('A','1'),('A','3'),('A','2'),
         ('B','0'),('B','4'),('B','-1'),
         ('C','7'),('C','6'),('C','12')]
d = defaultdict(lambda: defaultdict(list))
for use, lic in items:
    d[use]['username'] = use  # overwritten each time a known key is seen, but that's OK
    d[use]['licence'].append(lic)
# Just for printout
for use in d:
    print(d[use])
    print(d[use]['username'])
    print(d[use]['licence'])
Output (Python 3.7+):
defaultdict(<class 'list'>, {'username': 'A', 'licence': ['1', '3', '2']})
A
['1', '3', '2']
defaultdict(<class 'list'>, {'username': 'B', 'licence': ['0', '4', '-1']})
B
['0', '4', '-1']
defaultdict(<class 'list'>, {'username': 'C', 'licence': ['7', '6', '12']})
C
['7', '6', '12']
Alternatively, a plain dictionary with setdefault should work:
data = {}
for email, licence in licences:
    data.setdefault(email, []).append(licence)
print(data)  # dictionary of email: [licences, ...]
# or
print(list(data.items()))  # if you want a list of (email, licences) pairs
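Neither answer produces the exact list-of-dictionaries shape requested in the question. One way to get there is to group the licences by user once and then build the list in a comprehension. This is a sketch under the assumption that users without any licence should still appear with an empty list; `by_user` and `result` are illustrative names.

```python
users = {("test@a.com", "password"), ("test@b.com", "password")}
licences = [("test@a.com", "22"), ("test@a.com", "23"), ("test@b.com", "12")]

# Group licence values by user in a single pass
by_user = {}
for email, licence in licences:
    by_user.setdefault(email, []).append(licence)

# sorted() only makes the output order deterministic, since sets are unordered
result = [{"user": email, "licences": by_user.get(email, [])}
          for email, _password in sorted(users)]
print(result)
```

Compared with the nested loops in the question, this scans the licence list once instead of once per user.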
