Updating a dictionary with values and predefined keys - python

I want to create a dictionary that has predefined keys, like this:
dict = {'state':'', 'county': ''}
and read through and get values from a spreadsheet, like this:
for row in range(rowNum):
for col in range(colNum):
and update the values for the keys 'state' (sheet.cell_value(row, 1)) and 'county' (sheet.cell_value(row, 3)) like this:
dict[{}]
I am confused on how to get the state value with the state key and the county value with the county key. Any suggestions?
Desired outcome would look like this:
>>>print dict
[
{'state':'NC', 'county': 'Nash County'},
{'state':'VA', 'county': 'Albemarle County'},
{'state':'GA', 'county': 'Cook County'},....
]

I made a few assumptions regarding your question. You mentioned in the comments that State is at index 1 and County is at index 3; what is at index 2? I assumed that they occur sequentially. In addition to that, there needs to be a way in which you can map the headings to the data columns, hence I used a list to do that as it maintains order.
# A list containing the headings you are interested in, in the order in which you expect them in your spreadsheet
list_of_headings = ['state', 'county']
# Simulating your spreadsheet
spreadsheet = [['NC', 'Nash County'], ['VA', 'Albemarle County'], ['GA', 'Cook County']]
list_of_dictionaries = []
for i in range(len(spreadsheet)):
    dictionary = {}
    for j in range(len(spreadsheet[i])):
        dictionary[list_of_headings[j]] = spreadsheet[i][j]
    list_of_dictionaries.append(dictionary)
print(list_of_dictionaries)
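The same mapping can be written more compactly with zip, which pairs each heading with the cell in the same position. A minimal sketch reusing the simulated spreadsheet data; it assumes every row has exactly as many cells as there are headings (zip silently drops any extra cells).

```python
# Headings in the same order as the spreadsheet columns
list_of_headings = ['state', 'county']
# Simulated spreadsheet rows
spreadsheet = [['NC', 'Nash County'], ['VA', 'Albemarle County'], ['GA', 'Cook County']]

# zip pairs each heading with the cell at the same index; dict() turns those pairs into one record
list_of_dictionaries = [dict(zip(list_of_headings, row)) for row in spreadsheet]
print(list_of_dictionaries)
```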

Raqib's answer is partially correct, but it had to be modified for use with an actual spreadsheet with rows and columns and the xlrd module. What I did was first use xlrd methods to grab the cell values that I wanted and put them into a list (similar to the spreadsheet variable Raqib showed above). Note that the parameters sI and cI are the column indexes I picked out in a previous step: sI = StateIndex and cI = CountyIndex.
rows = []
for row in range(rowNum):
    # One [state, county] pair per spreadsheet row (an inner loop over columns would add each row colNum times)
    rows.append([str(sheet.cell_value(row, sI)), str(sheet.cell_value(row, cI))])
Now that I have a list of the states and counties, I can apply Raqib's solution:
list_of_headings = ['state', 'county']
fipsDic = []
print(len(rows))
for i in range(len(rows)):
    temp = {}
    for j in range(len(rows[i])):
        temp[list_of_headings[j]] = rows[i][j]
    fipsDic.append(temp)
The result is a nice list of dictionaries that looks like this:
[{'county': 'Minnehaha County', 'state': 'SD'}, {'county': 'Minnehaha County', 'state': 'SD'}, ...]
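If only the two columns are needed, the intermediate list can be skipped and each dictionary built straight from the sheet. A minimal sketch, assuming an xlrd-style sheet object that exposes nrows and cell_value(rowx, colx); FakeSheet and its rows are hypothetical stand-ins so the example runs without an .xls file, and sI/cI are assumed to be 1 and 3 (state and county column indexes).

```python
class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet: only nrows and cell_value() are mimicked."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def cell_value(self, rowx, colx):
        return self._rows[rowx][colx]


# Hypothetical sheet contents; columns 0 and 2 are placeholders
sheet = FakeSheet([
    ['row0', 'NC', '', 'Nash County'],
    ['row1', 'VA', '', 'Albemarle County'],
])
sI, cI = 1, 3  # assumed state and county column indexes

# One dictionary per row, reading each row exactly once
fipsDic = [
    {'state': str(sheet.cell_value(row, sI)), 'county': str(sheet.cell_value(row, cI))}
    for row in range(sheet.nrows)
]
print(fipsDic)
```

With a real workbook, the sheet would come from xlrd instead of FakeSheet, and the comprehension stays the same.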

Related

How do you append a row of data to a list

For example...
The following are stored in a dictionary variable called ID_DATA
[0,John,male,$2400]
[1,mary,female,$2700]
[2,janie,female,$6790]
[3,adex,male,$3300]
[4,julie,female,$5400]
I want to loop through the list and append only the rows where the gender is male to another list variable called ID_MALE.
Can anyone help out?
Are they stored in a dictionary of lists or a list of lists?
If it's the latter and the gender is always in the 2nd spot in each list (the second column, index 1), then you could just do something like this:
ID_DATA = [['John','male','$2400'],['mary','female','$2700'],['janie','female','$6790'],['adex','male','$3300'],['julie','female','$5400']]
ID_MALE = []
for row in ID_DATA:
    if row[1] == 'male':
        ID_MALE.append(row)
print(ID_MALE)
Which prints
[['John', 'male', '$2400'], ['adex', 'male', '$3300']]
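The same filter is often written as a list comprehension. A minimal sketch, assuming the same list-of-lists layout with the gender at index 1:

```python
ID_DATA = [['John', 'male', '$2400'], ['mary', 'female', '$2700'], ['janie', 'female', '$6790'],
           ['adex', 'male', '$3300'], ['julie', 'female', '$5400']]

# Keep only the rows whose gender column (index 1) is 'male'
ID_MALE = [row for row in ID_DATA if row[1] == 'male']
print(ID_MALE)  # [['John', 'male', '$2400'], ['adex', 'male', '$3300']]
```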
I think you can store the data as a list of lists (or tuples) first,
then use filter with a lambda:
data_Id= [
[0,'John','male',2400],
[1,'mary','female',2700],
[2,'janie','female',6790],
[3,'adex','male',3300],
[4,'julie','female',5400]
]
gender = 'male'
ID_MALE = filter(lambda x: gender in x, data_Id)
print(list(ID_MALE))
Output
[[0, 'John', 'male', 2400], [3, 'adex', 'male', 3300]]
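The question says the rows are stored in a dictionary. If ID_DATA is a dictionary mapping each ID to its row (an assumed layout, since the exact structure was not shown), a dict comprehension keeps the IDs. Comparing a single column also avoids the `gender in x` check, which would match 'male' in any position of the row.

```python
# Assumed layout: ID -> [name, gender, salary]
ID_DATA = {
    0: ['John', 'male', 2400],
    1: ['mary', 'female', 2700],
    2: ['janie', 'female', 6790],
    3: ['adex', 'male', 3300],
    4: ['julie', 'female', 5400],
}

# Keep the entries whose gender column (index 1) equals 'male'
ID_MALE = {key: row for key, row in ID_DATA.items() if row[1] == 'male'}
print(ID_MALE)  # {0: ['John', 'male', 2400], 3: ['adex', 'male', 3300]}
```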

Sort the values of lists corresponding to a row of excel numbers in a dictionary in descending order using openpyxl

I currently have a task to sort the values of a list corresponding to a row of excel numbers in descending order and store them in a dictionary.
For example I have an excel file like this:
0.296178 0.434362 0.033033 0.758968
0.559323 0.455792 0.770423 0.770423
I have created a dictionary using the code below to store the value of each cell as a list, with the key for each row being the row number.
from openpyxl import load_workbook
import operator
vWB = load_workbook(filename="voting.xlsx")
vSheet = vWB.active
# Creating the Dictionary of agents and their alternative values
dictionary = {
i + 1: [cell.value for cell in row[0:]]
for i, row in enumerate(vSheet.rows)
}
output:
{1: [0.296177589742431, 0.434362232763003, 0.0330331941352554, 0.758968208500514], 2: [0.559322784244604, 0.455791535747786, 0.770423104229537, 0.770423104229537]....etc }
However, I'm unsure how to order the values of each list in descending order without changing the order of the keys too.
What I want:
{1: [0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554], 2: [0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]....etc }
How do I sort the list of values for each key in descending order?
You can try something like this:
d = {1: [0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554, 0.758968208500514],
2: [0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]}
for key, value in d.items():
    print(sorted(value, reverse=True))
output
[0.758968208500514, 0.758968208500514, 0.434362232763003, 0.296177589742431, 0.0330331941352554]
[0.770423104229537, 0.770423104229537, 0.559322784244604, 0.455791535747786]
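The loop above only prints the sorted lists. To keep them, a dict comprehension can rebuild the dictionary with each list sorted in descending order; dictionaries preserve insertion order (guaranteed since Python 3.7), so the keys stay in their original order. A minimal sketch using the row values from the question:

```python
d = {
    1: [0.296177589742431, 0.434362232763003, 0.0330331941352554, 0.758968208500514],
    2: [0.559322784244604, 0.455791535747786, 0.770423104229537, 0.770423104229537],
}

# sorted() returns a new list, so the original lists in d are left untouched
sorted_d = {key: sorted(values, reverse=True) for key, values in d.items()}
print(sorted_d)
```

To sort in place instead, calling value.sort(reverse=True) inside the loop mutates the existing lists.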

get a value of a key associated with other key's values in dictionaries Python

I have two lists of dictionaries and a JSON dictionary, and I need to grab the value of a specific key based on the value of a key from another dictionary. My data looks like this:
opps = [{'Product2Id': '100','Price': '1645'}, {'Product2Id': '101','Price': '5478'}]
products = [{'Id': '100', 'Name': 'Insertion'}, {'Id': '101', 'Name': 'Print'}]
sales_json = {'Insertion': {'name': 'BAZ', 'id': '95'}, 'Print': {'name': 'BIC', 'id': '105'}}
I need to loop through opps and assign a value from sales_json to a new variable, looked up via the Id that is stored in both products and opps.
I tried the following:
for index, my_dict in enumerate(opps):
    new_name = sales_json[products[my_dict["Product2Id"]]["Name"]]["name"]
Gives me an error.
The desired output is:
print(new_name)
BAZ,
BIC
You are trying to use the list products as a dictionary. Instead, you should first build a product number to name dictionary from it:
prod_num_to_name = {d['Id']: d['Name'] for d in products}
Then, you can run the loop you wanted, modified like this:
for index, my_dict in enumerate(opps):
    new_name = sales_json[prod_num_to_name[my_dict["Product2Id"]]]["name"]
    print(new_name)
To return a list of names that match the criteria, using a List Comprehension:
names = [ sales_json[product['Name']]['name'] for opp in opps for product in products if product['Id'] == opp['Product2Id']]
print (names)
Prints the list of names:
['BAZ', 'BIC']
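Combining both answers: building the Id to Name lookup once and then using it inside a list comprehension avoids scanning products again for every opportunity. A minimal sketch with the sample data from the question (brackets balanced so it parses):

```python
opps = [{'Product2Id': '100', 'Price': '1645'}, {'Product2Id': '101', 'Price': '5478'}]
products = [{'Id': '100', 'Name': 'Insertion'}, {'Id': '101', 'Name': 'Print'}]
sales_json = {'Insertion': {'name': 'BAZ', 'id': '95'}, 'Print': {'name': 'BIC', 'id': '105'}}

# Map each product Id to its Name once
prod_num_to_name = {p['Id']: p['Name'] for p in products}

# Chain the lookups: Product2Id -> product Name -> sales_json entry -> 'name'
names = [sales_json[prod_num_to_name[opp['Product2Id']]]['name'] for opp in opps]
print(names)  # ['BAZ', 'BIC']
```

An Id missing from products (or a Name missing from sales_json) raises KeyError here; dict.get() with a default is an option if that can happen.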

Add keys from dicts (in column) to new column

I have a DataFrame with a 'budgetYearMap' column, which has 1-3 key-value pairs for each record. I'm a bit stuck as to how I'm supposed to make a new column containing only the keys of the "budgetYearMap" column.
Sample data below:
df_sample = pd.DataFrame({'identifier': ['BBI-2016-D02', 'BBI-2016-D03', 'BBI-2016-D04', 'BBI-2016-D05', 'BBI-2016-D06'],
'callIdentifier': ['H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016'],
'budgetYearMap': [{'0': 188650000}, {'2017': 188650000}, {'2015': 188650000}, {'2014': 188650000}, {'2020': 188650000, '2014': 188650000, '2012': 188650000}]
})
First I tried to extract the keys by position, then make a list out of them and add the list to the dataframe. As some records contained multiple keys (I then found out), this approach failed.
all_keys = [i for s in [list(d.keys()) for d in df_sample.budgetYearMap] for i in s]
df_TD_selected['budgetYear'] = all_keys
My problem is that extracting the keys by "name" wouldn't work either, given that the names of the keys are variable, and I do not know the set of years in advance. The data set will keep growing. It can be either 0 or a year within the 2000 range now, but in the future more years will be added.
My desired output would be:
df_output = pd.DataFrame({'identifier': ['BBI-2016-D02', 'BBI-2016-D03', 'BBI-2016-D04', 'BBI-2016-D05', 'BBI-2016-D06'],
'callIdentifier': ['H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016'],
'Year': ['0', '2017', '2015', '2014', '2020, 2014, 2012']
})
Any idea how I should approach this?
Perfect pipeline use-case.
df = (
df_sample
.assign(Year = df_sample['budgetYearMap'].apply(lambda s: list(s.keys())))
.drop(columns = ['budgetYearMap'])
)
.assign creates a new column which takes the 'budgetYearMap' Series and applies the lambda function to it. This returns the dictionary's keys in a list. If you prefer a string (as in your desired output), simply replace the lambda function with
lambda s: ', '.join(list(s.keys()))
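Putting the string variant into the full pipeline, a minimal sketch on a trimmed copy of the sample data. It relies on dictionaries keeping insertion order (Python 3.7+), so the years come out in the order they were stored; iterating a dict yields its keys, so join can consume it directly without list(s.keys()).

```python
import pandas as pd

df_sample = pd.DataFrame({
    'identifier': ['BBI-2016-D02', 'BBI-2016-D06'],
    'callIdentifier': ['H2020-BBI-JTI-2016', 'H2020-BBI-JTI-2016'],
    'budgetYearMap': [{'0': 188650000}, {'2020': 188650000, '2014': 188650000, '2012': 188650000}],
})

df = (
    df_sample
    # Join each dict's keys into a comma-separated string
    .assign(Year=df_sample['budgetYearMap'].apply(lambda s: ', '.join(s)))
    .drop(columns=['budgetYearMap'])
)
print(df)
```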

convert list to dataframe using dictionary

I am new to Pythonland and I have a question. I have a list as below and want to convert it into a DataFrame.
I read on Stack Overflow that it is better to create a dictionary than a list, so I created one as follows.
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick", "nick","pick"]
data = ['100','50','A','107','62','B'] # The actual list has 1640 entries
dic = {key:[] for key in column_names}
dic['name'] = row_names
t = 0
while t < len(data):
    dic['height'].append(data[t])
    t = t+3
t = 1
while t < len(data):
    dic['weight'].append(data[t])
    t = t+3
And so on: I have 10 columns, so I wrote the above code 10 times to complete the full dictionary. Then I convert
it to a DataFrame. It works perfectly fine, but there has to
be a shorter way to do this. I don't know how to refer to a key of a dictionary with a number. Should it be wrapped in a function? Also, how can I automatically add one to the starting value of t before executing the next loop? Please help me.
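You can iterate through column_names (skipping 'name', which is filled from row_names) like this:
dic = {key:[] for key in column_names}
dic['name'] = row_names
step = len(column_names) - 1  # number of data values per row
for t, column_name in enumerate(column_names[1:]):
    i = t
    while i < len(data):
        dic[column_name].append(data[i])
        i += step
enumerate will automatically iterate t from 0 to len(column_names) - 2, so each column starts at the next offset in data.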
i = 0
while True:
    try:
        for j in column_names[1:]:  # 'name' is already filled from row_names
            dic[j].append(data[i])
            i += 1
    except IndexError as er:  # once i runs past the end of data, IndexError is raised and we break out of the loop
        print(er, "################")
        break
The first issue is that all the column data is concatenated into a single list. You should first investigate how to prevent that and get a list of lists with each column's values in a separate list, like [['100', '107'], ['50', '62'], ['A', 'B']]. Either way, you need this data structure to proceed efficiently:
value_columns = column_names[1:]  # 'name' comes from row_names, not from data
cl_count = len(value_columns)
d_count = len(data)
spl_data = [[data[j] for j in range(i, d_count, cl_count)] for i in range(cl_count)]
Then you should use a dict comprehension. Dict comprehensions are available since Python 2.7, so this works in 2.7 as well as 3.x.
df = pd.DataFrame({j: spl_data[i] for i, j in enumerate(value_columns)})
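Extended slicing gives the same column split without index arithmetic: data[i::n] takes every n-th element starting at position i. A minimal sketch, assuming (as in the question) that data holds the non-name fields row after row and that row_names has one entry per row:

```python
import pandas as pd

column_names = ["name", "height", "weight", "grade"]
row_names = ["jack", "mick"]
data = ['100', '50', 'A', '107', '62', 'B']

value_columns = column_names[1:]  # 'name' comes from row_names, not from data
n = len(value_columns)

# data[i::n] collects every n-th item starting at offset i, i.e. one column
dic = {'name': row_names}
dic.update({col: data[i::n] for i, col in enumerate(value_columns)})

df = pd.DataFrame(dic, columns=column_names)
print(df)
```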
First, we should understand what an ideal dictionary for a DataFrame should look like.
A DataFrame can be thought of in two different ways:
One is a traditional collection of rows:
'row 0': ['jack', 100, 50, 'A'],
'row 1': ['mick', 107, 62, 'B']
However, there is a second representation that is more useful, though perhaps not as intuitive at first.
A collection of columns:
'name': ['jack', 'mick'],
'height': ['100', '107'],
'weight': ['50', '62'],
'grade': ['A', 'B']
Now, here is the key thing to realise: the second representation is more useful
because it is the representation that DataFrames support and use internally.
It does not run into datatype conflicts within a single grouping (each column needs to have one fixed datatype),
whereas across a row, datatypes can vary.
Also, because of this consistency, which can't be guaranteed in a row,
operations can be performed easily and consistently on an entire column.
So, tl;dr: DataFrames are essentially collections of equal-length columns,
and a dictionary in that representation can easily be converted into a DataFrame.
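To see that both views describe the same table, a minimal sketch builds one DataFrame from a dict of columns and one from a list of rows (passing the column labels explicitly), then compares them. pandas infers one dtype per column in both cases.

```python
import pandas as pd

# Column-oriented: one list per column, all the same length
by_columns = {'name': ['jack', 'mick'], 'height': [100, 107], 'weight': [50, 62], 'grade': ['A', 'B']}
# Row-oriented: one list per row, mixing types within each row
by_rows = [['jack', 100, 50, 'A'], ['mick', 107, 62, 'B']]

df_cols = pd.DataFrame(by_columns)
df_rows = pd.DataFrame(by_rows, columns=['name', 'height', 'weight', 'grade'])

print(df_cols.equals(df_rows))  # True
print(df_cols.dtypes)  # each column has a single dtype
```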
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick"]
data = [100, 50,'A', 107, 62,'B'] # The actual list has 1640 entries
So, with that in mind, the first thing to realize is that, in its current format, data is a very poor representation.
It is a collection of rows merged into a single list.
The first thing to do, if you're the one in control of how data is formed, is to not prepare it this way.
The goal is a list for each column, so ideally, prepare the data in that format.
Now, however, if it is given in this format, you need to iterate and collect the values accordingly. Here's a way to do it:
column_names = ["name", "height" , "weight", "grade"] # Actual list has 10 entries
row_names = ["jack", "mick"]
data = [100, 50,'A', 107, 62,'B'] # The actual list has 1640 entries
dic = {key:[] for key in column_names}
dic['name'] = row_names
print(dic)
Output so far:
{'height': [],
'weight': [],
'grade': [],
'name': ['jack', 'mick']} #so, now, names are a column representation with all correct values.
remaining_cols = column_names[1:]
#Explanations for the following part given at the end
data_it = iter(data)
for row in zip(*([data_it] * len(remaining_cols))):
    for i, val in enumerate(row):
        dic[remaining_cols[i]].append(val)
print(dic)
Output:
{'name': ['jack', 'mick'],
'height': [100, 107],
'weight': [50, 62],
'grade': ['A', 'B']}
And we are done with the representation
Finally:
import pandas as pd
df = pd.DataFrame(dic, columns = column_names)
print(df)
name height weight grade
0 jack 100 50 A
1 mick 107 62 B
Edit:
Some explanation for the zip part:
zip takes any number of iterables and lets us iterate through them together.
data_it = iter(data) #prepares an iterator.
[data_it] * len(remaining_cols) #creates references to the same iterator
Here, this is similar to [data_it, data_it, data_it]
The * in *[data_it, data_it, data_it] allows us to unpack the list into 3 arguments for the zip function instead
so, f(*[data_it, data_it, data_it]) is equivalent to f(data_it, data_it, data_it) for any function f.
The magic here is that advancing the iterator through any one reference is reflected across all references, since they all point to the same iterator.
Putting it all together:
zip(*([data_it] * len(remaining_cols))) will actually allow us to take 3 items from data at a time, and assign it to row
So, row = (100, 50, 'A') in first iteration of zip
for i, val in enumerate(row): #just iterate through the row, keeping the index too using enumerate
    dic[remaining_cols[i]].append(val) #use indexes to access the correct list in the dictionary
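A tiny demonstration of the shared-iterator trick on its own: each tuple that zip produces consumes three consecutive items, and a trailing incomplete group is silently dropped because zip stops as soon as one argument is exhausted.

```python
data = [100, 50, 'A', 107, 62, 'B', 999]  # the extra 999 forms an incomplete group
data_it = iter(data)

# Three references to ONE iterator: each zip step calls next() on it three times
rows = list(zip(*([data_it] * 3)))
print(rows)  # [(100, 50, 'A'), (107, 62, 'B')]
```

On Python 3.12+, itertools.batched(data, 3) does the same grouping but yields the short final batch instead of dropping it.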
Hope that helps.
As suggested by l159, you can use a dict comprehension and then create a pandas DataFrame out of it, using the names as row indexes:
data = ['100', '50', 'A', '107', '62', 'B', '103', '64', 'C', '105', '78', 'D']
column_names = ["height", "weight", "grade"]
row_names = ["jack", "mick", "nick", "pick"]
df = pd.DataFrame.from_dict(
{
row_label: {
column_label: data[i * len(column_names) + j]
for j, column_label in enumerate(column_names)
} for i, row_label in enumerate(row_names)
},
orient='index'
)
Actually, the intermediate dictionary is a nested dictionary: the keys of the outer dictionary are the row labels (in this case the items of the row_names list); the value associated with each key is a dictionary whose keys are the column labels (i.e., the items in column_names) and whose values are the corresponding elements in the data list.
The function from_dict is used to create the DataFrame instance; orient='index' tells it to treat the outer keys as row labels.
So, the previous code produces the following result:
height weight grade
jack 100 50 A
mick 107 62 B
nick 103 64 C
pick 105 78 D
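Since every row has the same number of fields, another option is to reshape the flat list into a 2-D array and pass it to the DataFrame constructor together with the labels. A minimal sketch using NumPy (which pandas already depends on); reshape(-1, n) infers the number of rows and raises a ValueError if data does not divide evenly into rows of n fields.

```python
import numpy as np
import pandas as pd

data = ['100', '50', 'A', '107', '62', 'B', '103', '64', 'C', '105', '78', 'D']
column_names = ["height", "weight", "grade"]
row_names = ["jack", "mick", "nick", "pick"]

# One row per person, one column per field; dtype=object keeps the values as Python strings
table = np.array(data, dtype=object).reshape(-1, len(column_names))
df = pd.DataFrame(table, index=row_names, columns=column_names)
print(df)
```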
