I want to declare a set of variables with PuLP which contains all the possible combinations of the following lists:
month = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
manufacturer = ['China', 'Mexico', 'Taiwan']
demand = ['London', 'Paris', 'Milan']
Then I want a dictionary (for example) with keys like:
'1.China.London', '1.China.Paris', ...
I tried with the following code, but I don't know how to store all the combinations.
vlbs = {}
for key in month:
    for kay in manufacturer:
        for eyk in demand:
            vlbs = (str(key) + '.' + str(kay) + '.' + str(eyk))
First, vlbs doesn't end up being a dictionary at all. And later on:
variables = {var: pl.LpVariable(var, lowBound = 0) for var in vlbs}
How can I do this properly?
You can use a tuple as a dictionary key. I think this makes filtering/searching easier than doing a bunch of string concatenating and splitting.
from pulp import LpVariable, lpSum

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
manufacturers = ['China', 'Mexico', 'Taiwan']
demands = ['London', 'Paris', 'Milan']

var_dict = {}
for month in months:
    for manufacturer in manufacturers:
        for demand in demands:
            combo = (month, manufacturer, demand)
            var_name = '.'.join(str(c) for c in combo)
            var_dict[combo] = LpVariable(var_name, lowBound=0)

# add a constraint for month 1 (sum the variables, not the keys)
model += lpSum([var_dict[k] for k in var_dict if k[0] == 1]) <= 50
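If you'd rather not nest three loops, itertools.product generates the same combinations. A stdlib-only sketch of building the keys and the dotted names (PuLP itself is left out here, but each key would map to an LpVariable exactly as above):

```python
from itertools import product

months = list(range(1, 13))
manufacturers = ['China', 'Mexico', 'Taiwan']
demands = ['London', 'Paris', 'Milan']

# one tuple key per (month, manufacturer, demand) combination
keys = list(product(months, manufacturers, demands))

# dotted names matching the question's '1.China.London' format
names = {k: '.'.join(str(part) for part in k) for k in keys}

print(len(keys))                      # 12 * 3 * 3 = 108 combinations
print(names[(1, 'China', 'London')])  # 1.China.London
```

PuLP's own LpVariable.dicts helper accepts such a list of tuple keys as well, which saves the explicit loop entirely.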
I have a dataframe which stores different variables. I'm using OLS linear regression and using all of the variables to predict the 'price' column.
import pandas as pd
import statsmodels.api as sm
data = {'accommodates':[2, 2, 3, 2, 2, 6, 8, 4, 3, 2],
'bedrooms':[1, 2, 1, 1, 3, 4, 2, 2, 2, 3],
'instant_bookable':[1, 0, 1, 1, 1, 1, 0, 0, 0, 1],
'availability_365':[123, 3, 33, 14, 15, 16, 3, 41, 61, 74],
'minimum_nights':[3, 12, 1, 4, 6, 7, 2, 3, 6, 10],
'beds':[2, 2, 3, 4, 1, 5, 6, 2, 3, 2],
'price':[59, 234, 15, 162, 56, 42, 28, 52, 22, 31]}
df = pd.DataFrame(data, columns = ['accommodates', 'bedrooms', 'instant_bookable', 'availability_365',
'minimum_nights', 'beds', 'price'])
I have a for loop which calculates the Adjusted R squared value for each variable:
fit_d = {}
for col in [x for x in df.columns if x != 'price']:
    Y = df['price']
    X = df[col]
    X = sm.add_constant(X)
    model = sm.OLS(Y, X, missing='drop').fit()
    fit_d[col] = model.rsquared_adj
fit_d
How can I modify my code in order to find the combination of variables that give the largest Adjusted R squared value? Ideally the function would find the variable with the largest adj. R squared value first, then using the 1st variable iterate with the remaining variables to get 2 variables that give the highest value, then 3 variables etc. until the value cannot be increased further. I'd like the output to be something like
Best variables: {'accommodates', 'availability', 'bedrooms'}
Here is a "brute force" way to try all possible combinations (from itertools) of every length and find the variables with the highest R value. The idea is two loops: one over the number of variables to try, and one over all combinations of that length.
from itertools import combinations

# all possible columns for X
cols = [x for x in df.columns if x != 'price']

# Y is the same across the loops
Y = df['price']

# result dictionary
fit_d = {}

# loop over every possible number of variables
for i in range(1, len(cols) + 1):
    # loop over every combination of length i
    for comb in combinations(cols, i):
        # build X from the combination
        X = df[list(comb)]
        X = sm.add_constant(X)
        # perform the OLS fit
        model = sm.OLS(Y, X, missing='drop').fit()
        # save the R squared in a dictionary
        # (use model.rsquared_adj here if you want adjusted R squared)
        fit_d[comb] = model.rsquared

# extract the key with the max R value
key_max = max(fit_d, key=fit_d.get)
print(f'Best variables {key_max} for a R-value of {round(fit_d[key_max], 5)}')
# Best variables ('accommodates', 'bedrooms', 'instant_bookable', 'availability_365', 'minimum_nights', 'beds') for a R-value of 0.78506
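The question's greedy idea (add one variable at a time while the score still improves) avoids fitting all 2^n combinations. Below is a sketch of that forward selection using plain numpy least squares to score each candidate set by adjusted R squared; the helpers adj_r2 and forward_select are names made up for this example, and with statsmodels you would read model.rsquared_adj instead of computing it by hand:

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit with intercept, via np.linalg.lstsq."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def forward_select(X, y):
    """Greedily add the column that most improves adjusted R^2."""
    remaining = list(range(X.shape[1]))
    selected, best = [], -np.inf
    while remaining:
        scores = {j: adj_r2(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best:   # no improvement -> stop
            break
        best = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best

# tiny synthetic example: y depends mostly on column 0
X = np.array([[1., 2., 1.], [2., 1., 1.], [3., 4., 2.], [4., 3., 2.],
              [5., 6., 3.], [6., 5., 3.], [7., 8., 4.], [8., 7., 4.]])
y = 3 * X[:, 0] + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1])
cols, score = forward_select(X, y)
print(cols, round(score, 3))
```

Note that greedy selection is a heuristic: it can miss the global optimum that the brute-force loop above would find, but it only fits O(n^2) models instead of 2^n.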
I've got some big data files to parse. Each file has parent tags for name and date, which appear only once per block of data, but many repeated children such as patent citations, non-patent citations, and classification.
So I parse each file, find all occurrences of those three children, and store them in individual lists for every parent iteration. The problem is that the children lists are always of different lengths, and I want to write them all on one row of a CSV file.
For example, for one iteration in a file, my list inputs look like:
Name = [Jon]
Date = [1985]
Patcit = [1, 2, 3]
Npatcit = [4, 5, 6, 7, 8]
Class = [9, 10]
For my second iteration, the incoming lists are:
Name = [Nikhil]
Date = [1988]
Patcit = [1, 2, 3]
Npatcit = [4, 5, 6, 7]
Class = [9, 10, 11, 12, 13]
For my third iteration, the incoming lists are:
Name = [Neetha]
Date = [1986]
Patcit = [1, 2]
Npatcit = [4, 5]
Class = [9, 10, 11, 12]
And I want an output written to a CSV file to look like:
Name Date Patcit Npatcit Class
Jon 1985 1,2,3 4,5,6,7,8 9,10
Nikhil 1988 1,2,3 4,5,6,7 9,10,11,12,13
Neetha 1986 1,2 4,5 9,10,11,12
(Repeat next name and date iteration on the next row)
You can convert the data to a dictionary and append() it to an existing DataFrame. (Note: DataFrame.append was removed in pandas 2.0; on a current pandas, collect the row dicts in a list and build the frame with pd.concat or the DataFrame constructor instead.)
You will need to convert lists like [1, 2, 3] to strings like "1,2,3" (and similarly for the others).
import pandas as pd
df = pd.DataFrame(columns=['Name', 'Date', 'Patcit', 'Npatcit', 'Class'])
# -------------------------------
Name = ['Jon']
Date = [1985]
Patcit = [1, 2, 3]
Npatcit = [4, 5, 6, 7, 8]
Class = [9, 10]
row = {
    'Name': Name[0],
    'Date': Date[0],
    'Patcit': ','.join(str(x) for x in Patcit),
    'Npatcit': ','.join(str(x) for x in Npatcit),
    'Class': ','.join(str(x) for x in Class),
}
df = df.append(row, ignore_index=True)
#print(df)
# -------------------------------
Name = ['Nikhil']
Date = [1988]
Patcit = [1, 2, 3]
Npatcit = [4, 5, 6, 7]
Class = [9, 10, 11, 12, 13]
row = {
    'Name': Name[0],
    'Date': Date[0],
    'Patcit': ','.join(str(x) for x in Patcit),
    'Npatcit': ','.join(str(x) for x in Npatcit),
    'Class': ','.join(str(x) for x in Class),
}
df = df.append(row, ignore_index=True)
print(df)
Result
Name Date Patcit Npatcit Class
0 Jon 1985 1,2,3 4,5,6,7,8 9,10
1 Nikhil 1988 1,2,3 4,5,6,7 9,10,11,12,13
And later you can write it to CSV, using the standard separator (a comma) or any other separator:
df.to_csv('output.csv', sep=';')
Or see another question that describes how to write a fixed-width file.
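Since the end goal is just a CSV, the standard-library csv module also works, without pandas. A sketch (writing to an in-memory buffer here; a real file opened with newline='' behaves the same):

```python
import csv
import io

rows = [
    {'Name': 'Jon', 'Date': 1985, 'Patcit': [1, 2, 3],
     'Npatcit': [4, 5, 6, 7, 8], 'Class': [9, 10]},
    {'Name': 'Nikhil', 'Date': 1988, 'Patcit': [1, 2, 3],
     'Npatcit': [4, 5, 6, 7], 'Class': [9, 10, 11, 12, 13]},
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=['Name', 'Date', 'Patcit', 'Npatcit', 'Class'],
    delimiter=';')
writer.writeheader()
for r in rows:
    # join the list-valued fields into "1,2,3"-style strings
    writer.writerow({k: ','.join(map(str, v)) if isinstance(v, list) else v
                     for k, v in r.items()})

print(buf.getvalue())
```

Using ';' as the field delimiter keeps the comma-joined lists unambiguous, just like the sep=';' choice in the pandas version.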
If you want to make a string out of a list you can try:
x = ",".join(str(n) for n in Patcit)
# the string you call join on is the separator
# x is now "1,2,3", of type str
# (note the str() conversion: join raises a TypeError on non-string items)
Later you can use x.split(",") to turn it back into a list of strings.
I've searched quite a lot but I haven't found any similar question to that one.
I have two lists of dictionaries in following format:
data1 = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
data2 = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
]
desired output:
final_data = [
{'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
{'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
I want only dictionaries which are in data1 and not in data2.
Until now, when I found a match inside two for loops I popped the dictionary out of the list, but that doesn't seem like a good approach to me. How can I achieve the desired output?
It doesn't have to be time efficient, since there will be at most tens of dictionaries in each list.
Current implementation:
counter_i = 0
for i in range(len(data1)):
counter_j = 0
for j in range(len(data2)):
if data1[i-counter_i]['id'] == data2[j-counter_j]['id'] and data1[i-counter_i]['date_time'] == data2[j-counter_j]['date_time']
data1.pop(i-counter_i)
data2.pop(j-counter_j)
counter_i += 1
counter_j += 1
break
If performance is not an issue, why not:
for d in data2:
try:
data1.remove(d)
except ValueError:
pass
list.remove checks for object equality, not identity, so will work for dicts with equal keys and values. Also, list.remove only removes one occurrence at a time.
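If you still need data1 afterwards, run the removal on a copy. A minimal sketch (with simplified dicts for brevity):

```python
data1 = [{'id': 4}, {'id': 4}, {'id': 6}, {'id': 7}]
data2 = [{'id': 4}, {'id': 6}]

result = list(data1)        # shallow copy, so data1 itself is untouched
for d in data2:
    try:
        result.remove(d)    # removes one equal dict per call
    except ValueError:
        pass

print(result)               # [{'id': 4}, {'id': 7}]
print(len(data1))           # still 4
```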
schwobaseggl's answer is probably the cleanest solution (just make a copy before removing if you need to keep data1 intact).
But if you want to use a set difference... well dicts are not hashable, because their underlying data could change and lead to issues (same reason why lists or sets are not hashable either).
However, you can put all of a dict's item pairs in a frozenset to represent that dictionary (assuming the dictionary values are themselves hashable, as schwobaseggl points out). Frozensets are hashable, so you can add those to a set, do a normal set difference, and reconstruct the dictionaries at the end :D.
I don't actually recommend doing it, but here we go:
final_data = [
    dict(s)
    for s in set(
        frozenset(d.items()) for d in data1
    ).difference(
        frozenset(d.items()) for d in data2
    )
]
You can go either way. (Note that both variants drop every occurrence of a matching dict, so the duplicated {'id': 4, ...} entry in data1 disappears entirely, unlike list.remove, which removes one occurrence per call.)
Method 1:
#using filter and lambda function
final_data = filter(lambda i: i not in data2, data1)
final_data = list(final_data)
Method 2:
# using list comprehension to perform task
final_data = [i for i in data1 if i not in data2]
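One caveat with all of these in/not in style checks: duplicates in data1 are removed wholesale. If you need multiset semantics (remove one matching copy per occurrence in data2, as list.remove does), collections.Counter over hashable frozenset keys handles it. A sketch with the question's data:

```python
import datetime
from collections import Counter

data1 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 7, 'date_time': datetime.datetime(2020, 4, 3, 16, 14, 21)},
]
data2 = [
    {'id': 4, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
    {'id': 6, 'date_time': datetime.datetime(2020, 4, 3, 12, 34, 40)},
]

# count each dict by a hashable snapshot of its items, then subtract;
# Counter subtraction keeps only positive counts
diff = (Counter(frozenset(d.items()) for d in data1)
        - Counter(frozenset(d.items()) for d in data2))
final_data = [dict(fs) for fs in diff.elements()]

print([d['id'] for d in final_data])    # one id-4 copy survives, plus id 7
```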
I have two lists named 'speciality' and 'count', which are part of a dictionary 'P'. I zip-sorted both lists in descending order of the 'count' list.
speciality = ['Cardiology', 'Nephrology', 'ENT', 'Opthalmology', 'Oncology']
count = [2, 7, 9, 9, 1]
count, speciality = zip(*[[x, y] for x, y in sorted(zip(count, speciality), reverse=True)])
P = {'Speciliaty': speciality, 'Count': count}
print(P)
# {'Speciliaty': ('Opthalmology', 'ENT', 'Nephrology', 'Cardiology', 'Oncology'), 'Count': (9, 9, 7, 2, 1)}
Please notice that the elements 'Opthalmology' and 'ENT' have the same count, 9.
But after the zip sort, 'Opthalmology' appears before 'ENT' in the output tuple, whereas in the input 'ENT' comes first.
Can we make the output like below:
P = {'Speciliaty': ('ENT', 'Opthalmology', 'Nephrology', 'Cardiology', 'Oncology'), 'Count': (9, 9, 7, 2, 1)}
You need to set the key in sorted so that it sorts by count only; because Python's sort is stable, items with equal counts then keep their original order.
Ex:
speciality = ['Cardiology' , 'Nephrology', 'ENT', 'Opthalmology', 'Oncology']
count = [2, 7, 9, 9, 1]
count, speciality = zip(*[[x, y] for x, y in sorted(zip(count, speciality), key=lambda x: x[0], reverse=True)])
P = {'Speciliaty': speciality, 'Count': count}
print(P)
Output:
{'Count': (9, 9, 7, 2, 1), 'Speciliaty': ('ENT', 'Opthalmology', 'Nephrology', 'Cardiology', 'Oncology')}
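This works because Python's sort is stable and reverse=True flips the comparison rather than reversing the output, so items with equal keys keep their input order. A quick check with just the pairs:

```python
pairs = [(2, 'Cardiology'), (7, 'Nephrology'), (9, 'ENT'),
         (9, 'Opthalmology'), (1, 'Oncology')]

# sort by count only; ties keep input order even with reverse=True
ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
print([name for _, name in ordered])
# ['ENT', 'Opthalmology', 'Nephrology', 'Cardiology', 'Oncology']
```

Without the key, tuples compare element by element, so equal counts fall through to comparing the names, which is what reordered 'ENT' and 'Opthalmology' in the original code.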
Let's say we have a dictionary
dict = { 'Dollar': 12, 'Half-Coin': 4, 'Quarter': 3, 'Dime': 7 }
How would I go about printing the code so it looks like:
Dollar 12, Half-Coin 4, Quarter 3, Dime 7
Use ', '.join(), passing in a generator of strings.
d = { 'Dollar': 12, 'Half-Coin': 4, 'Quarter': 3, 'Dime': 7 }
print(', '.join('{} {}'.format(k, v) for k, v in d.items()))
Result (on Python 3.6 and earlier, where dict order was arbitrary):
Half-Coin 4, Quarter 3, Dollar 12, Dime 7
Since Python 3.7, dicts preserve insertion order. If you want the results in a specific order regardless of version, sort the items:
order=('Dollar', 'Half-Coin', 'Quarter', 'Dime')
d = { 'Dollar': 12, 'Half-Coin': 4, 'Quarter': 3, 'Dime': 7 }
print(', '.join('{} {}'.format(k, d[k]) for k in sorted(d, key=order.index)))
Result:
Dollar 12, Half-Coin 4, Quarter 3, Dime 7
PS: Don't name your variables after builtin types. Your name eclipses the builtin name, so subsequent code won't be able to call dict(), for example.
", ".join([x +" "+str(dict[x]) for x in dict.keys()])
dict = { 'Dollar': 12, 'Half-Coin': 4, 'Quarter': 3, 'Dime': 7 }
out = ""
for i in dict:
    out += i + " " + str(dict[i]) + ", "
print(out[:-2])
result (on Python 3.7+, where dicts keep insertion order):
Dollar 12, Half-Coin 4, Quarter 3, Dime 7