I need help saving the values my for-loop iterates over.
With this script I create my CSV the way I need it, but the output does not show which values w and c had for each row. How can I add this information in two more columns?
import pandas as pd

df = pd.read_csv(...)
country_list = df.Country.unique()
wave_list = df.Wave.unique()
dn = pd.DataFrame()
for w in wave_list:
    print("Wave is: " + str(w))
    wave_select = df[df["Wave"] == w]  # Select Rows for Waves
    for c in country_list:
        print("Country is: " + str(c))
        country_select = df[df["Country"] == c]  # Select Rows for Countries
        out = country_select["Sea"].value_counts(normalize=True) * 100  # Calculate Percentage
        print(out)
        dn = dn.append(out)
dn.to_csv(...)
I would be very grateful for help.
Before loop:
dn = pd.DataFrame(columns=['wave','country','out'])
Inside the inner loop, instead of dn = dn.append(out) (note that DataFrame.append was removed in pandas 2.0, so this needs an older pandas version):
dn = dn.append({'wave':w,'country':c,'out':out}, ignore_index=True)
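For newer pandas versions, where DataFrame.append no longer exists, one possible variant is to collect one dict per (wave, country) pair in a plain list and build the DataFrame once at the end. This is only a sketch: the sample data is made up, and it filters countries inside the wave selection, which is an assumption about the intended result.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "Wave": [1, 1, 1, 2, 2],
    "Country": ["DE", "DE", "HU", "DE", "HU"],
    "Sea": ["yes", "no", "yes", "yes", "no"],
})

rows = []
for w in df["Wave"].unique():
    wave_select = df[df["Wave"] == w]
    for c in wave_select["Country"].unique():
        country_select = wave_select[wave_select["Country"] == c]
        out = country_select["Sea"].value_counts(normalize=True) * 100
        # Start each row with the loop values, then add one column per "Sea" category.
        row = {"wave": w, "country": c}
        row.update(out.to_dict())
        rows.append(row)

# Build the frame once; categories missing for a pair become NaN.
dn = pd.DataFrame(rows)
print(dn)
```

Building the frame once from a list also avoids copying the growing DataFrame on every iteration.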
I have started using pyomo to solve optimization problems. I have an issue with accessing variables that use two indices. I can easily print the solution, but I want to store the index-dependent variable values in a pd.DataFrame to analyze the result further. I have written the following code, but it takes forever to store the variables. Is there a faster way?
df_results = pd.DataFrame()
df_variables = pd.DataFrame()
results.write()
instance.solutions.load_from(results)
for v in instance.component_objects(Var, active=True):
    print("Variable", v)
    varobject = getattr(instance, str(v))
    frequency = np.empty([len(price_dict)])
    for index in varobject:
        exist = False
        two = False
        if index is not None:
            if type(index) is int:
                # For time index t (0:8760 hours of year)
                exists = True  # does an index exist
                frequency[index] = float(varobject[index].value)
            else:
                # For components (names)
                if type(index) is str:
                    print(index)
                    print(varobject[index].value)
                else:
                    # for all indices with two components
                    two = True  # is index of two indices
                    if index[1] in df_variables.columns:
                        df_variables[index[0], str(index[1]) + '_' + str(v)] = varobject[index].value
                    else:
                        df_variables[index[1]] = np.nan
                        df_variables[index[0], str(index[1]) + '_' + str(v)] = varobject[index].value
        else:
            # If no index exists, simply print the variable value
            print(varobject.value)
        if not(exists):
            if not(two):
                df_variable = pd.Series(frequency, name=str(v))
                df_results = pd.concat([df_results, df_variable], axis=1)
                df_variable.drop(df_variable.index, inplace=True)
            else:
                df_results = pd.concat([df_results, df_variable], axis=1)
                df_variable.drop(df_variable.index, inplace=True)
With some more work and fewer DataFrame operations, I solved the issue with the following code. Thanks to BlackBear for the comment.
df_results = pd.DataFrame()
df_variables = pd.DataFrame()
results.write()
instance.solutions.load_from(results)
for v in instance.component_objects(Var, active=True):
    print("Variable", v)
    varobject = getattr(instance, str(v))
    frequency = np.empty([20, len(price_dict)])
    exist = False
    two = False
    list_index = []
    dict_position = {}
    count = 0
    for index in varobject:
        if index is not None:
            if type(index) is int:
                # For time index t (0:8760 hours of year)
                exist = True  # does an index exist
                frequency[0, index] = float(varobject[index].value)
            else:
                # For components (names)
                if type(index) is str:
                    print(index)
                    print(varobject[index].value)
                else:
                    # for all indices with two components
                    exist = True
                    two = True  # is index of two indices
                    if index[1] in list_index:
                        position = dict_position[index[1]]
                        frequency[position, index[0]] = varobject[index].value
                    else:
                        dict_position[index[1]] = count
                        list_index.append(index[1])
                        print(list_index)
                        frequency[count, index[0]] = varobject[index].value
                        count += 1
        else:
            # If no index exists, simply print the variable value
            print(varobject.value)
    if exist:
        if not(two):
            frequency = np.transpose(frequency)
            df_variable = pd.Series(frequency[:, 0], name=str(v))
            df_results = pd.concat([df_results, df_variable], axis=1)
            df_variable.drop(df_variable.index, inplace=True)
        else:
            for i in range(count):
                df_variable = pd.Series(frequency[i, :], name=str(v) + '_' + list_index[i])
                df_results = pd.concat([df_results, df_variable], axis=1)
                df_variable.drop(df_variable.index, inplace=True)
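The reshaping step for two-index variables can also be left to pandas. The sketch below does not use pyomo at all: it assumes the values of one variable are already available as a dict keyed by (time step, component name) tuples, which is the kind of mapping a pyomo variable's extract_values() is expected to return. The variable name "power" and the numbers are made up.

```python
import pandas as pd

# Hypothetical values of one two-index variable, keyed by (time step, component).
# With pyomo, such a dict could presumably come from the variable's extract_values().
values = {
    (0, "pv"): 1.0, (1, "pv"): 2.0,
    (0, "wind"): 3.0, (1, "wind"): 4.0,
}

# Tuple keys become a two-level MultiIndex.
s = pd.Series(values)
s.index.names = ["t", "component"]

# One column per component, one row per time step; the suffix mimics
# the "<component>_<variable>" column names used above.
wide = s.unstack("component").add_suffix("_power")
print(wide)
```

Because the whole variable is converted in one step, no per-cell DataFrame assignment or repeated pd.concat is needed.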
I have an excel pivot table of format:
Names 2/1/2010 3/1/2010 4/1/2010
A 8
B 4 5 7
C 5 3
D 6 6
I need to get the names and date of the cells which are empty. How can I do it?
I want the output as a list: [A:3/1/2010,4/1/2010].
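Assuming the format is the same as above, check this code snippet. You can use a different Python module to read the Excel sheet; note that xlrd 2.0 and later only reads .xls files, so an .xlsx file needs an older xlrd or another library.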
import xlrd

def get_list_vals():
    res = []
    path = "C:/File_PATH.xlsx"
    wb = xlrd.open_workbook(path)
    sheet = wb.sheet_by_index(0)
    # Get rows from 2nd line
    for row in range(1, sheet.nrows):
        temp = []
        for column in range(sheet.ncols):
            val = sheet.cell_value(row, column)
            # get first column values like (A, B, C)
            if column == 0:
                temp.append(val)
                continue
            # if not first column, get the date from the header row (row 0)
            elif val == "":
                date_val = sheet.cell_value(0, column)
                temp.append(date_val)
        res.append(temp)
    return res
If you want a specific format like [A : date1, date2], make temp a string instead of a list and append to it:
temp = [] --> temp = ""
temp.append(val) --> temp += str(val) + ":"
temp.append(date_val) --> temp += str(date_val) + ","
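The same lookup can be sketched with pandas. The example builds the table in memory instead of reading a file; for rows C and D, which columns hold the values is an assumption, since the layout above does not show it.

```python
import pandas as pd

# Hypothetical in-memory copy of the pivot table; None marks an empty cell.
# The positions of the values in rows C and D are assumed.
table = pd.DataFrame(
    {
        "2/1/2010": [8, 4, 5, None],
        "3/1/2010": [None, 5, None, 6],
        "4/1/2010": [None, 7, 3, 6],
    },
    index=pd.Index(["A", "B", "C", "D"], name="Names"),
)

# For every name, collect the column headers (dates) whose cell is empty.
missing = {
    name: [col for col in table.columns if pd.isna(row[col])]
    for name, row in table.iterrows()
}
# Drop names that have no empty cells.
missing = {name: dates for name, dates in missing.items() if dates}
print(missing)
```

When reading from a real workbook, pd.read_excel with index_col=0 would be the natural way to get such a frame, provided an Excel engine is installed.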
Beginner here:
I'm trying to sort a list of nicknames to the corresponding countries in the same line.
They come in this format:
FODORGBR + HU-Szombathely-2
ZSOLDPTE + HU-Debrecen-3
THAUSKTR + DE-Herzogenaurach-1
WRIGHNIL + UK-SuttonColdfield-2
KUROTADR + SK-KysuckeNoveMesto-1
KLERNMTT + DE-Herzogenaurach-1
BIRKNJHA + DE-Erlangen-111
CANECVAD + SK-KysuckeNoveMesto-1
MALDESND + DE-Herzogenaurach-1
I want to sort it by the country initials (so HU, DE etc.) with a caption.
So something like:
DE:
THAUSKTR
KLERNMTT
BIRKNJHA
MALDESND
HU:
FODORGBR
ZSOLDPTE
This is what I came up with to determine the countries, but I can't figure out how to sort all lines with it.
fw = open("NameList.txt")
for line_fw in fw:
    if not line_fw.strip():
        continue
    cross = line_fw.find("+")
    country = line_fw[cross+2:cross+4]
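First split each line on whitespace, then sort using operator.itemgetter(-1) as the key so that the last element of each split line (the part starting with the country code) decides the order.
You can replace -1 in itemgetter with 2 if the country code is always in the third element of the split line.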
from operator import itemgetter
x = ["FODORGBR + HU-Szombathely-2","ZSOLDPTE + HU-Debrecen-3","THAUSKTR + DE-Herzogenaurach-1",
"WRIGHNIL + UK-SuttonColdfield-2","KUROTADR + SK-KysuckeNoveMesto-1","KLERNMTT + DE-Herzogenaurach-1",
"BIRKNJHA + DE-Erlangen-111","CANECVAD + SK-KysuckeNoveMesto-1","MALDESND + DE-Herzogenaurach-1"]
new_list = [i.split() for i in x]
new_list.sort(key=itemgetter(-1))
print([" ".join(i) for i in new_list])
Output:
['BIRKNJHA + DE-Erlangen-111', 'THAUSKTR + DE-Herzogenaurach-1', 'KLERNMTT + DE-Herzogenaurach-1', 'MALDESND + DE-Herzogenaurach-1', 'ZSOLDPTE + HU-Debrecen-3', 'FODORGBR + HU-Szombathely-2', 'KUROTADR + SK-KysuckeNoveMesto-1', 'CANECVAD + SK-KysuckeNoveMesto-1', 'WRIGHNIL + UK-SuttonColdfield-2']
Using re.search and collections.defaultdict:
import re
from collections import defaultdict
d = defaultdict(list)
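with open('NameList.txt') as fw:
    for line in fw:
        if not line.strip():
            continue  # skip blank lines, which would not match the patterns
        # raw strings keep the backslashes in the patterns intact
        code = re.search(r' (\w{2})-', line).group(1)
        nick = re.search(r'(\w{8}) +', line).group(1)
        d[code].append(nick)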
Output:
defaultdict(list,
{'DE': ['THAUSKTR', 'KLERNMTT', 'BIRKNJHA', 'MALDESND'],
'HU': ['FODORGBR', 'ZSOLDPTE'],
'SK': ['KUROTADR', 'CANECVAD'],
'UK': ['WRIGHNIL']})
Your code for finding the country names looks just fine. One piece of advice when working with files: use the with statement instead of calling open and close yourself. If an error occurs after open but before close is called, the file may not be closed properly, which can cause all kinds of problems. with closes the file no matter what happens inside the corresponding code block (it works much like try/finally). So, like this:
with open('NameList.txt', 'r') as fw:
    for line_fw in fw:
        ...
the file is guaranteed to be closed. By the way, instead of using line.find('+'), you can just use line.split('+'), which removes the need for string slicing.
Now, to your question: there are a few possibilities to use here. The simplest would be defining a list for every country, and appending the corresponding names to the right list:
de = []
hu = []
uk = []
sk = []
with open('NameList.txt', 'r') as fw:
    for line_fw in fw:
        if not line_fw.strip():
            continue
        country = line_fw.split('+')[1].split('-')[0].strip()
        nickname = line_fw.split('+')[0].strip()  # strip the trailing space before '+'
        if country == 'DE':
            de.append(nickname)
        elif country == 'HU':
            hu.append(nickname)
        elif country == 'UK':
            uk.append(nickname)
        else:
            sk.append(nickname)
This fills one list per country with the corresponding nicknames. As you can see, however, it is clunky and long, and it needs a new branch for every additional country. A more elegant solution is a dictionary with the countries as keys and lists of names as values:
d = {}
with open('NameList.txt', 'r') as fw:
    for line_fw in fw:
        if not line_fw.strip():
            continue
        country = line_fw.split('+')[1].split('-')[0].strip()
        nickname = line_fw.split('+')[0].strip()
        try:
            d[country].append(nickname)  # if country already exists in d, append the nickname
        except KeyError:
            d[country] = [nickname]  # if country doesn't exist in d, make a new entry
which will create a dictionary looking like this (I just took the first few lines to illustrate it):
{'HU': ['FODORGBR', 'ZSOLDPTE'], 'DE': ['THAUSKTR'], 'UK': ['WRIGHNIL']}
There are more elegant ways to extract the countries and nicknames, but some of those have been pointed out in other answers.
Finally, if I got that right, you want to write your results to a new file, or at least print them. Let's say you have a dictionary of the above form. Simply iterate over its keys via for k in d:, add some newlines ('\n') in between, and use join to convert each list into one string with newlines between all items:
for k in d:
    print(k + ':\n' + '\n'.join(d[k]) + '\n')
which will print:
HU:
FODORGBR
ZSOLDPTE
DE:
THAUSKTR
UK:
WRIGHNIL
by adding with open(outputfile, 'w') as f: and replacing print with f.write, you can easily write this to a new file.
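Putting the pieces together, the grouping and the captioned output sorted by country code could look roughly like this. It is a sketch that works on a list of lines instead of a file; the helper names group_by_country and format_groups are made up for the example.

```python
from collections import defaultdict

def group_by_country(lines):
    """Map each two-letter country code to its nicknames, in input order."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # "NICK + CC-City-N" -> nickname on the left, location on the right
        nickname, _, location = line.partition("+")
        country = location.strip().split("-")[0]
        groups[country].append(nickname.strip())
    return groups

def format_groups(groups):
    """One captioned block per country, countries in alphabetical order."""
    blocks = [country + ":\n" + "\n".join(groups[country]) for country in sorted(groups)]
    return "\n\n".join(blocks) + "\n"

sample = [
    "FODORGBR + HU-Szombathely-2",
    "ZSOLDPTE + HU-Debrecen-3",
    "THAUSKTR + DE-Herzogenaurach-1",
    "WRIGHNIL + UK-SuttonColdfield-2",
    "KUROTADR + SK-KysuckeNoveMesto-1",
    "KLERNMTT + DE-Herzogenaurach-1",
    "BIRKNJHA + DE-Erlangen-111",
    "CANECVAD + SK-KysuckeNoveMesto-1",
    "MALDESND + DE-Herzogenaurach-1",
]

report = format_groups(group_by_country(sample))
print(report)
```

To read from the file instead, pass the open file object to group_by_country; to save the result, write report to a file opened with mode 'w'.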
Here is a snippet that should help:
sample = '''
FODORGBR + HU-Szombathely-2
ZSOLDPTE + HU-Debrecen-3
THAUSKTR + DE-Herzogenaurach-1
WRIGHNIL + UK-SuttonColdfield-2
KUROTADR + SK-KysuckeNoveMesto-1
KLERNMTT + DE-Herzogenaurach-1
BIRKNJHA + DE-Erlangen-111
CANECVAD + SK-KysuckeNoveMesto-1
MALDESND + DE-Herzogenaurach-1
'''
def find_between(s, first, last):
    # return the text between the first occurrence of `first` and the next `last`
    try:
        start = s.index(first) + len(first)
        end = s.index(last, start)
        return s[start:end]
    except ValueError:
        return ""

data = sample.splitlines()
elements = {}
for indv in data:
    code = find_between(indv, "+", "-").strip()
    value = find_between(indv, "", "+").strip()
    if code != '' and code in elements:
        values = []
        values.append(value)
        values.extend(elements[code])
        values = list(filter(None, values))
        values.sort()
        elements[code] = values
    elif code != '':
        values = []
        values.append(value)
        elements[code] = values
print(elements)
Output :
{'HU': ['FODORGBR', 'ZSOLDPTE'], 'DE': ['BIRKNJHA', 'KLERNMTT', 'MALDESND', 'THAUSKTR'], 'UK': ['WRIGHNIL'], 'SK': ['CANECVAD', 'KUROTADR']}
I have coded the following for loop. The main idea is that in each occurrence of 'D' in the column 'A_D', it looks for all the possible cases where some specific conditions should happen. When all the conditions are verified, a value is added to a list.
a = []
for i in df.index:
    if df['A_D'][i] == 'D':
        if df['TROUND_ID'][i] == ' ':
            vb = df[(df['O_D'] == df['O_D'][i])
                    & (df['A_D'] == 'A')
                    & (df['Terminal'] == df['Terminal'][i])
                    & (df['Operator'] == df['Operator'][i])]
            number = df['number_ac'][i]
            try:  ## if all the conditions above are verified a value is added to a list
                x = df.START[i] - pd.Timedelta(int(number), unit='m')
                value = vb.loc[(vb.START-x).abs().idxmin()].FlightID
            except:  ## if they are not verified, a placeholder string is added to the list
                value = 'No_link_found'
        else:
            value = 'Has_link'
    else:
        value = 'IsArrival'
    a.append(value)
My main problem is that df has millions of rows, therefore this for loop is way too time consuming. Is there any vectorized solution where I do not need to use a for loop?
An initial set of improvements: use apply rather than a loop; create a second dataframe at the start of the rows where df["A_D"] == "A"; and vectorise the value x.
arr = df[df["A_D"] == "A"]
# if the next line is slow, apply it only to those rows where x is needed
# (to_timedelta works on the whole column; int() on a Series would fail)
df["x"] = df.START - pd.to_timedelta(pd.to_numeric(df["number_ac"], errors="coerce"), unit="min")

def link_func(row):
    if row["A_D"] != "D":
        return "IsArrival"
    if row["TROUND_ID"] != " ":
        return "Has_link"
    # each comparison needs its own parentheses because & binds more tightly than ==
    vb = arr[(arr["O_D"] == row["O_D"])
             & (arr["Terminal"] == row["Terminal"])
             & (arr["Operator"] == row["Operator"])]
    try:
        return vb.loc[(vb.START - row["x"]).abs().idxmin()].FlightID
    except Exception:  # e.g. no matching arrival, or x could not be computed
        return "No_link_found"

df["a"] = df.apply(link_func, axis=1)
Using apply is apparently more efficient but does not automatically vectorise the calculation. But finding a value in arr based on each row of df is inherently time consuming, however efficiently it is implemented. Consider whether the two parts of the original dataframe (where df["A_D"] == "A" and df["A_D"] == "D", respectively) can be reshaped into a wide format somehow.
EDIT: You might be able to speed up the querying of arr by storing query strings in df, like this:
df["query_string"] = ('O_D == "' + df["O_D"]
                      + '" & Terminal == "' + df["Terminal"]
                      + '" & Operator == "' + df["Operator"] + '"')

def link_func(row):
    vb = arr.query(row["query_string"])
    try:
        return vb.loc[(vb.START - row["x"]).abs().idxmin()].FlightID
    except Exception:  # e.g. no matching arrival
        return "No_link_found"

# return the value from link_func and assign it back; setting row["a"]
# inside apply would not change df
todo = df.query('(A_D == "D") & (TROUND_ID == " ")')
df.loc[todo.index, "a"] = todo.apply(link_func, axis=1)
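If the goal is to avoid a per-row lookup entirely, one option worth trying is pandas.merge_asof, which can match each departure to the nearest arrival in time within groups defined by the by= columns. The sketch below uses made-up sample data with the column names from the question, and assumes number_ac is numeric and never missing (merge_asof rejects missing values in the merge key).

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "FlightID": ["A1", "A2", "D1", "D2"],
    "A_D": ["A", "A", "D", "D"],
    "O_D": ["LHR", "LHR", "LHR", "CDG"],
    "Terminal": ["T1", "T1", "T1", "T2"],
    "Operator": ["XX", "XX", "XX", "YY"],
    "TROUND_ID": [" ", " ", " ", " "],
    "number_ac": [30, 30, 30, 30],
    "START": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 12:00",
        "2020-01-01 10:40", "2020-01-01 11:00",
    ]),
})

keys = ["O_D", "Terminal", "Operator"]

# Arrivals are the candidates to link to; merge_asof needs them sorted by time.
arrivals = (
    df.loc[df["A_D"] == "A", keys + ["START", "FlightID"]]
    .rename(columns={"FlightID": "link"})
    .sort_values("START")
)

# Departures without a turnaround ID are the rows that need a link.
mask = (df["A_D"] == "D") & (df["TROUND_ID"] == " ")
departures = df.loc[mask, keys].copy()
x = df.loc[mask, "START"] - pd.to_timedelta(df.loc[mask, "number_ac"], unit="min")
departures["x"] = x.astype(df["START"].dtype)  # keep both merge keys on the same dtype
departures["row"] = departures.index           # remember the original row labels
departures = departures.sort_values("x")

# Nearest arrival in time among rows with the same O_D, Terminal and Operator.
matched = pd.merge_asof(
    departures, arrivals,
    left_on="x", right_on="START",
    by=keys, direction="nearest",
)

df["a"] = "IsArrival"
df.loc[df["A_D"] == "D", "a"] = "Has_link"
df.loc[matched["row"].to_numpy(), "a"] = matched["link"].fillna("No_link_found").to_numpy()
print(df[["FlightID", "a"]])
```

Like the original loop, this picks the closest arrival regardless of how far away it is; merge_asof also accepts a tolerance argument if a maximum gap is wanted.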
How can I make the cell number increase by one every time it loops through all of the sheets? I got it to loop through the different sheets itself but I'm not sure how to add +1 to the cell value.
for sheet in sheetlist:
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    ws2['D4'] = wsX['P6'].value
I'm trying to get just the ['D4'] to change to D5,D6,D7.. etc up to 25 automatically.
No need for counters or clumsy string conversion: openpyxl provides an API for programmatic access.
for idx, sheet in enumerate(sheetlist, start=4):
    wsX = wb[sheet]
    cell = ws2.cell(row=idx, column=4)  # column 4 is column D
    cell.value = wsX['P6'].value  # copy the value, not the cell object
for i, sheet in enumerate(sheetlist):
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    cell_no = 'D' + str(i + 4)
    ws2[cell_no] = wsX['P6'].value
Write this outside of the loop:
x = 'D4'
Write this at the end of the loop body, after using ws2[x]:
x = x[0] + str(int(x[1:])+1)
Try this one... it's commented so you can understand what it's doing.
# counter for the target row
i = 4
for sheet in sheetlist:
    # stop after D25; an inner while loop here would fill D4 to D25 from the first sheet only
    if i > 25:
        break
    wsX = wb.get_sheet_by_name('{}'.format(sheet))
    # dynamic way to get the cell
    cell1 = 'D' + str(i)
    ws2[cell1] = wsX['P6'].value
    # incrementing counter
    i += 1
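For reference, a self-contained version of the enumerate approach that can be run without an existing file. The workbook, the sheet names, and the values are made up for the example; only the P6 to D4, D5, ... pattern comes from the question.

```python
from openpyxl import Workbook

# Hypothetical workbook built in memory: one summary sheet plus three data sheets.
wb = Workbook()
ws2 = wb.active
ws2.title = "Summary"

sheetlist = ["Jan", "Feb", "Mar"]
for n, name in enumerate(sheetlist, start=1):
    ws = wb.create_sheet(name)
    ws["P6"] = n * 10  # made-up value in the source cell

# Copy P6 of each sheet into D4, D5, D6, ... of the summary sheet.
for row, name in enumerate(sheetlist, start=4):
    ws2.cell(row=row, column=4, value=wb[name]["P6"].value)

print([ws2.cell(row=r, column=4).value for r in range(4, 7)])
```

Starting enumerate at 4 makes the loop counter double as the target row, so no separate counter or string building is needed.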