Adding Data to column based on content of other cells (Python/Excel)

Hi everybody,
I'm trying to automate the allocation of inventory. Since I am not an experienced programmer, I am having difficulty working out the logic.
The goal is to combine two Excel files and add a column containing the responsible persons/departments. So far I have managed to combine the Excel files and add the column "Reviser". Now this column must be filled with the right persons/departments.
The logic behind this is not very difficult, but I don't really know how to implement it with Python/pandas.
I already tried np.where, but that doesn't solve the problem completely.
Here you can see the logic behind the assignment of the reviser:
[Logic behind assignment][1]
Thanks for your help!
My current code:
import pandas as pd
import numpy as np
from openpyxl import Workbook
Q_Stock = pd.read_excel("C:\\Users\\Lucas\\Desktop\\Excel_Test\\Q Bestand.xlsx",usecols=["Bestandsqualifikation", "Inhalt", "Benutzerfeld 1", "Benutzerfeld 2","Material", "Externer Barcode 2", "Handling Unit"])
""" Q_Bestand["Bearbeiter"] = "" """
Q_Stocknew = Q_Stock[0:-1]
S_Stock = pd.read_excel("C:\\Users\\Lucas\\Desktop\\Excel_Test\\S Bestand.xlsx",usecols=["Bestandsqualifikation", "Inhalt", "Benutzerfeld 1", "Benutzerfeld 2","Material", "Externer Barcode 2", "Handling Unit" ])
""" S_Bestand["Bearbeiter"] = "" """
S_Stocknew = S_Stock[0:-1]
complete_list = [S_Stocknew, Q_Stocknew]
Combined = pd.concat(complete_list)
df = pd.DataFrame
def bar(df):
    if Combined['Inhalt'] ==np.nan:
        return np.nan
    elif str(Combined['Inhalt']).contains("QV"):
        return "Distribution"
    elif str(Combined['Inhalt']).contains("QP"):
        return "Production"
    elif (Combined['Benutzerfeld 2'] == "ruckschnitt") and (str(Combined['Material']).contains("^09")):
        return "Person 1"
df["Reviser"] = Combined.apply(bar, axis = 1)
Combined.to_excel(r'C:\\Users\Lucas\\Desktop\\Excel_Test\\Test.xlsx', index = True)
This now raises the following error:
C:\Python\Code\venv\Scripts\python.exe C:/Python/Code/SAP_Automatisieren.py
Traceback (most recent call last):
File "C:/Python/Code/SAP_Automatisieren.py", line 29, in <module>
df["Reviser"] = Combined.apply(bar, axis = 1)
File "C:\Python\Code\venv\lib\site-packages\pandas\core\frame.py", line 6878, in apply
return op.get_result()
File "C:\Python\Code\venv\lib\site-packages\pandas\core\apply.py", line 186, in get_result
return self.apply_standard()
File "C:\Python\Code\venv\lib\site-packages\pandas\core\apply.py", line 295, in apply_standard
result = libreduction.compute_reduction(
File "pandas_libs\reduction.pyx", line 620, in pandas._libs.reduction.compute_reduction
File "pandas_libs\reduction.pyx", line 128, in pandas._libs.reduction.Reducer.get_result
File "C:/Python/Code/SAP_Automatisieren.py", line 19, in bar
if Combined['Inhalt'] ==np.nan:
File "C:\Python\Code\venv\lib\site-packages\pandas\core\generic.py", line 1478, in __nonzero__
raise ValueError(
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
import pandas as pd
import numpy as np
from openpyxl import Workbook
Q_Stock = pd.read_excel("C:\\Users\\Lucas\\Desktop\\Excel_Test\\Q Bestand.xlsx",usecols=["Bestandsqualifikation", "Inhalt", "Benutzerfeld 1", "Benutzerfeld 2","Material", "Externer Barcode 2", "Handling Unit"])
""" Q_Bestand["Bearbeiter"] = "" """
Q_Stocknew = Q_Stock[0:-1]
S_Stock = pd.read_excel("C:\\Users\\Lucas\\Desktop\\Excel_Test\\S Bestand.xlsx",usecols=["Bestandsqualifikation", "Inhalt", "Benutzerfeld 1", "Benutzerfeld 2","Material", "Externer Barcode 2", "Handling Unit" ])
""" S_Bestand["Bearbeiter"] = "" """
S_Stocknew = S_Stock[0:-1]
complete_list = [S_Stocknew, Q_Stocknew]
Combined = pd.concat(complete_list)
df = pd.DataFrame
def bar(Combined):
    if Combined['Inhalt'] ==np.nan:
        return np.nan
    elif str(Combined['Inhalt']).contains("QV"):
        return "Distribution"
    elif str(Combined['Inhalt']).contains("QP"):
        return "Production"
    elif (Combined['Benutzerfeld 2'] == "ruckschnitt") and (str(Combined['Material']).contains("^09")):
        return "Person 1"
df["Reviser"] = Combined.apply(bar, axis = 1)
Error:
C:\Python\Code\venv\Scripts\python.exe C:/Python/Code/SAP_Automatisieren.py
Traceback (most recent call last):
File "C:/Python/Code/SAP_Automatisieren.py", line 28, in <module>
df["Reviser"] = Combined.apply(bar, axis = 1)
File "C:\Python\Code\venv\lib\site-packages\pandas\core\frame.py", line 6878, in apply
return op.get_result()
File "C:\Python\Code\venv\lib\site-packages\pandas\core\apply.py", line 186, in get_result
return self.apply_standard()
File "C:\Python\Code\venv\lib\site-packages\pandas\core\apply.py", line 295, in apply_standard
result = libreduction.compute_reduction(
File "pandas_libs\reduction.pyx", line 620, in pandas._libs.reduction.compute_reduction
File "pandas_libs\reduction.pyx", line 128, in pandas._libs.reduction.Reducer.get_result
File "C:/Python/Code/SAP_Automatisieren.py", line 21, in bar
elif str(Combined['Inhalt']).contains("QV"):
AttributeError: 'str' object has no attribute 'contains'

You can write a function like the one below and apply it to your DataFrame row by row. A nested np.where would also work, but a function with if/elif branches will probably be more readable if you are coming from the Excel world. With axis=1 each call receives a single row, so missing values are tested with pd.isna (a comparison with np.nan is always False) and substrings with Python's in operator (plain str objects have no .contains method):
def bar(row):
    content = row['Inventory Qualifikation']
    if pd.isna(content):
        return np.nan
    elif "QV" in str(content):
        return "Distribution"
    elif "QP" in str(content):
        return "Production"
    elif row['Userfield 2'] == "ruckschnitt" and str(row['material']).startswith("09"):
        return "Person 1"
    return np.nan  # no rule matched
df['reviser'] = df.apply(bar, axis=1)
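
For larger files, the same rules could also be expressed without a row-wise apply. The following is only a sketch using np.select: the sample rows are made up, the column names are taken from the question, and mapping unmatched or missing rows to an empty string is an assumption rather than part of the original logic.

```python
import numpy as np
import pandas as pd

# Made-up sample rows standing in for the combined stock list.
combined = pd.DataFrame({
    "Inhalt": ["QV-123", "QP-7", None, "XY"],
    "Benutzerfeld 2": ["", "", "ruckschnitt", "ruckschnitt"],
    "Material": ["0100", "0200", "0912", "0955"],
})

inhalt = combined["Inhalt"].fillna("").astype(str)
material = combined["Material"].astype(str)

# Conditions are checked in order; the first one that is True wins,
# which mirrors the if/elif chain of the row-wise function.
conditions = [
    combined["Inhalt"].isna(),
    inhalt.str.contains("QV", regex=False),
    inhalt.str.contains("QP", regex=False),
    (combined["Benutzerfeld 2"] == "ruckschnitt") & material.str.startswith("09"),
]
# Assumption: rows with a missing "Inhalt" get an empty reviser.
choices = ["", "Distribution", "Production", "Person 1"]

combined["Reviser"] = np.select(conditions, choices, default="")
print(combined)
```

Because np.select evaluates whole columns at once, each rule is a boolean Series and `&` replaces the scalar `and` used inside the function.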

Related

if df condition is met add x value to variable

What I want to do is: if a condition on the DataFrame is met, assign a value to a variable.
Example:
local_bid = 0
df.loc[["Entity"] == "Keyword"]
then
local_bid = df["Bid"]
I tried
df.loc[["Entity"] == "Keyword", local_bid] = df["Bid"]
but it didn't work:
Traceback (most recent call last):
File "/home/shaumne/Desktop/zorba/Sp_limpr.py", line 17, in <module>
s =limpr.loc[["Entity"] == "Keyword", local_bid] = limpr["Bid"]
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 818, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1703, in _setitem_with_indexer
key, _ = convert_missing_indexer(idx)
File "/home/shaumne/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 2585, in convert_missing_indexer
raise KeyError("cannot use a single bool to index into setitem")
KeyError: 'cannot use a single bool to index into setitem'

You need to compare the column df["Entity"], not the list ["Entity"]:
local_bid = 0
df.loc[df["Entity"] == "Keyword", local_bid] = df["Bid"]
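
If the aim is to end up with a plain variable rather than a new column, one possible reading of the question is sketched below. The sample data is invented, and falling back to 0 when nothing matches is an assumption based on the initial local_bid = 0 in the question.

```python
import pandas as pd

# Invented sample data with the two columns named in the question.
df = pd.DataFrame({
    "Entity": ["Keyword", "Campaign", "Keyword"],
    "Bid": [0.5, 1.2, 0.8],
})

# Boolean mask built from the column, not from the list ["Entity"].
is_keyword = df["Entity"] == "Keyword"

# All bids of the matching rows, as a Series.
local_bid = df.loc[is_keyword, "Bid"]

# A single value: the first matching bid, or 0 if no row matches.
first_bid = local_bid.iloc[0] if not local_bid.empty else 0
```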

TypeError: 'Int64Index([], dtype='int64')' is an invalid key

This class should work with a given CSV file, and the function is responsible for updating the CSV if the price in the input is lower. However, I get an error message if I work with self.df instead of assigning self.df to a df variable outside the loop.
The format of the CSV file is "City,IATA Code,Lowest Price", and an example input for the function could be "{'LTN': {'Lowest Price': 11}}".
import pandas
class DataManager:
    def __init__(self):
        self.csv = pandas.read_csv("prices.csv")
        self.df = pandas.DataFrame(self.csv)
    def update_price_list(self, new_prices):
        data = pandas.DataFrame(new_prices).transpose()
        for entry in data.iterrows():
            index = self.df.index[self.df["IATA Code"] == "IST"]
            index_in_df = self.df.index[self.df["IATA Code"] == entry[0]]
            if self.df.at[index_in_df, "Lowest Price"] > float(entry[1]):
                self .df = self.df.at[index_in_df, "Lowest Price"] = float(entry[1])
DataManager().update_price_list({'LTN': {'Lowest Price': 11}})
The error message is "TypeError: 'Int64Index([], dtype='int64')' is an invalid key".
If I assign the DataFrame to a variable outside the for loop, there is no problem,
but I feel like that is not the right way. How do I avoid this error?
Complete error message:
Traceback (most recent call last):
File "C:/Users/Flori/PycharmProjects/Kurs-Python/Python-Course/6. Week/Day 38/Flight_price_finder/wtf.py", line 19, in <module>
DataManager().update_price_list({'LTN': {'Lowest Price': 11}})
File "C:/Users/Flori/PycharmProjects/Kurs-Python/Python-Course/6. Week/Day 38/Flight_price_finder/wtf.py", line 16, in update_price_list
if self.df.at[index_in_df, "Lowest Price"] > float(entry[1]):
File "C:\Users\Flori\PycharmProjects\Kurs-Python\venv\lib\site-packages\pandas\core\indexing.py", line 2275, in __getitem__
return super().__getitem__(key)
File "C:\Users\Flori\PycharmProjects\Kurs-Python\venv\lib\site-packages\pandas\core\indexing.py", line 2222, in __getitem__
return self.obj._get_value(*key, takeable=self._takeable)
File "C:\Users\Flori\PycharmProjects\Kurs-Python\venv\lib\site-packages\pandas\core\frame.py", line 3579, in _get_value
loc = engine.get_loc(index)
File "pandas\_libs\index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 82, in pandas._libs.index.IndexEngine.get_loc
TypeError: 'Int64Index([], dtype='int64')' is an invalid key
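
One way to read the traceback: .at expects a single scalar row label, whereas self.df.index[...] returns an Index object (here an empty one, meaning no row matched). The sketch below avoids that by selecting rows with a boolean mask and .loc. It is a standalone function rather than the original class, and the sample prices are made up.

```python
import pandas as pd

def update_price_list(df, new_prices):
    """Lower 'Lowest Price' for each IATA code when the new price is cheaper."""
    for code, fields in new_prices.items():
        new_price = float(fields["Lowest Price"])
        # Rows with this IATA code whose stored price is higher than the new one.
        # If the code is not in the table, the mask is all False and nothing changes.
        mask = (df["IATA Code"] == code) & (df["Lowest Price"] > new_price)
        df.loc[mask, "Lowest Price"] = new_price
    return df

# Made-up data in the "City,IATA Code,Lowest Price" layout from the question.
prices = pd.DataFrame({
    "City": ["London", "Istanbul"],
    "IATA Code": ["LTN", "IST"],
    "Lowest Price": [54.0, 95.0],
})
updated = update_price_list(prices, {"LTN": {"Lowest Price": 11}})
print(updated)
```

Inside a class, the same body would operate on self.df directly; there is no need to reassign self.df, because .loc modifies the DataFrame in place.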

Outlook Calendar Export Type Issue

I have the following code, which is meant to read my Outlook calendar and list all participants in the meetings I have scheduled. I am running into the following error related to data types. I believe the real issue is that no events are being retrieved, because when I print the appointments list just before the error, it is empty. Thoughts?
Code:
import datetime as dt
import pandas as pd
import win32com.client
def get_calendar(begin,end):
    outlook = win32com.client.Dispatch('Outlook.Application').GetNamespace('MAPI')
    calendar = outlook.getDefaultFolder(9).Items
    calendar.IncludeRecurrences = True
    calendar.Sort('[Start]')
    restriction = "[Start] >= '" + begin.strftime('%m/%d/%Y') + "' AND [END] <= '" + end.strftime('%m/%d/%Y') + "'"
    calendar = calendar.Restrict(restriction)
    return calendar
def get_appointments(calendar,subject_kw = None,exclude_subject_kw = None, body_kw = None):
    if subject_kw == None:
        appointments = [app for app in calendar]
    else:
        appointments = [app for app in calendar if subject_kw in app.subject]
    if exclude_subject_kw != None:
        appointments = [app for app in appointments if exclude_subject_kw not in app.subject]
    cal_subject = [app.subject for app in appointments]
    cal_start = [app.start for app in appointments]
    cal_end = [app.end for app in appointments]
    cal_body = [app.body for app in appointments]
    df = pd.DataFrame({'subject': cal_subject,
                       'start': cal_start,
                       'end': cal_end,
                       'body': cal_body})
    return df
def make_cpd(appointments):
    appointments['Date'] = appointments['start']
    appointments['Hours'] = (appointments['end'] - appointments['start']).dt.seconds/3600
    appointments.rename(columns={'subject':'Meeting Description'}, inplace = True)
    appointments.drop(['start','end'], axis = 1, inplace = True)
    summary = appointments.groupby('Meeting Description')['Hours'].sum()
    return summary
final = r"C:\Users\rcarmody\Desktop\Python\Accelerators\Outlook Output.xlsx"
begin = dt.datetime(2021,1,1)
end = dt.datetime(2021,5,12)
print(begin)
print(end)
cal = get_calendar(begin, end)
appointments = get_appointments(cal, subject_kw = 'weekly', exclude_subject_kw = 'Webcast')
result = make_cpd(appointments)
result.to_excel(final)
Error:
Traceback (most recent call last):
File "C:\Users\Desktop\Python\Accelerators\outlook_meetings.py", line 50, in <module>
result = make_cpd(appointments)
File "C:\Users\Desktop\Python\Accelerators\outlook_meetings.py", line 34, in make_cpd
appointments['Hours'] = (appointments['end'] - appointments['start']).dt.seconds/3600
File "C:\Users\AppData\Roaming\Python\Python39\site-packages\pandas\core\generic.py", line 5461, in __getattr__
return object.__getattribute__(self, name)
File "C:\Users\rcarmody\AppData\Roaming\Python\Python39\site-packages\pandas\core\accessor.py", line 180, in __get__
accessor_obj = self._accessor(obj)
File "C:\Users\AppData\Roaming\Python\Python39\site-packages\pandas\core\indexes\accessors.py", line 494, in __new__
raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
[Finished in 1.2s]
New Error:
Traceback (most recent call last):
File "C:\Users\Desktop\Python\Accelerators\outlook_meetings.py", line 50, in <module>
result = make_cpd(appointments)
File "C:\Users\Desktop\Python\Accelerators\outlook_meetings.py", line 34, in make_cpd
appointments['Hours'] = (appointments['end'] - appointments['start']) / pd.Timedelta(hours=1)
File "C:\Users\\AppData\Roaming\Python\Python39\site-packages\pandas\core\ops\common.py", line 65, in new_method
return method(self, other)
File "C:\Users\AppData\Roaming\Python\Python39\site-packages\pandas\core\arraylike.py", line 113, in __truediv__
return self._arith_method(other, operator.truediv)
File "C:\Users\\AppData\Roaming\Python\Python39\site-packages\pandas\core\series.py", line 4998, in _arith_method
result = ops.arithmetic_op(lvalues, rvalues, op)
File "C:\Users\\AppData\Roaming\Python\Python39\site-packages\pandas\core\ops\array_ops.py", line 185, in arithmetic_op
res_values = op(lvalues, rvalues)
File "pandas\_libs\tslibs\timedeltas.pyx", line 1342, in pandas._libs.tslibs.timedeltas.Timedelta.__rtruediv__
numpy.core._exceptions.UFuncTypeError: ufunc 'true_divide' cannot use operands with types dtype('float64') and dtype('<m8[ns]')

Subtracting two datetime objects yields a timedelta object. To get hours from a timedelta, you can divide it by a one-hour timedelta:
import numpy as np
hours = timedelta_object / np.timedelta64(1, "h")
Or, in a more pandas-only style:
hours = timedelta_object / pd.Timedelta(hours=1)
So in your case, you would use:
appointments['Hours'] = (appointments['end'] - appointments['start']) / pd.Timedelta(hours=1)
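
The division only works when both columns really hold datetime values. As a rough sketch, assuming the start and end values can be parsed by pd.to_datetime, the columns could be converted explicitly first. The appointments below are invented and stand in for what Outlook would return.

```python
import pandas as pd

# Invented appointments; in the question these come from Outlook.
appointments = pd.DataFrame({
    "subject": ["weekly sync", "weekly sync", "planning"],
    "start": ["2021-01-04 09:00", "2021-01-11 09:00", "2021-01-05 13:00"],
    "end": ["2021-01-04 09:30", "2021-01-11 10:00", "2021-01-05 15:00"],
})

# Make sure both columns have a datetime dtype before subtracting.
for col in ("start", "end"):
    appointments[col] = pd.to_datetime(appointments[col])

# Timedelta divided by a one-hour Timedelta gives hours as floats.
appointments["Hours"] = (appointments["end"] - appointments["start"]) / pd.Timedelta(hours=1)

summary = appointments.groupby("subject")["Hours"].sum()
print(summary)
```

Printing appointments.dtypes before the subtraction is a quick way to check whether the columns are datetimes at all, which is worth doing when the list of appointments may be empty.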

Configure event profile in pyalgotrade to look back further than one bar ( eg bards[-2] )

I'm trying to write various predicates on simple candlestick structures. For example, one component of a '3 green candles in a row' predicate would require a look back of -4.
To start simple, I am trying to test this with a 'higher_highs' predicate. If the close of the previous candle is below the current candle's close, the function returns true. Below is my code:
from pyalgotrade import eventprofiler
from pyalgotrade.barfeed import csvfeed
class single_event_strat( eventprofiler.Predicate ):
    def __init__(self,feed):
        pass
    def higher_highs(self, instrument, bards):
        #prev_three = bards[-4]
        #prev_two = bards[-3]
        prev = bards[-2]
        curr = bards[-1]
        if prev.getOpen() < curr.getOpen():
            return True
        return False
    def eventOccurred(self, instrument, bards):
        if self.higher_highs(instrument, bards):
            return True
        else:
            return False
def main(plot):
    feed = csvfeed.GenericBarFeed(0)
    feed.addBarsFromCSV('FCT', "FCT_daily_converted.csv")
    predicate = single_event_strat(feed)
    eventProfiler = eventprofiler.Profiler( predicate, 20, 20)
    eventProfiler.run(feed, True)
    results = eventProfiler.getResults()
    print "%d events found" % (results.getEventCount())
    if plot:
        eventprofiler.plot(results)
if __name__ == "__main__":
    main(True)
However I get an IndexError :
Traceback (most recent call last):
File "C:\Users\David\Desktop\Python\Coursera\Computational Finance\Week2\PyAlgoTrade\Bitfinex\FCT\FCT_single_event_test.py", line 44, in <module>
main(True)
File "C:\Users\David\Desktop\Python\Coursera\Computational Finance\Week2\PyAlgoTrade\Bitfinex\FCT\FCT_single_event_test.py", line 36, in main
eventProfiler.run(feed, True)
File "C:\Python27\lib\site-packages\pyalgotrade\eventprofiler.py", line 215, in run
disp.run()
File "C:\Python27\lib\site-packages\pyalgotrade\dispatcher.py", line 102, in run
eof, eventsDispatched = self.__dispatch()
File "C:\Python27\lib\site-packages\pyalgotrade\dispatcher.py", line 90, in __dispatch
if self.__dispatchSubject(subject, smallestDateTime):
File "C:\Python27\lib\site-packages\pyalgotrade\dispatcher.py", line 68, in __dispatchSubject
ret = subject.dispatch() is True
File "C:\Python27\lib\site-packages\pyalgotrade\feed\__init__.py", line 105, in dispatch
self.__event.emit(dateTime, values)
File "C:\Python27\lib\site-packages\pyalgotrade\observer.py", line 59, in emit
handler(*args, **kwargs)
File "C:\Python27\lib\site-packages\pyalgotrade\eventprofiler.py", line 172, in __onBars
eventOccurred = self.__predicate.eventOccurred(instrument, self.__feed[instrument])
File "C:\Users\David\Desktop\Python\Coursera\Computational Finance\Week2\PyAlgoTrade\Bitfinex\FCT\FCT_single_event_test.py", line 20, in eventOccurred
if self.higher_highs(instrument, bards):
File "C:\Users\David\Desktop\Python\Coursera\Computational Finance\Week2\PyAlgoTrade\Bitfinex\FCT\FCT_single_event_test.py", line 11, in higher_highs
prev = bards[-2]
File "C:\Python27\lib\site-packages\pyalgotrade\dataseries\__init__.py", line 90, in __getitem__
return self.__values[key]
File "C:\Python27\lib\site-packages\pyalgotrade\utils\collections.py", line 141, in __getitem__
return self.__values[key]
IndexError: list index out of range
I'm still trying to figure out how the event profiler works. It's interesting, because the buyongap example also looks back with bards[-2]:
def __gappedDown(self, instrument, bards):
    ret = False
    if self.__stdDev[instrument][-1] is not None:
        prevBar = bards[-2]
        currBar = bards[-1]
        low2OpenRet = (currBar.getOpen(True) - prevBar.getLow(True)) / float(prevBar.getLow(True))
        if low2OpenRet < (self.__returns[instrument][-1] - self.__stdDev[instrument][-1]):
            ret = True
    return ret
However, there it is nested inside the if self.__stdDev[instrument][-1] is not None: statement. My predicate requires no TA indicators, so how can I access the previous bars?

The problem is that on the first call to eventOccurred, bards contains only one item, so bards[-2] raises an IndexError. Check the length of bards first.
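
A length check along those lines might look like the sketch below. To keep it runnable without pyalgotrade, it uses a hypothetical FakeBar stand-in that only models getOpen(), and a plain list in place of the bar data series; it assumes the real series supports len() and negative indexing in the same way.

```python
class FakeBar:
    """Hypothetical stand-in for a pyalgotrade bar; only getOpen() is modelled."""

    def __init__(self, open_price):
        self._open = open_price

    def getOpen(self):
        return self._open


def higher_open(bards, lookback=2):
    """Return True if the previous bar opened below the current one.

    Returns False while fewer than `lookback` bars are available,
    instead of raising IndexError on bards[-2].
    """
    if len(bards) < lookback:
        return False
    prev = bards[-2]
    curr = bards[-1]
    return prev.getOpen() < curr.getOpen()


print(higher_open([FakeBar(10.0)]))
print(higher_open([FakeBar(10.0), FakeBar(11.0)]))
```

For a predicate that needs bards[-4], the same guard would simply use a lookback of 4.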

Pandas Reduction Error when trying to update dataframe

I need to update some of the data in my DataFrame, in the same sense as an UPDATE query in SQL. My current code is as follows:
import pandas
df = pandas.read_csv('filee.csv') # load trades from csv file
def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row
df['LastName'] = df.apply(updateDataframe,axis=1)
However, it returns the following error:
Traceback (most recent call last):
File "test.py", line 11, in <module>
df['LastName'] = df.apply(updateDataframe,axis=1)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2038, in __setitem__
self._set_item(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2085, in _set_item
NDFrame._set_item(self, key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 582, in _set_item
self._data.set(key, value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1459, in set
_set_item(self.items[loc], value)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 1454, in _set_item
block.set(item, arr)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 176, in set
self.values[loc] = value
ValueError: output operand requires a reduction, but reduction is not enabled
How do I resolve this? Or is there a better way to accomplish what I am trying to do?

@Jeff has a good, concise implementation for your problem in the comments above, but if you want to fix the error in your code, try the following.
For the file filee.csv with the following contents:
Name,LastName
Andy,Blue
Joe,Smith
After the else, you need to return a last-name string rather than the whole row object, as shown below:
import pandas
df = pandas.read_csv('filee.csv') # load trades from csv file
def updateDataframe(row):
    if row['Name'] == "Joe":
        return "Black"
    else:
        return row['LastName']
df['LastName'] = df.apply(updateDataframe,axis=1)
print(df)
This results in the following output:
   Name LastName
0  Andy     Blue
1   Joe    Black
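
For an SQL-style UPDATE ... WHERE, a boolean mask with .loc is a common alternative to apply. The sketch below is one such variant, not necessarily the implementation referred to in the comments; it reads the same two-row CSV content from an in-memory string instead of filee.csv.

```python
import io

import pandas as pd

# Same contents as filee.csv above, read from memory for a self-contained example.
csv_text = "Name,LastName\nAndy,Blue\nJoe,Smith\n"
df = pd.read_csv(io.StringIO(csv_text))

# UPDATE df SET LastName = 'Black' WHERE Name = 'Joe'
df.loc[df["Name"] == "Joe", "LastName"] = "Black"

print(df)
```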
