I tried searching but didn't find anything relevant, or it may have slipped past me.
What I want is pretty specific. I have a list of pandas DataFrames, and I want to check whether the DataFrame created in the current step/workflow already exists in the list; if it does, skip it, otherwise append it. I tried the following:
if df not in best_dfs:
    # process something here
    best_dfs.append(df)
else:
    pass
This is how you would check whether a list contains an object of an ordinary type. But when I do the same with DataFrames, I receive the following error:
Traceback (most recent call last):
File "C:/Projects/Barclays/Email Analytics/POC - Stop Cheque Classification/03_CodeBase/CodeBase/utils/FindBestDf.py", line 239, in <module>
print(obj.find_(dfs))
File "C:/Projects/Barclays/Email Analytics/POC - Stop Cheque Classification/03_CodeBase/CodeBase/utils/FindBestDf.py", line 19, in find_
r = self.__driver(list_of_df)
File "C:/Projects/Barclays/Email Analytics/POC - Stop Cheque Classification/03_CodeBase/CodeBase/utils/FindBestDf.py", line 201, in __driver
if v[0] not in best_dfs:
File "C:\Users\IBM_ADMIN\Anaconda3\lib\site-packages\pandas\core\ops.py", line 1296, in f
return self._compare_frame(other, func, str_rep)
File "C:\Users\IBM_ADMIN\Anaconda3\lib\site-packages\pandas\core\frame.py", line 3670, in _compare_frame
raise ValueError('Can only compare identically-labeled '
ValueError: Can only compare identically-labeled DataFrame objects
How do I tackle this? Is there a workaround?
Any help will be greatly appreciated.
Thanks
Probably not the most efficient way, but this works for pandas:
if not any(df.equals(x) for x in df_list):
    df_list.append(df)
pandas has a built-in method to check for DataFrame equality, df.equals(). You iterate over df_list comparing each element against df, and append only if none of the comparisons returns True (i.e. the same df does not already exist in the list). Unlike the == operator, df.equals() does not raise for differently-labeled frames; it simply returns False.
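A self-contained sketch of that pattern (the helper name append_if_new and the sample frames are invented for illustration, not from the question):

```python
import pandas as pd

def append_if_new(df, df_list):
    """Append df to df_list only if an identical DataFrame is not already there."""
    # DataFrame.equals() returns False for differently-labeled frames
    # instead of raising ValueError the way the == operator does.
    if not any(df.equals(existing) for existing in df_list):
        df_list.append(df)
    return df_list

best_dfs = []
a = pd.DataFrame({'x': [1, 2]})
b = pd.DataFrame({'y': [3.0]})       # differently labeled: == comparison would raise
append_if_new(a, best_dfs)
append_if_new(b, best_dfs)
append_if_new(a.copy(), best_dfs)    # identical content, so it is not appended again
print(len(best_dfs))  # 2
```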
I am trying to create a script that will take in user info and populate word templates with the information.
I keep getting the following error and I don't understand why:
TypeError: merge() argument after ** must be a mapping, not str
My script begins by gathering information from the user and storing it in a dictionary. Then the following code is executed:
stress_notes_document = MailMerge(os.path.join(new_path, new_notes))
stress_notes_document.merge(
    TR_num=packet_info['TR#'],
    pckg_num=packet_info['Package#'],
    TED_num=packet_info['TED#'],
    Charge_Line=packet_info['Charge Line'],
    Change_num=packet_info['Change#'],
    Installation_list=packet_list['Installations list'],
    Drawings_list=packet_list['Drawings list'],
    Designer=packet_info['Designer'],
    phone_number_designer=packet_info['Phone Number of designer'],
    Date_in=packet_info['Date in'],
    Stress_Due_Date=packet_info['Stress Due Date'],
    Date_out=packet_info['Date out'],
    model=packet_info['model'],
    Customer=packet_info['Customer'],
    Effectivity=packet_info['Effectivity'],
    panel_excel='new_panel')
stress_notes_document.write(os.path.join(new_path, new_notes + "ver A"))
The error happens when I execute the second statement, stress_notes_document.merge(...). I am trying to assign values from my dictionary to merge fields in the Word document.
Any suggestions?
Edit: I am using this as a guide: http://pbpython.com/python-word-template.html
The examples shown there pass strings to the merge() function.
Here is the full error :
Traceback (most recent call last):
File "<ipython-input-1-e67354559525>", line 1, in <module>
runfile('C:/Python_All/python_scripts/data_gather.py', wdir='C:/Python_All/python_scripts')
File "C:\Python_All\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Python_All\Anaconda\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Python_All/python_scripts/data_gather.py", line 114, in <module>
Effectivity = packet_info['Effectivity'])
File "C:\Python_All\Anaconda\lib\site-packages\mailmerge.py", line 176, in merge
self.merge_rows(field, replacement)
File "C:\Python_All\Anaconda\lib\site-packages\mailmerge.py", line 219, in merge_rows
self.merge([row], **row_data)
TypeError: merge() argument after ** must be a mapping, not str
The reason for the error is that one of the parameter values you pass to merge() is a list, not a string.
merge(), as described in the docs, lets you pass a list of dictionaries as a shortcut to the merge_rows() function. So if you pass code like the snippet below (taken from the docs), it runs merge_rows() on the given list.
document.merge(field1='docx Mail Merge',
               col1=[
                   {'col1': 'A'},
                   {'col1': 'B'},
               ])
Now, in your code one of the values you provide (packet_list['Installations list'], per the comments) is a list, so merge() decides to run merge_rows() on it. But the format of your list does not match the expected one: merge_rows() expects a list with dictionaries as elements, while in your code the elements are strings. Hence the error when merge_rows() tries to read each element as a dictionary.
To fix this, either convert the list packet_list['Installations list'] into a string, for example:
",".join(packet_list['Installations list'])
or convert that list into the expected list-of-dictionaries format.
Whichever makes sense.
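To make the two fixes concrete, here is a minimal sketch ('Installation' is a made-up merge-field name and the list contents are invented):

```python
# Illustrative data; 'Installation' is a made-up merge-field name.
installations = ['Unit A', 'Unit B', 'Unit C']

# Option 1: join the list into one string for a plain merge field.
as_string = ", ".join(installations)

# Option 2: reshape into the list-of-dicts format merge_rows() expects,
# one dict per template row, keyed by the merge-field name.
as_rows = [{'Installation': item} for item in installations]

print(as_string)   # Unit A, Unit B, Unit C
print(as_rows[0])  # {'Installation': 'Unit A'}
```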
for line in open('transactions.dat', 'r'):
    item = line.rstrip('\n').split(',')
    custid = item[2]
    amt = item[4]
    if custid in cust:
        amt1 = int(cust[custid]) + int(amt)
        cust[custid] = amt1
    else:
        cust[custid] = [amt]
I am trying to check whether the customer ID is already in the dictionary; if it is, add the previous amount and the new amount together for that customer, otherwise store the amount as a new entry. But I am getting this error:
Traceback (most recent call last):
File "<pyshell#74>", line 7, in <module>
amt1=int(cust[custid])+int(amt)
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
some transaction data is like:
101300101,2016-09-03,376248582,1013,10.92
109400132,2016-09-03,391031719,1094,36.72
136100107,2016-09-03,391031719,1361,28.77
Did you try using defaultdict? It would make your job much easier. (Incidentally, the root cause of your error is that you store the first amount as a one-element list, cust[custid] = [amt], and then later call int() on that list.)
from collections import defaultdict

cust = defaultdict(int)
for line in open('transactions.dat', 'r'):
    item = line.rstrip('\n').split(',')
    custid = item[2]
    amt = item[4]
    cust[custid] += float(amt)
Also, why do you cast amt to int? It doesn't look like an integer in the sample lines you posted. If you really want integers, change float(amt) to int(float(amt)).
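Putting it together with the sample lines from the question (read from an in-memory buffer here instead of transactions.dat):

```python
from collections import defaultdict
from io import StringIO

# The three sample lines from the question, read from an in-memory
# buffer here instead of transactions.dat.
sample = StringIO(
    "101300101,2016-09-03,376248582,1013,10.92\n"
    "109400132,2016-09-03,391031719,1094,36.72\n"
    "136100107,2016-09-03,391031719,1361,28.77\n"
)

cust = defaultdict(float)  # float, since the sample amounts have decimals
for line in sample:
    item = line.rstrip('\n').split(',')
    custid, amt = item[2], item[4]
    cust[custid] += float(amt)

print(dict(cust))  # customer 391031719 accumulates 36.72 + 28.77
```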
When appending to a list in Python, I am getting the error:
Traceback (most recent call last):
File "/Volumes/HARDRIVE/Java/Python/Test.py", line 16, in <module>
cities.append([1][i])
IndexError: list index out of range
The list cities is initialized here:
cities = [[0 for x in range(math.factorial(CITIES)+3)] for x in range(math.factorial(CITIES)+3)]
Why is it producing this error when there is obviously enough space for the append operation (I gave the list three slots more than it needed)? What should I do to fix it? This is the loop that contains the line of code:
for i in range(0, CITIES):
    cities.append([1][i])
    cities.append([1][i])
    holder = cities[0][i]
    cities[0][i] = cities[CITIES+1][i]
    cities[CITIES+1][i] = holder
Thanks
I think you might actually want to append a new list onto your existing lists:
cities.append([1, i, 0])
In your current code, [1][i] builds the one-element literal list [1] and then indexes it with i, which fails for any i > 0. As an aside, you can reproduce the issue easily, as mentioned in the comments, without any appending at all:
for i in range(3):
    try:
        print(i, [1][i])
    except IndexError:
        print("LIST:[1] has no index", i)
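If the point of the loop was only the row swap (a guess at the intent here), the appends can be dropped entirely, since cities is already pre-sized:

```python
import math

CITIES = 3  # a small value for illustration
n = math.factorial(CITIES) + 3
cities = [[0 for x in range(n)] for x in range(n)]

# The grid is already pre-sized, so no append is needed; swap rows 0 and
# CITIES+1 element by element (what the original loop seems to attempt):
for i in range(CITIES):
    cities[0][i], cities[CITIES + 1][i] = cities[CITIES + 1][i], cities[0][i]

print(len(cities))  # math.factorial(3) + 3 == 9
```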
This is what I have:
SomeTable.select().where(reduce(operator.or_, (SomeTable.stuff == entry for entry in big_list)))
The problem arises when I have a relatively large list of elements in big_list and I get this:
RuntimeError: maximum recursion depth exceeded
Is there another way to approach this that doesn't involve splitting up the list into several chunks?
I tried the suggestion to use any(); here's the error:
Traceback (most recent call last):
File "C:/Users/f9xk3li/Documents/GitHub/leoshop_web/leoshop_web/data_models/data_model.py", line 347, in <module>
search_bins_all("BoA 0")
File "C:/Users/f9xk3li/Documents/GitHub/leoshop_web/leoshop_web/data_models/data_model.py", line 179, in search_bins_all
for d in generator.order_by(SomeTable.RetrievedDate.desc()):
File "C:\Users\f9xk3li\AppData\Local\Continuum\Anaconda\lib\site-packages\peewee.py", line 282, in inner
clone = self.clone() # Assumes object implements `clone`.
File "C:\Users\f9xk3li\AppData\Local\Continuum\Anaconda\lib\site-packages\peewee.py", line 2202, in clone
return self._clone_attributes(query)
File "C:\Users\f9xk3li\AppData\Local\Continuum\Anaconda\lib\site-packages\peewee.py", line 2412, in _clone_attributes
query = super(SelectQuery, self)._clone_attributes(query)
File "C:\Users\f9xk3li\AppData\Local\Continuum\Anaconda\lib\site-packages\peewee.py", line 2206, in _clone_attributes
query._where = self._where.clone()
AttributeError: 'bool' object has no attribute 'clone'
And here's the code:
generator = SomeTable.select()
generator = generator.where(any(SomeTable.BIN == entry for entry in big_list))
for d in generator:
    ....
Try ...where(SomeTable.BIN.in_(big_list))
Peewee restricts what can be used in a where() clause so that the expressions are compiled to SQL rather than evaluated in Python. Python's built-in any() collapses the field expressions to a plain bool, which is why you see 'bool' object has no attribute 'clone'.
http://docs.peewee-orm.com/en/latest/peewee/querying.html#query-operators
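To see why the original reduce(operator.or_, ...) form can blow the recursion limit while an IN clause does not, here is a toy stand-in for an ORM expression node (the Expr class is invented; peewee is not involved): chaining | builds a left-nested tree whose depth grows linearly with the list, and the ORM presumably walks such a tree recursively when cloning or compiling the query.

```python
import operator
from functools import reduce

# Toy stand-in for an ORM expression node (Expr is invented for
# illustration; peewee itself is not needed to see the shape of the problem).
class Expr:
    def __init__(self, left, right=None):
        self.left, self.right = left, right

    def __or__(self, other):
        # Each | wraps the accumulated tree one level deeper on the left.
        return Expr(self, other)

    def depth(self):
        d, node = 1, self
        while isinstance(node.left, Expr):
            d += 1
            node = node.left
        return d

big_list = [Expr(i) for i in range(1000)]
tree = reduce(operator.or_, big_list)
print(tree.depth())  # depth grows linearly with len(big_list)
```

An IN clause, by contrast, is a single flat node no matter how long the list is.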
To expand on Jacob's comment on the accepted answer: you can use subqueries rather than resolving all the IDs first.
E.g.
admin_users = User.select().where(User.is_admin == True)
admin_messages = Message.select().where(Message.user.in_(admin_users))
I am trying to sort dictionaries in MongoDB. However, I get the ValueError "too many values to unpack", which I think implies there are too many values in each dictionary (there are 16 values in each one). This is my code:
FortyMinute.find().sort(['Rank', 1])
Anyone know how to get around this?
EDIT: Full traceback
Traceback (most recent call last):
File "main.py", line 33, in <module>
main(sys.argv[1:])
File "main.py", line 21, in main
fm.readFortyMinute(args[0])
File "/Users/Yih-Jen/Documents/Rowing Project/FortyMinute.py", line 71, in readFortyMinute
writeFortyMinute(FortyMinData)
File "/Users/Yih-Jen/Documents/Rowing Project/FortyMinute.py", line 104, in writeFortyMinute
FortyMinute.find().sort(['Rank', 1])
File "/Users/Yih-Jen/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 692, in sort
self.__ordering = helpers._index_document(keys)
File "/Users/Yih-Jen/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 65, in _index_document
for (key, value) in index_list:
ValueError: too many values to unpack
You pass the key and direction as separate arguments, like so:
FortyMinute.find().sort('Rank', 1)
When you pass a list, sort() expects each element to be a (key, direction) pair; it tried to unpack the string 'Rank' as such a pair, which is where "too many values to unpack" came from. It is only when you're sorting on multiple keys that you group keys and directions, and then each pair must be a tuple inside a list, like so:
FortyMinute.find().sort([('Rank', 1), ('Date', 1)])
Pro-tip: the Cursor.sort documentation linked below recommends using pymongo.ASCENDING and pymongo.DESCENDING instead of 1 and -1; in general, prefer descriptive constants over magic numbers in your code, as so:
FortyMinute.find().sort('Rank', pymongo.DESCENDING)
Finally, if you are so inclined, you can sort the results with Python's built-in sorted(), as another answerer mentioned; sorted() accepts any iterable, not just sequences, but doing the sort client-side may be less efficient and is nonstandard:
sorted(FortyMinute.find(), key=key_function)
where key_function might return the Rank field of a document.
Link to the official documentation
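For the client-side route, a minimal sketch with plain dictionaries standing in for FortyMinute.find() results (the documents and field values are made up):

```python
from operator import itemgetter

# Hypothetical documents standing in for FortyMinute.find() results.
docs = [
    {'Rank': 3, 'Name': 'C'},
    {'Rank': 1, 'Name': 'A'},
    {'Rank': 2, 'Name': 'B'},
]

# Client-side equivalent of .sort('Rank', 1): sort by the Rank field.
by_rank = sorted(docs, key=itemgetter('Rank'))
print([d['Name'] for d in by_rank])  # ['A', 'B', 'C']

# A multi-field sort like .sort([('Rank', 1), ('Name', 1)]) becomes a
# composite key; numeric fields can be negated to reverse direction.
by_rank_then_name = sorted(docs, key=lambda d: (d['Rank'], d['Name']))
```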
If you want Mongo/pymongo to do the sorting:
FortyMinute.find().sort('Rank', 1)
If you want to sort on multiple fields, pass a list of (field, direction) tuples:
FortyMinute.find().sort([('Rank', 1), ('other', -1)])
There are also constants that make the direction clearer:
FortyMinute.find().sort('Rank', pymongo.DESCENDING)
If you want to sort in Python instead, you first have to materialize the results and then use a Python sorting method:
sorted(FortyMinute.find(), key=<some key...>)