In Pandas, how to check if a dataframe is empty after every manipulation? - python

I'm doing a lot of manipulations on a dataframe, and one of them can return an empty result (which is an acceptable result).
The thing is, if it turns out empty, it crashes on the next line, as in the following code:
NumOfActiveDays = local_input_list.groupby(['DeviceSidID'])['timestamp'].nunique().reset_index().rename(columns={'timestamp': 'NumOfDays'})
NumOfActiveDays = NumOfActiveDays[NumOfActiveDays.NumOfDays >= float(extdf.dict[entity]['days_thresh'])]
local_input_list = local_input_list[local_input_list.DeviceSidID.isin(NumOfActiveDays.loc[:, 'DeviceSidID'])].reset_index(drop=True)
If NumOfActiveDays becomes empty, it will crash the third line...
Is there a better way to check whether a data manipulation ends up empty, instead of doing if df.empty after every line?
Thanks
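One way to avoid repeating the check by hand, sketched under the assumption that each manipulation can be written as a function that takes and returns a DataFrame: a small helper applies the steps in order and stops as soon as the frame is empty. run_steps, keep_active and add_share are hypothetical names (not pandas APIs), and the sample data is made up for illustration.

```python
import pandas as pd


def run_steps(df, steps):
    """Apply each step in order, stopping as soon as the frame is empty."""
    for step in steps:
        if df.empty:  # .empty is a property, not a method
            break
        df = step(df)
    return df


calls = []  # records which steps actually ran


def keep_active(d):
    calls.append("keep_active")
    return d[d["NumOfDays"] >= 3]


def add_share(d):
    calls.append("add_share")
    return d.assign(Share=d["NumOfDays"] / d["NumOfDays"].sum())


days = pd.DataFrame({"DeviceSidID": [1, 2], "NumOfDays": [1, 2]})
result = run_steps(days, [keep_active, add_share])
```

Here the first step filters out every row, so the second step is skipped and the caller only has to check result.empty once at the end.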

Related

For loop's except clause is dropping the whole row, which I do not want

Let me start with the background: I have a DataFrame with a column of hyperlinks. I use a for loop to extract each hyperlink's target attribute and add the results as an appended column.
Result of successful forLoop.
Now let me throw a curveball: a blank/gap. Let's say that there is a gap in the column and Source C is out of the picture. What happens to the for loop then?
Result of unwanted forLoop
What if, instead of deleting the entire row, I want the for loop to put a blank cell there, so that no data is rearranged and Source C has a blank or NaN cell next to it? Does that make sense? What are my options? (Also note that my print() function is not really working as I intend it to.) For what it's worth, ws.cell is an openpyxl operation that accesses a cell of an Excel sheet.
Here is the code, just in case:
links = []
for i in range(2, ws.max_row + 1):  # 2nd arg in range() not inclusive, so add 1
    try:
        links.append(ws.cell(row=i, column=1).hyperlink.target)
    except AttributeError or NaN:
        print('nothing here')
df['link'] = pd.Series(links)
df
I can't see your input data (the input xlsx file), so I can't be sure this solves it. Anyway, have you tried the following?
...
    except AttributeError:  # "AttributeError or NaN" evaluates to just AttributeError
        links.append('')  # still append a blank string to the list
        print('nothing here')
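The idea of always appending something, so that the list stays the same length as the rows, can be sketched without the workbook. This assumes that a cell without a link has hyperlink set to None; the SimpleNamespace objects below are stand-ins for the openpyxl cells, and the source names and URLs are invented.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-ins for ws.cell(...) results: the middle cell has no hyperlink.
cells = [
    SimpleNamespace(hyperlink=SimpleNamespace(target="https://a.example")),
    SimpleNamespace(hyperlink=None),
    SimpleNamespace(hyperlink=SimpleNamespace(target="https://c.example")),
]

# getattr with a default yields None instead of raising AttributeError,
# so every row contributes exactly one entry.
links = [getattr(cell.hyperlink, "target", None) for cell in cells]

df = pd.DataFrame({"source": ["Source A", "Source B", "Source C"]})
df["link"] = links
```

Because the placeholder keeps one entry per row, Source C stays next to its own link and the gap shows up as a missing value on the Source B row.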

Pandas - Can't change datatype of dataframe columns

Downloading some data from here:
http://insideairbnb.com/get-the-data.html
Then
listings = pd.read_csv('listings.csv')
Trying to change types
listings.bathrooms = listings.bathrooms.astype('int64',errors='ignore')
listings.bedrooms = listings.bedrooms.astype('int64',errors='ignore')
listings.beds = listings.beds.astype('int64',errors='ignore')
listings.price = listings.price.replace('[\$,]','',regex=True).astype('float')
listings.price = listings.price.astype('int64',errors='ignore')
I tried some other combinations, but in the end it either raises an error or just doesn't change the datatype.
EDIT: corrected some typos
The apostrophes in the last line are not in the correct place, and the last one is not the correct type: you need ' instead of ` (maybe it was accidentally added because of the code block).
So for me it works like this:
listings.price.astype('int64', errors='ignore')
But if you would like to reassign it to the original variable then you need the same structure as you used in the previous lines:
listings.price = listings.price.astype('int64', errors='ignore')
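One possible reason the integer casts appear to do nothing, assuming the columns contain missing values: NaN cannot be stored in an int64 column, so astype('int64') raises, and errors='ignore' hands back the original column unchanged. A hedged sketch using pandas' nullable 'Int64' dtype (capital I), which can hold missing values; the sample frame is invented and is not the Airbnb data.

```python
import pandas as pd

# Made-up sample with a missing value and currency-formatted prices.
listings = pd.DataFrame({
    "bedrooms": [1.0, 2.0, None],
    "price": ["$1,200.00", "$85.00", "$40.00"],
})

# The nullable integer dtype keeps the missing entry as <NA>.
listings["bedrooms"] = listings["bedrooms"].astype("Int64")

# Strip "$" and "," first, then go through float to a plain int64.
listings["price"] = (
    listings["price"]
    .str.replace(r"[\$,]", "", regex=True)
    .astype("float")
    .astype("int64")
)
```

Checking listings.dtypes afterwards shows whether each cast actually took effect.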

Appending item to list in Python overwriting all existing elements

I am writing a script to read a .osu file and convert it to specific objects. This has to be done multiple times, once for each "hitobject".
The reading part works fine; however, appending the object is the problem.
When appending the object, it seems to overwrite all existing elements in the list. I can not for the life of me figure out why this is happening.
I have tried creating a "temp" list, which stores the objects in a local list instead of the "self.notes" list, still the same issue.
I believe the error is occurring in this part of the file:
if hitobjline != -1:
    hitobjects = self.file_lines[hitobjline+1:]
    for i in hitobjects:
        ln = i[:i.find(':')].split(',')
        new_note = [NoteType.Circle, NoteType.Hold][int(ln[3] == '128')]
        add_note = File.Note
        add_note.NoteTypeByte = ln[3]
        add_note.Note_Number = int(ln[0])
        add_note.Time = int(ln[2])
        add_note.NoteType = new_note
        add_note.Raw = ln
        self.notes.append(add_note)
        print(ln, ln[3], ln[3] == '128', new_note, add_note.NoteType)
For background, the .osu files have a syntax like: x,y,time,type,hit,end:stuff-i-dont-need-to-worry-about
I expected the output of self.notes[0].NoteType to be osureader.NoteType.Hold, as the first line of the file is 192,192,410,128,0,2974:0:0:0:0: (the 128 indicating the 'Hold').
However, I get osureader.NoteType.Circle, the value for the last line of the file.
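A likely cause, assuming File.Note is a class: add_note = File.Note binds the class itself rather than creating an instance. Every assignment then overwrites attributes on that one shared class, and the list ends up holding the same object many times, so all entries show the values from the last line. Calling the class, as in File.Note(), would create a fresh object per line. The Note class and time values below are stand-ins, not the original code.

```python
class Note:
    """Stand-in for File.Note."""


# Without parentheses: every iteration mutates the same class object.
shared = []
for time in (410, 820):
    note = Note  # the class itself, not a new instance
    note.Time = time
    shared.append(note)

# With parentheses: each iteration gets its own instance.
separate = []
for time in (410, 820):
    note = Note()
    note.Time = time
    separate.append(note)

shared_times = [n.Time for n in shared]      # [820, 820]
separate_times = [n.Time for n in separate]  # [410, 820]
```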

Python pandas if statement based off of boolean qualifier

I am trying to write an IF statement that keeps my currency pairs in alphabetical order (i.e. USD/EUR would flip to EUR/USD because E comes alphabetically before U, whereas CHF/JPY would stay the same because C comes alphabetically before J). Initially I was going to write code specific to that, but I realized there were other fields I'd need to flip (mainly changing a sign from positive to negative or vice versa).
So what I did was write a function to create a new column and make a boolean identifier as to whether or not the field needs action (True) or not (False).
def flipFx(ccypair):
    first = ccypair[:3]
    last = ccypair[-3:]
    if(first > last):
        return True
    else:
        return False
brsPosFwd['Flip?'] = brsPosFwd['Currency Pair'].apply(flipFx)
This works great and does what I want it to.
Then I try and write an IF statement to use that field to create two new columns:
if brsPosFwd['Flip?'] is True:
    brsPosFwd['CurrencyFlip'] = brsPosFwd['Sec Desc'].apply(lambda x:
        x.str[-3:]+"/"+x.str[:3])
    brsPosFwd['NotionalFlip'] = -brsPosFwd['Current Face']
else:
    brsPosFwd['CurrencyFlip'] = brsPosFwd['Sec Desc']
    brsPosFwd['NotionalFlip'] = brsPosFwd['Current Face']
However, this is not working properly. It's creating the two new fields, CurrencyFlip and NotionalFlip but treating every record like it is False and just pasting what came before it.
Does anyone have any ideas?
Pandas uses vectorised functions. You are performing operations on entire series objects as if they were single elements.
You can use numpy.where to vectorise your calculations:
import numpy as np
brsPosFwd['CurrencyFlip'] = np.where(brsPosFwd['Flip?'],
                                     brsPosFwd['Sec Desc'].str[-3:]+'/'+brsPosFwd['Sec Desc'].str[:3],
                                     brsPosFwd['Sec Desc'])
brsPosFwd['NotionalFlip'] = np.where(brsPosFwd['Flip?'],
                                     -brsPosFwd['Current Face'],
                                     brsPosFwd['Current Face'])
Note also that pd.Series.apply should be used as a last resort, since it is a thinly veiled, inefficient loop. Here you can simply use the .str accessor.
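The numpy.where approach can be tried end to end on a tiny frame. This is a minimal sketch with made-up data; it assumes the pair lives in 'Sec Desc' in the form AAA/BBB, and it computes the flag with the .str accessor instead of apply.

```python
import numpy as np
import pandas as pd

# Invented sample rows; column names follow the question.
df = pd.DataFrame({
    "Sec Desc": ["USD/EUR", "CHF/JPY"],
    "Current Face": [100.0, -250.0],
})

pair = df["Sec Desc"]
# Element-wise string comparison: True where the pair is out of order.
flip = pair.str[:3] > pair.str[-3:]

df["CurrencyFlip"] = np.where(flip, pair.str[-3:] + "/" + pair.str[:3], pair)
df["NotionalFlip"] = np.where(flip, -df["Current Face"], df["Current Face"])
```

np.where evaluates both branches for every row and picks per element, which is why it works where a plain if on the whole Series does not.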

Get last row of View by couchbase query

I have a query which returns a View object with all the entries I want to process. I know I can iterate over this View object so that I can use the single entries for my purposes.
Now I want to extract only the first and the last row. The first row is no problem, because I can just iterate and break out of the loop after the first item.
Now my question is how to get the last element from the View.
I tried this:
for row in result_rows:
    rowvalue = row[3].value
    diagdata = rowvalue[models.DIAGDATA]
    if models.ODOMETER in diagdata:
        start_mileage = diagdata[models.ODOMETER]
        start_mileage_found = True
    break
row = result_rows[len(result_rows)]
rowvalue = row[3].value
diagdata = rowvalue[models.DIAGDATA]
if models.ODOMETER in diagdata:
    end_mileage = diagdata[models.ODOMETER]
    end_mileage_found = True
The second value I obviously won't get, because the view has no length and I can't access the rows by an index. Does anyone have an idea how to get the last element?
You might run another request with the descending=True option, so that the server streams the results in reverse order.
Or you can convert the iterator to a list, which is basically the same as iterating through all the values. I'm not a Python expert, but it seems like list(result_rows) will do it for you. And when you call len(...), it is probably doing that for you implicitly. There is a rows_returned method to get the number of rows without turning the view into a list.
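If building the whole list is undesirable, the first and last rows can be picked up in a single pass over any iterable. A minimal sketch: first_and_last is a hypothetical helper, and plain iterators stand in for the view's rows.

```python
def first_and_last(rows):
    """Return (first, last) from one pass over an iterable, or (None, None) if empty."""
    iterator = iter(rows)
    try:
        first = next(iterator)
    except StopIteration:
        return None, None
    last = first
    for last in iterator:  # keep overwriting until the iterator is exhausted
        pass
    return first, last


many = first_and_last(iter(["row1", "row2", "row3"]))
single = first_and_last(iter(["only"]))
empty = first_and_last(iter([]))
```

This still reads every row once, so for large views the descending=True request is likely cheaper.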
