I would like to apply conditional formatting to my spreadsheet, but instead of iterating over every single cell (of which there are very many) I want to apply the formatting to the entire row.
I have been trying to figure out how to just select a row. I looked on here (scroll a bit less than halfway down to 'Accessing many cells'), and it tells me
row10 = ws[10]
but when I try to do that I get the error message saying
TypeError: argument of type 'int' is not iterable
which I take to mean that it is not a valid range. I messed around with it a bit, but could not make it work.
So at this point I have no idea what to do. Hopefully someone can help.
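For what it's worth, here is a minimal sketch of row access in openpyxl. In recent versions `ws[10]` returns a tuple of the cells in row 10 (older versions raised exactly this kind of TypeError for integer keys); `ws.iter_rows(min_row=10, max_row=10)` is a more portable alternative. The row number and the bold formatting are just illustrative:

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
for col in range(1, 4):
    ws.cell(row=10, column=col, value=col)

# In recent openpyxl versions, ws[10] is a tuple of the cells in row 10.
for cell in ws[10]:
    cell.font = Font(bold=True)

# Portable alternative that also works on older versions:
for row in ws.iter_rows(min_row=10, max_row=10):
    for cell in row:
        cell.font = Font(bold=True)
```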
I want to replace the commas in my dataframe, particularly in the Generation column, with nothing. There's no error, but it doesn't get rid of the commas. I want to be able to change the dtype of the Generation column to numeric without having to use errors='coerce'.
I'm pretty sure I've used this same code before and it has worked, I'm not sure why today it isn't.
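A common cause of "no error, but nothing changes" here is that `str.replace` returns a new Series rather than modifying the column in place, so the result has to be assigned back. A minimal sketch with made-up data (the column name matches the question, the values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"Generation": ["1,234", "5,678,901"]})

# str.replace returns a new Series; assign it back to the column.
df["Generation"] = df["Generation"].str.replace(",", "", regex=False)

# After cleaning, the column converts without needing errors='coerce'.
df["Generation"] = pd.to_numeric(df["Generation"])
```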
I have a dataframe that's the result of importing a csv and then performing a few operations and adding a column that's the difference between two other columns (column 10 - column 9 let's say). I am trying to sort the dataframe by the absolute value of that difference column, without changing its value or adding another column.
I have seen this syntax over and over all over the internet, with indications that it was a success (accepted answers, comments saying "thanks, that worked", etc.). However, I get the error you see below:
df.sort_values(by='Difference', ascending=False, inplace=True, key=abs)
Error:
TypeError: sort_values() got an unexpected keyword argument 'key'
I'm not sure why the syntax that I see working for other people is not working for me. I have a lot more going on with the code and other dataframes, so I don't think it's a pandas import problem.
I have moved on and just made a new column that is the absolute value of the difference column, sorted by that, and excluded that column from my export to worksheet, but I would really like to know how to get it to work the other way. Any help is appreciated.
I'm using Python 3
df.loc[(df.c - df.b).sort_values(ascending = False).index]
Sorting by the difference between "c" and "b", without creating a new column.
I hope this is what you were looking for.
key is an optional argument.
It accepts a Series as input; maybe you were working with a DataFrame.
check this
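As a side note, the `key` argument of `sort_values` only exists in pandas 1.1.0 and later, which would explain the "unexpected keyword argument" on older installs. On an older pandas, the same `.loc` reindexing trick shown above can be combined with `.abs()` to sort by absolute value without adding a column. A minimal sketch with hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Difference": [3, -10, 5]})

# Works on pandas < 1.1.0 too: sort the absolute values,
# then reindex the frame by the resulting order.
df = df.loc[df["Difference"].abs().sort_values(ascending=False).index]
```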
I have a data frame with several columns. My goal is to manipulate the last column in a certain way. So far the last column has the type string. I need help building a for loop that walks through the last column, removes the last two chars, and then typecasts the value into a float.
An example of an entry in the last column is "1234.5678;;". I want it to look like 1234.5678, and that for every entry in the last column.
Thanks in advance.
I'm unsure of what exactly you are asking: are you asking how to manipulate a string to remove the last two chars? Or are you asking how to access (and edit/change) a data frame? If so, are you using pandas?
Clarifying that may help other people help you with your issue more effectively.
In python, given any string, you can cut off the last two characters like this:
string = string[:-2]
I assume that the variable string holds your string you want.
For the future, it would be greatly appreciated if you explained your issue in more detail: what you want to do, where you need help, and overall put more effort into your question. A spelling mistake in the title is not a good look, optics-wise.
When you're using a pandas DataFrame, you can do it like this:
w.iloc[:, -1] = w.iloc[:, -1].str.replace(";;", "", regex=False)
w is your DataFrame. This line replaces every ";;" in the last column; the single `iloc[:, -1]` avoids chained indexing, which can silently fail to modify w. I assume that the last two characters always follow the same pattern. This will obviously not work if the last two characters follow an unknown pattern.
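Putting both ideas together, here is a minimal sketch that strips the last two characters positionally (so no fixed ";;" pattern is assumed) and casts to float, using made-up column names:

```python
import pandas as pd

w = pd.DataFrame({"a": [1, 2], "value": ["1234.5678;;", "0.5;;"]})

# str[:-2] drops the last two characters of every entry regardless of
# what they are; astype(float) then does the typecast in one vectorized
# step, so no explicit for loop is needed.
w.iloc[:, -1] = w.iloc[:, -1].str[:-2].astype(float)
```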
L is a list of dataframes with a multiindex on the rows.
pd.concat(L,axis=1)
I get the following error (from the Categorical constructor in categorical.py):
TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.
It clearly has something to do with the values in my dataframe, as I can get it to work if I restrict the data in some way.
E.g. all of these work
a=pd.concat(L[0:6],axis=1)
b=pd.concat(L[6:11],axis=1)
c=pd.concat(L[3:9],axis=1)
but
d=pd.concat(L[0:11],axis=1)
fails.
pd.concat([x.iloc[0:1000,:] for x in L[0:11]],axis=1)
also works. I've gone through the edge cases at which it breaks, and for the life of me, I don't see anything that could be offensive in those rows. Does anyone have some ideas on what I should be looking for?
I just had this issue too when I did a df.groupby(...).apply(...) with a custom apply function. The error seemed to appear when the results were merged back together after the groupby-apply (so I must have returned something from my custom apply function that it didn't like).
After inspecting the extensive stacktrace provided by pytest I found a mysterious third value appeared in my index values:
values = Index([(2018-09-01 00:00:00, 'SE0011527613'),
(2018-09-25 00:00:00, 'SE0011527613'),
1535760000000000000], dtype='object')
I have absolutely no idea how it appeared there, but I managed to work around it somehow by avoiding multi-indexed stuff in that particular part of the code (extensive use of reset_index and set_index).
Not sure if this will be of help to anyone, but there you have it. If someone could attempt a minimal reproducible example that would be helpful (I didn't manage to).
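The reset_index/set_index workaround described above might look like the sketch below. The frames, index level names ("date", "isin") and column names are all hypothetical; the idea is just to flatten the MultiIndex to plain columns, merge on those columns, and restore the index afterwards:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("2018-09-01", "SE0011527613"), ("2018-09-25", "SE0011527613")],
    names=["date", "isin"],
)
L = [
    pd.DataFrame({"a": [1, 2]}, index=idx),
    pd.DataFrame({"b": [3, 4]}, index=idx),
]

# Flatten the MultiIndex to ordinary columns, merge on them,
# then restore the MultiIndex on the combined frame.
flat = [df.reset_index() for df in L]
merged = flat[0]
for df in flat[1:]:
    merged = merged.merge(df, on=["date", "isin"], how="outer")
result = merged.set_index(["date", "isin"])
```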
I came across the same error:
TypeError: 'values' is not ordered, please explicitly specify the categories order by passing in a categories argument.
However, there is not much material around it. Have a look at what the error log states a bit further up. I have:
TypeError: unorderable types: range() < range()
During handling of the above exception, another exception occurred:
The clue was 'range() < range()', because I had a previous problem here with Pandas interpreting '(1,2)' or '(30,31)' not as strings but as range(1,3) or range(30,32) respectively. Very annoying, as the dtype is still object.
I had to change the column contents to lists and/or drop the 'range(x,y)' column.
Hope this helps or anybody else who comes across this problem. Cheers!
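Converting the range objects to lists, as described above, might look like this minimal sketch (the column name and values are hypothetical; the dtype stays object either way, but lists don't trip the `range() < range()` comparison):

```python
import pandas as pd

df = pd.DataFrame({"pages": [range(1, 3), range(30, 32)]})

# Replace any range objects with plain lists of their values.
df["pages"] = df["pages"].apply(lambda v: list(v) if isinstance(v, range) else v)
```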
Every time I import this one csv ('leads.csv') I get the following warning:
/usr/local/lib/python2.7/site-packages/pandas/io/parsers.py:1130: DtypeWarning: Columns (11,12,13,14,17,19,20,21) have mixed types. Specify dtype option on import or set low_memory=False.
data = self._reader.read(nrows)
I import many .csv's for this one analysis of which 'leads.csv' is only one. It's the only file with the problem. When I look at those columns in a spreadsheet application, the values are all consistent.
For example, Column 11 (which is Column K when using Excel), is a simple Boolean field and indeed, every row is populated and it's consistently populated with exactly 'FALSE' or exactly 'TRUE'. The other fields that this error message references have consistently-formatted string values with only letters and numbers. In most of these columns, there are at least some blanks.
Anyway, given all of this, I don't understand why this message keeps happening. It doesn't seem to matter much as I can use the data anyway. But here are my questions:
1) How would you go about identifying any rows/records that are causing this warning?
2) Using the low_memory=False option seems to be pretty unpopular in many of the posts I read. Do I need to declare the datatype of each field in this case? Or should I just ignore the warning?
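For question 1, one way to find the offending records (a sketch with a hypothetical stand-in for one of the flagged columns of leads.csv) is to count the Python types actually present in each flagged column; more than one type is what triggers the DtypeWarning, and the minority-type rows are the ones to inspect:

```python
import pandas as pd

# Hypothetical stand-in for one flagged column: mostly strings,
# but with a literal bool and an int mixed in.
df = pd.DataFrame({"col11": ["TRUE", "FALSE", True, 3]})

# Count the Python types present in the column; more than one
# entry here is what produces the mixed-types warning.
type_counts = df["col11"].map(type).value_counts()

# Locate the rows whose type differs from the most common one.
common = type_counts.index[0]
odd_rows = df[df["col11"].map(type) != common]
```

For question 2, an alternative to low_memory=False is passing `dtype=str` (or a per-column dtype mapping) to read_csv, which defers conversion and silences the warning.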