I'm not quite sure how to phrase this question, so let me illustrate with an example.
Let's say you have a pandas DataFrame called store_df with a column called STORE_NUMBER. There are two ways to access a given column in a pandas DataFrame:
store_df['STORE_NUMBER']
and
store_df.STORE_NUMBER
Now let's say that you have a variable called column_name which contains the name of a column in store_df as a string. If you run
store_df[column_name]
All is well. But if you try to run
store_df.column_name
Python throws an AttributeError because it is looking for a literal column named "column_name" which doesn't exist in our hypothetical dataframe.
My question is: is there a way to look up columns dynamically using the second syntax (dot notation)? Not because there is anything wrong with the first syntax (bracket notation), but because I am curious whether Python has some feature that lets you substitute a variable's value for an attribute name at runtime (in this case, a state variable of the dataframe). I know there is the exec function, but I was wondering if there is a more elegant solution. I tried
store_df.{column_name}
but received a SyntaxError.
Would getattr(df, 'column_name_as_str') be the kind of thing you're looking for, perhaps?
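For example, a minimal sketch (the dataframe contents here are made up for illustration):

import pandas as pd

# Hypothetical dataframe matching the question's example
store_df = pd.DataFrame({'STORE_NUMBER': [101, 102, 103]})
column_name = 'STORE_NUMBER'

# getattr performs the same lookup as store_df.STORE_NUMBER,
# but takes the attribute name as a runtime string
print(getattr(store_df, column_name))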
I've started using Python Snowpark and am no doubt missing obvious answers because I'm unfamiliar with the syntax and documentation.
I would like to do a very simple operation: append a new column to an existing Snowpark DataFrame and assign with a simple string.
Any pointers to the documentation to what I presume is readily achievable would be appreciated.
You can do this by using the with_column function in combination with the lit function. with_column needs a Column expression, and for a literal value that expression can be built with lit. See the documentation here: https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.lit.html
from snowflake.snowpark.functions import lit
snowpark_df = snowpark_df.with_column('NEW_COL', lit('your_string'))
I'm new to Python and have a task to solve. I have a large .csv file, and I was wondering whether there is a simple way to convert the string values in one column into numerical values in another column.
For example, one column contains a bunch of different factory names, and the new column should hold a numerical value for every factory:
Factories    NumValues
FactoryA     1
FactoryB     2
FactoryA     1
FactoryC     3
I know that I could do this with dictionaries, but since there are quite a lot of different names (factories), I was wondering if there is already some library to make this process easier and faster?
I hope I explained my problem well.
You can use ngroup(): group by the factory names, and it assigns an id to every factory. Does this give the output you want?
df['NumValues'] = df.groupby('Factories').ngroup() + 1
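As a sanity check, here is a minimal sketch using the factory data from the question; the + 1 just shifts the 0-based group ids so they start at 1:

import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Factories': ['FactoryA', 'FactoryB', 'FactoryA', 'FactoryC']})

# ngroup() numbers the groups (0-based, ordered by the sorted group keys),
# so adding 1 reproduces the 1-based ids from the question
df['NumValues'] = df.groupby('Factories').ngroup() + 1
print(df)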
I have a pandas DataFrame named 'dataset', and it contains a column named 'class'.
When I execute the following line, I get SyntaxError: invalid syntax:
print("Unique values in the Class column:", dataset.class.unique())
It works for other column names but not with 'class'.
How do I use a keyword as a column name in pandas?
class is a keyword in Python. A rule of thumb: whenever you're dealing with column names that cannot be used as valid variable names in Python, you must use bracket notation to access them: dataset['class'].unique().
There are, of course, exceptions here, but they work against you. For example, min/max are valid variable names in Python (even though they shadow builtins), yet in pandas you cannot refer to a column with such a name using attribute access. There are more such exceptions; they're enumerated in the documentation.
A good place to begin further reading is the documentation on Attribute Access, specifically the red Warning box, which I'm adding here for posterity:
You can use this access only if the index element is a valid Python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.

The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed, but s['min'] is possible.

Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items.

In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
class is a reserved word.
You can do dataset['class'].unique() instead.
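A minimal sketch putting the two access styles side by side (the data is made up):

import pandas as pd

# Hypothetical dataframe with a column named after a Python keyword
dataset = pd.DataFrame({'class': ['a', 'b', 'a']})

# dataset.class.unique() is a SyntaxError; bracket notation works fine
print("Unique values in the Class column:", dataset['class'].unique())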
This pandas python code generates the error message,
"TypeError: bad operand type for unary ~: 'float'"
I have no idea why, because I'm trying to manipulate a str object:
df_Anomalous_Vendor_Reasons[~df_Anomalous_Vendor_Reasons['V'].str.contains("File*|registry*")]  # filters, leaving only cases where the reason is NOT File or Registry
Anybody got any ideas?
Credit to Davtho1983's comment above; I thought I'd add some color to it for clarity.
For anyone stumbling on this later with the same error (like me).
It's a very simple fix. The documentation from pandas shows
Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)
What's happening is that contains() isn't applied to NaN values in the DataFrame; they remain NaN in the resulting mask. You just need to fill the NaN values with a Boolean so that the invert operator ~ can be applied.
With the example above one should use
df_Anomalous_Vendor_Reasons[~df_Anomalous_Vendor_Reasons['V'].str.contains("File*|registry*", na=False)]
Of course, choose False or True for the na argument based on the intended logic; whichever Boolean value you use to fill NaN will be inverted along with the rest of the mask.
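A minimal sketch reproducing the fix (the data is made up; the None becomes NaN, which is what triggered the original error):

import pandas as pd

# Hypothetical reasons column containing a missing value
df = pd.DataFrame({'V': ['File missing', 'registry error', None, 'Other reason']})

# Without na=False the mask contains NaN and ~ raises
# "TypeError: bad operand type for unary ~: 'float'";
# with na=False, missing values are treated as non-matches
mask = df['V'].str.contains("File*|registry*", na=False)
print(df[~mask])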
I'm probably using poor search terms when trying to find this answer. Right now, before indexing a DataFrame, I'm getting a list of values in a column this way...
column_values = list(df['column'])  # naming this 'list' would shadow the builtin
...then I'll call set_index on that column, which seems like a wasted step. When I try the same thing on an index, I get a KeyError.
How can I grab the values in an index (both single and multi) and put them in a list or a list of tuples?
To get the index values as a list/list of tuples for Index/MultiIndex do:
df.index.values.tolist() # an ndarray method, you probably shouldn't depend on this
or
list(df.index.values) # this will always work in pandas
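For instance, a minimal sketch with a made-up MultiIndex frame:

import pandas as pd

# Hypothetical frame indexed by a MultiIndex
df = pd.DataFrame({'val': [1, 2]},
                  index=pd.MultiIndex.from_tuples([('a', 1), ('b', 2)]))

# For a MultiIndex this gives a list of tuples; for a plain Index, a list of scalars
print(list(df.index.values))  # [('a', 1), ('b', 2)]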
If you're only getting these to manually pass into df.set_index(), that's unnecessary. Just do df.set_index('your_col_name', drop=False) directly.
It's very rare in pandas that you need to get an index as a Python list (unless you're doing something pretty funky, or else passing them back to NumPy), so if you're doing this a lot, it's a code smell that you're doing something wrong.