Pandas converting String object to lower case and checking for string - python

I have the code below:
import pandas as pd
private = pd.read_excel("file.xlsx","Pri")
public = pd.read_excel("file.xlsx","Pub")
private["ISH"] = private.HolidayName.str.lower().contains("holiday|recess")
public["ISH"] = public.HolidayName.str.lower().contains("holiday|recess")
I get the following error:
AttributeError: 'Series' object has no attribute 'contains'
Is there any way to convert the 'HolidayName' column to lower case and then check the regular expression ("Holiday|Recess") using .contains in one step?

private["ISH"] = private.HolidayName.str.contains("(?i)holiday|recess")
The (?i) in the regex pattern tells the re module to ignore case.
You were getting the error because a Series object has no contains method; the method lives on the Series.str accessor. Calling .str.lower() returns a plain Series again, so you need .str once more before .contains:
private["ISH"] = private.HolidayName.str.lower().str.contains("holiday|recess")
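To see what the (?i) flag does on its own, here is a small sketch using only the standard re module, outside pandas. The sample names are made up for illustration; the point is only that the inline flag and re.IGNORECASE behave the same way.

```python
import re

# Made-up sample values standing in for the HolidayName column.
names = ["Summer Holiday", "WINTER RECESS", "Board Meeting"]

# Inline flag: (?i) at the start makes the whole pattern case-insensitive.
inline = [bool(re.search(r"(?i)holiday|recess", n)) for n in names]

# Same effect by passing the flag explicitly.
flagged = [bool(re.search(r"holiday|recess", n, flags=re.IGNORECASE)) for n in names]

# Without any flag the lowercase pattern misses the capitalised values.
plain = [bool(re.search(r"holiday|recess", n)) for n in names]

print(inline, flagged, plain)
```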

I'm a bit late to the party, but you could use the case keyword argument of str.contains:
case : bool, default True, If True, case sensitive.
private["ISH"] = private.HolidayName.str.contains("holiday|recess", case=False)
public["ISH"] = public.HolidayName.str.contains("holiday|recess", case=False)
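A minimal, self-contained sketch of the same call, assuming a small made-up DataFrame in place of the Excel sheet. The na=False argument is my addition and not part of the answer above: it makes missing names come out as False instead of NaN.

```python
import pandas as pd

# Made-up data standing in for the "Pri" sheet; None simulates an empty cell.
private = pd.DataFrame(
    {"HolidayName": ["Summer Holiday", "WINTER RECESS", "Board Meeting", None]}
)

# case=False ignores case; na=False (an assumption about what you want)
# reports missing values as False rather than NaN.
private["ISH"] = private["HolidayName"].str.contains(
    "holiday|recess", case=False, na=False
)

print(private)
```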

Related

Cast column to different datatype before performing evaluation in pyarrow dataset filter

In a less-than-ideal situation, I have values within a Parquet dataset that I would like to filter using >, =, < and so on. However, because of the mixed datatypes in the dataset as a whole, it seems the field is read as an object.
Is there a way to cast the column to a different datatype before evaluating,
i.e. for ds.field('value').cast('uint8') > 60 to become part of the filter list?
ParquetDataset(folder, use_legacy_dataset=False, filters=[('group','==',group),('table','==',table),(ds.field('value').cast('uint8') > 60)]).read().to_pandas()
I tried the method above and it didn't seem to work; it just raises:
TypeError: object of type 'pyarrow._compute.Expression' has no len()
The dataset API can filter on expressions:
import pyarrow.dataset as ds
expr = (ds.field('group') == group) & (ds.field('table') == table) & (ds.field('value').cast('uint8') > 60)
ds.dataset(folder).to_table(filter=expr).to_pandas()
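There's a mistake in the filters list, although it is not a syntax error as such:
(ds.field('value').cast('uint8') > 60)
This returns an expression (the parentheses do not make it a tuple), which isn't what the list form of filters wants. Each element is expected to be a (column, op, value) tuple with the column given by name:
('value', '>', 60)
That's why it complains with TypeError: object of type 'pyarrow._compute.Expression' has no len(): it was expecting a tuple. The tuple form has no slot for a cast, so a cast has to be written as an expression filter instead.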
PS: regarding "it seems the field is read as an object": there is no object type in pyarrow, so I don't think you need to cast.

Pandas: AttributeError: 'str' object has no attribute 'isnull'

I'm creating a new column named lead_actor_actress_known whose values are Booleans based on whether a second column, lead_actor_actress, has a value or not. If there is a value (the name of an actor), the new column should be True; if there is no value, it should be False.
AttributeError: 'str' object has no attribute 'isnull'
My code, shown below, throws the error above.
df['lead_actor_actress_known'] = df['lead_actor_actress'].apply(lambda x: True if x.isnull() else False)
What am I missing?
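Henry Ecker's comment contains the answer to this question; I am reproducing it in the answer section for convenience. Replace your application of the .apply() method with the code df['lead_actor_actress_known'] = df['lead_actor_actress'].isna(). The error occurs because .apply() passes each individual value to the lambda, and a plain str has no isnull method.
The thing to know here is that df['lead_actor_actress'].isna() returns a "Boolean mask" (a Series of True and False values), and this is exactly what you are asking to assign to the column lead_actor_actress_known. Note that isna() is True where the value is missing; if the column should be True where a name is present, as the question describes, use .notna() instead.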
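As a rough illustration of the mask, here is a small sketch with a made-up DataFrame (the placeholder names are not from the question). It uses notna() on the assumption that True should mean "a name is present", and keeps isna() alongside for comparison.

```python
import pandas as pd

# Made-up sample data; None stands in for a missing lead actor or actress.
df = pd.DataFrame({"lead_actor_actress": ["Actor A", None, "Actor B"]})

# Vectorised check on the whole column: True where a value is present.
df["lead_actor_actress_known"] = df["lead_actor_actress"].notna()

# The complementary mask: True where the value is missing.
missing = df["lead_actor_actress"].isna()

print(df)
```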

What does a "no attribute" error mean in Python?

I don't understand why I get this error (AttributeError: 'str' object has no attribute 'ascii_uppercase') when I try running my code. I'm guessing it could be some indentation that's out of place?
import collections
import string
def caesar(message, key):
    upper = collections.deque(string.ascii_uppercase)
    lower = collections.deque(string.ascii_lowercase)
    upper.rotate(key)
    lower.rotate(key)
    upper = ''.join(list(upper))
    lower = ''.join(list(lower))
    return message.translate(string.maketrans(string.ascii_uppercase, upper)).translate(string.maketrans(string.ascii_lowercase, lower))
string = "hi my name is sam"
for i in range(len(string.ascii_uppercase)):
    print i, " | ", caesar(string, i)
What it actually means is that a string object has no ascii_uppercase attribute, Python is usually pretty clear about things like that :-)
On a less jocular note, it's saying that strings do not have a method called ascii_uppercase that you can use on them. If you want a string to be upper-cased, just use myStr.upper().
If you need something more complex/nuanced than a simple upper-casing, you'll probably have to write it yourself.
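The custom variable string = "hi my name is sam" shadows the imported string module, so string.ascii_uppercase is looked up on the str instead of on the module. Rename the variable, for example:
str1 = "hi my name is sam"
If you only want to convert to uppercase:
print("hi my name is sam".upper())
or, to print one character per line:
[print(i) for i in "hi my name is sam".upper()]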
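For reference, here is a sketch of how the question's script might look once the variable is renamed. It assumes Python 3, so it uses str.maketrans and print() rather than the Python 2 string.maketrans and print statement from the question; the name text is just a replacement I picked for the shadowing variable.

```python
import collections
import string


def caesar(message, key):
    """Shift letters using the rotated-alphabet approach from the question."""
    upper = collections.deque(string.ascii_uppercase)
    lower = collections.deque(string.ascii_lowercase)
    upper.rotate(key)
    lower.rotate(key)
    # One translation table for both cases; other characters pass through.
    table = str.maketrans(
        string.ascii_uppercase + string.ascii_lowercase,
        "".join(upper) + "".join(lower),
    )
    return message.translate(table)


text = "hi my name is sam"  # renamed so it no longer shadows the string module
for i in range(len(string.ascii_uppercase)):
    print(i, " | ", caesar(text, i))
```

With the module name left alone, string.ascii_uppercase resolves to the module attribute again and the loop runs over all 26 shifts.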

check when trying to assign Boolean

ctx['location_ids'] = vals['location_ids']
I have a large function, so I will not post it here. When vals['location_ids'] holds integer values everything works smoothly, but sometimes there are no values in vals['location_ids'], so it is False, and then I get an error:
ctx['location_ids'] = vals['location_ids']
TypeError: 'bool' object has no attribute '__getitem__'
How can I avoid it? Maybe by adding hasattr?
The error means that vals itself is a bool rather than a dictionary, so you should first check that it is a dictionary:
if isinstance(vals, dict):
    ctx['location_ids'] = vals.get('location_ids', None)
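Wrapped up as a small helper, the check could look like the sketch below. The function name copy_location_ids is hypothetical, and skipping falsy values (False, None, an empty list) is my assumption about what "no values" should mean here.

```python
def copy_location_ids(vals, ctx):
    """Copy location_ids from vals into ctx when vals is a usable dict."""
    # vals may be False instead of a dict, so guard before subscripting.
    if isinstance(vals, dict):
        location_ids = vals.get("location_ids")
        # Assumption: a falsy value means there is nothing to copy.
        if location_ids:
            ctx["location_ids"] = location_ids
    return ctx


print(copy_location_ids(False, {}))
print(copy_location_ids({"location_ids": [1, 2]}, {}))
```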

Mapping python tuple and R list with rpy2?

I'm having some trouble understanding the mapping between rpy2 objects and Python objects.
I have a function(x) which returns a tuple in Python, and I want to map this tuple to an R list or vector.
First, I tried this:
# return a python tuple into this r object tlist
robjects.r.tlist = get_max_ticks(x)
#Convert list into dataframe
r('x <- as.data.frame(tlist,row.names=c("seed","ticks"))')
This fails with the error:
rinterface.RRuntimeError: Error in eval(expr, envir, enclos) : object 'tlist' not found
So I tried another strategy:
robjects.r["tlist"] = get_max_ticks(x)
r('x <- as.data.frame(tlist,row.names=c("seed","ticks"))')
This fails with the error:
TypeError: 'R' object does not support item assignment
Could you help me understand?
Thanks a lot !!
Use globalEnv:
import rpy2.robjects as ro
r = ro.r
def get_max_ticks():
    return (1,2)
ro.globalEnv['tlist'] = ro.FloatVector(get_max_ticks())
r('x <- as.data.frame(tlist,row.names=c("seed","ticks"))')
print(r['x'])
# tlist
# seed 1
# ticks 2
It may be possible to access symbols in the R namespace with this type of notation: robjects.r.tlist, but you cannot assign values this way. The way to assign a symbol is to use robjects.globalEnv (spelled robjects.globalenv in more recent rpy2 releases).
Moreover, some symbols in R may contain a period, such as data.frame. You cannot access such symbols in Python using notation like robjects.r.data.frame, since Python interprets the period differently than R. So I'd suggest avoiding this notation entirely and instead using
robjects.r['data.frame'], since this notation works no matter what the symbol name is.
You could also avoid the assignment in R altogether:
import rpy2.robjects as ro
tlist = ro.FloatVector((1,2))
keyWordArgs = {'row.names':ro.StrVector(("seed","ticks"))}
x = ro.r['as.data.frame'](tlist,**keyWordArgs)
ro.r['print'](x)
