I am working on a (simple) function. Based on user input (shop name and month), the function searches the df and sums up the amount of money spent in that shop in the specified month.
Names in the df are sometimes written with a capital letter and sometimes not, so I want all names taken from the df to be lowercase, as well as the user input.
Making the input name lowercase is no problem. But where do I put .lower in code with multiple conditions?
So my question is: how do I apply .lower to the .str.contains(name) part?
(The code below works well when the part of the name is typed with capital letters in the right places.)
def euro_month():
    name = input('What shop are you looking for: ')
    name = name.lower()
    month = input('Give the month number, 1 - 12: ')
    df = df_2019.loc[(df_2019['Name'].str.contains(name)) & (df_2019['Month'] == int(month))]
    bedrag = round(df['Bedrag'].sum(), 2)
    print('We spent in shop', name, 'in month', month, 'of 2019:', bedrag, 'euros.')
pandas' str.contains() has a case argument that makes the search case-insensitive: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html
In your code:
df = df_2019.loc[((df_2019['Name'].str.contains(name, case=False)))&(df_2019['Month'] == int(month))]
Alternatively, you can lowercase the column first with str.lower() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.lower.html); then case=False is no longer needed:
df = df_2019.loc[(df_2019['Name'].str.lower().str.contains(name)) & (df_2019['Month'] == int(month))]
This should work.
df = df_2019.loc[((df_2019['Name'].str.lower().str.contains(name))) & (df_2019['Month'] == int(month))]
You can simply call .str.lower() first and then .str.contains().
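For completeness, here is a minimal sketch of the whole function rewritten to take the DataFrame, shop name and month as parameters instead of calling input(), so it can be tested. The parameter names and the sample data are hypothetical; only the column names Name, Month and Bedrag come from the question. It uses case=False as in the first answer, plus regex=False so that characters such as "(" or "+" in a shop name are matched literally, and na=False so rows with a missing name are simply skipped.

```python
import pandas as pd


def euro_month(df, name, month):
    """Sum 'Bedrag' for rows whose 'Name' contains `name` (ignoring case) in `month`."""
    mask = (
        df["Name"].str.contains(name, case=False, regex=False, na=False)
        & (df["Month"] == int(month))
    )
    return round(df.loc[mask, "Bedrag"].sum(), 2)


# Hypothetical sample data mimicking df_2019 from the question.
df_2019 = pd.DataFrame({
    "Name": ["Albert Heijn", "albert heijn", "Jumbo", "ALBERT HEIJN"],
    "Month": [3, 3, 3, 4],
    "Bedrag": [10.5, 4.25, 7.0, 2.0],
})

print(euro_month(df_2019, "Albert", 3))  # 14.75
```

Passing month through int() keeps the function working when the value still comes straight from input().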
I wonder how to add an underscore between "non" or "no" and the word that follows it, using Python. Thanks in advance.
For example,
Input dataframe:
Expected output dataframe:
You could use the pandas "apply" method as follows.
import pandas as pd

def func(s):
    tokens = s.split()
    i = 0
    while i < len(tokens):
        # merge "no"/"non" with the following token, if there is one
        if tokens[i] in ["no", "non"] and i < len(tokens) - 1:
            tokens[i] = f"{tokens[i]}_{tokens[i+1]}"
            tokens.pop(i + 1)
        i += 1
    return ' '.join(tokens)

df = pd.DataFrame({'id': [1, 2], "text": ["no damage car", "non damage car"]})
df["text"] = df["text"].apply(func)
The resulting dataframe df:
   id            text
0   1   no_damage car
1   2  non_damage car
Granted, the function being applied could be made "nicer" with the help of regular expressions.
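As a sketch of that regex idea (not part of the original answer), a single re.sub can do the merge. The names NO_PATTERN and join_negation are illustrative. One behavioural difference: unlike split()/join(), re.sub leaves the rest of the whitespace untouched.

```python
import re

# "\b(non?)" matches a standalone "no" or "non"; "\s+(\w+)" captures the next word.
NO_PATTERN = re.compile(r"\b(non?)\s+(\w+)")


def join_negation(s):
    """Turn 'no X' / 'non X' into 'no_X' / 'non_X'."""
    return NO_PATTERN.sub(r"\1_\2", s)


print(join_negation("no damage car"))   # no_damage car
print(join_negation("non damage car"))  # non_damage car
print(join_negation("nothing here"))    # nothing here
```

With the DataFrame from the answer this would be df["text"] = df["text"].apply(join_negation), or equivalently df["text"].str.replace(r"\b(non?)\s+(\w+)", r"\1_\2", regex=True).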
I have a scenario where I have a list of country names. I have to prompt for user input 5 times, and if the input matches a particular string in the list, I have to append that string to a second list. If the value entered does not match any name in the list, I have to keep asking the user until a correct word is typed. My Python code is below.
Python Code:
a = []
for i in range(5):
    b = str(input("Enter the name: "))
    if(b == 'USA' or b == 'UK'):
        a.append(b)
    else:
        for name in a:
            if(b == name):
                c.append(name)
print(c)
Problem:
I am not able to compare the user input with the strings present in the list.
Can someone please help me in implementing the above-mentioned logic?
To check whether the country the user entered exists in a list, you can do the following:
country_names = ['USA', 'UK']  # your list of valid country names
country = input("Enter the name of a country: ")
if country in country_names:
    pass  # logic if it exists
else:
    pass  # logic if it does not exist
if name not in country_name:
    country_list.append(name)
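Putting those pieces together, here is a minimal sketch of the full loop described in the question: ask five times, and keep re-asking until the answer is in the list. The function name collect_countries, the ask parameter and the example country list are hypothetical. ask defaults to the built-in input; it is replaced by a scripted stand-in below only so the example runs without typing.

```python
def collect_countries(valid, count=5, ask=input):
    """Ask `count` times; re-prompt until each answer is in `valid`."""
    chosen = []
    for _ in range(count):
        answer = ask("Enter the name: ").strip()
        # keep asking until the entry matches a name in the list exactly
        while answer not in valid:
            answer = ask("Not in the list, try again: ").strip()
        chosen.append(answer)
    return chosen


country_names = ["USA", "UK", "India", "France", "Japan"]  # example list

# Scripted answers standing in for a user; "Narnia" and "usa" get rejected.
answers = iter(["USA", "Narnia", "UK", "India", "usa", "France", "Japan"])
print(collect_countries(country_names, ask=lambda prompt: next(answers)))
# ['USA', 'UK', 'India', 'France', 'Japan']
```

In a real script you would simply call collect_countries(country_names). The check is case-sensitive, so "usa" is rejected; compare lowercased strings if that is not what you want.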
I've written a program that takes in the name and age of multiple entries separated by commas, separates the letters from the digits, and then compares each name with a predefined set/list.
If an entry doesn't match the predefined data, the program prints an "incorrect entry" message along with the element that didn't match.
Here's the code:
from string import digits
print("enter name and age")
order=input("Seperate entries using a comma ',':")
order1=order.strip()
order2=order1.replace(" ","")
order_sep=order2.split()
removed_digits=str.maketrans('','',digits)
names=order.translate(removed_digits)
print(names)
names1=names.split(',')
names_list=['abby','chris','john','cena']
names_list=set(names_list)
for name in names1:
    if name not in names_list:
        print(f"{name}:doesnt match with predefined data")
The problem I'm having is that even when I enter chris or john, the program treats them as if they don't belong to the predefined list.
Sample input: ravi 19,chris 20
Output: ravi ,chris
ravi :doesnt match with predefined data
chris :doesnt match with predefined data
I also have another issue: I've written a part to eliminate whitespace, but for some reason it doesn't remove it.
Sample input: ravi , chris
ravi :doesnt match with predefined data
()chris :doesnt match with predefined data
There is a space where I've put the parentheses.
Any suggestions to tackle this problem and/or improve this code are appreciated!
I think some parts can be simplified, especially removing the digits. As long as each entry has a space between the name and the age, you can use split() twice: first split(',') to separate the entries, then split() to separate out the ages. It makes the comparisons easier later if you store the names by themselves, without punctuation or whitespace around them. To print the names from an iterable, you can use the str.join() method. Here is an example:
print("enter name and age")
order = input("Seperate entries using a comma ',': ")
names1 = [x.split()[0] for x in order.split(',')]
print(', '.join(names1))
names_list=['abby', 'chris', 'john', 'cena']
for name in names1:
    if name not in names_list:
        print(f"{name}:doesnt match with predefined data")
This will give the desired output:
enter name and age
Seperate entries using a comma ',': ravi 19, chris 20
ravi, chris
ravi:doesnt match with predefined data
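The original code failed because str.translate only deletes the digits: 'ravi 19,chris 20' becomes 'ravi ,chris ', so the names keep their trailing spaces ('chris ' != 'chris'). The whitespace-removal step has no effect because order2 is computed but names is built from the original order. A slightly more defensive variant of the answer's approach is sketched below as a function (the name unmatched_names is hypothetical). It also skips empty entries, such as the one produced by a trailing comma, and, as an extra assumption not in the question, lowercases names so 'Chris' matches 'chris'.

```python
def unmatched_names(order, known):
    """Return the names in `order` that are not in `known`."""
    names = []
    for entry in order.split(","):
        parts = entry.split()  # split() with no argument also drops surrounding whitespace
        if parts:              # skip empty entries, e.g. from a trailing comma
            names.append(parts[0].lower())
    return [name for name in names if name not in known]


known = {"abby", "chris", "john", "cena"}
print(unmatched_names("ravi 19,chris 20", known))    # ['ravi']
print(unmatched_names("ravi , chris", known))        # ['ravi']
print(unmatched_names("Chris 20, john 30,", known))  # []
```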
I have a review dataset with reviews like this:
Simply the best. I bought this last year. Still using. No problems
faced till date.Amazing battery life. Works fine in darkness or broad
daylight. Best gift for any book lover.
(This is from the original dataset; in my processed dataset I have removed all punctuation and lowercased everything.)
What I want to do is replace some words with 1 (as per my dictionary) and all others with 0.
My dictionary is:
dict = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}
I want my output like:
0010000000000001000000000100000
I have used this code:
df['newreviews'] = df['reviews'].map(dict).fillna("0")
This always returns 0 as output. I tried storing the 1s and 0s as strings, but I still get the same result.
Any suggestions on how to solve this?
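First, don't use dict as a variable name: it shadows Python's built-in dict type (it is not a reserved word, but after the assignment dict() no longer works). Then use a list comprehension with dict.get to replace unmatched words with 0. Your map call returns 0 everywhere because Series.map looks up each whole review as a key, and no whole review is a key in the dictionary.
Note:
If the data contains text like date.Amazing (no space after the punctuation), the punctuation has to be replaced with whitespace rather than removed, otherwise the two words are glued together.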
import pandas as pd

df = pd.DataFrame({'reviews':['Simply the best. I bought this last year. Still using. No problems faced till date.Amazing battery life. Works fine in darkness or broad daylight. Best gift for any book lover.']})
d = {"amazing":"1","super":"1","good":"1","useful":"1","nice":"1","awesome":"1","quality":"1","resolution":"1","perfect":"1","revolutionary":"1","and":"1","good":"1","purchase":"1","product":"1","impression":"1","watch":"1","quality":"1","weight":"1","stopped":"1","i":"1","easy":"1","read":"1","best":"1","better":"1","bad":"1"}
# replace runs of punctuation with a space (regex=True is required in pandas >= 2.0), then lowercase
df['reviews'] = df['reviews'].str.replace(r'[^\w\s]+', ' ', regex=True).str.lower()
# one digit per word: the dict value if the word is a key, otherwise '0'
df['newreviews'] = [''.join(d.get(y, '0') for y in x.split()) for x in df['reviews']]
Alternative:
df['newreviews'] = df['reviews'].apply(lambda x: ''.join(d.get(y, '0') for y in x.split()))
print (df)
reviews \
0 simply the best i bought this last year stil...
newreviews
0 0011000000000001000000000100000
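A small demonstration, with a made-up two-word dictionary, of why map cannot work here: Series.map looks up each whole cell as a key, whereas the comprehension looks up one word at a time.

```python
import pandas as pd

d = {"good": "1", "product": "1"}  # made-up mini dictionary
s = pd.Series(["good product", "bad service"])

# map() treats each whole string as a key: neither sentence is a key, so both are NaN
mapped = s.map(d)
print(mapped.isna().tolist())  # [True, True]

# Looking words up one at a time gives one digit per word instead
encoded = ["".join(d.get(w, "0") for w in text.split()) for text in s]
print(encoded)  # ['11', '00']
```

After .fillna("0"), every NaN from the first approach becomes "0", which is exactly the all-zero result described in the question.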
You can do:
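import re
# assumes `sent` holds one review string and `mydict` is the dictionary from the question
# clean the sentence: replace punctuation with a space so "date.Amazing" stays two words
sent = re.sub(r'[^\w\s]+', ' ', sent)
# convert to list
sent = sent.lower().split()
# get values from dict using comprehension
new_sent = ''.join(['1' if x in mydict else '0' for x in sent])
print(new_sent)
0011000000000001000000000100000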
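You can do it with
df.replace(repl, regex=True, inplace=True)
where df is your dataframe and repl is your dictionary. Be aware, though, that with regex=True each key is treated as a pattern and replaced wherever it occurs, including inside other words (the key "i" also turns "simply" into "s1mply"), and words that are not in the dictionary are left unchanged rather than becoming 0.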
I am new to Python and want to find all date-related words in a sentence, such as dates, Monday, Tuesday, last week, next week, tomorrow, yesterday, today, etc.
For example:
input: 'Yesterday I went shopping'
return: 'Yesterday'
input: 'I will start working on Tuesday'
return: 'Tuesday'
input: 'My birthday is 1998-12-12'
return: '1998-12-12'
I found that the Python package datefinder can find these words, but it automatically converts them to standard datetime objects. I only want to extract the words themselves; is there another method or package that can do this?
Thanks for your help!
This is how I would do the logic for it. As for extracting the numbers from a string that also contains digits (such as a date), I'm not sure; I would create an input that specifically asks for digits, and then, just as I did firstSentence.lower(), I would do firstSentence = int(firstSentence) to ensure only ints are passed.
firstSentence = input('Tell me something: ')
firstSentence = firstSentence.lower()
if 'yesterday' in firstSentence:
    # now call a function that returns the date/time
    pass
elif 'tuesday' in firstSentence:
    # now call a function that returns the date/time
    pass
else:
    print('No day found')
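A different technique from datefinder, sketched here as one possible approach: a keyword list combined with a regular expression returns the matched text exactly as written, without converting it to a datetime. The keyword list, the pattern and the function name find_date_words are all hypothetical; the pattern only recognises YYYY-MM-DD dates and the listed words, so other formats would need extra alternatives.

```python
import re

# Hypothetical keyword list; extend it with whatever phrases you need.
DATE_WORDS = [
    "last week", "next week", "today", "tomorrow", "yesterday",
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday",
]

# ISO-style dates such as 1998-12-12, or any keyword as a whole word (case-insensitive).
DATE_PATTERN = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b|\b(?:" + "|".join(map(re.escape, DATE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def find_date_words(sentence):
    """Return the date-related substrings exactly as they appear in `sentence`."""
    return DATE_PATTERN.findall(sentence)


print(find_date_words("Yesterday I went shopping"))        # ['Yesterday']
print(find_date_words("I will start working on Tuesday"))  # ['Tuesday']
print(find_date_words("My birthday is 1998-12-12"))        # ['1998-12-12']
```

Because the group is non-capturing, findall returns the full matched text, with its original capitalisation preserved.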