I am trying to print a dataframe (or write it to an Excel file) where the column values contain a newline (\n).
>>> textbox = ""
>>> textbox+=str('So, so you think you can tell \n')
>>> textbox+=str('Heaven from hell')
>>> print(textbox)
So, so you think you can tell
Heaven from hell
>>> df1 = pd.DataFrame({'lyric':[textbox]})
>>> df1
lyric
0 So, so you think you can tell \nHeaven from hell
>>> print(df1)
lyric
0 So, so you think you can tell \nHeaven from hell
So when I print the df or write it to an Excel file, I see that "\n" is printed instead of a line break.
How do I print the line break?
I think you need to create a list from the string by splitting on \n, then strip the leading and trailing whitespace:
splitlines solution:
df1 = pd.DataFrame({'lyric':textbox.splitlines()})
df1.lyric = df1.lyric.str.strip()
print (df1)
lyric
0 So, so you think you can tell
1 Heaven from hell
split solution:
print (textbox.split('\n'))
['So, so you think you can tell ', 'Heaven from hell']
df1 = pd.DataFrame({'lyric':textbox.split('\n')})
df1.lyric = df1.lyric.str.strip()
print (df1)
lyric
0 So, so you think you can tell
1 Heaven from hell
strip by list comprehension:
df1 = pd.DataFrame({'lyric':[x.strip() for x in textbox.split('\n')]})
print (df1)
lyric
0 So, so you think you can tell
1 Heaven from hell
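For what it's worth, the splitlines and split('\n') variants above are not always interchangeable. The sketch below is plain Python only, and the trailing-newline input is an assumed variation, not the asker's data:

```python
# Plain-Python comparison of the two splitting approaches.
textbox = "So, so you think you can tell \nHeaven from hell"

by_splitlines = [line.strip() for line in textbox.splitlines()]
by_split = [line.strip() for line in textbox.split("\n")]
# Both give the same two rows for this input.
print(by_splitlines)
print(by_split)

# Assumed variation: the text ends with a newline.
with_trailing = textbox + "\n"
print(with_trailing.splitlines())  # no empty last element
print(with_trailing.split("\n"))   # last element is an empty string
```

So if the text may end with a newline, splitlines avoids an extra empty row in the resulting DataFrame.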
EDIT:
I think you need str.replace to remove the \n:
df1 = pd.DataFrame({'lyric':[textbox]})
df1.lyric = df1.lyric.str.replace('\n', '')
print (df1)
lyric
0 So, so you think you can tell Heaven from hell
I would like to keep in the column only the two first words of a cell in a dataframe.
For instance:
df = pd.DataFrame(["I'm learning Python", "I don't have money"])
I would like the column to contain the following:
"I'm learning" ; "I don't"
After that, if possible, I would like to add '*' around each word. So it would look like:
"*I'm* *learning*" ; "*I* *don't*"
Thanks for all the help!
You can use a regex with str.replace:
df[0].str.replace(r'(\S+)\s(\S+).*', r'*\1* *\2*', regex=True)
output:
0 *I'm* *learning*
1 *I* *don't*
Name: 0, dtype: object
As a new column:
df['new'] = df[0].str.replace(r'(\S+)\s(\S+).*', r'*\1* *\2*', regex=True)
output:
0 new
0 I'm learning Python *I'm* *learning*
1 I don't have money *I* *don't*
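To see what the pattern does outside of pandas, here is a minimal sketch with the standard re module. The helper name first_two_starred is made up for illustration; the pattern and replacement are the ones from the answer:

```python
import re

# Same pattern and replacement as in the answer above.
PATTERN = r"(\S+)\s(\S+).*"
REPLACEMENT = r"*\1* *\2*"

def first_two_starred(text):
    """Keep the first two words of text and wrap each in asterisks."""
    return re.sub(PATTERN, REPLACEMENT, text)

print(first_two_starred("I'm learning Python"))
print(first_two_starred("I don't have money"))
print(first_two_starred("Hello"))  # single word: no match, returned unchanged
```

Note that \s matches exactly one whitespace character. If the words can be separated by several spaces, \s+ in its place is probably the safer choice.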
I'm trying to add a " at the beginning and at the end of a column in a df.
Eg - Initial dataframe:
A B
Hello What
I Is
AM MY
Output:
A B
Hello "What
I Is
AM MY"
You could use iat to index the specific cells, and format them with f-strings:
df.iat[0,1] = f'"{df.iat[0,1]}'
df.iat[-1,1] = f'{df.iat[-1,1]}"'
print(df)
A B
0 Hello "What
1 I Is
2 AM MY"
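If you would rather refer to the column by name than by position, something along these lines might work. This is only a sketch: quote_column_ends is a hypothetical helper, and it assumes the column names are unique so that get_loc returns a single integer:

```python
import pandas as pd

def quote_column_ends(df, column):
    """Prefix the first cell and suffix the last cell of `column` with a double quote."""
    col = df.columns.get_loc(column)  # iat needs an integer position
    df.iat[0, col] = f'"{df.iat[0, col]}'
    df.iat[-1, col] = f'{df.iat[-1, col]}"'
    return df

df = pd.DataFrame({"A": ["Hello", "I", "AM"], "B": ["What", "Is", "MY"]})
print(quote_column_ends(df, "B"))
```

The function modifies df in place and returns it, the same as the two iat assignments above.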
I have a pandas DataFrame df with a column A:
A
"I need PEN"
"something went wrong in LAPTOP"
"I eat MANGO"
"I dont know anything "
and a Python list matches ["BAT","PEN","LAPTOP","I","SCHOOL",,,,]
I need to add a new column B containing the strings from the list that match:
df['B']=df['A'].str.extract("(" + "|".join(matchers) + ")",expand=True)
Use str.findall and then join:
import pandas as pd
import re
df = pd.DataFrame({"A":["I need PEN",
"something went wrong in LAPTOP",
"I eat MANGO",
"I dont know anything about school"]})
matches = ["BAT","PEN","LAPTOP","I","SCHOOL"]
pattern = "|".join(f"\\b{i}\\b" for i in matches)
df["B"] = df['A'].str.findall(pattern,flags=re.IGNORECASE).str.join(",")
print (df)
#
A B
0 I need PEN I,PEN
1 something went wrong in LAPTOP LAPTOP
2 I eat MANGO I
3 I dont know anything about school I,school
Just use df.apply function
def fn_apply(x):
    default_list = ["BAT","PEN","LAPTOP","I","SCHOOL"]
    b_list = []
    for item in default_list:
        if item.upper() in x.A.upper().split():
            b_list.append(item)
    return ",".join(b_list)
df['B'] = df.apply(fn_apply, axis=1)
df
A B
0 I need PEN PEN,I
1 something went wrong in LAPTOP LAPTOP
2 eat MANGO
3 dont know anythingabout school SCHOOL
Let me know if this works for you
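With a simpler pattern (no word boundaries, so with IGNORECASE a short keyword such as "I" also matches inside other words):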
import re
df['B'] = df['A'].str.findall('(' + '|'.join(matches) + ')', flags=re.IGNORECASE).str.join(',')
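The difference between the word-boundary pattern and the plain alternation can be checked in plain Python. A small sketch; the re.escape call is my addition (it only matters if a keyword contains regex metacharacters):

```python
import re

matches = ["BAT", "PEN", "LAPTOP", "I", "SCHOOL"]
text = "I dont know anything about school"

# Word-boundary pattern: each keyword must be a whole word.
bounded = "|".join(rf"\b{re.escape(word)}\b" for word in matches)
# Plain alternation without boundaries.
unbounded = "|".join(re.escape(word) for word in matches)

print(re.findall(bounded, text, flags=re.IGNORECASE))    # ['I', 'school']
print(re.findall(unbounded, text, flags=re.IGNORECASE))  # ['I', 'i', 'school']
```

The extra 'i' in the second result comes from "anything", so the boundary version is likely what you want when the list contains short keywords.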
I have the following string:
"hello, I'm going to eat to the fullest today hhhhhhhhhhhhhhhhhhhhh"
I have collected many tweets like that and assigned them to a dataframe. How can I clean those rows in the dataframe by removing "hhhhhhhhhhhhhhhhhh" and keeping only the rest of the string in that row?
I'm also using CountVectorizer later, so there were a lot of vocabulary entries that contained 'hhhhhhhhhhhhhhhhhhhhhhh'
Using Regex.
Ex:
import pandas as pd
df = pd.DataFrame({"Col": ["hello, I'm going to eat to the fullest today hhhhhhhhhhhhhhhhhhhhh", "Hello World"]})
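# regex=True is required on pandas 2.0+, where str.replace treats the pattern literally by default
#df["Col"] = df["Col"].str.replace(r"\b(.)\1+\b", "", regex=True)
df["Col"] = df["Col"].str.replace(r"\s+(.)\1+\b", "", regex=True).str.strip()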
print(df)
Output:
Col
0 hello, I'm going to eat to the fullest today
1 Hello World
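To make it clearer what that pattern matches, here is the same regex with the standard re module. The helper name drop_repeated_runs and the "hahaha" input are assumptions for illustration only:

```python
import re

# Whitespace, then one character repeated at least once more, up to a word boundary.
REPEATED_RUN = re.compile(r"\s+(.)\1+\b")

def drop_repeated_runs(text):
    """Remove whitespace-preceded tokens made of a single repeated character."""
    return REPEATED_RUN.sub("", text).strip()

tweet = "hello, I'm going to eat to the fullest today hhhhhhhhhhhhhhhhhhhhh"
print(drop_repeated_runs(tweet))
# Alternating characters are not a single-character run, so this is unchanged.
print(drop_repeated_runs("so funny hahaha"))
```

Be aware that the pattern would also remove short legitimate tokens made of one repeated character (for example " 11"), so it may need tightening for real data.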
You may try this:
df["Col"] = df["Col"].str.replace(u"h{4,}", "", regex=True)
Here you can set the minimum number of repeated characters to match; in my case it is 4.
Before:
Col
0 hello, I'm today hh hhhh hhhhhhhhhhhhhhh
1 Hello World
After:
Col
0 hello, I'm today hh
1 Hello World
I used unicode matching, since you mentioned you are working with tweets.
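The threshold idea can be sketched as a small plain-Python function. strip_long_runs and its parameters char and min_len are hypothetical names, not part of pandas:

```python
import re

def strip_long_runs(text, char="h", min_len=4):
    """Remove runs of `char` that are at least `min_len` characters long."""
    pattern = re.escape(char) + "{%d,}" % min_len
    return re.sub(pattern, "", text)

sample = "hello, I'm today hh hhhh hhhhhhhhhhhhhhh"
print(strip_long_runs(sample).strip())             # the short "hh" survives
print(strip_long_runs(sample, min_len=2).strip())  # "hh" is removed as well
```

The single "h" at the start of "hello" is untouched in both cases because it is shorter than the threshold.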
How do I remove multiple spaces between two strings in Python?
e.g:-
"Bertug 'here multiple blanks' Mete" => "Bertug Mete"
to
"Bertug Mete"
Input is read from an .xls file. I have tried using split() but it doesn't seem to work as expected.
import pandas as pd , string , re
dataFrame = pd.read_excel("C:\\Users\\Bertug\\Desktop\\example.xlsx")
#names1 = ''.join(dataFrame.Name.to_string().split())
print(type(dataFrame.Name))
#print(dataFrame.Name.str.split())
Let me know what I'm doing wrong.
I think you can use replace:
df.Name = df.Name.replace(r'\s+', ' ', regex=True)
Sample:
df = pd.DataFrame({'Name':['Bertug     Mete','a','Joe    Black']})
print (df)
Name
0 Bertug     Mete
1 a
2 Joe    Black
df.Name = df.Name.replace(r'\s+', ' ', regex=True)
#similar solution
#df.Name = df.Name.str.replace(r'\s+', ' ', regex=True)
print (df)
Name
0 Bertug Mete
1 a
2 Joe Black
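Since the question mentions split(), here is a plain-Python sketch of how split() can be made to work next to the regex approach. Both helper names are made up for illustration:

```python
import re

def collapse_with_regex(text):
    """Replace every run of whitespace with a single space."""
    return re.sub(r"\s+", " ", text)

def collapse_with_split(text):
    """Split on any whitespace and re-join with single spaces."""
    return " ".join(text.split())

name = "Bertug" + " " * 5 + "Mete"
padded = " " * 2 + "Joe" + " " * 3 + "Black" + " "

print(collapse_with_regex(name))
print(collapse_with_split(name))
print(repr(collapse_with_regex(padded)))  # leading/trailing whitespace becomes one space
print(repr(collapse_with_split(padded)))  # leading/trailing whitespace is dropped
```

On a column, the split variant should correspond to df.Name.str.split().str.join(' '), which also trims the ends of each value.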