Getting the length of text in a dataframe in python - python

So I have this dataframe:
Text                                               target
#Coronavirus is a cover for something else. #5...  D
Crush the One Belt One Road !! \r\n#onebeltonf...  B
RT #nickmyer: It seems to be, #COVID-19 aka #c...  B
#Jerusalem_Post All he knows is how to destroy...  B
#newscomauHQ Its gonna show us all. We will al...  B
Where Text holds tweets. I am trying to get the length of each string in the Text column and store it in a new column of the dataframe. I have tried this:
import pandas as pd
d = pd.read_csv('5gCoronaFinal.csv')
d['textlength'] = [len(t) for t in d['Text']]
But it keeps giving me this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-42-dabcab1de7b2> in <module>
----> 1 d['textlength'] = [len(t) for t in d['Text']]
<ipython-input-42-dabcab1de7b2> in <listcomp>(.0)
----> 1 d['textlength'] = [len(t) for t in d['Text']]
TypeError: object of type 'float' has no len()
I've tried converting t to integer like so:
d['textlength'] = [len(int(t)) for t in d['Text']]
but then it gives me this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-43-9ae56e5f7912> in <module>
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]
<ipython-input-43-9ae56e5f7912> in <listcomp>(.0)
----> 1 d['textlength'] = [len(int(t)) for t in d['Text']]
ValueError: invalid literal for int() with base 10: '#Coronavirus is a cover for something else. #5g is being rolled out and they are expecting lots to...what? Die from #60ghz +. They look like they are to keep the cold in? #socialdistancing #covid19 #
I need some help, thanks!

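You can use the str accessor for vectorised string operations; it skips missing values (NaN, which is a float) instead of raising. In this case you can combine str.split and str.len to count the words in each tweet (use str.len on its own if you want the number of characters):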
df['Text_length'] = df.Text.str.split().str.len()
print(df)
   Text                                               target  Text_length
0  #Coronavirus is a cover for something else. #5...  D       8
1  Crush the One Belt One Road !! \r\n#onebeltonf...  B       8
2  RT #nickmyer: It seems to be, #COVID-19 aka #c...  B       9
3  #Jerusalem_Post All he knows is how to destroy...  B       8
4  #newscomauHQ Its gonna show us all. We will al...  B       9
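The original TypeError most likely comes from empty cells in the CSV, which pandas reads as NaN (a float). Here is a minimal sketch of both counts on a made-up three-row frame; the sample strings and the column names char_count, word_count and char_count_int are assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the middle row mimics an empty CSV cell read as NaN.
df = pd.DataFrame({"Text": ["RT #nickmyer: It seems", np.nan, "Crush the One Belt"]})

# Characters per tweet; missing rows stay NaN instead of raising TypeError.
df["char_count"] = df["Text"].str.len()

# Words per tweet (whitespace-separated tokens), as in the answer above.
df["word_count"] = df["Text"].str.split().str.len()

# Optional: treat missing tweets as length 0 and store plain integers.
df["char_count_int"] = df["char_count"].fillna(0).astype(int)

print(df)
```

Whether a missing tweet should count as 0 or stay NaN depends on what you do with the column afterwards, so the last step is a choice, not a requirement.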

Related

"There are no fields in dtype int64." Why am I getting it?

import numpy as np
b = np.array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])
b[1, 1]
output:----------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-15-320f1bda41d3> in <module>()
9 """
10 # let's say we want to access the digit 5 in 2nd row.
---> 11 b[1,1]
12 # here the the 1st one is representing the row no. 1 but you may ask the question if the 5 is in the 2nd row then why did we passed the argument saying the row that wwe want to access is 1.
13 # well the answer is pretty simple:- the thing is here we are providing the index number that is assigned by the python it has nothing to do with the normal sequencing that starts from 1 rather we use python sequencing that starts from 0,1,2,3....
KeyError: 'There are no fields in dtype int64.'
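For comparison, here is a small sketch of the zero-based indexing that the comments in the traceback describe, run on a plain integer array with the same values. It only illustrates the indexing rule; the excerpt above does not show enough of the failing cell to say what triggered the KeyError there. The variable names are made up for the example.

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]])

# Indices start at 0, so b[1, 1] is the second row, second column.
second_row_second_col = b[1, 1]

# The value 5 in the second row sits at column index 3.
five_in_second_row = b[1, 3]

print(second_row_second_col, five_in_second_row)
```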

TypeError: 'numpy.float64' object cannot be interpreted as an integer fake news detection

I am getting this error and have not been able to resolve it or find a solution online.
TypeError: 'numpy.float64' object cannot be interpreted as an integer
TypeError Traceback (most recent call last)
<ipython-input-10-33f2a17ec582> in <module>
20 print("Saving New CSV file")
21 if __name__=='__main__':
---> 22 dataSetExtraction()
<ipython-input-10-33f2a17ec582> in dataSetExtraction()
6 dfReal=processRealNewsDataFrame(dfReal)
7 dfCombine=[]
----> 8 for d in extractTopRealResultsForCrawling(dfReal):
9 print('len of datadrame :',d['URL'].size)
10 #d=d[:100]
<ipython-input-6-9dbfd3f21499> in extractTopRealResultsForCrawling(dfReal)
6 listOfIndex=[]
7 df=[]
----> 8 for i in range(0,loop):
9 listOfIndex.append(dfReal[i*10000:(i+1)*10000])
10 df+=[dfReal[i*10000:(i+1)*10000]]
TypeError: 'numpy.float64' object cannot be interpreted as an integer
This is the code that raises the error. I have not been able to fix it; please help.
def extractTopRealResultsForCrawling(dfReal):
    print("Retrieve top 20000 Real news data")
    num=dfReal.size
    loop=num/10000
    listOfIndex=[]
    df=[]
    for i in range(0,loop):
        listOfIndex.append(dfReal[i*10000:(i+1)*10000])
        df+=[dfReal[i*10000:(i+1)*10000]]
    #print "length of dataframe array retrieved:",len(df[0])
    return df[:LEN]
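The range function only accepts integer arguments. In Python 3, num/10000 is true division and returns a float (here a numpy.float64), which is why range(0,loop) fails.
Here is a minimal example that reproduces (more or less) the problem: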
>>> a = 2.0
>>> [i for i in range(a)]
Traceback (most recent call last):
File "<pyshell#15>", line 1, in <module>
[i for i in range(a)]
TypeError: 'float' object cannot be interpreted as an integer
You need to convert the value to an integer:
>>> [i for i in range(int(a))]
[0, 1]
In your code you should use:
for i in range(int(loop)):
Alternatively, you could do:
for i in range(0, num, 10000):
    listOfIndex.append(dfReal[i:i+10000])
    df+=[dfReal[i:i+10000]]
which avoids the division entirely.
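A minimal sketch of the stepped-range approach as a standalone helper. The function name split_into_chunks, the sample frame and its example.com URLs are hypothetical. Note one assumption that differs from the question's code: the sketch steps over len(frame), the number of rows, whereas DataFrame.size is rows times columns.

```python
import pandas as pd

def split_into_chunks(frame, chunk_size):
    """Return consecutive row slices of at most chunk_size rows each."""
    # range() with a step needs no division, so no float can sneak in.
    return [frame.iloc[start:start + chunk_size]
            for start in range(0, len(frame), chunk_size)]

# Hypothetical data: 25 rows, split into chunks of 10.
example = pd.DataFrame({"URL": [f"http://example.com/{i}" for i in range(25)]})
chunks = split_into_chunks(example, 10)
chunk_lengths = [len(c) for c in chunks]

print(chunk_lengths)
```

If you do want a chunk count, floor division (num // 10000) keeps the result an integer, but it drops the final partial chunk, which the stepped range keeps.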

Getting Type Error: Can only concatenate tuple (not "int") to tuple

PROBLEM:
I don't understand where I am going wrong or how I should correct my code. If anyone has a solution, please help.
MY CODE:
def myfunc(*args):
    #take entries
    my_list=list()
    formula=2*(args)+5
    #make loop for entering in formula
    for i in args:
        print('{}/n'.format(formula))
    return formula
ERROR:
Traceback (most recent call last)
----> 1 myfunc([2,4,5])
in myfunc(*args)
2 #take entries
3 my_list=list()
----> 4 formula=2*(args)+5
6 for i in args:
7 return formula
TypeError: can only concatenate tuple (not "int") to tuple
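Inside the function, args is a tuple of all positional arguments, so 2*(args) repeats the tuple and +5 then tries to concatenate an int to it, which is what the error reports. Here is one possible fix, assuming the intent was to apply 2*x+5 to each number; that intent is a guess, since the question does not state it.

```python
def myfunc(*args):
    # args is a tuple such as (2, 4, 5); apply the formula to each element
    # rather than to the tuple itself.
    return [2 * x + 5 for x in args]

# Pass the numbers as separate arguments...
values = myfunc(2, 4, 5)

# ...or unpack an existing list with *.
same_values = myfunc(*[2, 4, 5])

# What the original line did first: multiplying a tuple repeats it.
repeated = 2 * (1, 2)

print(values, same_values, repeated)
```

Calling myfunc([2,4,5]) as in the traceback passes the whole list as a single argument, so args would be ([2, 4, 5],); unpacking with * avoids that.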

How do I find lowercase words in a DataFrame column that has NaNs?

I have a dataframe with these values in one of its columns:
Jerry
NaN
bill
Sol
I want to catch the all-lowercase names, i.e., bill. But my code keeps failing, I think on the NaN.
Here is my code:
for n in df_copy.name:
    if n.islower():
        print(n)
I get this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-296-2e5fe579149d> in <module>
1 for n in df_copy.name:
----> 2 if n.islower():
3 print(n)
AttributeError: 'float' object has no attribute 'islower'
So I tried making the values a string:
for n in df_copy.name:
    if n.str.islower():
        print(n)
It gives me this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-295-7e9d8aa5abad> in <module>
1 for n in df_copy.name:
----> 2 if n.str.islower():
3 print(n)
AttributeError: 'str' object has no attribute 'str'
Argh. Does anyone know how to solve this?
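You can use the vectorised str.islower on the whole column; it returns NaN for missing values, so fill those with False to get a usable boolean mask: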
df[df.name.str.islower().fillna(False)]
Out[243]:
name
2 bill
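A self-contained sketch of two ways around the NaN, using a frame rebuilt from the four values in the question. The vectorised variant here compares with True instead of calling fillna(False); that is a substitution on my part, not the answer's exact code, and it gives the same mask for this data.

```python
import numpy as np
import pandas as pd

# Rebuilt from the values listed in the question.
df_copy = pd.DataFrame({"name": ["Jerry", np.nan, "bill", "Sol"]})

# Loop version: skip anything that is not a string (NaN is a float).
lower_in_loop = [n for n in df_copy["name"]
                 if isinstance(n, str) and n.islower()]

# Vectorised version: comparing with True maps both False and NaN to False.
mask = df_copy["name"].str.islower().eq(True)
lower_rows = df_copy[mask]

print(lower_in_loop)
print(lower_rows)
```

The second attempt in the question failed because .str belongs to the Series, not to the individual values you get when looping over it.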

AttributeError: 'str' object has no attribute 'search_nodes' - Python

I've built a tree using ete2 package. Now I'm trying to write a piece of code that takes the data from the tree and a csv file and does some data analysis through the function fre.
Here is an example of the csv file I've used:
PID Code Value
1 A1... 6
1 A2... 5
2 A.... 4
2 D.... 1
2 A1... 2
3 D.... 5
3 D1... 3
3 D2... 5
Here is a simplified version of the code
from ete2 import Tree
import pandas as pd
t= Tree("((A1...,A2...)A...., (D1..., D2...)D....).....;", format=1)
data= pd.read_csv('/data_2.csv', names=['PID','Code', 'Value'])
code_count = data.groupby('Code').sum()
total_patients= len(list (set(data['PID'])))
del code_count['PID']
############
def fre(code1,code2):
    code1_ancestors=[]
    code2_ancestors=[]
    for i in t.search_nodes(name=code1)[0].get_ancestors():
        code1_ancestors.append(i.name)
    for i in t.search_nodes(name=code2)[0].get_ancestors():
        code2_ancestors.append(i.name)
    common_ancestors = []
    for i in code1_ancestors:
        for j in code2_ancestors:
            if i==j:
                common_ancestors.append(i)
    print common_ancestors
####
for i in patients_list:
    a= list (data.Code[data.PID==patients_list[i-1]])
    #print a
    for j in patients_list:
        b= list (data.Code[data.PID==patients_list[j-1]])
        for k in a:
            for t in b:
                fre (k,t)
However, it raises this error:
AttributeError Traceback (most recent call last)
<ipython-input-12-f9b47fcec010> in <module>()
38 for k in a:
39 for t in b:
---> 40 fre (k,t)
<ipython-input-12-f9b47fcec010> in fre(code1, code2)
12 code1_ancestors=[]
13 code2_ancestors=[]
---> 14 for i in t.search_nodes(name=code1)[0].get_ancestors():
15 code1_ancestors.append(i.name)
16 for i in t.search_nodes(name=code2)[0].get_ancestors():
AttributeError: 'str' object has no attribute 'search_nodes'
I've tried passing all possible values to the function manually and it works! However, when I use the last section of the code, it raises the error.
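You're changing your global variable t with your for loop. The innermost loop, for t in b:, rebinds the name t, which fre uses as the tree, to a string taken from b, so the next t.search_nodes call fails.
If you print out its value before each call to your function, you will find that it has been assigned a string at some point. Rename the loop variable so that t keeps referring to the tree.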
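The same effect can be sketched without ete2. FakeTree below is a hypothetical stand-in that only mimics the shape of search_nodes so the example runs on its own (in Python 3, unlike the question's Python 2 code); lookup and the code lists are made up too.

```python
class FakeTree:
    """Hypothetical stand-in for the ete2 Tree, only here to make this runnable."""
    def search_nodes(self, name):
        return [name]

t = FakeTree()

# Reusing the tree's name as a loop variable rebinds it to a string.
for t in ["D1", "D2"]:
    pass
t_is_now_a_string = isinstance(t, str)

# One way out: give the tree a distinct name and pass it in explicitly.
tree = FakeTree()

def lookup(code, tree):
    # The tree is a parameter, so a stray global rebinding cannot affect it.
    return tree.search_nodes(name=code)[0]

results = []
for code1 in ["A1", "A2"]:
    for code2 in ["D1", "D2"]:
        results.append((lookup(code1, tree), lookup(code2, tree)))

print(t_is_now_a_string, results)
```

Passing the tree as an argument is a design choice rather than a requirement; simply renaming the inner loop variable would also stop the rebinding.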
