Problems with BeautifulSoup find_all - python

I need to retrieve a few ids from a site's HTML. That's not hard if I create a separate variable for each one, but I would like to use a list to make them easier to find and work with.
The terminal returns "TypeError: list indices must be integers or slices, not str" when using the following line:
ids = site.find_all('p', class_="frase fr")['id']
Calling soup.find_all on its own works fine, but adding the square brackets at the end to specify which attribute to gather doesn't work. That is the problem; how can I fix it?

The find_all method returns a list of elements, so if you want to get only the IDs for each element you will have to iterate over each one and extract the desired information.
Use this instead:
ids = [p.get('id') for p in site.find_all('p', class_="frase fr")]
This gives you the id of every matching tag, with None for any tag that has no id attribute.
You can also filter out the None values using:
ids = [p.get('id') for p in site.find_all('p', class_="frase fr") if p.get('id')]
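To see why the original line fails, here is a minimal sketch that uses plain dicts as stand-ins for the tag objects that find_all returns. That substitution is an assumption made so the snippet runs without bs4 installed (dicts also support .get); the `tags` name and the id values are invented for illustration.

```python
# Stand-in for site.find_all('p', class_="frase fr"): a list of tag-like objects.
# Plain dicts are used only because they also support .get(); the ids are made up.
tags = [{"id": "frase1"}, {"class": "frase fr"}, {"id": "frase3"}]

# Indexing the list itself with a string reproduces the error from the question.
try:
    tags["id"]
except TypeError as exc:
    error_message = str(exc)

# Asking each element for its id works; .get() returns None when it is missing.
all_ids = [tag.get("id") for tag in tags]
present_ids = [tag.get("id") for tag in tags if tag.get("id")]

print(error_message)
print(all_ids)
print(present_ids)
```

The string index is applied to the list, not to the elements inside it, which is why the lookup has to move inside the comprehension.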

Related

Get list of values from nuke.EvalString_Knob?

I'm trying to work with a custom group node that has a bunch of EvalString_Knobs, and I need to get a list of the items in them. When I try node['knobName'].values(),
I get an AttributeError because values isn't an attribute of EvalString_Knob.
Anyone have a way to get the values of EvalString_Knob?
Thanks.
Let's create two EvalString_Knobs inside a new tab:
import nuke
write = nuke.createNode('Write', inpanel=False)
tab = nuke.Tab_Knob("Parameters")
write.addKnob(tab)
write.addKnob(nuke.EvalString_Knob('prefix','Prefix','render'))
write.addKnob(nuke.EvalString_Knob('suffix','Suffix','7'))
To list the names of all knobs on a Write node, use the following command:
nuke.toNode("Write1").knobs().keys()
To get the value of any known knob, use this command:
nuke.toNode("Write1").knob('prefix').getValue()
To list all the properties with their corresponding values use this approach:
nuke.selectedNode().writeKnobs()
P.S.
TabKnobs have no names to reach them. Only empty strings.
You can use
node['knobName'].getValue() to get a string returned from an EvalString_Knob,
node['knobName'].values() to get a list returned from an Enumeration_Knob,
and you can use node['knobName'].Class() to find out what is the type of the knob.
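The distinction above could be wrapped in a small helper. This is only a sketch: `knob_contents` is a hypothetical name, and the two `Fake...` classes are stand-ins invented so the snippet runs outside Nuke. Inside Nuke you would pass real knobs such as node['knobName'] instead.

```python
def knob_contents(knob):
    """Return a list for knobs that expose values(), otherwise the single value."""
    if hasattr(knob, "values"):
        return list(knob.values())
    return knob.getValue()


# Hypothetical stand-ins so the sketch runs outside Nuke.
class FakeStringKnob:
    """Mimics a knob that only has getValue(), like EvalString_Knob."""

    def getValue(self):
        return "render"


class FakeEnumKnob:
    """Mimics a knob that has values(), like Enumeration_Knob."""

    def values(self):
        return ["exr", "png"]


print(knob_contents(FakeStringKnob()))
print(knob_contents(FakeEnumKnob()))
```

Checking for the method rather than comparing Class() strings is a design choice here; comparing node['knobName'].Class() against a known type name would work as well.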

'list' object has no attribute 'strip' error even when I select individual string elements?

After reading some similar questions I understand that you cannot .strip() a list of strings, only individual strings. I have a list statesData that is made of strings, and I want to strip and split each individual element. I've been able to get it to work by using two lines, but my instructor used one line for an example problem and I'm trying to figure out why I can't get the same result. What I'm confused about is that I thought that the index location [i] should have selected an individual string element within the list which could be stripped?
statesData = ['State,Population,ElectoralVotes,HighwayMiles,SquareMiles\n',
'Alabama,4802982,9,213068,52419.02\n', 'Alaska,721523,3,31618,663267.26\n']
#etc
for i in range(len(statesData)):
    statesData[i] = statesData[i].strip().split(',')
statesData = ['State,Population,ElectoralVotes,HighwayMiles,SquareMiles\n',
'Alabama,4802982,9,213068,52419.02\n', 'Alaska,721523,3,31618,663267.26\n']
#etc
new_statesData = [i.strip().split(',') for i in statesData]
A list comprehension does the same job in one line and is easy to read. It also builds a new list instead of modifying statesData in place.
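One plausible cause of the original error, though the question does not confirm it, is running the strip-and-split step a second time on data that has already been split. A minimal sketch of that situation, with `raw_lines` and `rows` as illustrative names:

```python
raw_lines = [
    "State,Population,ElectoralVotes,HighwayMiles,SquareMiles\n",
    "Alabama,4802982,9,213068,52419.02\n",
]

# First pass: every element is a str, so strip() and split() both work.
rows = [line.strip().split(",") for line in raw_lines]

# Second pass over the already-split data: each element is now a list.
try:
    [row.strip() for row in rows]
except AttributeError as exc:
    second_pass_error = str(exc)

print(rows[1])
print(second_pass_error)
```

Because the loop in the question overwrites statesData[i] in place, re-running it (for example in an interactive session) would hit list elements rather than strings.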

Extracting the 'text' field from a JSON tweet element and adding it to string array python

I have a MongoDB collection full of tweets that I have collected, and I would like to perform sentiment analysis on them, but only on the 'text' field of each element. I initially had a piece of code that determined whether an element had a text field, so I've altered it to detect the field and, if present, add it to the next element of the array. However, I get the TypeError shown below.
appleSentimentText[record] = record.get('text')
TypeError: list indices must be integers, not dict.
I know this is because [record] is not an integer, but I'm confused about how to make it into one. I'm new to Python, so any help would be much appreciated. Here is my snippet of code for reference.
appleSentimentText = []
for record in db.Apple.find():
    if record.get('text'):
        appleSentimentText[record] = record.get('text')
List indexes must be integers (or slices), hence the error. Also, appleSentimentText starts out empty, so even an integer index would raise IndexError on assignment.
To add to the list, use the list.append or list.insert method:
appleSentimentText.append(record.get("text"))
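Put together, the corrected loop might look like the sketch below. The `records` list is a hypothetical stand-in for db.Apple.find(), used so the snippet runs without a MongoDB connection, and the tweet texts are invented.

```python
# Hypothetical stand-in for db.Apple.find(): an iterable of dict-like records.
records = [
    {"_id": 1, "text": "Loving the new phone"},
    {"_id": 2},  # no 'text' field, so it is skipped
    {"_id": 3, "text": "Battery life could be better"},
]

apple_sentiment_text = []
for record in records:
    text = record.get("text")
    if text:
        # append() grows the list by one; no index is needed.
        apple_sentiment_text.append(text)

print(apple_sentiment_text)
```

Storing the result of record.get("text") in a variable avoids looking the field up twice.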

Check if element is list or another object

I've got an object which contains an element named "companies".
This element can either be a list of objects or just a single object (not contained within a list).
I would like to run through all companies, but this example fails if the element "companies" is just a single item (not contained within a list):
for company in companies:
I've tried to test before the for-loop, such as:
if type(companies['company']) is list:
    # do your thing
but that fails as well.
Can anyone help?
Firstly, that's a really horrible way to structure data, and you should complain to whoever creates it. If an item can be a list, it should always be a list, even if that list just contains one element.
However, the code you have shown should work - although a better way to do it is if isinstance(companies['company'], list). If that's still not working, you will need to show the data, and the exact code that's using it.
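You can wrap a non-list in a list so that the same "for ... in ..." loop works unconditionally. Note that list(companies) is not the right tool for this: on a dict it yields the keys, and on a non-iterable object it raises TypeError.
if not isinstance(companies, list):
    companies = [companies]
for company in companies:
    ...  # use "company" in some way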
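A small helper makes the normalisation reusable. This is a sketch; `as_list` is a hypothetical name and the company records are invented. It also shows what calling list() on a single dict actually produces.

```python
def as_list(value):
    """Return value unchanged if it is already a list, else wrap it in one."""
    return value if isinstance(value, list) else [value]


single = {"name": "Acme"}
several = [{"name": "Acme"}, {"name": "Globex"}]

names_from_single = [company["name"] for company in as_list(single)]
names_from_several = [company["name"] for company in as_list(several)]

# list() on a dict returns its keys, not a one-element list holding the dict.
keys_only = list(single)

print(names_from_single, names_from_several, keys_only)
```

The same loop body then handles both shapes of input without a separate branch.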

I have single-element arrays. How do I change them into the elements themselves?

After importing a JSON document into a pandas dataframe using records = pandas.read_json(path), where path is a predefined path to the JSON document, I discovered that the contents of certain columns of the resulting dataframe "records" are not simply strings as expected. Instead, each "cell" in such a column is an array containing one single element: the string of interest. This makes selecting rows using boolean indexing difficult. For example, records[records['category']=='Python Books'] in IPython outputs an empty dataframe; had the "cells" contained strings instead of arrays of strings, the output would have been nonempty, containing the rows that correspond to Python books.
I could modify the JSON document, so that "records" reads the strings in properly. But is there a way to modify "records" directly, to somehow strip the single-element arrays into the elements themselves?
Update: After clarification, I believe this might accomplish what you want while limiting it to a single iteration over the data:
nested_column_1 = records["column_name_1"]
nested_column_2 = records["column_name_2"]
clean_column_1 = []
clean_column_2 = []
for i in range(len(records.index)):
    clean_column_1.append(nested_column_1[i][0])
    clean_column_2.append(nested_column_2[i][0])
Then you convert the clean_column lists to Series like you mentioned in your comment. Obviously, you make as many nested_column and clean_column lists as you need, and update them all in the loop.
You could generalize this pretty easily by keeping a record of "problem" columns and using that to create a data structure to manage the nested/clean lists, rather than declaring them explicitly as I did in my example. But I thought this might illustrate the approach more clearly.
Obviously, this assumes that all columns have the same number of elements, which may not be a valid assumption in your case.
Original Answer:
Sorry if I'm oversimplifying or misunderstanding the problem, but could you just do something like this?
simplified_list = [element[0] for element in my_array_of_arrays]
Or if you don't need the whole thing at once, just a generator instead:
simplifying_generator = (element[0] for element in my_array_of_arrays)
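The "problem columns" idea can be sketched in plain Python. Everything here is illustrative: `unwrap`, `columns`, and `problem_columns` are hypothetical names, and the dict of lists stands in for the dataframe so the snippet runs without pandas.

```python
def unwrap(cell):
    """Return the sole element of a one-element list, else the cell unchanged."""
    if isinstance(cell, list) and len(cell) == 1:
        return cell[0]
    return cell


# Hypothetical column data shaped like the cells described in the question.
columns = {
    "category": [["Python Books"], ["Cooking"], ["Python Books"]],
    "title": ["Fluent Python", "Salt Fat Acid Heat", "Effective Python"],
}
problem_columns = ["category"]

# Unwrap only the columns known to hold single-element lists.
for name in problem_columns:
    columns[name] = [unwrap(cell) for cell in columns[name]]

# Equality comparison now behaves as expected.
python_titles = [
    title
    for title, category in zip(columns["title"], columns["category"])
    if category == "Python Books"
]
print(python_titles)
```

With a real dataframe, the same function could likely be applied per column with something like records['category'] = records['category'].map(unwrap), since Series.map accepts a function; that line is untested against the asker's data.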
