I have to open an XML file from a website and make a dataframe from it.
I tried parsing the XML to a dict and then passing that to a dataframe:
from urllib.request import urlopen
import xmltodict
import pandas as pd
file = urlopen('https://analisi.transparenciacatalunya.cat/download/8s6p-h233/text%2Fxml')
data_bytes = file.read()
orderDictListData = xmltodict.parse(data_bytes)
orderDictListData
df = pd.DataFrame(orderDictListData)
I need a dataframe with the columns from key "id" through "codiINEmunicipi".
How about simply using pandas.read_xml:
url = 'https://analisi.transparenciacatalunya.cat/download/8s6p-h233/text%2Fxml'
df = pd.read_xml(url)
output:
id nom carrec tractament resp iddep dep idpare codidep nif ordre datamodificacio datacreacio centres sinonims
0 535 012 Atenció Ciutadana None None None 3392 Departament de la Vicepresidència i de Polítiques Digitals i Territori 6564 PTO None 912000 02/06/2021 19/06/1997 NaN NaN
1 3383 061 Salut Respon None None None 2803 Departament de Salut 7021 SLT None 1000 23/02/2021 19/06/1997 NaN NaN
2 5500 ACCIÓ - Agència per a la Competitivitat de l'Empresa consellera delegada de l'Agència per a la Competitivitat de l'Empresa, ACCIÓ Sra. Natàlia Mas Guix 19775 Departament d'Empresa i Treball 19035 EMO S-0800476-D 323699 28/02/2022 19/06/1997 NaN NaN
3 5504 ACCIÓ a Girona delegat d'ACCIÓ a Girona Sr. Ferran Rodero 19775 Departament d'Empresa i Treball 5500 EMO None 10500 25/01/2016 19/06/1997 NaN NaN
4 5505 ACCIÓ a Lleida delegada d'ACCIÓ a Lleida Sra. Clara Porta Sànchez 19775 Departament d'Empresa i Treball 5500 EMO None 11500 25/01/2016 19/06/1997 NaN NaN
...
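To keep only the columns from "id" through "codiINEmunicipi", label-based slicing with df.loc works, since .loc slices columns by position in their stored order. A minimal sketch on a toy frame (the column layout here is an assumption standing in for the real parsed XML, where "codiINEmunicipi" presumably appears somewhere after "id"):

```python
import pandas as pd

# Toy frame standing in for the parsed XML; column order matters for label slicing
df = pd.DataFrame({
    "id": [535, 3383],
    "nom": ["012 Atenció Ciutadana", "061 Salut Respon"],
    "codiINEmunicipi": ["080193", "080193"],
    "sinonims": [None, None],
})

# .loc with a label slice is inclusive on both ends
subset = df.loc[:, "id":"codiINEmunicipi"]
print(list(subset.columns))  # ['id', 'nom', 'codiINEmunicipi']
```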
It's literally my first time using BeautifulSoup, and I'm having trouble extracting the table I want to work with (https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments). I want to extract the table with class "table table-products sortable searchable".
import requests
from bs4 import BeautifulSoup
url="https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments"
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, "html.parser")
table = soup.find("table", class_="table table-products sortable searchable ")
table_data = table.tbody.find_all("tr")
This outputs:
AttributeError: 'NoneType' object has no attribute 'tbody'.
I guess I'm not reaching the table correctly, which is why it comes out as None.
There must be something wrong with the CSS class filter; without it, this works:
table = soup.find("table")
table_data = table.tbody.find_all("tr")
Add the class filter back but remove the trailing space:
table = soup.find("table", class_="table table-products sortable searchable") # last space at the end removed
Works too.
See:
BeautifulSoup and class with spaces
Beautiful Soup find element with multiple classes.
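The distinction those linked questions describe can be checked directly: passing a multi-word string to class_ compares it against the tag's entire class attribute as an exact string, while a CSS selector matches class tokens individually, in any order. A small self-contained check on inline HTML:

```python
from bs4 import BeautifulSoup

html = '<table class="table table-products sortable searchable"><tr><td>x</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

# A multi-word class_ string must equal the attribute value exactly
exact = soup.find("table", class_="table table-products sortable searchable")
with_space = soup.find("table", class_="table table-products sortable searchable ")

# CSS selectors match each class token independently, whatever the order
selected = soup.select_one("table.sortable.table-products")

print(exact is not None, with_space is None, selected is not None)  # True True True
```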
You have an extra space at the end of the class: table = soup.find("table", class_="table table-products sortable searchable "). But you can get the table more simply:
import pandas as pd
df = pd.read_html('https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments')[1]
print(df)
OUTPUT:
Statut ... Remise à disposition
0 Rupture de stock ... NaN
1 Rupture de stock ... NaN
2 Rupture de stock ... NaN
3 Tension d'approvisionnement ... NaN
4 Remise à disposition ... NaN
.. ... ... ...
373 Arrêt de commercialisation ... NaN
374 Rupture de stock ... NaN
375 Rupture de stock ... NaN
376 Remise à disposition ... 2 mars 2021
377 Rupture de stock ... NaN
There are two tables in the webpage, and the class value table table-products sortable searchable selects both of them. The desired table is the second one, so I use pandas to pull the complete table data:
import pandas as pd
df = pd.read_html('https://ansm.sante.fr/disponibilites-des-produits-de-sante/medicaments')[1]
print(df)
Output:
Statut ... Remise à disposition
0 Rupture de stock ... NaN
1 Rupture de stock ... NaN
2 Rupture de stock ... NaN
3 Tension d'approvisionnement ... NaN
4 Remise à disposition ... NaN
.. ... ... ...
373 Arrêt de commercialisation ... NaN
374 Rupture de stock ... NaN
375 Rupture de stock ... NaN
376 Remise à disposition ... 2 mars 2021
377 Rupture de stock ... NaN
[378 rows x 4 columns]
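Rather than hard-coding the index [1], pandas.read_html also accepts a match argument that keeps only tables containing the given text. A sketch on inline HTML so it runs standalone; the "Statut" header and sample value are taken from the output above:

```python
import pandas as pd
from io import StringIO

html = """
<table><tr><th>Autre</th></tr><tr><td>1</td></tr></table>
<table class="table table-products sortable searchable">
  <tr><th>Statut</th></tr>
  <tr><td>Rupture de stock</td></tr>
</table>
"""

# read_html returns one DataFrame per <table>; `match` filters on contained text
tables = pd.read_html(StringIO(html), match="Statut")
df = tables[0]
print(df["Statut"].tolist())  # ['Rupture de stock']
```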
I am a beginner at Python, and I have some issues with code I wrote.
I have 2 dataframes: one with general information about books (dfMediaGe), and the other with the names of books that were shown on TV (dfTV).
My goal is to compare them, and to fill the column 'At least 1 TV emission' in dfMediaGe with a 1 if the book appears in the dfTV dataframe.
My difficulty is that the dataframes do not have the same number of rows/columns.
Here is a sample of dfMediaGe :
Titre original_title AUTEUR DATE EDITEUR THEMESIMPLE THEME GENRE rating rating_average ... current_count done_count list_count recommend_count review_count TRADUITDE LANGUEECRITURE NOTE At least 1 TV emission id
0 La souris des dents NaN Roger, Marie-Sabine|Desbons, Marie 01/01/2021 Lito TIPJJ001 Eveil J000100 Jeunesse - Eveil et Fiction / Histoire... GJEU003 Jeunesse / Mini albums|GJEU013 Jeuness... NaN NaN ... 0.0 0.0 0.0 0.0 0.0 NaN fre NaN 0 46220676.0
1 La petite mare du grand crocodile NaN Buteau, Gaëlle|Hudrisier, Cécile 01/01/2021 Lito TIPJJ001 Eveil J000100 Jeunesse - Eveil et Fiction / Histoire... GJEU003 Jeunesse / Mini albums|GJEU013 Jeuness... NaN NaN 0.0 0.0 0.0 0.0 0.0 0.0 NaN fre NaN 46220678.0
and here is a sample of dfTV :
Titre AUTEUR DATE EDITEUR GENRE THEMESIMPLE TRADUITDE NOTE THEME LANGUEECRITURE FORMATNUMERIQUE PUBLIC MATIERE LEXIQUE DESCRIPTION
0 Les strates Bagieu, Pénélope 11/12/2021 Gallimard NaN TIPBD001 Albums NaN NaN T090200 Bandes dessinées / Bandes dessinées fre NaN NaN NaN NaN 1 vol. ; illustrations en noir et blanc ; 24 x...
And here is the code I wrote, which is not working at all:
for Titre, r in dfMediaGe.iterrows():
    for Titre, r in dfTV.iterrows():
        p = 0
        if r['Titre'].values == (dfTV['Titre']).values.any():
            p = 1
            r['Au moins 1 passage TV'].append(p)
I get this error:
AttributeError: 'str' object has no attribute 'values'
Thank you very much for your help!
I don't think it's a problem that your two data frames don't have the same number of columns.
You can achieve what you are looking for using this:
data_dfMediaGe = [
['Les strates Bagieu'],
['La petite mare du grand crocodile'],
['La souris des dents NaN Roger'],
['Movie XYZ']
]
dfMediaGe = pd.DataFrame(data=data_dfMediaGe, columns=['Titre'])
dfMediaGe['Au moins 1 passage TV'] = 0
data_dfTV = [
['La petite mare du grand crocodile'],
['Movie XYZ']
]
dfTV = pd.DataFrame(data=data_dfTV, columns=['Titre'])
for i, row in dfMediaGe.iterrows():
if row['Titre'] in list(dfTV['Titre']):
dfMediaGe.at[i, 'Au moins 1 passage TV'] = 1
print(dfMediaGe)
Titre Au moins 1 passage TV
0 Les strates Bagieu 0
1 La petite mare du grand crocodile 1
2 La souris des dents NaN Roger 0
3 Movie XYZ 1
All you have to do is iterate through the rows in dfMediaGe and check whether the value in the Titre column is present in the Titre column of dfTV.
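The loop above can also be collapsed into one vectorized step with Series.isin, which avoids iterrows entirely on large frames. A minimal sketch with the same toy data:

```python
import pandas as pd

dfMediaGe = pd.DataFrame({"Titre": [
    "Les strates Bagieu",
    "La petite mare du grand crocodile",
    "La souris des dents NaN Roger",
    "Movie XYZ",
]})
dfTV = pd.DataFrame({"Titre": ["La petite mare du grand crocodile", "Movie XYZ"]})

# isin returns a boolean Series; astype(int) turns it into the 0/1 flag column
dfMediaGe["Au moins 1 passage TV"] = dfMediaGe["Titre"].isin(dfTV["Titre"]).astype(int)
print(dfMediaGe["Au moins 1 passage TV"].tolist())  # [0, 1, 0, 1]
```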
I'm trying to learn pandas and Python so I can move some problems from Excel to pandas/Python. I have a big CSV file from my bank with over 10000 records, and I want to categorize the records based on their description. For that I have a big mapping file with keywords. In Excel I used VLOOKUP, and I'm trying to reproduce that in pandas/Python.
So I can read the CSV into a dataframe dfMain. One column in dfMain, called Description, is the input to be categorized against the mapping file dfMap.
dfMain looks simplified something like this:
Datum Bedrag Description
2020-01-01 -166.47 een cirkel voor je uit
2020-01-02 -171.79 even een borreling
2020-01-02 -16.52 stilte zacht geluid
2020-01-02 -62.88 een steentje in het water
2020-01-02 -30.32 gooi jij je zorgen weg
2020-01-02 -45.99 dan ben je laf weet je dat
2020-01-02 -322.44 je klaagt ook altijd over pech
2020-01-03 -4.80 jij kan niet ophouden zorgen
2020-01-07 5.00 de wereld te besnauwen
dfMap looks simplified like this
sleutel code
0 borreling A1
1 zorgen B2
2 steentje C2
3 een C1
dfMap contains keywords('sleutel') and a Category code ('code').
When a 'sleutel' is a substring of 'Description' in dfMain, a new column in dfMain called 'category' should get the value of the code.
I'm aware that multiple keywords can apply to a given description, but the first match wins; in other words, the number of rows in dfMain must stay the same.
the resulting data frame must then look like this:
Datum Bedrag Description category
2020-01-01 -166.47 een cirkel voor je uit C1
2020-01-02 -171.79 even een borreling A1
2020-01-02 -16.52 stilte zacht geluid NaN
2020-01-02 -62.88 een steentje in het water C2
2020-01-02 -30.32 gooi jij je zorgen weg B2
2020-01-02 -45.99 dan ben je laf weet je dat NaN
2020-01-02 -322.44 je klaagt ook altijd over pech NaN
2020-01-03 -4.80 jij kan niet ophouden zorgen B2
2020-01-07 5.00 de wereld te besnauwen NaN
I tried a lot of things with join but can't get it to work.
Try this:
import pandas as pd
# prepare the data
Datum = ['2020-01-01', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-02', '2020-01-03', '2020-01-07']
Bedrag = [-166.47, -171.79, -16.52, -62.88, -30.32, -45.99, -322.44, -4.80, 5.00]
Description = ["een cirkel voor je uit", "even een borreling", "stilte zacht geluid", "een steentje in het water",
"gooi jij je zorgen weg", "dan ben je laf weet je dat", "je klaagt ook altijd over pech", "jij kan niet ophouden zorgen", "de wereld te besnauwen"]
dfMain = pd.DataFrame(Datum, columns=['Datum'])
dfMain['Bedrag'] = Bedrag
dfMain['Description'] = Description
sleutel = ["borreling", "zorgen", "steentje", "een"]
code = ["A1", "B2", "C2", "C1"]
dfMap = pd.DataFrame(sleutel, columns=['sleutel'])
dfMap['code'] = code
print(dfMap)
# solution
map_code = pd.Series(dfMap.code.values, index=dfMap.sleutel).to_dict()
def extract_codes(row):
    # return the code of the first keyword that occurs in the description
    for item in map_code:
        if item in row:
            return map_code[item]
    return float("nan")  # no keyword matched
dfMain['category'] = dfMain['Description'].apply(extract_codes)
print(dfMain)
An efficient solution is to use a regex with extract and then to map the result:
regex = '(%s)' % dfMap['sleutel'].str.cat(sep='|')
dfMain['category'] = (
dfMain['Description']
.str.extract(regex, expand=False)
.map(dfMap.set_index('sleutel')['code'])
)
Output:
Datum Bedrag Description category
0 2020-01-01 -166.47 een cirkel voor je uit C1
1 2020-01-02 -171.79 even een borreling C1
2 2020-01-02 -16.52 stilte zacht geluid NaN
3 2020-01-02 -62.88 een steentje in het water C1
4 2020-01-02 -30.32 gooi jij je zorgen weg B2
5 2020-01-02 -45.99 dan ben je laf weet je dat NaN
6 2020-01-02 -322.44 je klaagt ook altijd over pech NaN
7 2020-01-03 -4.80 jij kan niet ophouden zorgen B2
8 2020-01-07 5.00 de wereld te besnauwen NaN
The regex generated will end up as '(borreling|zorgen|steentje|een)'. Note that str.extract returns the first keyword found in the string, not the first keyword in dfMap, which is why rows 1 and 3 get C1 here instead of the A1/C2 shown in the question.
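One caveat: if any keyword contains a regex metacharacter (., +, parentheses, ...), the joined pattern will misbehave. Escaping each 'sleutel' with re.escape before joining keeps the pattern literal; a sketch of building it that way:

```python
import re
import pandas as pd

dfMap = pd.DataFrame({"sleutel": ["borreling", "zorgen", "steentje", "een"],
                      "code": ["A1", "B2", "C2", "C1"]})

# Escape each keyword so characters like '.' or '+' are matched literally
regex = "(%s)" % "|".join(re.escape(s) for s in dfMap["sleutel"])
print(regex)  # (borreling|zorgen|steentje|een)
```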
Well, I have this DF in Python:
folio id_incidente nombre app apm \
0 1 1 SIN DATOS SIN DATOS SIN DATOS
1 131 100085 JUAN DOMINGO GONZALEZ DELGADO
2 132 100085 FRANCISCO JAVIER VELA RAMIREZ
3 133 100087 JUAN CARLOS PEREZ MEDINA
4 134 100088 ARMANDO SALINAS SALINAS
... ... ... ... ... ...
1169697 1223258 866846 IVAN RIVERA SILVA
1169698 1223259 866847 EDUARDO PLASCENCIA MARTINEZ
1169699 1223260 866848 FRANCISCO JAVIER PLASCENCIA MARTINEZ
1169700 1223261 866849 JUAN ALBERTO MARTINEZ ARELLANO
1169701 1223262 866850 JOSE DE JESUS SERRANO GONZALEZ
foto_barandilla fecha_hora_registro
0 1.jpg 0/0/0000 00:00:00
1 131.jpg 2008-08-07 15:42:25
2 132.jpg 2008-08-07 15:50:42
3 133.jpg 2008-08-07 16:37:24
4 134.jpg 2008-08-07 17:18:12
... ... ...
1169697 20200330103123_239288573.jpg 2020-03-30 10:32:10
1169698 20200330103726_1160992585.jpg 2020-03-30 10:38:25
1169699 20200330103837_999151106.jpg 2020-03-30 10:39:44
1169700 20200330104038_29275767.jpg 2020-03-30 10:41:52
1169701 20200330104145_640780023.jpg 2020-03-30 10:45:35
Here app and apm are the two surnames (paternal and maternal), so I tried this in order to get another column with the whole name:
names = {}
for i in range(1, df.shape[0]+1):
    try:
        names[i] = df["nombre"].iloc[i] + ' ' + df["app"].iloc[i] + ' ' + df["apm"].iloc[i]
    except:
        print(df["folio"].iloc[i], df["nombre"].iloc[i], df["app"].iloc[i], df["apm"].iloc[i])
but I get this:
400085 nan nan nan
400631 nan nan nan
401267 nan nan nan
401933 nan nan nan
401942 nan nan nan
402030 nan nan nan
403008 nan nan nan
403010 nan nan nan
403011 nan nan nan
403027 nan nan nan
403384 nan nan nan
403399 nan nan nan
403415 nan nan nan
403430 nan nan nan
404764 nan nan nan
501483 CARLOS ESPINOZA nan
504723 RICARDO JARED LOPEZ ACOSTA nan
506989 JUAN JOSE FLORES OCHOA nan
507376 JOSE DE JESUS VENEGAS nan
.....
I tried to use fillna('') like this:
df["app"].fillna('')
df["apm"].fillna('')
df["nombre"].fillna('')
but the result is the same. I hope you can help me build the column with the whole name, like name + surname1 + surname2.
Edit: here is my minimal version; the reporte files are each one part of the whole database shown above:
for i in range(1, 31):
    exec('reporte_%d = pd.read_excel("/home/workstation/Desktop/fotos/Fotos/Detenidos/Reporte Detenidos CER %d.xlsx", encoding="latin1")' % (i, i))
reportes = [reporte_1,reporte_2,reporte_3,reporte_4,reporte_5,reporte_6,reporte_7,reporte_8,reporte_9,reporte_10,reporte_11,reporte_12,reporte_13,reporte_14,reporte_15,reporte_16,reporte_17,reporte_18,reporte_19,reporte_20,reporte_21,reporte_22,reporte_23,reporte_24,reporte_25,reporte_26,reporte_27,reporte_28,reporte_29,reporte_30]
df = pd.concat(reportes)
Now when I run
df['Full_name'] = [' '.join([y for y in x if pd.notna(y)]) for x in zip(df['nombre'], df['app'], df['apm'])]
I get this error: TypeError: sequence item 1: expected str instance, int found
You can ' '.join the words of each row after removing the null values. It's a string operation, and apply(axis=1) gets slow, so we can use a list comprehension instead:
Sample Data
nombre app apm
0 Mr. blah bar
1 blah blah foo
2 NaN NaN NaN
3 blah Mr. bar
4 blah foo Mr.
5 foo Mr. blah
6 NaN Mr. foo
7 blah Mr. NaN
8 NaN bar bar
9 foo Mr. Mr.
Code
df['Full_name'] = [' '.join([y for y in x if pd.notna(y)])
for x in zip(df['nombre'], df['app'], df['apm'])]
# nombre app apm Full_name
#0 Mr. blah bar Mr. blah bar
#1 blah blah foo blah blah foo
#2 NaN NaN NaN # value is the empty string `''`
#3 blah Mr. bar blah Mr. bar
#4 blah foo Mr. blah foo Mr.
#5 foo Mr. blah foo Mr. blah
#6 NaN Mr. foo Mr. foo
#7 blah Mr. NaN blah Mr.
#8 NaN bar bar bar bar
#9 foo Mr. Mr. foo Mr. Mr.
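If some cells hold non-string values, which is what the edit's TypeError: expected str instance, int found points at, casting each value with str() inside the comprehension sidesteps it. A hedged variant of the same approach on a tiny made-up frame:

```python
import pandas as pd

# Toy frame with a None and an int mixed into the name columns
df = pd.DataFrame({"nombre": ["JUAN", None],
                   "app": [123, "PEREZ"],
                   "apm": ["DELGADO", None]})

# str(y) handles ints/floats that slipped into the name columns
df["Full_name"] = [" ".join(str(y) for y in x if pd.notna(y))
                   for x in zip(df["nombre"], df["app"], df["apm"])]
print(df["Full_name"].tolist())  # ['JUAN 123 DELGADO', 'PEREZ']
```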
You want to keep your processing within pandas as much as possible. By building a Python dictionary of strings, you explode your memory footprint and defeat the purpose of using pandas in the first place. You can use the pandas Series.str.cat method to put the strings together, so nominally it's just:
df["Concatenated"] = df["nombre"].str.cat([df["app"], df["apm"]], sep=" ")
But it sounds like your dataframe needs a bit of cleanup first. For instance, what is that "foto_barandilla fecha_hora_registro" stuff halfway down? Here is a fully worked example of a clean dataframe and concatenation:
import pandas as pd
import re
data = """folio id_incidente nombre app apm
1 1 SIN DATOS SIN DATOS SIN DATOS
131 100085 JUAN DOMINGO GONZALEZ DELGADO
132 100085 FRANCISCO JAVIER VELA RAMIREZ
133 100087 JUAN CARLOS PEREZ MEDINA
134 100088 ARMANDO SALINAS SALINAS
1223258 866846 IVAN RIVERA SILVA
1223259 866847 EDUARDO PLASCENCIA MARTINEZ
1223260 866848 FRANCISCO JAVIER PLASCENCIA MARTINEZ
1223261 866849 JUAN ALBERTO MARTINEZ ARELLANO
1223262 866850 JOSE DE JESUS SERRANO GONZALEZ"""
# make test dataframe
table = []
for line in data.split("\n"):
line = line.strip()
table.append(re.split(r"\s{2,}", line))
df = pd.DataFrame(table[1:], columns=table[0])
# ensure data types and scrub the data
df = df.astype(
{"folio":int, "id_incidente":int, "nombre":"string",
"app":"string", "apm":"string"},errors="ignore")
df.update(df[["nombre", "app", "apm"]].fillna(" "))
# build new column
df["Concatenated"] = df["nombre"].str.cat([df["app"], df["apm"]], sep=" ")
print(df)
# ... or, if you don't want to scrub the dataframe first
df["Concatenated"] = df["nombre"].fillna(" ").str.cat(
[df["app"].fillna(" "), df["apm"].fillna(" ")], sep=" ")
print("================================================")
print(df)
I'm trying to scrape house prices from this link: https://www.bienici.com/recherche/achat/france?page=2
Can anyone tell me what's wrong with my program?
My program:
from bs4 import BeautifulSoup
import requests
import csv
with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Prix"])
    for i in range(1, 20):
        url = "https://www.bienici.com/recherche/achat/france?page=%s" % i
        soup = BeautifulSoup(requests.get(url).text, "html.parser")
        data = soup.find(class_="resultsListContainer")
        for data in data.find_all(class_="sideListItemContainerInTableForListInResult"):
            prix = data.find("span", {"class": "thePrice"})
            prix = prix.text if prix else ""
            writer.writerow([prix])
I get this error:
Traceback (most recent call last):
File "1.py", line 16, in <module>
for data in data.find_all(class_="sideListItemContainerInTableForListInResult"):
AttributeError: 'NoneType' object has no attribute 'find_all'
I think my error is in class_="sideListItemContainerInTableForListInResult", but when I inspect the HTML code I think it's the right one!
It looks like you can access the data through the JSON response. You'll have to play around with the parameters and dig through the JSON structure to pull out what you need, but it looks like there's quite a bit of data you're able to grab:
import requests
payload = {'filters': '{"size":24,"from":0,"filterType":"buy","propertyType":["house","flat"],"newProperty":false,"page":2,"resultsPerPage":24,"maxAuthorizedResults":2400,"sortBy":"relevance","sortOrder":"desc","onTheMarket":[true],"limit":"ih{eIzjhZ?q}qrAzaf}AlrD?rvfrA","showAllModels":false,"blurInfoType":["disk","exact"]}'}
url = 'https://www.bienici.com/realEstateAds.json'
response = requests.get(url, params = payload).json()
for prop in response['realEstateAds']:
title = prop['title']
city = prop['city']
desc = prop['description']
price = prop['price']
print ('%s - %s\n%s\n%s' %(price, title, city, desc))
Output:
print ('%s - %s\n%s\n%s' %(price, title, city, desc))
1190000 - BALMA - Bien d'exception 12 pièces 400 m2
Pin-Balma
- EXCLUSIVITÉ - Bien d'exception à BALMA (31130) . - 400 m2 - 12 pièces - 7 chambres - Sur 3000 m2 de terrain. . Luxe, calme et volupté dans cette magnifique et très rare maison divisée en deux lots d'habitation communicants (qui peuvent aussi être indépendants). L'un de 260 m2 et l'autre de 140 m2. Chacun avec son entrée, son séjour, sa cuisine, ses salles de bains et pièces d'eau, ses chambres, ses terrasses et ses équipements de qualité. Et ce, ouvrant le champ des possibles quant aux projets potentiels !. . Cette bâtisse de prestige du milieu du XVIIIème siècle a vu ses rénovations et prestations inhérentes réalisées avec des matériaux et des façons d'excellence.. . Sur les 3000 m2 de terrain, un jardin paysager orne les abords de la maison et de la piscine. Puis, vous trouverez un pré et un bois privé qui réveilleront vos aspirations bucoliques. Vous pourrez ainsi vous blottir dans un écrin précieux niché à proximité de TOULOUSE.. . Vos hôtes et vous serez à proximité des commodités, des transports (dont le métro), des cliniques, des établissements scolaires et des hypermarchés ; et tout aussi proches d'Airbus (BLAGNAC et Défense), du CEAT, des SSII, de Orange Business Services, etc.. . Recevez notre invitation au voyage, là où tout n'est qu'ordre et beauté, Luxe, calme et volupté.. . Visite virtuelle disponible en agence ou en LiveRoom avec un de nos conseillers.
To get to a CSV, you'll need to convert to a dataframe. The JSON structure is nested, so some columns won't be entirely flattened out. There are ways to handle that, but to get a basic dataframe:
df = pd.json_normalize(response['realEstateAds'])  # json_normalize is a top-level pandas function since 1.0
Output:
print (df.to_string())
accountType adCreatedByPro adType adTypeFR addressKnown agencyFeePercentage agencyFeeUrl annualCondominiumFees availableDate balconyQuantity balconySurfaceArea bathroomsQuantity bedroomsQuantity blurInfo.bbox blurInfo.centroid.lat blurInfo.centroid.lon blurInfo.origin blurInfo.position.lat blurInfo.position.lon blurInfo.radius blurInfo.type city condominiumPartsQuantity description descriptionTextLength district.code_insee district.cp district.id district.id_polygone district.id_type district.insee_code district.libelle district.name district.postal_code district.type_id enclosedParkingQuantity endOfPromotedAsExclusive energyClassification energyValue exposition feesChargedTo floor floorQuantity greenhouseGazClassification greenhouseGazValue hasAirConditioning hasAlarm hasBalcony hasCaretaker hasCellar hasDoorCode hasElevator hasFirePlace hasGarden hasIntercom hasPool hasSeparateToilet hasTerrace heating highlightMailContact id isCalm isCondominiumInProcedure isDisabledPeopleFriendly isExclusiveSaleMandate isInCondominium isRefurbished isStudio landSurfaceArea modificationDate needHomeStaging newOrOld newProperty nothingBehindForm parkingPlacesQuantity photos postalCode price priceHasDecreased pricePerSquareMeter priceWithoutFees propertyType publicationDate reference roomsQuantity showerRoomsQuantity status.autoImported status.closedByUser status.highlighted status.is3dHighlighted status.isLeading status.onTheMarket surfaceArea terracesQuantity thresholdDate title toiletQuantity transactionType userRelativeData.accountIds userRelativeData.canChangeOnTheMarket userRelativeData.canModifyAd userRelativeData.canModifyAdBlur userRelativeData.canSeeAddress userRelativeData.canSeeContacts userRelativeData.canSeeExactPosition userRelativeData.canSeePublicationCertificateHtml userRelativeData.canSeePublicationCertificatePdf userRelativeData.canSeeRealDates userRelativeData.canSeeStats userRelativeData.importAccountId userRelativeData.isAdModifier userRelativeData.isAdmin 
userRelativeData.isFavorite userRelativeData.isNetwork userRelativeData.isOwner userRelativeData.searchAccountIds virtualTours with360 with3dModel workToDo yearOfConstruction
0 agency True buy vente True NaN https://www.immoceros.fr/mentions-legales-hono... 2517.0 NaN NaN NaN 1.0 3 [2.27006, 48.92827, 2.27006, 48.92827] 48.928270 2.270060 manual 48.928270 2.270060 NaN exact Colombes 47.0 COLOMBES | Agent-Sarre - Champarons |\r\nSitué... 1149 92025 92700 100331 100331 1 92025 Fossés Jean Bouvier Colombes - Fossés Jean Bouvier 92700 1 NaN 0 D 197.00 NaN seller 5.0 6.0 B 9.00 NaN NaN False NaN False NaN True NaN False NaN NaN NaN True électricité individuel False ag922079-195213238 NaN NaN NaN True True NaN NaN NaN 2019-05-11T08:53:15.943Z NaN ancien False True 2.0 [{'url_photo': 'http://photos.ubiflow.net/9220... 92700 469000 False 5097.826087 469000.0 flat 2019-04-23T18:42:59.742Z VA1952-IMMOCEROS2 4 1.0 True False False False True True 92.00 NaN NaN COLOMBES | APPARTEMENT A VENDRE | 4 PIECES - ... 1.0 buy [ubiflow-easybusiness-ag922079] False False False False False False False False False False 558bbfd06fbf04e50075bbce False False False False False [ubiflow-easybusiness-ag922079, contract-type-... [{'originalUrl': 'https://www.nodalview.com/bK... True False NaN NaN
1 agency True buy vente True NaN NaN NaN NaN NaN NaN NaN 1 [7.251231, 43.700846999999996, 7.251231, 43.70... 43.700847 7.251231 custom 43.700847 7.251231 NaN exact Nice NaN A vendre à Nice dans le quartier Grosso / Tzar... 1542 06088 06000 300102 300102 1 06088 Parc Impérial - Le Piol Nice - Parc Impérial - Le Piol 06000 1 NaN 0 E 270.00 Ouest NaN NaN 7.0 C 14.00 NaN NaN NaN NaN NaN NaN True NaN NaN True NaN NaN NaN Individuel False apimo-2871096 NaN NaN NaN True NaN NaN NaN NaN 2019-04-25T17:04:27.723Z NaN NaN False True NaN [{'url_photo': 'https://d1qfj231ug7wdu.cloudfr... 06000 215000 False 4383.282365 NaN flat 2019-04-04T18:04:38.323Z 1508 2 NaN True False False False False True 49.05 NaN NaN Nice François Grosso : F2 dernier étage, terra... NaN buy [apimo-3120] False False False False False False False False False False 5913331e150de0009ce38406 False False False False False [apimo-3120, contract-type-basic, 5913331e150d... [{'originalUrl': 'https://www.nodalview.com/PX... True False False NaN
2 agency True buy vente True NaN NaN NaN NaN NaN NaN NaN 0 [7.2526839999999995, 43.69589099999998, 7.2526... 43.695891 7.252684 custom 43.695891 7.252684 NaN exact Nice NaN Joli studio entièrement meublé et équipé à ven... 1205 06088 06000 300070 300070 1 06088 Gambetta Nice - Gambetta 06000 1 NaN 0 D 224.47 Est NaN 2.0 6.0 B 8.49 True NaN NaN NaN NaN NaN True NaN NaN True NaN NaN NaN Individuel False apimo-1008273 NaN NaN NaN True NaN NaN True NaN 2019-04-24T17:02:19.834Z NaN NaN False True NaN [{'url_photo': 'https://d1qfj231ug7wdu.cloudfr... 06000 145000 False 6722.299490 NaN flat 1970-01-01T00:00:00.000Z 1496 1 NaN True False False False False True 21.57 NaN 2019-03-29T09:52:38.387Z Nice proche mer : studio meublé avec balcon NaN buy [apimo-3120] False False False False False False False False False False 5913331e150de0009ce38406 False False False False False [apimo-3120, contract-type-basic, 5913331e150d... [{'originalUrl': 'https://www.nodalview.com/xV... True False False NaN
3 agency True buy vente True NaN http://www.willman.fr/i/redac/honoraires?honof... NaN 2017-12-27T00:00:00.000Z NaN NaN 3.0 3 [7.165397, 43.666337999999996, 7.165397, 43.66... 43.666338 7.165397 accounts 43.666338 7.165397 NaN exact Cagnes-sur-Mer NaN Située dans le quartier recherché des Bréguièr... 1229 0
Then to save it:
df.to_csv('file.csv', index=False)
The problem here is that data is None, i.e. data = soup.find(class_="resultsListContainer") returned None, which makes the for loop fail.
I don't know enough about the exact problem you're trying to solve to say whether this is a bug in your code or whether the website sometimes has nothing in the resultsListContainer class. If it is sometimes missing, you can check that the data variable is not None before reaching the for loop.
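A minimal sketch of that guard, using inline HTML so it runs standalone (the class names are the ones from the question):

```python
from bs4 import BeautifulSoup

# A page with no results container at all, simulating the failing case
html = "<div><p>no results container here</p></div>"
soup = BeautifulSoup(html, "html.parser")

rows = []
data = soup.find(class_="resultsListContainer")
if data is not None:  # guard: skip the page instead of crashing on None
    rows = data.find_all(class_="sideListItemContainerInTableForListInResult")
print(len(rows))  # 0
```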