I want to build URLs that follow a specific format. The URLs are a concatenation of the codes of French regions and departments (subdivisions of regions), such as this one: https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/resultatsT1/027/021/
I have set up two dictionaries, one for the region codes and one for the associated department codes.
regions = {
'centre_loire': '024',
'bourgogne': '027',
'normandie': '028',
'grand_est': '044',
'pays_loire': '052',
'bretagne': '053',
'aquitaine': '075',
}
departements = {
'centre_loire': ['018','028','036','037','041','045'],
'bourgogne': ['021','025','039','058','070','071','089','090'],
'normandie': ['014','027','050','061','076'],
'grand_est': ['008','010','051','052','054','055','057','067','068','088'],
'pays_loire': ['044','049','053','072','085'],
'bretagne': ['022','029','035','056'],
'aquitaine': ['016','017','019','023','024','033','040','047','064','079','086','087'],
}
The idea is to iterate through those two dictionaries to build the URLs, but the way I have arranged the for loop pairs every department with every region, even those that have nothing to do with each other.
regs = []
urls_final = []
for i in regions.values():
    regs = (url_base+tour+'/'+str(i)+'/')
    for key, values in departements.items():
        for value in values:
            deps_result = (regs+str(value)+'/'+str(value)+'/')
            urls_final.append(deps_result)
print(urls_final)
For example, for the "bourgogne" region, the URLs created should contain only the 8 department codes that belong to the "bourgogne" region.
Use the key from the regions dict to get the list values of the departements dict.
regions = {
'centre_loire': '024',
'bourgogne': '027',
'normandie': '028',
'grand_est': '044',
'pays_loire': '052',
'bretagne': '053',
'aquitaine': '075',
}
departements = {
'centre_loire': ['018','028','036','037','041','045'],
'bourgogne': ['021','025','039','058','070','071','089','090'],
'normandie': ['014','027','050','061','076'],
'grand_est': ['008','010','051','052','054','055','057','067','068','088'],
'pays_loire': ['044','049','053','072','085'],
'bretagne': ['022','029','035','056'],
'aquitaine': ['016','017','019','023','024','033','040','047','064','079','086','087'],
}
url_base = "https://whatever.com"
urls = []
for reg, reg_code in regions.items():
    for dep_code in departements[reg]:
        urls.append(f"{url_base}/{reg_code}/{dep_code}")
from pprint import pprint
pprint(urls)
Output:
['https://whatever.com/024/018',
'https://whatever.com/024/028',
'https://whatever.com/024/036',
'https://whatever.com/024/037',
'https://whatever.com/024/041',
'https://whatever.com/024/045',
'https://whatever.com/027/021',
'https://whatever.com/027/025',
'https://whatever.com/027/039',
'https://whatever.com/027/058',
'https://whatever.com/027/070',
'https://whatever.com/027/071',
'https://whatever.com/027/089',
'https://whatever.com/027/090',
'https://whatever.com/028/014',
'https://whatever.com/028/027',
'https://whatever.com/028/050',
'https://whatever.com/028/061',
'https://whatever.com/028/076',
'https://whatever.com/044/008',
'https://whatever.com/044/010',
'https://whatever.com/044/051',
'https://whatever.com/044/052',
'https://whatever.com/044/054',
'https://whatever.com/044/055',
'https://whatever.com/044/057',
'https://whatever.com/044/067',
'https://whatever.com/044/068',
'https://whatever.com/044/088',
'https://whatever.com/052/044',
'https://whatever.com/052/049',
'https://whatever.com/052/053',
'https://whatever.com/052/072',
'https://whatever.com/052/085',
'https://whatever.com/053/022',
'https://whatever.com/053/029',
'https://whatever.com/053/035',
'https://whatever.com/053/056',
'https://whatever.com/075/016',
'https://whatever.com/075/017',
'https://whatever.com/075/019',
'https://whatever.com/075/023',
'https://whatever.com/075/024',
'https://whatever.com/075/033',
'https://whatever.com/075/040',
'https://whatever.com/075/047',
'https://whatever.com/075/064',
'https://whatever.com/075/079',
'https://whatever.com/075/086',
'https://whatever.com/075/087']
base = "www.domain.com/{}/{}"
urls = [base.format(dept, subdept) for region, dept in regions.items() for subdept in departments[region]]
print(urls)
Output:
['www.domain.com/024/018', 'www.domain.com/024/028', 'www.domain.com/024/036', 'www.domain.com/024/037', 'www.domain.com/024/041', 'www.domain.com/024/045', 'www.domain.com/027/021', 'www.domain.com/027/025', 'www.domain.com/027/039', 'www.domain.com/027/058', 'www.domain.com/027/070', 'www.domain.com/027/071', 'www.domain.com/027/089', 'www.domain.com/027/090', 'www.domain.com/028/014', 'www.domain.com/028/027', 'www.domain.com/028/050', 'www.domain.com/028/061', 'www.domain.com/028/076', 'www.domain.com/044/008', 'www.domain.com/044/010', 'www.domain.com/044/051', 'www.domain.com/044/052', 'www.domain.com/044/054', 'www.domain.com/044/055', 'www.domain.com/044/057', 'www.domain.com/044/067', 'www.domain.com/044/068', 'www.domain.com/044/088', 'www.domain.com/052/044', 'www.domain.com/052/049', 'www.domain.com/052/053', 'www.domain.com/052/072', 'www.domain.com/052/085', 'www.domain.com/053/022', 'www.domain.com/053/029', 'www.domain.com/053/035', 'www.domain.com/053/056', 'www.domain.com/075/016', 'www.domain.com/075/017', 'www.domain.com/075/019', 'www.domain.com/075/023', 'www.domain.com/075/024', 'www.domain.com/075/033', 'www.domain.com/075/040', 'www.domain.com/075/047', 'www.domain.com/075/064', 'www.domain.com/075/079', 'www.domain.com/075/086', 'www.domain.com/075/087']
Note: I've renamed your departements dictionary to departments.
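If it helps, here is a minimal sketch closer to the URL format in the question. The values of url_base and tour are assumptions inferred from the example URL (the question does not show them), the dictionaries are trimmed to two regions for brevity, and build_urls is a hypothetical helper name.

```python
# Trimmed copies of the dictionaries from the question (two regions only).
regions = {'bourgogne': '027', 'bretagne': '053'}
departements = {
    'bourgogne': ['021', '025'],
    'bretagne': ['022', '029'],
}

# Assumed values, inferred from the example URL in the question.
url_base = "https://www.resultats-elections.interieur.gouv.fr/telechargements/PR2022/"
tour = "resultatsT1"


def build_urls(regions, departements, url_base, tour):
    """Map each region name to the URLs of its own departments only."""
    return {
        name: [
            f"{url_base}{tour}/{reg_code}/{dep_code}/"
            for dep_code in departements.get(name, [])
        ]
        for name, reg_code in regions.items()
    }


urls_by_region = build_urls(regions, departements, url_base, tour)
for name, urls in urls_by_region.items():
    print(name, len(urls))
```

Grouping the URLs by region name keeps the link between a region and its departments visible; flattening the dictionary values gives the same flat list as the answers above.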
def read_data(service_client):
    data = list_data(domain, realm)  # This returns a data frame
    building_data = []
    building_names = {}
    all_buildings = {}
    for elem in data.iterrows():
        building = elem[1]['building_name']
        region_id = elem[1]['region_id']
        bandwith = elem[1]['bandwith']
        building_id = elem[1]['building_id']
        return {
            'Building': building,
            'Region Id': region_id,
            'Bandwith': bandwith,
            'Building Id': building_id,
        }
Basically, in this example I am only able to return a single dictionary, from one iteration. I have tried printing it as well, among other things.
I am trying to find a way to store a dictionary on each iteration and return all of them, instead of returning just one. Does anyone know a way to achieve this?
You may replace your for-loop with the following to get all dictionaries in a list.
naming = {
'building_name': 'Building',
'region_id': 'Region Id',
'bandwith': 'Bandwith',
'building_id': 'Building Id',
}
return [
    row[list(naming.values())].to_dict()
    for idx, row in data.rename(naming, axis=1).iterrows()
]
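The same idea without pandas-specific calls, as a rough sketch: collect one dictionary per row in a list and return only after the loop has finished. The rows below are made-up stand-ins for what data.iterrows() would yield, and collect_buildings is a hypothetical name.

```python
def collect_buildings(rows):
    """Return one dictionary per row instead of stopping at the first row."""
    building_data = []
    for row in rows:
        building_data.append({
            'Building': row['building_name'],
            'Region Id': row['region_id'],
            'Bandwith': row['bandwith'],
            'Building Id': row['building_id'],
        })
    # The return sits outside the loop, so every row has been processed.
    return building_data


# Made-up sample rows for illustration only.
sample_rows = [
    {'building_name': 'A', 'region_id': 1, 'bandwith': 100, 'building_id': 10},
    {'building_name': 'B', 'region_id': 2, 'bandwith': 200, 'building_id': 20},
]
result = collect_buildings(sample_rows)
print(result)
```

With a real data frame, the rows could be passed as (row for _, row in data.iterrows()), since each row supports the same row['column'] lookup.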
I have two columns:
One looks like:
"Link": "https://url.com/item?variant=",
"Link": "https://url2.com/item?variant=",
"Link": "https://url3.com/item?variant=",
2nd looks like:
"link extension": ["1","2"],
"link extension": ["1","2"],
"link extension": ["1","1","3"],
What I'm trying to do is to combine them together so that my Link column looks like this:
"Link": "https://url.com/item?variant=1"
"Link": "https://url.com/item?variant=2"
"Link": "https://url2.com/item?variant=1"
"Link": "https://url2.com/item?variant=2"
"Link": "https://url3.com/item?variant=1"
"Link": "https://url3.com/item?variant=2"
"Link": "https://url3.com/item?variant=3"
However, I'm a beginner in Python, and only at a basic level with Pandas. I tried to find the answer and came across map/append options, but none of them seem to work; they throw different TypeErrors.
Any help or advice on what/where to read would be very helpful.
Thank you in advance.
Here is my basic code:
def parse(self, response):
    items = response.xpath("//*[@id='bc-sf-filter-products']/div")
    for item in items:
        link = item.xpath(".//div[@class='figcaption product--card--text under text-center']/a/@href").get()
        yield response.follow(url=link, callback=self.parse_item)

def parse_item(self, response):
    Title = response.xpath(".//div[@class='hide-on-mobile']/div[@class='productTitle']/text()").get()
    Item_Link = response.url
    n_item_link = f"{Item_Link}?variant="
    idre = r'("id":\d*)'  # defining regex
    id = response.xpath("//script[@id='ProductJson-product']/text()").re(idre)  # applying regex
    id1 = [item.replace('"id":', '') for item in id]  # cleaning list of url-ids
    id2 = id1[1:]  # dropping first item
    test = n_item_link.append(id2)  # doesn't work
    test2 = n_item_link.str.cat(id2)  # doesn't work either
    yield {
        'test': test,
        'test2': test2
    }
import pandas as pd

# recreating the DataFrame
df = pd.DataFrame({
    "link": ["https://url.com/item?variant=",
             "https://url2.com/item?variant=",
             "https://url3.com/item?variant="],
    "variants": [["1", "2"],
                 ["1", "2"],
                 ["1", "1", "3"]]
})
# creating a new column containing the length of each list
df["len_list"] = [len(x) for x in df["variants"].to_list()]
# creating a flat list of all values in df.variants, converted to strings
flat_list_variants = [str(item) for sublist in df["variants"].to_list() for item in sublist]
# creating a new DataFrame in which each row is repeated df["len_list"] times
df_new = df.loc[df.index.repeat(df.len_list)]
# assigning the flat list to a new column
df_new["flat_variants"] = flat_list_variants
# composing the result by concatenating the strings
df_new["results"] = df_new["link"] + df_new["flat_variants"]
I don't know exactly what your input looks like, but assuming you have a list (or another iterable) for your links and one for your extensions, this will work:
def join_url(links, ext_lists):
    urls = []
    for link, extension_list in zip(links, ext_lists):
        for extension in extension_list:
            urls.append(link + extension)
    return urls
Sample input:
websites = ['web1&hello=', 'web2--', 'web3=']
extensions = [['1', '2'], ['1', '2', '3'], ['3', '1']]
url_list = join_url(websites, extensions)
print(url_list)
Output:
['web1&hello=1', 'web1&hello=2', 'web2--1', 'web2--2', 'web2--3', 'web3=3', 'web3=1']
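Applied to the spider in the question, a possible sketch: a Python str has no append method and no .str accessor (the latter belongs to pandas Series), so the links are built with a list comprehension instead. variant_links is a hypothetical helper, and the sample values are made up.

```python
def variant_links(item_link, variant_ids):
    """Build one URL per variant ID by plain string concatenation."""
    base = f"{item_link}?variant="
    return [base + str(variant_id) for variant_id in variant_ids]


# Made-up values standing in for response.url and the cleaned ID list (id2).
links = variant_links("https://url.com/item", ["1", "2"])
print(links)
```

Inside parse_item, the result of variant_links(response.url, id2) could then be yielded as a single field.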
Noob here. I need to get a report that is an aggregate of two collections from two databases. Tried to wrap my head around the problem but failed. The examples I have found are for aggregating two collections from the same database.
Collection 1
SweetsResults = client.ItemsDB.sweets
Collection sweets : _id, type, color
Collection 2
SearchesResults = client.LogsDB.searches
Collection searches : _id, timestamp, type, color
The report I need will list all the sweets from the type “candy” with all the listed colors in the sweets collection, and for each line, the number (count) of searches for any available combination of “candy”+color.
Any help will be appreciated.
Thanks.
You can use the below script in mongo shell.
Get the distinct color for each type followed by count for each type and color combination.
var itemsdb = db.getSiblingDB('ItemsDB');
var logsdb = db.getSiblingDB('LogsDB');
var docs = [];
itemsdb.getCollection("sweets").aggregate([
    {$match: {"type": "candy"}},
    {$group: {_id: {type: "$type", color: "$color"}}},
    {$project: {_id: 0, type: "$_id.type", color: "$_id.color"}}
]).forEach(function(doc) {
    doc.count = logsdb.getCollection("searches").count({"type": "candy", "color": doc.color});
    docs.push(doc);
});
Exactly the same as in @Veeram's answer, but in Python:
from pymongo import MongoClient

uri = 'mongodb://localhost'
client = MongoClient(uri)
items_db = client.get_database('ItemsDB')
logs_db = client.get_database('LogsDB')
docs = []
aggr = items_db.get_collection('sweets').aggregate([
    {'$match': {"type": "candy"}},
    {'$group': {'_id': {'type': "$type", 'color': "$color"}}},
    {'$project': {'_id': 0, 'type': "$_id.type", 'color': "$_id.color"}},
])
for doc in aggr:
    # count_documents replaces Collection.count, which was removed in PyMongo 4
    doc['count'] = logs_db.get_collection("searches").count_documents({"type": "candy", "color": doc['color']})
    docs.append(doc)
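Both answers issue one count query per color. As a rough illustration of the same join done in a single pass on the client, the sketch below uses plain lists of dictionaries as stand-ins for the documents of the two collections (the sample documents are invented), so it runs without a MongoDB server.

```python
from collections import Counter

# Invented sample documents standing in for ItemsDB.sweets and LogsDB.searches.
sweets = [
    {'_id': 1, 'type': 'candy', 'color': 'red'},
    {'_id': 2, 'type': 'candy', 'color': 'blue'},
    {'_id': 3, 'type': 'cake', 'color': 'red'},
]
searches = [
    {'_id': 10, 'timestamp': 1, 'type': 'candy', 'color': 'red'},
    {'_id': 11, 'timestamp': 2, 'type': 'candy', 'color': 'red'},
    {'_id': 12, 'timestamp': 3, 'type': 'cake', 'color': 'red'},
]

# Count searches per (type, color) pair in one pass.
search_counts = Counter((s['type'], s['color']) for s in searches)

# Distinct colors of the "candy" sweets, sorted for a stable report.
candy_colors = sorted({s['color'] for s in sweets if s['type'] == 'candy'})

# A Counter returns 0 for pairs that were never searched.
report = [
    {'type': 'candy', 'color': color, 'count': search_counts[('candy', color)]}
    for color in candy_colors
]
print(report)
```

Whether this beats per-color queries depends on how many search documents would have to be pulled to the client; for a large log collection, counting on the server is likely the better trade-off.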
I have a dataframe that I need to transform into JSON. I think it would be easier to first turn it into a dictionary, but I can't figure out how. I need the JSON so that I can visualize the data with d3.js.
Here is what the data looks like currently:
NAME, CATEGORY, TAG
Ex1, Education, Books
Ex2, Transportation, Bus
Ex3, Education, Schools
Ex4, Education, Books
Ex5, Markets, Stores
Here is what I want the data to look like:
Data = {
    Education {
        Books {
            key: Ex1,
            key: Ex4
        }
        Schools {
            key: Ex3
        }
    }
    Transportation {
        Bus {
            key: Ex2
        }
    }
    Markets {
        Stores {
            key: Ex5
        }
    }
}
(I think my JSON isn't perfect here, but I just wanted to convey the general idea).
This code is thanks to Brent Washburne's very helpful answer above. I just needed to remove the tags column because for now it was too messy (many of the rows had more than one tag separated by commas). I also added a column (of integers) which I wanted connected to the names. Here it is:
import json

def to_json(file):
    data = {}
    with open(file) as f:
        for line in f:
            # Split the row and strip the whitespace around each field
            fields = [field.strip() for field in line.split(',')]
            categories = data.get(fields[1], [])
            # Pair the name (column 0) with the added integer column (column 3)
            categories.append({fields[0]: fields[3]})
            data[fields[1]] = categories
    return json.dumps(data)

print(to_json('data.csv'))
You can't use 'key' as a key more than once, so the innermost group is a list:
import json

def to_json(file):
    data = {}
    with open(file) as f:
        for line in f:
            # Split the row and strip the whitespace around each field
            fields = [field.strip() for field in line.split(',')]
            categories = data.get(fields[1], {})
            tags = categories.get(fields[2], [])
            tags.append(fields[0])
            categories[fields[2]] = tags
            data[fields[1]] = categories
    return json.dumps(data)

print(to_json('data.csv'))
Result (with the header row removed from data.csv):
{"Education": {"Books": ["Ex1", "Ex4"], "Schools": ["Ex3"]}, "Transportation": {"Bus": ["Ex2"]}, "Markets": {"Stores": ["Ex5"]}}
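A variation on the same grouping, sketched with the csv module and dict.setdefault. The sample rows are the ones from the question, fed through io.StringIO so the snippet runs without a data.csv file; group_rows is a hypothetical name.

```python
import csv
import io
import json

# The rows from the question, without the header line.
SAMPLE = """Ex1, Education, Books
Ex2, Transportation, Bus
Ex3, Education, Schools
Ex4, Education, Books
Ex5, Markets, Stores
"""


def group_rows(lines):
    """Group names by category, then by tag."""
    data = {}
    # skipinitialspace drops the blank that follows each comma.
    for name, category, tag in csv.reader(lines, skipinitialspace=True):
        data.setdefault(category, {}).setdefault(tag, []).append(name)
    return data


grouped = group_rows(io.StringIO(SAMPLE))
print(json.dumps(grouped, indent=2))
```

setdefault creates the inner dictionary or list the first time a category or tag is seen, which removes the get-then-reassign steps.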
I am trying to parse a JSON data set that looks something like this:
{"data":[
{
"Rest":0,
"Status":"The campaign is moved to the archive",
"IsActive":"No",
"StatusArchive":"Yes",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":1111111,
"StatusShow":"No",
"StartDate":"2013-01-20",
"Sum":0,
"StatusModerate":"Yes",
"Clicks":0,
"Shows":0,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_01"
},
{
"Rest":82.6200000000008,
"Status":"Impressions will begin tomorrow at 10:00",
"IsActive":"Yes",
"StatusArchive":"No",
"Login":"some_login",
"ContextStrategyName":"Default",
"CampaignID":2222222,
"StatusShow":"Yes",
"StartDate":"2013-01-28",
"Sum":15998,
"StatusModerate":"Yes",
"Clicks":7571,
"Shows":5535646,
"ManagerName":"XYZ",
"StatusActivating":"Yes",
"StrategyName":"HighestPosition",
"SumAvailableForTransfer":0,
"AgencyName":null,
"Name":"Campaign_02"
}
]
}
Let's assume that there can be many of these data sets.
I would like to iterate through each one of them and grab the "Name" and the "CampaignID" parameters.
So far my code looks something like this:
decoded_response = response.read().decode("UTF-8")
data = json.loads(decoded_response)
for item in data[0]:
    for x in data[0][item] ...
        -> need a get name procedure
        -> need a get campaign_id procedure
Probably quite straightforward! I am not good with lists/dictionaries :(
Access dictionaries with d[dict_key] or d.get(dict_key, default) (to provide default value):
jsonResponse = json.loads(decoded_response)
jsonData = jsonResponse["data"]
for item in jsonData:
    name = item.get("Name")
    campaignID = item.get("CampaignID")
I suggest you read something about dictionaries.
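For instance, a small sketch that collects the two fields from every campaign. The payload is a cut-down version of the JSON in the question, and extract_campaigns is a hypothetical helper.

```python
import json

# Cut-down version of the response body from the question.
decoded_response = """
{"data": [
    {"CampaignID": 1111111, "Name": "Campaign_01", "IsActive": "No"},
    {"CampaignID": 2222222, "Name": "Campaign_02", "IsActive": "Yes"}
]}
"""


def extract_campaigns(payload):
    """Return (Name, CampaignID) pairs for every entry under "data"."""
    items = json.loads(payload).get("data", [])
    # .get() returns None instead of raising KeyError when a key is missing.
    return [(item.get("Name"), item.get("CampaignID")) for item in items]


campaigns = extract_campaigns(decoded_response)
print(campaigns)
```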