Array keeps getting replaced when being added to a dictionary in python - python

I'm trying to make a JSON object, which is basically a dictionary. This is the code that creates the dictionary:
# Adding the data to the JSONData object
JSONData[str(gerechtNaam)] = {
"afbeeldingURL": gerechtAfbeelding,
"receptURL": recept,
"prijs": totalePrijs,
"porties": porties,
"moeilijkheid" :moeilijkheid,
"caloriePortie": calorien,
"voorbereidingsTijd": voorbereidingsTijd,
"wachtTijd": wachtTijd,
"totaleTijd": totaleTijd,
"ingredienten": naamEnKwantiteitIngredienten
}
This works, and generates the following:
{
'Gerooktekipsalade met avocado en walnoten': {
'afbeeldingURL': 'https://static-images.jumbo.com/product_images/Recipe_502535-01_560x560.jpg',
'receptURL': 'http://www.jumbo.com/gerooktekipsalade-met-avocado-en-walnoten/502535/',
'prijs': 16.868000000000002,
'porties': '4 porties',
'moeilijkheid': 'Eenvoudig',
'caloriePortie': '842 kcal per persoon',
'voorbereidingsTijd': '15 min',
'wachtTijd': '0',
'totaleTijd': '15 min',
'ingredienten': [
'2 kroppen minisla romaine ',
'200 g cherrytomaatjes',
'4 stengels bleekselderij',
'2 friszoete handappels ',
'380 g Nieuwe Standaard Kip gerookte kipfilet ',
'2 bosuitjes',
'2 avocado',
'150 ml whisky-cocktailsaus',
'3 el bieslook',
'60 g walnoten',
'1 stokbrood',
'1 snufje peper'
]
}
}
Which I then convert using the following code:
with open('receptData.json', 'w') as outfile:
    json.dump(JSONData, outfile)
This works and generates valid JSON. The only problem appears when I run the code twice in a for loop: the last variable, 'ingredienten', which is a list that gets created in the loop, is replaced for every object in the dictionary. When the second 'ingredienten' list is created, the 'ingredienten' list that had already been added to JSONData is replaced by the new one. All the other variables stay correct, yet the list gets replaced every time the loop runs.
So the second time the code runs, this is the dictionary I get:
{
'Gerooktekipsalade met avocado en walnoten': {
'afbeeldingURL': 'https://static-images.jumbo.com/product_images/Recipe_502535-01_560x560.jpg',
'receptURL': 'http://www.jumbo.com/gerooktekipsalade-met-avocado-en-walnoten/502535/',
'prijs': 16.868000000000002,
'porties': '4 porties',
'moeilijkheid': 'Eenvoudig',
'caloriePortie': '842 kcal per persoon',
'voorbereidingsTijd': '15 min',
'wachtTijd': '0',
'totaleTijd': '15 min',
'ingredienten': [
'4 avocado',
'100 g gerookte zalm',
'8 kleine eieren ',
'25 g alfalfa',
'1 snufje peper',
'1 bakplaat'
]
},
'Gevulde avocado met ei en zalm uit de oven': {
'afbeeldingURL': 'https://static-images.jumbo.com/product_images/Recipe_502536-01_560x560.jpg',
'receptURL': 'http://www.jumbo.com/gevulde-avocado-met-ei-en-zalm-uit-de-oven/502536/',
'prijs': 8.72,
'porties': '4 porties',
'moeilijkheid': 'Eenvoudig',
'caloriePortie': '234 kcal per persoon',
'voorbereidingsTijd': '10 min',
'wachtTijd': '15 min',
'totaleTijd': '25 min',
'ingredienten': [
'4 avocado',
'100 g gerookte zalm',
'8 kleine eieren ',
'25 g alfalfa',
'1 snufje peper',
'1 bakplaat'
]
}
}
In which the first 'ingredienten' list is now the same as the second one, which should not be the case. I've tried multiple things but none worked....

While you haven't shown the code that creates it, I'm pretty sure the problem is that you're reusing the variable naamEnKwantiteitIngredienten, which is the list you're using as the value pointed to by the 'ingredienten' key in your dictionary. If that list gets modified in place (perhaps by filling it up with a different set of ingredients), you'll also see the modified version in your previous dictionary if you haven't dumped it to a JSON string yet.
I think there are two main ways you could fix the problem.
One is to create the JSON immediately after you make the dictionary, rather than waiting to do it later. While this might resolve this issue, it might be inconvenient for your program (or impossible, if you need all the dictionaries to be defined at the same time for other reasons).
The other solution is to make sure that the dictionaries you create are independent of each other. Rather than reusing the same list in all of them, you should make sure that each one contains a separate list. The most obvious place to fix this may be wherever you create the value that ends up in naamEnKwantiteitIngredienten, but you could instead fix it within the code you show by copying the list just before you put it in the dictionary:
JSONData[str(gerechtNaam)] = {
"afbeeldingURL": gerechtAfbeelding,
"receptURL": recept,
"prijs": totalePrijs,
"porties": porties,
"moeilijkheid" :moeilijkheid,
"caloriePortie": calorien,
"voorbereidingsTijd": voorbereidingsTijd,
"wachtTijd": wachtTijd,
"totaleTijd": totaleTijd,
"ingredienten": naamEnKwantiteitIngredienten[:] # slice here to copy the list!
}
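To see why the copy matters, here is a minimal sketch with made-up recipe names and ingredient lists (the asker's scraping code isn't shown, so everything below is hypothetical). It assumes the list is emptied and refilled in place on each pass, which is one way the symptom can arise:

```python
# Hypothetical data standing in for the scraped recipes.
recipes = {
    "salade": ["2 avocado", "60 g walnoten"],
    "gevulde avocado": ["4 avocado", "100 g gerookte zalm"],
}

def build(copy_list):
    data = {}
    ingredients = []              # one list object reused across iterations
    for name, items in recipes.items():
        ingredients.clear()       # in-place change: every reference sees it
        ingredients.extend(items)
        data[name] = {
            "ingredienten": ingredients[:] if copy_list else ingredients
        }
    return data

shared = build(copy_list=False)
independent = build(copy_list=True)

# Without the copy, both entries hold the very same list object.
print(shared["salade"]["ingredienten"] is shared["gevulde avocado"]["ingredienten"])
print(shared["salade"]["ingredienten"])       # contents of the last recipe
print(independent["salade"]["ingredienten"])  # contents of the first recipe
```

Creating a fresh list inside the loop (ingredients = [] at the top of each iteration) would avoid the sharing as well, without needing the slice.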

Related

Sum of Nested Lists for each key in dictionary

How do you get the sum of each nested list for each key in the dictionary below?
Let's say the following below is called msgs
I tried the following code:
I ended up getting the result:
It is almost right but for some reason the sum of the first nested list is incorrect, being 0 whereas it should be 19. I have a feeling this has to do with the total = 0 part in the above code I wrote but I am not sure if this is the case and I don't know how to fix the issue.
The way I got the values in the nested list was I summed the number of strings in each index of the nested list. So for instance, this here was for the first key. As you can see, there are 15 entries in the first one and 4 in the second one.
(this dictionary is called 'kakao' in my code)
{'Saturday, July 28, 2018': [['hey', 'ben', 'u her?', 'here?', 'ok so basically', 'farzam and avash dont wanna go to vegas', 'lol', 'im offering a spontaneous trip me and you to SF', 'lol otherwise ill just go back to LA', 'i mean sf is far but', 'i mean if u really wanna hhah', 'we could go and see chris', 'but otherwise its fine', 'alright send me the code too', 'im on my way right now'], ['Wtf is happening lol', '8 haha', 'Key is #8000', 'Hf']]}
The code I used to get the sums as a nested list was:
kakao = {'Saturday, July 28, 2018': [['hey', 'ben', 'u her?', 'here?', 'ok so basically', \
'farzam and avash dont wanna go to vegas', 'lol', 'im offering a spontaneous trip me and you to SF', \
'lol otherwise ill just go back to LA', 'i mean sf is far but', 'i mean if u really wanna hhah', \
'we could go and see chris', 'but otherwise its fine', 'alright send me the code too', 'im on my way right now'], \
['Wtf is happening lol', '8 haha', 'Key is #8000', 'Hf']],
'Friday, August 3, 2018': [['Someone', 'said', 'something'], ['Just', 'test']],}
print({key: [sum(map(len, val))] for key, val in kakao.items()})
# the result --> {'Saturday, July 28, 2018': [19], 'Friday, August 3, 2018': [5]}
I guess you want to count the messages sent on the same day (the code sums the lengths of the nested lists, not the letters). I hope this code helps.
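If both the per-list counts and the per-day total are needed, they could be computed separately. This is only a sketch on a shortened, hypothetical stand-in for the kakao dictionary, and the names per_list and totals are made up for illustration:

```python
# Shortened, hypothetical stand-in for the kakao dictionary.
kakao = {
    "Saturday, July 28, 2018": [["hey", "ben", "u her?"],
                                ["Wtf is happening lol", "Hf"]],
    "Friday, August 3, 2018": [["Someone", "said", "something"],
                               ["Just", "test"]],
}

# Number of messages in each nested list, per day.
per_list = {day: [len(msgs) for msgs in groups] for day, groups in kakao.items()}

# Total number of messages per day.
totals = {day: sum(counts) for day, counts in per_list.items()}

print(per_list)
print(totals)
```

Because each total is built inside its own comprehension, there is no running total variable to reset between keys.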

In Scrapy, how to extract two groups in a regular expression into two different fields?

I'm writing a spider trulia to scrape pages of properties for sale on Trulia.com such as https://www.trulia.com/property/1072559047-1860-Lombard-St-San-Francisco-CA-94123; the current version can be found on https://github.com/khpeek/trulia-scraper.
I'm using Item Loaders and invoking the add_xpath method with the re keyword argument to specify regular expressions to extract. In the example in the documentation, there is just one group in the regular expression and one field to extract to.
However, I would actually like to define two groups and extract them to two separate Scrapy fields. Here is an 'excerpt' from the parse_property_page method:
def parse_property_page(self, response):
    l = TruliaItemLoader(item=TruliaItem(), response=response)
    details = l.nested_css('.homeDetailsHeading')
    overview = details.nested_xpath('.//span[contains(text(), "Overview")]/parent::div/following-sibling::div[1]')
    overview.add_xpath('overview', xpath='.//li/text()')
    overview.add_xpath('area', xpath='.//li/text()', re=r'([\d,]+) sqft$')
    overview.add_xpath('lot_size', xpath='.//li/text()', re=r'([\d,]+) (acres|sqft) lot size$')
Notice how the lot_size field has two groups extracted: one for the number, and one for the units which can be either 'acres' or 'sqft'. If I run this parse method using the command
scrapy parse https://www.trulia.com/property/1072559047-1860-Lombard-St-San-Francisco-CA-94123 --spider=trulia --callback=parse_property_page
then I get the following scraped item:
# Scraped Items ------------------------------------------------------------
[{'address': '1860 Lombard St',
'area': 2524.0,
'city_state': 'San Francisco, CA 94123',
'dates': ['10/22/2002', '04/25/2002', '03/20/2000'],
'description': ['Outstanding investment opportunity to own this light-fixer '
'mixed use Marina 2-unit property w/established income and '
'not on liquefaction. The first floor of this building '
'houses a commercial business currently leased to Jigalin '
'Fitness until 2018. The second floor presents a 2bed/1bath '
'apartment fully outfitted in a contemporary design w/full '
'kitchen, 10ft high ceilings & laundry area. The apartment '
'will be delivered vacant. The structure has undergone '
'renovation & features concrete perimeter foundation, '
'reinforced walls, ADA compliant commercial restroom, '
'electrical updates & rolling door. This property makes an '
"ideal investment with instant cash flow. Don't let this "
'pass you by. As-Is sale.'],
'events': ['Sold', 'Sold', 'Sold'],
'listing_information': ['2 Bedrooms', 'Multi-Family'],
'listing_information_date_updated': '11/03/2017',
'lot_size': ['1620', 'sqft'],
'neighborhood': 'Marina',
'overview': ['Multi-Family',
'2 Beds',
'Built in 1908',
'1 days on Trulia',
'1620 sqft lot size',
'2,524 sqft',
'$711/sqft'],
'prices': ['$850,000', '$1,350,000', '$1,200,000'],
'public_records': ['1 Bathroom',
'Multi-Family',
'1,296 Square Feet',
'Lot Size: 1,620 sqft'],
'public_records_date_updated': '07/01/2017',
'url': 'https://www.trulia.com/property/1072559047-1860-Lombard-St-San-Francisco-CA-94123'}]
where the lot_size field is a list with the number and the unit. However, I'd ideally like to extract the unit (acres or sqft) to a separate field lot_size_units. I could do this by first loading the item and doing my own processing, but I was wondering whether there is a more Scrapy-native way to 'unpack' the matched groups into different items?
(I've perused the get_value method on https://github.com/scrapy/scrapy/blob/129421c7e31b89b9b0f9c5f7d8ae59e47df36091/scrapy/loader/__init__.py, but this hasn't 'shown me the way' yet if there is any).
You could try calling add_xpath twice, turning one group at a time into a non-capturing group (?:...) so that each call extracts a single value:
overview.add_xpath('lot_size', xpath='.//li/text()', re=r'([\d,]+) (?:acres|sqft) lot size$')
overview.add_xpath('lot_size_units', xpath='.//li/text()', re=r'(?:[\d,]+) (acres|sqft) lot size$')
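The effect of the two patterns can be checked outside Scrapy with the standard re module. The sketch below is not the Item Loader's own code path; it only applies the same regular expressions to the 'overview' strings from the scraped item shown earlier, and first_group is a hypothetical helper:

```python
import re

# Strings copied from the 'overview' field of the scraped item.
overview = ['Multi-Family', '2 Beds', 'Built in 1908', '1 days on Trulia',
            '1620 sqft lot size', '2,524 sqft', '$711/sqft']

size_re = re.compile(r'([\d,]+) (?:acres|sqft) lot size$')
units_re = re.compile(r'(?:[\d,]+) (acres|sqft) lot size$')

def first_group(pattern, texts):
    """Return group 1 of the first text that matches, or None."""
    for text in texts:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None

lot_size = first_group(size_re, overview)
lot_size_units = first_group(units_re, overview)
print(lot_size, lot_size_units)
```

Each pattern has exactly one capturing group, so each field receives a single value instead of a two-element list.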

Most pythonic way to get a range from the length of a list?

I'm trying to get some slices from some tuples, that look like this:
classes = ('1 hrs A', '2 hrs A', '3 hrs A', '3 hrs B', '3 hrs C', '3 hrs C', '3 hrs C')
What I have done is:
for i in range(len(classes)):
    print(classes[i][0])
This produces the desired effect of printing only the integer portion, but the range(len(classes)) part is kind of ugly. Is there a different way to achieve the same result?
You could just do:
for i in classes:
    print(i[0])
The cleanest way is probably to iterate over classes itself, and - since you're only interested in the first element - unpack them:
for item, *_ in classes:
    print(item)
Note, however, that this only works when the number is a single character. If it has multiple characters, you should split the string:
for item in classes:
    print(item.split()[0])
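If the numbers are needed as integers rather than printed, the split-based approach fits in a list comprehension. A small sketch; the last tuple entry ('12 hrs D') is a made-up value added to show a multi-digit number:

```python
classes = ('1 hrs A', '2 hrs A', '3 hrs A', '3 hrs B',
           '3 hrs C', '3 hrs C', '3 hrs C', '12 hrs D')

# Take the first whitespace-separated token of each string and convert it.
hours = [int(item.split()[0]) for item in classes]
print(hours)
```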

Python - slow speed when extracting numbers/words from a string

Noob here trying to learn python by doing a project as I don't learn well from books.
I am using a huge lump of code to perform what seems to me to be a small operation -
I want to extract 4 variables from the following string
'Miami 0, New England 28'
(variables being home_team, away_team, home_score, away_score)
My program is running pretty slow and I think it might be this bit of code. I guess I am looking for the quickest/most efficient way of doing this.
Would regex be quicker? Thanks
It seems like your text could be split twice. First on , and next on whitespace:
info1,info2 = s.split(',')
home,home_score = info1.rsplit(None,1)
away,away_score = info2.rsplit(None,1)
e.g.:
>>> s = 'Miami 0, New England 28'
>>> info1,info2 = s.split(',')
>>> home,home_score = info1.rsplit(None,1)
>>> away,away_score = info2.rsplit(None,1)
>>> print([home,home_score,away,away_score])
['Miami', '0', ' New England', '28']
You could do this with regex without too much difficulty -- but you pay for it in terms of readability.
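The split-based approach can be wrapped in a small function that also strips the leftover space and converts the scores to integers. This is a sketch that assumes the input always has the form 'Home N, Away M'; parse_score is a hypothetical name:

```python
def parse_score(line):
    """Split 'Home N, Away M' into (home_team, home_score, away_team, away_score)."""
    home_part, away_part = line.split(',')
    home_team, home_score = home_part.rsplit(None, 1)
    away_team, away_score = away_part.rsplit(None, 1)
    # strip() removes the space left in front of the away team name.
    return home_team.strip(), int(home_score), away_team.strip(), int(away_score)

print(parse_score('Miami 0, New England 28'))
```

Whether this is actually the slow part of the program is best checked by profiling, for example with the timeit module, before optimizing it.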
In case you do want a regex:
import re
s = 'Miami 0, New England 28'
l = re.findall(r'^([^\d]+)\s(\d+)\s*,\s*([^\d]+)\s(\d+)', s)
# Groups come back in pattern order: home team, home score, away team, away score.
hm_team, hm_score, away_team, away_score = l[0]
print(l)
Prints [('Miami', '0', 'New England', '28')] and assigns those values to the variables.
import re
# Raw strings keep the backslashes in \s, \D and \d intact.
reg = re.compile(r'\s*(\D+?)\s*(\d+)'
                 r'[,;:.#=\s]*'
                 r'(\D+?)\s*(\d+)'
                 r'\s*')
for s in ('Miami 0, New England 28',
          'Miami0,New England28 ',
          ' Miami 0 . New England28',
          'Miami 0 ; New England 28',
          'Miami0#New England28 ',
          ' Miami 0 # New England28'):
    print(reg.search(s).groups())
result
('Miami', '0', 'New England', '28')
('Miami', '0', 'New England', '28')
('Miami', '0', 'New England', '28')
('Miami', '0', 'New England', '28')
('Miami', '0', 'New England', '28')
('Miami', '0', 'New England', '28')
'\D' matches any character that is not a digit.

Iterating over multiple lists in Python

I have a list within a list, and I am trying to iterate through one list, and then in the inner list I want to search for a value, and if this value is present, place that list in a variable.
Here's what I have, which doesn't seem to be doing the job:
for z, g in range(len(tablerows), len(andrewlist)):
    tablerowslist = tablerows[z]
    if "Andrew Alexander" in tablerowslist:
        andrewlist[g] = tablerowslist
Any ideas?
This is the list structure:
[['Kyle Bazzy', 'FUP dropbox message', '8/18/2011', 'Swing Trade Stocks</a>', ' ', 'Affiliate blog'], ['Kyle Bazzy', 'FUP dropbox message', '8/18/2011', 'Swing Trade Software</a>', ' ', 'FUP from dropbox message. Affiliate blog'], ['Kyle Bazzy', 'FUP dropbox message', '8/18/2011', 'Start Day Trading (Blog)</a>', ' ', 'FUP from dropbox message'], ['Kyle Bazzy', 'Call, be VERY NICE', '8/18/2011', ' ', 'r24867</a>', 'We have been very nice to him, but he wants to cancel, we need to keep being nice and seeing what is wrong now.'], ['Jason Raznick', 'Reach out', '8/18/2011', 'Lexis Nexis</a>', ' ', '-'], ['Andrew Alexander', 'Check on account in one week', '8/18/2011', ' ', 'r46876</a>', '-'], ['Andrew Alexander', 'Cancel him from 5 dollar feed', '8/18/2011', ' ', 'r37693</a>', '-'], ['Aaron Wise', 'FUP with contract', '8/18/2011', 'YouTradeFX</a>', ' ', "Zisa is on vacation...FUP next week and then try again if she's still gone."], ['Aaron Wise', 'Email--JASON', '8/18/2011', 'Lexis Nexis</a>', ' ', 'email by today'], ['Sarah Knapp', '3rd FUP', '8/18/2011', 'Steven L. Pomeranz</a>', ' ', '-'], ['Sarah Knapp', 'Are we really interested in partnering?', '8/18/2011', 'Reverse Spins</a>', ' ', "V. political, doesn't seem like high quality content. Do we really want a partnership?"], ['Sarah Knapp', '2nd follow up', '8/18/2011', 'Business World</a>', ' ', '-'], ['Sarah Knapp', 'Determine whether we are actually interested in partnership', '8/18/2011', 'Fayrouz In Dallas</a>', ' ', "Hasn't updated since September 2010."], ['Sarah Knapp', 'See email exchange w/Autumn; what should happen', '8/18/2011', 'Graham and Doddsville</a>', ' ', "Wasn't sure if we could partner bc of regulations, but could do something meant simply to increase traffic both ways."], ['Sarah Knapp', '3rd follow up', '8/18/2011', 'Fund Action</a>', ' ', '-']]
For any value that has a particular value in it, say, Andrew Alexander, I want to make a separate list of these.
For example:
[['Andrew Alexander', 'Check on account in one week', '8/18/2011', ' ', 'r46876</a>', '-'], ['Andrew Alexander', 'Cancel him from 5 dollar feed', '8/18/2011', ' ', 'r37693</a>', '-']]
Assuming you have a list whose elements are lists, this is what I'd do:
andrewlist = [row for row in tablerows if "Andrew Alexander" in row]
>>> #I have a list within a list,
>>> lol = [[1, 2, 42, 3], [4, 5, 6], [7, 42, 8]]
>>> found = []
>>> #iterate through one list,
>>> for i in lol:
...     #in the inner list I want to search for a value
...     if 42 in i:
...         #if this value is present, place that list in a variable
...         found.append(i)
...
>>> found
[[1, 2, 42, 3], [7, 42, 8]]
for z, g in range(len(tablerows), len(andrewlist)):
This means "make a list of the numbers which are between the length of tablerows and the length of andrewlist, and then look at each of those numbers in turn, and treat those numbers as a list of two values, and assign the two values to z and g each time through the loop".
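A number cannot be treated as a list of two values, so as soon as the range yields a number the loop raises a TypeError. If the range is empty (for example because andrewlist is shorter than tablerows), the loop body simply never runs.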
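Both outcomes can be reproduced in a few lines. A sketch with a two-row, made-up tablerows and an empty andrewlist, which is an assumption since the question does not show how andrewlist is initialised:

```python
tablerows = [['Kyle Bazzy', 'FUP dropbox message'],
             ['Andrew Alexander', 'Check on account in one week']]
andrewlist = []  # assumed to start out empty

# range(2, 0) yields nothing, so the loop from the question never runs.
iterations = 0
for z, g in range(len(tablerows), len(andrewlist)):
    iterations += 1

# With a non-empty range, unpacking a single int into two names fails.
unpack_failed = False
try:
    for z, g in range(len(tablerows)):
        pass
except TypeError:
    unpack_failed = True

print(iterations, unpack_failed)
```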
You need to be much, much clearer about what you are doing. Show an example of the contents of tablerows before the loop, and the contents of andrewlist before the loop, and what it should look like afterwards. Your description is muddled: I can only guess that when you say "and then I want to iterate through one list" you mean one of the lists in your list-of-lists; but I can't tell whether you want one specific one, or each one in turn. And then when you next say "and then in the inner list I want to...", I have no idea what you're referring to.
