Local variables in a dictionary function in Python - python

I am trying to handle the requirement below. As a beginner in Python, I am stuck on how to declare my variables. I have a huge XML file that I need to open and build three dictionaries from.
Here are my programming steps:
1. Open the file using the built-in open function.
2. Read each line from the file object created above.
3. Between certain tags, search for a pattern and fill the data into the dictionary.
The XML file looks like this:
<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
<\tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
<\tag_2>
<tag_3>
name=(pattern3)
age=(pattern3.1)
company=(pattern3.2)
<\tag_3>
and so on, with the tags above repeated.
From each tag above, I need to create 3 dictionaries like:
dict1[pattern1]['age']=pattern1.1
dict1[pattern1]['company']=pattern1.2
Similarly for dict2 and dict3.
I created a function that takes a line and a dictionary as arguments:
for line in file.readlines():
    dict_instance(line, dictionary_1)
    dict_instance(line, dictionary_2)
    dict_instance(line, dictionary_3)
def dict_instance(line, object):
    # ON TAG START (I have this condition set in my code)
    if re.search(r'name=(.*)', line):
        name = re.search(r'name=(.*)', line).group(1)
    if re.search(r'age=(.*)', line):
        age = re.search(r'age=(.*)', line).group(1)
    if re.search(r'company=(.*)', line):
        company = re.search(r'company=(.*)', line).group(1)
    # ON TAG END (I have this condition set in my code)
    object[name] = {}
    if age:
        object[name]['age'] = age
    if company:
        object[name]['company'] = company
Each tag's data should go into its own dictionary: tag_1 into dict1, tag_2 into dict2 and tag_3 into dict3.
My question is: how can I make the "name", "age" and "company" variables local to each dictionary? If I create global variables, the values will get mixed up across all three dictionaries and produce incorrect data.
Please ignore any indentation issues in the code above.

I'm not sure I understand the requirements. But here are some methods which might be helpful:
xml_content = """<tags>
<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
</tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
</tag_2>
<tag_3>
name=(pattern3)
age=(pattern3.1)
company=(pattern3.2)
</tag_3>
</tags>
"""
from xml.etree import ElementTree
document = ElementTree.fromstring(xml_content)
You can iterate over the tags and get the desired information:
for tag in document:
    print(tag.tag)
    print(tag.text)
    print(tag.text.split())
    print(dict(line.split('=') for line in tag.text.split()))
    print("---------------------")
It outputs:
tag_1
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
['name=(pattern1)', 'age=(pattern1.1)', 'company=(pattern1.2)']
{'name': '(pattern1)', 'age': '(pattern1.1)', 'company': '(pattern1.2)'}
---------------------
tag_2
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
['name=(pattern2)', 'age=(pattern2.1)', 'company=(pattern2.2)']
{'name': '(pattern2)', 'age': '(pattern2.1)', 'company': '(pattern2.2)'}
---------------------
tag_3
name=(pattern3)
age=(pattern3.1)
company=(pattern3.2)
['name=(pattern3)', 'age=(pattern3.1)', 'company=(pattern3.2)']
{'name': '(pattern3)', 'age': '(pattern3.1)', 'company': '(pattern3.2)'}
If you want one big list or one big dict:
def tag_to_dict(tag):
    return dict(line.split('=') for line in tag.text.split())
[tag_to_dict(tag) for tag in document]
{tag.tag: tag_to_dict(tag) for tag in document}
These return:
[{'name': '(pattern1)', 'age': '(pattern1.1)', 'company': '(pattern1.2)'},
 {'name': '(pattern2)', 'age': '(pattern2.1)', 'company': '(pattern2.2)'},
 {'name': '(pattern3)', 'age': '(pattern3.1)', 'company': '(pattern3.2)'}]
and
{'tag_1': {'name': '(pattern1)',
           'age': '(pattern1.1)',
           'company': '(pattern1.2)'},
 'tag_2': {'name': '(pattern2)',
           'age': '(pattern2.1)',
           'company': '(pattern2.2)'},
 'tag_3': {'name': '(pattern3)',
           'age': '(pattern3.1)',
           'company': '(pattern3.2)'}}
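To come back to the original question about keeping name, age and company separate per dictionary: one option is to hold them as local state inside a single parsing function and start a fresh record at every opening tag. The following is only a minimal sketch, not the asker's code. It assumes the line-based layout shown in the question (including the `<\tag_N>` closing style); the function name `parse_records` and the regular expressions are illustrative choices.

```python
import re

# Hypothetical patterns for the layout shown in the question
OPEN_TAG = re.compile(r'^<(\w+)>$')
CLOSE_TAG = re.compile(r'^<[\\/](\w+)>$')   # accepts both <\tag_1> and </tag_1>
FIELD = re.compile(r'^(\w+)=(.*)$')

def parse_records(lines):
    """Return {tag: {name: {'age': ..., 'company': ...}}}."""
    result = {}
    current_tag = None
    record = {}                      # local to the tag currently being read
    for raw in lines:
        line = raw.strip()
        opened = OPEN_TAG.match(line)
        closed = CLOSE_TAG.match(line)
        field = FIELD.match(line)
        if opened:
            current_tag = opened.group(1)
            record = {}              # fresh state for every tag
        elif closed and current_tag is not None:
            name = record.pop('name', None)
            if name is not None:
                result.setdefault(current_tag, {})[name] = record
            current_tag = None
        elif field and current_tag is not None:
            record[field.group(1)] = field.group(2)
    return result

sample = r"""<tag_1>
name=(pattern1)
age=(pattern1.1)
company=(pattern1.2)
<\tag_1>
<tag_2>
name=(pattern2)
age=(pattern2.1)
company=(pattern2.2)
<\tag_2>
"""
parsed = parse_records(sample.splitlines())
dict1 = parsed['tag_1']
print(dict1)   # {'(pattern1)': {'age': '(pattern1.1)', 'company': '(pattern1.2)'}}
```

Because `record` is rebuilt at each opening tag, values from one tag cannot leak into the dictionary of another, which is the mixing problem that globals would cause.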

Related

Python JSON append if value doesn't exist

I've got a JSON file with 30-ish blocks of "dicts", where every block has an ID, like this:
{
    "ID": "23926695",
    "webpage_url": "https://.com",
    "logo_url": null,
    "headline": "aewafs",
    "application_deadline": "2020-03-31T23:59:59"
}
Since my script pulls information in the same way from an API more than once, I would like to append new "blocks" to the json file only if the ID doesn't already exist in the JSON file.
I've got something like this so far:
import json
import os
check_empty = os.stat('pbdb.json').st_size
if check_empty == 0:
    with open('pbdb.json', 'w') as f:
        f.write('[\n]')  # Writes '[', a line break and then ']'
output = json.load(open("pbdb.json"))
for i in jobs:
    output.append({
        'ID': job_id,
        'Title': jobtitle,
        'Employer': company,
        'Employment type': emptype,
        'Fulltime': tid,
        'Deadline': deadline,
        'Link': webpage
    })
with open('pbdb.json', 'w') as job_data_file:
    json.dump(output, job_data_file)
but I would like to run the "output.append" part only if the ID doesn't already exist in the JSON file.
I can't complete the code you provided, but here is an example showing how to get a list of jobs without duplicates (hopefully it helps):
import json
# suppose `data` is your input data, with duplicate ids included
data = [{'id': 1, 'name': 'john'}, {'id': 1, 'name': 'mary'}, {'id': 2, 'name': 'george'}]
# a dictionary comprehension keyed on the id eliminates the duplicates (the last item with a given id wins); calling `values` on the dict then gives the results
noduplicate = list({itm['id']: itm for itm in data}.values())
with open('pbdb.json', 'w') as job_data_file:
    json.dump(noduplicate, job_data_file)
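If the goal is to keep what is already in the file and append only unseen IDs, a set of the existing IDs is enough. This is a minimal sketch under assumptions: the file holds a JSON list of objects that each have an 'ID' key, and `append_new_jobs` with its parameters is a hypothetical name, not part of the asker's script.

```python
import json
import os

def append_new_jobs(path, new_jobs):
    """Append to the JSON list in `path` only the jobs whose 'ID' is not stored yet."""
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            output = json.load(f)
    else:
        output = []                        # missing or empty file: start a new list
    seen = {job['ID'] for job in output}   # IDs already stored
    for job in new_jobs:
        if job['ID'] not in seen:
            output.append(job)
            seen.add(job['ID'])            # also guards against duplicates within new_jobs
    with open(path, 'w') as f:
        json.dump(output, f, indent=2)
    return output
```

Unlike the dictionary comprehension above, this keeps the first entry seen for each ID, so a job that is already stored is never overwritten by a later API pull.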
I'll just go with a database guys, thank you for your time, we can close this thread now

Python - CSV File to Dict with Dataflow Template

I am trying to process a CSV file into a dict using a Dataflow template and Python.
As it is a template I have to use ReadFromText from the textio module, to be able to provide the path at runtime.
| beam.io.ReadFromText(contact_options.path)
All I need is to extract the first line of this text/CSV file; I can then use that data as the fieldnames in DictReader.
If I use splitlines, it brings back each element of the text file in a list:
return element.splitlines()
or
csv_data = []
split_element = element.split('\n')
for row in split_element:
    csv_data.append(row)
return csv_data
['phone_number', 'cid', 'first_name', 'last_name']
[' ', '101XXXXX', 'MurXXX', 'LevXXXX']
['3052XXXXX', '109XXXXX', 'MerXXXX', 'CoXXXX']
['954XXXXX', '10XXXXXX', 'RoXXXX', 'MaXXXXX']
However, if I then use, say, element[0], it just brings everything back without the list brackets. I have also tried splitting by '\n' and then using a for loop to produce a list object, but it produces almost the same result.
I cannot rely on using predetermined fieldnames as the csv files to be processed will all have different fieldnames and DictReader will not work effectively without fieldnames given.
EDIT:
The expected output is:
[{'phone_Number': '561XXXXX', 'first_Name': '', 'last_Name': 'BeXXXX', 'cid': '745XXXXX'}, {'phone_Number': '561XXXXX', 'first_Name': 'A', 'last_Name': 'BXXXX', 'cid': '61XXXXX'}]
EDIT:
Element contents:
"phone_Number","cid","first_Name","last_Name"
"5616XXXXX","745XXXX","","BeXXXXX"
"561XXXXXX","61XXXXX","A","BXXXXXXt"
"95XXXXXXX","6XXXXXX","A","BXXXXXX"
"727XXXXXX","98XXXXXX","A","CaXXXXXX"
Use pandas to load the values and use the first line as the column headers:
import pandas as pd
a_big_list = [['phone_number', 'cid', 'first_name', 'last_name'],
              [' ', '101XXXXX', 'MurXXX', 'LevXXXX'],
              ['3052XXXXX', '109XXXXX', 'MerXXXX', 'CoXXXX'],
              ['954XXXXX', '10XXXXXX', 'RoXXXX', 'MaXXXXX']]
df = pd.DataFrame(a_big_list[1:], columns=a_big_list[0])
df.to_dict('records')
# [{'cid': '101XXXXX',
#   'first_name': 'MurXXX',
#   'last_name': 'LevXXXX',
#   'phone_number': ' '},
#  {'cid': '109XXXXX',
#   'first_name': 'MerXXXX',
#   'last_name': 'CoXXXX',
#   'phone_number': '3052XXXXX'},
#  {'cid': '10XXXXXX',
#   'first_name': 'RoXXXX',
#   'last_name': 'MaXXXXX',
#   'phone_number': '954XXXXX'}]
I was able to figure this problem out with inspiration from @mad_'s answer, but it still didn't give me the correct result initially, as I first needed to group my PCollection into one element. I found a way of doing this, inspired by this answer from Jiayuan Ma, and slightly altered it like so:
class Group(beam.DoFn):
    def __init__(self):
        self._buffer = []
    def process(self, element):
        # collect every line of the bundle
        self._buffer.append(element)
    def finish_bundle(self):
        # emit the collected lines as a single list, then reset the buffer
        if len(self._buffer) != 0:
            yield list(self._buffer)
            self._buffer = []
lines = (
    p
    | 'File reading' >> ReadFromText(known_args.input)
    | 'Group' >> beam.ParDo(Group())
    # ...
)
Thus it grouped the entire CSV file as one object, and then I was able to apply mad_'s method to turn it into a dictionary.
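Once all lines are in one list, the standard library can also do the conversion without pandas: `csv.DictReader` takes its field names from the first row when `fieldnames` is omitted. This is a minimal sketch, assuming the grouped element is a list of CSV lines with the header first; `lines_to_dicts` is a hypothetical helper name, and the header-first assumption would need checking in a real pipeline.

```python
import csv

def lines_to_dicts(lines):
    """Turn CSV lines (header first) into a list of dicts keyed by the header."""
    reader = csv.DictReader(lines)   # fieldnames default to the first row
    return [dict(row) for row in reader]

# Sample lines taken from the "Element contents" in the question
element = [
    '"phone_Number","cid","first_Name","last_Name"',
    '"5616XXXXX","745XXXX","","BeXXXXX"',
    '"561XXXXXX","61XXXXX","A","BXXXXXXt"',
]
records = lines_to_dicts(element)
print(records[0])
# {'phone_Number': '5616XXXXX', 'cid': '745XXXX', 'first_Name': '', 'last_Name': 'BeXXXXX'}
```

This also handles the quoted fields, which a plain `split(',')` would leave with their quotation marks.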

combine dictionaries and pass as output to another function

I am learning Python. I am working through a web scraping example: I download currency exchange data from a website and want to compute the average exchange rate for each currency over a 50-day period. The problem is the following.
The first function should return its results as a dictionary. I then want to pass these dictionaries to another function as an argument and average the values, but I am unable to pass the dict values to the other function correctly.
My code is as follows:
import os
import webbrowser
import requests as rq
import sys
from bs4 import BeautifulSoup
from xml.etree import ElementTree as ET
def saveData(path, date):
    session = rq.session()
    url = 'https://www.bnm.md/en/official_exchange_rates?get_xml=1&date=' + date
    datastore = session.get(url)
    with open(path, 'wb') as f:
        f.write(datastore.content)
    data = ET.fromstring(datastore.content)
    '''
    elements = {}
    for element in data.iter():
        if element.tag in ('Name', 'Value'):
            elements[element.tag] = element.text
            print('elements:', elements)
    # Here I want to combine all those dictionaries in one variable so that I can pass it as an argument to another function
    return elements
    '''
    # I replaced the triple-quoted code above with the code below
    elements = {}
    for tag, text in data.items():
        if tag in ('Name', 'Value'):
            elements.setdefault(tag, [])
            elements[tag].append(text)
    return elements
def computeAverage(elements):  # I want to pass the results of saveData(), which are dictionaries, to this function, but I am unable to solve this.
    print(elements)
def main():
    dates = ['20.04.2016', '21.04.2016', '22.04.2016']
    paths = []
    for date in dates:
        path = '/home/robbin/Desktop/webscrape/{}.xml'.format(date)
        paths.append(path)
    data3 = {}
    for path, date in zip(paths, dates):
        data2 = saveData(path, date)
        print('data2: ', data2)
        for k, v in data2.items():
            data3.setdefault(k, [])
            data3[k].append(v)
    print('data3: ', data3)
    computeAverage(data3)
if __name__ == '__main__':
    main()
Also, the saveData() function gives me dictionaries like the ones below, where each dictionary wrongly carries over the value from the previous item:
elements: {'Name': 'Euro'}
elements: {'Name': 'Euro', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '4.9988'}
This is the result I was trying to get, without success:
elements: {'Name': 'Euro', 'Value': '22.4023'}
elements: {'Name': 'US Dollar', 'Value': '19.7707'}
elements: {'Name': 'Russian Ruble', 'Value': '0.3014'}
elements: {'Name': 'Romanian Leu', 'Value': '4.9988'}
Update:
elements = []
for element in data.iter():
    if element.tag in ('Name', 'Value'):
        elements.append(element.text)
# print 'elements: ', elements
return elements
and in main() I now have:
for path, date in zip(paths, dates):
    data = saveData(path, date)
    # print 'data from main: ', data
    computeAverage(data)
and the output of "print 'data from main: ', data" looks like this
['Euro', '22.4023', 'US Dollar', '19.7707', 'Russian Ruble', '0.3014', 'Romanian Leu', '4.9988',.........'Special Drawing Rights', '27.8688']
['Euro', '22.4408', 'US Dollar', '19.7421', 'Russian Ruble', '0.3007', 'Romanian Leu', '5.0012',.....'Special Drawing Rights', '27.8606']
I am a newbie to coding. If someone could help me with these two problems, I would be really thankful.
First of all, I agree with @Prakhar Verma.
Second, you didn't state clearly what you want, but I assume you want to merge the data returned by the 'saveData' function and then calculate the average. Here is the missing code:
data3 = {}
for path, date in zip(paths, dates):
    data2 = saveData(path, date)
    for k, v in data2.items():
        # you can move this line to just after the data3 declaration if the keys returned by saveData are fixed, i.e. Name, Value
        data3.setdefault(k, [])
        data3[k].append(v)
computeAverage(data3)
Update to saveData function:
elements = {}
for tag, text in data.items():
    if tag in ('Name', 'Value'):
        elements.setdefault(tag, [])
        elements[tag].append(text)
===================================================
Update 2:
def saveData(path, date):
    #session = rq.session()
    url = 'https://www.bnm.md/en/official_exchange_rates?get_xml=1&date=' + date
    datastore = rq.get(url)
    with open(path, 'wb') as f:
        f.write(datastore.content)
    data = ET.fromstring(datastore.content)
    # I replaced the triple-quoted code above with the code below
    elements = {}
    for element in data.iter():
        tag = element.tag
        text = element.text
        if tag in ('Name', 'Value'):
            elements.setdefault(tag, [])
            elements[tag].append(text)
    return elements
def main():
    dates = ['20.03.2016', '21.03.2016', '22.03.2016']
    paths = []
    for date in dates:
        # please edit this path
        path = '{}.xml'.format(date)
        paths.append(path)
    data3 = {}
    for path, date in zip(paths, dates):
        data2 = saveData(path, date)
        for k, v in data2.items():
            data3.setdefault(k, [])
            data3[k].append(v)
    computeAverage(data3)
The 'saveData' function is returning data but you are not saving it in any variable. So what you need to do is save the data when it's returned from 'saveData' function and then send it as a parameter to 'computeAverage' function.
Please go through the basics of coding and follow any programming tutorial. :)
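The thread never shows a body for `computeAverage` beyond a print. As a minimal sketch, assuming `data3` has the shape built in Update 2 (one 'Name' list and one 'Value' list per day, in matching order, with the rates as numeric strings), the averaging could look like this. The snake_case name `compute_average` is an illustrative choice; the sample rates are the ones quoted in the question.

```python
from collections import defaultdict

def compute_average(data3):
    """Average each currency's rate over all days collected in data3."""
    rates = defaultdict(list)
    # data3['Name'] and data3['Value'] hold one list per day, in matching order
    for names, values in zip(data3['Name'], data3['Value']):
        for name, value in zip(names, values):
            rates[name].append(float(value))
    return {name: sum(vals) / len(vals) for name, vals in rates.items()}

# Two days of sample data in the assumed shape
data3 = {
    'Name': [['Euro', 'US Dollar'], ['Euro', 'US Dollar']],
    'Value': [['22.4023', '19.7707'], ['22.4408', '19.7421']],
}
averages = compute_average(data3)
print(averages)
```

Pairing names with values by position only works if every day's file lists the two tags in the same alternating order, which the printed lists in the question suggest but do not guarantee.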

Best way to use variables across modules? (Python3)

I'm trying to separate various functions in my program to keep things neat. And I'm getting stuck trying to use variables created in one module in another module. I tried using global list_of_names but it wasn't working, and I've read that it's recommended not to do so anyway.
Below is a sample of my code. In my opinion, it doesn't make sense to pass list_of_names as a function argument because there are multiple other variables that I need to do this with, aside from the actual arguments that do get passed.
Unfortunately, even if I were to move read_json into engine.py, I'd still have the same problem in main.py as I need to reference list_of_names there as well.
# main.py:
import json
from engine import create_person
def read_json():
    with open('names.json', 'r') as file:
        data = json.load(file)
    return data
list_of_names = read_json()
person1 = create_person()
# engine.py:
from random import choice
def create_person():
    name = choice(list_of_names)
    new_person = {
        'name': name,
        # other keys/values created in similar fashion
    }
    return new_person
EDIT1:
Here's my new code. To me, this doesn't seem efficient to have to build the parameter list and then deconstruct it inside the function. (I know I'm reusing variable names for this example) Then I have to pass some of those parameters to other functions.
# main.py:
import json
from engine import create_person
def read_json():
    with open('names.json', 'r') as file:
        data = json.load(file)
    return data
player_id_index = 0
list_of_names = read_json()
person_parameters = [
    list_of_names,
    dict_of_locations,
    player_id_index,
    dict_of_occupations,
    .
    .
    .
]
person1, player_id_index = create_person(person_parameters)
# engine.py:
from random import choice
def create_person(person_params):
    list_of_names = person_params[0]
    dict_of_locations = person_params[1]
    player_id_index = person_params[2]
    dict_of_occupations = person_params[3]
    .
    .
    .
    attr = person_params[n]
    name = choice(list_of_names)
    location = get_location(dict_of_locations)  # a function elsewhere in engine.py
    p_id = player_id_index
    occupation = get_occupation(dict_of_occupations)  # a function elsewhere in engine.py
    new_person = {
        'name': name,
        'hometown': location,
        'player id': p_id,
        'occupation': occupation,
        .
        .
        .
    }
    player_id_index += 1
    return new_person, player_id_index
In general, you should not rely on shared global state. If you need to share state, encapsulate it in objects or pass it as function arguments.
Regarding your specific problem, it looks like you want to assemble random dictionaries from a set of options. It could be coded like this:
from random import choice
person_options = {
    'name': ['fred', 'mary', 'john', 'sarah', 'abigail', 'steve'],
    'health': [6, 8, 12, 15],
    'weapon': ['sword', 'bow'],
    'armor': ['naked', 'leather', 'iron']
}
def create_person(person_options):
    return {k: choice(opts) for k, opts in person_options.items()}
for _ in range(4):
    print(create_person(person_options))
In action:
>>> for _ in range(4):
...     print(create_person(person_options))
...
{'armor': 'naked', 'weapon': 'bow', 'health': 15, 'name': 'steve'}
{'armor': 'iron', 'weapon': 'sword', 'health': 8, 'name': 'fred'}
{'armor': 'iron', 'weapon': 'sword', 'health': 6, 'name': 'john'}
{'armor': 'iron', 'weapon': 'sword', 'health': 12, 'name': 'john'}
Note that a dictionary like {'armor': 'naked', 'weapon': 'bow', 'health': 15, 'name': 'steve'} looks like it might want to be an object. A dictionary is a glob of state without any defined behavior. If you make a class to house this state the class can grow methods that act on that state. Of course, explaining all this could make this answer really really long. For now, just realize that you should move away from having shared state that any old bit of code can mess with. A little bit of discipline on this will make your code much easier to refactor later on.
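To make the "encapsulate the state in objects" advice concrete, here is a minimal sketch. The class name `PersonFactory`, its attributes and the sample data are all hypothetical; the point is only that the option lists and the running player id live on one object instead of in module-level globals.

```python
from itertools import count
from random import choice

class PersonFactory:
    """Holds the data that create_person needs, so callers pass nothing around."""

    def __init__(self, names, occupations):
        self.names = names
        self.occupations = occupations
        self._ids = count(1)          # running player id, private to this factory

    def create_person(self):
        return {
            'name': choice(self.names),
            'occupation': choice(self.occupations),
            'player id': next(self._ids),
        }

# main.py would build the factory once and reuse it
factory = PersonFactory(['fred', 'mary', 'john'], ['smith', 'farmer'])
people = [factory.create_person() for _ in range(3)]
print([p['player id'] for p in people])   # prints [1, 2, 3]
```

With this layout, engine.py could define the class and main.py would only load the JSON data and construct the factory, so neither module needs a global shared with the other.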
This addresses your edited question:
from random import choice
from itertools import count
from functools import partial
person_options = {
    'name': partial(
        choice, ['fred', 'mary', 'john', 'sarah', 'abigail', 'steve']),
    # get_location is the asker's own function from engine.py
    'location': partial(
        get_location, {'heaven': 1, 'hell': 2, 'earth': 3}),
    # in Python 3 the bound method is __next__ (it was .next in Python 2)
    'player id': count(1).__next__
}
def create_person(person_options):
    return {k: func() for k, func in person_options.items()}
However, we are now way beyond the scope of your original question and getting into specifics that won't be helpful to anyone other than you. Such questions are better asked on Code Review Stack Exchange.

Update and create a multi-dimensional dictionary in Python

I am parsing JSON that stores various code snippets and I am first building a dictionary of languages used by these snippets:
snippets = {'python': {}, 'text': {}, 'php': {}, 'js': {}}
Then, when looping through the JSON, I want to add the information about each snippet, as its own dictionary, to the dictionary listed above. For example, if I had a JS snippet, the end result would be:
snippets = {'js':
    {"title":"Script 1","code":"code here", "id":"123456"}
    {"title":"Script 2","code":"code here", "id":"123457"}
}
Not to muddy the waters, but in PHP, working on a multi-dimensional array, I would just do the following (I am looking for something similar):
snippets['js'][] = array here
I know I saw one or two people talking about how to create a multidimensional dictionary, but I can't seem to track down how to add a dictionary to a dictionary in Python. Thanks for the help.
This is called autovivification.
You can do it with defaultdict:
import collections
def tree():
    return collections.defaultdict(tree)
d = tree()
d['js']['title'] = 'Script1'
If the idea is to have lists, you can do:
d = collections.defaultdict(list)
d['js'].append({'foo': 'bar'})
d['js'].append({'other': 'thing'})
The idea of defaultdict is to create the element automatically when a missing key is accessed. BTW, for this simple case, you can simply do:
d = {}
d['js'] = [{'foo': 'bar'}, {'other': 'thing'}]
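Applied to the loop described in the question, the list variant could look like the sketch below. The shape of the parsed JSON (a list of records with a 'language' key) is an assumption, since the question does not show it, and the sample records are made up for illustration.

```python
from collections import defaultdict

# Assumed input: already-parsed JSON, one record per snippet (hypothetical shape)
parsed = [
    {"language": "js", "title": "Script 1", "code": "code here", "id": "123456"},
    {"language": "js", "title": "Script 2", "code": "code here", "id": "123457"},
    {"language": "python", "title": "Script 3", "code": "code here", "id": "123458"},
]

snippets = defaultdict(list)
for record in parsed:
    # copy everything except the language, which becomes the outer key
    info = {k: v for k, v in record.items() if k != "language"}
    snippets[record["language"]].append(info)   # plays the role of PHP's snippets['js'][] = ...

print(len(snippets["js"]))   # prints 2
```

With `defaultdict(list)` there is no need to list the languages up front; a key is created the first time a snippet in that language is appended.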
From
snippets = {'js':
{"title":"Script 1","code":"code here", "id":"123456"}
{"title":"Script 2","code":"code here", "id":"123457"}
}
It looks to me like you want a list of dictionaries. Here is some Python code that should hopefully give you what you want:
snippets = {'python': [], 'text': [], 'php': [], 'js': []}
snippets['js'].append({"title":"Script 1","code":"code here", "id":"123456"})
snippets['js'].append({"title":"Script 2","code":"code here", "id":"123457"})
print(snippets['js'])  # [{'title': 'Script 1', 'code': 'code here', 'id': '123456'}, {'title': 'Script 2', 'code': 'code here', 'id': '123457'}]
Does that make it clear?
