I have a document containing a dict of results. What I want to do is loop through the document and save every matching result.
This is my current code, which works fine but will only return the first result:
# Fetch router descriptors based on a given flag
def getHSDirFlag():
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            return r
    return None
I have tried:
def getHSDirFlag():
    HSDirList = ()
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            HSDirList += r
            return HSDirList
    return None
but I get the error TypeError: can only concatenate tuple (not "dict") to tuple.
What is the best data type for collecting the results, and how can I loop through the document finding every result?
First, why would you call a variable HSDirList and make it a tuple, not a list?!
Second, why return the "list" inside the for loop, and then tack a return None onto the end of the function?
Try:
def getHSDirFlag(router):
    HSDirList = []  # an actual list
    for r in router.itervalues():
        if 'HSDir' in r['flags']:
            HSDirList.append(r)  # add to the list
    return HSDirList  # return the list
Note that the return is outside the for loop, so it doesn't happen until you've iterated over all the values. Also, router is now an argument to the function, rather than relying on scope.
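For example, a hypothetical call (it assumes router is your dict of descriptors):

    hs_dirs = getHSDirFlag(router)
    print("%d routers carry the HSDir flag" % len(hs_dirs))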
Finally, you should read and consider implementing the Python style guide, PEP-0008.
You can save your dictionaries in a JSON file! In this code you have a tuple and you are trying to concatenate dictionaries onto it, but I suggest you use JSON for saving dicts.
This code saves a JSON file:
import json

with open('data.json', 'wb') as fp:
    json.dump(data, fp)
and this one loads it back:
with open('data.json', 'rb') as fp:
    data = json.load(fp)
Read more at https://docs.python.org/2/library/json.html
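Tying this back to the question, you could persist every match in one go (a sketch; it assumes the fixed getHSDirFlag from the other answer and your router dict):

    import json

    with open('hsdirs.json', 'wb') as fp:
        json.dump(getHSDirFlag(router), fp)  # dump the list of matching descriptors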
I am reading data from nested json with this code:
data = json.loads(json_file.json)

for nodesUni in data["data"]["queryUnits"]['nodes']:
    try:
        tm = nodesUni['sql']['busData'][0]['engine']['engType']
    except:
        tm = ''
    try:
        to = nodesUni['sql']['carData'][0]['engineData']['producer']['engName']
    except:
        to = ''
    json_output_for_one_GU_owner = {
        "EngineType": tm,
        "EngineName": to,
    }
I am having an issue with a NoneType error (e.g. nodesUni['sql']['busData'][0]['engine']['engType'] doesn't exist at all because there is no data), so I am using try/except. But my code is more complex, and having a try/except for every value is crazy. Is there any other way to deal with this?
Error: "TypeError: 'NoneType' object is not subscriptable"
This is non-trivial, as your requirement is to traverse the dictionaries without errors and get an empty string value in the end, all of that in a very simple expression like cascading [] operators.
First method
My approach is to add a hook when loading the JSON file, so that missing keys create default dictionaries recursively, to any depth:
import collections, json

def superdefaultdict():
    return collections.defaultdict(superdefaultdict)

def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c

data = json.loads('{"foo":"bar"}', object_hook=hook)

print(data["x"][0]["zzz"])  # doesn't exist
print(data["foo"])          # exists
prints:
defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {})
bar
When accessing some combination of keys that doesn't exist (at any level), superdefaultdict recursively creates a defaultdict of itself (this is a nice pattern; you can read more about it in Is there a standard class for an infinitely nested defaultdict?), allowing any number of non-existing key levels.
Now the only drawback is that it returns a defaultdict(<function superdefaultdict at 0x000001ECEFA47160>, {}) which is ugly. So
print(data["x"][0]["zzz"] or "")
prints empty string if the dictionary is empty. That should suffice for your purpose.
Use it like this in your context:
def superdefaultdict():
    return collections.defaultdict(superdefaultdict)

def hook(s):
    c = superdefaultdict()
    c.update(s)
    return c

data = json.loads(json_file.json, object_hook=hook)

for nodesUni in data["data"]["queryUnits"]['nodes']:
    tm = nodesUni['sql']['busData'][0]['engine']['engType'] or ""
    to = nodesUni['sql']['carData'][0]['engineData']['producer']['engName'] or ""
Drawbacks:
It creates a lot of empty dictionaries in your data object. Shouldn't be a problem (except if you're very low in memory) as the object isn't dumped to a file afterwards (where the non-existent values would appear)
If a value already exists, trying to access it as a dictionary crashes the program
Also if some value is 0 or an empty list, the or operator will pick "". This can be worked around with another wrapper that tests whether the object is an empty superdefaultdict instead. Less elegant, but doable; see the sketch below.
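A minimal sketch of such a wrapper (the helper name val is made up; it assumes the superdefaultdict hook from above):

    def val(x):
        # a missing path shows up as an empty superdefaultdict, so map it to "";
        # real falsy values such as 0 or [] pass through untouched
        if isinstance(x, collections.defaultdict) and not x:
            return ""
        return x

    tm = val(nodesUni['sql']['busData'][0]['engine']['engType'])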
Second method
Express the successive dictionary accesses as a string (for instance, just double-quote your expression as "['sql']['busData'][0]['engine']['engType']"), parse it, and loop over the keys to get the data. If there's an exception, stop and return an empty string.
import json, re

def get(key, data):
    key_parts = [x.strip("'") if x.startswith("'") else int(x)
                 for x in re.findall(r"\[([^\]]*)\]", key)]
    try:
        for k in key_parts:
            data = data[k]
        return data
    except (KeyError, IndexError, TypeError):
        return ""
testing with some simple data:
data = json.loads('{"foo":"bar","hello":{"a":12}}')
print(get("['sql']['busData'][0]['engine']['engType']",data))
print(get("['hello']['a']",data))
print(get("['hello']['a']['e']",data))
we get an empty string (some keys are missing), 12 (the path is valid), and an empty string (we tried to traverse an existing value that is not a dict).
The syntax could be simplified (e.g. "sql"."busData".0."engine"."engType") but would still have to retain a way to differentiate keys (strings) from indices (integers), as sketched below.
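A possible sketch of that simplified syntax (assumptions: keys are double-quoted, indices are bare integers, and no key contains a dot):

    def get_dotted(path, data):
        # hypothetical dotted-path variant of get(); splits on '.',
        # so it breaks if a key itself contains a dot
        parts = [p.strip('"') if p.startswith('"') else int(p)
                 for p in path.split('.')]
        try:
            for p in parts:
                data = data[p]
            return data
        except (KeyError, IndexError, TypeError):
            return ""

    print(get_dotted('"hello"."a"', data))  # 12 with the test data above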
The second approach is probably the most flexible one.
In this function I'm reading from a .txt file, and store the values in a dictionary. I want to be able to pass this dictionary to another function, to do further calculations and sorting.
I can manage to print all rows from the .txt file, but that's it.
Return breaks the loop and only gives the first row.
Global variables and nested functions are bad form.
Have tried to use yield (for the first time), but that only prints "generator object get_all_client_id at 0x03369A20"
file_with_client_info = open("C:\\Users\\clients.txt", "r")

def get_all_client_id():
    client_details = {}
    for line in file_with_client_info:
        element = line.split(",")
        while element:
            client_details['client_id'] = element[0]
            client_details['coordinates'] = {}
            client_details['coordinates']['lat'] = element[1]
            client_details['coordinates']['long'] = element[2]
            break
    print(client_details)
There are a few errors in your code.
Use a return statement to output the dictionary.
The while-loop does not loop as you are breaking on the first iteration. Use an if-statement to check if the line is empty instead.
The last entries in the client_details dict are overwritten on each iteration. Create a new entry instead, probably using the client_id as key.
It is recommended you use a with context manager to open your file.
It is preferable to provide the name of your file to your function and let it open it instead of having a globally opened file.
Here is a fixed version of your code.
def get_all_client_id(file):
    client_details = {}
    with open(file, 'r') as f:
        for line in f:
            if line.strip():  # skip empty lines
                client_id, lat, long, *more = line.strip().split(',')
                client_details[client_id] = {'lat': lat, 'long': long}
    return client_details
clients_dict = get_all_client_id("C:\\Users\\clients.txt")
I have a Python script that reads a JSON file and writes selected variables to a CSV. The problem I am having is that latitude and longitude are listed twice in the JSON. Therefore, when I write the row, it looks at those variables and creates an output with duplicate values.
import csv, json, sys

def find_deep_value(d, key):
    # Modified from https://stackoverflow.com/questions/48568649/convert-json-to-csv-using-python/48569129#48569129
    if key in d:
        yield d[key]
    for k in d.keys():
        if isinstance(d[k], dict):
            for j in find_deep_value(d[k], key):
                yield j

inputFile = open("pywu.cache.json", 'r')  # open json file
outputFile = open("CurrentObs.csv", 'w')  # open csv file

data = json.load(inputFile)  # load json content
inputFile.close()  # close the input file

output = csv.writer(outputFile)  # create a csv.writer

# Gives you latitude coordinates from within the json
lat = list(find_deep_value(data, "latitude"))
# Gives you longitude coordinates from within the json
lon = list(find_deep_value(data, "longitude"))
# Gives you a list of weather from within the json
weather = list(find_deep_value(data, "weather"))
# Gives you a list of temperature_strings from within the json
temp = list(find_deep_value(data, "temperature_string"))

output.writerow(lat + lon + weather + temp)
outputFile.close()
Is there a way to only list them once?
You need to use return rather than yield. Yield is for generators. Once you fix that, you'll also need to change
list(find_deep_value(data, "latitude"))
to
find_deep_value(data, "latitude")
for each of those lines. And finally, change
output.writerow(lat + lon + weather + temp)
to
output.writerow([lat, lon, weather, temp])
What's happening (you might want to read up on generators first) is that because the function uses yield, calling it returns a generator object. You have that generator wrapped in list(), which immediately unpacks the entire generator into a list. So if you have more than one sub-dictionary containing the given key, you're going to end up looking through and finding every single one, as the example below shows.
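A minimal illustration with made-up data, where "latitude" appears in two sub-dictionaries:

    # both sub-dicts contain "latitude", so the generator yields it twice
    data = {"current": {"latitude": "47.6"}, "station": {"latitude": "47.6"}}
    print(list(find_deep_value(data, "latitude")))  # ['47.6', '47.6']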
I know I am missing some really small concept here.
Here is what I am trying to do:
- Return all titles from the files with the "*.html" extension in the directory.
However, the function I wrote returns only the first file's title. But if I use print, it prints all of them.
import glob, os
from bs4 import BeautifulSoup

def titles():
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file)
            return str(soup.title.get_text().strip())

titles()
return exits the function, giving you only the result of the first iteration. Once the function returns, control is passed back to the caller. It does not resume.
As a solution, you have 2 options.
Option 1 (recommended for a large amount of data): Change return to yield. Using yield converts your function into a generator from which you can loop across its return values:
def titles():
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file)
            yield soup.title.get_text().strip()  # yield inside the loop, happens multiple times

for s in titles():
    print(s)
Option 2: Store all your output in a list and return the list at the end:
def titles():
    data = []
    for file_name in glob.glob(os.path.join(dir_path, "*.html")):
        with open(file_name) as html_file:
            soup = BeautifulSoup(html_file)
            data.append(soup.title.get_text().strip())
    return data  # return outside the loop, happens once

print(titles())
You have two choices. Either add each result to a local data structure (say, a list) in the loop and return the list after the loop; or create this function to be a generator and yield on each result in the loop (no return).
The return approach is ok for smaller data sets. The generator approach is more friendly or even necessary for larger data sets.
I've got two lists that I want to merge into a single array and finally put into a CSV file.
How can I avoid this error:
def fill_csv(self, array_urls, array_dates, csv_file_path):
    result_array = []
    array_length = str(len(array_dates))

    # We fill the CSV file
    file = open(csv_file_path, "w")
    csv_file = csv.writer(file, delimiter=';', lineterminator='\n')

    # We merge the two arrays in one
    for i in array_length:
        result_array[i][0].append(array_urls[i])
        result_array[i][1].append(array_dates[i])
        i += 1

    csv_file.writerows(result_array)
And got:

    File "C:\Users\--\gcscan.py", line 63, in fill_csv
        result_array[i][0].append(array_urls[i])
    TypeError: list indices must be integers or slices, not str

How can my count work?
First, array_length should be an integer and not a string:
array_length = len(array_dates)
Second, your for loop should be constructed using range:
for i in range(array_length): # Use `xrange` for python 2.
Third, i will increment automatically, so delete the following line:
i += 1
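Putting the three fixes together, the merging loop might look like this (a sketch; note that indexing result_array[i][0] would still raise an IndexError because result_array starts empty, so each row is appended instead):

    # build each row as a [url, date] pair
    for i in range(array_length):
        result_array.append([array_urls[i], array_dates[i]])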
Note, one could also just zip the two lists given that they have the same length:
import csv

dates = ['2020-01-01', '2020-01-02', '2020-01-03']
urls = ['www.abc.com', 'www.cnn.com', 'www.nbc.com']

csv_file_path = '/path/to/filename.csv'

with open(csv_file_path, 'w') as fout:
    csv_file = csv.writer(fout, delimiter=';', lineterminator='\n')
    result_array = zip(dates, urls)
    csv_file.writerows(result_array)
Follow-up on Abdeali Chandanwala's answer above (couldn't comment because rep < 50):
TL;DR: I was iterating through a list of dictionaries incorrectly: I was trying to iterate over the keys of the dictionaries, when instead I had to iterate over the dictionaries themselves!
I came across the same error while having a structure like this:
{
    "Data": [
        {
            "RoomCode": "10",
            "Name": "Rohit",
            "Email": "rohit#123.com"
        },
        {
            "RoomCode": "20",
            "Name": "Karan",
            "Email": "karan#123.com"
        }
    ]
}
And I was trying to append the names to a list like this (a minimal sketch of the mistake, assuming the list is called names):
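    names = []
    # "Data" is a list, so indexing it with the string "Name" fails
    names.append(data["Data"]["Name"])  # TypeError: list indices must be integers or slices, not str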
Fixed it by iterating over the dictionaries themselves (again a sketch):
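    names = []
    for item in data["Data"]:  # iterate over the dictionaries in the list
        names.append(item["Name"])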
I had the same error, and the mistake was that I had added both lists and dictionaries into the same list (object). When I iterated over that list of dictionaries and hit a list-type object, I would get this error, since I was trying to access keys within each dictionary.
I had to make sure that I only added dictionary objects to that list, as in the illustration below.
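A hypothetical illustration of that mix-up (made-up data):

    objects = [{"name": "a"}, ["not", "a", "dict"]]  # a list accidentally mixed in
    for d in objects:
        print(d["name"])  # the list element raises: list indices must be integers or slices, not str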
In my case I was trying to change the value of a dict key, but because my dict was reassigned to a list inside a for loop, the next iteration's key access raised the same error:
for value in source_list:
    my_dict['my_key'] = some_val  # fails on the second iteration
    my_dict = list(my_dict)       # my_dict is now a list of keys
    extraction0 = my_dict[0]
I resolved it by making sure the type of the dict remains the same, by making a deepcopy and re-initializing it on every iteration (that is what the use case was all about):
import copy

copy_dict = copy.deepcopy(my_dict)
for value in source_list:
    my_dict = copy.deepcopy(copy_dict)  # re-initialize, so my_dict is a dict again
    my_dict['my_key'] = some_val
    my_dict = list(my_dict)
    extraction0 = my_dict[0]
I received this error when overloading a function in Python, where one function wrapped another:
def getsomething(build_datastruct_inputs: list[str]) -> int:
    # builds datastruct and calls getsomething
    return getsomething(buildit(build_datastruct_inputs))

def getsomething(datastruct: list[int]) -> int:
    # code
    # received this error on first use of 'datastruct'
    ...
The fix was to not overload and to use unique method names.
def getsomething_build(build_datastruct_inputs: list[str]) -> int:
    # builds datastruct and calls getsomething_ds
    return getsomething_ds(buildit(build_datastruct_inputs))

def getsomething_ds(datastruct: list[int]) -> int:
    # code
    # works fine again regardless of whether invoked directly/indirectly
    ...
Another fix could be to use the Python multipledispatch package, which lets you overload and figures this out for you; a sketch follows below.
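A minimal sketch using multipledispatch (the functions are made-up stand-ins; it dispatches on plain runtime types, not on generics like list[str]):

    from multipledispatch import dispatch

    @dispatch(str)
    def getsomething(x):
        return len(x)  # version picked for strings

    @dispatch(int)
    def getsomething(x):
        return x * 2  # version picked for ints

    print(getsomething("abc"))  # 3
    print(getsomething(21))     # 42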
It was a bit confusing because neither where the error occurred nor its message corresponded to the cause. I thought I had seen that Python supported overloading natively, but now I've learned that implementing it requires more work from the user.