Python JSON append if value doesn't exist - python

I've got a JSON file with 30-ish blocks of "dicts", where every block has an ID, like this:
{
"ID": "23926695",
"webpage_url": "https://.com",
"logo_url": null,
"headline": "aewafs",
"application_deadline": "2020-03-31T23:59:59",
}
Since my script pulls information in the same way from an API more than once, I would like to append new "blocks" to the json file only if the ID doesn't already exist in the JSON file.
I've got something like this so far:
import json
import os

check_empty = os.stat('pbdb.json').st_size
if check_empty == 0:
    with open('pbdb.json', 'w') as f:
        f.write('[\n]') # Writes '[' then linebreaks with '\n' and writes ']'
output = json.load(open("pbdb.json"))
for i in jobs:
    output.append({
        'ID': job_id,
        'Title': jobtitle,
        'Employer' : company,
        'Employment type' : emptype,
        'Fulltime' : tid,
        'Deadline' : deadline,
        'Link' : webpage
    })
with open('pbdb.json', 'w') as job_data_file:
    json.dump(output, job_data_file)
but I would like to only do the "output.append" part if the ID doesn't exist in the Json file.

I am not able to complete the code you provided, but I added an example to show how you can build a list of jobs without duplicates (hopefully it helps):
import json

# suppose `data` is your input data with duplicate ids included
data = [{'id': 1, 'name': 'john'}, {'id': 1, 'name': 'mary'}, {'id': 2, 'name': 'george'}]
# using a dict comprehension you can eliminate the duplicates (the last item with a given id wins) and finally get the results by calling the `values` method on the dict.
noduplicate = list({itm['id']:itm for itm in data}.values())
with open('pbdb.json', 'w') as job_data_file:
    json.dump(noduplicate, job_data_file)
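To do what the question literally asks (append only when the ID is new), one option is to collect the IDs already stored in the file into a set and check each incoming job against it. This is a minimal sketch, not the asker's actual script: `append_new_jobs` is a hypothetical helper name, and it assumes the file holds a JSON list of dicts that each carry an 'ID' key.

```python
import json
import os


def append_new_jobs(path, new_jobs):
    """Append jobs whose 'ID' is not yet stored in the JSON list at `path`.

    Hypothetical helper: returns the number of jobs that were added.
    """
    # Treat a missing or empty file as an empty list.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path) as f:
            stored = json.load(f)
    else:
        stored = []

    seen_ids = {job['ID'] for job in stored}  # set gives fast membership tests
    added = 0
    for job in new_jobs:
        if job['ID'] not in seen_ids:
            stored.append(job)
            seen_ids.add(job['ID'])
            added += 1

    # Rewrite the whole file so it stays a single valid JSON list.
    with open(path, 'w') as f:
        json.dump(stored, f, indent=2)
    return added
```

Unlike the dict comprehension above, which keeps the last item seen for each id, this keeps the entry that is already in the file and ignores later duplicates.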

I'll just go with a database guys, thank you for your time, we can close this thread now

Related

How to take from key value and take from this value another value

How can I get the value I need? I send a POST request to another site and can't change the response it returns.
I have this dict from the response content:
{'username': 'DeadFinder', 'subscriptions': [{'subscription': 'default', 'expiry': '1635683460'}], 'ip': 'not at this life'}
As you can see, this dict has a key subscriptions. I need the value of expiry (it is a timestamp), but when I try to read it I get no result (the code doesn't return the value I need). Is there any way to get this value? I haven't found anything like this.
Maybe this small part of my code will help a little, but I doubt it.
import requests

data1 = {"hwid": "", "type": "login", "username": username, "pass": password,
         "sessionid": f"{response_cut2}", "name": "test_app", "ownerid": "5OLbm5S3fS"}
url1 = "nope"
response1 = requests.post(url1, data1)
data = response1.json()
#get = data.get('expiry')
file_write = open("test.txt", "w")
file_write.write(str(data))
file_write.close()
for key in data.keys():
    if key == 'info':
        print (data[key])
Are you trying to achieve this result?
data = {'username': 'DeadFinder', 'subscriptions': [{'subscription': 'default', 'expiry': '1635683460'}], 'ip': 'not at this life'}
print(data['subscriptions'][0]['expiry'])
# first get 'subscriptions' which returns an array,
# so use [0] to get this dict {'subscription': 'default', 'expiry': '1635683460'}
# then get 'expiry'
EDIT: In case subscriptions has multiple values, use a for loop:
subscriptions = data['subscriptions']
for subscription in subscriptions:
    print(subscription['expiry'])
Output
1635683460
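If some responses might lack the subscriptions key or return an empty list, direct indexing raises KeyError or IndexError. A defensive sketch, assuming the response shape shown in the question: `first_expiry` is a hypothetical helper name, and treating expiry as a Unix timestamp in seconds follows the asker's description rather than any API documentation.

```python
from datetime import datetime, timezone


def first_expiry(data):
    """Return the first subscription's expiry as a UTC datetime, or None."""
    subscriptions = data.get('subscriptions') or []
    if not subscriptions:
        return None
    raw = subscriptions[0].get('expiry')
    if raw is None:
        return None
    # The value arrives as a string, so convert it to int first.
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


data = {'username': 'DeadFinder',
        'subscriptions': [{'subscription': 'default', 'expiry': '1635683460'}],
        'ip': 'not at this life'}
print(first_expiry(data))                # 2021-10-31 12:31:00+00:00
print(first_expiry({'username': 'x'}))   # None
```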

AttributeError: 'dict' object has no attribute 'split'

I am trying to run this code where data of a dictionary is saved in a separate csv file.
Here is the dict:
body = {
'dont-ask-for-email': 0,
'action': 'submit_user_review',
'post_id': 76196,
'email': email_random(),
'subscribe': 1,
'previous_hosting_id': prev_hosting_comp_random(),
'fb_token': '',
'title': review_title_random(),
'summary': summary_random(),
'score_pricing': star_random(),
'score_userfriendly': star_random(),
'score_support': star_random(),
'score_features': star_random(),
'hosting_type': hosting_type_random(),
'author': name_random(),
'social_link': '',
'site': '',
'screenshot[image][]': '',
'screenshot[description][]': '',
'user_data_process_agreement': 1,
'user_email_popup': '',
'subscribe_popup': 1,
'email_asked': 1
}
Now this is the code to write in a CSV file and finally save it:
columns = []
rows = []
chunks = body.split('}')
for chunk in chunks:
    row = []
    if len(chunk)>1:
        entry = chunk.replace('{','').strip().split(',')
        for e in entry:
            item = e.strip().split(':')
            if len(item)==2:
                row.append(item[1])
                if chunks.index(chunk)==0:
                    columns.append(item[0])
        rows.append(row)
df = pd.DataFrame(rows, columns = columns)
df.head()
df.to_csv ('r3edata.csv', index = False, header = True)
but this is the error I get:
Traceback (most recent call last):
File "codeOffshoreupdated.py", line 125, in <module>
chunks = body.split('}')
AttributeError: 'dict' object has no attribute 'split'
I know that dict has no attribute named split but how do I fix it?
Edit:
format of the CSV I want:
dont-ask-for-email, action, post_id, email, subscribe, previous_hosting_id, fb_token, title, summary, score_pricing, score_userfriendly, score_support, score_features, hosting_type,author, social_link, site, screenshot[image][],screenshot[description][],user_data_process_agreement,user_email_popup,subscribe_popup,email_asked
0,'submit_user_review',76196,email_random(),1,prev_hosting_comp_random(),,review_title_random(),summary_random(),star_random(),star_random(),star_random(),star_random(),hosting_type_random(),name_random(),,,,,1,,1,1
Note: all the functions mentioned return values.
Edit2:
I am picking emails from the email_random() function like this:
def email_random():
    with open('emaillist.txt') as emails:
        read_emails = csv.reader(emails, delimiter = '\n')
        return random.choice(list(read_emails))[0]
and the emaillist.txt is like this:
xyz@gmail.com
xya@gmail.com
xyb@gmail.com
xyc@gmail.com
xyd@gmail.com
other functions are also picking the data from the files like this too.
Since body is a dictionary, you don't have to do any manual parsing to get it into CSV format.
If you want the function calls (like email_random()) to be written into the CSV as such, you need to wrap them into quotes (as I have done below). If you want them to resolve as function calls and write the results, you can keep them as they are.
import csv

def email_random():
    return "john#example.com"
body = {
'dont-ask-for-email': 0,
'action': 'submit_user_review',
'post_id': 76196,
'email': email_random(),
'subscribe': 1,
'previous_hosting_id': "prev_hosting_comp_random()",
'fb_token': '',
'title': "review_title_random()",
'summary': "summary_random()",
'score_pricing': "star_random()",
'score_userfriendly': "star_random()",
'score_support': "star_random()",
'score_features': "star_random()",
'hosting_type': "hosting_type_random()",
'author': "name_random()",
'social_link': '',
'site': '',
'screenshot[image][]': '',
'screenshot[description][]': '',
'user_data_process_agreement': 1,
'user_email_popup': '',
'subscribe_popup': 1,
'email_asked': 1
}
with open('example.csv', 'w', newline='') as fhandle:
    writer = csv.writer(fhandle)
    items = body.items()
    writer.writerow([key for key, value in items])
    writer.writerow([value for key, value in items])
What we do here is:
with open('example.csv', 'w', newline='') as fhandle:
this opens a new file (named example.csv) in write mode ('w') and stores the file object in the variable fhandle. Passing newline='' is what the csv module documentation recommends, so that the writer controls the line endings itself. If the with statement is not familiar to you, you can learn more about it in PEP 343.
body.items() returns a view of (key, value) tuples. Iterating over the same items twice yields the pairs in the same order, so the header row and the value row line up. Its contents look like [('dont-ask-for-email', 0), ('action', 'submit_user_review'), ...].
We first write all the keys using a list comprehension, and then write all the values to the next row.
This results in
dont-ask-for-email,action,post_id,email,subscribe,previous_hosting_id,fb_token,title,summary,score_pricing,score_userfriendly,score_support,score_features,hosting_type,author,social_link,site,screenshot[image][],screenshot[description][],user_data_process_agreement,user_email_popup,subscribe_popup,email_asked
0,submit_user_review,76196,john#example.com,1,prev_hosting_comp_random(),,review_title_random(),summary_random(),star_random(),star_random(),star_random(),star_random(),hosting_type_random(),name_random(),,,,,1,,1,1
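If the script builds many such body dicts (for example one per generated review), csv.DictWriter can write the header once and then one row per dict. A minimal sketch under that assumption: `write_reviews` is a made-up helper name, and the shortened sample dicts use only a few of the original keys.

```python
import csv
import io


def write_reviews(fhandle, bodies):
    """Write a header row plus one CSV row per dict in `bodies`."""
    if not bodies:
        return
    # Column order follows the key order of the first dict.
    writer = csv.DictWriter(fhandle, fieldnames=list(bodies[0].keys()))
    writer.writeheader()
    writer.writerows(bodies)


bodies = [
    {'action': 'submit_user_review', 'post_id': 76196, 'fb_token': ''},
    {'action': 'submit_user_review', 'post_id': 76197, 'fb_token': ''},
]
buffer = io.StringIO()  # stands in for open('example.csv', 'w', newline='')
write_reviews(buffer, bodies)
print(buffer.getvalue())
```

Every dict is expected to have the same keys as the first one; by default DictWriter raises ValueError when a row contains a key that is not in fieldnames.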

Python - CSV File to Dict with Dataflow Template

I am trying to process a CSV file into a dict using a Dataflow template and Python.
As it is a template I have to use ReadFromText from the textio module, to be able to provide the path at runtime.
| beam.io.ReadFromText(contact_options.path)
All I need is to be able to extract the first line of this text/csv file, I can then use this data in DictReader as the fieldnames.
If I use splitlines, it brings back each element of the text file in a list:
return element.splitlines()
or
csv_data = []
split_element = element.split('\n')
for row in split_element:
    csv_data.append(row)
return csv_data
['phone_number', 'cid', 'first_name', 'last_name']
[' ', '101XXXXX', 'MurXXX', 'LevXXXX']
['3052XXXXX', '109XXXXX', 'MerXXXX', 'CoXXXX']
['954XXXXX', '10XXXXXX', 'RoXXXX', 'MaXXXXX']
However, if I then use, say, element[0], it just brings everything back without the list brackets. I have also tried splitting by '\n' and then using a for loop to produce a list object, but it produces almost the same result.
I cannot rely on using predetermined fieldnames as the csv files to be processed will all have different fieldnames and DictReader will not work effectively without fieldnames given.
EDIT:
The expected output is:
[{'phone_Number': '561XXXXX', 'first_Name': '', 'last_Name': 'BeXXXX', 'cid': '745XXXXX'}, {'phone_Number': '561XXXXX', 'first_Name': 'A', 'last_Name': 'BXXXX', 'cid': '61XXXXX'}]
EDIT:
Element contents:
"phone_Number","cid","first_Name","last_Name"
"5616XXXXX","745XXXX","","BeXXXXX"
"561XXXXXX","61XXXXX","A","BXXXXXXt"
"95XXXXXXX","6XXXXXX","A","BXXXXXX"
"727XXXXXX","98XXXXXX","A","CaXXXXXX"
Use pandas to load the values and use the first line as the column headers:
import pandas as pd
a_big_list=[['phone_number', 'cid', 'first_name', 'last_name'],
[' ', '101XXXXX', 'MurXXX', 'LevXXXX'],
['3052XXXXX', '109XXXXX', 'MerXXXX', 'CoXXXX'],
['954XXXXX', '10XXXXXX', 'RoXXXX', 'MaXXXXX']]
df=pd.DataFrame(a_big_list[1:],columns=a_big_list[0])
df.to_dict('records')
# [{'cid': '101XXXXX',
#   'first_name': 'MurXXX',
#   'last_name': 'LevXXXX',
#   'phone_number': ' '},
#  {'cid': '109XXXXX',
#   'first_name': 'MerXXXX',
#   'last_name': 'CoXXXX',
#   'phone_number': '3052XXXXX'},
#  {'cid': '10XXXXXX',
#   'first_name': 'RoXXXX',
#   'last_name': 'MaXXXXX',
#   'phone_number': '954XXXXX'}]
I was able to figure this problem out with inspiration from @mad_'s answer, but this still didn't give me the correct result initially, as I first needed to group my PCollection into one element. I found a way of doing this inspired by this answer from Jiayuan Ma, and slightly altered it like so:
class Group(beam.DoFn):
    def __init__(self):
        self._buffer = []

    def process(self, element):
        # Collect every line instead of emitting it right away
        self._buffer.append(element)

    def finish_bundle(self):
        # Emit the buffered lines as a single list at the end of the bundle
        if len(self._buffer) != 0:
            yield list(self._buffer)
            self._buffer = []

lines = (p
         | 'File reading' >> ReadFromText(known_args.input)
         | 'Group' >> beam.ParDo(Group())
         # ...
         )
Thus it grouped the entire CSV file as one object, and then I was able to apply mad_'s method to turn it into a dictionary.
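Once the whole file is available as one list of lines, the standard library's csv.DictReader can take the field names from the first line, which avoids both pandas and predetermined field names. This is a plain-Python sketch rather than a Beam transform: `rows_to_dicts` is a hypothetical helper, and inside a pipeline it would presumably be applied to the grouped element.

```python
import csv


def rows_to_dicts(lines):
    """Turn CSV lines (header line first) into a list of dicts."""
    # With no fieldnames argument, DictReader uses the first row as the keys.
    return [dict(row) for row in csv.DictReader(lines)]


lines = [
    '"phone_Number","cid","first_Name","last_Name"',
    '"5616XXXXX","745XXXX","","BeXXXXX"',
    '"561XXXXXX","61XXXXX","A","BXXXXXXt"',
]
records = rows_to_dicts(lines)
print(records)
```

csv.DictReader accepts any iterable that yields one string per line, so a list of lines works as well as an open file.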

How to format a csv file to put all information on a line and then move down?

I have a file that takes information and writes it to a csv file. I am having trouble getting it to format the way I want. It is looping 10 times and the information is there I can confirm. I am including the code to show you the exact setup of the csv writing part.
Here is my code:
outfile = open('Accounts_Details.csv', 'a')
for i in range(0, 11):
    #Calling all the above functions
    soc_auth_requests()
    create_account()
    config_admin_create()
    account_user_create()
    account_activate()
    account_config_DNS_create()
    #Creating the dictionary for the CSV file with the data fields made and modified from before
    #It is necessary this be done after the above method calls to ensure the data field values are correct
    data = {
        'Account_Name': acc_name,
        'Account_Id': acc_id,
        'User_Email': user_email,
        'User_id': user_id
    }
    #Creating a csv file and writing the dictionary titled "data" to it
    for key, value in sorted(data.items()):
        outfile.write('\t' + str(value))
    outfile.write('\n')
So I have four bits of data in the dict and I want the format to be laid out in the csv file so that the four bits of info are put on one line and when it loops through the for loop it moves to the next line and does the same there.
Ex.
name1, id1, email1, uId1
name2, id2, email2, uId2
name3, id3, email3, uId3
I assume it has to do with how I open the file, but I am not sure and can't figure it out.
Thanks for the help!
Here is the current output I am getting. I want all the 1's to be on one line and then move down.
name1
id1
email1
uID1
name2
id2
email2
uID2
Try commenting out the last write statement and closing the file:
for key, value in sorted(data.items()):
    outfile.write('\t' + str(value))
# outfile.write('\n')
# modified for
outfile.close()
please let me know how it was and if this worked!
:)
I don't think your file opening arguments are the issue -- I wasn't able to replicate the issue. However, you could probably streamline the code and remove any possibility of an issue by joining the values with str.join:
for i in range(12):
    data = {
        'Account_Name': 'AccountName',
        'Account_Id': '#12345',
        'User_Email': 'a#b',
        'User_id': 'LOGGER'
    }
    with open("test.txt", "a") as f:
        f.write('{}\n'.format(', '.join(data.values())))
Just use the csv module which will automagically format the line for you:
import csv

outfile = open('Accounts_Details.csv', 'a', newline='')
writer = csv.DictWriter(outfile, [ 'Account_Name', 'Account_Id', 'User_Email', 'User_id' ])
# optionally, if you want a header line:
writer.writeheader()
for i in range(0, 11):
    ...
    data = {
        'Account_Name': acc_name,
        'Account_Id': acc_id,
        'User_Email': user_email,
        'User_id': user_id
    }
    writer.writerow(data)
outfile.close()
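The one-line-per-account layout can also be checked without the account-creation functions. In this sketch `make_account` and `write_accounts` are placeholders invented for the example, and the rows go to an in-memory buffer so the result can be inspected. It uses csv.writer with an explicit column order, so the output does not depend on sorting the dict keys.

```python
import csv
import io

FIELDS = ['Account_Name', 'Account_Id', 'User_Email', 'User_id']


def make_account(i):
    """Placeholder for the real account-creation calls (hypothetical)."""
    return {'Account_Name': f'name{i}', 'Account_Id': f'id{i}',
            'User_Email': f'email{i}', 'User_id': f'uId{i}'}


def write_accounts(fhandle, count):
    writer = csv.writer(fhandle)
    for i in range(1, count + 1):
        data = make_account(i)
        # One writerow call per loop iteration gives one line per account.
        writer.writerow([data[field] for field in FIELDS])


buffer = io.StringIO()  # a real script would use open(..., 'a', newline='')
write_accounts(buffer, 3)
print(buffer.getvalue())
```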

writing json-ish list to csv, line by line, in python for bitcoin addresses

I'm querying the onename api in an effort to get the bitcoin addresses of all the users.
At the moment I'm getting all the user information as a json-esque list, and then piping the output to a file, it looks like this:
[{'0': {'owner_address': '1Q2Tv6f9vXbdoxRmGwNrHbjrrK4Hv6jCsz', 'zone_file': '{"avatar": {"url": "https://s3.amazonaws.com/kd4/111"}, "bitcoin": {"address": "1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh"}, "cover": {"url": "https://s3.amazonaws.com/dx3/111"}, "facebook": {"proof": {"url": "https://facebook.com/jasondrake1978/posts/10152769170542776"}, "username": "jasondrake1978"}, "graph": {"url": "https://s3.amazonaws.com/grph/111"}, "location": {"formatted": "Mechanicsville, Va"}, "name": {"formatted": "Jason Drake"}, "twitter": {"username": "000001"}, "v": "0.2", "website": "http://1642.com"}', 'verifications': [{'proof_url': 'https://facebook.com/jasondrake1978/posts/10152769170542776', 'service': 'facebook', 'valid': False, 'identifier': 'jasondrake1978'}], 'profile': {'website': 'http://1642.com', 'cover': {'url': 'https://s3.amazonaws.com/dx3/111'}, 'facebook': {'proof': {'url': 'https://facebook.com/jasondrake1978/posts/10152769170542776'}, 'username': 'jasondrake1978'}, 'twitter': {'username': '000001'}, 'bitcoin': {'address': '1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh'}, 'name': {'formatted': 'Jason Drake'}, 'graph': {'url': 'https://s3.amazonaws.com/grph/111'}, 'location': {'formatted': 'Mechanicsville, Va'}, 'avatar': {'url': 'https://s3.amazonaws.com/kd4/111'}, 'v': '0.2'}}}]
what I'm really interested in is the field {"address": "1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh"}, the rest of the stuff I don't need, I just want the addresses of every user.
Is there a way that I can just write only the addresses to a file using python?
I'm trying to write it as something like:
1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh,
1GA9RVZHuEE8zm4ooMTiqLicfnvymhzRVm,
1BJdMS9E5TUXxJcAvBriwvDoXmVeJfKiFV,
1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh,
...
and so on.
I've tried a number of different ways using dump, dumps, etc. but I haven't yet been able to pin it down.
My code looks like this:
import os
import json
import requests
#import py2neo
import csv
# set up authentication parameters
#py2neo.authenticate("46.101.180.63:7474", "neo4j", "uni-bonn")
# Connect to graph and add constraints.
neo4jUrl = os.environ.get('NEO4J_URL',"http://46.101.180.63:7474/db/data/")
#graph = py2neo.Graph(neo4jUrl)
# Add uniqueness constraints.
#graph.run("CREATE CONSTRAINT ON (q:Person) ASSERT q.id IS UNIQUE;")
# Build URL.
apiUrl = "https://api.onename.com/v1/users"
# apiUrl = "https://raw.githubusercontent.com/s-matthew-english/26.04/master/test.json"
# Send GET request.
Allusersjson = requests.get(apiUrl, headers = {"accept":"application/json"}).json()
#print(json)])
UsersDetails=[]
for username in Allusersjson['usernames']:
    usernamex= username[:-3]
    apiUrl2="https://api.onename.com/v1/users/"+usernamex+"?app-id=demo-app-id&app-secret=demo-app-secret"
    userinfo=requests.get(apiUrl2, headers = {"accept":"application/json"}).json()
    # try:
    #     if('bitcoin' not in userinfo[usernamex]['profile']):
    #         continue
    #     else:
    #         UsersDetails.append(userinfo)
    # except:
    #     continue
    try:
        address = userinfo[usernamex]["profile"]["bitcoin"]["address"]
        UsersDetails.append(address)
    except KeyError:
        pass # no address
out = "\n".join(UsersDetails)
print(out)
open("out.csv", "w").write(out)
# f = csv.writer(open("test.csv", "wb+"))
# Build query.
query = """
RETURN {json}
"""
# Send Cypher query.
# py2neo.CypherQuery(graph, query).run(json=json)
# graph.run(query).run(json=json)
#graph.run(query,json=json)
anyway, in such a situation, what's the best way to write out those addresses as csv :/
UPDATE
I ran it, and at first it worked, but then I got the following error:
Instead of adding all the information to the UsersDetails list
UsersDetails.append(userinfo)
you can add just the relevant part (address)
try:
    address = userinfo[usernamex]["profile"]["bitcoin"]["address"]
    UsersDetails.append(address)
except KeyError:
    pass # no address
except TypeError:
    pass # ill-formed data
To print the values to the screen:
out = "\n".join(UsersDetails)
print(out)
(replace "\n" with "," for comma separated output, instead of one per line)
To save to a file:
open("out.csv", "w").write(out)
You need to reformat the list, either through map() or a list comprehension, to get it down to just the information you want. For example, if the top-level key used in the response from the api.onename.com API is always 0, you can do something like this
UsersAddresses = [user['0']['profile']['bitcoin']['address'] for user in UsersDetails]
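Both answers walk the same nested keys and skip users that have no address. A sketch that factors this into a helper: `extract_address` is a hypothetical name, and the sample responses are trimmed-down stand-ins for the API output shown in the question, not real data for those usernames.

```python
def extract_address(userinfo, username):
    """Return the user's bitcoin address, or None if any level is missing."""
    try:
        return userinfo[username]['profile']['bitcoin']['address']
    except (KeyError, TypeError):
        # KeyError: a key is absent; TypeError: a level is not a dict.
        return None


# Made-up sample responses keyed by username.
responses = {
    'alice': {'alice': {'profile': {'bitcoin': {'address': '1NmLvYVEZqPGeQNcgFS3DdghpoqaH4r5Xh'}}}},
    'bob': {'bob': {'profile': {}}},          # no bitcoin entry
    'carol': {'carol': {'profile': None}},    # ill-formed profile
}

addresses = []
for name, info in responses.items():
    address = extract_address(info, name)
    if address is not None:
        addresses.append(address)

print('\n'.join(addresses))
```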
