I have a database of scientific articles with their authors, their publication dates (on arXiv) and their arXiv ids. I now want to add to this database the number of citations each article received in each year since it was published.
For instance, I would like to retrieve the graph on the right-hand side (example).
Is there an API that could help me?
I could use the method described here (opencitationAPI), but I wondered whether there is a more straightforward way using the inspirehep data.
I figured out how to do this with the inspirehep API. A sleeping time between requests should also be considered.
import pandas as pd
import requests
from collections import defaultdict

ihep_search_arxiv = "https://inspirehep.net/api/arxiv/"
# size=100 and page=1: only the first 100 citing records are requested
ihep_search_article = "https://inspirehep.net/api/literature?sort=mostcited&size=100&page=1&q=refersto%3Arecid%3A"

year = [str(x+1) for x in range(2009,2022)]

def count_year(year, input_list):
    # counting the number of citations each year
    year_count = {}
    for y in year:
        if input_list[0] == 'NaN':
            year_count[y] = 0
        else:
            year_count[y] = input_list.count(y)
    return year_count

def get_cnumber():
    # build one citation-search URL per arXiv id
    citation_url = []
    for id in arxiv_id:
        inspirehep_url_arxiv = f"{ihep_search_arxiv}{id}"
        control_number = requests.get(inspirehep_url_arxiv).json()["metadata"]["control_number"]
        citation_url.append(f"{ihep_search_article}{control_number}")
    return citation_url

def get_citations():
    citation_url = get_cnumber()
    citation_date = defaultdict(list)
    for i, url in enumerate(citation_url):
        data_article = requests.get(url).json()
        if len(data_article["hits"]["hits"]) == 0:
            citation_date[i].append('NaN')
        else:
            for hit in data_article["hits"]["hits"]:
                citation_date[i].append(hit["created"][:4])
    # DataFrame.append was removed in pandas 2.0, so collect the rows first
    rows = [count_year(year, citation_date[p]) for p in range(len(citation_date))]
    citation_per_year = pd.DataFrame(rows, columns=year)
    citation_per_year.insert(0,"arxiv_id",arxiv_id,True)
    return citation_per_year

arxiv_id = recollect_data() #list of arxiv ids collected in a separate way
print(get_citations())
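The per-year counting step above can also be written with collections.Counter. This is only a minimal sketch, assuming each hit is a dict whose "created" field starts with the four-digit year, as in the code above. The function name citations_per_year and the sample data are made up for illustration, and no request is sent.

```python
from collections import Counter


def citations_per_year(hits, years):
    """Count citing records per year from their "created" timestamps."""
    # The first four characters of "created" are taken to be the year.
    counts = Counter(hit["created"][:4] for hit in hits)
    # Report every requested year, using 0 when nothing was found.
    return {y: counts.get(y, 0) for y in years}


# Hypothetical sample shaped like data_article["hits"]["hits"].
sample_hits = [
    {"created": "2015-03-02T10:00:00+00:00"},
    {"created": "2015-11-20T08:30:00+00:00"},
    {"created": "2017-01-05T12:00:00+00:00"},
]
years = [str(y) for y in range(2014, 2018)]
print(citations_per_year(sample_hits, years))
```

An empty list of hits simply yields zeros for every year, so the 'NaN' placeholder would not be needed with this approach.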
I'm not sure I am approaching this in the right way.
Scenario:
I have two SQL tables that contain rent information. One table contains rent due, and the other contains rent received.
I'm trying to build a rent book which takes the data from both tables for a specific lease and generates a date-ordered statement to be displayed on a webpage.
I'm using Python, Flask and SQLAlchemy.
I am currently learning Python, so I'm not sure if my approach is the best.
I've created a dictionary with the keys 'Date', 'Type' and 'Amount', and under each key I store a list containing the data from my SQL queries. The bit I'm struggling with is how to sort the dictionary by the date key while keeping the values under the other keys aligned with their dates.
lease_id = 5
dates_list = []
type_list = []
amounts_list = []
rentbook_dict = {}
payments_due = Expected_Rent_Model.query.filter(Expected_Rent_Model.lease_id == lease_id).all()
payments_received = Rent_And_Fee_Income_Model.query.filter(Rent_And_Fee_Income_Model.lease_id == lease_id).all()
for item in payments_due:
    dates_list.append(item.expected_rent_date)
    type_list.append('Rent Due')
    amounts_list.append(item.expected_rent_amount)
for item in payments_received:
    dates_list.append(item.payment_date)
    type_list.append(item.payment_type)
    amounts_list.append(item.payment_amount)
rentbook_dict.setdefault('Date',[]).append(dates_list)
rentbook_dict.setdefault('Type',[]).append(type_list)
rentbook_dict.setdefault('Amount',[]).append(amounts_list)
I was then going to use a for loop within the flask template to iterate through each value and display it in a table on the page.
Or am I approaching this in the wrong way?
I managed to get this working using zipped lists. I'm sure there is a better way to accomplish this, but I'm pleased I've got it working.
lease_id = 5
payments_due = Expected_Rent_Model.query.filter(Expected_Rent_Model.lease_id == lease_id).all()
payments_received = Rent_And_Fee_Income_Model.query.filter(Rent_And_Fee_Income_Model.lease_id == lease_id).all()
total_due = 0
for debit in payments_due:
    total_due = total_due + int(debit.expected_rent_amount)
total_received = 0
for income in payments_received:
    total_received = total_received + int(income.payment_amount)
balance = total_received - total_due
if balance < 0:
    arrears = "This account is in arrears"
else:
    arrears = ""
dates_list = []
type_list = []
amounts_list = []
for item in payments_due:
    dates_list.append(item.expected_rent_date)
    type_list.append('Rent Due')
    amounts_list.append(item.expected_rent_amount)
for item in payments_received:
    dates_list.append(item.payment_date)
    type_list.append(item.payment_type)
    amounts_list.append(item.payment_amount)
# zip the three lists into rows so that sorting keeps each row together
payment_data = zip(dates_list, type_list, amounts_list)
sorted_payment_data = sorted(payment_data)
tuples = zip(*sorted_payment_data)
list1, list2, list3 = [list(t) for t in tuples]
return(render_template('rentbook.html',
                       payment_data = zip(list1,list2,list3),
                       total_due = total_due,
                       total_received = total_received,
                       balance = balance))
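One alternative to three parallel lists is a single list of rows sorted on the date. This is a minimal sketch with plain tuples standing in for the query results. The sample values and the helper name build_rentbook are invented for illustration.

```python
from datetime import date


def build_rentbook(due, received):
    """Merge (date, amount) dues and (date, type, amount) receipts by date."""
    rows = [(d, "Rent Due", amount) for d, amount in due]
    rows += [(d, kind, amount) for d, kind, amount in received]
    # Sort on the date only, so rows with equal dates keep their order.
    return sorted(rows, key=lambda row: row[0])


due = [(date(2021, 2, 1), 500), (date(2021, 1, 1), 500)]
received = [(date(2021, 1, 15), "Bank Transfer", 450)]
for row in build_rentbook(due, received):
    print(row)
```

The sorted list could be passed to render_template as it is and looped over in the template, which avoids unzipping and re-zipping the lists.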
I am trying to work out how to iterate over a list and print out each item with a label describing what each element is. My project is to create a user management system and print out something similar to the image I have attached.
The output I am trying to produce
The output I am getting
My code:
records = 0
userFirst = ["John"]
userLast = ["Doe"]
autoUsername = ["Johndoe91"]
autoPassword = ["123456789"]
hiddenPassword = ["*****789"]
userRole = ["User"]
userDept = ["Administration"]
users = []
confidentialUserDetails = []
users.append(userFirst + userLast + userRole + userDept + autoUsername + autoPassword)
confidentialUserDetails.append(users)
for row in range(len(confidentialUserDetails)):
    records += 1
    print("-" * 25)
    print("Record: ", records)
    for col in range(len(confidentialUserDetails[row])):
        print(confidentialUserDetails[row][col])
Any help would be greatly appreciated. :)
Your data structures are unusual. I'm assuming that those lists are going to be provided to your code somehow and will, in practice, have multiple user details appended to them so that they are all the same length.
Anyhow, you can achieve the output you're looking for with some readable f-strings like this:
from functools import reduce
userFirst = ["John"]
userLast = ["Doe"]
autoUsername = ["Johndoe91"]
autoPassword = ["123456789"]
hiddenPassword = ["*****789"]
userRole = ["User"]
userDept = ["Administration"]
for row in range(len(userFirst)):
    # the text lines stay flush left because they are part of the string
    s = (f"""\
Name : {userFirst[row]} {userLast[row]}
Role : {userRole[row]}
Department : {userDept[row]}
Username : {autoUsername[row]}
Password : {hiddenPassword[row]}""")
    maxlen = reduce(lambda x,y: max(x, len(y)), s.split("\n"), 0)
    print(f"{s}\n{'-'*maxlen}\n")
Output:
Name : John Doe
Role : User
Department : Administration
Username : Johndoe91
Password : *****789
------------------------------
I created a dictionary called users instead of your list, appended it to the second list, and finally printed each key and value of the dictionary.
To get the full name, I joined userFirst and userLast into a single string.
Code:
records = 0
userFirst = ["John"]
userLast = ["Doe"]
autoUsername = ["Johndoe91"]
autoPassword = ["123456789"]
hiddenPassword = ["*****789"]
userRole = ["User"]
userDept = ["Administration"]
confidentialUserDetails = [] # 2d list for asterisked passwords
users = {'Name': [' '.join(userFirst + userLast)], 'Role': userRole, 'Department': userDept, 'Username': autoUsername, 'Password': hiddenPassword}
confidentialUserDetails.append(users)
for user in confidentialUserDetails:
    records += 1
    print("-" * 25)
    print("Record: ", records)
    # each value is a one-element list, so print its first item
    for ele, v in user.items():
        print(ele, ':', v[0])
Output:
-------------------------
Record: 1
Name : John Doe
Role : User
Department : Administration
Username : Johndoe91
Password : *****789
Using a dictionary or f strings like the two other answers suggested is probably the best. But if you just want to use your current code to print your desired output, you can simply grab each item by its index number in your print statement.
Change the line:
print(confidentialUserDetails[row][col])
To something like this:
print("Name :", confidentialUserDetails[row][col][0], confidentialUserDetails[row][col][1])
print("Role :", confidentialUserDetails[row][col][2])
Output:
-------------------------
Record: 1
Name : John Doe
Role : User
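If the parallel lists are kept, zip can walk them together so that each record is printed with its labels. This is a rough sketch. The function format_records and the label width of 10 are illustrative choices, not part of the code above.

```python
def format_records(first, last, role, dept, username, hidden_password):
    """Return one labelled text block per user from parallel lists."""
    blocks = []
    rows = zip(first, last, role, dept, username, hidden_password)
    for number, (fn, ln, r, d, u, p) in enumerate(rows, start=1):
        fields = [
            ("Name", f"{fn} {ln}"),
            ("Role", r),
            ("Department", d),
            ("Username", u),
            ("Password", p),
        ]
        lines = ["-" * 25, f"Record: {number}"]
        # Pad every label to the same width so the colons line up.
        lines += [f"{label:<10} : {value}" for label, value in fields]
        blocks.append("\n".join(lines))
    return blocks


for block in format_records(["John"], ["Doe"], ["User"], ["Administration"],
                            ["Johndoe91"], ["*****789"]):
    print(block)
```

Because zip stops at the shortest list, a missing entry in one list silently drops that record, so the lists need to stay the same length.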
I have this JSON output in an HTML page and I want to check the stock. I have built everything else already, but I am stuck at the part where Python needs to tell me whether there is stock or not.
All the numbers are stores around the Netherlands. I just want to code that Python prints ''In Stock'' if only ONE of them is TRUE. I tried ''if ... or ... == 'True''', but then if one of the stores is False, it tells me it is still out of stock.
Any idea what kind of code I need to let Python tell me if one of the stores has stock?
I am using BS4 (BeautifulSoup) to parse the JSON.
I am just stuck at the ''if ... == 'True''' part.
Thanks!
{"1665134":{"642":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1298":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1299":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1322":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1325":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1966":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1208":{"hasStock":false,"hasShowModel":false,"lowStock":false},"193":{"hasStock":false,"hasShowModel":false,"lowStock":false},"194":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1102":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1360":{"hasStock":false,"hasShowModel":false,"lowStock":false},"852":{"hasStock":false,"hasShowModel":false,"lowStock":false},"853":{"hasStock":false,"hasShowModel":false,"lowStock":false},"854":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1239":{"hasStock":false,"hasShowModel":false,"lowStock":false},"855":{"hasStock":false,"hasShowModel":false,"lowStock":false},"856":{"hasStock":false,"hasShowModel":false,"lowStock":false},"857":{"hasStock":false,"hasShowModel":false,"lowStock":false},"858":{"hasStock":false,"hasShowModel":false,"lowStock":false},"859":{"hasStock":false,"hasShowModel":false,"lowStock":false},"860":{"hasStock":false,"hasShowModel":false,"lowStock":false},"861":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1246":{"hasStock":false,"hasShowModel":false,"lowStock":false},"862":{"hasStock":false,"hasShowModel":false,"lowStock":false},"863":{"hasStock":false,"hasShowModel":false,"lowStock":false},"864":{"hasStock":false,"hasShowModel":false,"lowStock":false},"865":{"hasStock":false,"hasShowModel":false,"lowStock":false},"866":{"hasStock":false,"hasShowModel":false,"lowStock":false},"867":{"hasStock":false,"hasShowModel":false,"lowStock":false},"484":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1380":{"hasStock":false,"hasShowModel":false,"lowStock":false},"868":{"hasStock":false,
"hasShowModel":false,"lowStock":false},"869":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1381":{"hasStock":false,"hasShowModel":false,"lowStock":false},"870":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1255":{"hasStock":false,"hasShowModel":false,"lowStock":false},"871":{"hasStock":false,"hasShowModel":false,"lowStock":false},"360":{"hasStock":false,"hasShowModel":false,"lowStock":false},"872":{"hasStock":false,"hasShowModel":false,"lowStock":false},"873":{"hasStock":false,"hasShowModel":false,"lowStock":false},"746":{"hasStock":false,"hasShowModel":false,"lowStock":false},"875":{"hasStock":false,"hasShowModel":false,"lowStock":false},"876":{"hasStock":false,"hasShowModel":false,"lowStock":false},"749":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1391":{"hasStock":false,"hasShowModel":false,"lowStock":false},"880":{"hasStock":false,"hasShowModel":false,"lowStock":false},"499":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1275":{"hasStock":false,"hasShowModel":false,"lowStock":false},"1149":{"hasStock":false,"hasShowModel":false,"lowStock":false},"637":{"hasStock":false,"hasShowModel":false,"lowStock":false}}}
Python code:
def monitor():
    try:
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        voorraad = response.json()
        v1 = (voorraad['{}'.format(productid)]['193']['hasStock'])
        v2 = (voorraad['{}'.format(productid)]['194']['hasStock'])
        v3 = (voorraad['{}'.format(productid)]['360']['hasStock'])
        v4 = (voorraad['{}'.format(productid)]['484']['hasStock'])
        v5 = (voorraad['{}'.format(productid)]['499']['hasStock'])
        v6 = (voorraad['{}'.format(productid)]['637']['hasStock'])
        v7 = (voorraad['{}'.format(productid)]['642']['hasStock'])
        v8 = (voorraad['{}'.format(productid)]['746']['hasStock'])
        v9 = (voorraad['{}'.format(productid)]['749']['hasStock'])
        v10 = (voorraad['{}'.format(productid)]['852']['hasStock'])
        v11 = (voorraad['{}'.format(productid)]['853']['hasStock'])
        v12 = (voorraad['{}'.format(productid)]['854']['hasStock'])
        v13 = (voorraad['{}'.format(productid)]['855']['hasStock'])
        v14 = (voorraad['{}'.format(productid)]['856']['hasStock'])
        v15 = (voorraad['{}'.format(productid)]['857']['hasStock'])
        v16 = (voorraad['{}'.format(productid)]['858']['hasStock'])
        v17 = (voorraad['{}'.format(productid)]['859']['hasStock'])
        v18 = (voorraad['{}'.format(productid)]['860']['hasStock'])
        v19 = (voorraad['{}'.format(productid)]['861']['hasStock'])
        v20 = (voorraad['{}'.format(productid)]['862']['hasStock'])
        v21 = (voorraad['{}'.format(productid)]['863']['hasStock'])
        v22 = (voorraad['{}'.format(productid)]['864']['hasStock'])
        v23 = (voorraad['{}'.format(productid)]['865']['hasStock'])
        v24 = (voorraad['{}'.format(productid)]['866']['hasStock'])
        v25 = (voorraad['{}'.format(productid)]['867']['hasStock'])
        v26 = (voorraad['{}'.format(productid)]['868']['hasStock'])
        v27 = (voorraad['{}'.format(productid)]['869']['hasStock'])
        v28 = (voorraad['{}'.format(productid)]['870']['hasStock'])
        v29 = (voorraad['{}'.format(productid)]['871']['hasStock'])
        v30 = (voorraad['{}'.format(productid)]['872']['hasStock'])
        v31 = (voorraad['{}'.format(productid)]['873']['hasStock'])
        v32 = (voorraad['{}'.format(productid)]['875']['hasStock'])
        v33 = (voorraad['{}'.format(productid)]['876']['hasStock'])
        v34 = (voorraad['{}'.format(productid)]['880']['hasStock'])
        v35 = (voorraad['{}'.format(productid)]['1102']['hasStock'])
        v36 = (voorraad['{}'.format(productid)]['1149']['hasStock'])
        v37 = (voorraad['{}'.format(productid)]['1208']['hasStock'])
        v38 = (voorraad['{}'.format(productid)]['1239']['hasStock'])
        v39 = (voorraad['{}'.format(productid)]['1246']['hasStock'])
        v40 = (voorraad['{}'.format(productid)]['1255']['hasStock'])
        v41 = (voorraad['{}'.format(productid)]['1275']['hasStock'])
        v42 = (voorraad['{}'.format(productid)]['1298']['hasStock'])
        v43 = (voorraad['{}'.format(productid)]['1299']['hasStock'])
        v44 = (voorraad['{}'.format(productid)]['1322']['hasStock'])
        v45 = (voorraad['{}'.format(productid)]['1325']['hasStock'])
        v46 = (voorraad['{}'.format(productid)]['1360']['hasStock'])
        v47 = (voorraad['{}'.format(productid)]['1380']['hasStock'])
        v48 = (voorraad['{}'.format(productid)]['1381']['hasStock'])
        v49 = (voorraad['{}'.format(productid)]['1391']['hasStock'])
        v50 = (voorraad['{}'.format(productid)]['1966']['hasStock'])
        if any(v1, v2, v3):
            print(colored('[{}] ' + 'IN STOCK | ' + (product_title), 'green').format(str(datetime.now())))
            send_to_discord(product_title, webpagina, footerlogo, url, image_url)
            time.sleep(50)
            exit()
        else:
            print(colored('[{}] ' + 'OUT OF STOCK | ' + (product_title), 'red').format(str(datetime.now())))
            time.sleep(2)
The any() call was a test; I am not familiar with it...
Your code will get way out of hand if you have that many manual entries that each do the same thing. First, I'd suggest making a list of the store codes (as strings, since that is how they appear as keys in the JSON)
stores = ['193', '194', '360', '484', ...]
With requests, response.json() already gives you a Python dict. If you only have the JSON as text (for example, pulled out of the HTML), first import
import json
then use json.loads() or json.dumps() to 'parse' and 'stringify' respectively
store = json.loads(response.text)
and then I assume "1665134" is a merch id or something, so you can just iterate through the objects nested under it
for store_id, info in store["1665134"].items():
    if info['hasStock']:
        pass  # do stuff with stock
    else:
        pass  # has no stock, you're sol
First, never write 50 variables manually when you can replace that with a loop...
Assuming you need some specific ids, do the following:
ids = ['193', '194', '360', ...] and loop over the list.
You said "just want to code that Python prints ''In Stock'' if only ONE of them is TRUE." So only one: if the value is true for more than one store, you want the program to return false (logically you need at least one, not exactly one, but I'm just following what you said). Also, judging by the code you already wrote, you don't seem to care which specific store has the stock.
In this case, this should work for you:
ids = ['193', '194', '360', ...]  # ... stands for the remaining store ids
occurrencesOfTrue = 0
for i in ids:
    if voorraad['{}'.format(productid)][i]['hasStock']:
        occurrencesOfTrue += 1
if occurrencesOfTrue == 1:
    print(colored('[{}] ' + 'IN STOCK | ' + (product_title), 'green').format(str(datetime.now())))
    send_to_discord(product_title, webpagina, footerlogo, url, image_url)
    time.sleep(50)
    exit()
else:
    print(colored('[{}] ' + 'OUT OF STOCK | ' + (product_title), 'red').format(str(datetime.now())))
    time.sleep(2)
If you need at least one occurrence instead of exactly one, replace if occurrencesOfTrue == 1: with if occurrencesOfTrue >= 1:
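Since the question only needs to know whether at least one store has stock, the built-in any() can scan every store without listing the ids by hand. This is a minimal sketch, assuming the response has the shape shown in the question (product id, then store id, then a dict with "hasStock"). The helper name any_store_has_stock and the shortened sample are illustrative.

```python
def any_store_has_stock(voorraad, productid):
    """Return True if at least one store reports hasStock for the product."""
    stores = voorraad[str(productid)]
    # any() takes a single iterable and stops at the first True value.
    return any(info["hasStock"] for info in stores.values())


# Shortened, hypothetical sample in the same shape as the JSON above.
sample = {
    "1665134": {
        "193": {"hasStock": False, "hasShowModel": False, "lowStock": False},
        "194": {"hasStock": True, "hasShowModel": False, "lowStock": False},
    }
}
print(any_store_has_stock(sample, 1665134))
```

The call any(v1, v2, v3) in the question raises a TypeError because any() accepts exactly one iterable argument; any([v1, v2, v3]) would work. Note also that the parsed values are the booleans True and False, so comparing them with the string 'True' never matches.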
I have a Python script that processes several files of a few gigabytes each. With the code below, I store some data in a list, which is in turn stored in a dictionary, snp_dict. The RAM consumption is huge. Looking at my code, could you suggest some ways to reduce RAM consumption, if any?
def extractAF(files_vcf):
    z = 0  # index of the current file
    snp_dict = dict()
    for infile_name in sorted(files_vcf):
        print(' * ' + infile_name)
        ### single files
        vcf_reader = vcf.Reader(open(infile_name, 'r'))
        for record in vcf_reader:
            snp_position = '_'.join([record.CHROM, str(record.POS)])
            ref_F = float(record.INFO['DP4'][0])
            ref_R = float(record.INFO['DP4'][1])
            alt_F = float(record.INFO['DP4'][2])
            alt_R = float(record.INFO['DP4'][3])
            AF = (alt_F+alt_R)/(alt_F+alt_R+ref_F+ref_R)
            if not snp_position in snp_dict:
                snp_dict[snp_position] = list((0) for _ in range(len(files_vcf)))
            snp_dict[snp_position][z] = round(AF, 3) #record.INFO['DP4']
        z += 1
    return snp_dict
I finally adopted the following implementation with MySQL:
for infile_name in sorted(files_vcf):
    print(infile_name)
    ### single files
    vcf_reader = vcf.Reader(open(infile_name, 'r'))
    for record in vcf_reader:
        snp_position = '_'.join([record.CHROM, str(record.POS)])
        ref_F = float(record.INFO['DP4'][0])
        ref_R = float(record.INFO['DP4'][1])
        alt_F = float(record.INFO['DP4'][2])
        alt_R = float(record.INFO['DP4'][3])
        AF = (alt_F+alt_R)/(alt_F+alt_R+ref_F+ref_R)
        if not snp_position in snp_dict:
            sql_insert_table = "INSERT INTO snps VALUES ('" + snp_position + "'," + ",".join(list(('0') for _ in range(len(files_vcf)))) + ")"
            cursor = db1.cursor()
            cursor.execute(sql_insert_table)
            db1.commit()
            snp_dict.append(snp_position)
        sql_update = "UPDATE snps SET " + str(z) + "g=" + str(AF) + " WHERE snp_pos='" + snp_position + "'"
        cursor = db1.cursor()
        cursor.execute(sql_update)
        db1.commit()
    z += 1
return snp_dict
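Building SQL by string concatenation, as above, breaks as soon as a value contains a quote, and most Python database drivers accept placeholders instead. This is a minimal sketch using the standard-library sqlite3 module as a stand-in for MySQL (with MySQLdb the placeholder is %s rather than ?, and INSERT OR IGNORE is SQLite syntax). The table layout with columns g0 and g1 and the helper store_af are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE snps (snp_pos TEXT PRIMARY KEY, g0 REAL, g1 REAL)")


def store_af(db, snp_position, file_index, af):
    """Insert the SNP row if it is missing, then set the AF for one file."""
    # Values go through placeholders; the driver does the quoting.
    db.execute("INSERT OR IGNORE INTO snps VALUES (?, 0, 0)", (snp_position,))
    # Column names cannot be placeholders, so build the name from an int.
    column = "g%d" % int(file_index)
    db.execute("UPDATE snps SET " + column + " = ? WHERE snp_pos = ?",
               (af, snp_position))


store_af(db, "chr1_1000", 0, 0.25)
store_af(db, "chr1_1000", 1, 0.5)
db.commit()  # one commit for the batch instead of one per row
print(db.execute("SELECT * FROM snps").fetchall())
```

Letting the primary key decide whether a row already exists also means the list of seen positions no longer has to be held in memory.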
For this sort of thing, you are probably better off using another data structure. A pandas DataFrame would work well in your situation.
The simplest solution would be to use an existing library, rather than writing your own parser. vcfnp can read vcf files into a format that is easily convertible to a pandas DataFrame. Something like this should work:
import numpy as np
import pandas as pd
import vcfnp

def extractAF(files_vcf):
    dfs = []
    for fname in sorted(files_vcf):
        variants = vcfnp.variants(fname, fields=['CHROM', 'POS', 'DP4'])
        snp_pos = np.char.add(np.char.add(variants.CHROM, '_'), variants.POS.astype('S'))
        # assuming DP4 has shape (n_variants, 4): ref_F, ref_R, alt_F, alt_R
        dp4 = variants.DP4.astype('float')
        AF = dp4[:, 2:].sum(axis=1)/dp4.sum(axis=1)
        dfs.append(pd.DataFrame(AF, index=snp_pos, columns=[fname]).T)
    return pd.concat(dfs).fillna(0.0)
If you absolutely must use PyVCF, it will be slower, but hopefully this will at least be faster than your existing implementation, and should produce the same result as the above code:
def extractAF(files_vcf):
    files_vcf = sorted(files_vcf)
    dfs = []
    for fname in files_vcf:
        print(' * ' + fname)
        vcf_reader = vcf.Reader(open(fname, 'r'))
        variants = ((rec.CHROM, rec.POS) + tuple(rec.INFO['DP4']) for rec in vcf_reader)
        df = pd.DataFrame(variants, columns=['CHROMS', 'POS', 'ref_F', 'ref_R', 'alt_F', 'alt_R'])
        df['snp_position'] = df['CHROMS'] + '_' + df['POS'].astype(str)
        # select columns with lists; a tuple would be read as a MultiIndex key
        df_alt = df.loc[:, ['alt_F', 'alt_R']]
        df_dp4 = df.loc[:, ['alt_F', 'alt_R', 'ref_F', 'ref_R']]
        df[fname] = df_alt.sum(axis=1)/df_dp4.sum(axis=1)
        df = df.set_index('snp_position', drop=True).loc[:, [fname]].T
        dfs.append(df)
    return pd.concat(dfs).fillna(0.0)
Now let's say you wanted to read a particular snp_position, contained in a variable snp_pos, that may or may not be there (from your comment). You wouldn't actually have to change anything:
all_vcf = extractAF(files_vcf)
if snp_pos in all_vcf:
    linea_di_AF = all_vcf[snp_pos]
The result will be slightly different, though. It will be a pandas Series, which is like an array but can also be accessed like a dictionary:
all_vcf = extractAF(files_vcf)
if snp_pos in all_vcf:
    linea_di_AF = all_vcf[snp_pos]
    f_di_AF = linea_di_AF[files_vcf[0]]
This allows you to access a particular file/snp_pos pair directly:
all_vcf = extractAF(files_vcf)
if snp_pos in all_vcf:
    f_di_AF = all_vcf[snp_pos][files_vcf[0]]
Or, better yet:
all_vcf = extractAF(files_vcf)
if snp_pos in all_vcf:
    f_di_AF = all_vcf.loc[files_vcf[0], snp_pos]
Or you can get all snp_pos values for a given file:
all_vcf = extractAF(files_vcf)
fpos = all_vcf.loc[fname]