I have a Python script that pulls all fields from JIRA. I have multiple projects in JIRA, and the set of columns in each project is different, so I use the script below, which works just fine.
## Login to Jira
jira = JIRA(basic_auth=('login#login.com', 'password'), options={'server': 'https://company.atlassian.net'})
# Pulling Proj_A tickets
issues = jira.search_issues('project= Proj_A',maxResults=False) ## Get Proj_A tickets
## Create the full df by normalizing the output
issue_list = []
for i in range(len(issues)):
    result = json_normalize(issues[i].raw['fields'])
    result['issue_id'] = issues[i]
    result['issue_link'] = 'https://company.atlassian.net/browse/' + str(issues[i])
    issue_list.append(result)
final_issue_df_a = pd.concat(issue_list, axis=0, sort=True).reset_index()
# Pulling Proj_B tickets
issues = jira.search_issues('project= Proj_B',maxResults=False) ## Get Proj_B tickets
## Create the full df by normalizing the output
issue_list = []
for i in range(len(issues)):
    result = json_normalize(issues[i].raw['fields'])
    result['issue_id'] = issues[i]
    result['issue_link'] = 'https://company.atlassian.net/browse/' + str(issues[i])
    issue_list.append(result)
final_issue_df_b = pd.concat(issue_list, axis=0, sort=True).reset_index()
# Concatenating fields from the 2 projects into one single dataframe
Final_DF = pd.concat([final_issue_df_a,final_issue_df_b], axis=0, ignore_index=True)
The code above works just fine. Now I am trying to optimize the script by looping over a list of all projects, as below:
project = ['Proj_A', 'Proj_B', 'Proj_C', ...]
Each project in the list should get passed into the first line (issues = jira.search_issues('project=', maxResults=False)), so the loop iterates over each project and pulls in the relevant fields, which get stored in the final DataFrame.
Could anyone assist? Thanks.
Use the project API to get the list of projects in JIRA, then extract the project name from each project returned.
import requests

r = requests.get('https://<jira>/rest/api/2/project', auth=('username', 'password'), verify=False)
projects = r.json()
# Get the name of each project
for project in projects:
    print(project['name'])
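Since the question already has an authenticated client from the jira package, a similar list can likely be pulled from that client directly; a minimal sketch, assuming the jira object from the question (JQL generally expects the project key rather than its display name):

# Minimal sketch using the already-authenticated client from the question.
# JIRA.projects() returns Project resources; .key is what a JQL 'project =' clause expects.
project_keys = [p.key for p in jira.projects()]
print(project_keys)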
You could do something like this:
projects = ['Proj_A', 'Proj_B', 'Proj_C']
final_df_list = []
for project in projects:
    issues = jira.search_issues('project= ' + project, maxResults=False)
    # Rest of the code processing the issues obtained above (taken from the question)
    issue_list = []
    for i in range(len(issues)):
        result = json_normalize(issues[i].raw['fields'])
        result['issue_id'] = issues[i]
        result['issue_link'] = 'https://company.atlassian.net/browse/' + str(issues[i])
        issue_list.append(result)
    final_issue_df_x = pd.concat(issue_list, axis=0, sort=True).reset_index()
    final_df_list.append(final_issue_df_x)
Final_DF = pd.concat(final_df_list, axis=0, ignore_index=True)
I'm not quite sure how to formulate this question, as I'm quite new to Python and coding in general.
I have a GUI that displays information already available from a CSV:
def updatetext(self):
    """adds information extracted from database already provided"""
    df_subj = Content.extract_saved_data(self.date)
    self.lineEditFirstDiagnosed.setText(str(df_subj["First_Diagnosed_preop"][0])) \
        if str(df_subj["First_Diagnosed_preop"][0]) != 'nan' else self.lineEditFirstDiagnosed.setText('')
    self.lineEditAdmNeurIndCheck.setText(str(df_subj['Admission_preop'][0])) \
        if str(df_subj['Admission_preop'][0]) != 'nan' else self.lineEditAdmNeurIndCheck.setText('')
This works great.
Now, if I change values in the GUI, I want them to be updated in the CSV.
I started like this:
def onClickedSaveReturn(self):
    """closes GUI and returns to calling (main) GUI"""
    df_general = Clean.get_GeneralData()
    df_subj = {k: '' for k in Content.extract_saved_data(self.date).keys()}  # extract empty dictionary
    df_subj['ID'] = General.read_current_subj().id[0]
    df_subj['PID'] = df_general['PID_ORBIS'][0]
    df_subj['Gender'] = df_general['Gender'][0]
    df_subj['Diagnosis_preop'] = df_general['diagnosis'][0]
    df_subj["First_Diagnosed_preop"] = self.lineEditFirstDiagnosed.text()
    df_subj['Admission_preop'] = self.lineEditAdmNeurIndCheck.text()
    df_subj['Dismissal_preop'] = self.DismNeurIndCheckLabel.text()
and this is what my boss added now:
subj_id = General.read_current_subj().id[0]  # reads data from current_subj (saved in ./tmp)
df = General.import_dataframe('{}.csv'.format(self.date), separator_csv=',')
if df.shape[1] == 1:
    df = General.import_dataframe('{}.csv'.format(self.date), separator_csv=';')
idx2replace = df.index[df['ID'] == subj_id][0]
# TODO: you need to find a way to turn the dictionary df_subj into a dataframe and replace the data at
# the index idx2replace of 'df' with df_subj. Later I would suggest to use line 322 to save everything to the
# file
df.iloc[idx2replace] = pds.DataFrame([df_subj])
df.to_csv("preoperative.csv", index=False)
# df.to_csv(os.path.join(FILEDIR, "preoperative.csv"), index=False)
self.close()
I'm not really sure how to approach this, or to be honest, what to do at all.
Hope someone can help me.
Thank you!
You should load the file only once and keep the DataFrame (self.df or similar). Then display it; every time the user changes a value in the GUI, update the DataFrame, and when the user clicks save, simply overwrite the existing file with the current DataFrame in memory.
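A minimal sketch of that pattern, assuming pandas and a CSV named after the date as in the question (the class and method names here are illustrative, not the question's actual API):

import pandas as pd

class SubjectForm:
    def __init__(self, date):
        self.date = date
        self.df = pd.read_csv('{}.csv'.format(date))    # load the file only once

    def on_value_changed(self, subj_id, column, new_value):
        # called whenever the user edits a field in the GUI
        idx = self.df.index[self.df['ID'] == subj_id][0]
        self.df.loc[idx, column] = new_value             # update the DataFrame in memory

    def on_clicked_save_return(self):
        # overwrite the existing file with the current in-memory DataFrame
        self.df.to_csv('{}.csv'.format(self.date), index=False)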
I'm trying to create in Python what a macro does in SAS. I have a list of over 1K tickers that I'm trying to download information for, but doing all of them in one step made Python crash, so I split the data into 11 portions. Below is the code we're working with:
import time as t
import pandas as pd
import yfinance as yf

company_info_1 = pd.DataFrame()  # collector for this portion of tickers

t0 = t.time()
printcounter = 0
for ticker in tickers1:
    printcounter += 1
    print(printcounter)
    try:
        selected = yf.Ticker(ticker)
        shares = selected.get_shares()
        shares_wide = shares.transpose()
        info = selected.info
        market_cap = info['marketCap']
        sector = info['sector']
        name = info['shortName']
        comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
        company_info_1 = company_info_1.append(comb)
    except:
        comb = pd.DataFrame()
        comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
        company_info_1 = company_info_1.append(comb)
print("total run time:", round(t.time() - t0, 3), "s")
What I'd like to do, instead of rewriting and running this code for all 11 portions of data and manually changing "tickers1" and "company_info_1" to "tickers2"/"company_info_2", "tickers3"/"company_info_3", and so on, is to see if there is a way to make a Python version of a SAS macro/call so that I can get this data more dynamically. Is there a way to do this in Python?
You need to generalize your existing code and wrap it in a function.
def company_info(tickers):
    company_info = pd.DataFrame()  # collects results for this group of tickers
    for ticker in tickers:
        try:
            selected = yf.Ticker(ticker)  # you may also have to pass the yf object
            shares = selected.get_shares()
            shares_wide = shares.transpose()
            info = selected.info
            market_cap = info['marketCap']
            sector = info['sector']
            name = info['shortName']
            comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
            company_info = company_info.append(comb)
        except:
            comb = pd.DataFrame()
            comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
            company_info = company_info.append(comb)
    return company_info  # return the dataframe
Create a master dataframe to collect your results from the function call. Loop over the 11 groups of tickers, passing each group into your function, and append the results to your master.
# master df to collect results
master = pd.DataFrame()

# assuming you have your tickers in a list of lists,
# loop over each of the 11 groups of tickers
for tickers in groups_of_tickers:
    df = company_info(tickers)  # fetch data from Yahoo Finance
    master = master.append(df)
Please note I typed this on the fly. I have no way of testing this. I'm quite sure there are syntactical issues to work through. Hopefully it provides a framework for how to think about the solution.
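If the tickers currently sit in one flat list rather than a list of lists, a small helper (hypothetical, not part of the answer above) could build the 11 groups:

def chunk(seq, n_groups):
    """Split a flat list into n_groups roughly equal-sized sublists."""
    size = -(-len(seq) // n_groups)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

groups_of_tickers = chunk(all_tickers, 11)  # all_tickers is the full ~1K ticker list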
I am trying to retrieve all journals that exist within a subject area of Scopus, say 'Medicine', using the Python package pybliometrics.
According to the Scopus search (online), there are 13,477 Journals in this category.
I am accessing the Serial Title API of Scopus via pybliometrics.scopus.SerialSearch(); for the category Medicine, subjArea='MEDI' and subjCode='2700'. The codes associated with the Scopus subject categories are listed here.
With subjCode='2700' I am not able to get more than 5,000 journals, while with the parameter subjArea='MEDI' I am able to retrieve more than 5,000 documents, but not more than 10,000.
I do not understand why searching with subjArea and subjCode fetches different results for me. Can anyone help me understand why this could be happening?
I am adding my code for both these search queries for better understanding:
import pandas as pd
from pybliometrics.scopus import SerialSearch

def search_by_subject_area(subject_area):
    print("Searching journals by subject area....")
    df = pd.DataFrame()
    i = 0
    # the limitation i < 10000 is added because otherwise Scopus raises a 500 error
    while (i > -1 and i < 10000):
        s = SerialSearch(query={"subj": f"{str(subject_area)}"}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        else:
            i += s.get_results_size()
            df_new = pd.DataFrame(s.results)
            df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")
def search_by_subject_code(code):
    print("------------------------------------------------\n Searching journals by subject codes....")
    df = pd.DataFrame()
    i = 0
    while (i > -1):
        s = SerialSearch(query={"subjCode": f"{code}"}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        else:
            i += s.get_results_size()
            df_new = pd.DataFrame(s.results)
            df = pd.concat([df, df_new], axis=0, ignore_index=True)
    print(i, " journals obtained!")
if __name__ == '__main__':
    search_by_subject_area(subject_area='MEDI')
    search_by_subject_code('2700')
Certain Scopus APIs, including the Serial Search API, are restricted: they do not allow more than 5,000 results.
Some Search APIs have pagination active, which allows you to cycle through a potentially unlimited number of results.
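As a rough workaround sketch (not guaranteed), one could query each Medicine subcategory code separately, reusing the pagination loop from the question, so that no single query has to exceed the 5,000-result cap; the ASJC code range 2701-2748 used below is an assumption and should be verified against the official subject code list.

import pandas as pd
from pybliometrics.scopus import SerialSearch

frames = []
for code in range(2701, 2749):  # assumed Medicine subcategory codes; verify before use
    i = 0
    while i > -1:
        s = SerialSearch(query={"subjCode": str(code)}, start=f'{i}', refresh=True)
        if s.get_results_size() == 0:
            break
        i += s.get_results_size()
        frames.append(pd.DataFrame(s.results))

# Combine and drop journals that appear under several subcategories
journals = pd.concat(frames, ignore_index=True).drop_duplicates()
print(len(journals), "journals obtained")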
I've got a list of BQ tables that I'd like to use one at a time. The purpose is to process each table individually, perform some action (in my example, score the dataset with a previously fitted model), then compute, append, and save the probabilities in the all_scores list.
Here's the entire code snippet:
# List of BQ tables
scoring_tables = ["`Customer_Analytics.HS_AF_PROPERTY_ALL_DATA_SCORE_PV_CLEANED_01`",
                  "`Customer_Analytics.HS_AF_PROPERTY_ALL_DATA_SCORE_PV_CLEANED_02`",
                  "`Customer_Analytics.HS_AF_PROPERTY_ALL_DATA_SCORE_PV_CLEANED_03`",
                  "`Customer_Analytics.HS_AF_PROPERTY_ALL_DATA_SCORE_PV_CLEANED_04`",
                  "`Customer_Analytics.HS_AF_PROPERTY_ALL_DATA_SCORE_PV_CLEANED_05`"]

# List to store probabilities/scores
all_scores = []

# Loop through each BQ table; calculate, append and store the probabilities in all_scores
for t in scoring_tables:
    %%bigquery property_data_score_00
    SELECT * FROM t

    score_data = property_data_score_00.copy()
    score_data.set_index('HH_ID', inplace=True)

    # Fixing HOUSE_INCOME = 0 & AGE = 0 based on means
    score_data['HOUSE_INCOME'] = np.where(score_data['HOUSE_INCOME'] == 0, 107, score_data['HOUSE_INCOME'])
    score_data['AGE'] = np.where(score_data['AGE'] == 0, 54, score_data['AGE'])

    # Recategorize PROP_EXTR_WALL_TYPE | PROP_GRG_TYPE | PROP_ROOF_TYPE
    condition = [(score_data['PROP_EXTR_WALL_TYPE'].str.contains("BRICK")),
                 (score_data['PROP_EXTR_WALL_TYPE'].str.contains("WOOD")),
                 (score_data['PROP_EXTR_WALL_TYPE'].str.contains("CONCRETE")),
                 (score_data['PROP_EXTR_WALL_TYPE'].str.contains("METAL")),
                 (score_data['PROP_EXTR_WALL_TYPE'].str.contains("STEEL"))]
    choice = ["BRICK", "WOOD", "CONCRETE", "METAL", "METAL"]
    score_data['PROP_EXTR_WALL_TYPE_MOD'] = np.select(condition, choice, default="OTHERS")

    condition = [(score_data['PROP_GRG_TYPE'].str.contains("ATTACHED")),
                 (score_data['PROP_GRG_TYPE'].str.contains("DETACHED")),
                 (score_data['PROP_GRG_TYPE'].str.contains("CARPORT")),
                 (score_data['PROP_GRG_TYPE'].str.contains("BASEMENT"))]
    choice = ["ATTACHED", "DETACHED", "CARPORT", "BASEMENT"]
    score_data['PROP_GRG_TYPE_MOD'] = np.select(condition, choice, default="OTHERS")

    condition = [(score_data['PROP_ROOF_TYPE'].str.contains("GABLE")),
                 (score_data['PROP_ROOF_TYPE'].str.contains("HIP")),
                 (score_data['PROP_ROOF_TYPE'].str.contains("GAMBREL"))]
    choice = ["GABLE", "HIP", "GAMBREL"]
    score_data['PROP_ROOF_TYPE_MOD'] = np.select(condition, choice, default="OTHERS")

    # One-hot encoding
    to_encode = ["IND_ETHNICITY", "IND_GENDER", "IND_MOVERS_FLAG", "IND_OCCUPATION", "IND_REGION",
                 "PROP_EXTR_WALL_TYPE", "PROP_GRG_TYPE", "PROP_ROOF_TYPE", "TSI"]
    score_data_dm = pd.get_dummies(data=score_data, columns=to_encode, drop_first=False)

    # Columns which might not get produced during pd.get_dummies() if categories are missing in the score data
    columns_not_in_score_data_dm = [c for c in train_X.columns if c not in score_data_dm.columns]
    score_data_dm[columns_not_in_score_data_dm] = 0  # initializing above columns as 0

    score_data_dm_filt = score_data_dm[select_columns]  # making sure to select only the columns which are in train_X

    y_pred = xgb_prop_PV.predict_proba(score_data_dm_filt)[:, 1]  # final scoring
    all_scores = all_scores + list(y_pred)
Inside the loop, I'm having trouble with the SELECT * FROM t step. The error is shown below. I believe the indentation within the loop is causing the %%bigquery step to fail. I looked at itertools, but it appears to be useful only when conditional looping is involved, which is not the case in my situation.
Also, this appears to be a complex approach; is there a more elegant solution? Because the table was too large (600 GB), it needed to be split into smaller datasets, so we tried this method. PS: it works without the loop if run for one table at a time, but that is quite a manual effort.
Thanks,
Piyush
To address the error message: if you are going to pass a variable into a query using the %%bigquery magics command, you need to use the --params flag.
Your code should end up looking something like this:
t="table_name"
my_params = {"t": t}
%%bigquery --params $my_params
select #t
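For reference, a slightly fuller sketch of the --params syntax (the project, dataset, table, and column names below are made up). Note that BigQuery query parameters substitute values, not identifiers, so a table name itself cannot be injected this way; for dynamic table names the client-library approach in the next answer is the usual route.

params = {"min_score": 0.5}  # values only; identifiers such as table names cannot be parameterized

and then, in a separate cell:

%%bigquery scored_rows --params $params
SELECT *
FROM `my-project.my_dataset.my_table`
WHERE score >= @min_score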
You may consider and try the approach below.
Instead of using BigQuery magics, you may use the BigQuery Client Library for Python. From there, you can loop over your list of tables using an f-string, as shown in the sample code below.
from google.cloud import bigquery

bqclient = bigquery.Client()

scoring_tables = ["`your-project-id.your-dataset.test_table1`",
                  "`your-project-id.your-dataset.test_table2`",
                  "`your-project-id.your-dataset.test_table3`"]

for t in scoring_tables:
    # Download query results.
    query_string = f"""
    SELECT * FROM {t}
    """
    property_data_score_00 = (
        bqclient.query(query_string)
        .result()
        .to_dataframe(
            # Optionally, explicitly request to use the BigQuery Storage API. As of
            # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
            # API is used by default.
            create_bqstorage_client=True,
        )
    )
    score_data = property_data_score_00.copy()
    score_data.set_index('HH_ID', inplace=True)
    print(score_data)
I'm trying to run a script (against the Google Search Console API) over a table of keywords and dates in order to check whether there was an improvement in keyword performance (SEO) after each date.
Since I'm really clueless, I'm guessing and trying, but the Jupyter notebook isn't responding, so I can't even tell if I'm wrong...
The repository I took this code from was made by Josh Carty:
https://github.com/joshcarty/google-searchconsole
I already read the input table with pd.read_csv (it consists of two columns, 'keyword' and 'date') and turned the columns into two separate lists (or maybe it is better to use a dictionary or something else?): KW_list and Date_list.
I tried:
for i in KW_list and j in Date_list:
    account = searchconsole.authenticate(client_config='client_secrets.json',
                                         credentials='credentials.json')
    webproperty = account['https://www.example.com/']
    report = webproperty.query.range(j, days=-30).filter('query', i, 'contains').get()
    report2 = webproperty.query.range(j, days=30).filter('query', i, 'contains').get()
    df = pd.DataFrame(report)
    df2 = pd.DataFrame(report2)
df
I expect to see a data frame of all the different keywords (keyword1 - stats1, keyword2 - stats2 below it, etc., with no overwriting) for the 30 days before the date in the neighboring cell (in the input file), or at least some response from the Jupyter notebook so I know what is going on.
Try using the zip function to combine the lists into a list of tuples. This way, the date and the corresponding keyword are combined.
account = searchconsole.authenticate(client_config='client_secrets.json', credentials='credentials.json')
webproperty = account['https://www.example.com/']

df1 = None
df2 = None
first = True
for (keyword, date) in zip(KW_list, Date_list):
    report = webproperty.query.range(date, days=-30).filter('query', keyword, 'contains').get()
    report2 = webproperty.query.range(date, days=30).filter('query', keyword, 'contains').get()
    if first:
        df1 = pd.DataFrame(report)
        df2 = pd.DataFrame(report2)
        first = False
    else:
        df1 = df1.append(pd.DataFrame(report))
        df2 = df2.append(pd.DataFrame(report2))
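On newer pandas versions (2.0 and later) DataFrame.append is no longer available; an equivalent sketch collects the per-keyword frames in lists and concatenates them once:

frames_before, frames_after = [], []
for keyword, date in zip(KW_list, Date_list):
    frames_before.append(pd.DataFrame(
        webproperty.query.range(date, days=-30).filter('query', keyword, 'contains').get()))
    frames_after.append(pd.DataFrame(
        webproperty.query.range(date, days=30).filter('query', keyword, 'contains').get()))

df1 = pd.concat(frames_before, ignore_index=True)
df2 = pd.concat(frames_after, ignore_index=True)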