How to get the latest repository through the GitHub API with Python?

I would like to create an app that shows the most recently updated repository on somebody's (an organisation's) GitHub account.
I tried PyGitHub, and I tried parsing the JSON in many ways (with parameters, different iterations, relating to keys), but with no result.
Can somebody help?
import requests

parameters = {"sort": "pushed"}  # most recently pushed repositories first
# The list-repos endpoint is /users/{username}/repos (no .json suffix)
r = requests.get("https://api.github.com/users/:github_user_name:/repos", params=parameters)
resp = r.json()  # a list of repository objects
for repo in resp:
    print(repo["name"], repo["updated_at"])

You can read the updated_at attribute of a repository and store the date for comparison the next time you check whether the repo has been updated.
Getting the date of the last update:
from github import Github

g = Github('username', 'password')  # a personal access token can be used instead: Github('token')
repo = g.get_repo('CouchPotato/CouchPotatoServer')  # full repo name or id
date = repo.updated_at
print(date)  # 2017-05-02 13:48:58
Then you just need to:
1. store the date associated with the repository,
2. call the function once every X hours (or whatever interval you choose),
3. compare the stored date with the new date (see the sketch below).
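For illustration only (not part of the original answer), here is a minimal polling sketch built on that idea. The access token, the organisation name, the state file last_seen.json and the 6-hour interval are all assumptions:
import json
import time
from github import Github

g = Github("access_token")  # assumed personal access token
org = g.get_organization("some-org")  # assumed organisation name

def check_for_updates(state_file="last_seen.json"):
    # Load the previously stored dates, if any.
    try:
        with open(state_file) as f:
            last_seen = json.load(f)
    except FileNotFoundError:
        last_seen = {}

    # Compare each repository's updated_at with the stored value.
    for repo in org.get_repos():
        stamp = repo.updated_at.isoformat()
        if last_seen.get(repo.full_name) != stamp:
            print(f"{repo.full_name} was updated at {stamp}")
            last_seen[repo.full_name] = stamp

    # Store the dates for the next run.
    with open(state_file, "w") as f:
        json.dump(last_seen, f)

while True:
    check_for_updates()
    time.sleep(6 * 60 * 60)  # poll every 6 hours (arbitrary interval)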

Related

Export public Google Calendar events to csv (Python)

There exists a public Google calendar whose calendar ID I have. I would like to set a start and end date and then get all the events between these dates in a CSV file. What is the most efficient way to do this in Python?
The closest solution I have found is https://stackoverflow.com/a/27213635/1936752, which yields a JSON file and does not filter by date. I could take the JSON file, write some code to keep only the dates I want, and then export to CSV, but I guess there is a smarter way?
The manual way of doing what I want is to download the ics file using the "Export Calendar" function and then use an ics-to-csv converter such as https://openicsfile.com/csv-convert.html. I can then easily filter the dates I want. I wish to do exactly this, but using a Python script.
I believe your goal is as follows.
You want to retrieve the events from a publicly shared Google Calendar.
You want to retrieve the event title and the date as a CSV file.
You want to achieve this using python.
In this case, how about the following sample script?
Sample script:
In this sample script, the event list is retrieved using "Events: list" of the Calendar API with an API key. So, please retrieve your API key (see the references below) and enable the Calendar API in the API console.
And, please set the variables in the following sample script.
import csv
from googleapiclient.discovery import build

api_key = "###"  # Please set your API key.
calendar_id = "###"  # Please set the calendar ID.
start = "2022-10-01T00:00:00Z"  # Please set the start date you want to search.
end = "2022-10-31T00:00:00Z"  # Please set the end date you want to search.

# Retrieve event list using googleapis for python.
service = build("calendar", "v3", developerKey=api_key)
events_result = service.events().list(
    calendarId=calendar_id,
    timeMin=start,
    timeMax=end,
    fields="items(summary,start,end)",
    timeZone="UTC",
).execute()

# Retrieve the event title and date from all-day events.
allDayEvents = [
    ["Event title", "Date"],
    *[
        [e.get("summary"), e.get("start").get("date")]
        for e in events_result.get("items", [])
        if e.get("start").get("date") and e.get("end").get("date")
    ],
]

# Output the retrieved values as a CSV file.
with open("sample.csv", "w") as f:
    writer = csv.writer(f, lineterminator="\n")
    writer.writerows(allDayEvents)
When this script is run, the event list is retrieved from the publicly shared Google Calendar using an API key, and a CSV file is created containing the event title and the date. In this sample, the all-day events from October 1, 2022 to October 31, 2022 are retrieved.
You can read more about the googleapis client for Python in its documentation.
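One caveat worth noting: events().list() returns results in pages, so calendars with many events may also need the nextPageToken handled. A minimal sketch using the client's list_next() helper, reusing the service, calendar_id, start and end variables from the sample above:
items = []
request = service.events().list(
    calendarId=calendar_id,
    timeMin=start,
    timeMax=end,
    timeZone="UTC",
    # nextPageToken must be included in the partial response for paging to work.
    fields="items(summary,start,end),nextPageToken",
)
while request is not None:
    events_result = request.execute()
    items.extend(events_result.get("items", []))
    # Returns the request for the next page, or None when there are no more pages.
    request = service.events().list_next(request, events_result)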
References:
Authenticate using API keys
Events: list

Query S3 from Python

I am using python to send a query to Athena and get table DDL. I am using start_query_execution and get_query_execution functions in the awswrangler package.
import boto3
import awswrangler as wr
import time
import pandas as pd
boto3.setup_default_session(region_name="us-east-1")
sql="show create table 'table-name'"
query_exec_id = wr.athena.start_query_execution(sql=sql, database='database-name')
time.sleep(20)
res=wr.athena.get_query_execution(query_execution_id=query_exec_id)
The code above creates a dict object that stores query results in an s3 link.
The link can be accessed by
res['ResultConfiguration']['OutputLocation']. It's a text link: s3://.....txt
Can someone help me figure out how to access the output in the link? I tried using readlines() but it seems to error out.
Here is what I did
import urllib3
target_url = res['ResultConfiguration']['OutputLocation']
f = urllib3.urlopen(target_url)
for l in f.readlines():
    print(l)
Or if someone can suggest an easier way to get table DDL in python.
Keep in mind that the returned link will time out after a short while... and make sure your credentials allow you to get the data from the URL specified. If you drop the error message here we can help you better. –
Oh... "It's a text link: s3://.....txt" is not a standard URL. You cant read that with urllib3. You can use awswrangler to read the bucket. –
I think the form is
wr.s3.read_fwf(...)
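One way to read that output without urllib3 is to split the s3:// link into a bucket and key and fetch the object with boto3. A minimal sketch, assuming the query has already finished and your credentials can read the results bucket:
import boto3

target_url = res['ResultConfiguration']['OutputLocation']  # e.g. s3://bucket/path/<query-id>.txt
bucket, key = target_url.replace("s3://", "", 1).split("/", 1)

s3 = boto3.client("s3")
ddl = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
print(ddl)  # the CREATE TABLE statement returned by SHOW CREATE TABLE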

How to make these 2 JSON files communicate?

I have this Python code where "EMPLOYEE" is a variable.
Whenever I search for a company in my first JSON file, it gives me the responsible employee and assigns their name to the EMPLOYEE variable.
In another JSON file, I have a list of the employees with their addresses and emails.
What I want is: whenever an employee is fetched from the first JSON file, I would like the second file to be consulted to pull their email and home address.
Context: a company wants an appointment, so we check who is free and assign that company a calendar slot.
User's job: insert a date, a time, and a company name.
Purpose of the code: each company in my file has 3 employees assigned to it, in order (1st JSON file). The code checks the first one, then checks in my Google Calendar whether he is busy; if he is, it checks the second one, and so on.
import json

# Reads the json data
with open('convertcsv.json') as json_file:
    data = json.load(json_file)

employeesChosen = []
event_email = 'abc@abc.com'
event_start = '2020-05-09T13:00:00'
event_end = '2020-05-09T15:00:00'
employeeInsert = False

# Adds all the current employees for the company picked
for i in range(len(data)):
    if data[i]['name_enterprise'] == event_fabricant:
        employeesChosen.append(data[i]['employee1'])
        employeesChosen.append(data[i]['employee2'])
        employeesChosen.append(data[i]['employee3'])
        location = data[i]['location']
        print("Employees found")
        break
If you want a data change to trigger some other action, you'll need a command/script. If you keep the files in a git repo, you could use git hooks, or a cron job that checks the content of the first file and does something with the second file (e.g. getting the email and address).
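As a rough sketch of the lookup itself (not from the original answer): once an employee name has been taken from the first file, the second file can simply be loaded and searched for that name. The file name employees.json and the field names name, email and address are assumptions about the second file's structure:
import json

# Load the second JSON file (assumed structure: a list of objects with
# 'name', 'email' and 'address' keys).
with open('employees.json') as f:
    employees = json.load(f)

def find_employee_details(employee_name):
    # Return the email and address of the first matching employee.
    for employee in employees:
        if employee['name'] == employee_name:
            return employee['email'], employee['address']
    return None, None

# Example: look up the first employee chosen for the company.
email, address = find_employee_details(employeesChosen[0])
print(email, address)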

Too many open files: '/home/USER/PATH/SERVICE_ACCOUNT.json' when calling Google's Natural Language API

I'm working on a Sentiment Analysis project using the Google Cloud Natural Language API and Python, this question might be similar to this other question, what I'm doing is the following:
Reads a CSV file from Google Cloud Storage, file has approximately 7000 records.
Converts the CSV into a Pandas DataFrame.
Iterates over the dataframe and calls the Natural Language API to perform sentiment analysis on one of the dataframe's columns; in the same for loop I extract the score and magnitude from the result and add those values to new columns on the dataframe.
Store the result dataframe back to GCS.
I'll put my code below, but prior to that I just want to mention that I have tested it with a sample CSV of fewer than 100 records and it works well. I am also aware of the quota limit of 600 requests per minute, which is why I put a delay on each iteration; still, I'm getting the error specified in the title.
I'm also aware of the suggestion to increase the ulimit, but I don't think that's a good solution.
Here's my code:
from google.cloud import language_v1
from google.cloud.language_v1 import enums
from google.cloud import storage
from time import sleep
import pandas
import sys

pandas.options.mode.chained_assignment = None

def parse_csv_from_gcs(csv_file):
    df = pandas.read_csv(csv_file, encoding="ISO-8859-1")
    return df

def analyze_sentiment(text_content):
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = 'es'
    document = {"content": text_content, "type": type_, "language": language}
    encoding_type = enums.EncodingType.UTF8
    response = client.analyze_sentiment(document, encoding_type=encoding_type)
    return response

gcs_path = sys.argv[1]
output_bucket = sys.argv[2]
output_csv_file = sys.argv[3]

dataframe = parse_csv_from_gcs(gcs_path)

for i in dataframe.index:
    print(i)
    response = analyze_sentiment(dataframe.at[i, 'FieldOfInterest'])
    dataframe.at[i, 'Score'] = response.document_sentiment.score
    dataframe.at[i, 'Magnitude'] = response.document_sentiment.magnitude
    sleep(0.5)

print(dataframe)
dataframe.to_csv("results.csv", encoding='ISO-8859-1')

gcs = storage.Client()
gcs.get_bucket(output_bucket).blob(output_csv_file).upload_from_filename('results.csv', content_type='text/csv')
The 'analyze_sentiment' function is very similar to the one in Google's documentation; I just modified it a little, but it does pretty much the same thing.
Now, the program raises that error and crashes when it reaches a record somewhere between 550 and 700, and I don't see the correlation between the service account JSON and calling the Natural Language API, so I suspect that each time I call the API it opens the account credential JSON file but doesn't close it afterwards.
I'm currently stuck on this issue and have run out of ideas, so any help will be much appreciated. Thanks in advance =)!
[UPDATE]
I've solved this issue by extracting the 'client' out of the 'analyze_sentiment' method and passing it in as a parameter, as follows:
def analyze_sentiment(text_content, client):
    <Code>
It looks like every time execution reaches this line:
client = language_v1.LanguageServiceClient()
it opens the account credential JSON file and doesn't close it, so extracting the client to a global variable made this work =).
I've updated the original post with the solution for this, but in any case, thanks to everyone that saw this and tried to reply =)!
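For reference, a minimal sketch of the reuse pattern described in the update; it mirrors the script earlier in the question (dataframe and the column names come from there), and everything else is an assumption:
from google.cloud import language_v1
from google.cloud.language_v1 import enums

# Create the client once and reuse it for every request, so the credentials
# file is only opened a single time.
client = language_v1.LanguageServiceClient()

def analyze_sentiment(text_content, client):
    document = {
        "content": text_content,
        "type": enums.Document.Type.PLAIN_TEXT,
        "language": "es",
    }
    return client.analyze_sentiment(document, encoding_type=enums.EncodingType.UTF8)

for i in dataframe.index:
    response = analyze_sentiment(dataframe.at[i, 'FieldOfInterest'], client)
    dataframe.at[i, 'Score'] = response.document_sentiment.score
    dataframe.at[i, 'Magnitude'] = response.document_sentiment.magnitude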

Mapping Resolution & fixVersion received via Python JIRA REST API to human-readable values

I need to pull information on a long list of JIRA issues that live in a CSV file. I'm using the JIRA REST API in Python in a small script to see what kind of data I can expect to retrieve:
#!/usr/bin/python
import csv
import sys
from jira.client import JIRA

*...redacted*

csvfile = list(csv.reader(open(sys.argv[1])))
for row in csvfile:
    r = str(row).strip("'[]'")
    i = jira.issue(r)
    print i.id, i.fields.summary, i.fields.fixVersions, i.fields.resolution, i.fields.resolutiondate
The ID (Key), Summary, and Resolution dates are human-readable as expected. The fixVersions and Resolution fields are resources as follows:
[<jira.resources.Version object at 0x105096b11>], <jira.resources.Resolution object at 0x105096d91>
How do I use the API to get the set of available fixVersions and Resolutions, so that I can populate this correctly in my output CSV?
I understand how JIRA stores these values, but the documentation on the jira-python code doesn't explain how to harness it to grab those base values. I'd be happy to just snag the available fixVersion and Resolution values globally, but the resource info I receive doesn't map to them in an obvious way.
You can use fixVersion.name and resolution.name to get the string versions of those values.
User mdoar answered this question in his comment:
How about using version.name and resolution.name?
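As a small illustration (not from the original answers), mapping those resource objects to readable strings before printing or writing the CSV might look like this; treating a missing resolution as "Unresolved" is an assumption:
# fixVersions is a list of Version resources, so join their names.
fix_versions = ", ".join(v.name for v in i.fields.fixVersions)
# resolution may be None for unresolved issues.
resolution = i.fields.resolution.name if i.fields.resolution else "Unresolved"
print i.id, i.fields.summary, fix_versions, resolution, i.fields.resolutiondate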
