I would like to commit a file with a custom date.
So far I've created a Commit object, but I don't understand how to bind it to a repo.
from git import *
repo = Repo('path/to/repo')
comm = Commit(repo=repo, binsha=repo.head.commit.binsha, tree=repo.index.write_tree(), message='Test Message', committed_date=1357020000)
Thanks!
Well, I found the solution.
I wasn't aware of it, but I had to provide all the parameters; otherwise the Commit object would throw a BadObject exception when trying to serialize it.
from git import *
from time import altzone
import datetime
from cStringIO import StringIO
from gitdb import IStream
repo = Repo('path/to/repo')
message = 'Commit message'
tree = repo.index.write_tree()
parents = [ repo.head.commit ]
# Committer and Author
cr = repo.config_reader()
committer = Actor.committer(cr)
author = Actor.author(cr)
# Custom Date
commit_time = int(datetime.date(2013, 1, 1).strftime('%s'))  # seconds since the epoch
offset = altzone
author_time, author_offset = commit_time, offset
committer_time, committer_offset = commit_time, offset
# UTF-8 Default
conf_encoding = 'UTF-8'
new_commit = Commit(repo, Commit.NULL_BIN_SHA, tree,
author, author_time, author_offset,
committer, committer_time, committer_offset,
message, parents, conf_encoding)
After creating the commit object, a proper SHA had to be set. I didn't have a clue how this was done, but a little research in the guts of the GitPython source code got me the answer.
stream = StringIO()
new_commit._serialize(stream)
streamlen = stream.tell()
stream.seek(0)
istream = repo.odb.store(IStream(Commit.type, streamlen, stream))
new_commit.binsha = istream.binsha
Then set the commit as the HEAD commit
repo.head.set_commit(new_commit, logmsg="commit: %s" % message)
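For what it's worth, recent GitPython versions bundle the serialize-and-store dance above into Commit.create_from_tree. A minimal sketch, assuming a recent GitPython; the repository path, message and dates are placeholder values:
from git import Repo, Commit

repo = Repo('path/to/repo')
tree = repo.index.write_tree()

# create_from_tree serializes the commit, stores it in the object database
# and (with head=True) points HEAD at it in one call.
new_commit = Commit.create_from_tree(
    repo, tree, 'Test Message',
    parent_commits=[repo.head.commit],
    head=True,
    author_date='2013-01-01 12:00:00 +0200',
    commit_date='2013-01-01 12:00:00 +0200',
)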
A simpler solution, which avoids having to manually create commit objects (and also avoids having to manually create new indexes if you want to add files too):
from git import Repo, Actor # GitPython
import git.exc as GitExceptions # GitPython
import datetime # Python Core
import os # Python Core
import sys # Python Core
# Example values
repository_directory = "."
new_file_path = "MyNewFile"
action_date = str(datetime.date(2013, 1, 1))
message = "I am a commit log! See commit log run. Run! Commit log! Run!"
actor = Actor("Bob", "Bob@McTesterson.dev")
# Open repository
try:
repo = Repo(repository_directory)
except GitExceptions.InvalidGitRepositoryError:
print "Error: %s isn't a git repo" % respository_directory
sys.exit(5)
# Set some environment variables, the repo.index commit function
# pays attention to these.
os.environ["GIT_AUTHOR_DATE"] = action_date
os.environ["GIT_COMMITTER_DATE"] = action_date
# Add your new file/s
repo.index.add([new_file_path])
# Do the commit thing.
repo.index.commit(message, author=actor, committer=actor)
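If the script keeps running and makes further commits, you may want to drop those variables again afterwards; a small sketch using the same os module as above:
# Remove the override so later commits use the real current date again.
os.environ.pop("GIT_AUTHOR_DATE", None)
os.environ.pop("GIT_COMMITTER_DATE", None)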
You can set commit_date when making the commit.
r.index.commit(
"Initial Commit",
commit_date=datetime.date(2020, 7, 21).strftime('%Y-%m-%d %H:%M:%S')
)
I am also using a specified date/time for commits. In my case, after a lot of research I figured out that you have to give the datetime neither in the ISO format that GitPython suggests, nor as a timestamp. Its own "parsing" is misleading and makes you think it should be a standard Python format.
Nope. You have to give it in a format that git itself understands.
For example:
# commit_date is your specified date in datetime format
commit_date_as_text = commit_date.strftime('%Y-%m-%d %H:%M:%S %z')
git_repository.index.commit(commit_message, author=commit_author, commit_date=commit_date_as_text)
You can find a list of accepted formats in, for example, another topic on Stack Overflow.
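For reference, here is a self-contained sketch of the same idea; the repository path and author are placeholders, and the format string is the one from the snippet above with an explicit UTC offset:
from datetime import datetime, timezone, timedelta
from git import Repo, Actor

repo = Repo('path/to/repo')                      # assumed repository path
author = Actor('Jane Doe', 'jane@example.com')   # hypothetical author

# Timezone-aware datetime, formatted the way git itself understands it.
commit_date = datetime(2013, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
commit_date_as_text = commit_date.strftime('%Y-%m-%d %H:%M:%S %z')  # '2013-01-01 12:00:00 +0200'

repo.index.commit('Backdated commit',
                  author=author,
                  committer=author,
                  author_date=commit_date_as_text,
                  commit_date=commit_date_as_text)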
I've seen the topic of committing using PyGithub in many other questions here, but none of them helped me; I didn't understand the solutions. I guess I'm too much of a newbie.
I simply want to commit a file from my computer to a test github repository that I created. So far I'm testing with a Google Colab notebook.
This is my code, questions and problems are in the comments:
from github import Github
user = '***'
password = '***'
g = Github(user, password)
user = g.get_user()
# created a test repository
repo = user.create_repo('test')
# problem here, ask for an argument 'sha', what is this?
tree = repo.get_git_tree(???)
file = 'content/echo.py'
# since I didn't get the tree, this also goes wrong
repo.create_git_commit('test', tree, file)
The sha is a 40-character checksum hash that uniquely identifies the commit you want to fetch (a sha is used to identify the other Git objects as well).
From the docs:
Each object is uniquely identified by a binary SHA1 hash, being 20 bytes in size, or 40 bytes in hexadecimal notation.
Git only knows 4 distinct object types being Blobs, Trees, Commits and Tags.
The head commit sha is accessible via:
headcommit = repo.head.commit
headcommit_sha = headcommit.hexsha
Or the master branch commit is accessible via:
branch = repo.get_branch("master")
master_commit = branch.commit
You can see all your existing branches via:
for branch in user.repo.get_branches():
print(f'{branch.name}')
You can also look up the sha of whichever branch you'd like in the repository you want to fetch from.
get_git_tree takes the given sha identifier and returns a github.GitTree.GitTree; from the docs:
Git tree object creates the hierarchy between files in a Git repository
You'll find a lot more interesting information in the docs tutorial.
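As a minimal PyGithub sketch (assuming repo is the repository object from the question), you could fetch the tree that the latest commit of the default branch points to:
branch = repo.get_branch(repo.default_branch)   # e.g. "master" or "main"
tree = repo.get_git_tree(sha=branch.commit.sha)
for element in tree.tree:
    # each element describes one blob or subtree in the hierarchy
    print(element.path, element.sha)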
Code to create a repository and commit a new file to it on Google Colab:
!pip install pygithub
from github import Github
user = '****'
password = '****'
g = Github(user, password)
user = g.get_user()
repo_name = 'test'
# Check if the repo doesn't exist yet
if repo_name not in [r.name for r in user.get_repos()]:
# Create repository
user.create_repo(repo_name)
# Get repository
repo = user.get_repo(repo_name)
# File details
file_name = 'echo.py'
file_content = 'print("echo")'
# Create file
repo.create_file(file_name, 'commit', file_content)
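A possible follow-up under the same setup: updating that file later requires the blob's current sha, which PyGithub exposes through get_contents(). A rough sketch:
# Fetch the existing file to learn its sha, then push new content for it.
contents = repo.get_contents(file_name)
repo.update_file(contents.path, 'update commit', 'print("echo, again")', contents.sha)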
I want to use the GitHub API for Python to get every repository and check the last change to each repository.
import git
from git import Repo
from github import Github
repos = []
g = Github('Dextron12', 'password')
for repo in g.get_user().get_repos():
repos.append(str(repo))
#check for last commit to repository HERE
This gets all repositories on my account, but I also want to get the last change to each one of them, with a result like this:
13:46:45
I don't mind if it is 12-hour time either.
According to the documentation, the max info you can get is the SHA of the commit and the commit date:
https://pygithub.readthedocs.io/en/latest/examples/Commit.html#
with your example:
g = Github("usar", "pass")
for repo in g.get_user().get_repos():
master = repo.get_branch("master")
sha_com = master.commit
commit = repo.get_commit(sha=sha_com)
print(commit.commit.author.date)
from github import Github
from datetime import datetime
repos = {}
g = Github('username', 'password')
for repo in g.get_user().get_repos():
master = repo.get_branch('master')
sha_com = master.commit
sha_com = str(sha_com).split('Commit(sha="')
sha_com = sha_com[1].split('")')
sha_com = sha_com[0]
commit = repo.get_commit(sha_com)
#get repository name
repo = str(repo).split('Repository(full_name="Dextron12/')
repo = repo[1].split('")')
#CONVERT DATETIME OBJECT TO STRING
timeObj = commit.commit.author.date
timeStamp = timeObj.strftime("%d-%b-%Y (%H:%M:%S)")
#ADD REPOSITORY NAME AND TIMESTAMP TO repos DICTIONARY
repos[repo[0]] = timeStamp
print(repos)
I got the timestamp by using the method Damian Lattenero suggested. Upon testing his code I got an AssertionError; this was because master.commit was returning Commit(sha="...") and not the sha itself. So I stripped the brackets and the Commit wrapper from sha_com to be left with the sha all by itself, and then I didn't receive that error and it worked. I then use datetime to convert the timestamp to a string and save it to a dictionary.
@Dextron Just add .sha because it's a property; no need to do the string splitting to build the dictionary.
g = Github("user", "pass")
for repo in g.get_user().get_repos():
master = repo.get_branch("master")
sha_com = master.commit
commit = repo.get_commit(sha=sha_com.sha)
print(commit.commit.author.date)
I've been struggling with this problem for a bit. I am trying to create a program that builds a datetime object from the current date and time, builds a second such object from our file data, finds the difference between the two, and, if it is greater than 10 minutes, searches for a "handshake file" (a file we receive back when our file has successfully loaded). If we don't find that file, I want to kick out an error email.
My problem lies in being able to capture the result of my ls command in a meaningful way where I would be able to parse through it and see if the correct file exists. Here is my code:
"""
This module will check the handshake files sent by Pivot based on the following conventions:
- First handshake file (loaded to the CFL, *auditv2*): Check every half-hour
- Second handshake file (proofs are loaded and available, *handshake*): Check every 2 hours
"""
import smtplib
from email.mime.text import MIMEText
from datetime import datetime, timedelta
from csv import DictReader
from subprocess import *
from os import chdir
from glob import glob
def main():
audit_in = '/prod/bcs/lgnp/clientapp/csvbill/audit_process/lgnp.smr.csv0000.audit.qty'
with open(audit_in, 'rbU') as audit_qty:
my_audit_reader = DictReader(audit_qty, delimiter=';', restkey='ignored')
my_audit_reader.fieldnames = ("Property Code",
"Pivot ID",
"Inwork File",
"Billing Manager E-mail",
"Total Records",
"Number of E-Bills",
"Printed Records",
"File Date",
"Hour",
"Minute",
"Status")
# Get current time to reconcile against
now = datetime.now()
# Change internal directory to location of handshakes
chdir('/prod/bcs/lgnp/input')
for line in my_audit_reader:
piv_id = line['Pivot ID']
status = line['Status']
file_date = datetime(int(line['File Date'][:4]),
int(line['File Date'][4:6]),
int(line['File Date'][6:8]),
int(line['Hour']),
int(line['Minute']))
# print(file_date)
if status == 's':
diff = now - file_date
print diff
print piv_id
if 10 < (diff.seconds / 60) < 30:
proc = Popen('ls -lh *{0}*'.format(status),
shell=True) # figure out how to get output
print proc
def send_email(recipient_list):
msg = MIMEText('Insert message here')
msg['Subject'] = 'Alert!! Handshake files missing!'
msg['From'] = r'xxx@xxx.com'
msg['To'] = recipient_list
s = smtplib.SMTP(r'xxx.xxx.xxx')
s.sendmail(msg['From'], msg['To'], msg.as_string())
s.quit()
if __name__ == '__main__':
main()
Parsing ls output is not the best solution here. You could surely do it by parsing the subprocess.check_output result or in some other way, but let me give you some advice.
It is a good sign that something is going wrong if you find yourself parsing another program's output or logs to solve a standard problem; please consider other solutions, like the ones offered below.
If the only thing you want is to see the contents of the directory, use os.listdir like:
my_home_files = os.listdir(os.path.expanduser('~/my_dir')) # surely it's cross-platform
now you have a list of files in your my_home_files variable.
You can filter them any way you want, or use glob.glob to use metacharacters like this:
glob.glob("/home/me/handshake-*.txt") # will output everything matching the expression
# (say you have ids in your filenames).
After that you may want to check some stats of the file (like the date of last access, etc.); consider using os.stat:
os.stat(my_home_files[0]) # outputs stats of the first
# posix.stat_result(st_mode=33104, st_ino=140378115, st_dev=3306L, st_nlink=1, st_uid=23449, st_gid=59216, st_size=1442, st_atime=1421834474, st_mtime=1441831745, st_ctime=1441234474)
# see os.stat linked above to understand how to parse it
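Putting those pieces together for the handshake check in the question, a rough sketch might look like this (the directory and filename pattern are assumptions, not the real conventions):
import glob
import os
from datetime import datetime, timedelta

# Hypothetical pattern for the handshake files described in the question.
matches = glob.glob('/prod/bcs/lgnp/input/*handshake*')
if not matches:
    print('no handshake file found')
else:
    newest = max(matches, key=os.path.getmtime)
    age = datetime.now() - datetime.fromtimestamp(os.path.getmtime(newest))
    if age > timedelta(minutes=10):
        print('newest handshake file %s is %s old' % (newest, age))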
I'm trying to copy all my Livejournal posts to my new blog on blogger.com. I do so by using a slightly modified example that ships with the gdata python client. I have a json file with all of my posts imported from Livejournal. The issue is that blogger.com has a daily limit of 50 new blog entries per day, so you can imagine that the 1300+ posts I have will take a month to copy, since I can't programmatically enter a captcha after 50 imports.
I recently learned that there's also a batch operation mode somewhere in gdata, but I couldn't figure out how to use it. Googling didn't really help.
Any advice or help will be highly appreciated.
Thanks.
Update
Just in case, I use the following code
#!/usr/local/bin/python
import json
import requests
from gdata import service
import gdata
import atom
import getopt
import sys
from datetime import datetime as dt
from datetime import timedelta as td
from datetime import tzinfo as tz
import time
allEntries = json.load(open("todays_copy.json", "r"))
class TZ(tz):
def utcoffset(self, dt): return td(hours=-6)
class BloggerExample:
def __init__(self, email, password):
# Authenticate using ClientLogin.
self.service = service.GDataService(email, password)
self.service.source = "Blogger_Python_Sample-1.0"
self.service.service = "blogger"
self.service.server = "www.blogger.com"
self.service.ProgrammaticLogin()
# Get the blog ID for the first blog.
feed = self.service.Get("/feeds/default/blogs")
self_link = feed.entry[0].GetSelfLink()
if self_link:
self.blog_id = self_link.href.split("/")[-1]
def CreatePost(self, title, content, author_name, label, time):
LABEL_SCHEME = "http://www.blogger.com/atom/ns#"
# Create the entry to insert.
entry = gdata.GDataEntry()
entry.author.append(atom.Author(atom.Name(text=author_name)))
entry.title = atom.Title(title_type="xhtml", text=title)
entry.content = atom.Content(content_type="html", text=content)
entry.published = atom.Published(time)
entry.category.append(atom.Category(scheme=LABEL_SCHEME, term=label))
# Ask the service to insert the new entry.
return self.service.Post(entry,
"/feeds/" + self.blog_id + "/posts/default")
def run(self, data):
for year in allEntries:
for month in year["yearlydata"]:
for day in month["monthlydata"]:
for entry in day["daylydata"]:
# print year["year"], month["month"], day["day"], entry["title"].encode("utf-8")
atime = dt.strptime(entry["time"], "%I:%M %p")
hr = atime.hour
mn = atime.minute
ptime = dt(year["year"], int(month["month"]), int(day["day"]), hr, mn, 0, tzinfo=TZ()).isoformat("T")
public_post = self.CreatePost(entry["title"],
entry["content"],
"My name",
",".join(entry["tags"]),
ptime)
print "%s, %s - published, Waiting 30 minutes" % (ptime, entry["title"].encode("utf-8"))
time.sleep(30*60)
def main(data):
email = "my#email.com"
password = "MyPassW0rd"
sample = BloggerExample(email, password)
sample.run(data)
if __name__ == "__main__":
main(allEntries)
I would recommend using Google Blog converters instead ( https://code.google.com/archive/p/google-blog-converters-appengine/ )
To get started you will have to go through
https://github.com/google/gdata-python-client/blob/master/INSTALL.txt - Steps for setting up Google GData API
https://github.com/pra85/google-blog-converters-appengine/blob/master/README.txt - Steps for using Blog Convertors
Once you have everything set up, you would have to run the following command (it takes the LiveJournal username and password):
livejournal2blogger.sh -u <username> -p <password> [-s <server>]
Redirect its output into a .xml file. This file can now be imported into a Blogger blog directly by going to the Blogger Dashboard, your blog > Settings > Other > Blog tools > Import Blog.
Here, remember to check the Automatically publish all imported posts and pages option. I have tried this once before with a blog of over 400 posts, and Blogger successfully imported & published them without issue.
In case you have doubts that Blogger might have some issues (because the number of posts is quite high), or you have other Blogger blogs in your account, then just as a precaution create a separate Blogger (Google) account and try importing the posts there. After that you can transfer admin controls to your real Blogger account (to transfer, you will first have to send an author invite, then raise your real Blogger account to admin level and lastly remove the dummy account; the option for sending an invite is at Settings > Basic > Permissions > Blog Authors).
Also make sure that you are using Python 2.5, otherwise these scripts will not run. Before running livejournal2blogger.sh, change the following line (thanks to Michael Fleet for this fix: http://michael.f1337.us/2011/12/28/google-blog-converters-blogger2wordpress/ )
PYTHONPATH=${PROJ_DIR}/lib python ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*
to
PYTHONPATH=${PROJ_DIR}/lib python2.5 ${PROJ_DIR}/src/livejournal2blogger/lj2b.py $*
P.S. I am aware this is not a direct answer to your question, but its objective is the same as your question's (to import more than 50 posts in a day), which is why I shared it. I don't have much knowledge of Python or the GData API; I set up the environment & followed these steps to answer this question (and I was able to import posts from LiveJournal to Blogger with it).
# build feed
request_feed = gdata.base.GBaseItemFeed(atom_id=atom.Id(text='test batch'))
# format each object
entry1 = gdata.base.GBaseItemFromString('--XML for your new item goes here--')
entry1.title.text = 'first batch request item'
entry2 = gdata.base.GBaseItemFromString('--XML for your new item here--')
entry2.title.text = 'second batch request item'
# Add each blog item to the request feed
request_feed.AddInsert(entry1)
request_feed.AddInsert(entry2)
# Execute the batch processes through the request_feed (all items)
result_feed = gd_client.ExecuteBatch(request_feed)
I'm brand new at Python and I'm trying to write an extension to an app that imports GA information and parses it into MySQL. There is a shamefully sparse amount of information on the topic. The Google Docs only seem to have examples in JS and Java...
...I have gotten to the point where my user can authenticate into GA using SubAuth. That code is here:
import gdata.service
import gdata.analytics
from django import http
from django import shortcuts
from django.shortcuts import render_to_response
def authorize(request):
next = 'http://localhost:8000/authconfirm'
scope = 'https://www.google.com/analytics/feeds'
secure = False # set secure=True to request secure AuthSub tokens
session = False
auth_sub_url = gdata.service.GenerateAuthSubRequestUrl(next, scope, secure=secure, session=session)
return http.HttpResponseRedirect(auth_sub_url)
So, the next step is getting at the data. I have found this library (beware, the UI is offensive): http://gdata-python-client.googlecode.com/svn/trunk/pydocs/gdata.analytics.html
However, I have found it difficult to navigate. It seems like I should be using gdata.analytics.AnalyticsDataEntry.getDataEntry(), but I'm not sure what it is asking me to pass it.
I would love a push in the right direction. I feel I've exhausted google looking for a working example.
Thank you!!
EDIT: I have gotten farther, but my problem still isn't solved. The method below returns data (I believe)... the error I get is: "'str' object has no attribute '_BecomeChildElement'". I believe I am returning a feed? However, I don't know how to drill into it. Is there a way for me to inspect this object?
def auth_confirm(request):
gdata_service = gdata.service.GDataService('iSample_acctSample_v1.0')
feedUri='https://www.google.com/analytics/feeds/accounts/default?max-results=50'
# request feed
feed = gdata.analytics.AnalyticsDataFeed(feedUri)
print str(feed)
Maybe this post can help out. It seems like there are no Analytics-specific bindings yet, so you are working with the generic gdata.
I've been using GA for a little over a year now, and since about April 2009 I have used the Python bindings supplied in a package called python-googleanalytics by Clint Ecker et al. So far, it works quite well.
Here's where to get it: http://github.com/clintecker/python-googleanalytics.
Install it the usual way.
To use it: First, so that you don't have to manually pass in your login credentials each time you access the API, put them in a config file like so:
[Credentials]
google_account_email = youraccount@gmail.com
google_account_password = yourpassword
Name this file '.pythongoogleanalytics' and put it in your home directory.
And from an interactive prompt type:
from googleanalytics import Connection
import datetime
connection = Connection() # pass in id & pw as strings if not in the config file
account = connection.get_account(<*your GA profile ID goes here*>)
start_date = datetime.date(2009, 12, 1)
end_date = datetime.date(2009, 12, 13)
# account object does the work, specify what data you want w/
# 'metrics' & 'dimensions'; see 'USAGE.md' file for examples
account.get_data(start_date=start_date, end_date=end_date, metrics=['visits'])
The 'get_account' method returns an account object (bound above to the variable 'account'), and the 'get_data' call on it returns a Python list which contains your data.
You need 3 files within the app: client_secrets.json, analytics.dat and google_auth.py.
Create a module Query.py within the app:
class Query(object):
def __init__(self, startdate, enddate, filter, metrics):
self.startdate = startdate.strftime('%Y-%m-%d')
self.enddate = enddate.strftime('%Y-%m-%d')
self.filter = "ga:medium=" + filter
self.metrics = metrics
Example models.py (it has the following function):
import google_auth
service = google_auth.initialize_service()
def total_visit(self):
object = AnalyticsData.objects.get(utm_source=self.utm_source)
trial = Query(object.date.startdate, object.date.enddate, object.utm_source, "ga:sessions")
result = service.data().ga().get(ids = 'ga:<your-profile-id>', start_date = trial.startdate, end_date = trial.enddate, filters= trial.filter, metrics = trial.metrics).execute()
total_visit = result.get('rows')
<yr save command, ColumnName.object.create(data=total_visit) goes here>