I want to clone a repository, change a file, and push the changed file back to the origin branch.
I can clone the repo with
repo = pygit2.clone_repository(repo_url, local_dir, checkout_branch="test_it")
But what do I need to do next to push the changes to the remote? I want to commit the changes to one specific file only, even if other files have changed.
Hope someone can help me. TIA
First, stage only file_path:
# stage 'file_path'
index = repository.index
index.add(file_path)
index.write()
Then do a commit:
# commit data
reference = 'HEAD'  # update the branch that HEAD currently points to
message = '...some commit message...'
tree = index.write_tree()
author = pygit2.Signature(user_name, user_mail)
committer = pygit2.Signature(user_name, user_mail)
oid = repository.create_commit(reference, author, committer, message, tree, [repository.head.target])
Finally, push the repo as described in "Unable to ssh push in pygit2".
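For the push step itself, a minimal sketch, assuming a remote named origin and the test_it branch from the question. The helper name push_branch is hypothetical; it only relies on repo.remotes[...] and Remote.push(specs, callbacks=...) from pygit2:

```python
def push_branch(repo, branch_name, remote_name="origin", callbacks=None):
    """Push one local branch to the same-named branch on the remote.

    `repo` is expected to be a pygit2.Repository; `callbacks` is an optional
    pygit2.RemoteCallbacks carrying credentials.
    """
    # A refspec has the form "src:dst"; here both sides name the same branch.
    refspec = f"refs/heads/{branch_name}:refs/heads/{branch_name}"
    remote = repo.remotes[remote_name]
    remote.push([refspec], callbacks=callbacks)
    return refspec
```

With a real repository this might be called as push_branch(repo, "test_it", callbacks=pygit2.RemoteCallbacks(credentials=pygit2.UserPass(user_name, token))), where token is a placeholder for whatever credential the server accepts.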
Related
I am using the following code to get the bug fixing commits in a list of GitHub repositories.
def get_commit_bug_fixing(self):
    EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    detected_sml = []
    selected_projects_by_version = pd.DataFrame()
    curr_path = ''
    bug_fixing_commits = self.selected_projects_commits[self.selected_projects_commits['is_bug_fixing'] == True]
    for _, row in bug_fixing_commits.iterrows():
        if row['path'] != curr_path:
            curr_path = row['path']
            g_repo = git.Git(curr_path)
            g_repo.init()
        if row['parent_sha'] == EMPTY_TREE_SHA:
            continue
        g_repo.checkout(row['parent_sha'])
        if row['old_object']:
            detected_sml += util.compute_file_metrics(row['path'], row['old_object'], row['parent_sha'])
        else:
            detected_sml += util.compute_file_metrics(row['path'], row['object'], row['parent_sha'])
        sml_dict = dict(Counter([sml[4] for sml in detected_sml]))
        pre_dict = {}
        pre_dict['sml_name'] = list(sml_dict.keys())
        pre_dict['sml_occs'] = list(sml_dict.values())
        df = pd.DataFrame(pre_dict)
        df['repoName'] = row['repoName']
        df['repoOrg'] = row['repoOrg']
        df['tag_commit_sha'] = row['tag_commit_sha']
        df['tag'] = row['tag']
        selected_projects_by_version = pd.concat([selected_projects_by_version, df])
    selected_projects_by_version.to_csv('bug_fixing_commits.csv', index=False)
However, after getting to the g_repo.checkout(row['parent_sha']) line I get the following error:
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
cmdline: git checkout 6ada1a2e125f9b40bc38f9d6b69c60e4fe3b7f4e
stderr: 'error: Your local changes to the following files would be overwritten by checkout:
...list of files path...
Please commit your changes or stash them before you switch branches.
Aborting'
How can I resolve this issue? I am not sure what I am doing wrong.
Looking at the error message, it seems you are trying to check out another commit while your local repository has uncommitted modifications.
The problem is not the code itself but its logic: your git operations are missing a step.
Before doing the line
g_repo.checkout(row['parent_sha'])
check whether your repository is clean with
git status
If the repository is clean, you can do the checkout. Otherwise, commit (to save), discard (to delete), or stash (to save for later) the modifications before doing any checkout.
N.B.: Even when working manually, git generally refuses to check out another branch or commit while you have uncommitted changes that the checkout would overwrite.
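The check-then-checkout step can be sketched with GitPython as follows. This assumes the repository is opened as git.Repo(curr_path) rather than git.Git(curr_path); the helper name checkout_clean and its discard flag are made up for illustration:

```python
def checkout_clean(repo, sha, discard=False):
    """Check out `sha` after dealing with local modifications.

    `repo` is expected to be a git.Repo (GitPython). Local changes are
    stashed by default, or thrown away when `discard` is True.
    """
    if repo.is_dirty(untracked_files=True):
        if discard:
            # Same as `git reset --hard`: drops changes to tracked files
            # (untracked files are left alone).
            repo.git.reset("--hard")
        else:
            # Same as `git stash push --include-untracked`.
            repo.git.stash("push", "--include-untracked")
    repo.git.checkout(sha)
```

In the loop from the question, the call g_repo.checkout(row['parent_sha']) would then become checkout_clean(repo, row['parent_sha']). Stashing is the default here because it keeps the modifications recoverable.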
I've seen the topic of committing with PyGithub in many other questions here, but none of them helped me. I didn't understand the solutions; I guess I'm too much of a newbie.
I simply want to commit a file from my computer to a test GitHub repository that I created. So far I'm testing with a Google Colab notebook.
This is my code, questions and problems are in the comments:
from github import Github
user = '***'
password = '***'
g = Github(user, password)
user = g.get_user()
# created a test repository
repo = user.create_repo('test')
# problem here: it asks for an argument 'sha', what is this?
tree = repo.get_git_tree(???)
file = 'content/echo.py'
# since I didn't get the tree, this also goes wrong
repo.create_git_commit('test', tree, file)
The sha is a 40-character hexadecimal checksum that uniquely identifies the commit you want to fetch (a sha identifies every other kind of Git object as well).
From the docs:
Each object is uniquely identified by a binary SHA1 hash, being 20 bytes in size, or 40 bytes in hexadecimal notation.
Git only knows 4 distinct object types being Blobs, Trees, Commits and Tags.
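To make the quoted description concrete, here is a small sketch of how such an id is derived for a SHA-1 repository: Git hashes a header of the form "type size" plus a NUL byte, followed by the raw content. The function name git_object_sha is made up for illustration, and only the standard library is used:

```python
import hashlib

def git_object_sha(obj_type, content):
    """Return the 40-character hex id Git assigns to an object.

    Git hashes the header "<type> <size>" plus a NUL byte, followed by
    the raw content bytes.
    """
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The empty tree has a well-known id.
empty_tree = git_object_sha("tree", b"")
print(empty_tree)       # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(len(empty_tree))  # 40
```

The 40 hexadecimal characters correspond to the 20 raw bytes mentioned in the quote.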
With GitPython and a local Repo object, the head commit sha is accessible via:
headcommit = repo.head.commit
headcommit_sha = headcommit.hexsha
With PyGithub, the commit at the tip of the master branch, and its sha, are accessible via:
branch = repo.get_branch("master")
master_commit = branch.commit
master_sha = master_commit.sha
You can see all your existing branches via:
for branch in repo.get_branches():
    print(f'{branch.name}')
You can also look up the sha of the branch you want in the repository you are fetching from.
get_git_tree takes the given sha and returns a github.GitTree.GitTree. From the docs:
Git tree object creates the hierarchy between files in a Git repository
You'll find a lot more interesting information in the docs tutorial.
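Putting the pieces together for PyGithub, a rough sketch of fetching a tree from a branch's head commit. It assumes the repository already contains at least one commit (a freshly created empty repository has no branch to look up), and the helper name list_branch_tree is hypothetical:

```python
def list_branch_tree(repo, branch_name=None):
    """Return the paths at the top level of a branch's tree.

    `repo` is expected to be a github.Repository.Repository (PyGithub).
    """
    # Fall back to the repository's default branch when none is given.
    branch = repo.get_branch(branch_name or repo.default_branch)
    # The sha that get_git_tree asks for: here, the branch's head commit.
    sha = branch.commit.sha
    tree = repo.get_git_tree(sha)
    return [element.path for element in tree.tree]
```

Calling list_branch_tree(repo) on a repository holding only echo.py would be expected to return a list containing that one path.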
Code to create a repository and commit a new file to it on Google Colab:
!pip install pygithub
from github import Github
user = '****'
password = '****'
g = Github(user, password)
user = g.get_user()
repo_name = 'test'
# Check whether the repo already exists
if repo_name not in [r.name for r in user.get_repos()]:
    # Create repository
    user.create_repo(repo_name)
# Get repository
repo = user.get_repo(repo_name)
# File details
file_name = 'echo.py'
file_content = 'print("echo")'
# Create file
repo.create_file(file_name, 'commit', file_content)
When I run git log, I get this line for each commit: "Author: name < email >". How do I get the exact same format for a commit in Python for a local repo? When I run the code below, I get just the author name.
from git import Repo
repo_path = 'mockito'
repo = Repo(repo_path)
commits_list = list(repo.iter_commits())
for i in range(5):
    commit = commits_list[i]
    print(commit.hexsha)
    print(commit.author)
    print(commit.committer)
Printing commit.author shows only the name, so it may look as if gitpython's Commit objects have no attribute for the author email.
You can also use gitpython to call git commands directly. You can use the git show command, passing in the commit HASH (from commit.hexsha) and then a --format option that gives you just the author's name and email (you can of course pass other format options you need).
Using plain git:
$ git show -s --format='%an <%ae>' 4e13ccfbde2872c23aec4f105f334c3ae0cb4bf8
me <me@somewhere.com>
Using gitpython to use git directly:
from git import Repo
repo_path = 'myrepo'
repo = Repo(repo_path)
commits_list = list(repo.iter_commits())
for i in range(5):
    commit = commits_list[i]
    author = repo.git.show("-s", "--format=Author: %an <%ae>", commit.hexsha)
    print(author)
According to the gitpython API documentation, a commit object—an instance of class git.objects.commit.Commit—has author and committer attributes that are instances of class git.util.Actor, which in turn has fields conf_email, conf_name, email, and name.
Hence (untested):
print(commit.author.name, commit.author.email)
will likely get you the two fields you want, though you may wish to format them in some way.
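One possible way to format the two fields like the git log line, as a sketch. The helper name format_author is made up; it only assumes the name and email fields of the author object described above:

```python
def format_author(commit):
    """Format a commit's author the way `git log` prints it."""
    return f"Author: {commit.author.name} <{commit.author.email}>"
```

Inside the loop from the question, print(format_author(commit)) would replace print(commit.author).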
Edit: I'll defer to Gino Mempin's answer since I don't have gitpython installed to test this.
I want to use pygit2 to checkout a branch-name.
For example, if I have two branches: master and new and HEAD is at master, I would expect to be able to do:
import pygit2
repository = pygit2.Repository('.git')
repository.checkout('new')
or even
import pygit2
repository = pygit2.Repository('.git')
repository.lookup_branch('new').checkout()
but neither works and the pygit2 docs don't mention how to checkout a branch.
It seems you can do:
import pygit2
repo = pygit2.Repository('.git')
branch = repo.lookup_branch('new')
ref = repo.lookup_reference(branch.name)
repo.checkout(ref)
I had a lot of trouble with this, and this is one of the few relevant Stack Overflow posts on the topic, so I thought I'd leave a full working example of how to clone a repo from GitHub and check out the specified branch.
import pygit2

def clone_repo(clone_url, clone_path, branch, auth_token):
    # Use pygit2 to clone the repo to disk.
    # If using a GitHub App PEM key token, use x-access-token as below.
    # If using a personal access token, use auth_method = 'x-oauth-basic' AND swap the order of the auth_method and token arguments.
    auth_method = 'x-access-token'
    callbacks = pygit2.RemoteCallbacks(pygit2.UserPass(auth_method, auth_token))
    pygit2_repo = pygit2.clone_repository(clone_url, clone_path, callbacks=callbacks)
    pygit2_branch = pygit2_repo.branches['origin/' + branch]
    pygit2_ref = pygit2_repo.lookup_reference(pygit2_branch.name)
    pygit2_repo.checkout(pygit2_ref)
I'm working on a non-bare repository with pygit2
index = repo.index
index.read()
# write in test/test.txt
index.add('test/test.txt')
treeid = index.write_tree()
repo.create_commit(
    'HEAD',
    author, committer,
    'test commit',
    treeid,
    [repo.head.oid]
)
This is successful, but when I run git status, I get this:
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: test/test.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# test/
And after a git reset --hard, everything is fixed.
Is there a way to update the index correctly with pygit2?
You're only writing out a tree from your in-memory index and leaving the on-disc index unmodified, so after the commit it is at the same state as it was before you did anything.
You need to write out the index (index.write()) if you want your changes to be stored on disc.
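A sketch of the corrected sequence, with index.write() added before the commit. The helper name commit_path is hypothetical, and repo.head.target is used for the parent id on the assumption of a reasonably recent pygit2:

```python
def commit_path(repo, path, author, committer, message):
    """Stage `path`, persist the index, and commit on the current branch.

    `repo` is expected to be a non-bare pygit2.Repository; `author` and
    `committer` are pygit2.Signature objects.
    """
    index = repo.index
    index.read()
    index.add(path)
    # Persist the staged entry so the on-disk index matches the new commit.
    index.write()
    tree_id = index.write_tree()
    parents = [repo.head.target]
    return repo.create_commit("HEAD", author, committer, message, tree_id, parents)
```

After commit_path(repo, 'test/test.txt', author, committer, 'test commit'), git status should no longer report the file as deleted and untracked, since the on-disk index now contains it.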