Untracked dirs on commit with pygit2 - python

I'm working on a non-bare repository with pygit2:
index = repo.index
index.read()
# write in test/test.txt
index.add('test/test.txt')
treeid = index.write_tree()
repo.create_commit(
    'HEAD',
    author, committer,
    'test commit',
    treeid,
    [repo.head.oid]
)
This succeeds, but when I then run git status, I get this:
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# deleted: test/test.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# test/
After a git reset --hard, everything is fixed.
Is there a way to update the index correctly with pygit2?

You're only writing out a tree from your in-memory index and leaving the on-disc index unmodified, so after the commit it is at the same state as it was before you did anything.
You need to write out the index (index.write()) if you want your changes to be stored on disc.
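A minimal sketch of the corrected sequence, assuming `repo` is a `pygit2.Repository` with at least one commit and `author`/`committer` are `pygit2.Signature` objects as in the question. The helper name `stage_and_commit` is hypothetical, and `repo.head.target` is used for the parent id on the assumption that a recent pygit2 is installed. The only substantive change from the question's code is the `index.write()` call.

```python
def stage_and_commit(repo, path, author, committer, message):
    """Stage `path`, persist the index to disk, and commit on HEAD.

    `repo` is assumed to be a pygit2.Repository that already has a commit.
    """
    index = repo.index
    index.read()                    # reload .git/index from disk
    index.add(path)                 # stage the file in the in-memory index
    index.write()                   # persist the index so `git status` agrees
    tree_id = index.write_tree()    # build the tree object for the commit
    return repo.create_commit(
        'HEAD', author, committer, message, tree_id,
        [repo.head.target],         # parent: the commit HEAD currently points to
    )
```

Without the `index.write()` line the commit is still created, but the on-disk index keeps its old contents, which is what produces the "deleted" and "untracked" entries in `git status`.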

Related

repo.checkout() : Your local changes to the following files would be overwritten by checkout

I am using the following code to get the bug fixing commits in a list of GitHub repositories.
def get_commit_bug_fixing(self):
    EMPTY_TREE_SHA = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
    detected_sml = []
    selected_projects_by_version = pd.DataFrame()
    curr_path = ''
    bug_fixing_commits = self.selected_projects_commits[self.selected_projects_commits['is_bug_fixing'] == True]
    for _, row in bug_fixing_commits.iterrows():
        if row['path'] != curr_path:
            curr_path = row['path']
            g_repo = git.Git(curr_path)
            g_repo.init()
        if row['parent_sha'] == EMPTY_TREE_SHA:
            continue
        g_repo.checkout(row['parent_sha'])
        if row['old_object']:
            detected_sml += util.compute_file_metrics(row['path'], row['old_object'], row['parent_sha'])
        else:
            detected_sml += util.compute_file_metrics(row['path'], row['object'], row['parent_sha'])
        sml_dict = dict(Counter([sml[4] for sml in detected_sml]))
        pre_dict = {}
        pre_dict['sml_name'] = list(sml_dict.keys())
        pre_dict['sml_occs'] = list(sml_dict.values())
        df = pd.DataFrame(pre_dict)
        df['repoName'] = row['repoName']
        df['repoOrg'] = row['repoOrg']
        df['tag_commit_sha'] = row['tag_commit_sha']
        df['tag'] = row['tag']
        selected_projects_by_version = pd.concat([selected_projects_by_version, df])
    selected_projects_by_version.to_csv('bug_fixing_commits.csv', index=False)
However, after getting to the g_repo.checkout(row['parent_sha']) line I get the following error:
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
cmdline: git checkout 6ada1a2e125f9b40bc38f9d6b69c60e4fe3b7f4e
stderr: 'error: Your local changes to the following files would be overwritten by checkout:
...list of files path...
Please commit your changes or stash them before you switch branches.
Aborting'
How can I resolve this issue? I am not sure what I am doing wrong.
Looking at the error message, it seems you are trying to check out another commit while your local repository has uncommitted modifications.
The problem is not in the Python code itself but in the sequence of git operations: a step is missing.
Before running the line
g_repo.checkout(row['parent_sha'])
verify that the repository is clean by using
git status
If the repository is clean, you can do the checkout. Otherwise, commit the modifications (to save them), discard them (to delete them), or stash them (to save them for later) before doing any checkout.
N.B.: Even when working manually, git refuses to check out another branch or commit if your uncommitted changes would be overwritten.
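The check described above could be sketched as follows. This assumes `repo` is a GitPython `git.Repo` opened on an existing clone (the question uses `git.Git` instead), and the helper name `checkout_clean` and its `discard` flag are hypothetical.

```python
def checkout_clean(repo, sha, discard=False):
    """Check out `sha`, first dealing with any local modifications.

    `repo` is assumed to be a git.Repo (GitPython) for an existing clone.
    """
    if repo.is_dirty(untracked_files=True):
        if discard:
            repo.git.reset('--hard')   # drop changes to tracked files
            repo.git.clean('-fd')      # remove untracked files and directories
        else:
            # keep the changes for later, including untracked files
            repo.git.stash('push', '--include-untracked')
    repo.git.checkout(sha)
```

For an analysis script that only reads old revisions, `discard=True` is probably the better fit, since stashing on every iteration would pile up stash entries; that is a design judgment, not something the error message requires.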

Creating a repository and committing a file with PyGithub

I've seen the topic of committing using PyGithub in many other questions here, but none of them helped me. I didn't understand the solutions; I guess I'm too much of a newbie.
I simply want to commit a file from my computer to a test GitHub repository that I created. So far I'm testing with a Google Colab notebook.
This is my code; questions and problems are in the comments:
from github import Github
user = '***'
password = '***'
g = Github(user, password)
user = g.get_user()
# created a test repository
repo = user.create_repo('test')
# problem here, ask for an argument 'sha', what is this?
tree = repo.get_git_tree(???)
file = 'content/echo.py'
# since I didn't got the tree, this also goes wrong
repo.create_git_commit('test', tree, file)
The sha is a 40-character hexadecimal hash that uniquely identifies the commit you want to fetch (the same kind of hash identifies every other Git object as well).
From the docs:
Each object is uniquely identified by a binary SHA1 hash, being 20 bytes in size, or 40 bytes in hexadecimal notation.
Git only knows 4 distinct object types being Blobs, Trees, Commits and Tags.
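To make the idea of an object id concrete, here is a small sketch of how git derives the SHA-1 for an object: it hashes a header of the form `<type> <size>\0` followed by the raw content. The function name `git_object_id` is hypothetical; only the standard library is used.

```python
import hashlib


def git_object_id(obj_type: str, data: bytes) -> str:
    """Return the 40-character hex SHA-1 git assigns to an object.

    The hashed payload is "<type> <size in bytes>", a NUL byte, then the content.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


empty_tree_id = git_object_id("tree", b"")
empty_blob_id = git_object_id("blob", b"")
```

For a tree with no entries this yields 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git's well-known empty tree id.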
With GitPython on a local clone (not PyGithub), the head commit sha is accessible via:
headcommit = repo.head.commit
headcommit_sha = headcommit.hexsha
With PyGithub, the master branch commit is accessible via:
branch = repo.get_branch("master")
master_commit = branch.commit
master_sha = master_commit.sha
You can see all your existing branches via:
for branch in repo.get_branches():
    print(f'{branch.name}')
You can also view the sha of the branch you'd like in the repository you want to fetch.
The get_git_tree method takes the given sha identifier and returns a github.GitTree.GitTree. From the docs:
Git tree object creates the hierarchy between files in a Git repository
You'll find a lot more interesting information in the docs tutorial.
Code for creating a repository and committing a new file to it on Google Colab:
!pip install pygithub
from github import Github
user = '****'
# GitHub's API no longer accepts account passwords; use a personal access token here
password = '****'
g = Github(user, password)
user = g.get_user()
repo_name = 'test'
# Create the repository if it does not exist yet
if repo_name not in [r.name for r in user.get_repos()]:
    user.create_repo(repo_name)
# Get repository
repo = user.get_repo(repo_name)
# File details
file_name = 'echo.py'
file_content = 'print("echo")'
# Create file
repo.create_file(file_name, 'commit', file_content)

GitPython get all commits in range between start sha1 and end sha1

I am using the GitPython library and was wondering how to get all commits on a branch in range of two commit sha-1's. I have the start one and end one. Is there any way to get list of them?
I have instantiated the repo object and was wondering if there was a way to query it and obtain a list of commits in the range of two shas?
Would be looking to do something similar to this command but return them as a list:
git log e0d8a4c3fec7ef2c352342c2ffada21fa07c1dc..63af686e626e0a5cbb0508367983765154e188ce --pretty=format:%h,%an,%s > commits.csv
Seems like there is a Repo.iter_commits() method, but I can't see how to specify a range.
@shlomi33 here was my solution:
This is the method that I used to build a list of GitPython Commit objects. Within the class I have already instantiated the Repo object which this method relies on.
Edit: I have also added the init method that is used to set up the Repo object instance.
def __init__(self, repo_url, project_name, branch_name) -> None:
    """
    Clone the required repo in the repo folder and instantiate the Repo instance
    :param repo_url: str of the git repository url
    :param project_name: str of the project name
    :param branch_name: str of the branch name
    """
    self._repo_url = repo_url
    self._project = project_name
    self._branch = branch_name
    self._project_dir = Path(self._repo_dir, self._project)
    if self._project_dir.exists():
        self._logger.info('Deleting existing repo: ' + str(self._project_dir))
        shutil.rmtree(self._project_dir)
    self._logger.info('Creating directory for repo: ' + str(self._project_dir))
    os.makedirs(self._project_dir)
    try:
        self.repo = GitRepo.clone_from(self._repo_url, self._project_dir, branch=self._branch,
                                       progress=self.__progress)
    except GitCommandError:
        self._logger.error('Failed to clone repo. Make sure you have git ssh access set up.')
        raise
    self._logger.info('Repository downloaded to: ' + str(self._repo_dir))
def set_commits(self, start_rev: str, end_rev: str):
    """
    Sets a list of Commits for the range of the start rev and end rev
    :param start_rev: Start commit revision SHA1
    :param end_rev: End commit revision
    """
    self._start_rev = start_rev
    self._end_rev = end_rev
    commits = self.repo.iter_commits(start_rev + '..' + end_rev)
    self._commits = list(map(Commit, list(commits)))
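For readers who only need the data from the `git log` command in the question, a shorter sketch is possible. It assumes `repo` is a GitPython `git.Repo`; the helper name `commit_rows` is hypothetical, and `%h` is approximated by the first seven characters of the hash, which is an assumption since git may choose a longer abbreviation.

```python
def commit_rows(repo, start_rev, end_rev):
    """Return (short hash, author name, subject) for commits in start..end.

    Mirrors `git log start..end --pretty=format:%h,%an,%s`: commits reachable
    from `end_rev` but not from `start_rev`, so `start_rev` itself is excluded.
    """
    rev_range = f"{start_rev}..{end_rev}"
    return [
        (commit.hexsha[:7], commit.author.name, commit.summary)
        for commit in repo.iter_commits(rev_range)
    ]
```

The rows can then be written out with the standard `csv` module if a file like `commits.csv` is wanted.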

How do I push to remote with pygit2?

I want to clone a repository, change a file, and push the changed file back to the origin branch.
I can clone the repo with
repo = pygit2.clone_repository(repo_url, local_dir, checkout_branch="test_it")
but what do I need to do now to push the changes to the remote? I want to commit the changes for one specific file only, even if more files have changed.
Hope someone can help me. TIA
First stage only file_path:
# stage 'file_path'
index = repository.index
index.add(file_path)
index.write()
Then do a commit:
# commit data
reference = 'HEAD'  # updates the branch HEAD points to; 'refs/HEAD' is not a branch reference
message = '...some commit message...'
tree = index.write_tree()
author = pygit2.Signature(user_name, user_mail)
committer = pygit2.Signature(user_name, user_mail)
oid = repository.create_commit(reference, author, committer, message, tree, [repository.head.target])
Finally, push the repo as described in "Unable to ssh push in pygit2".
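The push step itself might look like the sketch below. It assumes `repo` is a `pygit2.Repository`, that the remote is named `origin`, and that `callbacks` is a `pygit2.RemoteCallbacks` instance carrying whatever credentials the remote needs; the helper name `push_branch` is hypothetical.

```python
def push_branch(repo, branch, callbacks=None, remote_name='origin'):
    """Push the local branch `branch` to the remote called `remote_name`.

    `repo` is assumed to be a pygit2.Repository; `callbacks` is assumed to be
    a pygit2.RemoteCallbacks providing credentials, or None if none are needed.
    """
    remote = repo.remotes[remote_name]
    refspec = f'refs/heads/{branch}'      # push the branch under the same name
    remote.push([refspec], callbacks=callbacks)
```

For the clone in the question this would be called as `push_branch(repo, 'test_it', callbacks)`.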

How do you checkout a branch with pygit2?

I want to use pygit2 to check out a branch by name.
For example, if I have two branches: master and new and HEAD is at master, I would expect to be able to do:
import pygit2
repository = pygit2.Repository('.git')
repository.checkout('new')
or even
import pygit2
repository = pygit2.Repository('.git')
repository.lookup_branch('new').checkout()
but neither works and the pygit2 docs don't mention how to checkout a branch.
It seems you can do:
import pygit2
repo = pygit2.Repository('.git')
branch = repo.lookup_branch('new')
ref = repo.lookup_reference(branch.name)
repo.checkout(ref)
I had a lot of trouble with this, and this is one of the only relevant StackOverflow posts on the topic, so I thought I'd leave a full working example of how to clone a repo from GitHub and check out the specified branch.
def clone_repo(clone_url, clone_path, branch, auth_token):
    # Use pygit2 to clone the repo to disk
    # if using github app pem key token, use x-access-token like below
    # if you were using a personal access token, use auth_method = 'x-oauth-basic' AND reverse the auth_method and token parameters
    auth_method = 'x-access-token'
    callbacks = pygit2.RemoteCallbacks(pygit2.UserPass(auth_method, auth_token))
    pygit2_repo = pygit2.clone_repository(clone_url, clone_path, callbacks=callbacks)
    pygit2_branch = pygit2_repo.branches['origin/' + branch]
    pygit2_ref = pygit2_repo.lookup_reference(pygit2_branch.name)
    pygit2_repo.checkout(pygit2_ref)
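Checking out a remote-tracking reference such as `origin/<branch>` typically leaves HEAD detached rather than on a local branch. If a local branch is wanted, one option is to create it from the remote branch first. This is a sketch under the assumption that `repo` is a `pygit2.Repository` and that no local branch of that name exists yet; the helper name `checkout_remote_branch` is hypothetical.

```python
def checkout_remote_branch(repo, branch, remote_name='origin'):
    """Create a local branch tracking `<remote_name>/<branch>` and check it out.

    `repo` is assumed to be a pygit2.Repository without a local `branch` yet.
    """
    remote_branch = repo.branches.remote[f'{remote_name}/{branch}']
    commit = repo[remote_branch.target]              # commit the remote branch points to
    local_branch = repo.branches.local.create(branch, commit)
    local_branch.upstream = remote_branch            # record the tracking relationship
    repo.checkout(local_branch)                      # update the working tree and HEAD
    return local_branch
```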
