I'm trying to do the equivalent of git fetch -a using the dulwich library within python.
Using the docs at https://www.dulwich.io/docs/tutorial/remote.html I created the following script:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os
home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
remote_refs = LocalGitClient().fetch(remote, local)
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
with an existing git repository at ~/temp/remote and a newly initialised repo at ~/temp/local
remote_refs shows everything I would expect, but local_refs is an empty dictionary and git branch -a on the local repo returns nothing.
Am I missing something obvious?
This is on dulwich 0.12.0 and Python 3.5
EDIT #1
Following a discussion on the python-uk irc channel, I updated my script to include the use of determine_wants_all:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os
home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
wants = local.object_store.determine_wants_all
remote_refs = LocalGitClient().fetch(remote, local, wants)
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
but this had no effect :-(
EDIT #2
Again, following discussion on the python-uk irc channel, I tried running dulwich fetch from within the local repo. It gave the same result as my script i.e. the remote refs were printed to the console correctly, but git branch -a showed nothing.
EDIT - Solved
A simple loop to update the local refs did the trick:
from dulwich.client import LocalGitClient
from dulwich.repo import Repo
import os
home = os.path.expanduser('~')
local_folder = os.path.join(home, 'temp/local')
local = Repo(local_folder)
remote = os.path.join(home, 'temp/remote')
remote_refs = LocalGitClient().fetch(remote, local)
for key, value in remote_refs.items():
    local.refs[key] = value
local_refs = LocalGitClient().get_refs(local_folder)
print(remote_refs)
print(local_refs)
LocalGitClient.fetch() does not update refs; it only fetches objects and returns the remote refs, which you can then use to update the refs of the target repository.
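For comparison, git fetch does not copy remote branches into refs/heads; by default it stores them as remote-tracking refs under refs/remotes/<remote>/. Below is a minimal sketch of that mapping, assuming the refs returned by fetch() behave like a dict with byte-string names. The helper name to_tracking_refs, the remote name origin and the SHA values are illustrative and not part of dulwich:

```python
def to_tracking_refs(remote_refs, remote_name=b"origin"):
    """Map refs/heads/* to refs/remotes/<remote_name>/* and drop everything else."""
    prefix = b"refs/heads/"
    tracking = {}
    for name, sha in remote_refs.items():
        if name.startswith(prefix):
            branch = name[len(prefix):]
            tracking[b"refs/remotes/" + remote_name + b"/" + branch] = sha
    return tracking


# Example input with made-up SHA values
example_refs = {
    b"HEAD": b"1" * 40,
    b"refs/heads/master": b"1" * 40,
    b"refs/tags/v1.0": b"2" * 40,
}
tracking = to_tracking_refs(example_refs)
print(tracking)
```

The resulting dict could be written to the local repository with the same kind of loop as in the solved script (local.refs[key] = value), which keeps local branches untouched.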
I run polyglot sentiment detection. When I upload it to the server I cannot run the downloader.download("TASK:sentiment2") command, so I downloaded the sentiment2 folder and saved it in the same folder as the python file.
I tried to set downloader.download_dir = os.path.join(os.getcwd(),'polyglot_data'), pointing at the location of the sentiment2 folder as described in the polyglot documentation, but it doesn't work.
How do I override downloader directory so it will access the sentiment2 local folder when it executes the sentiment analysis?
Please see the full code below. This code works on my computer and localhost but returns zero when I run it on the server.
import os
from polyglot.downloader import downloader
#downloader.download("TASK:sentiment2")
from polyglot.text import Text
downloader.download_dir = os.path.join(os.getcwd(),'polyglot_data')
def get_text_sentiment(text):
    result = 0
    ttext = Text(text)
    for w in ttext.words:
        try:
            result += w.polarity
        except ValueError:
            # Skip words for which no polarity is available
            pass
    if result:
        return result / len(ttext.words)
    else:
        return 0
text = "he is feeling proud with ❤"
print(get_text_sentiment(text))
my localhost returns - 0.1666
the server returns - 0.0
After looking at the polyglot init code in its git repository, it turned out to be an environment issue. The data path is read with data_path = os.environ.get('POLYGLOT_DATA_PATH', data_path)
So I removed the downloader.download_dir = os.path.join(os.getcwd(),'polyglot_data') line
and simply defined POLYGLOT_DATA_PATH in the server's environment.
It worked.
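If changing the server environment is not an option, the same variable can probably be set from Python. This is a minimal sketch, assuming polyglot reads POLYGLOT_DATA_PATH when it is imported, so the variable has to be set before any polyglot import. The helper name configure_polyglot_data_path is hypothetical, and whether the path should be the polyglot_data folder itself or its parent directory is not stated above, so check it against your installation:

```python
import os


def configure_polyglot_data_path(base_dir):
    """Set POLYGLOT_DATA_PATH unless the environment already defines it."""
    return os.environ.setdefault("POLYGLOT_DATA_PATH", os.path.abspath(base_dir))


# Call this before `from polyglot.text import Text` (assumption: the
# variable is only read once, at import time).
data_path = configure_polyglot_data_path(os.getcwd())
print(data_path)
```

Using setdefault keeps a value that was already defined in the environment, so a server-side setting still wins over the value in the script.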
Context
We're trying to set up a GitLab runner job that, on a certain tag, modifies a version header file and adds a release branch/tag to this changeset.
The GitLab runner server is on my machine, launched as a service by my user (that is properly registered to our GitLab server).
The GitLab runner job basically launches a Python script that uses gitpython to do the job. There are just a few changes in the runner yml file (I added a before_script part to get upload permission, taken from https://stackoverflow.com/a/55344804/11159476). Here is the full .gitlab-ci.yml file:
variables:
  GIT_SUBMODULE_STRATEGY: recursive
stages: [ build, publish, release ]
release_tag:
  stage: build
  before_script:
    - git config --global user.name ${GITLAB_USER_NAME}
    - git config --global user.email ${GITLAB_USER_EMAIL}
  script:
    - python .\scripts\release_gitlab_runner.py
  only:
    # Trigger on specific regex...
    - /^Src_V[0-9]+\.[0-9]+\.[0-9]+$/
  except:
    # .. only for tags then except branches, see doc (https://docs.gitlab.com/ee/ci/yaml/#regular-expressions): "Only the tag or branch name can be matched by a regular expression."
    - branches
I also added a trick to the URL used for pushing from Python: push with user:personal_access_token@repo_URL instead of the default runner URL (taken from the same answer as above). The token was generated from the company GitLab => user "Settings" => "Access Tokens" => "Add a personal access token", with all rights and never expiring. Below is not the actual scripts\release_gitlab_runner.py script but a simplified one that keeps the git flow as standard as possible for what we want (fetch all, create a local branch with a random name so that it does not already exist, modify a file, stage, commit and finally push):
# -*-coding:utf-8 -*
import uuid
import git
import sys
import os
import re
# Since we are in <git root path>/scripts folder, git root path is this file's path parent path
GIT_ROOT_PATH = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
try:
    # Get user login and URL from GITLAB_USER_LOGIN and CI_REPOSITORY_URL gitlab environment variables
    gitlabUserLogin = os.environ["GITLAB_USER_LOGIN"]
    gitlabFullURL = os.environ["CI_REPOSITORY_URL"]
    # Push at "https://${GITLAB_USER_NAME}:${PERSONAL_ACCESS_TOKEN}@gitlab.companyname.net/demo/demo.git"
    # generatedPersonalAccessToken has been generated with full rights from https://gitlab.companyname.net/profile/personal_access_tokens and set in a variable not seen here
    gitlabPushURL = "https://{}:{}@{}".format(gitlabUserLogin, generatedPersonalAccessToken, gitlabFullURL.split("@")[-1])
    print("gitlabFullURL is [{}]".format(gitlabFullURL))
    print("gitlabPushURL is [{}]".format(gitlabPushURL))
    branchName = str(uuid.uuid1())
    print("Build git.Repo object with [{}] root path".format(GIT_ROOT_PATH))
    repo = git.Repo(GIT_ROOT_PATH)
    print("Fetch all")
    repo.git.fetch("-a")
    print("Create new local branch [{}]".format(branchName))
    repo.git.checkout("-b", branchName)
    print("Modify file")
    versionFile = os.path.join(GIT_ROOT_PATH, "public", "include" , "Version.h")
    patchedVersionFileContent = ""
    with open(versionFile, 'r') as versionFileContent:
        patchedVersionFileContent = versionFileContent.read()
    patchedVersionFileContent = re.sub("#define VERSION_MAJOR 0", "#define VERSION_MAJOR {}".format(75145), patchedVersionFileContent)
    with open(versionFile, 'w') as versionFileContent:
        versionFileContent.write(patchedVersionFileContent)
    print("Stage file")
    repo.git.add("-u")
    print("Commit file")
    repo.git.commit("-m", "New version file in new branch {}".format(branchName))
    print("Push new branch [{}] remotely".format(branchName))
    # The error is at below line:
    repo.git.push(gitlabPushURL, "origin", branchName)
    sys.exit(0)
except Exception as e:
    print("Exception: {}".format(e))
    sys.exit(-1)
Problem
Even with the trick to get push rights, the following error is raised when we try to push from the GitLab runner:
Cmd('git') failed due to: exit code(1)
cmdline: git push https://user:token@gitlab.companyname.net/demo/repo.git origin 85a3fa6e-690a-11ea-a07d-e454e8696d31
stderr: 'error: src refspec origin does not match any
error: failed to push some refs to 'https://user:token@gitlab.companyname.net/demo/repo.git''
What works
If I open a Git Bash, I successfully run manual commands:
git fetch -a
git checkout -b newBranch
vim public/include/Version.h
=> At this point file has been modified
git add -u
git commit -m "New version file in new branch"
git push origin newBranch
Here if we fetch all from elsewhere we can see newBranch with version file modifications
And same if we run script content (without URL modification) from a python command line (assuming all imports as in script have been performed):
GIT_ROOT_PATH = "E:\\path\\to\\workspace\\repo"
branchName = str(uuid.uuid1())
repo = git.Repo(GIT_ROOT_PATH)
repo.git.fetch("-a")
repo.git.checkout("-b", branchName)
versionFile = os.path.join(GIT_ROOT_PATH, "public", "include" , "Version.h")
patchedVersionFileContent = ""
with open(versionFile, 'r') as versionFileContent:
    patchedVersionFileContent = versionFileContent.read()
patchedVersionFileContent = re.sub("#define VERSION_MAJOR 0", "#define VERSION_MAJOR {}".format(75145), patchedVersionFileContent)
with open(versionFile, 'w') as versionFileContent:
    versionFileContent.write(patchedVersionFileContent)
repo.git.add("-u")
repo.git.commit("-m", "New version file in new branch {}".format(branchName))
repo.git.push("origin", branchName)
Conclusion
I can't find what I'm doing wrong when running from the GitLab runner. Is there something I'm missing?
The only thing that I can see different when running from GitLab runner is that after fetch I can see I'm on a detached head (listing repo.git.branch('-a').split('\n') gives for example ['* (HEAD detached at 560976b)', 'branchName', 'remotes/origin/otherExistingBranch', ...]), but this should not be a problem since I create a new branch where to push, right ?
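Git is telling you that you used the wrong refspec: in git push <url> origin <branch>, the URL is taken as the remote, so "origin" is interpreted as a refspec. To push to another remote you have to create it first with gitlab = repo.create_remote("gitlab", gitlabPushURL) and then push to it with repo.git.push("gitlab", branchName).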
Edit from @gluttony to not break on next git run with "remote already exists":
remote_name = "gitlab"
if remote_name not in repo.remotes:
    repo.create_remote(remote_name, gitlabPushURL)
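The push URL passed to create_remote can also be built with the standard library instead of string splitting. This is a minimal sketch, assuming the runner URL has the form https://user:password@host/path. The function name build_push_url and the example values are made up for illustration:

```python
from urllib.parse import quote, urlsplit, urlunsplit


def build_push_url(repo_url, user, token):
    """Return repo_url with its credentials replaced by user:token."""
    parts = urlsplit(repo_url)
    host = parts.hostname
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    # Percent-encode the credentials so characters such as "@" or "/" stay valid
    netloc = "{}:{}@{}".format(quote(user, safe=""), quote(token, safe=""), host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


push_url = build_push_url(
    "https://gitlab-ci-token:xxxx@gitlab.companyname.net/demo/demo.git",
    "user",
    "token",
)
print(push_url)
```

Percent-encoding matters mostly for tokens or logins that contain reserved characters; for plain alphanumeric values the result is the same as with the format string used in the question.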
I created a new repository in my GitHub account.
Using the gitpython library I'm able to get this repository. Then I create a new branch, add a new file, commit and try to push to the new branch.
Please check the code below:
import git
import random
import os
repo_name = 'test'
branch_name = 'feature4'
remote_repo_addr_git = 'git@repo:DevOps/z_sandbox1.git'
no = random.randint(0,1000)
repo = git.Repo.clone_from(remote_repo_addr_git, repo_name)
new_branch = repo.create_head(branch_name)
repo.head.set_reference(new_branch)
os.chdir(repo_name)
open("parasol" + str(no), "w+").write(str(no)) # this is added
print(repo.active_branch)
repo.git.add(A=True)
repo.git.commit(m='okej')
repo.git.push(u='origin feature4')
Everything works fine until the last push call. I get this error:
stderr: 'fatal: 'origin feature4' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.'
I'm able to run the same command from the command line and it works fine:
git push -u origin feature4
But it doesn't work in Python.
This worked for me:
repo.git.push("origin", "feature4")
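The error message above names 'origin feature4' as the repository, which suggests the whole string reached git as one argument. The sketch below only illustrates that difference with plain argument lists; it does not reproduce how GitPython renders keyword arguments, which may vary by version:

```python
import shlex

# What the shell hands to git for the working command line:
# the remote and the branch are two separate arguments.
shell_argv = shlex.split("git push -u origin feature4")

# What git receives when "origin feature4" is passed as a single value:
# git looks for a remote literally named "origin feature4".
single_value_argv = ["git", "push", "-u", "origin feature4"]

print(shell_argv)
print(single_value_argv)
```

Passing the remote and the branch as separate positional arguments, as in the answer above, keeps them as two arguments.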
Useful documentation for fetch/pull/push operations with gitpython:
https://gitpython.readthedocs.io/en/stable/reference.html?highlight=index.fetch#git.remote.Remote.fetch
from git import GitCommandError, Repo
repo_name = 'test'
branch_name = 'feature4'
remote_repo_addr_git = 'git@repo:DevOps/z_sandbox1.git'
# clone repo
repo = Repo.clone_from(remote_repo_addr_git, repo_name)
# refspec is a sort of mapping between remote:local references
refspec = f'refs/heads/{branch_name}:refs/heads/{branch_name}'
# get branch
try:
    # if exists pull the branch
    # the refspec here means: grab the {branch_name} branch head
    # from the remote repo and store it as my {branch_name} branch head
    repo.remotes.origin.pull(refspec)
except GitCommandError:
    # if not exists create it
    repo.create_head(branch_name)
# checkout branch
branch = repo.heads[branch_name]
branch.checkout()
# modify files
with open(f'{repo_name}/hello.txt', 'w') as file:
    file.write('hello')
# stage & commit & push
repo.index.add('**')
repo.index.commit('added good manners')
# refspec here means: publish my {branch_name} branch head
# as {branch_name} remote branch
repo.remotes.origin.push(refspec)
I'm trying to add files to a bare repo:
import git
r = git.Repo("./bare-repo")
r.working_dir("/tmp/f")
print(r.bare) # True
r.index.add(["/tmp/f/foo"]) # Exception, can't use bare repo <...>
The only thing I have worked out is that files can only be added through Repo.index.add.
Is using a bare repo with the git-python module even possible? Or do I need to use subprocess.call with git --work-tree=... --git-dir=... add?
You cannot add files to a bare repository. Bare repositories are for sharing, not for working. You should clone the bare repository to work with it. There is a nice post about it: www.saintsjd.com/2011/01/what-is-a-bare-git-repository/
UPDATE (16.06.2016)
Code sample as requested:
import git
import os, shutil
test_folder = "temp_folder"
# This is your bare repository
bare_repo_folder = os.path.join(test_folder, "bare-repo")
repo = git.Repo.init(bare_repo_folder, bare=True)
assert repo.bare
del repo
# This is non-bare repository where you can make your commits
non_bare_repo_folder = os.path.join(test_folder, "non-bare-repo")
# Clone bare repo into non-bare
cloned_repo = git.Repo.clone_from(bare_repo_folder, non_bare_repo_folder)
assert not cloned_repo.bare
# Make changes (e.g. create .gitignore file)
tmp_file = os.path.join(non_bare_repo_folder, ".gitignore")
with open(tmp_file, 'w') as f:
    f.write("*.pyc")
# Run git regular operations (I use cmd commands, but you could use wrappers from git module)
cmd = cloned_repo.git
cmd.add(all=True)
cmd.commit(m=".gitignore was added")
# Push changes to bare repo
cmd.push("origin", "master", u=True)
del cloned_repo # Close Repo object and cmd associated with it
# Remove non-bare cloned repo
shutil.rmtree(non_bare_repo_folder)
This question should be related to:
How to get the current branch name in Git?
Get git current branch/tag name
How to get the name of the current git branch into a variable in a shell script?
How to programmatically determine the current checked out Git branch
But I am wondering how to do that through pygit2?
To get the conventional "shorthand" name:
from pygit2 import Repository
Repository('.').head.shorthand # 'master'
In case you don't want to or can't use pygit2:
You may need to alter the path; this assumes the current directory is the parent directory of .git
from pathlib import Path
def get_active_branch_name():
    head_dir = Path(".") / ".git" / "HEAD"
    with head_dir.open("r") as f:
        content = f.read().splitlines()
    for line in content:
        if line[0:4] == "ref:":
            return line.partition("refs/heads/")[2]
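A variant of the same idea that takes the repository path as a parameter and makes the detached-HEAD case explicit. This is a sketch under two assumptions: .git is a regular directory (not the file used by worktrees and submodules) and HEAD has the usual "ref: refs/heads/<name>" form. The function name read_branch_name and the temporary repository layout are only for illustration:

```python
import tempfile
from pathlib import Path


def read_branch_name(repo_dir="."):
    """Return the checked-out branch name, or None when HEAD is detached."""
    head_file = Path(repo_dir) / ".git" / "HEAD"
    content = head_file.read_text().strip()
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):]
    return None


# Demonstration on a fake repository layout (no git needed)
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    git_dir.mkdir()
    (git_dir / "HEAD").write_text("ref: refs/heads/feature/x\n")
    on_branch = read_branch_name(tmp)
    # A detached HEAD file holds a commit id instead of a ref
    (git_dir / "HEAD").write_text("0123456789abcdef0123456789abcdef01234567\n")
    detached = read_branch_name(tmp)

print(on_branch, detached)
```

Returning None for a detached HEAD lets the caller distinguish "no branch" from a branch name, which matters in CI checkouts that often run on a detached HEAD.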
From the pygit2 documentation, either of these should work:
#!/usr/bin/python
from pygit2 import Repository
repo = Repository('/path/to/your/git/repo')
# option 1
head = repo.head
print("Head is " + head.name)
# option 2
head = repo.lookup_reference('HEAD').resolve()
print("Head is " + head.name)
You'll get the full name, including the refs/heads/ prefix. If you don't want that, strip it out or use shorthand instead of name.
./pygit_test.py
Head is refs/heads/master
Head is refs/heads/master
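Stripping the prefix by hand could look like the sketch below. The helper name strip_heads_prefix is made up, and the shorthand attribute mentioned above remains the simpler choice when pygit2 is available:

```python
def strip_heads_prefix(ref_name):
    """Remove a leading refs/heads/ from a full reference name."""
    prefix = "refs/heads/"
    if ref_name.startswith(prefix):
        return ref_name[len(prefix):]
    # Leave tags and other references unchanged
    return ref_name


short = strip_heads_prefix("refs/heads/master")
print(short)
```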
You can use GitPython:
from git import Repo
local_repo = Repo(path=settings.BASE_DIR)
local_branch = local_repo.active_branch.name