Python/Git/Azure Repos - Read remote file without Pulling/Downloading [duplicate] - python

How do I checkout just one file from a git repo?

Originally, I mentioned in 2012 git archive (see Jared Forsyth's answer and Robert Knight's answer), since git1.7.9.5 (March 2012), Paul Brannan's answer:
git archive --format=tar --remote=origin HEAD:path/to/directory -- filename | tar -O -xf -
But: in 2013, that was no longer possible for remote https://github.com URLs.
See the old page "Can I archive a repository?"
The current (2018) page "About archiving content and data on GitHub" recommends using third-party services like GHTorrent or GH Archive.
So you can also deal with local copies/clone:
You could alternatively do the following if you have a local copy of the bare repository as mentioned in this answer,
git --no-pager --git-dir /path/to/bar/repo.git show branch:path/to/file >file
Or you must clone first the repo, meaning you get the full history:
in the .git repo
in the working tree.
But then you can do a sparse checkout (if you are using Git1.7+),:
enable the sparse checkout option (git config core.sparsecheckout true)
adding what you want to see in the .git/info/sparse-checkout file
re-reading the working tree to only display what you need
To re-read the working tree:
$ git read-tree -m -u HEAD
That way, you end up with a working tree including precisely what you want (even if it is only one file)
Richard Gomes points (in the comments) to "How do I clone, fetch or sparse checkout a single directory or a list of directories from git repository?"
A bash function which avoids downloading the history, which retrieves a single branch and which retrieves a list of files or directories you need.
With Git 2.40 (Q1 2023), the logic to see if we are using the "cone" mode by checking the sparsity patterns has been tightened to avoid mistaking a pattern that names a single file as specifying a cone.
See commit 5842710 (03 Jan 2023) by William Sprent (williams-unity).
(Merged by Junio C Hamano -- gitster -- in commit ab85a7d, 16 Jan 2023)
dir: check for single file cone patterns
Signed-off-by: William Sprent
Acked-by: Victoria Dye
The sparse checkout documentation states that the cone mode pattern set is limited to patterns that either recursively include directories or patterns that match all files in a directory.
In the sparse checkout file, the former manifest in the form:
/A/B/C/
while the latter become a pair of patterns either in the form:
/A/B/
!/A/B/*/
or in the special case of matching the toplevel files:
/*
!/*/
The 'add_pattern_to_hashsets()' function contains checks which serve to disable cone-mode when non-cone patterns are encountered.
However, these do not catch when the pattern list attempts to match a single file or directory, e.g. a pattern in the form:
/A/B/C
This causes sparse-checkout to exhibit unexpected behaviour when such a pattern is in the sparse-checkout file and cone mode is enabled.
Concretely, with the pattern like the above, sparse-checkout, in non-cone mode, will only include the directory or file located at '/A/B/C'.
However, with cone mode enabled, sparse-checkout will instead just manifest the toplevel files but not any file located at '/A/B/C'.
Relatedly, issues occur when supplying the same kind of filter when partial cloning with '--filter=sparse:oid=<oid>'.
'upload-pack' will correctly just include the objects that match the non-cone pattern matching.
Which means that checking out the newly cloned repo with the same filter, but with cone mode enabled, fails due to missing objects.
To fix these issues, add a cone mode pattern check that asserts that every pattern is either a directory match or the pattern '/*'.
Add a test to verify the new pattern check and modify another to reflect that non-directory patterns are caught earlier.

First clone the repo with the -n option, which suppresses the default checkout of all files, and the --depth 1 option, which means it only gets the most recent revision of each file
git clone -n git://path/to/the_repo.git --depth 1
Then check out just the file you want like so:
cd the_repo
git checkout HEAD name_of_file

If you already have a copy of the git repo, you can always checkout a version of a file using a git log to find out the hash-id (for example 3cdc61015724f9965575ba954c8cd4232c8b42e4) and then you simply type:
git checkout hash-id path-to-file
Here is an actual example:
git checkout 3cdc61015724f9965575ba954c8cd4232c8b42e4 /var/www/css/page.css

Normally it's not possible to download just one file from git without downloading the whole repository as suggested in the first answer.
It's because Git doesn't store files as you think (as CVS/SVN do), but it generates them based on the entire history of the project.
But there are some workarounds for specific cases. Examples below with placeholders for user, project, branch, filename.
GitHub
wget https://raw.githubusercontent.com/user/project/branch/filename
GitLab
wget https://gitlab.com/user/project/raw/branch/filename
GitWeb
If you're using Git on the Server - GitWeb, then you may try in example (change it into the right path):
wget "http://example.com/gitweb/?p=example;a=blob_plain;f=README.txt;hb=HEAD"
GitWeb at drupalcode.org
Example:
wget "http://drupalcode.org/project/ads.git/blob_plain/refs/heads/master:/README.md"
googlesource.com
There is an undocumented feature that allows you to download base64-encoded versions of raw files:
curl "https://chromium.googlesource.com/chromium/src/net/+/master/http/transport_security_state_static.json?format=TEXT" | base64 --decode
In other cases check if your Git repository is using any web interfaces.
If it's not using any web interface, you may consider to push your code to external services such as GitHub, Bitbucket, etc. and use it as a mirror.
If you don't have wget installed, try curl -O (url) alternatively.

Minimal Guide
git checkout -- <filename>
Ref: https://git-scm.com/docs/git-checkout
Dup: Undo working copy modifications of one file in Git?

git checkout branch_or_version -- path/file
example: git checkout HEAD -- main.c

Now we can! As this is the first result on google, I thought I'd update this to the latest standing. With the advent of git 1.7.9.5, we have the git archive command which will allow you to retrieve a single file from a remote host.
git archive --remote=git://git.foo.com/project.git HEAD:path/in/repo filename | tar -x
See answer in full here https://stackoverflow.com/a/5324532/290784

Here is the complete solution for pulling and pushing only a particular file inside git repository:
First you need to clone git repository with a special hint –no checkout
git clone --no-checkout <git url>
The next step is to get rid of unstaged files in the index with the command:
git reset
Now you are allowed to start pulling files you want to change with the command:
git checkout origin/master <path to file>
Now the repository folder contains files that you may start editing right away. After editing you need to execute plain and familar sequence of commands.
git add <path to file>
git commit -m <message text>
git push

Working in GIT 1.7.2.2
For example you have a remote some_remote with branches branch1, branch32
so to checkout a specific file you call this commands:
git checkout remote/branch path/to/file
as an example it will be something like this
git checkout some_remote/branch32 conf/en/myscript.conf
git checkout some_remote/branch1 conf/fr/load.wav
This checkout command will copy the whole file structure conf/en and conf/fr into the current directory where you call these commands (of course I assume you ran git init at some point before)

Very simple:
git checkout from-branch-name -- path/to/the/file/you/want
This will not checkout the from-branch-name branch. You will stay on whatever branch you are on, and only that single file will be checked out from the specified branch.
Here's the relevant part of the manpage for git-checkout
git checkout [-p|--patch] [<tree-ish>] [--] <pathspec>...
When <paths> or --patch are given, git checkout does not switch
branches. It updates the named paths in the working tree from the
index file or from a named <tree-ish> (most often a commit). In
this case, the -b and --track options are meaningless and giving
either of them results in an error. The <tree-ish> argument can be
used to specify a specific tree-ish (i.e. commit, tag or tree) to
update the index for the given paths before updating the working
tree.
Hat tip to Ariejan de Vroom who taught me this from this blog post.

git clone --filter from Git 2.19
This option will actually skip fetching most unneeded objects from the server:
git clone --depth 1 --no-checkout --filter=blob:none \
"file://$(pwd)/server_repo" local_repo
cd local_repo
git checkout master -- mydir/myfile
The server should be configured with:
git config --local uploadpack.allowfilter 1
git config --local uploadpack.allowanysha1inwant 1
There is no server support as of v2.19.0, but it can already be locally tested.
TODO: --filter=blob:none skips all blobs, but still fetches all tree objects. But on a normal repo, this should be tiny compared to the files themselves, so this is already good enough. Asked at: https://www.spinics.net/lists/git/msg342006.html Devs replied a --filter=tree:0 is in the works to do that.
Remember that --depth 1 already implies --single-branch, see also: How do I clone a single branch in Git?
file://$(path) is required to overcome git clone protocol shenanigans: How to shallow clone a local git repository with a relative path?
The format of --filter is documented on man git-rev-list.
An extension was made to the Git remote protocol to support this feature.
Docs on Git tree:
https://github.com/git/git/blob/v2.19.0/Documentation/technical/partial-clone.txt
https://github.com/git/git/blob/v2.19.0/Documentation/rev-list-options.txt#L720
https://github.com/git/git/blob/v2.19.0/t/t5616-partial-clone.sh
Test it out
#!/usr/bin/env bash
set -eu
list-objects() (
git rev-list --all --objects
echo "master commit SHA: $(git log -1 --format="%H")"
echo "mybranch commit SHA: $(git log -1 --format="%H")"
git ls-tree master
git ls-tree mybranch | grep mybranch
git ls-tree master~ | grep root
)
# Reproducibility.
export GIT_COMMITTER_NAME='a'
export GIT_COMMITTER_EMAIL='a'
export GIT_AUTHOR_NAME='a'
export GIT_AUTHOR_EMAIL='a'
export GIT_COMMITTER_DATE='2000-01-01T00:00:00+0000'
export GIT_AUTHOR_DATE='2000-01-01T00:00:00+0000'
rm -rf server_repo local_repo
mkdir server_repo
cd server_repo
# Create repo.
git init --quiet
git config --local uploadpack.allowfilter 1
git config --local uploadpack.allowanysha1inwant 1
# First commit.
# Directories present in all branches.
mkdir d1 d2
printf 'd1/a' > ./d1/a
printf 'd1/b' > ./d1/b
printf 'd2/a' > ./d2/a
printf 'd2/b' > ./d2/b
# Present only in root.
mkdir 'root'
printf 'root' > ./root/root
git add .
git commit -m 'root' --quiet
# Second commit only on master.
git rm --quiet -r ./root
mkdir 'master'
printf 'master' > ./master/master
git add .
git commit -m 'master commit' --quiet
# Second commit only on mybranch.
git checkout -b mybranch --quiet master~
git rm --quiet -r ./root
mkdir 'mybranch'
printf 'mybranch' > ./mybranch/mybranch
git add .
git commit -m 'mybranch commit' --quiet
echo "# List and identify all objects"
list-objects
echo
# Restore master.
git checkout --quiet master
cd ..
# Clone. Don't checkout for now, only .git/ dir.
git clone --depth 1 --quiet --no-checkout --filter=blob:none "file://$(pwd)/server_repo" local_repo
cd local_repo
# List missing objects from master.
echo "# Missing objects after --no-checkout"
git rev-list --all --quiet --objects --missing=print
echo
echo "# Git checkout fails without internet"
mv ../server_repo ../server_repo.off
! git checkout master
echo
echo "# Git checkout fetches the missing file from internet"
mv ../server_repo.off ../server_repo
git checkout master -- d1/a
echo
echo "# Missing objects after checking out d1/a"
git rev-list --all --quiet --objects --missing=print
GitHub upstream.
Output in Git v2.19.0:
# List and identify all objects
c6fcdfaf2b1462f809aecdad83a186eeec00f9c1
fc5e97944480982cfc180a6d6634699921ee63ec
7251a83be9a03161acde7b71a8fda9be19f47128
62d67bce3c672fe2b9065f372726a11e57bade7e
b64bf435a3e54c5208a1b70b7bcb0fc627463a75 d1
308150e8fddde043f3dbbb8573abb6af1df96e63 d1/a
f70a17f51b7b30fec48a32e4f19ac15e261fd1a4 d1/b
84de03c312dc741d0f2a66df7b2f168d823e122a d2
0975df9b39e23c15f63db194df7f45c76528bccb d2/a
41484c13520fcbb6e7243a26fdb1fc9405c08520 d2/b
7d5230379e4652f1b1da7ed1e78e0b8253e03ba3 master
8b25206ff90e9432f6f1a8600f87a7bd695a24af master/master
ef29f15c9a7c5417944cc09711b6a9ee51b01d89
19f7a4ca4a038aff89d803f017f76d2b66063043 mybranch
1b671b190e293aa091239b8b5e8c149411d00523 mybranch/mybranch
c3760bb1a0ece87cdbaf9a563c77a45e30a4e30e
a0234da53ec608b54813b4271fbf00ba5318b99f root
93ca1422a8da0a9effc465eccbcb17e23015542d root/root
master commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
mybranch commit SHA: fc5e97944480982cfc180a6d6634699921ee63ec
040000 tree b64bf435a3e54c5208a1b70b7bcb0fc627463a75 d1
040000 tree 84de03c312dc741d0f2a66df7b2f168d823e122a d2
040000 tree 7d5230379e4652f1b1da7ed1e78e0b8253e03ba3 master
040000 tree 19f7a4ca4a038aff89d803f017f76d2b66063043 mybranch
040000 tree a0234da53ec608b54813b4271fbf00ba5318b99f root
# Missing objects after --no-checkout
?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
?8b25206ff90e9432f6f1a8600f87a7bd695a24af
?41484c13520fcbb6e7243a26fdb1fc9405c08520
?0975df9b39e23c15f63db194df7f45c76528bccb
?308150e8fddde043f3dbbb8573abb6af1df96e63
# Git checkout fails without internet
fatal: '/home/ciro/bak/git/test-git-web-interface/other-test-repos/partial-clone.tmp/server_repo' does not appear to be a git repository
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
# Git checkout fetches the missing directory from internet
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.
remote: Enumerating objects: 1, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 1 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (1/1), 45 bytes | 45.00 KiB/s, done.
# Missing objects after checking out d1
?f70a17f51b7b30fec48a32e4f19ac15e261fd1a4
?8b25206ff90e9432f6f1a8600f87a7bd695a24af
?41484c13520fcbb6e7243a26fdb1fc9405c08520
?0975df9b39e23c15f63db194df7f45c76528bccb
Conclusions: all blobs except d1/a are missing. E.g. f70a17f51b7b30fec48a32e4f19ac15e261fd1a4, which is d1/b, is not there after checking out d1/.
Note that root/root and mybranch/mybranch are also missing, but --depth 1 hides that from the list of missing files. If you remove --depth 1, then they show on the list of missing files.

Two variants on what's already been given:
git archive --format=tar --remote=git://git.foo.com/project.git HEAD:path/to/directory filename | tar -O -xf -
and:
git archive --format=zip --remote=git://git.foo.com/project.git HEAD:path/to/directory filename | funzip
These write the file to standard output.

You can do it by
git archive --format=tar --remote=origin HEAD | tar xf -
git archive --format=tar --remote=origin HEAD <file> | tar xf -

Say the file name is 123.txt, this works for me:
git checkout --theirs 123.txt
If the file is inside a directory A, make sure to specify it correctly:
git checkout --theirs "A/123.txt"

In git you do not 'checkout' files before you update them - it seems like this is what you are after.
Many systems like clearcase, csv and so on require you to 'checkout' a file before you can make changes to it. Git does not require this. You clone a repository and then make changes in your local copy of repository.
Once you updated files you can do:
git status
To see what files have been modified. You add the ones you want to commit to index first with (index is like a list to be checked in):
git add .
or
git add blah.c
Then do git status will show you which files were modified and which are in index ready to be commited or checked in.
To commit files to your copy of repository do:
git commit -a -m "commit message here"
See git website for links to manuals and guides.

If you need a specific file from a specific branch from a remote Git repository the command is:
git archive --remote=git://git.example.com/project.git refs/heads/mybranch path/to/myfile |tar xf -
The rest can be derived from #VonC's answer:
If you need a specific file from the master branch it is:
git archive --remote=git://git.example.com/project.git HEAD path/to/myfile |tar xf -
If you need a specific file from a tag it is:
git archive --remote=git://git.example.com/project.git mytag path/to/myfile |tar xf -

this works for me. use git with some shell command
git clone --no-checkout --depth 1 git.example.com/project.git && cd project && git show HEAD:path/to/file_you_need > ../file_you_need && cd .. && rm -rf project

Another solution, similar to the one using --filter=blob:none is to use --filter=tree:0 (you can read an explanation about the differences here).
This method is usually faster than the blob-one because it doesn't download the tree structure, but has a drawback. Given you are delaying the retrieval of the tree, you will have a penalty when you enter into the repo directory (depending on the size and structure of your repo it may be many times larger compared with a simple shallow-clone).
If that's the case for you, you can fix it by not entering into the repo:
git clone -n --filter=tree:0 <repo_url> tgt_dir
git -C tgt_dir checkout <branch> -- <filename>
cat tgt_dir/<filename> # or move it to another place and delete tgt_dir ;)
Take into consideration that if you have to checkout multiple files, the tree population will also impact your performance, so I recommend this for a single file and only if the repo is large enough to be worth it all these actions.

It sounds like you're trying to carry over an idea from centralized version control, which git by nature is not - it's distributed. If you want to work with a git repository, you clone it. You then have all of the contents of the work tree, and all of the history (well, at least everything leading up to the tip of the current branch), not just a single file or a snapshot from a single commit.
git clone /path/to/repo
git clone git://url/of/repo
git clone http://url/of/repo

I am adding this answer as an alternative to doing a formal checkout or some similar local operation. Assuming that you have access to the web interface of your Git provider, you might be able to directly view any file at a given desired commit. For example, on GitHub you may use something like:
https://github.com/hubotio/hubot/blob/ed25584f/src/adapter.coffee
Here ed25584f is the first 8 characters from the SHA-1 hash of the commit of interest, followed by the path to the source file.
Similary, on Bitbucket we can try:
https://bitbucket.org/cofarrell/stash-browse-code-plugin/src/06befe08
In this case, we place the commit hash at the end of the source URL.

In codecommit (git version of Amazon AWS) you can do this:
aws codecommit \
get-file --repository-name myrepo \
--commit-specifier master \
--file-path path/myfile \
--output text \
--query fileContent |
base64 --decode > myfile

I don’t see what worked for me listed out here so I will include it should anybody be in my situation.
My situation, I have a remote repository of maybe 10,000 files and I need to build an RPM file for my Linux system. The build of the RPM includes a git clone of everything. All I need is one file to start the RPM build. I can clone the entire source tree which does what I need but it takes an extra two minutes to download all those files when all I need is one. I tried to use the git archive option discussed and I got “fatal: Operation not supported by protocol.” It seems I have to get some sort of archive option enabled on the server and my server is maintained by bureaucratic thugs that seem to enjoy making it difficult to get things done.
What I finally did was I went into the web interface for bitbucket and viewed the one file I needed. I did a right click on the link to download a raw copy of the file and selected “copy shortcut” from the resulting popup. I could not just download the raw file because I needed to automate things and I don’t have a browser interface on my Linux server.
For the sake of discussion, that resulted in the URL:
https://ourArchive.ourCompany.com/projects/ThisProject/repos/data/raw/foo/bar.spec?at=refs%2Fheads%2FTheBranchOfInterest
I could not directly download this file from the bitbucket repository because I needed to sign in first. After a little digging, I found this worked:
On Linux:
echo "myUser:myPass123"| base64
bXlVc2VyOm15UGFzczEyMwo=
curl -H 'Authorization: Basic bXlVc2VyOm15UGFzczEyMwo=' 'https://ourArchive.ourCompany.com/projects/ThisProject/repos/data/raw/foo/bar.spec?at=refs%2Fheads%2FTheBranchOfInterest' > bar.spec
This combination allowed me to download the one file I needed to build everything else.

if you have a file, locally changed (the one which messing with git pull) just do:
git checkout origin/master filename
git checkout - switch branches or restore working tree files, (here we switching nothing, just overwriting file
origin/master - your current branch or you can use specific revision-number for example: cd0fa799c582e94e59e5b21e872f5ffe2ad0154b,
filename with path from project main directory (where directory .git lives)
so if you have structure:
`.git
public/index.html
public/css/style.css
vendors
composer.lock`
and want reload index.html - just use public/index.html

Yes you can this by this command which download one specific file
wget -o <DesiredFileName> <Git FilePath>\?token\=<personalGitToken>
example
wget -o javascript-test-automation.md https://github.com/akashgupta03/awesome-test-automation/blob/master/javascript-test-automation.md\?token\=<githubPersonalTone>

git checkout <other-branch> -- <single-file> works for me on git.2.37.1.
However, the file is (git-magically) staged for commit and I can not see git diff properly.
I then run git restore --staged db/structure.sql to unstage it.
That way I DO have the file in the exact version that I want and I can see the difference with other versions of that file.

If you have edited a local version of a file and wish to revert to the original version maintained on the central server, this can be easily achieved using Git Extensions.
Initially the file will be marked for commit, since it has been modified
Select (double click) the file in the file tree menu
The revision tree for the single file is listed.
Select the top/HEAD of the tree and right click save as
Save the file to overwrite the modified local version of the file
The file now has the correct version and will no longer be marked for commit!
Easy!

If you only need to download the file, no need to check out with Git.
GitHub Mate is much easier to do so, it's a Chrome extension, enables you click the file icon to download it. also open source

Related

How can I implement git like cli with python click with commands subcommands and options.?

Want to achieve like this using click framework in python:
Each options have aliases and suboptions have short form and long form(-n, --dry-run)
git --help
work on the current change (see also: git help everyday)
add Add file contents to the index
mv Move or rename a file, a directory, or a symlink
restore Restore working tree files
rm Remove files from the working tree and from the index
sparse-checkout Initialize and modify the sparse-checkout
git add -h
usage: git add [<options>] [--] <pathspec>...
-n, --dry-run dry run
-v, --verbose be verbose
-i, --interactive interactive picking

Best way to push the project folder into GitHub

I have a python project and i am trying to write a script to push the project folder into git
What are the steps that should be taken
I have tried
https://www.freecodecamp.org/news/automate-project-github-setup-mac/
but cant seem to fix it
Firstly, you need to create a repo on the Github UI.
After creating the repo, Github will give you a URL to that repo.
Assuming you have SSH keys setup, you'll then be able to use that URL to push your commits like so:
On a project directory:
git init # initialize a repo; create a .git folder with git internal data
git add file1.py file2.sh file3.js README.md # choose which files you'd like to upload
git commit -m "Add project files" # create a commit
git remote add origin "github repo url"
git push # upload commit to Github
Automating this "first push" for several projects is relatively easy.
Just be careful with what files you're going to git add. Git is not designed to version binaries like images, pdfs, etc.
I use this code. Assuming you have SSH keys setup.
import os
from datetime import datetime
current_time = datetime.now().strftime("%H:%M:%S")
os.system("git init")
os.system("git remote rm origin")
os.system("git remote add origin git#github.com:user/repository.git")
os.system('git config --global user.email "email#example.com"')
os.system('git config --global user.name "username"')
os.system("git rm -r --cached .")
os.system("git add .")
git_commit_with_time = f'git commit -m "update:{current_time}"'
os.system(git_commit_with_time)
os.system("git push -f --set-upstream origin master")
You can also customize it according to you.
You can use something else instead of the current_time in the commit.
I have been using this for many months and I found it the best way.
Because since I have automated the task, I have not had any problem till date.

How to gitignore particular files [duplicate]

I put a file that was previously being tracked by Git onto the .gitignore list. However, the file still shows up in git status after it is edited. How do I force Git to completely forget the file?
.gitignore will prevent untracked files from being added (without an add -f) to the set of files tracked by Git. However, Git will continue to track any files that are already being tracked.
To stop tracking a file, we must remove it from the index:
git rm --cached <file>
To remove a folder and all files in the folder recursively:
git rm -r --cached <folder>
The removal of the file from the head revision will happen on the next commit.
WARNING: While this will not remove the physical file from your local machine, it will remove the files from other developers' machines on their next git pull.
The series of commands below will remove all of the items from the Git index (not from the working directory or local repository), and then will update the Git index, while respecting Git ignores. PS. Index = Cache
First:
git rm -r --cached .
git add .
Then:
git commit -am "Remove ignored files"
Or as a one-liner:
git rm -r --cached . && git add . && git commit -am "Remove ignored files"
git update-index does the job for me:
git update-index --assume-unchanged <file>
Note: This solution is actually independent of .gitignore as gitignore is only for untracked files.
Update, a better option
Since this answer was posted, a new option has been created and that should be preferred. You should use --skip-worktree which is for modified tracked files that the user don't want to commit anymore and keep --assume-unchanged for performance to prevent git to check status of big tracked files. See https://stackoverflow.com/a/13631525/717372 for more details...
git update-index --skip-worktree <file>
To cancel
git update-index --no-skip-worktree <file>
git ls-files -c --ignored --exclude-standard -z | xargs -0 git rm --cached
git commit -am "Remove ignored files"
This takes the list of the ignored files, removes them from the index, and commits the changes.
Move it out, commit, and then move it back in.
This has worked for me in the past, but there is probably a 'gittier' way to accomplish this.
I always use this command to remove those untracked files.
One-line, Unix-style, clean output:
git ls-files --ignored --exclude-standard | sed 's/.*/"&"/' | xargs git rm -r --cached
It lists all your ignored files, replaces every output line with a quoted line instead to handle paths with spaces inside, and passes everything to git rm -r --cached to remove the paths/files/directories from the index.
The copy/paste (one-liner) answer is:
git rm --cached -r .; git add .; git status; git commit -m "Ignore unwanted files"
This command will NOT change the content of the .gitignore file. It will just ignore the files that have already been committed to a Git repository, but now we have added them to .gitignore.
The command git status; is to review the changes and could be dropped.
Ultimately, it will immediately commit the changes with the message "Ignore unwanted files".
If you don't want to commit the changes, drop the last part of the command (git commit -m "Ignore unwanted files")
Use this when:
You want to untrack a lot of files, or
You updated your .gitignore file
Source: Untrack files already added to Git repository based on .gitignore
Let’s say you have already added/committed some files to your Git repository and you then add them to your .gitignore file; these files will still be present in your repository index. This article we will see how to get rid of them.
Step 1: Commit all your changes
Before proceeding, make sure all your changes are committed, including your .gitignore file.
Step 2: Remove everything from the repository
To clear your repository, use:
git rm -r --cached .
rm is the remove command
-r will allow recursive removal
–cached will only remove files from the index. Your files will still be there.
The rm command can be unforgiving. If you wish to try what it does beforehand, add the -n or --dry-run flag to test things out.
Step 3: Readd everything
git add .
Step 4: Commit
git commit -m ".gitignore fix"
Your repository is clean :)
Push the changes to your remote to see the changes effective there as well.
If you cannot git rm a tracked file because other people might need it (warning, even if you git rm --cached, when someone else gets this change, their files will be deleted in their filesystem). These are often done due to config file overrides, authentication credentials, etc. Please look at https://gist.github.com/1423106 for ways people have worked around the problem.
To summarize:
Have your application look for an ignored file config-overide.ini and use that over the committed file config.ini (or alternately, look for ~/.config/myapp.ini, or $MYCONFIGFILE)
Commit file config-sample.ini and ignore file config.ini, have a script or similar copy the file as necessary if necessary.
Try to use gitattributes clean/smudge magic to apply and remove the changes for you, for instance smudge the config file as a checkout from an alternate branch and clean the config file as a checkout from HEAD. This is tricky stuff, I don't recommend it for the novice user.
Keep the config file on a deploy branch dedicated to it that is never merged to master. When you want to deploy/compile/test you merge to that branch and get that file. This is essentially the smudge/clean approach except using human merge policies and extra-git modules.
Anti-recommentation: Don't use assume-unchanged, it will only end in tears (because having git lie to itself will cause bad things to happen, like your change being lost forever).
I accomplished this by using git filter-branch. The exact command I used was taken from the man page:
WARNING: this will delete the file from your entire history
git filter-branch --index-filter 'git rm --cached --ignore-unmatch filename' HEAD
This command will recreate the entire commit history, executing git rm before each commit and so will get rid of the specified file. Don't forget to back it up before running the command as it will be lost.
What didn't work for me
(Under Linux), I wanted to use the posts here suggesting the ls-files --ignored --exclude-standard | xargs git rm -r --cached approach. However, (some of) the files to be removed had an embedded newline/LF/\n in their names. Neither of the solutions:
git ls-files --ignored --exclude-standard | xargs -d"\n" git rm --cached
git ls-files --ignored --exclude-standard | sed 's/.*/"&"/' | xargs git rm -r --cached
cope with this situation (get errors about files not found).
So I offer
git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached
git commit -am "Remove ignored files"
This uses the -z argument to ls-files, and the -0 argument to xargs to cater safely/correctly for "nasty" characters in filenames.
In the manual page git-ls-files(1), it states:
When -z option is not used, TAB, LF, and backslash characters in
pathnames are represented as \t, \n, and \\, respectively.
so I think my solution is needed if filenames have any of these characters in them.
Do the following steps for a file/folder:
Remove a File:
need to add that file to .gitignore.
need to remove that file using the command (git rm --cached file name).
need to run (git add .).
need to (commit -m) "file removed".
and finally, (git push).
For example:
I want to delete the test.txt file. I accidentally pushed to GitHub and want to remove it. Commands will be as follows:
First, add "test.txt" in file .gitignore
git rm --cached test.txt
git add .
git commit -m "test.txt removed"
git push
Remove Folder:
need to add that folder to file .gitignore.
need to remove that folder using the command (git rm -r --cached folder name).
need to run (git add .).
need to (commit -m) "folder removed".
and finally, (git push).
For example:
I want to delete the .idea folder/directory. I accidentally pushed to GitHub and want to remove it. The commands will be as follows:
First, add .idea in file .gitignore
git rm -r --cached .idea
git add .
git commit -m ".idea removed"
git push
Update your .gitignore file – for instance, add a folder you don't want to track to .gitignore.
git rm -r --cached . – Remove all tracked files, including wanted and unwanted. Your code will be safe as long as you have saved locally.
git add . – All files will be added back in, except those in .gitignore.
Hat tip to #AkiraYamamoto for pointing us in the right direction.
Do the following steps serially, and you will be fine.
Remove the mistakenly added files from the directory/storage. You can use the "rm -r" (for Linux) command or delete them by browsing the directories. Or move them to another location on your PC. (You maybe need to close the IDE if running for moving/removing.)
Add the files / directories to the .gitignore file now and save it.
Now remove them from the Git cache by using these commands (if there is more than one directory, remove them one by one by repeatedly issuing this command)
git rm -r --cached path-to-those-files
Now do a commit and push by using the following commands. This will remove those files from Git remote and make Git stop tracking those files.
git add .
git commit -m "removed unnecessary files from Git"
git push origin
I think, that maybe Git can't totally forget about a file because of its conception (section "Snapshots, Not Differences").
This problem is absent, for example, when using CVS. CVS stores information as a list of file-based changes. Information for CVS is a set of files and the changes made to each file over time.
But in Git every time you commit, or save the state of your project, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. So, if you added file once, it will always be present in that snapshot.
These two articles were helpful for me:
git assume-unchanged vs skip-worktree and How to ignore changes in tracked files with Git
Basing on it I do the following, if the file is already tracked:
git update-index --skip-worktree <file>
From this moment all local changes in this file will be ignored and will not go to remote. If the file is changed on remote, conflict will occur, when git pull. Stash won't work. To resolve it, copy the file content to the safe place and follow these steps:
git update-index --no-skip-worktree <file>
git stash
git pull
The file content will be replaced by the remote content. Paste your changes from the safe place to the file and perform again:
git update-index --skip-worktree <file>
If everyone, who works with the project, will perform git update-index --skip-worktree <file>, problems with pull should be absent. This solution is OK for configurations files, when every developer has their own project configuration.
It is not very convenient to do this every time, when the file has been changed on remote, but it can protect it from overwriting by remote content.
Using the git rm --cached command does not answer the original question:
How do you force git to completely forget about [a file]?
In fact, this solution will cause the file to be deleted in every other instance of the repository when executing a git pull!
The correct way to force Git to forget about a file is documented by GitHub here.
I recommend reading the documentation, but basically:
git fetch --all
git filter-branch --force --index-filter 'git rm --cached --ignore-unmatch full/path/to/file' --prune-empty --tag-name-filter cat -- --all
git push origin --force --all
git push origin --force --tags
git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Just replace full/path/to/file with the full path of the file. Make sure you've added the file to your .gitignore file.
You'll also need to (temporarily) allow non-fast-forward pushes to your repository, since you're changing your Git history.
Move or copy the file to a safe location, so you don't lose it. Then 'git rm' the file and commit.
The file will still show up if you revert to one of those earlier commits, or another branch where it has not been removed. However, in all future commits, you will not see the file again. If the file is in the Git ignore, then you can move it back into the folder, and Git won't see it.
The answer from Matt Frear was the most effective IMHO. The following is just a PowerShell script for those on Windows to only remove files from their Git repository that matches their exclusion list.
# Get files matching exclusionsfrom .gitignore
# Excluding comments and empty lines
$ignoreFiles = gc .gitignore | ?{$_ -notmatch "#"} | ?{$_ -match "\S"} | % {
$ignore = "*" + $_ + "*"
(gci -r -i $ignore).FullName
}
$ignoreFiles = $ignoreFiles| ?{$_ -match "\S"}
# Remove each of these file from Git
$ignoreFiles | % { git rm $_}
git add .
The accepted answer does not "make Git "forget" about a file..." (historically). It only makes Git ignore the file in the present/future.
This method makes Git completely forget ignored files (past/present/future), but it does not delete anything from the working directory (even when re-pulled from remote).
This method requires usage of file /.git/info/exclude (preferred) or a pre-existing .gitignore in all the commits that have files to be ignored/forgotten. 1
All methods of enforcing Git ignore behavior after-the-fact effectively rewrite history and thus have significant ramifications for any public/shared/collaborative repositories that might be pulled after this process. 2
General advice: start with a clean repository - everything committed, nothing pending in working directory or index, and make a backup!
Also, the comments/revision history of this answer (and revision history of this question) may be useful/enlightening.
#Commit up-to-date .gitignore (if not already existing)
#This command must be run on each branch
git add .gitignore
git commit -m "Create .gitignore"
#Apply standard Git ignore behavior only to the current index, not the working directory (--cached)
#If this command returns nothing, ensure /.git/info/exclude AND/OR .gitignore exist
#This command must be run on each branch
git ls-files -z --ignored --exclude-standard | xargs -0 git rm --cached
#Commit to prevent working directory data loss!
#This commit will be automatically deleted by the --prune-empty flag in the following command
#This command must be run on each branch
git commit -m "ignored index"
#Apply standard git ignore behavior RETROACTIVELY to all commits from all branches (--all)
#This step WILL delete ignored files from working directory UNLESS they have been dereferenced from the index by the commit above
#This step will also delete any "empty" commits. If deliberate "empty" commits should be kept, remove --prune-empty and instead run git reset HEAD^ immediately after this command
git filter-branch --tree-filter 'git ls-files -z --ignored --exclude-standard | xargs -0 git rm -f --ignore-unmatch' --prune-empty --tag-name-filter cat -- --all
#List all still-existing files that are now ignored properly
#If this command returns nothing, it's time to restore from backup and start over
#This command must be run on each branch
git ls-files --other --ignored --exclude-standard
Finally, follow the rest of this GitHub guide (starting at step 6) which includes important warnings/information about the commands below.
git push origin --force --all
git push origin --force --tags
git for-each-ref --format="delete %(refname)" refs/original | git update-ref --stdin
git reflog expire --expire=now --all
git gc --prune=now
Other developers that pull from the now-modified remote repository should make a backup and then:
#fetch modified remote
git fetch --all
#"Pull" changes WITHOUT deleting newly-ignored files from working directory
#This will overwrite local tracked files with remote - ensure any local modifications are backed-up/stashed
git reset FETCH_HEAD
Footnotes
1 Because /.git/info/exclude can be applied to all historical commits using the instructions above, perhaps details about getting a .gitignore file into the historical commit(s) that need it is beyond the scope of this answer. I wanted a proper .gitignore file to be in the root commit, as if it was the first thing I did. Others may not care since /.git/info/exclude can accomplish the same thing regardless where the .gitignore file exists in the commit history, and clearly rewriting history is a very touchy subject, even when aware of the ramifications.
FWIW, potential methods may include git rebase or a git filter-branch that copies an external .gitignore into each commit, like the answers to this question.
2 Enforcing Git ignore behavior after-the-fact by committing the results of a stand-alone git rm --cached command may result in newly-ignored file deletion in future pulls from the force-pushed remote. The --prune-empty flag in the following git filter-branch command avoids this problem by automatically removing the previous "delete all ignored files" index-only commit. Rewriting Git history also changes commit hashes, which will wreak havoc on future pulls from public/shared/collaborative repositories. Please understand the ramifications fully before doing this to such a repository. This GitHub guide specifies the following:
Tell your collaborators to rebase, not merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging.
Alternative solutions that do not affect the remote repository are git update-index --assume-unchanged </path/file> or git update-index --skip-worktree <file>, examples of which can be found here.
In my case I needed to put ".envrc" in the .gitignore file.
And then I used:
git update-index --skip-worktree .envrc
git rm --cached .envrc
And the file was removed.
Then I committed again, telling that the file was removed.
But when I used the command git log -p, the content of the file (which was secret credentials of the Amazon S3) was showing the content which was removed and I don't want to show this content ever on the history of the Git repository.
Then I used this command:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch .envrc' HEAD
And I don't see the content again.
I liked JonBrave's answer, but I have messy enough working directories that commit -a scares me a bit, so here's what I've done:
git config --global alias.exclude-ignored '!git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached && git ls-files -z --ignored --exclude-standard | xargs -0 git stage && git stage .gitignore && git commit -m "new gitignore and remove ignored files from index"'
Breaking it down:
git ls-files -z --ignored --exclude-standard | xargs -0 git rm -r --cached
git ls-files -z --ignored --exclude-standard | xargs -0 git stage
git stage .gitignore
git commit -m "new gitignore and remove ignored files from index"
remove ignored files from the index
stage .gitignore and the files you just removed
commit
The BFG is specifically designed for removing unwanted data like big files or passwords from Git repositories, so it has a simple flag that will remove any large historical (not-in-your-current-commit) files: '--strip-blobs-bigger-than'
java -jar bfg.jar --strip-blobs-bigger-than 100M
If you'd like to specify files by name, you can do that too:
java -jar bfg.jar --delete-files *.mp4
The BFG is 10-1000x faster than git filter-branch and is generally much easier to use - check the full usage instructions and examples for more details.
Source: Reduce repository size
If you don't want to use the CLI and are working on Windows, a very simple solution is to use TortoiseGit. It has the "Delete (keep local)" Action in the menu which works fine.
This is no longer an issue in the latest Git (v2.17.1 at the time of writing).
The .gitignore file finally ignores tracked-but-deleted files. You can test this for yourself by running the following script. The final git status statement should report "nothing to commit".
# Create an empty repository
mkdir gitignore-test
cd gitignore-test
git init
# Create a file and commit it
echo "hello" > file
git add file
git commit -m initial
# Add the file to gitignore and commit
echo "file" > .gitignore
git add .gitignore
git commit -m gitignore
# Remove the file and commit
git rm file
git commit -m "removed file"
# Reintroduce the file and check status.
# .gitignore is now respected - status reports "nothing to commit".
echo "hello" > file
git status
This is how I solved my issue:
git filter-branch --tree-filter 'rm -rf path/to/your/file' HEAD
git push
In this, we are basically trying to rewrite the history of that particular file in previous commits also.
For more information, you can refer to the man page of filter-branch here.
Source: Removing sensitive data from a repository - using filter-branch
Source: Git: How to remove a big file wrongly committed
In case of already committed DS_Store:
find . -name .DS_Store -print0 | xargs -0 git rm --ignore-unmatch
Ignore them by:
echo ".DS_Store" >> ~/.gitignore_global
echo "._.DS_Store" >> ~/.gitignore_global
echo "**/.DS_Store" >> ~/.gitignore_global
echo "**/._.DS_Store" >> ~/.gitignore_global
git config --global core.excludesfile ~/.gitignore_global
Finally, make a commit!
Especially for the IDE-based files, I use this:
For instance, for the slnx.sqlite file, I just got rid off it completely like the following:
git rm {PATH_OF_THE_FILE}/slnx.sqlite -f
git commit -m "remove slnx.sqlite"
Just keep that in mind that some of those files store some local user settings and preferences for projects (like what files you had open). So every time you navigate or do some changes in your IDE, that file is changed and therefore it checks it out and show as uncommitted changes.
If anyone is having a hard time on Windows and you want to ignore the entire folder, go to the desired 'folder' on file explorer, right click and do 'Git Bash Here' (Git for Windows should have been installed).
Run this command:
git ls-files -z | xargs -0 git update-index --assume-unchanged
For me, the file was still available in the history and I first needed to squash the commits that added the removed files: https://gist.github.com/patik/b8a9dc5cd356f9f6f980
Combine the commits. The example below combines the last 3 commits
git reset --soft HEAD~3
git commit -m "New message for the combined commit"
Push the squashed commit
If the commits have been pushed to the remote:
git push origin +name-of-branch
In my case here, I had several .lock files in several directories that I needed to remove. I ran the following and it worked without having to go into each directory to remove them:
git rm -r --cached **/*.lock
Doing this went into each folder under the 'root' of where I was at and excluded all files that matched the pattern.

Git Notification of Untracked and Changes files before initial commit

I have a project which contains multiple packages, package_a/ and package_b/. Each package contains a class or two in appropriate sub-directories. I am okay with both of these packages existing on the master branch. However, I run into trouble when I do the following procedure:
I added, committed and pushed package_a/ to the master branch. This was pushed upstream (I believe that is the term) when I pushed by entering git push -u origin master during my push. This is reflected when I log into my github account. Everything is great so far.
I add package_b/. After adding package_b/ but before committing and pushing it, I enter the directory and see a bunch of files that were added, which is to be expected since it contains sub-directories containing classes, and an init.py file. However, I also see a sub-directory that has modified content, hence the changes were not staged for commit.
Essentially:
new file: package_b/__init__.py
new file: package_b/class_b
Changes not staged for commit:
modified: package_b/class_b
How can there be modified content since I have not yet committed anything in package_b? package_a does not import anything from package_b and vice versa, in case that is relevant.
Very new to git here and trying to establish good practices early. I appreciate the feedback!
Git appears as the following:
Peter Altamura#AltamuraDesktop MINGW64 /f/liteSaber (master)
$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
containers/
nothing added to commit but untracked files present (use "git add" to track)
Peter Altamura#AltamuraDesktop MINGW64 /f/liteSaber (master)
$ git checkout -b gd_api
Switched to a new branch 'gd_api'
Peter Altamura#AltamuraDesktop MINGW64 /f/liteSaber (gd_api)
$ git add containers/
Peter Altamura#AltamuraDesktop MINGW64 /f/liteSaber (gd_api)
$ git status
On branch gd_api
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
new file: containers/__init__.py
new file: containers/base_obj.py
new file: containers/game/__init__.py
new file: containers/game/game.py
new file: containers/gameday
new file: containers/player/__init__.py
new file: containers/player/player.py
new file: containers/player/team.py
new file: containers/toolkit/__init__.py
new file: containers/toolkit/reference.py
new file: containers/toolkit/tools.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
(commit or discard the untracked or modified content in submodules)
modified: containers/gameday (modified content, untracked content)
$ find -name .git
./.git
./containers/.git
./containers/gameday/.git
What git status is showing you are the changes between what is in your local directory and what you have staged. So it looks like added class_b to the index with git add (but did not commit), and then changed class_b in some way. So what this is telling you is that if you run git commit, only the contents of class_b at the time you called git add will be committed.
Keep in mind that git add is not just for adding new files to the repo, but also for adding changes to an existing file to the repo (or more accurately, overwriting the existing file in the index with you local copy). It is not the best named command, given what it does.

Put part of a git submodule to different location

I would like to use git submodule to include another Python library into my application. The only problem is, that I need the actual module to go into a special location from which I can load it. I already thought about a script that would simply copy the contents I need from the submodule to the location it needs to be, but I was hoping that there is already an existing solution.
It would be nice if the contents from the submodule I need in a different location would automatically be synced when the submodule is updated. Let me finally give a short illustration to the problem I have:
git submodule add git#github.com:nr-python/nr.async.git vendor/nr.async
cp vendor/nr.async/src libs/nr.async.egg
Now if I update the submodule, I would need to copy the contents again.
cd vendor/nr.async && git pull origin master && cd ../..
rm -r libs/nr.async && cp vendor/nr.async/src libs/nr.async.egg
You can put the worktree for a repo anywhere you like -- and you can put the repo for a worktree anywhere you like.
To configure the worktree for a repo, do
git config core.worktree /path/to/worktree
To configure the repo path for a worktree, put the repo where you want (and named anything you want) and do
echo gitdir: /path/to/repo >/path/to/worktree/.git
You can also override the paths git will find via the filesystem using environment variables GIT_DIR and GIT_WORK_TREE.

Categories

Resources