Gitlab: remove notebooks from language percentage

At the top of a Gitlab project, there is a bar showing the percentage of each language used inside the project.
In my repository I have dozens of large Python files and one little notebook with a few lines of code, but the bar shows that the project contains mostly notebooks. This is not a bug; it is simply because plots in particular generate tons of raw lines in the .ipynb files.
I want to avoid this behavior, e.g. by telling Gitlab not to count the lines of this file. I found some solutions for Github, but not for Gitlab.
NB: I don't want to create an extra repository to host one little notebook, even though it would solve this.

Add to (or create) your .gitattributes file with the following content:
*.ipynb -linguist-detectable
This will tell linguist to ignore these files when calculating the languages. Similar attributes should also work, like linguist-vendored or linguist-generated.
Also note that, per the documentation, changes to the .gitattributes file must be committed to the root of the default branch of the project to take effect.
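If you would rather script the change than edit the file by hand, something like the following could work. It is only a sketch: the helper name ensure_gitattributes_rule is made up for illustration, and it only touches the file on disk. Committing the result to the default branch is still up to you.

```python
from pathlib import Path

# The attribute line from the answer above.
RULE = "*.ipynb -linguist-detectable"


def ensure_gitattributes_rule(repo_root, rule=RULE):
    """Append `rule` to <repo_root>/.gitattributes unless it is already there.

    Hypothetical helper. Returns True if the file was modified, False otherwise.
    """
    path = Path(repo_root) / ".gitattributes"
    # Read existing lines if the file exists, otherwise start from nothing.
    existing = path.read_text().splitlines() if path.exists() else []
    if rule in (line.strip() for line in existing):
        return False
    existing.append(rule)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Calling ensure_gitattributes_rule(".") from the repository root creates the file if needed, and calling it a second time changes nothing.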

Related

What package is causing Atom to reformat my Python files on-save?

Every time I save a Python file on my Atom text editor, it gets automatically reformatted.
Usually this is a good thing, but sometimes it's extremely frustrating, in certain situations where the formatter makes bad choices.
What package is causing this behavior and how can I disable it?

I have confirmed that atom-ide-ui was indeed the culprit. Note that the repo for that project is now archived.

Rename folder in git without changing the contributors

I have a problem: we are using a package that has not been maintained for a while now, so we forked it in order to maintain it ourselves. The package already exists; let's say it is named package_a. Most of the code and the __init__ are in the package_a/ folder.
Now we want to make our own package that will include our maintained code, and we want to name it package_b. So far so good, but the problem is that package_b wants to have the code and the __init__ in a package_b/ folder, and Github changes the contributions for all files when a folder is renamed. I would like credit for contributions to stay where it is due; the 10k+ lines of code didn't just appear in my local repo out of thin air. Any suggestions for how we can have a package named package_b but keep the code in the original folder package_a/?
I am thinking along the lines of trying with some clever way of importing package_a into package_b or something along the line but I hope for a definite answer.

Instead of copying the code or trying to import A into B, extract the common code into a 3rd package which both A and B import. Or perhaps a subclass. This doesn't solve your contribution problem, but it does avoid making a big maintenance hassle by copying and pasting 10,000 lines of code.
Git doesn't record copies and renames, but it can recognize when they happen. To give Git the best chance of recognizing a copy, do only the copy in its own commit. Make no changes to the content. Then in a second commit make any necessary changes to the copied code.
In normal Git you can nudge git log and git blame to honor copies and renames with -C. Git doesn't do this by default because it's more expensive.
Github will do what Github will do.
Regardless of who Github says wrote which line, their contributions will still be in the project history. That's how it goes. You make your contribution and then others put their own work on top of it. This is normal. Their contributions remain in the history.
"History shear" is also normal. That's when a change touches many lines but is otherwise insignificant. For example, if you were to restyle the code, that would cause a history shear: git blame will say that was the last commit to touch the code. git blame -w mitigates this somewhat, and Github has an "ignore whitespace" option. History shear is normal and so is learning to skip over it.
The tools work for us. Don't bend yourself for the benefit of the tools.
If you want to make a special shout out to your contributors, make a contributor's section to your README.md.

How to organize and categorize small projects in GitHub?

I'm new to development and trying to upload small projects that I've worked on to my GitHub profile. These projects are not dependent on each other.
My issue is that some of them are small single-file projects. Sort of like mini challenges that I've solved. So I'm thinking of grouping them together under one repo called "Python programming", for example.
Is this a good practice?
If yes, how should I go about it in Git, and
how can I still have a README file showing up for each mini project.
If no, what would you recommend doing?

GitHub will render a README file for every folder you visit, so when using just a single repository, one solution would be to still create one sub folder for each “subproject” that as such can have its own README file.
But before going that route, you should think about if those small projects actually belong together. That’s ultimately what should decide whether you want to put them all in the same repository or whether you want to split it up into several repositories.
Some things to consider for that decision:
If the projects do not depend on one another, do they still relate to one another? For example, are those projects part of a bigger programming challenge like Project Euler and you’re just collecting all your solutions? Then a single repository might make more sense.
What is the chance for individual projects to grow into bigger things? Many things start very small but can eventually grow into real things that justify their own repository. At that point, you might even get others to contribute.
Does it make sense for those individual files to share a history? Are the files even going to be edited once they are “done”? I.e. is this just a collection of finished things, or are they actually ongoing experiments?
Ultimately, it comes down to your personal choice. But GitHub, as the repository hoster, should not be driving your decision. You should create Git repositories locally as it makes sense to you. If that means you just have a single one, that’s fine. If that means you create lots of them, that’s also fine.
Unfortunately, the GitHub UI is not really made for small one-off projects. The repository list is just too unorganized for that. If you decide to use small projects, I advise you to add some prefix for categorization within your GitHub profile, so you know what each one is about.
A good alternative for one-off projects, especially when it’s just a single file (or a few), is Gists. Gists were born as a way to share code snippets, but under the hood every Gist is actually a full Git repository. Of course, Gists do not offer the tools normal repositories on GitHub have (e.g. issues, pull requests, wikis). But for what you describe, you probably need none of those. Then, Gists are a fine way to share simple things without adding full repositories to your profile. And you can still clone them (the remote URL is git@gist.github.com:/<gist-id>.git) and have a full history and support for multiple files if you need those.

Commonly, you'll see that the top level of the repo contains the README file, maybe a setup.py and some other extraneous information, and perhaps a tests folder. Then there will be a folder that shares a name with the repo. Inside of that folder is the code that's intended to be core content of the module/package/script.
It's also not unusual to see different organization, particularly with very small projects of single-file scripts.
For the specific case you mention, do whatever you like. What you propose sounds totally reasonable to me. I would not want to have a separate repo for all the challenges I solve!
I usually use a gist for trivial items I don't necessarily want to make a repo for, including coding challenges. So I would offer that as an alternative. Do whatever suits you best.

Basics of setting up a Spyder workspace and projects

I have searched for a basic tutorial regarding workspaces and projects in the Spyder IDE. What I want to understand is the basic concepts of how to use the workspace and projects to organize my code. It seems that this is perhaps considered a basic programming skill, and that is the reason why I have trouble finding any kind of overview. This page seems to be related, but is actually about Eclipse and rather sparse. The Pythonxy tutorial and the documentation for Spyder do not go into any detail. Neither does the Anaconda documentation.
The questions I have are:
When should I set up a new workspace (if ever)?
When do I create a new project?
How does the PYTHONPATH depend on my workspace and project settings? Is it the same in all cases or can I customize it per workspace/project?
Are there other settings apart from the PYTHONPATH that I should configure?
How specific are the answers above to Spyder? Would it be the same for other IDEs, like Eclipse?
I am running Spyder on 64-bit Windows 7, as part of the Anaconda package.

Update Oct 2016: Spyder 3 now has project facilities similar to that of other IDEs (especially Rstudio).
Now, if you have a folder with scripts, you can go to
Projects > New Projects > Existing Directory
to import it. The selected directory will be set as the base directory for the project.

I use Spyder for data analysis and I have just started using the project workspace. I believe that it allows you to write better code due to the organization. A previous post stated that "This can be helpful in web development", which is true because web development requires good software engineering due to the complexity of the files and how they interact with each other. This organization/structure can be used in data analysis as well.
Often, data analysts that use Anaconda have an engineering or science background, not necessarily software engineering or computer science. This means that good software engineering principles may be missing (myself included). Setting up a workspace does one critical thing that I believe is missing from the discussion. It adds the workspace to the system path. Set up a project and then try
import sys
print(sys.path)
You will see your project's directory added to the PYTHONPATH. This means I can break up my project and import functions from different files within my project. This is highly beneficial when analysis becomes complex or you want to create some type of larger model that will be used on a regular basis. I can create all of my functions in one file, maybe functions for plots in another, and then import them in a separate script file.
In myScript.py:
from myFunctions import func1
from myFunctions import func2
from myPlots import histPlot
This is a much cleaner approach to data analysis and allows you to focus on one specific task at a time.
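The effect of having the project directory on the path can be reproduced outside Spyder. Here is a minimal sketch, assuming a throwaway temporary directory stands in for the project folder. The names myFunctions and func1 come from the example above, but the body of func1 is invented purely for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A temporary directory stands in for the Spyder project folder (assumption).
project_dir = Path(tempfile.mkdtemp())

# Hypothetical module contents; only the names come from the example above.
(project_dir / "myFunctions.py").write_text(
    "def func1(x):\n"
    "    return x + 1\n"
)

# Putting the directory on sys.path is the step the project setup is
# described as doing for you.
sys.path.insert(0, str(project_dir))
importlib.invalidate_caches()

# The import now resolves because the directory is on sys.path.
from myFunctions import func1

result = func1(41)
print(result)
```

Without the sys.path.insert line, the import would fail with ModuleNotFoundError (ImportError on Python 2), which is the difference the project setup makes.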
In Python 3 there is the %autoreload capability (an IPython magic), so you can work on your functions and then go back to your script file and it will reload them each time if you find errors. I haven't tried this yet because the majority of my work is in 2.7, but this would seem to add even greater flexibility when developing.
So when should you do this? I think it is always a good idea, I just started using this setup and I will never go back!

In my experience, setting up a workspace in Spyder is not always necessary.
A workspace is a space on your computer where you create and save all the files you work in. Workspaces usually help in managing your project files.
Once you create a workspace in Spyder, a pane called "Project Explorer" opens up inside Spyder. There you see in real-time the files of your project. For instance, if you generate a file with Python, it will show in that pane.
The pane lets you keep the files organized, filter them, etc. This can be useful for web development, for example, because it helps you keep your content organized.
I use Python to handle files (e.g. csv) and work with data (data analysis), and I find no use in the workspace feature.
Moreover, if you delete a file in the Project Explorer pane, the file cannot be found in the Windows recycle bin.

One critical piece of information that appears to be missing from the Spyder documentation is how to create a new workspace in the first place. When no workspace exists after installing Spyder, creating your first project automatically initiates the creation of a workspace (at least in the Anaconda 3 distribution). However, it is not as obvious how to create a new workspace when a workspace already exists.
This is the only method I have found for creating a new workspace:
(1) Select the Project explorer window in Spyder. If this window or tab doesn't appear anywhere in the Spyder application, use View > Panes > Project explorer to enable the window.
(2) Click on the folder icon in the upper-right corner of the Project explorer window. This icon brings up a dialog that can create a new workspace. The dialog allows selection of a directory for the .spyderworkspace file.

Multiple directories and/or subdirectories in IPython Notebook session?

The IPython documentation pages suggest that opening several different sessions of IPython notebook is the only way to interact with saved notebooks in different directories or subdirectories, but this is not explicitly confirmed anywhere.
I am facing a situation where I might need to interact with hundreds of different notebooks, which are classified according to different properties and stored in subdirectories of a main directory. I have set that main directory (let's call it /main) in the ipython_notebook_config.py configuration file to be the default directory.
When I launch IPython notebook, indeed it displays any saved notebooks that are within /main (but not saved notebooks within subdirectories within /main).
How can I achieve one single IPython dashboard that shows me the notebooks within /main and also shows subdirectories, lets me expand a subdirectory and choose from its contents, or just shows all notebooks from all subdirectories?
Doing this by launching new instances of IPython every time is completely out of the question.
I'm willing to tinker with source code if I have to for this ability. It's an extremely basic sort of feature, we need it, and it's surprising that it's not just the default IPython behavior. For any amount of saved notebooks over maybe 10 or 15, this feature is necessary.

The IPython documentation pages suggest that opening several different sessions of IPython notebook is the only way to interact with saved notebooks in different directories or subdirectories, but this is not explicitly confirmed anywhere.
Yes, this is a current (temporary) limitation of the Notebook server. Multi-directory support is very high on the notebook todo list (unfortunately that list is long, and devs are few and have day jobs), it is just not there yet. By 0.14 (Fall, probably), you should have no reason to be running more than one nb server, but for now that's the only option for multiple directories. All that is missing for a simple first draft is:
Associating individual notebooks with directories (fairly trivial), and
Web UI for simple filesystem navigation (slightly less trivial).
I'm willing to tinker with source code if I have to for this ability
The limiting factor, if you want to poke around in the source, is the NotebookManager, which is associated with a particular directory. If you tweak the list_notebooks() method to handle subdirectories, you are 90% there.
I was curious about this as well, so I tossed together a quick example here that allows you to at least read/run/edit/save notebooks in subdirs (walk depth is limited to 2, but easy to change). Any new notebooks will be in the top-level dir, and there is no UI for moving them around.
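To give a rough idea of what a depth-limited directory walk looks like, here is a standalone sketch. It is not the NotebookManager code and does not use any IPython API. The function name find_notebooks and its max_depth parameter are hypothetical, and "depth 2" is interpreted here as two directory levels below the top-level directory.

```python
import os


def find_notebooks(root, max_depth=2):
    """Return paths (relative to root) of .ipynb files found at most
    max_depth directory levels below root. Depth 0 is root itself.

    Hypothetical helper for illustration only.
    """
    found = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Emptying dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        else:
            # Skip hidden directories and visit the rest in a stable order.
            dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(".ipynb"):
                found.append(os.path.normpath(os.path.join(rel, name)))
    return found
```

A method like list_notebooks() would presumably need to return something similar, plus a way to map each relative path back to a notebook on disk.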

The interface and architecture design issues for multiple directory support (and more generally for "project" support) in the IPython notebook are important to get right. A design is described in
IPEP 16: Notebook multi directory dashboard and URL mapping
and is being discussed at IPEP 16: Notebook multi directory dashboard and URL mapping · Issue #3166 · ipython/ipython
