Suppose a directory structure like this:
├── parent_1
│   ├── child_1
│   │   ├── sub_child_1
│   │   │   └── file_1.py
│   │   └── file_2.py
│   └── file_3.py
├── parent_2
│   └── child_2
│       └── file_4.py
└── file_5.py
I want to get two arrays:
parents = ["parent_1", "parent_2"]
children = ["child_1", "child_2"]
Note that files and sub_child_1 are not included.
Using suggestions such as this, I can write:
parents = []
children = []
for root, dirs, files in os.walk(path, topdown=True):
    depth = root[len(path) + len(os.path.sep):].count(os.path.sep)
    if depth == 0:
        parents.append(dirs)
    elif depth == 1:
        children.append(dirs)
However, this is a bit wordy and I was wondering if there is a cleaner way of doing this.
Update 1
I also tried a listdir-based approach:
from os import listdir
from os.path import isdir, join

parents = [f for f in listdir(root) if isdir(join(root, f))]
children = []
for p in parents:
    children.append([f for f in listdir(join(root, p)) if isdir(join(root, p, f))])
You can clear, in place, the list of subdirectories that os.walk yields, which prevents it from traversing deeper once it has reached your desired depth:
for root, dirs, _ in os.walk(path, topdown=True):
    if root == path:
        continue
    parents.append(root)
    children.extend(dirs)
    dirs.clear()
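If you want just the directory names shown in the question rather than full paths, a minimal variation of the same idea (a sketch, assuming the same path variable) is:

import os

parents, children = [], []
for root, dirs, _ in os.walk(path, topdown=True):
    if root == path:
        parents.extend(dirs)   # top-level directories: parent_1, parent_2
    else:
        children.extend(dirs)  # their immediate subdirectories: child_1, child_2
        dirs.clear()           # stop os.walk from descending any further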
I am creating a cookiecutter template and would like to add a folder (and the files it contains) only if a variable has a given value. For example cookiecutter.json:
{
    "project_slug": "project_folder",
    "i_want_this_folder": ["y", "n"]
}
and my template structure looks like:
template
└── {{ cookiecutter.project_slug }}
├── config.ini
├── data
│ └── data.csv
├── {% if cookiecutter.i_want_this_folder == 'y' %}my_folder{% endif %}
└── some_files
However, when running cookiecutter template and choosing 'n', I get an error:
Error: "~/project_folder" directory already exists
Is my syntax for the folder name correct?
I was facing the same issue: I wanted the option to add or omit folders with different contents (and all folders can also exist at the same time). The structure of the project is the following:
├── {{cookiecutter.project_slug}}
│   │
│   ├── folder_1_to_add_or_no
│   │   ├── file1.py
│   │   ├── file2.py
│   │   └── file3.txt
│   │
│   ├── folder_2_to_add_or_no
│   │   ├── image.png
│   │   ├── data.csv
│   │   └── file.txt
│   │
│   └── folder_3_to_add_or_no
│       ├── file1.py
│       └── some_dir
│
├── hooks
│   └── post_gen_project.py
│
└── cookiecutter.json
where the cookiecutter.json contains the following
{
    "project_owner": "some-name",
    "project_slug": "some-project",
    "add_folder_one": ["yes", "no"],
    "add_folder_two": ["yes", "no"],
    "add_folder_three": ["yes", "no"]
}
Since each folder_X_to_add_or_no directory contains different files, the trick is to remove the folders for which the answer is "no". You can do this through a hook, inside the post_gen_project.py file:
# post_gen_project.py
import os
import shutil
from pathlib import Path

# Current path (the hook runs inside the generated project directory)
path = Path(os.getcwd())
# Source path
parent_path = path.parent.absolute()


def remove(filepath):
    if os.path.isfile(filepath):
        os.remove(filepath)
    elif os.path.isdir(filepath):
        shutil.rmtree(filepath)


# Map each folder to the cookiecutter answer that controls it; the hook file
# itself is rendered by Jinja, so the placeholders below are replaced with the
# user's "yes"/"no" choices before the script runs.
folders_to_add = {
    'folder_1_to_add_or_no': '{{cookiecutter.add_folder_one}}',
    'folder_2_to_add_or_no': '{{cookiecutter.add_folder_two}}',
    'folder_3_to_add_or_no': '{{cookiecutter.add_folder_three}}',
}

for folder, answer in folders_to_add.items():
    # Check if the user wants the folder
    add_folder = answer == 'yes'
    # The user does not want the folder, so remove it
    if not add_folder:
        folder_path = os.path.join(
            parent_path,
            '{{cookiecutter.project_slug}}',
            folder
        )
        remove(folder_path)
Now the folders the user chose not to add will be removed. When generating the project, cookiecutter prompts for each choice:
Select add_folder_one:
1 - yes
2 - no
Choose from 1, 2 [1]:
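To make the Jinja step concrete: cookiecutter renders the hook file itself before executing it, so after rendering, the dict in post_gen_project.py contains the literal answers. For example, if the user answered yes / no / yes (a hypothetical run), the script effectively executes with:

folders_to_add = {
    'folder_1_to_add_or_no': 'yes',
    'folder_2_to_add_or_no': 'no',
    'folder_3_to_add_or_no': 'yes',
}

and only folder_2_to_add_or_no is removed.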
References
This answer is based on briancapello's answer in this GitHub issue.
This is my project directory
├── feed
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── __init__.pyc
│   ├── migrations
│   │   ├── 0001_initial.py
│   │   ├── 0002_auto_20180722_1431.py
│   │   │
│   ├── models.py
│   ├── __pycache__
│   │
│   ├── templates
│   │   └── feed
│   │       ├── base.html
│   │       ├── footer.html
│   │       ├── header.html
│   │       ├── hindustantimes.html
│   │       ├── index.html
│   │       ├── ndtv.html
│   │       ├── News_Home.html
│   │       ├── republic.html
│   │       └── tredns.html
│   ├── tests.py
│   ├── twitter
│   │   ├── __init__.py
│   │   ├── __pycache__
│   │   │   └── twitter_credentials.cpython-36.pyc
│   │   ├── trends.py
│   │   ├── tweets.py
│   │   └── twitter_credentials.py
│   ├── urls.py
│   └── views.py
├── __init__.py
├── manage.py
├── os
├── pironews
│   ├── __init__.py
│   │
│   ├── __pycache__
│   │   ├── __init__.cpython-36.pyc
│   │   ├── settings.cpython-36.pyc
│   │   ├── urls.cpython-36.pyc
│   │   └── wsgi.cpython-36.pyc
│   ├── settings.py
│   │
│   ├── urls.py
│   └── wsgi.py
└── __pycache__
I want to import the trends.py file in my views. I am doing the following in the views file:
from django.http import HttpResponse

from pironews.feed.twitter.trends import main

def twitter_trend(request):
    output = main()
    print(output)
    return HttpResponse(output, content_type='text/plain')
This is my trends.py file
from __future__ import print_function
import sys # used for the storage class
import requests
import pycurl # used for curling
import base64 # used for encoding string
import urllib.parse # used for enconding
from io import StringIO  # used for curl buffer grabbing
import io
import re
import json # used for decoding json token
import time # used for stuff to do with the rate limiting
from time import sleep # used for rate limiting
from time import gmtime, strftime # used for gathering time
import twitter_credentials
OAUTH2_TOKEN = 'https://api.twitter.com/oauth2/token'
class Storage:
    def __init__(self):
        self.contents = ''
        self.line = 0

    def store(self, buf):
        self.line = self.line + 1
        self.contents = "%s%i: %s" % (self.contents, self.line, buf)

    def __str__(self):
        return self.contents
def getYear():
    return strftime("%Y", gmtime())

def getMonth():
    return strftime("%m", gmtime())

def getDay():
    return strftime("%d", gmtime())

def getHour():
    return strftime("%H", gmtime())

def getMinute():
    return strftime("%M", gmtime())

def generateFileName():
    return getYear() + "-" + getMonth() + "-" + getDay() + ""
# grabs the rate limit remaining from the headers
def grab_rate_limit_remaining(headers):
    limit = ''
    h = str(headers).split('\n')
    for line in h:
        if 'x-rate-limit-remaining:' in line:
            limit = line[28:-1]
    return limit

# grabs the time the rate limit expires
def grab_rate_limit_time(headers):
    x_time = ''
    h = str(headers).split('\n')
    for line in h:
        if 'x-rate-limit-reset:' in line:
            x_time = line[24:-1]
    return x_time
# obtains the bearer token
def get_bearer_token(consumer_key, consumer_secret):
    # encode consumer key
    consumer_key = urllib.parse.quote(consumer_key)
    # encode consumer secret
    consumer_secret = urllib.parse.quote(consumer_secret)
    # print(type(consumer_secret))
    # create bearer token
    bearer_token = consumer_key + ':' + consumer_secret
    # base64 encode the token
    base64_encoded_bearer_token = base64.b64encode(bearer_token.encode('utf-8'))
    # set headers
    headers = {
        "Authorization": "Basic " + base64_encoded_bearer_token.decode('utf-8') + "",
        "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
        "Content-Length": "29"}
    response = requests.post(OAUTH2_TOKEN, headers=headers, data={'grant_type': 'client_credentials'})
    to_json = response.json()
    return to_json['access_token']
def grab_a_tweet(bearer_token, tweet_id):
    # url
    url = "https://api.twitter.com/1.1/trends/place.json"
    formed_url = '?id=' + tweet_id + '&result_type=popular'  # include_entities=true
    headers = [
        str("GET /1.1/statuses/show.json" + formed_url + " HTTP/1.1"),
        str("Host: api.twitter.com"),
        str("User-Agent: jonhurlock Twitter Application-only OAuth App Python v.1"),
        str("Authorization: Bearer " + bearer_token + "")
    ]
    buf = io.BytesIO()
    tweet = ''
    retrieved_headers = Storage()
    pycurl_connect = pycurl.Curl()
    pycurl_connect.setopt(pycurl_connect.URL, url + formed_url)  # used to tell which url to go to
    pycurl_connect.setopt(pycurl_connect.WRITEFUNCTION, buf.write)  # used for generating output
    pycurl_connect.setopt(pycurl_connect.HTTPHEADER, headers)  # sends the custom headers above
    pycurl_connect.setopt(pycurl_connect.HEADERFUNCTION, retrieved_headers.store)
    # pycurl_connect.setopt(pycurl_connect.VERBOSE, True)  # used for debugging, really helpful if you want to see what happens
    pycurl_connect.perform()  # perform the curl
    tweet += buf.getvalue().decode('UTF-8')  # grab the data
    pycurl_connect.close()  # stop the curl
    # print(retrieved_headers)
    pings_left = grab_rate_limit_remaining(retrieved_headers)
    reset_time = grab_rate_limit_time(retrieved_headers)
    current_time = time.mktime(time.gmtime())
    return {'tweet': tweet, '_current_time': current_time, '_reset_time': reset_time, '_pings_left': pings_left}
def main():
    consumer_key = twitter_credentials.CONSUMER_KEY  # put your app's consumer key here
    consumer_secret = twitter_credentials.CONSUMER_SECRET  # put your app's consumer secret here
    bearer_token = get_bearer_token(consumer_key, consumer_secret)
    tweet = grab_a_tweet(bearer_token, '23424848')  # grabs a single tweet & some extra bits
    print(type(tweet['tweet']))
    print(tweet['_current_time'])
    json_obj = json.loads(tweet['tweet'])
    for i in json_obj:
        for j in i['trends']:
            print(j['name'])
But when I run the server I am getting the following error:
PiroProject/pironews/feed/views.py", line 20, in <module>
from pironews.feed.twitter.trends import main
ModuleNotFoundError: No module named 'pironews.feed'
What should I do now?
There are two things which need to be done.
You need to do from .trends import main in the __init__.py file of the twitter folder.
You can then import it in the views.py file with from .twitter import main.
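Put together, a minimal sketch of those two changes (assuming the layout shown above) would be:

# feed/twitter/__init__.py
from .trends import main

# feed/views.py
from django.http import HttpResponse
from .twitter import main

def twitter_trend(request):
    output = main()
    return HttpResponse(output, content_type='text/plain')

Note that trends.py's own import twitter_credentials line may then also need to become from . import twitter_credentials, since Python 3 does not perform implicit relative imports.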
Try changing from pironews.feed.twitter.trends import main to from twitter.trends import main.
I have the following dir structure:
root
└── env
    ├── team_1
    │   ├── policies
    │   │   └── file.yaml
    │   └── roles
    └── team_2
        ├── policies
        └── roles
and I need to read all the files under a team directory and merge them to create one unique file.
This is my attempt:
env_path = os.path.join('root', env)
if os.path.exists(env_path):
    for team_dir in os.listdir(env_path):
        for root, dirs, files in os.walk(team_dir):
            print(root, dirs, files)
The problem is that os.walk doesn't return anything when I pass team_dir. I should use os.path.join(env_path, team_dir), but at that point it returns the entire tree, which I don't want. How can I get os.walk to return only the subdirectories of a given subdirectory?
You have to use os.path.join(env_path, team_dir) or os.walk won't find anything.
But if you don't want the full hierarchy in the output, just remove the start of each path string:
for team_dir in os.listdir(env_path):
    for root, dirs, files in os.walk(os.path.join(env_path, team_dir)):
        for f in files + dirs:
            print(os.path.join(root, f)[len(env_path) + 1:])  # strip start of path + separator
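Building on that, here is a minimal sketch of the stated goal of reading every file under each team directory and merging them into one file. It assumes plain-text concatenation is enough, and the output name team_X_merged.yaml is hypothetical, not from the original post:

import os

env_path = os.path.join('root', 'env')  # assuming env == 'env', as in the tree above

for team_dir in os.listdir(env_path):
    team_path = os.path.join(env_path, team_dir)
    if not os.path.isdir(team_path):
        continue
    parts = []
    for root, dirs, files in os.walk(team_path):
        for name in sorted(files):
            with open(os.path.join(root, name)) as fh:
                parts.append(fh.read())
    # hypothetical output file, one per team, written next to the team directories
    with open(os.path.join(env_path, team_dir + '_merged.yaml'), 'w') as out:
        out.write('\n'.join(parts))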
So, I found a tutorial on creating a file browser using Gtk.TreeView, but I'm facing a problem: when I select a file inside a folder I can't get the file's full path. I can get the model path, but I don't know what to do with it.
This is my project tree:
.
├── browser
│   ├── index.html
│   └── markdown.min.js
├── compiler.py
├── ide-preview.png
├── __init__.py
├── main.py
├── __pycache__
│   ├── compiler.cpython-35.pyc
│   └── welcomeWindow.cpython-35.pyc
├── pyide-settings.json
├── README.md
├── resources
│   └── icons
│       ├── git-branch.svg
│       ├── git-branch-uptodate.svg
│       └── git-branch-waitforcommit.svg
├── test.py
├── WelcomeWindow.glade
└── welcomeWindow.py
When I click on main.py the path is 4, but if I click on browser/markdown.min.js I get 0:1.
In my code I check whether the path's length (I split the path by ':') is bigger than 1; if not, I open the file normally, and if it is... this is where I'm stuck. Can anyone help?
Here is my TreeSelection 'changed' handler function:
def onRowActivated(self, selection):
    # print(path.to_string())  # Might do the job...
    model, row = selection.get_selected()
    if row is not None:
        # print(model[row][0])
        path = model.get_path(row).to_string()
        pathArr = path.split(':')
        fileFullPath = ''
        if not os.path.isdir(os.path.realpath(model[row][0])):
            # self.openFile(os.path.realpath(model[row][0]))
            if len(pathArr) <= 1:
                self.openFile(os.path.realpath(model[row][0]))
            else:
                # Don't know what to do!
                self.languageLbl.set_text('Language: {}'.format(self.sbuff.get_language().get_name()))
    else:
        print('None')
Full code is available at https://github.com/raggesilver/PyIDE/blob/master/main.py
Edit 1: Just to be more specific, my problem is that when I get the name of the file from the TreeView, I can't get the path before it, so I get index.html instead of browser/index.html.
I found a solution to my problem: the logic is to walk from the selected row up through its parents (e.g. for a model path like 4:3:5:0) and prepend each parent's name to the path variable. So we have:
def onRowActivated(self, selection):
    model, row = selection.get_selected()
    if row is not None:
        fullPath = model[row][0]
        cur = model.iter_parent(row)
        while cur is not None:
            fullPath = os.path.join(model[cur][0], fullPath)
            cur = model.iter_parent(cur)
        # do whatever with fullPath
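Since fullPath is relative to whatever directory the TreeStore was populated from, it still has to be joined with that base directory before the file can be opened. A small hedged sketch of that last step, meant to replace the "do whatever with fullPath" comment; rootDir is a hypothetical variable holding the base path and openFile is the same helper used in the question:

        # rootDir is hypothetical: the directory the tree view was built from
        absPath = os.path.join(rootDir, fullPath)
        if os.path.isfile(absPath):
            self.openFile(absPath)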
I have a dir structure like the following:
[me#mypc]$ tree .
.
├── set01
│ ├── 01
│ │ ├── p1-001a.png
│ │ ├── p1-001b.png
│ │ ├── p1-001c.png
│ │ ├── p1-001d.png
│ │ └── p1-001e.png
│ ├── 02
│ │ ├── p2-001a.png
│ │ ├── p2-001b.png
│ │ ├── p2-001c.png
│ │ ├── p2-001d.png
│ │ └── p2-001e.png
I would like to write a Python script to rename all *a.png to 01.png, *b.png to 02.png, and so on. First I guess I have to use something similar to find . -name '*.png', and the closest thing I found in Python was os.walk. However, with os.walk I have to check every file, and if it's a png, concatenate it with its root, which is somehow not that elegant. I was wondering if there is a better way to do this? Thanks in advance.
For a search pattern like that, you can probably get away with glob.
from glob import glob
paths = glob('set01/*/*.png')
You can use os.walk to traverse the directory tree.
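For the a -> 01, b -> 02 renaming itself, a minimal sketch on top of that glob pattern (assuming the character just before ".png" is always the single letter a, b, c, ...) could be:

import os
from glob import glob

for path in glob('set01/*/*.png'):
    dirname, fname = os.path.split(path)
    letter = fname[-5]                                    # character just before ".png", e.g. 'a'
    new_name = '%02d.png' % (ord(letter) - ord('a') + 1)  # 'a' -> 01.png, 'b' -> 02.png, ...
    # os.rename(path, os.path.join(dirname, new_name))    # uncomment to actually rename
    print('mv %s %s' % (path, os.path.join(dirname, new_name)))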
Maybe this works?
import os

for dpath, dnames, fnames in os.walk("."):
    for i, fname in enumerate([os.path.join(dpath, fname) for fname in fnames]):
        if fname.endswith(".png"):
            # os.rename(fname, os.path.join(dpath, "%04d.png" % i))
            print("mv %s %s" % (fname, os.path.join(dpath, "%04d.png" % i)))
For Python 3.4+ you may want to use pathlib's Path.glob() with a recursive pattern (e.g., **/*.png), or Path.rglob(), instead (a short sketch follows the links below):
Recursively iterate through all subdirectories using pathlib
https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
https://docs.python.org/3/library/pathlib.html#pathlib.Path.rglob
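For example, a short sketch of that recursive pattern; the two lines produce the same set of files:

from pathlib import Path

pngs = sorted(Path('.').glob('**/*.png'))  # '**' matches this directory and all subdirectories
same = sorted(Path('.').rglob('*.png'))    # rglob('*.png') is equivalent to glob('**/*.png')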
Check out genfind.py from David M. Beazley.
# genfind.py
#
# A function that generates files that match a given filename pattern
import os
import fnmatch

def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

# Example use
if __name__ == '__main__':
    lognames = gen_find("access-log*", "www")
    for name in lognames:
        print(name)
These days, pathlib is a convenient option.
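For instance, a rough pathlib-based equivalent of gen_find (a sketch, not part of the original recipe):

from pathlib import Path

def gen_find(filepat, top):
    # Path.rglob does the recursive walk and the pattern matching in one step
    for path in Path(top).rglob(filepat):
        yield str(path)

# Example use
for name in gen_find("access-log*", "www"):
    print(name)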