Naming a list in python using a String name - python

Does anyone know of a way to name a list in Python using a string? I am writing a script that iterates through a directory, parses each file, and generates a list with the contents of each file. I would like to use the filename to name each list. I was wondering if there is a way to do this, similar to the exec() method, but for lists instead of just a normal variable.

If you really want to do it this way, then for instance:
import os
directory = os.getcwd()  # current directory, or any other you would like to specify
for name in os.listdir(directory):
    globals()[name] = []
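Each of the lists can now be looked up by the name of its file, for example globals()['file1.txt']; a name that contains a dot is not a valid identifier, so it cannot be written as a bare variable. Of course, this is a suboptimal approach. Normally you should use other data structures, such as dictionaries, for this task.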

You would be better off using a dictionary. Store the file name as the key and the contents of the file as the corresponding value.
For example:
my_dict = {'file1.txt': 'This is the content of file1', 'file2.txt': 'This is the content of file2'}
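A minimal sketch of the dictionary approach, assuming every file in the directory is UTF-8 text and that one list entry per line is what you want. The function name read_files_into_lists is hypothetical, not something from the question.

```python
import os

def read_files_into_lists(directory):
    """Map each file name in `directory` to a list of that file's lines."""
    contents = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # Assumption: the files are UTF-8 text
        with open(path, encoding="utf8") as fp:
            contents[name] = fp.read().splitlines()
    return contents
```

After lists = read_files_into_lists(some_directory), the list for a given file is lists['file1.txt'], and the file names never touch the module's namespace.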

Related

Python: How to read variable file names in a loop

I have various JSON files on my drive that I would like to process in a loop, but how can I read them in a loop?
Basically I have a list where all filenames are included and all files are also in one folder.
The objective is to create lists out of the JSON files in a loop.
TestList = ["cats", "dogs"]
for i in TestList:
    with open ("{i}.json") as {i}_file:
        print({i}_file)
Unfortunately I get syntax errors no matter how I try it.
Thank you so much in advance for your support!
Use:
TestList = ["cats", "dogs"]
for i in TestList:
    with open(f"{i}.json") as fp:
        print(fp.read())
First, "{i}.json" needs the prefix f to make it an f-string; without the prefix, the braces are taken literally.
Second, {i}_file cannot be evaluated dynamically to create variables named cats_file and dogs_file. The name after as has to be a fixed identifier.
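If the goal is one object per JSON file, a dictionary keyed by name can stand in for the cats_file and dogs_file variables. This is a sketch under the assumption that each file holds valid JSON; load_json_files and the folder parameter are hypothetical names introduced for illustration.

```python
import json
import os

def load_json_files(names, folder="."):
    """Return {name: parsed JSON} for each `<name>.json` in `folder`."""
    data = {}
    for name in names:
        path = os.path.join(folder, f"{name}.json")
        with open(path, encoding="utf8") as fp:
            data[name] = json.load(fp)
    return data
```

With TestList from the question, data = load_json_files(TestList) gives data["cats"] and data["dogs"].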

Use, in another project, a variable saved in a project (python)

I have a project named project1, where I take a big .txt file (around 1 GB). I make a list that has each line of the text as elements, with the following code:
txt = open('<path>', 'r', encoding="utf8")
lista = list(txt)
And then I edit the items in the list, which is not important for my question.
I need to use the variable lista in another project (project2), but I don't want to import it in the following way
from project1 import lista
because by doing that I have to run all the code in project1 to get the text in the .txt file and to edit the list.
So my goal is to use lista without having to run code that takes time, since lista will always be the same.
IMPORTANT NOTES
I can't just print it in project1, copy the output and paste it in project2 to use it as a variable, because the list is way too long.
One way I thought about that was to save lista as a string in a .txt file (let's call it lista.txt), open the .txt file in project2 and, in some way, tell python that the string in lista.txt is actually a list. Example to understand better:
In project 1
file_text = open('<path>\\lista.txt', 'w', encoding="utf8")
lista = ['<string_1>', '<string_2>', ..., '<string_n>']
file_text.write(f'{lista}')
file_text.close()
In project 2
file_text = open('<path>\\lista.txt', 'r', encoding="utf8")
list_as_string = file_text.read()
def string_to_list(input_string):
    #way to transform the list_as_string into the original "lista" variable, which is a list
    #return list
string_to_list(list_as_string)
IMPORTANT: The way that I described looks too complex to me, so it was just an idea, but I'm sure there are better ways (maybe there is a way to save a Python variable as a file that keeps information like its type, and then directly import it in a project as a variable of that type, in this case a list).
Note that list(txt) and txt.readlines() both return the lines of the file, so either works here. To keep the list between runs, use json or pickle: dump saves an object to an open file (so you could save the list to a file), while dumps returns the string (json) or bytes (pickle) that would have been written; load and loads restore the object from the corresponding dump. Personally, I would just rebuild the list from the file's path, or wrap the slow code in project1 in a function so that it does not run on import.
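A minimal sketch of the dump/load round trip with pickle, assuming lista is a plain list of strings. save_list and load_list are hypothetical helper names. The same pattern works with json.dump and json.load on files opened in text mode.

```python
import pickle

def save_list(lista, path):
    """Write the list to `path` once, from project1."""
    with open(path, "wb") as fp:  # pickle needs binary mode
        pickle.dump(lista, fp)

def load_list(path):
    """Read the list back in project2 without re-running project1."""
    with open(path, "rb") as fp:
        return pickle.load(fp)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code. For a list of strings, json is a safe alternative and stays human-readable.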

Looping through files using lists

I have a folder with pseudo directory (/usr/folder/) of files that look like this:
target_07750_20181128.tsv.gz
target_07750_20181129.tsv.gz
target_07751_20181130.tsv.gz
target_07751_20181203.tsv.gz
target_07751_20181204.tsv.gz
target_27103_20181128.tsv.gz
target_27103_20181129.tsv.gz
target_27103_20181130.tsv.gz
I am trying to join the above tsv files to one xlsx file on store code (found in the file names above).
I read, say, file.xlsx in as a pandas DataFrame.
I have extracted store codes from file.xlsx so I have the following:
stores = instore.store_code.astype(str).unique()
output:
07750
07751
27103
So my end goal is to loop through each store in stores and find which filename that corresponds to in directory. Here is what I have so far but I can't seem to get the proper filename to print:
import os
for store in stores:
    print(store)
    if store in os.listdir('/usr/folder/'):
        print(os.listdir('/usr/folder/'))
The output I'm expecting to see for say store_code in loop = '07750' would be:
07750
target_07750_20181128.tsv.gz
target_07750_20181129.tsv.gz
Instead I'm only seeing the store codes returned:
07750
07751
27103
What am I doing wrong here?
The reason your if statement fails is that it checks if "07750" etc is one of the filenames in the directory, which it is not. What you want is to see if "07750" is contained in one of the filenames.
I'd go about it like this:
import os
from collections import defaultdict
store_files = defaultdict(list)
for filename in os.listdir('/usr/folder/'):
    # assumes names look like target_<store>_<date>.tsv.gz, as in the listing above
    store_number = filename.split('_')[1]
    store_files[store_number].append(filename)
Now store_files will be a dictionary with a list of filenames for each store number.
The problem is that you're assuming a substring search, and that's not how in works on a list. For instance, on the first iteration, your if looks like this:
if "07750" in ["target_07750_20181128.tsv.gz",
               "target_07750_20181129.tsv.gz",
               "target_07751_20181130.tsv.gz",
               ... ]:
The string "07750" is not an element of that list. It does appear as a substring of some elements, but in doesn't work that way on a list. Instead, try this inside your loop over stores:
for filename in os.listdir('/usr/folder/'):
    if '_' + store + '_' in filename:
        print(filename)
Does that help?
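The difference between list membership and substring matching can be checked without touching the disk. The sketch below uses a few of the file names from the question as an in-memory list; files_for_store is a hypothetical helper name.

```python
filenames = [
    "target_07750_20181128.tsv.gz",
    "target_07750_20181129.tsv.gz",
    "target_07751_20181130.tsv.gz",
    "target_27103_20181128.tsv.gz",
]

# `in` on a list compares whole elements, so this is False
is_element = "07750" in filenames

# `in` on a string is a substring test, so this is True
is_substring = any("07750" in name for name in filenames)

def files_for_store(store, names):
    """Return the names that contain `_<store>_`."""
    return [name for name in names if f"_{store}_" in name]

for name in files_for_store("07750", filenames):
    print(name)
```

Wrapping the store code in underscores keeps a code such as 20181 from matching the date part of a file name.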

Python ftplib: How to store results of `FTP.retrlines` in a list?

I'd like to retrieve the file names in a directory, and I use the method FTP.retrlines('NLST ' + path).
It prints all the file names in the directory path. But I want to store those file names in a container, e.g. a list, instead of printing them to the console. How can I do that?
The second (optional) argument to FTP.retrlines is a callback that is called once for each line received.
FTP.retrlines(command[, callback])
You can use it like:
lines = []
sess.retrlines('NLST ' + path, lines.append)
See also Creating list from retrlines in Python.
You can also use the FTP.nlst() method, which returns the file names as a list. Here ftp is a connected FTP instance:
>>> ftp.nlst('path')
['x','y','z']
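The callback approach can be wrapped in a small helper. This is a sketch, not a tested client: list_names is a hypothetical name, and the host in the commented usage is a placeholder.

```python
from ftplib import FTP

def list_names(ftp, path):
    """Collect the lines returned by NLST into a list instead of printing them."""
    names = []
    # retrlines calls the callback once per line, with the line ending stripped
    ftp.retrlines("NLST " + path, names.append)
    return names

# Usage, assuming an anonymous-login server at a placeholder host:
# with FTP("ftp.example.com") as ftp:
#     ftp.login()
#     print(list_names(ftp, "/pub"))
```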

Use of global dictionary to create variables

I need to create an unknown number of Python variables, based on a list of files in a folder.
I found that I could use the global dictionary to create and initialize those variables:
# libraries import
import os.path
import glob
import numpy as np
# list of all the text files in the folder
list = glob.glob("*.txt")
# creation of the variables based on the name of each file
for file in list:
    shortname = os.path.splitext(file)[0]
    globals()[shortname] = np.loadtxt(file)
However, I was wondering if it was a good practice to access the global dictionary for variable assignment in python (when we do not know the number and name of the variables in advance) or if there was an alternative method preferable.
You should use a dedicated dictionary for this:
files = {f: np.loadtxt(f) for f in glob.glob("*.txt")}
Generally, you should not mix data with variable or attribute names. Your code could silently shadow any Python built-in or existing global name if a file with the same name exists.
No, you probably shouldn't be using globals for this. Instead, create a dictionary or class and store the values in that.
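To keep the short names from the question (file name without extension) while staying out of globals(), the dictionary can be keyed by the stem. A sketch, with load_by_short_name as a hypothetical helper; the loader is passed in so that np.loadtxt or any other reader can be used.

```python
import glob
import os

def load_by_short_name(pattern, loader):
    """Return {file name without extension: loader(path)} for matching files."""
    data = {}
    for path in glob.glob(pattern):
        short = os.path.splitext(os.path.basename(path))[0]
        data[short] = loader(path)
    return data
```

With numpy imported as np, arrays = load_by_short_name("*.txt", np.loadtxt) gives the same data as the globals() version, reachable as arrays["shortname"]. A file called list.txt then becomes arrays["list"] and leaves the built-in alone.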
