Read paths in csv and open in unix - python

I need to read paths from a csv and then go to these file paths on Unix. I am wondering if there is a way to do that using Unix or Python commands. I am unsure how to proceed and have not been able to find many resources on the net either.
excel.csv has a large number of rows, similar to the sample below. I need to open excel.csv, read the first line, and go to that file path. Once the file at that path is opened, I need to read it and extract certain information. I tried using Python for this but was unable to find much information, so I am wondering if Unix commands could solve this instead. I am clueless about how to proceed here, so I would appreciate any reference or help using either Python or Unix commands. Thank you!
/folder/file1
/folder/file2
/folder/file3

It shouldn't be very difficult to do this in Python, as reading csv files is part of the standard library. Your code could look something like this:
import csv

with open('data.csv', newline='') as fh:
    # depending on whether the first row is a header or not,
    # you can also use the plain csv.reader here
    for row in csv.DictReader(fh, strict=True):
        # again, with the plain csv.reader you have to access
        # the column by index instead
        file_path = row['path']
        with open(file_path, 'r') as fh2:
            data = fh2.read()
            # do something with data
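Since the sample in the question has no header row, here is a minimal sketch of the csv.reader variant the comments mention, assuming the paths sit in the first column of excel.csv:
import csv

with open('excel.csv', newline='') as fh:
    for row in csv.reader(fh):
        if not row:                 # skip blank lines
            continue
        file_path = row[0]          # the sample has a single column
        with open(file_path) as fh2:
            data = fh2.read()
            # extract the information you need from data here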

Related

Archive files directly from memory in Python

I'm writing a program where I take a number of files and zip them with encryption using pyzipper, writing them to io.BytesIO() objects so that they stay in memory. Now, after some other additions, I want to take all of these in-memory files and zip them together into a single encrypted zip file using the same pyzipper.
The code looks something like this:
import pyzipper
from io import BytesIO

# Create the in-memory file object
in_memory = BytesIO()

# Create the zip file and open it in write mode
with pyzipper.AESZipFile(in_memory, "w",
                         compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zip_file:
    # Set the password
    zip_file.setpassword(b"password")
    # Save "data" under file_name (both come from the surrounding
    # function, omitted here)
    zip_file.writestr(file_name, data)

# Go back to the beginning of the buffer
in_memory.seek(0)
# Read the zip file data
data = in_memory.read()
# Add the data to a list
files.append(data)
So, as you may guess the "files" list is an attribute from a class and the whole thing above is a function that does this a number of times and then you get the full files list. For simplicity's sake, I removed most of the irrelevant parts.
I get no errors for now, but when I try to write all files to a new zip file I get an error. Here's the code:
with pyzipper.AESZipFile(test_name, "w",
                         compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for file in files:
        zfile.write(file)
I get a ValueError because of os.stat:
File "C:\Users\vulka\AppData\Local\Programs\Python\Python310\lib\site-packages\pyzipper\zipfile.py", line 820, in from_file
st = os.stat(filename)
ValueError: stat: embedded null character in path
[WHAT I TRIED]
So, I tried using mmap for this, but I don't think it can help me, and if it can, I have no idea how to make it work.
I also tried using fs.memoryfs.MemoryFS to temporarily create a virtual filesystem in memory to store all the files, then get them back to zip everything together and save it to disk. Again, that failed: I got tons of different errors in my tests and, to be honest, there's very little information out there on this fs method, so even if what I'm trying to do is possible, I couldn't figure it out.
P.S: I don't know if pyzipper (almost 1:1 with zipfile, plus encryption) supports nested zip files at all. This could be the problem I'm facing, but if it doesn't, I'm open to any suggestions for a new approach. Also, I don't want to rely on 3rd party software, even if it is open source! (I'm talking about the method of using 7zip to do all the archiving and encryption, even though it shouldn't even be possible to use it without saving the files to disk first, which is the main thing I'm trying to avoid.)
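The traceback itself hints at the cause: zfile.write() expects a filesystem path, so passing the raw zip bytes makes os.stat() trip over the null bytes inside them. A sketch of the writestr() alternative, which takes an archive name and the bytes directly (the member names below are illustrative, not from the question):
import pyzipper

with pyzipper.AESZipFile(test_name, "w",
                         compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zfile:
    zfile.setpassword(b"pass")
    for i, file_data in enumerate(files):
        # write each in-memory zip's bytes under an illustrative name
        zfile.writestr("nested_%d.zip" % i, file_data)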

Python Parse GitHub File

I am looking for more information regarding parsing files in Python, specifically from someone's GitHub. For instance, if Person A has a GitHub account with a file with the contents:
name = Person A
my script = scriptA.sh
script output = yarn output
my other files = fileA, fileB
I would want to be able to access this information and store it with my Python script.
This is not for a class or anything, so I am struggling to find good, clear, beginner-level information covering concepts like this. Does anyone have any advice here? I have a basic understanding of parsing and Python, but I want to build on it. Here is some pseudocode I am using to brainstorm.
def parseAddFile(filename):
    with open(filename, 'r') as file:
        lines = file.readlines()  # this reads all the lines of the file
    dict_of_contents = {}  # this will then hold the contents
    for line in lines:
        # split each "key = value" line at the first '='
        key, _, value = line.partition('=')
        dict_of_contents[key.strip()] = value.strip()
    return dict_of_contents
Just looking to grow my knowledge, not asking for answers only.
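On the fetching side, a minimal sketch, assuming the file is reachable as plain text at a raw.githubusercontent.com URL (the URL below is made up) and uses the key = value layout shown above:
from urllib.request import urlopen

# illustrative URL; raw.githubusercontent.com serves file contents as plain text
url = "https://raw.githubusercontent.com/person-a/repo/main/info.txt"

contents = {}
with urlopen(url) as response:
    for line in response.read().decode().splitlines():
        if '=' in line:
            key, _, value = line.partition('=')
            contents[key.strip()] = value.strip()

print(contents)  # e.g. {'name': 'Person A', 'my script': 'scriptA.sh', ...}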

How to read .mod files using Python

I have a file with the extension .mod.
The file contains fields and data under each field, just like a csv file does.
I need to read this .mod file using Python.
Please suggest a way to do this in Python, using Pandas or any other package that could help.
Thanks!
On Windows 10, using Python 3.6, I could successfully open the file and read the first line:
with open('09nested.mod') as f:
    print(f.readlines()[0])
This printed the first line of the file:
// File: 09nested.mod
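Since the question mentions Pandas: if the .mod file really is delimited text, read_csv parses it regardless of the extension. A sketch under that assumption, reusing the file name from above:
import pandas as pd

# assumes the first line is the '// File: ...' comment and the rest is
# comma-separated; adjust skiprows= and sep= to match the real file
df = pd.read_csv('09nested.mod', skiprows=1)
print(df.head())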

Skip first row in python

I am using
for file in fileList:
    f.write(open(file).read())
I am combining the files in a folder into one csv. However, I don't need X copies of the header in the one file.
Is there a way to keep this approach but have it write everything except the first row (the header) from each of the files?
Use the Python csv module, or do something like this:
for file_name in file_list:
    with open(file_name) as file_obj:
        next(file_obj)           # skip the header line
        for line in file_obj:    # stream the remaining lines
            f.write(line)
This solution doesn't load the whole file into memory; if you used file_obj.readlines() instead, the whole file content would be loaded into memory.
Note that it isn't good practice to shadow built-in names (like file) with your variable names.
for file in fileList:
    mylines = open(file).readlines()
    f.write("".join(mylines[1:]))
This should point you in the right direction. Please don't do your homework on Stack Overflow.
If it's a csv file, look into the Python csv lib.
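For completeness, a sketch of the csv-module route both answers point at ('combined.csv' is an illustrative output name); it keeps the header from the first file only:
import csv

with open('combined.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    for i, file_name in enumerate(file_list):
        with open(file_name, newline='') as in_file:
            reader = csv.reader(in_file)
            header = next(reader)        # first row of each file
            if i == 0:
                writer.writerow(header)  # keep the header once
            writer.writerows(reader)     # data rows from every file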

How to concatenate several Javascript files into one file using Python

I would like to know how I can use Python to concatenate multiple JavaScript files into just one file.
I am building a component-based engine in JavaScript, and I want to distribute it as just one file, for example engine.js.
Alternatively, users should be able to get the whole source, which has a hierarchy of files and directories, and with it a build.py Python script that can be edited to include various systems and components, which are basically .js files in the components/ and systems/ directories.
How can I load files which are described in a list (paths) and combine them into one file?
For example:
toLoad = [
    "core/base.js",
    "components/Position.js",
    "systems/Rendering.js",
]
The script should concatenate these in order.
Also, this is a Git project. Is there a way for the script to read the version of the program from Git and then write it as a comment at the beginning?
This will concatenate your files:
def read_entirely(file):
    with open(file, 'r') as handle:
        return handle.read()

result = '\n'.join(read_entirely(file) for file in toLoad)
You may then output them as necessary, or write them using code similar to the following:
# 'file' here is the output path, e.g. 'engine.js'
with open(file, 'w') as handle:
    handle.write(result)
How about something like this?
final_script = ''
for script_name in toLoad:
    with open(script_name, 'r') as f:
        final_script += ('\n' + f.read())

with open('engine.js', 'w') as f:
    f.write(final_script)
You can do it yourself, but this is a real problem that existing tools solve in more sophisticated ways. Consider JavaScript minification, e.g. with the YUI Compressor: http://developer.yahoo.com/yui/compressor/
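As for reading the version from Git, one option is to shell out to git describe and prepend it as a comment; a sketch, assuming git is on PATH and the script runs inside the repository:
import subprocess

# --always falls back to the commit hash when no tag exists
version = subprocess.check_output(
    ["git", "describe", "--tags", "--always"]
).decode().strip()

with open("engine.js", "w") as handle:
    handle.write("// version: %s\n" % version)  # illustrative comment format
    handle.write(result)                        # 'result' from the answer above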
