Is there any way to retrieve the file name using Python? [closed]

In a Linux directory, I have several numbered files, such as "day1" and "day2". My goal is to write code that reads the numbers from the file names, finds the largest, and creates a new file numbered one higher. So, for example, if the files 'day1', 'day2' and 'day3' exist, the code should read the list of files and create 'day4'. To do that, I at least need to know how to retrieve the numbers from the file names.

I'd use os.listdir to get all the file names, remove the "day" prefix, convert the remaining characters to integers, and take the maximum.
From there, it's just a matter of incrementing the number and appending it to the same prefix:
import os

# Strip the "day" prefix (3 characters), convert the rest to an integer,
# and take the largest number found.
max_file = max(int(f[3:]) for f in os.listdir('some_directory'))
new_file = 'day' + str(max_file + 1)

Get all the file names with os.listdir from the os module, then use the re module (regular expressions) to pull out the numbers. If you don't want to look into regex, you could remove the letters from each string with replace() and convert what's left with int().
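A minimal sketch of that regex approach (the directory name 'some_directory' is a placeholder):
import os
import re

# Collect the numbers from every file name of the exact form "day<digits>".
numbers = []
for name in os.listdir('some_directory'):
    match = re.fullmatch(r'day(\d+)', name)
    if match:
        numbers.append(int(match.group(1)))

new_name = 'day' + str(max(numbers) + 1)  # e.g. 'day4'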

Glob would be good for this. It is a kind of pattern matching, similar to regex but simpler and made specifically for file searches. Basically you just use * as a wildcard, and you can match digits too. It can be pretty powerful, and it is native to the bash shell, for example.
from glob import glob
from pathlib import Path
pattern = "day"
last_file_number = max(map(lambda f: int(f[len(pattern):]), glob(pattern + "[0-9]*")))
Path("%s%d" % (pattern, last_file_number + 1)).touch()
You can also see that I use pathlib here. This is a library for dealing with the file system in an OOP manner. Some people like it, some don't.
A little disclaimer: glob is not as powerful as regex. Here "daydream", for example, won't be matched, but "day0dream" would still be matched. You can also try day*[0-9], but then "daydream0" would still be matched. Of course, you can use day[0-9] if you know you'll stay below double digits. So, if your use case requires it, you can use glob and then filter down with regex, as in the sketch below.
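A minimal sketch of that glob-then-regex filtering (assuming the same "day" prefix):
import re
from glob import glob

# glob narrows the candidates; the regex keeps only exact "day<digits>"
# names, so "day0dream" is rejected.
candidates = glob("day[0-9]*")
matches = (re.fullmatch(r"day(\d+)", name) for name in candidates)
numbers = [int(m.group(1)) for m in matches if m]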

Related

Escape characters when joining strings [closed]

I'm trying to read a .csv file's content from a folder. Example code:
files_in_folder = os.listdir(r"\\folder1\folder2")
filename_list = []
for filename in files_in_folder:
    if "sometext" in filename:
        filename_list.append(filename)
read_this_file = "\\folder1\folder2" + max(filename_list)
data = pandas.read_csv(read_this_file, sep=',')
Fetching the max filename works, but the read_csv call on the last line fails:
FileNotFoundError: no such file or directory.
I am able to access the folder, as you can see in my first line of code, but when I concatenate the two strings, putting the r in front no longer works. Any ideas?
You need a trailing \ (and escaped backslashes) in your path when concatenating:
read_this_file = '\\folder1\\folder2\\' + max(filename_list)
But a better way to avoid that problem is to use
os.path.join("\\folder1\\folder2", max(filename_list))
For working code, use forward slashes, which Python accepts on Windows too:
files_in_folder = os.listdir("folder1/folder2/")
filename_list = []
for filename in files_in_folder:
    if "sometext" in filename:
        filename_list.append(filename)
read_this_file = "folder1/folder2/" + max(filename_list)
data = pd.read_csv(read_this_file, sep=',')
Explanation:
When you put r before a string, the character following a backslash is included in the string without change, and all backslashes are left in the string.
In your example, without the r prefix Python reads the '\f' part of "\folder1\folder2" as a special character, a form feed (just as it would read \n as a newline).
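A quick demonstration of the escape problem:
# Without a raw string, "\f" collapses into a single form-feed character,
# so the path no longer refers to anything on disk.
print(len("\folder1"))    # 7 -- '\f' became one character
print(len(r"\folder1"))   # 8 -- the raw string keeps the backslash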

How to remove nonprintable characters in csv file? [closed]

I have some invalid characters in my file that I'm trying to remove. But I ran into a strange problem with one of them.
When I try to use the replace function, I get the error SyntaxError: EOL while scanning string literal.
I found that I was dealing with \x1d which is a group separator. I have this code to remove it:
import pandas as pd
df = pd.read_csv('C:/Users/tkp/Desktop/Holdings_Download/dws/example.csv',index_col=False, sep=';', encoding='utf-8')
print(df['col'][0])
df = df['col'][0].encode("utf-8").replace(b"\x1d", b"").decode()
df = pd.DataFrame([x.split(';') for x in df.split('\n')])
print(df[0][0])
Output:
Is there another way to do this? It seems to me I couldn't have done it any more awkwardly than this.
Notice that you are getting a SyntaxError. This means that Python never gets as far as actually running your program, because it can't figure out what the program is!
To be honest, I'm not quite sure why this happens in this case, but using "exotic" characters in string constants is always a bit iffy, because it makes you dependent on the character encoding of the source code and puts you at the mercy of all sorts of buggy editors. Therefore, I would recommend using the '\uXXXX' syntax to write the Unicode number of the character you wish to replace explicitly. (It looks like what you have here is U+2194 LEFT RIGHT ARROW, so '\u2194' should do it.)
Having said that, I would first verify that this is actually the problem, by changing the '↔' bit to something more mundane, like 'x' and seeing whether that causes the same error. If it does, then your problem is somewhere else...
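A minimal sketch of that suggestion (the sample string here is made up):
# Spell the character as a Unicode escape instead of pasting it literally
# into the source; this avoids any source-encoding ambiguity.
raw = "foo\u2194bar"
cleaned = raw.replace('\u2194', '')
print(cleaned)   # foobar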
Alternatively, you can have pandas remove the character across the whole frame; note that DataFrame.replace needs regex=True to do substring replacement:
df = df.replace('\x1d', '', regex=True)

How to shorten a path (string) to just the file [duplicate]

This question already has answers here: Extract file name from path, no matter what the os/path format (22 answers)
I'm currently making an autosave function for a program based in Python, and I have very little knowledge of Python. I remember learning how to cut a string, but this is a bit more advanced. Right now I have it printing the path as a string (no, I cannot use os.path or anything like that), and I want it to remove the entire path except for NAME.pse (the name will change as well). Here is an example path and what I'd ultimately like it to look like. It should work with any path that gets printed, so it is compatible with anyone's computer, any file structure, and any name for the session file (the .pse):
C:/Users/Install/OneDrive/B&BLab/Coding/TestingCell/PyMol.pse => PyMol.pse
You can use the split() function to split the string at all / characters. This will return a list, then just take the last element of that list:
myString = "C:/Users/Install/OneDrive/B&BLab/Coding/TestingCell/PyMol.pse"
myFile = myString.split('/')[-1]  # 'PyMol.pse'
However, Python does provide a function for this, os.path.basename; check out the answers to the linked duplicate.
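For completeness, a sketch of the standard-library route (if os.path turns out to be allowed after all):
import os.path

# os.path.basename returns everything after the last path separator.
path = "C:/Users/Install/OneDrive/B&BLab/Coding/TestingCell/PyMol.pse"
print(os.path.basename(path))   # PyMol.pse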
If you want only the filename:
print("".join(stringa.split('/')[-1:]))
And if you also want the containing folder(s):
print("/".join(stringa.split('/')[-2:]))

Sort a voluminous text file by date using Python [closed]

I'm new to Python and I have to sort a voluminous text file by date, with lots of lines like these:
CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1
Can someone help me please ?
Thank you all !
Have you considered using the *nix sort program? In raw terms, it'll probably be faster than most Python scripts.
Use -t \! to specify that columns are separated by a ! character, -k n to specify the sort field (where n is the field number), and -o outputfile if you want the result written to a new file.
Example:
sort -t \! -k 5 -o sorted.txt input.txt
will sort input.txt on its 5th field and write the result to sorted.txt.
I would convert the time to a timestamp and then sort.
First, split the raw data into a list of lines:
rawData = '''CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1
CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1'''
a = rawData.split('\n')
Then sort on the fifth !-separated field, parsed as a date:
>>> import dateutil.parser, time
>>> sorted(a, key=lambda line: time.mktime(dateutil.parser.parse(line.split('!')[4]).timetuple()))
['CCC!LL!EEEE!EW050034!2016-04-01T04:39:54.000Z!7!1!1!1', 'CCC!LL!EEEE!GH676589!2016-04-01T04:39:54.000Z!7!1!1!1', 'CCC!LL!EEEE!IJ6758004!2016-04-01T04:39:54.000Z!7!1!1!1']
Take a look at the regular expression module; I've used it a couple of times, and it looks pretty simple to do what you want with it.
The docs are at https://docs.python.org/2/library/re.html, but try googling for Python regular expression examples to make it clearer. Good luck.
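A minimal sketch of that idea (the file name 'input.txt' is a placeholder; note that ISO-8601 timestamps also sort correctly as plain text):
import re

# Pull the ISO timestamp out of each line and use it as the sort key;
# ISO-8601 strings compare correctly as text, so no date parsing is needed.
timestamp = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')

with open('input.txt') as f:
    lines = f.read().splitlines()

lines.sort(key=lambda line: timestamp.search(line).group(0))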

Parsing a series of fixed-width files [closed]

I have a series (~30) of files that are made up of rows like:
xxxxnnxxxxxxxnnnnnnnxxxnn
Where x is a char and n is a number, and each group is a different field.
The layout is fixed within each file, so it would be pretty easy to split and read with a struct or slices; however, I was wondering if there's an effective way of doing this for a lot of files (each with different fields and lengths) without hard-coding it.
One idea I had was creating an XML file with the schema for each file; then I could dynamically add new ones where required, and the code would be more portable. However, I wanted to check that there are no simpler or more standard ways of doing this.
I will be outputting the data into either Redis or an ORM if this helps, and each file will only be processed once (although other files with different structures will be added at later dates).
Thanks
You could use itertools.groupby with str.isdigit, for instance (or isalpha):
>>> import itertools
>>> line = "aaa111bbb22cccc345defgh67"
>>> [''.join(i[1]) for i in itertools.groupby(line, str.isdigit)]
['aaa', '111', 'bbb', '22', 'cccc', '345', 'defgh', '67']
I think @fredtantini's answer contains a good suggestion. Here's a fleshed-out way of applying it to your problem, coupled with a minor variation of the code in my answer to a related question titled Efficient way of parsing fixed width files in Python:
from itertools import groupby
from struct import Struct

isdigit = str.isdigit

def parse_fields(filename):
    with open(filename) as file:
        # determine the layout of fields from the first line of the file
        firstline = file.readline().rstrip()
        fieldwidths = (len(''.join(i[1])) for i in groupby(firstline, isdigit))
        fmtstring = ''.join('{}s'.format(fw) for fw in fieldwidths)
        parse = Struct(fmtstring).unpack_from
        file.seek(0)  # rewind
        for line in file:
            yield parse(line.encode())  # struct unpacks bytes, not str

for row in parse_fields('somefile.txt'):
    print(row)
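As a footnote, a minimal sketch of the question's schema-driven idea, using a plain list of field widths instead of XML (the widths and example record here are made up):
# Hypothetical schema for one file type; in practice this could be loaded
# from a config file instead of being hard-coded.
FIELD_WIDTHS = [4, 2, 7, 7, 3, 2]

def split_record(line, widths):
    # Slice a fixed-width record into its fields.
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width])
        pos += width
    return fields

print(split_record("xxxxnnxxxxxxxnnnnnnnxxxnn", FIELD_WIDTHS))
# ['xxxx', 'nn', 'xxxxxxx', 'nnnnnnn', 'xxx', 'nn']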
