Get File name from a dash delimited string in python lambda - python

I have a bunch of file names like these in S3:
1623130500-1623130500-Photo-verified-20210631-0-22.csv.gz
1623130500-1623130500-Add-to-cart-20210631-0-4.csv.gz
Using Python code in a Lambda function, can I extract only Photo-verified / Add-to-cart from the above?
I need a solution that gives me this name at runtime from strings like the ones above.

I think you are asking how to extract either Photo-verified or Add-to-cart from the above strings.
You can split on - and then extract the portion you want. Basically, you don't want the first two parts or the last 3 parts, so use:
filename.split('-')[2:-3]
That will return a list with:
['Photo', 'verified']
You could then join() them together using:
'-'.join(filename.split('-')[2:-3])
This would give:
Photo-verified
On the second string, it would give:
Add-to-cart
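As a minimal sketch, the slicing above could be wrapped in a small helper. The function name extract_event_name is hypothetical, and the code assumes every key follows the pattern from the question: two leading timestamps, then the name, then three trailing dash-separated parts. Stripping a possible S3 prefix before splitting is also an assumption about how the keys might look.

```python
def extract_event_name(key: str) -> str:
    """Return the middle part of a key such as
    '1623130500-1623130500-Photo-verified-20210631-0-22.csv.gz'.

    Hypothetical helper; assumes the naming pattern shown in the question.
    """
    # Keep only the file name in case the S3 key carries a prefix like 'exports/'.
    filename = key.rsplit("/", 1)[-1]
    parts = filename.split("-")
    # Drop the two leading timestamps and the three trailing parts.
    return "-".join(parts[2:-3])


if __name__ == "__main__":
    for key in (
        "1623130500-1623130500-Photo-verified-20210631-0-22.csv.gz",
        "1623130500-1623130500-Add-to-cart-20210631-0-4.csv.gz",
    ):
        print(extract_event_name(key))
```

Because the slice counts parts from both ends, the name in the middle may contain any number of dashes.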

Related

Exclude strings containing a particular string without using filter function - Python regex

I was trying to exclude strings containing a particular substring without using the filter function. For example:
/abc/mno/pqr/uvw.py
/abc/mno/rst/uvw.py
/abc/mno/pqr/uvw.c
/abc/mno/vwx/rst.c
/abc/mno/pqr/xyz.java
The expected output is all .py and .java paths except those containing the substring '/rst/' or '/vwx/'. That is,
/abc/mno/pqr/uvw.py
/abc/mno/pqr/xyz.java
I tried:
x = re.findall("^(?!.*/rst/|/vwx/.*)\.py|^(?!.*/rst/|/vwx/.*)\.java", txt)
But I did not get the expected output.
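Try this pattern. The negative lookahead rejects any path containing /rst/ or /vwx/, and the rest requires the path to end in .py or .java: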
^(?!.*/rst/.*|.*/vwx/.*).*(?:\.py|\.java)$
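A short sketch of how this pattern might be used with re.findall. It assumes txt holds one path per line, which is why re.MULTILINE is passed so that ^ and $ match at every line boundary; the names PATTERN and matches are illustrative.

```python
import re

# re.MULTILINE makes ^ and $ match at the start and end of each line,
# which is needed when txt contains several newline-separated paths.
PATTERN = re.compile(r"^(?!.*/rst/.*|.*/vwx/.*).*(?:\.py|\.java)$", re.MULTILINE)

txt = """/abc/mno/pqr/uvw.py
/abc/mno/rst/uvw.py
/abc/mno/pqr/uvw.c
/abc/mno/vwx/rst.c
/abc/mno/pqr/xyz.java"""

# The pattern has no capturing groups, so findall returns the full matches.
matches = PATTERN.findall(txt)
print(matches)
```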

How to check if a line contains a string in Python

I'm trying to check if a subString exists in a string using regular expression.
Regex: re_string_literal = '^"[a-zA-Z0-9_ ]+"$'
The thing is, I don't want to match just any substring. I'm reading data from a file, and one of the lines has this text:
cout<<"Hello"<<endl;
I just want to check whether there's a string literal inside the line and, if so, store it in a list.
I have tried the re.match method, but it only works when the text has to match a pattern from its start. In this case, I just want to check whether a string exists and, if it does, store it somewhere.
re_string_lit = '^"[a-zA-Z0-9_ ]+"$'
text = 'cout<<"Hello World!"<<endl;'
re.match(re_string_lit,text)
It doesn't output anything.
In simple words,
I just want to extract everything inside ""
If you just want to extract everything inside "", then string splitting would be a much simpler way of doing it.
>>> a = 'something<<"actualString">>something,else'
>>> b = a.split('"')[1]
>>> b
'actualString'
The above example only works when the line contains a single quoted string (no more than two double quotes), but you could extend it by iterating over every substring returned by split and applying a much simpler regular expression.
This worked for me (the text inside the quotes is group(1) of the returned match):
re.search('"(.+?)"', 'cout<<"Hello"<<endl')
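The pattern in the question cannot match because ^ and $ require the whole line to be a quoted string, and re.match only tries to match at the start of the text. As a minimal sketch, every double-quoted literal in a sequence of lines could be collected with re.findall. The function name extract_string_literals is hypothetical, and the code assumes the literals contain no escaped quotes.

```python
import re

# Capture whatever sits between a pair of double quotes.
# Assumption: literals never contain escaped quotes such as \".
STRING_LITERAL = re.compile(r'"([^"]*)"')


def extract_string_literals(lines):
    """Return all double-quoted substrings found in the given lines."""
    found = []
    for line in lines:
        # findall returns only the captured group, i.e. the text without quotes.
        found.extend(STRING_LITERAL.findall(line))
    return found


if __name__ == "__main__":
    sample = ['cout<<"Hello World!"<<endl;', "int x = 0;"]
    print(extract_string_literals(sample))
```

Since an open file object yields its lines when iterated, it could be passed in directly instead of a list.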

How to extract specific characters from a string that can vary

I'm trying to extract a specific part of a file name that can contain a varying number of '_' characters. I previously used partition/rpartition to strip everything before and after the underscores, but I didn't take into account that the number of underscores can differ.
The purpose of the code is to extract the characters between two underscores.
filename = os.path.basename(files).partition('_')[2].rpartition('_')[0].rpartition('_')[0].rpartition('_')[0]
The above is my current code. A typical file name looks like:
P0_G12_190325184517_t20190325_5
or it can also look like:
P0_G12_190325184517_5
From what I understand, the number of rpartition calls in my current code has to match the number of underscores in the first file name, so the same code obviously doesn't work for the second one.
I want to extract
G12
This part can also be just two characters, like G1, so I need two to three characters from file names of the above types.
You can use:
os.path.basename(files).split('_')[1]
You could either use split to create a list with the separate parts, like this:
files.split('_')
Or you could use regex:
https://regex101.com/r/jiUNLV/1
And do like this:
import re
pattern = r'.*_(\w{2,3})_\d+.*'
match = re.match(pattern, files)
if match:
    print(match.group(1))
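A minimal sketch of the split-based approach, assuming the wanted part is always the second underscore-separated field of the base name, as in both sample names. The function name extract_group is hypothetical.

```python
import os


def extract_group(path: str) -> str:
    """Return the second underscore-separated field of the file's base name."""
    name = os.path.basename(path)
    parts = name.split("_")
    if len(parts) < 2:
        # Assumption: names without any underscore are treated as invalid input.
        raise ValueError(f"no underscore-separated field in {name!r}")
    return parts[1]


if __name__ == "__main__":
    print(extract_group("P0_G12_190325184517_t20190325_5"))
    print(extract_group("P0_G12_190325184517_5"))
```

Indexing from the left means the number of underscores after the second field no longer matters.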

Edit file names in Python according to certain rules

I have a great number of files whose names are structured as follows:
this_is_a_file.extension
I need to strip everything from the last underscore (inclusive) up to the extension, keep the extension, and save the file under the new name in another directory.
Note that these names have variable length, so I cannot leverage single characters' position.
Also, they have different numbers of underscores; otherwise I'd have applied something similar to the approach in "split a file name".
How can I do it?
You could create a function that splits the original filename along underscores, and splits the last segment along periods. Then you can join it all back together again like so:
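def myJoin(filename):
    # Split on underscores; the last segment holds the part to drop plus the extension.
    splitFilename = filename.split('_')
    extension = splitFilename[-1].split('.')
    # Drop the last segment, then re-attach the extension.
    splitFilename.pop(-1)
    return '_'.join(splitFilename) + '.' + extension[-1]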
Some examples to show it working:
>>> p="this_is_a_file.extension"
>>> myJoin(p)
'this_is_a.extension'
>>> q="this_is_a_file_with_more_segments.extension"
>>> myJoin(q)
'this_is_a_file_with_more.extension'
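The question also asks to save the renamed files into another directory. The following is a hedged sketch of one way to do that with pathlib and shutil, not the answerer's method. The names strip_last_segment and copy_renamed are hypothetical, and the code assumes each file has a single extension and that copying (rather than moving) is acceptable.

```python
import shutil
from pathlib import Path


def strip_last_segment(filename: str) -> str:
    """Remove the last '_segment' of the stem while keeping the extension."""
    path = Path(filename)
    # rsplit with maxsplit=1 leaves the stem unchanged if it has no underscore.
    stem = path.stem.rsplit("_", 1)[0]
    return stem + path.suffix


def copy_renamed(src_dir: str, dst_dir: str) -> None:
    """Copy every file from src_dir into dst_dir under its shortened name."""
    destination = Path(dst_dir)
    destination.mkdir(parents=True, exist_ok=True)
    for src in Path(src_dir).iterdir():
        if src.is_file():
            # Note: two sources that shorten to the same name would overwrite each other.
            shutil.copy2(src, destination / strip_last_segment(src.name))


if __name__ == "__main__":
    print(strip_last_segment("this_is_a_file.extension"))
```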

How to use regular expression to retrieve specific text in Python

I'm trying to retrieve all male members from a name list that looks something like this: A B(male) C D E(male) F(male) G
All strings are separated by spaces. The name list is saved as a text file: name.txt
I would like Python to read name.txt, retrieve all males from the list, and print them out (in this case B, E and F).
How do I use regular expression to achieve that? Thanks!
I am just giving the regular expression: regex = r"(\w+)\(male\)"
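A minimal sketch of how this expression might be applied, assuming the file contains names in the format shown in the question. The function names find_males and read_males are hypothetical, and UTF-8 encoding is an assumption.

```python
import re
from pathlib import Path

# The capturing group keeps only the name in front of the "(male)" marker.
MALE_PATTERN = re.compile(r"(\w+)\(male\)")


def find_males(text: str) -> list:
    """Return every name that is directly followed by '(male)'."""
    return MALE_PATTERN.findall(text)


def read_males(path: str) -> list:
    """Read the name list from a file (assumed UTF-8) and return the male names."""
    return find_males(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    print(find_males("A B(male) C D E(male) F(male) G"))
```

With the file from the question in place, read_males("name.txt") would apply the same search to its contents.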
This is apparently some data. Why are you storing it in and retrieving it from a text file?
If it's temporary data being stored in a text file, maybe change the format: specify both 'Male' and 'Female', and put one entry per line so you can loop through the file.
That would be more systematic.
Then all you have to do is look for the string 'Male' in every line and print the lines that match.
