I am struggling to write a bash or Python script that finds a string in a file.
For example, I need to search for usrname4 and, if it is found, fetch its group. In this case, that is group1. Since the file format is tricky, I am looking for some hints.
The file contents are in the format below.
group1 (-,usrname1,abc.co.uk)\
(-,usrname1,xyz.co.uk)\
(-,usrname2,abc.co.uk)\
(-,usrname2,xyz.co.uk)\
(-,usrname3,abc.co.uk)\
(-,usrname3,xyz.co.uk)\
(-,usrname4,abc.co.uk)\
(-,usrname4,xyz.co.uk)\
(-,usrname5,abc.co.uk)\
(-,usrname5,xyz.co.uk)\
(-,usrname6,abc.co.uk)\
(-,usrname6,xyz.co.uk)\
(-,usrname7,abc.co.uk)\
(-,usrname7,xyz.co.uk)\
group2 (-,usrname8,abc.co.uk)\
(-,usrname8,xyz.co.uk)\
(-,usrname9,abc.co.uk)\
(-,usrname9,xyz.co.uk)\
(-,usrname10,abc.co.uk)\
(-,usrname10,xyz.co.uk)\
(-,usrname11,abc.co.uk)\
(-,usrname11,xyz.co.uk)\
(-,usrname12,abc.co.uk)\
(-,usrname12,xyz.co.uk)\
(-,usrname13,abc.co.uk)\
(-,usrname13,xyz.co.uk)\
(-,usrname14,abc.co.uk)\
(-,usrname14,xyz.co.uk)\
I added the following specifications:
A group can be found by looking for a line that does not start with a space
A group name contains no spaces
When the username occurs more than once, only look at the first one
Search for the last group mentioned before the match
First select all groups and all matched lines.
From that set look for the last line before the first match, that must be the group.
usr=usrname4
grep -Eo "^[^ ]+|,${usr}," file | grep -B1 ",${usr}," | head -1
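Since the question also mentions Python, here is a minimal sketch of the same idea: remember the most recent group name and stop at the first line containing the user. The function name find_group and the sample lines are hypothetical, and it assumes continuation lines start with either whitespace or "(".

```python
import re

def find_group(lines, user):
    """Return the group of the first entry for `user`, or None if absent."""
    current_group = None
    needle = "," + user + ","  # the commas keep usrname1 from matching usrname10
    for line in lines:
        # Assumption: a group line starts with a character that is neither
        # whitespace nor "(", and the group name runs up to the first space.
        match = re.match(r"([^\s(]\S*)", line)
        if match:
            current_group = match.group(1)
        if needle in line:
            return current_group
    return None

# Hypothetical sample in the same layout as the file shown in the question.
sample = [
    "group1 (-,usrname1,abc.co.uk)\\",
    " (-,usrname4,abc.co.uk)\\",
    "group2 (-,usrname8,abc.co.uk)\\",
    " (-,usrname10,xyz.co.uk)\\",
]
print(find_group(sample, "usrname4"))
```

For a real file, an open file object can be passed as lines, since the function only iterates over it.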
I am a newbie with both Python and Jupyter Notebook.
After searching, I found that if I want to spread a sqlContext query over multiple lines I have to use triple quotes (""") like this:
sqlContext.sql("""select year,month,count(distinct station) as count
from tempReadingsTable
where year>=1950 and year<=2014 and value>=10
group by year,month
order by count desc
""").show()
Now, I am trying to find the same for this:
schMax = schMax.groupBy('year').
agg(fun.max('value').alias('value')).
join(sch['year','value']).
drop_duplicates(['year']).
select(['year','station','value']).
orderBy(['value'],ascending=[0])
Unless I run it all on one line, it fails! How can I prevent that? I want to be able to put each call on its own line...
You can use \ at the end of a line to have python continue reading the next line as part of the previous line (removing white spaces if needed).
Although I think it's more readable if you put the . on the start of each new line. It's more apparent that it's part of the previous statement since normal statements never begin with . in python.
schMax = schMax.groupBy('year')\
.agg(fun.max('value').alias('value'))\
.join(sch['year','value'])\
.drop_duplicates(['year'])\
.select(['year','station','value'])\
.orderBy(['value'],ascending=[0])
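Another option, not mentioned in the answer above, is to wrap the whole chain in parentheses: Python joins lines implicitly inside brackets, so no backslashes are needed. The sketch below uses plain string methods as a stand-in for the DataFrame calls so that it runs without Spark; the same layout should apply to the schMax chain.

```python
# String methods stand in for groupBy/agg/join so this runs anywhere.
result = (
    "  Year,Station,Value  "
    .strip()        # each call sits on its own line
    .lower()
    .split(",")
)
print(result)
```

Comments can follow each call in this form, which is not possible after a trailing backslash.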
I am trying to query a column in a database with contains and ilike, but they produce different results. Any idea why?
My current code;
search = 'nel'
find = Clients.query.filter(Clients.lastName.ilike(search)).all()
# THE ABOVE LINE PRODUCES 0 RESULTS
find = Clients.query.filter(Clients.lastName.contains(search)).all()
# THE ABOVE LINE PRODUCES THE DESIRED RESULTS
for row in find:
print(row.lastName)
My concern is am I missing something? I have read that 'contains' does not always work either. Is there a better way to do what I am doing?
For ilike and like, you need to include wildcards in your search like this:
Clients.lastName.ilike(r"%{}%".format(search))
As the Postgres docs say:
LIKE pattern matching always covers the entire string. Therefore, to match a sequence anywhere within a string, the pattern must start and end with a percent sign.
The other difference is that contains renders a plain LIKE, which is case-sensitive in Postgres, while ilike is case-insensitive.
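To see the whole-string behaviour without a Postgres server, here is a small sketch using SQLite from the standard library as a stand-in. Note the assumptions: the clients table and its rows are made up, and SQLite's LIKE is case-insensitive for ASCII by default, so it behaves more like Postgres ILIKE here.

```python
import sqlite3

def like_contains(term):
    """Wrap a term in % wildcards so LIKE/ILIKE matches it anywhere."""
    # Note: % or _ inside `term` are not escaped in this sketch.
    return "%{}%".format(term)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (lastName TEXT)")
conn.executemany(
    "INSERT INTO clients VALUES (?)",
    [("Nelson",), ("Parnell",), ("Smith",)],
)

def search(pattern):
    rows = conn.execute(
        "SELECT lastName FROM clients WHERE lastName LIKE ? ORDER BY lastName",
        (pattern,),
    )
    return [row[0] for row in rows]

exact = search("nel")                    # pattern must cover the entire value
anywhere = search(like_contains("nel"))  # pattern may match anywhere
print(exact, anywhere)
```

Without wildcards the pattern only matches a value that is exactly "nel", which is why the ilike call in the question returned nothing.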
I'm trying to filter logs based on the domain name. For example I only want the results of domain: bh250.example.com.
When I use the following query:
http://localhost:9200/_search?pretty&size=150&q=domainname=bh250.example.com
the first 3 results have the domain name bh250.example.com, whereas the 4th has bh500.example.com.
I have read several pieces of documentation on how to query Elasticsearch, but I seem to be missing something. I only want results that match the parameter exactly.
UPDATE!! After question from Val
queryFilter = Q("match", domainname="bh250.example.com")
search=Search(using=dev_client, index="logstash-2016.09.21").query("bool", filter=queryFilter)[0:20]
You're almost there, you just need to make a small change:
http://localhost:9200/_search?pretty&size=150&q=domainname:"bh250.example.com"
Use a colon instead of the equals sign, and put the value in double quotes.
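If the URL is assembled in code, the quotes and colon need URL-encoding. A minimal sketch with the standard library follows; the helper name build_search_url is hypothetical, and the host and size are simply the values from the question.

```python
from urllib.parse import urlencode

def build_search_url(field, value,
                     base="http://localhost:9200/_search", size=150):
    # field:"value" -- the colon separates field and value, and the
    # double quotes ask for the value to be matched as one phrase.
    query = '{}:"{}"'.format(field, value)
    params = {"pretty": "true", "size": size, "q": query}
    return base + "?" + urlencode(params)

url = build_search_url("domainname", "bh250.example.com")
print(url)
```

urlencode takes care of percent-encoding the colon and the double quotes, so the query survives intact when sent by an HTTP client.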
I am trying to make a custom right click command for nautilus.
I managed to find some useful content here.
What I don't understand is what these two lines actually mean:
IFS_BAK=$IFS
IFS="
"
And these are present at the bottom too. What do they mean ?
Please help.
The first line creates a backup of the existing value of the IFS variable in IFS_BAK.
The next line then assigns IFS a new value, namely the one the script requires (here, a newline).
More info on Internal Field Separator (IFS) can be found here: https://unix.stackexchange.com/questions/16192/what-is-ifs-in-context-of-for-looping
https://unix.stackexchange.com/questions/184863/what-is-the-meaning-of-ifs-n-in-bash-scripting
https://unix.stackexchange.com/questions/26784/understanding-ifs
Okay, I got it.
It is called an 'Internal Field Separator', a special variable in shell.
If you set IFS to | (i.e. IFS='|'), | will be treated as the delimiter between words/fields when splitting a line of input.
In the first line:
IFS_BAK=$IFS
the initial 'IFS' value is stored in the variable 'IFS_BAK' and the value of IFS is set to 'new line' by
IFS="
"
so that the entire line is treated as a 'single input'.
Later, at the end of the program, the IFS value is restored to what it was originally.
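As a rough analogy only (Python has no IFS), the effect of the separator can be pictured with str.split. The listing below is a made-up example of two paths, one per line, where the first contains a space.

```python
# Hypothetical input: one path per line, the first one contains a space.
listing = "my file.txt\nnotes.txt"

# Comparable to the default IFS (space, tab, newline): the name is broken up.
split_on_whitespace = listing.split()

# Comparable to IFS set to a newline only: each line stays one item.
split_on_newline = listing.split("\n")

print(split_on_whitespace)
print(split_on_newline)
```

This is why a script that loops over file names tends to set IFS to a newline first: names containing spaces would otherwise be split into several words.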
I have never used procmail before but I believe (from my R&D) that it is likely my best choice to crack my riddle. Our system receives an email, out of which I need 3 values, which are:
either a 4-digit or 5-digit integer from the SUBJECT line. (we will refer to as "N")
email alias from REPLY-TO line (we will refer to as "R")
determine the type of email it is, by which I mean to say a "case" or a "project". (we will refer to as "T") This value would be parsed out of the SUBJECT line.
If anyone could help me with that recipe, I would be most appreciative.
The next thing I need to do is:
send these 3 values to a Python script (can I do this directly from procmail? pipe? something else?)
delete the email messages
I need to accept these emails from only 4 domain names, such as:
(@sjobeck.com|@cases.example.com|@messages.example.com|@bounces.example.com)
Last, is to pipe these 3 values in to the second script, and some advice as to the best syntax to do so. Any advice here is most appreciative. Would this be something like this:
this-recipe $N $T $R | second-script.py
Or exactly how would that look? Or is this not a procmail issue and a Python issue? (if it is, that's fine, I'll handle it over there.)
Thanks so much!
Jason
Procmail can extract those values, or you can just pass the whole message to Python on stdin.
Assuming you want the final digits and you require there to be 4 or 5, something like this:
R=`formail -zxReply-to: | sed 's/.*<//;s/>.*//'`
:0
* ^From:.*@(helpicantfindgoogle\.com|searchengineshateme\.net|disabled\.org)\>
* ^Subject:(.*[^0-9])?\/[0-9][0-9][0-9][0-9][0-9]?$
| scriptname.py --reply-to "$R" --number "$MATCH"
This illustrates two different techniques for extracting a header value: the Reply-To header is extracted by invoking formail (this will extract just the email terminus, as per your comment; if you mean something else by "alias" then please define it properly), while the trailing 4- or 5-digit integer from the Subject is grabbed by matching it in the condition with the special operator \/.
Update: Added an additional condition to only process email where the From: header indicates a sender in one of the domains helpicantfindgoogle.com, searchengineshateme.net, or disabled.org.
As implied by the pipe action, your script will be able to read the triggering message on its standard input, but if you don't need it, just don't read standard input.
If delivery is successful, Procmail will stop processing when this recipe finishes. Thus you should not need to explicitly discard a matching message. (If you want to keep going, use :0c instead of just :0.)
As an efficiency tweak (if you receive a lot of email, and only a small fraction of it needs to be passed to this script, for example) you might want to refactor to only extract the Reply-To: when the conditions match.
:0
* ^From:.*@(helpicantfindgoogle\.com|searchengineshateme\.net|disabled\.org)\>
* ^Subject:(.*[^0-9])?\/[0-9][0-9][0-9][0-9][0-9]?$
{
R=`formail -zxReply-To: | sed 's/.*<//;s/>.*//'`
:0
| scriptname.py --reply-to "$R" --number "$MATCH"
}
The block (the stuff between { and }) will only be entered when both the conditions are met. The extraction of the number from the Subject: header into $MATCH works as before; if the From: condition matched and the Subject: condition matched, the extracted number will be in $MATCH.
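On the Python side, the receiving script could look roughly like the sketch below. It assumes the --reply-to and --number options used in the recipes above; the function name parse_args and the sample address are hypothetical, and the real processing is left out.

```python
import argparse
import sys

def parse_args(argv=None):
    """Parse the options that the procmail recipe passes on the command line."""
    parser = argparse.ArgumentParser(
        description="Handle values extracted by procmail")
    parser.add_argument("--reply-to", required=True,
                        help="address taken from the Reply-To: header")
    parser.add_argument("--number", required=True, type=int,
                        help="4- or 5-digit number from the Subject: header")
    return parser.parse_args(argv)

# Demonstration with an explicit argument list (made-up values).
args = parse_args(["--reply-to", "user@example.com", "--number", "12345"])
print(args.reply_to, args.number)

# The triggering message is available on standard input if it is needed:
# message = sys.stdin.read()
```

In the deployed script, calling parse_args() with no argument makes argparse read the real command line that procmail supplies.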