I need to get some data from a .py file.
Inside the file there is a list like this one:
authorized=["somenick", "someid", 45345090, "deadeye", 324234 ]
I want to split out every item in the list authorized, like this:
Somenick
Someid
45345090
324234
deadeye
I'm also using this information in a script that works with telethon.
Basically, I need to retrieve the items from that list and send them via client_messages(chat, text).
So I also need to define the text, which should be:
text =''' Somenick \nSomeid \n45345090 \n324234 \ndeadeye '''
My problem with the current code is this:
async def botadminlist(e):
    ciao = open('admins.py', 'r')
    for line in ciao:
        x = line.split()
        for i in x:
            y = str(i)
            m16 = await helper.control_panel.send_message(config.chat, y)
But this sends 5 messages, one per item. I want only one message with all the information, like:
text =''' Somenick \nSomeid \n45345090 \n324234 \ndeadeye '''
so I can have a nice output in the telegram chat.
By 5 messages I mean that every item in the list is sent as a new message:
somenick is one message
someid is another message
etc.
I want all the information in the list to end up in a single variable called text, with \n after every item.
Please do not hard-code these values; they are only examples. The real list will contain both int and str items, as in the example, and probably more than 50 of them.
You can use ast to turn the list representation into an actual list, and join to do your formatting.
import ast
with open('test.txt', 'r') as file:
    lst = ast.literal_eval(file.readline().strip().split('=', 1)[1])
print('\n'.join(map(str, lst)))
Output:
somenick
someid
45345090
deadeye
324234
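For the telethon case in the question, the same idea can be wrapped in a helper that returns one string, so that only a single message has to be sent. This is a minimal sketch, assuming the file holds one line of the form authorized=[...]; build_text is a hypothetical name, and the send call mentioned afterwards is copied from the question's code rather than verified here.

```python
import ast

def build_text(line):
    """Turn a line like 'authorized=[...]' into one newline-separated string."""
    # Everything after the first '=' is the list literal.
    literal = line.split('=', 1)[1].strip()
    items = ast.literal_eval(literal)  # parses Python literals only, no code execution
    return '\n'.join(str(item) for item in items)

# Sample line taken from the question.
sample = 'authorized=["somenick", "someid", 45345090, "deadeye", 324234 ]'
text = build_text(sample)
print(text)
```

In the handler from the question, text would be built once from the line read from admins.py and then passed to a single await helper.control_panel.send_message(config.chat, text) call.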
Related
I am trying to extract the messages of a single person (the customer) from a text conversation in order to perform text analysis.
The data frame has text and chatid as columns. How do I extract the messages and apply this to the whole data frame?
Here is the text:
"Mike: Hello.
Sam: Hello, How can I help you?
Mike: I need information on product.
Sam: Yes, I can help you."
I need only Mike's text here, like "Hello. I need information on product". I need to do that for each chatid and text.
I have code that does it for one text. I extract via a regular expression and split each name and message into a list. But how do I apply this to the entire data frame?
#split name and message to list
text = re.compile(r'([A-Z][a-z]+:)').split(text)[1:]
#map all messages to one person as dictionary key value
map_ = {}
for i in range(1,len(text),2):
    map_[text[i-1]] = map_.get(text[i-1],'') + text[i]
#shows only mike messages
map_['Mike:']
The above works on one text.
I was able to do the split on data frame as below but cannot map all messages to one person.
#split name and message to list
df['new_text'] = df[['text']].applymap(lambda text: re.compile(r'([A-Z][a-z]+ ?[A-Z]?:)').split(text)[1:] if pd.notnull(text) else '')
#giving error
#map all messages to one person as dictionary key value
map_ = []
for i in range(1,len(df['text']),2):
    map_[df['text'][i-1]] = map_.get(df['text'][i-1],'') + df['text'][i]
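One way to carry the single-text logic over to the whole data frame is to wrap it in a function and apply that function to each row. This is a sketch, assuming every message starts with a capitalized name followed by a colon; messages_for is a hypothetical helper, and the pandas call is kept out of the block so that it runs with the standard library alone.

```python
import re

SPEAKER = re.compile(r'([A-Z][a-z]+:)')

def messages_for(text, speaker):
    """Return all messages of `speaker` in one conversation, joined by spaces."""
    if not isinstance(text, str):
        return ''  # missing values (None, NaN) yield an empty string
    parts = SPEAKER.split(text)[1:]  # [name, message, name, message, ...]
    wanted = speaker + ':'
    msgs = [msg.strip()
            for name, msg in zip(parts[::2], parts[1::2])
            if name == wanted]
    return ' '.join(msgs)

# Conversation taken from the question.
chat = ("Mike: Hello.\n"
        "Sam: Hello, How can I help you?\n"
        "Mike: I need information on product.\n"
        "Sam: Yes, I can help you.")
print(messages_for(chat, 'Mike'))
```

With pandas this could then be applied per row, for example df['mike_text'] = df['text'].apply(messages_for, speaker='Mike'), where mike_text is a made-up column name.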
I have a text file with a string that has a letter (beginning with "A") that is assigned to a random country. I import that line from the text file into my code, where I have a list of countries and rates. I strip the string so that I am left with the countries, and then I want to locate each country from the string in a list of lists that I created, where each inner list has the name of the country, its GDP and a rate. The problem is that when I run my for loop over the list of lists, it can't find the country from the string, even though the values are the same type and have the same spelling. Let me post my code and output below.
When I created the txt file or csv file, this is what I used:
f = open("otrasvariables2020.txt", "w")
f.write(str(mis_letras_paises) + "\n")
f.write(str(mis_paises) + "\n") #(This is the string I need)
f.write(str(mis_poblaciones) + "\n")
f.close() #to be ready to use it later
Let me post some of the output.
import linecache
with open("otrasvariables2020.txt") as otras_variables:
    mis_paises = (linecache.getline("otrasvariables2020.txt",2))
    #Here I get the line of text I need, I clean the string and create a
    #list with 5 countries.
    lista_mis_paises = mis_paises.translate({ord(i): None for i \
                                             in "[]-\'"}).split(", ")
    for i in lista_mis_paises:
        if "\n" in i:
            print(i)
            i.replace("\n", "")
    for i in lista_mis_paises:
        if len(i) <= 2:
            lista_mis_paises.pop(lista_mis_paises.index(i))
Final part of the question: So, ultimately what I want is to find in the array the junior list of the country in the list/string I imported from the text file. Once I locate that junior list I can use the rates and other values there for calculations I need to do. Any ideas what's wrong? The outcome should be the following: Afganistán and other 4 countries should be found in the list of lists, which, for Afganistán, happens to be the 1st item, so I should now be able to create another list of lists but with just the 5 countries instead of the 185 countries I began with.
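If your concern is stripping the special characters you don't want, I would do something like this. Note that the trailing newline has to go first, because strip() only removes characters from the two ends of the string, and the quotes around each name are removed after the split:
countries = [c.strip('\'"') for c in linecache.getline("otrasvariables2020.txt",2).strip().strip('[]').split(', ')]
Note: with open("otrasvariables2020.txt") as otras_variables: is not used in the code you shared above, so it can be removed.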
Hope it helps.
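Since the line was written with str() on a list, another option is to parse it back with ast.literal_eval instead of stripping characters, and then filter the list of lists by name. This is a sketch under assumptions: the sample rows and their numbers are made up for illustration, the function names are hypothetical, and the country name is assumed to be the first element of each inner list.

```python
import ast

def parse_saved_list(line):
    """Parse a line produced by str(list) back into a real list."""
    return ast.literal_eval(line.strip())

def select_rows(rows, names):
    """Keep only the inner lists whose first element is one of `names`."""
    wanted = set(names)
    return [row for row in rows if row[0] in wanted]

# Hypothetical data standing in for line 2 of the file and for the full table.
line = "['Afganistán', 'Albania']\n"
all_rows = [
    ['Afganistán', 1.0, 0.1],
    ['Albania', 2.0, 0.2],
    ['Alemania', 3.0, 0.3],
]

mis_paises = parse_saved_list(line)
print(select_rows(all_rows, mis_paises))
```

In the real script, line would come from linecache.getline("otrasvariables2020.txt", 2), and all_rows would be the existing list of 185 countries.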
I am writing a program to extract text from a website and write it into a text file. Each entry in the text file should have 3 values separated by a tab. The first value is hard-coded to XXXX, the 2nd value should initialize to the first item on the website with <p class="style4">, and the third value is the next item on the website with a <p class="style5">. The logic I'm trying to introduce is to look for the first <p class="style4"> and write the associated string into the text file, then find the next <p class="style5"> and write the associated string into the text file. Then, look for the next p class. If it's "style4", start a new line; if it's another "style5", write it into the text file with the first style5 entry but separated with a comma (alternatively, the program could just skip the next style5).
I'm stuck on the part of the program in bold. That is, getting the program to look for the next p class and evaluate it against style4 and style5. Since I was having problems with finding and evaluating the p class tag, I chose to pull my code out of the loop and just try to accomplish the first iteration of the task for starters. Here's my code so far:
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.kcda.org/KCDA_Awarded_Contracts.htm').read())
next_vendor = soup.find('p', {'class': 'style4'})
print next_vendor
next_commodity = next_vendor.find_next('p', {'class': 'style5'})
print next_commodity
next = next_commodity.find_next('p')
print next
I'd appreciate any help anybody can provide! Thanks in advance!
I am not entirely sure how you are expecting your output to be. I am assuming that you are trying to get the data in the webpage in the format:
Alphabet \t Vendor \t Category
You can do this:
# The basic things
import urllib2
from bs4 import BeautifulSoup
soup = BeautifulSoup(urllib2.urlopen('http://www.kcda.org/KCDA_Awarded_Contracts.htm').read())
Get the td of interest:
table = soup.find('table')
data = table.find_all('tr')[-1]
data = data.find_all('td')[1:]
Now, we will create a nested output dictionary with alphabets as the keys and an inner dict as the value. The inner dict has the vendor name as key and the category information as its value.
output_dict = {}
current_alphabet = ""
current_vendor = ""
for td in data:
    for p in td.find_all('p'):
        print p.text.strip()
        if p.get('class')[0] == 'style6':
            current_alphabet = p.text.strip()
            vendors = {}
            output_dict[current_alphabet] = vendors
            continue
        if p.get('class')[0] == 'style4':
            print "Here"
            current_vendor = p.text.strip()
            category = []
            output_dict[current_alphabet][current_vendor] = category
            continue
        output_dict[current_alphabet][current_vendor].append(p.text.strip())
This gets the output_dict in the format:
{ ...
u'W': { u'WTI - Weatherproofing Technologies': [u'Roofing'],
u'Wenger Corporation': [u'Musical Instruments and Equipment'],
u'Williams Scotsman, Inc': [u'Modular/Portable Buildings'],
u'Witt Company': [u'Interactive Technology']
},
u'X': { u'Xerox': [u"Copiers & MFD's", u'Printers']
}
}
Skipping the earlier parts for brevity. Now it is just a matter of accessing this dictionary and writing out to a tab separated file.
Hope this helps.
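The last step, writing the dictionary out, could look like the following. This is a sketch in Python 3 (the code above is Python 2), assuming the nested alphabet -> vendor -> categories layout shown; the sample dictionary is a shortened copy of the output above, write_tsv is a hypothetical name, and joining several categories with a comma is only one possible choice.

```python
import csv
import io

def write_tsv(output_dict, fileobj):
    """Write one row per vendor: alphabet, vendor, comma-joined categories."""
    writer = csv.writer(fileobj, delimiter='\t', lineterminator='\n')
    for alphabet in sorted(output_dict):
        for vendor in sorted(output_dict[alphabet]):
            categories = ', '.join(output_dict[alphabet][vendor])
            writer.writerow([alphabet, vendor, categories])

# Shortened sample in the same shape as output_dict.
sample = {
    'X': {'Xerox': ["Copiers & MFD's", 'Printers']},
    'W': {'Witt Company': ['Interactive Technology']},
}

# StringIO stands in for a real file opened with open(path, 'w', newline='').
buffer = io.StringIO()
write_tsv(sample, buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

Sorting the keys is a deliberate choice here so that the rows come out alphabetically regardless of how the dictionary was filled.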
Agree with @shaktimaan. Using a dictionary or list is a good approach here. My attempt is slightly different.
import requests as rq
from bs4 import BeautifulSoup as bsoup
import csv
url = "http://www.kcda.org/KCDA_Awarded_Contracts.htm"
r = rq.get(url)
soup = bsoup(r.content)
primary_line = soup.find_all("p", {"class":["style4","style5"]})
final_list = {}
for line in primary_line:
    txt = line.get_text().strip().encode("utf-8")
    if txt != "\xc2\xa0":
        if line["class"][0] == "style4":
            key = txt
            final_list[key] = []
        else:
            final_list[key].append(txt)
with open("products.csv", "wb") as ofile:
    f = csv.writer(ofile)
    for item in final_list:
        f.writerow([item, ", ".join(final_list[item])])
For the scrape, we isolate the style4 and style5 tags right away. I did not bother going for the style6 tags, i.e. the alphabet headers. We then get the text inside each tag. If the text is not just a non-breaking space (these are all over the tables, probably obfuscation or bad mark-up), we check whether it's style4 or style5. If it's the former, we assign it as a key to a blank list. If it's the latter, we append it to the list of the most recent key. The key only changes when we hit a new style4, so it's a relatively safe approach.
The last part is easy: we just use ", ".join on the value part of the key-value pair to concatenate the list as one string. We then write it to a CSV file.
Because the dictionary is unordered, the resulting CSV file will not be sorted alphabetically.
Changing it to a tab-delimited file is up to you. That's simple enough. Hope this helps!
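The grouping step can also be tried out without the network. This is a sketch in Python 3, assuming the paragraphs have already been reduced to (class, text) pairs; the sample pairs and the name group_by_vendor are made up for illustration. From Python 3.7 on, a plain dict keeps insertion order, so the vendors come out in page order.

```python
def group_by_vendor(paragraphs):
    """Map each style4 text to the list of style5 texts that follow it."""
    grouped = {}
    key = None
    for css_class, text in paragraphs:
        # In Python 3, str.strip() also removes non-breaking spaces.
        text = text.strip()
        if not text:
            continue  # skip paragraphs that hold only whitespace
        if css_class == 'style4':
            key = text
            grouped[key] = []
        elif css_class == 'style5' and key is not None:
            grouped[key].append(text)
    return grouped

# Hypothetical (class, text) pairs in page order.
sample = [
    ('style4', 'Xerox'),
    ('style5', "Copiers & MFD's"),
    ('style5', '\xa0'),
    ('style5', 'Printers'),
    ('style4', 'Witt Company'),
    ('style5', 'Interactive Technology'),
]

grouped = group_by_vendor(sample)
for vendor, items in grouped.items():
    # One tab-delimited line per vendor.
    print(vendor + '\t' + ', '.join(items))
```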
I am running a server with cherrypy and a python script. Currently, there is a web page containing data for a list of people, which I need to get. The format of the web page is as follows:
www.url1.com, firstName_1, lastName_1
www.url2.com, firstName_2, lastName_2
www.url3.com, firstName_3, lastName_3
I wish to display the list of names on my own webpage, with each name hyperlinked to their corresponding website.
I have read the webpage into a list with the following method:
@cherrypy.expose
def receiveData(self):
    """ Get a list, one per line, of currently known online addresses,
    separated by commas.
    """
    method = "whoonline"
    fptr = urllib2.urlopen("%s/%s" % (masterServer, method))
    data = fptr.readlines()
    fptr.close()
    return data
But I don't know how to break the list into a list of lists at the commas. Each smaller list should have three elements: URL, First Name, and Last Name. So I was wondering if anyone could help.
Thank you in advance!
You can iterate over fptr, no need to call readlines()
data = [line.split(', ') for line in fptr]
You need the split(',') method on each string:
data = [ line.split(',') for line in fptr.readlines() ]
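Or, starting from the data list that readlines() returns in the question, and stripping the whitespace as well:
lists = []
for line in data:
    lists.append([x.strip() for x in line.split(',')])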
If your data is one big string (potentially with leading or trailing spaces), do it this way:
lines=""" www.url1.com, firstName_1, lastName_1
www.url2.com, firstName_2 , lastName_2
www.url3.com, firstName_3, lastName_3 """
data=[]
for line in lines.split('\n'):
    t=[e.strip() for e in line.split(',')]
    data.append(t)
print data
Out:
[['www.url1.com', 'firstName_1', 'lastName_1'], ['www.url2.com', 'firstName_2',
'lastName_2'], ['www.url3.com', 'firstName_3', 'lastName_3']]
Notice the leading and trailing spaces are removed.
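To get from the split rows to the hyperlinked names the question asks for, something like the following may do. This is a Python 3 sketch; the function names are hypothetical, the http:// prefix is an assumption because the sample URLs carry no scheme, and html.escape is used in case a name contains markup characters.

```python
import html

def parse_rows(lines):
    """Split 'url, first, last' lines into [url, first, last] lists."""
    rows = []
    for line in lines:
        fields = [field.strip() for field in line.split(',')]
        if len(fields) == 3:  # ignore blank or malformed lines
            rows.append(fields)
    return rows

def to_links(rows):
    """Render each row as an HTML anchor showing the person's full name."""
    links = []
    for url, first, last in rows:
        name = html.escape(first + ' ' + last)
        href = html.escape('http://' + url, quote=True)
        links.append('<a href="%s">%s</a>' % (href, name))
    return '<br>\n'.join(links)

# Sample lines in the format from the question, as readlines() would return them.
lines = [
    "www.url1.com, firstName_1, lastName_1\n",
    "www.url2.com, firstName_2 , lastName_2\n",
    "\n",
]

rows = parse_rows(lines)
page = to_links(rows)
print(page)
```

The returned string could then be the body of whatever page the cherrypy handler serves.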
I'm creating a mail "bot" for one of my web services that will periodically collect a queue of e-mail messages to be sent from a PHP script and send them via Google's SMTP servers. The PHP script returns the messages in this format:
test#example.com:Full Name:shortname\ntest2#example.com:Another Full Name:anothershortname\ntest#example.com:Foo:bar
I need to "convert" that into something like this:
{
"test#example.com": [
[
"Full Name",
"shortname"
],
[
"Foo",
"bar"
]
],
"test2#example.com": [
[
"Another Full Name",
"anothershortname"
]
]
}
Notice I need to have only one key per e-mail, even if there are multiple instances of an address. I know I can probably do it with two consecutive loops, one to build the first level of the dictionary and the second to populate it, but there should be a way to do it in one shot. This is my code so far:
raw = "test#example.com:Full Name:shortname\ntest2#example.com:Another Full Name:anothershortname\ntest#example.com:Foo:bar"
print raw
newlines = raw.split("\n")
print newlines
merged = {}
for message in newlines:
    message = message.split(":")
    merged[message[0]].append([message[1], message[2]])
print merged
I'm getting a KeyError on the last line of the loop, which I take to mean the key has to exist before appending anything to it (appending to a nonexistent key will not create that key).
I'm new to Python and not really familiar with lists and dictionaries yet, so your help is much appreciated!
This may work:
for message in newlines:
    message = message.split(":")
    temp = []
    temp.append(message[1])
    temp.append(message[2])
    merged[message[0]] = temp
Actually, since that overwrites earlier entries for the same address, maybe:
for message in newlines:
    message = message.split(":")
    temp = []
    temp.append(message[1])
    temp.append(message[2])
    if message[0] not in merged:
        merged[message[0]] = []
    merged[message[0]].append(temp)
I see that you've already accepted an answer, but you may still be interested to know that this can be done easily with defaultdict:
from collections import defaultdict
raw = "test#example.com:Full Name:shortname\ntest2#example.com:Another Full Name:anothershortname\ntest#example.com:Foo:bar"
merged = defaultdict(list)
for line in raw.split('\n'):
    line = line.split(':')
    merged[line[0]].append(line[1:])
You are right about the error. So you have to check if the key is present. 'key' in dict returns True if 'key' is found in dict, otherwise False. Implementing this, here's your full code (with the debugging print statements removed):
raw = "test#example.com:Full Name:shortname\ntest2#example.com:Another Full Name:anothershortname\ntest#example.com:Foo:bar"
newlines = raw.split("\n")
merged = {}
for message in newlines:
    message = message.split(":")
    if message[0] in merged:
        merged[message[0]].append([message[1], message[2]])
    else:
        merged[message[0]]=[[message[1], message[2]]]
print merged
Notice the extra brackets for the nested list on the second last line.
Just check whether the key is present. If it is present, append the data to the existing list;
if it is not present, create the key with a new nested list.
if message[0] in merged:
    merged[message[0]].append([message[1], message[2]])
else:
    merged[message[0]] = [[message[1], message[2]]]
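The presence check can also be folded into a single call with dict.setdefault, which returns the existing list for a key or inserts and returns a new empty one. A Python 3 sketch using the sample data from the question; the variable names address, full_name and short_name are chosen here for readability.

```python
raw = ("test#example.com:Full Name:shortname\n"
       "test2#example.com:Another Full Name:anothershortname\n"
       "test#example.com:Foo:bar")

merged = {}
for line in raw.split("\n"):
    address, full_name, short_name = line.split(":")
    # setdefault creates the list the first time an address is seen,
    # and returns the existing list on every later occurrence.
    merged.setdefault(address, []).append([full_name, short_name])

print(merged)
```

Unlike defaultdict, this leaves merged as a plain dict, so looking up an unknown address later still raises KeyError instead of silently creating an entry.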