Extracting data using Python

I have a text file like this
app galaxy store Galaxy Store
app text editor Text Editor
app smartthings SmartThings
app samsung pay Samsung Pay
app pdssgentool PdssGenTool
app pdss-sample pdss-sample
app encodertool EncoderTool
app play store Play Store
app play music Play Music
app keep notes Keep Notes
where the format is
app(tab)name(tab)name_starting_with_caps
From here I only want to extract the name part and store it in a list. Can anyone help me do that in Python? For example: list = ['galaxy store', 'text editor', ...]

Use csv.reader with delimiter='\t':
import csv

with open('yourfile') as f:
    csv_reader = csv.reader(f, delimiter='\t')
    names = [row[1] for row in csv_reader]
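With the sample file above, names ends up as ['galaxy store', 'text editor', 'smartthings', 'samsung pay', 'pdssgentool', 'pdss-sample', 'encodertool', 'play store', 'play music', 'keep notes'], i.e. the middle column of every line.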

Related

"a"/"append" only once to yaml file

I'm trying to use yaml.dump to send some data from a .csv file to a .yml file.
Everything works and I can send data successfully, BUT the script is meant to be run many times... and this appends the same data to the .yml file every time I run the script.
My Python code:
# Read the whole csv file with the pandas library.
df = pandas.read_csv('Keywords.csv', sep=';')

for index, row in df.iterrows():  # Iterate over the csv file.
    pack_name = row.NAME  # Name of the pack
    print(pack_name)

    def dumpFunction():
        with open(f'packs/{pack_name}/config.yml', 'a') as outfile:  # HERE I USE APPEND
            yaml.dump_all(
                df.loc[(df['NAME'] == pack_name)].to_dict(orient='records'),  # Send keywords to the right pack.
                outfile,
                sort_keys=False,
                indent=4
            )

    if pack_name and os.path.exists(f'packs/{pack_name}'):  # Check if the pack is available; row.NAME is the pack name.
        dumpFunction()
    else:
        pathlib.Path(f"packs/{pack_name}").mkdir()  # If the pack does not exist, create a new one.
        dumpFunction()
        print(f'{pack_name} was made!')
My .csv file (shortened) - semicolon separated:
NAME;KEYWORDS
.NET;apm, .net, language agent
.NET core;apm, .net
.NET MVC Web API;apm, .net
ActiveRecord;apm, ruby
Acts_as_solr;apm, ruby
My .yml file after I run the script 3 times:
NAME: .NET
KEYWORDS: apm, .net, language agent
NAME: .NET
KEYWORDS: apm, .net, language agent
NAME: .NET
KEYWORDS: apm, .net, language agent
I only want it in the .yml file once, like this, even if I run the script 10 times:
NAME: .NET
KEYWORDS: apm, .net, language agent
Try this one. I didn't have enough data in the CSV to test it, but it should work.
I referenced this answer: Python3 - write back to same stream in yaml
import os
import pathlib

import pandas
import yaml

df = pandas.read_csv('Keywords.csv', sep=';')

for index, row in df.iterrows():  # Iterate over the csv file.
    pack_name = row.NAME  # Name of the pack
    print(pack_name)

    def dumpFunction():
        # 'w+' (instead of 'a') truncates the file on open, so re-running the
        # script overwrites each pack's config.yml rather than appending duplicates.
        with open(f'packs/{pack_name}/config.yml', 'w+') as outfile:
            yaml.dump_all(
                df.loc[(df['NAME'] == pack_name)].to_dict(orient='records'),  # Send keywords to the right pack.
                outfile,
                sort_keys=False,
                indent=4
            )

    if pack_name and os.path.exists(f'packs/{pack_name}'):  # Check if the pack is available; row.NAME is the pack name.
        dumpFunction()
    else:
        pathlib.Path(f"packs/{pack_name}").mkdir()  # If the pack does not exist, create a new one.
        dumpFunction()
        print(f'{pack_name} was made!')
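If you would rather keep appending (mode 'a') but avoid duplicates, another option is to load what is already in the file and only dump the records that are missing. This is just a sketch, assuming PyYAML; dump_if_missing is a made-up helper name:
import yaml

def dump_if_missing(records, path):
    # Append only the YAML documents that are not already present in the file.
    try:
        with open(path) as f:
            existing = list(yaml.safe_load_all(f))
    except FileNotFoundError:
        existing = []
    new_records = [r for r in records if r not in existing]
    if new_records:
        with open(path, 'a') as f:
            yaml.dump_all(new_records, f, sort_keys=False, indent=4)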

Parsing a sample sheet file with data in column in Python 2.7

I have a file from which I want to parse data that is laid out in columns, not rows. The format of the xls file is like this:
PROJECT NAME Testing
PROJECT PI Tester
Primary Contact Name Tester
Primary Contact Email testing@tester.com
DATA SUBMISSION DATE 3/29/19
I am currently using this script:
def read_csv(file, json_file):
    csv_rows = []
    with open(file) as csvfile:
        reader = csv.DictReader(csvfile)
        title = reader.fieldnames
        for row in reader:
            csv_rows.extend([{title[i]: row[title[i]] for i in range(len(title))}])
        write_json(csv_rows, json_file)
And it parses the data the way I want, if I set the table as:
PROJECT NAME PROJECT PI Primary Contact Name
Testing Tester Tester
I have researched this a lot, but could not find anything about parsing by column instead of by row.
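There is no answer shown for this one, but one way to handle a column-oriented sheet is to read it line by line and treat the first cell as the field name and the second as the value, then dump the resulting dict to JSON. This is only a sketch, assuming the sheet has been exported as a tab-separated text file and that one JSON object per file is wanted; read_columnar_sheet is a made-up name:
import csv
import json

def read_columnar_sheet(path, json_path):
    # Each line holds one field: the label in the first cell, the value in the second.
    record = {}
    with open(path) as f:
        reader = csv.reader(f, delimiter='\t')
        for row in reader:
            if len(row) >= 2:
                record[row[0]] = row[1]
    with open(json_path, 'w') as out:
        json.dump(record, out, indent=4)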

Python script to save bookmarks into a json file

I actually want my bookmarks for a text classifier. It needs data in .json format, so I want a Python script that will retrieve data from the bookmarks directory and store it in a .json file. (I am using Ubuntu.)
Google Chrome already saves bookmarks as JSON. Your question does not say what the desired outcome is, so here is a simple snippet that accesses and prints the whole file of saved Google Chrome bookmarks on Windows. You will need to adjust the path to run it on Ubuntu, as I do not have access to an Ubuntu machine at the moment.
import getpass
import json
user = getpass.getuser()
loc = "C:/Users/{}/AppData/Local/Google/Chrome/User Data/Default/Bookmarks.bak".format(user)
f = open(loc, encoding="utf8")
data = json.load(f)
print(data)
Edit:
import getpass
import json

user = getpass.getuser()
loc = "C:/Users/{}/AppData/Local/Google/Chrome/User Data/Default/Bookmarks.bak".format(user)

with open(loc, encoding="utf8") as f:
    data = json.load(f)

for y in range(0, 100):
    try:
        for x in data["roots"]["bookmark_bar"]["children"][y]["children"]:
            print(x["url"])
    except (IndexError, KeyError):
        pass
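Since the question mentions Ubuntu: on Linux, Chrome normally keeps the same JSON file at ~/.config/google-chrome/Default/Bookmarks (Chromium uses ~/.config/chromium/Default/Bookmarks). Here is a sketch of the same idea with that path; walk is just an illustrative helper name, and the path may differ if you use a non-default profile:
import json
import os

loc = os.path.expanduser("~/.config/google-chrome/Default/Bookmarks")

with open(loc, encoding="utf8") as f:
    data = json.load(f)

def walk(node):
    # Recursively yield URLs from a node of the bookmarks tree.
    if node.get("type") == "url":
        yield node["url"]
    for child in node.get("children", []):
        yield from walk(child)

urls = [u for root in data["roots"].values() if isinstance(root, dict) for u in walk(root)]
print(json.dumps(urls, indent=2))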

Open text file stored as record in Django db

I need to process a .txt file which has a .csv structure and is stored as a field in the database. The main functionality of the app is processing these files and generating an output.
Every once in a while I need to upload a new version but keep the record of the old one. These are tiny files, rarely exceeding 300 kB. I also need additional fields with the uploader's name, date, version, etc., which is why I keep them as records in the DB rather than as local files.
The file is stored in the DB in a models.FileField().
How can I access this record not as a field but as a file object, and open it like a usual .txt?
What I've tried but it didn't work:
listofschedules = ScheduleFile.objects.all
file = listofschedules[0].csvSchedule
with open(file, 'rt', encoding='windows 1250') as csv_input:
    reader = csv.reader(csv_input, delimiter=';')
    print(reader)
    ...
You need to call all() and then use the file's path, as shown in the documentation. Or, for this simple test, you could use first():
file = ScheduleFile.objects.first().csvSchedule
with open(file.path, 'rt', encoding='windows 1250') as csv_input:
    reader = csv.reader(csv_input, delimiter=';')
    print(reader)
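As a side note, the file can also be opened through the storage backend instead of via .path, which keeps working even if the files are not on the local filesystem. A sketch, assuming the ScheduleFile model with a csvSchedule FileField from the question and a recent Django version (where FieldFile.open() returns the file, so it can be used as a context manager):
import csv

schedule = ScheduleFile.objects.first()
# FieldFile.open() goes through the storage backend, so no filesystem path is needed.
with schedule.csvSchedule.open('rb') as f:
    lines = f.read().decode('cp1250').splitlines()

for row in csv.reader(lines, delimiter=';'):
    print(row)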

generating a CSV file online on Google App Engine

I am using Google App Engine (Python), and I want my users to be able to download a CSV file generated from some data in the datastore (but I don't want them to download the whole thing, as I re-order the columns and so on).
I have to use the csv module, because there can be cells containing commas. But the problem is that if I do that, I will need to write a file, which is not allowed on Google App Engine.
What I currently have is something like this:
tmp = open("tmp.csv", 'w')
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
So I guess what I want to do is either handle cells with commas myself, or use the csv module without writing a file, since writing files is not possible on GAE.
I found a way to use the CSV module on GAE! Here it is:
self.response.headers['Content-Type'] = 'application/csv'
writer = csv.writer(self.response.out)
writer.writerow(["foo", "foo,bar", "bar"])
This way you don't need to write any files.
Here is a complete example of using the Python CSV module in GAE. I typically use it for creating a csv file from a gql query and prompting the user to save or open it.
import csv

class MyDownloadHandler(webapp2.RequestHandler):
    def get(self):
        q = ModelName.gql("WHERE foo = 'bar' ORDER BY date ASC")
        reqs = q.fetch(1000)

        self.response.headers['Content-Type'] = 'text/csv'
        self.response.headers['Content-Disposition'] = 'attachment; filename=studenttransreqs.csv'
        writer = csv.writer(self.response.out)

        # create row labels
        writer.writerow(['Date', 'Time', 'User'])

        # iterate through the query, returning each instance as a row
        for req in reqs:
            writer.writerow([req.date, req.time, req.user])
Add the appropriate mapping so that when a link is clicked, the file dialog opens
('/mydownloadhandler',MyDownloadHandler),
import StringIO
tmp = StringIO.StringIO()
writer = csv.writer(tmp)
writer.writerow(["foo", "foo,bar", "bar"])
contents = tmp.getvalue()
tmp.close()
print contents
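For reference, on Python 3 the same in-memory approach uses io.StringIO instead of the old StringIO module (a sketch, not specific to GAE):
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["foo", "foo,bar", "bar"])
contents = buf.getvalue()
buf.close()
print(contents)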
