I'm trying to append text to a file at a specific location.
I want to create a program which takes input from the user (name, image, id) and adds it to this file:
names = []
images = []
id = 0
url = ['https://somewebsiteUsingId10.com',
       'https://somewebsiteUsingId20.com']
if id == 5:
    names.append("Testing Names")
    images.append("Testing Images")
elif id == 0:
    names.append("Testing one names")
    images.append("Testing one Images")
I want the modified file to look like this:
names = []
images = []
id = 0
url = ['https://somewebsiteUsingId20.com',
       'https://somewebsiteUsingId10.com',
       'https://somewebsiteUsingId50.com']
if id == 5:
    names.append("Testing Names")
    images.append("Testing Images")
elif id == 0:
    names.append("Testing one names")
    images.append("Testing one Images")
elif id == 50:
    names.append("User input")
    images.append("User Input")
Thanks!
In cases like this, a good course of action is to put the variable data in a configuration file.
On start-up, your program reads the configuration file and processes it.
Another program can update the configuration file.
Python has the json module in its standard library. This supports lists and dicts, so it is a good match for Python data structures.
Say you write a file urls.json, looking like this:
[
    "https://somewebsiteUsingId20.com",
    "https://somewebsiteUsingId10.com",
    "https://somewebsiteUsingId50.com"
]
In your program you can then do:
import json

with open("urls.json") as f:
    urls = json.load(f)
The variable urls now points to a list containing the aforementioned URLs.
Writing the config data works much the same way:
urls = [
    "https://www.parrot.org",
    "https://www.ministryofsillywalks.org",
    "https://www.cheese.net",
]

with open("newurls.json", "w") as f:
    json.dump(urls, f, indent=4)
The file newurls.json now contains:
[
    "https://www.parrot.org",
    "https://www.ministryofsillywalks.org",
    "https://www.cheese.net"
]
Note that JSON is pretty flexible; you are not limited to strings:
import datetime

config = {
    'directories': ["https://www.parrot.org", "https://www.ministryofsillywalks.org"],
    'saved': str(datetime.datetime.now()),
    'count': 12
}

with open("configuration.json", "w") as cf:
    json.dump(config, cf, indent=4)
This would result in something like:
{
    "directories": [
        "https://www.parrot.org",
        "https://www.ministryofsillywalks.org"
    ],
    "saved": "2022-02-07 21:21:14.787420",
    "count": 12
}
(You'd get another date/time, of course.)
The only major downside to JSON files is that they don't allow comments. If you need comments, use another format such as INI files, which the configparser module in the standard library can read.
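A minimal sketch of that alternative; the section and option names below are made up for illustration:

# Sketch only: INI-style config read with configparser (comments are allowed).
import configparser

config = configparser.ConfigParser()
config.read_string("""
; comments like this are fine in INI files
[urls]
parrot = https://www.parrot.org
cheese = https://www.cheese.net
""")

print(config["urls"]["parrot"])  # -> https://www.parrot.org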
Note that there are other approaches, like the shelve module or reading a file and calling eval on it, but those have potential safety issues.
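Applied to your original problem, a minimal sketch could keep the user data in a JSON file and have a small function append the user's input; the file name data.json and its keys are assumptions for illustration:

# Sketch only: store the variable data in a JSON file instead of editing
# the Python source. The file name "data.json" and its keys are assumptions.
import json

def add_entry(user_id, name, image, url):
    with open("data.json") as f:
        data = json.load(f)            # e.g. {"urls": [...], "entries": {...}}
    data["urls"].append(url)
    data["entries"][str(user_id)] = {"name": name, "image": image}
    with open("data.json", "w") as f:
        json.dump(data, f, indent=4)

# The main program can then look the data up by id instead of using if/elif:
# entry = data["entries"].get(str(id))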
I have all the data of a Lead object from Salesforce in Python and I save it to a CSV file.
But since there is a lot of information, I get a Python memory error.
**This code gives a Python memory error**
from simple_salesforce import Salesforce
from datetime import datetime
import csv
import os
import json
import account

SALESFORCE_USERNAME = '123'
PASSWORD = '123'
SECURITY_TOKEN = '123'

def main():
    # Authentication settings
    sf = Salesforce(username=SALESFORCE_USERNAME,
                    password=PASSWORD,
                    security_token=SECURITY_TOKEN)

    # Lead Column setting to be acquired
    columns = [
        "CreatedDate"
    ]
    sosl = 'SELECT {0[0]} FROM Lead'.format(columns)

    # Data acquisition with SOSL
    data = sf.query_all(sosl)

    # Delete CSV file if it exists
    output_csv = 'output.csv'
    if os.path.exists(output_csv):
        os.remove(output_csv)

    # Write to CSV file
    for k, v in data.items():
        if type(v) is list:
            with open(output_csv, 'w', newline="") as f:
                writer = csv.DictWriter(f, fieldnames=columns)
                writer.writeheader()
                for d in v:
                    data = json.loads(json.dumps(d))
                    del data['attributes']
                    writer.writerow(data)

if __name__ == '__main__':
    main()
That's why, when there are more than 1000 rows, I want the CSV to be written as separate files, like this:
1. output1.csv (1000 rows)
2. output2.csv (1000 rows)
3. output3.csv ......
I wanted to split the CSV, so I passed iterator=True and chunksize=1000 to open(), but I get the following error. What do I need to do to make this work?
Code
from simple_salesforce import Salesforce
from datetime import datetime
import csv
import os
import json
import account

SALESFORCE_USERNAME = '123'
PASSWORD = '123'
SECURITY_TOKEN = '123'

def main():
    # Authentication settings
    sf = Salesforce(username=SALESFORCE_USERNAME,
                    password=PASSWORD,
                    security_token=SECURITY_TOKEN)

    # Lead Column setting to be acquired
    columns = [
        "CreatedDate"
    ]
    sosl = 'SELECT {0[0]} FROM Lead'.format(columns)

    # Data acquisition with SOSL
    data = sf.query_all(sosl)

    # Delete CSV file if it exists
    output_csv = 'output.csv'
    if os.path.exists(output_csv):
        os.remove(output_csv)

    # Write to CSV file
    for k, v in data.items():
        if type(v) is list:
            with open(output_csv, 'w', newline="", iterator=True, chunksize=1000) as f:
                writer = csv.DictWriter(f, fieldnames=columns)
                writer.writeheader()
                for d in v:
                    data = json.loads(json.dumps(d))
                    del data['attributes']
                    writer.writerow(data)

if __name__ == '__main__':
    main()
Error message
Traceback (most recent call last):
File "c:/Users/test/Documents/test/test5.py", line 44, in <module>
main()
File "c:/Users/test/Documents//test5.py", line 36, in main
with open(output_csv, 'w', newline="",iterator=True,chunksize=1000) as f:
TypeError: 'iterator' is an invalid keyword argument for open()
I thought that this way I would not get the Python memory error. If there is another way to do this, please let me know.
data = sf.query_all(sosl)
This call retrieves all information into memory for the given query, which is SOQL, not SOSL.
Instead, use
data = sf.query_all_iter(sosl)
and iterate over the resulting iterator instead of data.items(), which will be much more memory-efficient as it won't attempt to retrieve all items at once.
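A minimal sketch of how that could look, splitting the output into files of 1000 rows each; this assumes a reasonably recent simple_salesforce that provides query_all_iter, and that each returned record is a dict carrying an extra 'attributes' key:

# Sketch only: stream results with query_all_iter and start a new CSV file
# every 1000 rows instead of loading everything into memory.
import csv
from simple_salesforce import Salesforce

sf = Salesforce(username=SALESFORCE_USERNAME,
                password=PASSWORD,
                security_token=SECURITY_TOKEN)

columns = ["CreatedDate"]
soql = 'SELECT {0} FROM Lead'.format(', '.join(columns))

chunk_size = 1000
out = None
writer = None
file_index = 0

for i, record in enumerate(sf.query_all_iter(soql)):
    if i % chunk_size == 0:
        # Close the previous file (if any) and open output1.csv, output2.csv, ...
        if out is not None:
            out.close()
        file_index += 1
        out = open('output{0}.csv'.format(file_index), 'w', newline='')
        writer = csv.DictWriter(out, fieldnames=columns)
        writer.writeheader()
    row = dict(record)
    row.pop('attributes', None)   # drop Salesforce metadata before writing
    writer.writerow(row)

if out is not None:
    out.close()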
I have a binary file with this format:
and I use this code to open it:
import numpy as np

f = open("author_1", "r")
dt = np.dtype({'names': ['au_id', 'len_au_name', 'au_name', 'nu_of_publ', 'pub_id', 'len_of_pub_id',
                         'pub_title', 'num_auth', 'len_au_name_1', 'au_name1', 'len_au_name_2', 'au_name2',
                         'len_au_name_3', 'au_name3', 'year_publ', 'num_of_cit', 'citid', 'len_cit_tit',
                         'cit_tit', 'num_of_au_cit', 'len_cit_au_name_1', 'au_cit_name_1', 'len_cit_au_name_2',
                         'au_cit_name_2', 'len_cit_au_name_3', 'au_cit_name_3', 'len_cit_au_name_4',
                         'au_cit_name_4', 'len_cit_au_name_5', 'au_cit_name_5', 'year_cit'],
               'formats': [int, int, 'S13', int, int, int, 'S61', int, int, 'S8', int, 'S7', int, 'S12',
                           int, int, int, int, 'S50', int, int, 'S7', int, 'S7', int, 'S9', int, 'S8',
                           int, 'S1', int]})
a = np.fromfile(f, dtype=dt, count=-1, sep="")
And I get this:
array([ (1, 13, b'Scott Shenker', 200, 1, 61, b'Integrated services in the internet architecture: an overview', 3, 8, b'R Braden', 7, b'D Clark', 12, b'S Shenker\xe2\x80\xa6', 1994, 1000, 401, 50, b'[HTML] An architecture for differentiated services', 5, 7, b'D Black', 7, b'S Blake', 9, b'M Carlson', 8, b'E Davies', 1, b'Z', 1998),
(402, 72, b'Resource rese', 1952544370, 544108393, 1953460848, b'ocol (RSVP)--Version 1 functional specification\x05\x00\x00\x00\x08\x00\x00\x00R Brad', 487013, 541851648, b'Zhang\x08', 1109414656, b'erson\x08', 542310400, b'Herzog\x07\x00\x00\x00S ', 1768776010, 511342, 103168, 22016, b'\x00A reliable multicast framework for light-weight s', 1769173861, 544435823, b'and app', 1633905004, b'tion le', 543974774, b'framing\x04', 458752, b'\x00\x00S Floy', 2660, b'', 1632247894),
Any idea how I can open the whole file?
I agree with Ryan: parsing the data is straightforward, but not trivial, and really tedious. Whatever disk space you save by packing the data this way, you pay for it dearly when unpacking.
Anyway, the file is made of variable-length records and fields. Each record consists of a variable number of fields of variable length, which we can read in chunks of bytes, and each chunk has its own format. You get the idea. Following this logic, I assembled these three functions, which you can finish, modify, and test:
from struct import Struct
import struct

def read_chunk(fmt, fileobj):
    chunk_struct = Struct(fmt)
    chunk = fileobj.read(chunk_struct.size)
    return chunk_struct.unpack(chunk)

def read_record(fileobj):
    author_id, len_author_name = read_chunk('ii', fileobj)
    # 's' reads the whole name as one bytes object; 'c' would give one byte per field
    author_name, nu_of_publ = read_chunk(str(len_author_name) + 'si', fileobj)
    record = {'author_id': author_id,
              'author_name': author_name,
              'publications': []}
    for pub in range(nu_of_publ):
        pub_id, len_pub_title = read_chunk('ii', fileobj)
        pub_title, num_pub_auth = read_chunk(str(len_pub_title) + 'si', fileobj)
        record['publications'].append({
            'publication_id': pub_id,
            'publication_title': pub_title,
            'publication_authors': []})
        for auth in range(num_pub_auth):
            len_pub_auth_name, = read_chunk('i', fileobj)
            pub_auth_name, = read_chunk(str(len_pub_auth_name) + 's', fileobj)
            record['publications'][-1]['publication_authors'].append({'name': pub_auth_name})
        year_publ, nu_of_cit = read_chunk('ii', fileobj)
        # Finish building your record with the remaining fields...
        for cit in range(nu_of_cit):
            cit_id, len_cit_title = read_chunk('ii', fileobj)
            cit_title, num_cit_auth = read_chunk(str(len_cit_title) + 'si', fileobj)
            for cit_auth in range(num_cit_auth):
                len_cit_auth_name, = read_chunk('i', fileobj)
                cit_auth_name, = read_chunk(str(len_cit_auth_name) + 's', fileobj)
            year_cit_publ, = read_chunk('i', fileobj)
    return record

def parse_file(filename):
    records = []
    with open(filename, 'rb') as f:
        while True:
            try:
                records.append(read_record(f))
            except struct.error:
                break
    # do something useful with the records...
    return records
The data structure stored in this file is hierarchical, rather than "flat": child arrays of different length are stored within each parent element. It is not possible to represent such a data structure using numpy arrays (even recarrays), and therefore it is not possible to read the file with np.fromfile().
What do you mean by "open the whole file"? What sort of python data structure would you like to end up with?
It would be straightforward, but still not trivial, to write a function to parse the file into a list of dictionaries.
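For illustration, the kind of structure such a function could produce, sketched with values taken from the record shown in the question (the exact layout is purely hypothetical):

# Hypothetical target structure: one dict per author record
records = [
    {
        'author_id': 1,
        'author_name': b'Scott Shenker',
        'publications': [
            {'publication_id': 1,
             'publication_title': b'Integrated services in the internet architecture: an overview',
             'publication_authors': [{'name': b'R Braden'}, {'name': b'D Clark'}],
             'year': 1994},
        ],
    },
]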
I want to open a CSV file for reading, but I'm facing some exceptions with it.
I'm using Python 2.7.
main.py:
if __name__ == "__main__":
    f = open('input.csv', 'r+b')
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    reader = csv.DictReader(iter(m.readline, ""))
    for read in reader:
        num = read['time']
        print num
Output:
Traceback (most recent call last):
File "/home/PycharmProjects/time_gap_Task/main.py", line 22, in <module>
for read in reader:
File "/usr/lib/python3.4/csv.py", line 109, in __next__
self.fieldnames
File "/usr/lib/python3.4/csv.py", line 96, in fieldnames
self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I resolve this error, and how do I open a CSV file properly using mmap and csv so that the code works correctly?
I know you asked this a while ago, but I actually created a module for myself that does this, because I do a lot of work with large CSV files, and sometimes I need to convert them into dictionaries, based on a key. Below is the code I've been using. Please feel free to modify as needed.
import csv
import mmap

def MmapCsvFileIntoDict(csvFilePath, skipHeader = True, transform = lambda row: row, keySelector = lambda o: o):
    """
    Takes a CSV file path and uses mmap to open the file and return a dictionary of the contents keyed
    on the results of the keySelector. The default key is the transformed object itself. Mmap is used because it is
    a more efficient way to process large files.

    The transform method is used to convert the line (converted into a list) into something else. Hence 'transform'.
    If you don't pass it in, the transform returns the list itself.
    """
    contents = {}
    firstline = False
    try:
        with open(csvFilePath, "r+b") as f:
            # memory-map the file, size 0 means whole file
            mm = mmap.mmap(f.fileno(), 0)
            for line in iter(mm.readline, b''):
                if firstline == False:
                    firstline = True
                    if skipHeader == True:
                        continue
                row = ''
                line = line.decode('utf-8')
                line = line.strip()
                row = next(csv.reader([line]), '')
                if transform != None and callable(transform):
                    if row == None or row == '':
                        continue
                    value = transform(row)
                else:
                    value = row
                if callable(keySelector):
                    key = keySelector(value)
                else:
                    key = keySelector
                contents[key] = value
    except IOError as ie:
        # PrintWithTs is a logging helper from my own code base
        PrintWithTs('Error decomposing the companies: {0}'.format(ie))
        return {}
    except:
        raise
    return contents
When you call this method, you have some options.
Assume you have a file that looks like:
Id, Name, PhoneNumber
1, Joe, 7175551212
2, Mary, 4125551212
3, Vince, 2155551212
4, Jane, 8145551212
The easiest way to call it is like this:
dict = MmapCsvFileIntoDict('/path/to/file.csv', keySelector = lambda row: row[0])
What you get back is a dict looking like this:
{ '1' : ['1', 'Joe', '7175551212'], '2' : ['2', 'Mary', '4125551212'] ...
One thing I like to do is create a class or a namedtuple to represent my data:
class CsvData:
    def __init__(self, row):
        self.Id = int(row[0])
        self.Name = row[1].upper()
        self.Phone = int(row[2])
And then when I call the method, I pass in a second lambda to transform each row in the file to an object I can work with:
dict = MmapCsvFileIntoDict('/path/to/file.csv', transform = lambda row: CsvData(row), keySelector = lambda o: o.Id)
What I get back that time looks like:
{ 1 : <object instance>, 2 : <object instance>...
I hope this helps! Best of luck
When you open a file with the b flag like this:
f = open('input.csv', 'r+b')
you read the file as bytes and not as strings.
So try changing the flag to r:
f = open('input.csv', 'r')
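Alternatively, if you want to keep mmap, here is a minimal sketch that decodes each memory-mapped line to text before handing it to csv.DictReader; it assumes Python 3 (which the /usr/lib/python3.4 path in your traceback suggests is actually being used) and a 'time' column as in your example:

# Sketch only: keep mmap, but decode each line so csv gets strings, not bytes.
import csv
import mmap

with open('input.csv', 'r+b') as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    reader = csv.DictReader(line.decode('utf-8') for line in iter(m.readline, b''))
    for row in reader:
        print(row['time'])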
If you just want to read specific columns from a CSV file, just try:
import csv

with open('input.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['time']
I read a CSV file from a link.
The CSV file looks like this when I open it in a text editor:
name, id, age, city
,steve, 1, 34, NY
,Rob, 2, 29, NY
,Ashly, 3, 28, NY
#!/usr/bin/python
import csv
import urllib2
import StringIO

url = 'http://domainname.com/file.csv'
response = urllib2.urlopen(url).read()
output = StringIO.StringIO(response)
cr = csv.reader((line.replace('NUL', '') for line in output), delimiter=",")
When I iterate over cr using
for row in cr:
    print row
I get this output, which is different from the actual file data:
['\x1b']
['^']
['\xd3']
['\xd4']
['\xe7']
['\x88']
['\xf7']
I have tried your code with a CSV file from the link mentioned below and everything works perfectly fine. All the rows print correctly, which means the CSV file is correctly received from the URL.
import urllib2
import StringIO
import csv
url = "http://www.andrewpatton.com/countrylist.csv"
response = urllib2.urlopen(url).read()
output = StringIO.StringIO(response)
cr = csv.reader((line.replace('NUL','') for line in output), delimiter=",")
for row in cr:
    print row