Given git log output like this:
commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)
Author: Slim Shady
Date: Sun Sep 18 19:53:42 2022 -0700
ci: remove debugging line github action script
commit body
commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)
Author: Slim Shady
Date: Sun Sep 18 19:41:20 2022 -0700
feat: read and write IDs
commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874
Author: Slim Shady
Date: Sun Sep 18 17:41:03 2022 -0700
feat: new hook to allow custom tags
I'd like to turn that into a Python list, with each element containing a single commit (including hash, author, body, etc.).
I've tried using re.split(r"commit \w{40}", git_log), but it doesn't keep the hash in the output.
You could also use a positive lookahead to split your data.
import re
with open('git_log.txt', 'r') as f:
    data = f.read()
# Zero-width lookahead: each header stays with its own chunk (needs Python 3.7+)
res = list(filter(None, re.split(r"(?=commit \w{40})", data)))
Output:
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
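To make the lookahead approach easy to try without a log file, here is a minimal, self-contained sketch. The function name `split_commits` and the shortened `SAMPLE_LOG` are made up for illustration. It also tightens the pattern a little compared with the answer above (anchoring at the start of a line and matching hex digits only), on the assumption that `git log` indents commit messages so that a line starting with `commit ` is always a header.

```python
import re

# Shortened sample in the shape of the log from the question (illustrative only).
SAMPLE_LOG = (
    "commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master)\n"
    "Author: Slim Shady\n"
    "\n"
    "    ci: remove debugging line github action script\n"
    "\n"
    "commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9\n"
    "Author: Slim Shady\n"
    "\n"
    "    feat: read and write IDs\n"
)

# The lookahead consumes no characters, so the header is kept in the chunk
# that follows the split point. Splitting on an empty match needs Python 3.7+.
COMMIT_START = re.compile(r"(?=^commit [0-9a-f]{40}\b)", re.MULTILINE)


def split_commits(log_text):
    """Return one string per commit, each starting with its 'commit <hash>' line."""
    return [chunk for chunk in COMMIT_START.split(log_text) if chunk]


commits = split_commits(SAMPLE_LOG)
for commit in commits:
    print(repr(commit.splitlines()[0]))
```

Because nothing is consumed by the split, joining the chunks back together reproduces the original text exactly.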
You need to put the split pattern in a capture group to allow it to be part of the output:
>>> import re
# filter(None, ...) removes empty strings
>>> res = filter(None, re.split(r'(commit \w{40})', inp))
# Join items in groups of two, because each commit line is split from the rest of its body
>>> output = ["".join(item) for item in zip(*[res] * 2)]
>>> output
[
'commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master, origin/master, origin/HEAD)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:53:42 2022 -0700\n\n ci: remove debugging line github action script\n\n commit body\n\n',
'commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9 (fix_release_action)\nAuthor: Slim Shady\nDate: Sun Sep 18 19:41:20 2022 -0700\n\n feat: read and write IDs\n\n',
'commit 8ee8fcbebcab76a2fbf0ee096a0d216e51fe2874\nAuthor: Slim Shady\nDate: Sun Sep 18 17:41:03 2022 -0700\n\n feat: new hook to allow custom tags'
]
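The pairing step can also be written with slices instead of `zip(*[res] * 2)`, which some readers may find easier to follow. This is only a sketch of the same capture-group idea, not the answerer's code; the name `split_commits_with_group` and the sample text are hypothetical. It relies on the fact that `re.split` with one capture group returns the text before the first match at index 0, then alternates matched header and following text.

```python
import re

# Illustrative sample, shortened from the log in the question.
SAMPLE_LOG = (
    "commit 19e0f017ac832238f5a800dd3ea7a5966b3c1343 (HEAD -> master)\n"
    "Author: Slim Shady\n"
    "\n"
    "    ci: remove debugging line github action script\n"
    "\n"
    "commit ef82c672d21d70c43f0454b0b4d6fa22ef4ad0a9\n"
    "Author: Slim Shady\n"
    "\n"
    "    feat: read and write IDs\n"
)


def split_commits_with_group(log_text):
    """Split on 'commit <hash>' and glue each header back onto its body."""
    parts = re.split(r"(commit \w{40})", log_text)
    headers = parts[1::2]  # the captured 'commit <hash>' strings
    bodies = parts[2::2]   # the text that follows each header
    return [header + body for header, body in zip(headers, bodies)]


commits = split_commits_with_group(SAMPLE_LOG)
print(len(commits))
```

Since empty strings are not filtered out before pairing, a header followed by no text at all still stays aligned with its own (empty) body.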
But if you do have control over the git log output, you could format it differently and parse it without regex:
git log --pretty=format:'"%H"%x09"%an"%x09"%ad"%x09"%B"' > output.csv
Then:
>>> import csv
>>> with open("output.csv", newline="") as f:
...     items = list(csv.reader(f, delimiter='\t'))
...
>>> items[0]
["19e0f017ac832238f5a800dd3ea7a5966b3c1343", "Slim Shady", "Sun Sep 18 19:53:42 2022 -0700", "ci: remove debugging line github action script"]
Another option is to use a library such as GitPython (https://gitpython.readthedocs.io/en/stable/), which gives you access to commits as Python objects.
I want to open a JSON file using Python in my project, but I constantly get the following error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
This is the code:
import json
with open("../data.txt") as json_file:
    data = json.load(json_file)
I have a really simple text file with JSON formatted data in it. This is the data.txt file:
{
    "data": [
        {
            "day": "22/04/2020 15:35",
            "viewcount": "1"
        },
        {
            "day": "22/04/2020 20:51",
            "viewcount": "2"
        }
    ]
}
I tried your source code with the visible JSON data as is, and it runs with no problems at all.
I'd suggest checking the contents of the file in binary form, e.g. with a utility such as hexdump, to see how it begins:
$ hexdump data.txt
0000000 0a7b 2020 2020 6422 7461 2261 203a 5b20
0000010 200a 2020 2020 2020 2020 2020 2020 2020
0000020 2020 0a7b 2020 2020 2020 2020 2020 2020
...
Or use the file utility to check the encoding, as described in the following post: https://unix.stackexchange.com/questions/11602/how-can-i-test-the-encoding-of-a-text-file-is-it-valid-and-what-is-it
The Python code was not the problem.
The problem was that the file was not saved in UTF-8 encoding.
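If re-saving the file as UTF-8 is not an option, the encoding can also be handled on the Python side. The thread does not say which encoding the file actually had, so the following is only a sketch under the assumption that the file starts with a byte order mark (as files saved as "Unicode" or "UTF-8 with BOM" by some editors do). The helper name `load_json_any_bom` is made up for illustration.

```python
import codecs
import json

# BOM prefixes and the codec that decodes (and strips) each of them.
# UTF-32 comes first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOM_CODECS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def load_json_any_bom(path):
    """Load JSON from a file, picking the codec from its byte order mark."""
    with open(path, "rb") as f:
        raw = f.read()
    for bom, codec in _BOM_CODECS:
        if raw.startswith(bom):
            return json.loads(raw.decode(codec))
    # No BOM found: assume plain UTF-8.
    return json.loads(raw.decode("utf-8"))
```

If the encoding is already known, passing it directly is simpler, for example `open("../data.txt", encoding="utf-8-sig")` for UTF-8 with a BOM.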
Warning: beginner here.
I am reading in a text file whose contents are JSON. Since JSON is just like a dictionary, I want to address parts of it like I would a dictionary, but I don't know how to do this. This is the little bit that I have:
Code:
with open("trump.txt","r") as lines:
    for line in lines:
        print(line)
What this prints:
{"created_at":"Wed Sep 27 01:19:39 +0000 2017","id":912849180741087232,"id_str":"912849180741087232","text":"RT #TheRickWilson: I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump ca\u2026","source":"\u003ca href=\"http://twitter.com/download/iphone\" rel=\"nofollow\"\u003eTwitter for iPhone\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":66914769,"id_str":"66914769","name":"Kathy","screen_name":"mydoggigi","location":"Earth","url":null,"description":"Love politics, Grandchildren & PSU #StillWithHer #NotMyPresident Blocked by Susan Sarandon, Glenn Greenwald, Joel Osteen and Joe Scarborough!!\ud83d\ude0e #TheResistance","translator_type":"none","protected":false,"verified":false,"followers_count":5878,"friends_count":5973,"listed_count":143,"favourites_count":110285,"statuses_count":138191,"created_at":"Wed Aug 19 04:55:41 +0000 2009","utc_offset":-14400,"time_zone":"Eastern Time (US & 
Canada)","geo_enabled":true,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"C0DEED","profile_background_image_url":"http://abs.twimg.com/images/themes/theme1/bg.png","profile_background_image_url_https":"https://abs.twimg.com/images/themes/theme1/bg.png","profile_background_tile":false,"profile_link_color":"1DA1F2","profile_sidebar_border_color":"C0DEED","profile_sidebar_fill_color":"DDEEF6","profile_text_color":"333333","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/903412377424732160/NqCfPFiB_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/66914769/1504225271","default_profile":true,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Wed Sep 27 01:08:45 +0000 2017","id":912846439964987392,"id_str":"912846439964987392","text":"I see the clickservatives are out in force screaming there were special circumstances in AL.\n\nYes, it's because Trump can't deliver. Sad!","source":"\u003ca href=\"http://twitter.com/download/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":19084896,"id_str":"19084896","name":"Rick Wilson","screen_name":"TheRickWilson","location":"Florida and points beyond","url":"http://facebook.com/therickwilson","description":"GOP Media Guy, Dad, Husband, Pilot, Hunter, Writer. I make ads and do politics. Daily Beast columnist. 
Everything Trump Touches Dies.","translator_type":"none","protected":false,"verified":true,"followers_count":238578,"friends_count":3518,"listed_count":4235,"favourites_count":48094,"statuses_count":250609,"created_at":"Fri Jan 16 20:50:17 +0000 2009","utc_offset":-14400,"time_zone":"America/New_York","geo_enabled":false,"lang":"en","contributors_enabled":false,"is_translator":false,"profile_background_color":"1A1B1F","profile_background_image_url":"http://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_image_url_https":"https://pbs.twimg.com/profile_background_images/220716353/Firefox_Wallpaper.jpg","profile_background_tile":true,"profile_link_color":"445555","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"252429","profile_text_color":"666666","profile_use_background_image":true,"profile_image_url":"http://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_image_url_https":"https://pbs.twimg.com/profile_images/813585115934658560/gnuRozoD_normal.jpg","profile_banner_url":"https://pbs.twimg.com/profile_banners/19084896/1504722796","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"quote_count":5,"reply_count":50,"retweet_count":100,"favorite_count":456,"entities":{"hashtags":[],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"TheRickWilson","name":"Rick Wilson","id":19084896,"id_str":"19084896","indices":[3,17]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"en","timestamp_ms":"1506475179263"}
So how can I do something as simple as the line below in my code?
dict["created_at"]="Wed Sep 27 01:19:39 +0000 2017"
Try this:
import json
with open('file.json') as file:
    data = json.load(file)
    #code
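`json.load` expects the whole file to be a single JSON document. The question's loop reads the file line by line, so if the file holds one JSON object per line (an assumption; the question only shows one printed object), each line can be parsed on its own with `json.loads`. A minimal sketch, where the function name `read_json_lines` is hypothetical:

```python
import json


def read_json_lines(path):
    """Parse a file containing one JSON object per line into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Each element is then an ordinary dictionary, so with the file from the question, `read_json_lines("trump.txt")[0]["created_at"]` would give the timestamp string, and nested values are reached the same way, for example `record["user"]["screen_name"]`.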
I have a view in my Django application that automatically creates an image using PIL, stores it on the Nginx media server, and returns an HTML template with an img tag pointing to its URL.
This works fine, but I noticed an issue: roughly 1 out of every 5 times I access this view, the image doesn't render.
I did some investigating and found something interesting. This is the HTTP response header when the image renders properly:
Accept-Ranges:bytes
Connection:keep-alive
Content-Length:14966
Content-Type:image/jpeg
Date:Wed, 18 Aug 2010 15:36:16 GMT
Last-Modified:Wed, 18 Aug 2010 15:36:16 GMT
Server:nginx/0.5.33
and this is the header when the image doesn't load:
Accept-Ranges:bytes
Connection:keep-alive
Content-Length:0
Content-Type:image/jpeg
Date:Wed, 18 Aug 2010 15:37:47 GMT
Last-Modified:Wed, 18 Aug 2010 15:37:46 GMT
Server:nginx/0.5.33
Notice that the Content-Length is zero. What could have caused this? Any ideas on how I could further debug this problem?
Edit:
When the view is called, it calls this "draw" method of the model. This is basically what it does (I removed the bulk of the code for clarity):
def draw(self):
    # Open/Creates a file
    if not self.image:
        (fd, self.image) = tempfile.mkstemp(dir=settings.IMAGE_PATH, suffix=".jpeg")
        fd2 = os.fdopen(fd, "wb")
    else:
        fd2 = open(os.path.join(settings.SITE_ROOT, self.image), "wb")
    # Creates a PIL Image
    im = Image.new(mode, (width, height))
    # Do some drawing
    .....
    # Saves
    im = im.resize((self.get_size_site(self.width),
                    self.get_size_site(self.height)))
    im.save(fd2, "JPEG")
    fd2.close()
Edit2: This is the website:
http://xxxcnn7979.hospedagemdesites.ws:8000/cartao/99/
If you keep hitting F5, the image on the right will eventually render.
We had this problem a while back when writing HTML pages out to disk. The solution for us was to write to a temporary file and then atomically rename the file. You might also want to consider using fsync.
The full source is available here: staticgenerator/__init__.py, but here are the useful bits:
import os
import stat
import tempfile
...
# mkstemp returns a raw OS-level file descriptor (an int), not a file object
f, tmpname = tempfile.mkstemp(dir=directory)
os.write(f, content)
# Force the data to disk before renaming
# See http://docs.python.org/library/os.html#os.fsync
os.fsync(f)
os.close(f)
# Ensure it is webserver readable
os.chmod(tmpname, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
# Rename is an atomic operation in POSIX
# See: http://docs.python.org/library/os.html#os.rename
os.rename(tmpname, fn)
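A likely explanation for the zero-length responses is that opening an existing file with mode "wb" truncates it immediately, so a request that arrives before the new image has been written sees an empty file. The write-then-rename pattern above avoids that window. Below is a minimal, self-contained sketch of the same idea; the function name `atomic_write_bytes` and the 0o644 permission are assumptions for illustration, not part of the staticgenerator source.

```python
import os
import tempfile


def atomic_write_bytes(path, data):
    """Write data so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # ask the OS to write it to disk
        # mkstemp creates the file readable by the owner only;
        # make it readable by the web server too.
        os.chmod(tmp_path, 0o644)
        # Swap the finished file into place, overwriting any existing target.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a half-written temp file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In the `draw` method from the question, the image could be saved into the temporary file object (for example `im.save(tmp, "JPEG")` in place of `tmp.write(data)`), so the final path is never opened for writing directly.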