Get list of all JIRA issues (python)

I am trying to get a list of all JIRA issues so that I may iterate through them in the following manner:
from jira import JIRA
jira = JIRA(basic_auth=('username', 'password'), options={'server':'https://MY_JIRA.atlassian.net'})
issue = jira.issue('ISSUE_KEY')
print(issue.fields.project.key)
print(issue.fields.issuetype.name)
print(issue.fields.reporter.displayName)
print(issue.fields.summary)
print(issue.fields.comment.comments)
The code above returns the desired fields, but only one issue at a time. However, I need to be able to pass a list of all issue keys into:
issue = jira.issue('ISSUE_KEY')
The idea is to write a for loop that would go through this list and print the indicated fields.
I have not been able to populate this list.
Can someone point me in the right direction please?

def get_all_issues(jira_client, project_name, fields):
    issues = []
    i = 0
    chunk_size = 100
    while True:
        chunk = jira_client.search_issues(f'project = {project_name}', startAt=i, maxResults=chunk_size, fields=fields)
        i += chunk_size
        issues += chunk.iterable
        if i >= chunk.total:
            break
    return issues

issues = get_all_issues(jira, 'JIR', ["id", "fixVersion"])
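As a usage sketch (the field names below are standard JIRA fields and 'JIR' is the placeholder project key from the call above, not values confirmed by the original post), you can then iterate over the result exactly as in the question:
issues = get_all_issues(jira, 'JIR', ["key", "summary", "reporter", "issuetype"])
for issue in issues:
    # each item is a jira.resources.Issue, so the fields from the question are available
    print(issue.key)
    print(issue.fields.issuetype.name)
    print(issue.fields.reporter.displayName)
    print(issue.fields.summary)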

options = {'server': 'YOUR SERVER NAME'}
jira = JIRA(options, basic_auth=('YOUR EMAIL', 'YOUR PASSWORD'))
size = 100
initial = 0
while True:
    start = initial * size
    issues = jira.search_issues('project=<NAME OR ID>', start, size)
    if len(issues) == 0:
        break
    initial += 1
    for issue in issues:
        print('ticket-no=', issue)
        print('IssueType=', issue.fields.issuetype.name)
        print('Status=', issue.fields.status.name)
        print('Summary=', issue.fields.summary)
The first three arguments of jira.search_issues() are the JQL query, the starting index (0-based, hence the multiplication on line 6), and the maximum number of results.

You can execute a search instead of fetching a single issue.
Let's say your project key is PRO-KEY. To perform a search, use this query:
https://MY_JIRA.atlassian.net/rest/api/2/search?jql=project=PRO-KEY
This returns the first 50 issues of PRO-KEY and, in the field total, the total number of issues matching the query.
Given that number, you can perform further searches by appending this to the previous query:
&startAt=50
With this new parameter you will be able to fetch the issues from 51 to 100 (or 50 to 99 if you consider the first issue to be 0).
The next query will use &startAt=100, and so on, until you have fetched all the issues in PRO-KEY.
If you wish to fetch more than 50 issues per request, add to the query:
&maxResults=200
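A minimal sketch of that pagination loop in Python using the requests library (the credentials are placeholders; the URL and JQL follow the example above, and 'issues' and 'total' are fields of the search response):
import requests

url = 'https://MY_JIRA.atlassian.net/rest/api/2/search'
auth = ('username', 'password')  # placeholder credentials
start, page_size, all_issues = 0, 50, []
while True:
    resp = requests.get(url, auth=auth,
                        params={'jql': 'project=PRO-KEY', 'startAt': start, 'maxResults': page_size})
    page = resp.json()
    all_issues.extend(page['issues'])
    if not page['issues'] or start + len(page['issues']) >= page['total']:
        break
    start += len(page['issues'])
print(len(all_issues))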

You can use the jira.search_issues() method to pass in a JQL query. It will return the list of issues matching the JQL:
issues_in_proj = jira.search_issues('project=PROJ')
This will give you a list of issues that you can iterate through
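For instance, a minimal sketch that loops over that result and prints a couple of fields (same field names as in the question above):
issues_in_proj = jira.search_issues('project=PROJ')
for issue in issues_in_proj:
    print(issue.key)
    print(issue.fields.summary)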

Starting with Python 3.8, reading all issues can be done relatively short and elegantly with the walrus operator:
issues = []
while issues_chunk := jira.search_issues('project=PROJ', startAt=len(issues)):
    issues += list(issues_chunk)
(Since we need len(issues) in every step, we cannot use a list comprehension, can we?)
Together with initialization, caching and "preprocessing" (e.g. just taking issue.raw) you could write something like this:
import json
import os

import jira

jira = jira.JIRA(
    server="https://jira.at-home.com",
    basic_auth=json.load(open(os.path.expanduser("~/.jira-credentials"))),
    validate=True,
)
issues = json.load(open("jira_issues.json"))
while issues_chunk := jira.search_issues('project=PROJ', startAt=len(issues)):
    issues += [issue.raw for issue in issues_chunk]
json.dump(issues, open("jira_issues.json", "w"))

Related

Python, Tweepy --- struggling with getting tweets filtered on certain criteria

I am struggling with the following; any help would be highly appreciated. The path I chose to solve the problem might be clunky, even outdated, but it is the best I could do. So, I am trying to get recent tweets BASED on a query and ONLY from the people I follow on Twitter. So I ran two different queries:
1)
followers = client.get_users_following(id = '', max_results = 100)
and 2)
tweets = client.search_recent_tweets(query=query, tweet_fields=['author_id', 'created_at'], max_results=100)
I managed to get the responses into JSON objects, then normalise them, and at the end I get two dataframes:
A) a dataframe df['id'], where the 'id' is the unique username of the Twitter user, the result of the first query ("get_users_following"); here I converted the 'id' type from "object" to "int"
B) a dataframe with the following columns ['author_id'], ['text'], ['created_at'], ['id'], where 'author_id' is the unique username of the Twitter user, the same as the 'id' from the previous dataframe
All good until the point where I try to iterate through the 'author_id' values to see if they match my list of the 'id's of the people I follow; whenever one does, I would like to add the text of that particular tweet to a list and start analysing the data.
The code I am struggling with is below; the problem is that the result I get is in fact an empty list.
all = []
x = len(df['id'])
for number in twe['author_id']:
    for j in range(x):
        if number == df['id'][j]:
            all.append(twe['text'][number])
        else:
            j += 1
print(all)
print(len(all))
I checked and there were people that I follow that were tweeting on a particular topic or another.
Any thoughts would be highly appreciated.
LATER EDIT:
In the meantime I worked a bit more on the for loop, but I still get the same empty list as a result.
al = []
print(f1.shape)
x = len(f1['id'])
print(x)
y = len(twe['text'])
print(y)
i = 0
j = 0
for (i, j) in [(i, j) for i in range(x) for j in range(y)]:
    if f1['id'][i] == twe['author_id'][j]:
        al.append(['text'][j])
    else:
        if j < y:
            j += 1
        else:
            i += 1
print(len(al))
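For what it's worth, here is a hedged sketch of the same matching expressed with pandas boolean indexing instead of nested loops (assuming df and twe are the two dataframes described above and both id columns have the same dtype):
# not a verified fix, just the idiomatic pandas way to express the match
matched = twe[twe['author_id'].isin(df['id'])]
texts = matched['text'].tolist()
print(len(texts))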

How can I convert a result into a list of variables that I can use as an input?

I was able to come up with these two parts, but I'm having trouble linking them.
Part 1 - This accepts a filter which is listed as 'project = status = blocked'. This will list all issue codes that match the filter and separate them line by line. Is it necessary to convert the results into a list? I'm also wondering if it converts the entire result into one massive string or if each line is a string.
issues_in_project = jira.search_issues(
    'project = status = Blocked'
)
issueList = list(issues_in_project)
search_results = '\n'.join(map(str, issueList))
print(search_results)
Part 2 - Right now, the jira.issue will only accept an issue code one at a time. I would like to use the list generated from Part 1 to keep running the code below for each and every issue code in the result. I'm having trouble linking these two parts.
issue = jira.issue(##Issue Code goes here##)
print(issue.fields.project.name)
print(issue.fields.summary + " - " + issue.fields.status.statusCategory.name)
print("Description: " + issue.fields.description)
print("Reporter: " + issue.fields.reporter.displayName)
print("Created on: " + issue.fields.created)
Part 1
'project = status = Blocked' is not a valid JQL. So first of all, you will not get a valid result from calling jira.search_issues('project = status = Blocked').
The result of jira.search_issues() is basically a list of jira.resources.Issue objects, not a list of strings or lines of strings. To be precise, the result of jira.search_issues() is of type jira.client.ResultList, which is a subclass of Python's list.
Part 2
You already have all the required data in issues_in_project if your JQL is correct. Therefore, you can loop through the list and use the relevant information of each JIRA issue. For your information, jira.issue() returns exactly one jira.resources.Issue object (if the issue key exists).
Example
... # initialize jira
issues_in_project = jira.search_issues('status = Blocked')
for issue in issues_in_project:
    print(issue.key)
    print(issue.fields.summary)
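If you also need the other fields from Part 2, they are available on the same loop variable. A sketch (some fields, such as description, can be None, so guard accordingly):
for issue in issues_in_project:
    print(issue.fields.project.name)
    print(issue.fields.summary + " - " + issue.fields.status.statusCategory.name)
    print("Description: " + (issue.fields.description or ""))
    print("Reporter: " + issue.fields.reporter.displayName)
    print("Created on: " + issue.fields.created)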

How to access the next page using the JIRA REST API for Python

I am trying to fetch all issues related to a project. When I execute the code below, I get only 50 results. I need to navigate all pages and get all the bugs. Please help.
all_issues = jira.search_issues('project=ProjectName')
each_issue = sorted([issue.key for issue in all_issues])
for item in each_issue:
    print(item)
This gives me only 50 issues since the page size has a default value of 50. I need to get all the issues.
-- Update 18/Oct/2021
As discovered in the answer below, setting maxResults to False appears to remove the limit on the result set.
all_issues = jira.search_issues('project=ProjectName', maxResults=False)
-- Original Post
Try:
all_issues = jira.search_issues('project=ProjectName', maxResults=50, startAt=50)
The results from the REST API are paged, with the default number of results being 50. You can supply the startAt value to start the results from a point in the result set. By default this value is 0.
So, your original query would get results 0-49, the query above would get results 50-99 and changing startAt to 100 would get 100-149, and so on.
You can also increase the value of maxResults to return more results per page. However, this is limited to the max value of jira.search.views.default.max configured in your JIRA instance (set to 1000 by default).
It is not possible to make the API return all issues without paging. You would have to configure jira.search.views.default.max to a very large value and supply that value as maxResults.
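A minimal sketch of that paging loop (the JQL and page size are illustrative; maxResults must stay within the configured limit):
all_issues = []
start = 0
page_size = 50
while True:
    page = jira.search_issues('project=ProjectName', startAt=start, maxResults=page_size)
    if not page:
        break
    all_issues.extend(page)
    start += len(page)
print(len(all_issues))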
According to the source code:
https://github.com/pycontribs/jira/blob/f5d7dd032e719fe35f5fc377f302200f6c69afd4/jira/client.py#L2737
Setting maxResults=False should do the trick, so your example would look like:
all_issues = jira.search_issues('project=ProjectName', maxResults=False)
each_issue = sorted([issue.key for issue in all_issues])
for item in each_issue:
    print(item)
I briefly tested it just now and it worked here.
I love André Düwel's answer for reasonable result sets.
Just in case there's a use case where the total number of issues returned is too big to handle, or if you want to process the results in more reasonable chunks, here's a fetch that I use:
def fetch_jql(jc, jql, limit=0, page_size=50, fields=None, expand=None):
    """Fetches a list of JIRA issues found by the JQL. Handles JIRA pagination.

    :param jc: jira_client
    :param jql: query to search for
    :param limit: max results to find, 0 finds all
    :param page_size: jira pagination setting (max=100, def=50)
    :param fields: fields to search for, if None then get everything
    :param expand: usually just the changelog, if None get everything
    :return: list of issues as JIRA Resources
    """
    if limit != 0 and limit < page_size:
        page_size = limit
    response = list()
    index = 0
    while True:
        issues = jc.search_issues(jql, startAt=index, maxResults=page_size, fields=fields, expand=expand)
        if len(issues) == 0:
            break
        index += len(issues)
        response += issues
        if limit != 0 and limit <= index:
            break
    return response
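For example, a usage sketch (the project key and field names are placeholders):
issues = fetch_jql(jira, 'project = PROJ AND status = Blocked', limit=200, fields=["key", "summary"])
for issue in issues:
    print(issue.key, issue.fields.summary)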

How to use offset in VKontakte with Python?

I am trying to build a script where I can get the check-ins for a specific location. For some reason, when I specify lat/long coords VK never returns any check-ins, so I have to fetch location IDs first and then request the check-ins from that list. However, I am not sure how to use the offset feature, which I presume is supposed to work somewhat like a pagination function.
So far I have this:
import vk
import json

app_id =  # enter app id
login_nr =  # enter your login phone or email
password = ''  # enter password

vkapi = vk.API(app_id, login_nr, password)
vkapi.getServerTime()

def get_places(lat, lon, rad):
    name_list = []
    try:
        locations = vkapi.places.search(latitude=lat, longitude=lon, radius=rad)
        name_list.append(locations['items'])
    except Exception, e:
        print '*********------------ ERROR ------------*********'
        print str(e)
    return name_list

# Returns last checkins up to a maximum of 100
# Define the number of checkins you want, 100 being maximum
def get_checkins_id(place_id, check_count):
    checkin_list = []
    try:
        checkins = vkapi.places.getCheckins(place=place_id, count=check_count)
        checkin_list.append(checkins['items'])
    except Exception, e:
        print '*********------------ ERROR ------------*********'
        print str(e)
    return checkin_list
What I would like to do eventually is combine the two into a single function but before that I have to figure out how offset works, the current VK API documentation does not explain that too well. I would like the code to read something similar to:
def get_users_list_geo(lat, lon, rad, count):
    users_list = []
    locations_lists = []
    users = []
    locations = vkapi.places.search(latitude=lat, longitude=lon, radius=rad)
    for i in locations[0]:
        locations_list.append(i['id'])
    for i in locations:
        # Get each location ID
        # Get Checkins for location
        # Append checkin and ID to the list
From what I understand I have to count the offset when getting the check-ins and then somehow account for locations that have more than 100 check-ins. Anyways, I would greatly appreciate any type of help, advice, or anything. If you have any suggestions on the script I would love to hear them as well. I am teaching myself Python so clearly I am not very good so far.
Thanks!
I've worked with the VK API in JavaScript, but I think the logic is the same.
TL;DR: Offset is the number of results (starting with the first) which the API should skip in the response.
For example, you make a query which should return 1000 results (let's imagine you know the exact number of results).
But VK returns only 100 per request. So, how do you get the other 900?
You say to the API: give me the next 100 results. "Next" is the offset: the number of results you want to skip because you've already handled them. So the VK API takes the 1000 results, skips the first 100, and returns the next (second) 100 to you.
Also, if you are talking about this method (http://vk.com/dev/places.getCheckins) in the first paragraph, please check that your lat/long values are floats, not integers. It could also be useful to try swapping lat/long; maybe you got them mixed up?
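If it helps, here is a sketch of an offset loop for the check-ins, assuming places.getCheckins accepts an offset parameter like most VK list methods and that the response exposes 'items' as in the code above (both are assumptions, not verified against the current API):
def get_all_checkins(place_id, page_size=100):
    checkins = []
    offset = 0
    while True:
        # offset tells the API how many already-fetched results to skip
        response = vkapi.places.getCheckins(place=place_id, count=page_size, offset=offset)
        items = response['items']
        if not items:
            break
        checkins += items
        offset += len(items)
    return checkins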

Loading datasets from datastore and merge into single dictionary. Resource problem

I have a product database that contains products, parts and labels for each part based on langcodes.
The problem I'm having, and haven't gotten around, is the huge amount of resources used to get the different datasets and merge them into a dict to suit my needs.
The products in the database are based on a number of parts, each of a certain type (i.e. color, size), and each part has a label for each language. I created 4 different models for this: Products, ProductParts, ProductPartTypes and ProductPartLabels.
I've narrowed it down to about 10 lines of code that seem to generate the problem. Currently I have 3 Products, 3 Types, 3 parts for each type, and 2 languages, and the request takes a whopping 5500ms to generate.
for product in productData:
    productDict = {}
    typeDict = {}
    productDict['productName'] = product.name
    cache_key = 'productparts_%s' % (slugify(product.key()))
    partData = memcache.get(cache_key)
    if not partData:
        for type in typeData:
            typeDict[type.typeId] = { 'default' : '', 'optional' : [] }
        ## Start of problem lines ##
        for defaultPart in product.defaultPartsData:
            for label in labelsForLangCode:
                if label.key() in defaultPart.partLabelList:
                    typeDict[defaultPart.type.typeId]['default'] = label.partLangLabel
        for optionalPart in product.optionalPartsData:
            for label in labelsForLangCode:
                if label.key() in optionalPart.partLabelList:
                    typeDict[optionalPart.type.typeId]['optional'].append(label.partLangLabel)
        ## end problem lines ##
        memcache.add(cache_key, typeDict, 500)
        partData = memcache.get(cache_key)
    productDict['parts'] = partData
    productList.append(productDict)
I guess the problem is that there are too many for loops and I have to iterate over the same data over and over again. labelsForLangCode gets all labels from ProductPartLabels that match the current langCode.
All parts for a product are stored in a db.ListProperty(db.Key). The same goes for all labels for a part.
The reason I need the somewhat complex dict is that I want to display all data for a product with its default parts and show a selector for the optional ones.
The defaultPartsData and optionalPartsData are properties in the Product model that look like this:
@property
def defaultPartsData(self):
    return ProductParts.gql('WHERE __key__ IN :key', key = self.defaultParts)

@property
def optionalPartsData(self):
    return ProductParts.gql('WHERE __key__ IN :key', key = self.optionalParts)
When the completed dict is in the memcache it works smoothly, but isn't the memcache reset if the application goes into hibernation? Also, I would like to show the page for a first-time user (memcache empty) without the enormous delay.
Also, as I said above, this is only a small number of parts per product. What will the result be when it's 30 products with 100 parts?
Is one solution to create a scheduled task that caches it in memcache every hour? Is this efficient?
I know this is a lot to take in, but I'm stuck. I've been at this for about 12 hours straight and can't figure out a solution.
..fredrik
EDIT:
An AppStats screenshot here.
From what I can read, the queries seem fine in AppStats, only taking about 200-400 ms. How can the difference be that big?
EDIT 2:
I implemented dound's solution and added a bit. Now it looks like this:
langCode = 'en'
typeData = Products.ProductPartTypes.all()
productData = Products.Product.all()
labelsForLangCode = Products.ProductPartLabels.gql('WHERE partLangCode = :langCode', langCode = langCode)
productList = []

label_cache_key = 'productpartslabels_%s' % (slugify(langCode))
labelData = memcache.get(label_cache_key)
if labelData is None:
    langDict = {}
    for langLabel in labelsForLangCode:
        langDict[str(langLabel.key())] = langLabel.partLangLabel
    memcache.add(label_cache_key, langDict, 500)
    labelData = memcache.get(label_cache_key)

GQL_PARTS_BY_PRODUCT = Products.ProductParts.gql('WHERE products = :1')
for product in productData:
    productDict = {}
    typeDict = {}
    productDict['productName'] = product.name
    cache_key = 'productparts_%s' % (slugify(product.key()))
    partData = memcache.get(cache_key)
    if partData is None:
        for type in typeData:
            typeDict[type.typeId] = { 'default' : '', 'optional' : [] }
        GQL_PARTS_BY_PRODUCT.bind(product)
        parts = GQL_PARTS_BY_PRODUCT.fetch(1000)
        for part in parts:
            for lb in part.partLabelList:
                if str(lb) in labelData:
                    label = labelData[str(lb)]
                    break
            if part.key() in product.defaultParts:
                typeDict[part.type.typeId]['default'] = label
            elif part.key() in product.optionalParts:
                typeDict[part.type.typeId]['optional'].append(label)
        memcache.add(cache_key, typeDict, 500)
        partData = memcache.get(cache_key)
    productDict['parts'] = partData
    productList.append(productDict)
The result is much better. I now have about 3000ms without memcache and about 700ms with.
I'm still a bit worried about the 3000ms, and on the local app_dev server the memcache gets filled up on each reload. Shouldn't it put everything in there once and then read from it?
Last but not least, does anyone know why the request takes about 10x as long on the production server as on app_dev?
EDIT 3:
I noticed that none of the db.Models are indexed; could this make a difference?
EDIT 4:
After consulting AppStats (and understanding it took some time), it seems that the big problem lies within part.type.typeId, where part.type is a db.ReferenceProperty. Should have seen it before, and maybe explained it better :) I'll rethink that part and get back to you.
..fredrik
A few simple ideas:
1) Since you need all the results, instead of doing a for loop like you have, call fetch() explicitly to just go ahead and get all the results at once. Otherwise, the for loop may result in multiple queries to the datastore as it only gets so many items at once. For example, perhaps you could try:
return ProductParts.gql('WHERE __key__ IN :key', key = self.defaultParts).fetch(1000)
2) Maybe only load part of the data in the initial request. Then use AJAX techniques to load additional data as needed. For example, start by returning the product information, and then make additional AJAX requests to get the parts.
3) Like Will pointed out, IN queries perform one query PER argument.
Problem: An IN query does one equals query for each argument you give it. So key IN self.defaultParts actually does len(self.defaultParts) queries.
Possible Improvement: Try denormalizing your data more. Specifically, store a list of products each part is used in on each part. You could structure your Parts model like this:
class ProductParts(db.Model):
    ...
    products = db.ListProperty(db.Key)  # product keys
    ...
Then you can do ONE query per product instead of N queries per product. For example, you could do this:
parts = ProductParts.all().filter("products =", product).fetch(1000)
The trade-off? You have to store more data in each ProductParts entity. Also, when you write a ProductParts entity, it will be a little slower because it will cause 1 row to be written in the index for each element in your list property. However, you stated that you only have 100 products so even if a part was used in every product the list still wouldn't be too big (Nick Johnson mentions here that you won't get in trouble until you try to index a list property with ~5,000 items).
Less critical improvement idea:
4) You can create the GqlQuery object ONCE and then reuse it. This isn't your main performance problem by any stretch, but it will help a little. Example:
GQL_PROD_PART_BY_KEYS = ProductParts.gql('WHERE __key__ IN :1')

@property
def defaultPartsData(self):
    return GQL_PROD_PART_BY_KEYS.bind(self.defaultParts)
You should also use AppStats so you can see exactly why your request is taking so long. You might even consider posting a screenshot of appstats info about your request along with your post.
Here is what the code might look like if you rewrote it to fetch the data with fewer round-trips to the datastore (these changes are based on ideas #1, #3, and #4 above).
GQL_PARTS_BY_PRODUCT = ProductParts.gql('WHERE products = :1')
for product in productData:
    productDict = {}
    typeDict = {}
    productDict['productName'] = product.name
    cache_key = 'productparts_%s' % (slugify(product.key()))
    partData = memcache.get(cache_key)
    if not partData:
        for type in typeData:
            typeDict[type.typeId] = { 'default' : '', 'optional' : [] }
        # here's a new approach that does just ONE datastore query (for each product)
        GQL_PARTS_BY_PRODUCT.bind(product)
        parts = GQL_PARTS_BY_PRODUCT.fetch(1000)
        for part in parts:
            if part.key() in product.defaultParts:
                part_type = 'default'
            else:
                part_type = 'optional'
            for label in labelsForLangCode:
                if label.key() in part.partLabelList:
                    typeDict[part.type.typeId][part_type] = label.partLangLabel
        # (end new code)
        memcache.add(cache_key, typeDict, 500)
        partData = memcache.get(cache_key)
    productDict['parts'] = partData
    productList.append(productDict)
One important thing to be aware of is the fact that IN queries (along with != queries) result in multiple subqueries being spawned behind the scenes, and there's a limit of 30 subqueries.
So your ProductParts.gql('WHERE __key__ IN :key', key = self.defaultParts) query will actually spawn len(self.defaultParts) subqueries behind the scenes, and it will fail if len(self.defaultParts) is greater than 30.
Here's the relevant section from the GQL Reference:
Note: The IN and != operators use multiple queries behind the scenes. For example, the IN operator executes a separate underlying datastore query for every item in the list. The entities returned are a result of the cross-product of all the underlying datastore queries and are de-duplicated. A maximum of 30 datastore queries are allowed for any single GQL query.
You might try installing AppStats for your app to see where else it might be slowing down.
I think the problem is one of design: wanting to construct a relational join table in memcache when the framework specifically abhors that.
GAE will toss your job out because it takes too long, but you shouldn't be doing it in the first place. I'm a GAE tyro myself, so I cannot specify how it should be done, unfortunately.
