I am stuck on a problem.
What I want to do: once a certain threshold is reached, I want to trigger a ticket on KibanaHud from my Python code.
I am creating a JSON file with all the data that I need for the ticket -> ticket.json
I am also using curl (-XPUT) to create the index:
curl -XPUT 'http://localhost:9200/ticket_tweet/' -d '
index:
number_of_shards: 5
number_of_replicas: 2
'
and then posting the document:
curl -XPOST http://localhost:9200/ticket_tweet/rook_ticket -d @ticket.json
but I am getting this error:
{"error":"UnavailableShardsException[[ticket_tweet][3] Not enough
active copies to meet write consistency of [QUORUM] (have 1, needed
2). Timeout: [1m], request: index
{[ticket_tweet][rook_ticket][AU2zD8QRdqkd3i74WG-f]
The error was about shards. Since a mapping was not available, I did some trial and error with the shard and replica values and the problem got solved. If anyone has a better solution, please share it.
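For what it is worth, the numbers in the error match the replica settings: with number_of_replicas: 2 each shard has three copies, and the QUORUM write consistency named in the error needs two of them active, but a single-node cluster can only host the primary (hence "have 1, needed 2"). Below is a minimal sketch of the same workflow using the requests library, assuming a single local Elasticsearch node, so replicas are dropped to 0 (passing consistency=one on the write would be another option):

import json
import requests

# Drop the old index (if any) and recreate it with no replicas so a
# single node can satisfy the write consistency check.
requests.delete("http://localhost:9200/ticket_tweet")
settings = {"settings": {"number_of_shards": 5, "number_of_replicas": 0}}
requests.put("http://localhost:9200/ticket_tweet", data=json.dumps(settings))

# Index the prepared ticket document (index/type names from above).
with open("ticket.json") as f:
    r = requests.post("http://localhost:9200/ticket_tweet/rook_ticket", data=f.read())
print(r.status_code, r.text)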
Good afternoon. I am asking for your advice because I can't get Squid and Python to work together. I am writing an asynchronous helper for Squid. Squid is configured with:
external_acl_type aclproxy3 ttl=300 children-max=1 concurrency=100 %LOGIN python -u /opt/agent/helper.py proxy3
Squid sends requests to the helper, numbering them: ['0', 'data'], ['1', 'data']
The docs say:
The helper receives lines expanded per the above format specification and for each input line returns 1 line starting with OK/ERR/BH result code and optionally followed by additional keywords with more details.
But I do not understand how to form the answer. The requests come in the order 1, 2, 3, but may finish in the order 2, 1, 3, so the answers need to be identified somehow as well. But how?
For now I have solved the problem with a gevent-based approach: all the requests are collected first, then processed, and the OK/ERR results are returned in the same order in which they arrived; if requests 2 and 3 have already finished, they wait for request 1 so that everything is answered in order.
This is a crutch, I understand. So I am asking for advice in case someone has already dug into this topic. Thanks for any hint.
The answer was found in the documentation
When using the concurrency= option the protocol is changed by
introducing a query channel tag in front of the request/response.
The query channel tag is a number between 0 and concurrency-1.
This value must be echoed back unchanged to Squid as the first part
of the response relating to its request.
That is, we receive on stdin, for example: 0 data, 1 data ..., and we can reply in any order, e.g. 1 OK and then 0 ERR, because each answer starts with its channel tag.
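A minimal sketch of what such a concurrent helper can look like (the check() logic and the thread-per-line handling here are placeholders for illustration; the part Squid actually requires is that each flushed reply line starts with the channel tag it came in with):

#!/usr/bin/env python
import sys
import threading

write_lock = threading.Lock()

def check(payload):
    # Placeholder decision; replace with the real ACL lookup.
    return "OK" if payload else "ERR"

def handle(channel, payload):
    result = check(payload)
    # The channel tag identifies the answer, so replies may go out in any
    # order; just echo it back as the first token of the line.
    with write_lock:
        sys.stdout.write("%s %s\n" % (channel, result))
        sys.stdout.flush()

for line in sys.stdin:
    parts = line.rstrip("\n").split(" ", 1)
    if not parts[0]:
        continue
    channel, payload = parts[0], (parts[1] if len(parts) > 1 else "")
    threading.Thread(target=handle, args=(channel, payload)).start()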
I am having weird issues with Neo4j's legacy indexing, and got stuck today. I need full-text support, as I wish to run a performance comparison against Solr (which uses Lucene full text) to see how the different data models compare.
I have been following a few guides online, as well as various posts around here on stack.
I had success up until yesterday, where all of a sudden I had corrupted index files, as range queries were returning invalid and inconsistent results. So I am trying to set in stone exactly the steps I need to take.
I use the CSV bulk import tool to populate my database with about 4 million nodes with the label "record", and various nodes with labels like "data:SPD", "data:DIR", "data:TS", etc (using 2 labels, to represent that they are ocean data nodes, for different types of measurements).
The data model is simple. I have:
(r:record {meta:M, time:T, lat:L1, lon:L2})-[:measures]-(d:data {value:V})
M is an ID-like string which I use to keep track of my data internally for testing purposes. T is an epoch time integer. L1 / L2 are geo-spatial coordinate floats. My data nodes represent various kinds of collected data, and not all records have the same data nodes (some have temperatures, wind speeds, wind directions, sea temperatures, etc.). These values are all represented as floats. Each data node has a second label that says what kind of data it contains.
After I complete the import, I open up the shell and execute the following sequence:
index --create node_auto_index -t Node
index --set-config node_auto_index fulltext
I have the following configuration added to the default neo4j.conf file (this is there even before the CSV bulk import happens):
dbms.auto_index.nodes.enabled=true
dbms.auto_index.nodes.keys=meta,lat,lon,time
Before today, I would see that the fulltext command indeed worked by querying the shell:
index --get-config node_auto_index
returned something like:
{
"provider": "lucene",
"type": "fulltext"
}
I ran a series of tests on my data using the MATCH clause recently. I understand that this uses the more modern schema indexing. My results were fine and returned the expected data.
I read somewhere that since my data was imported prior to legacy index creation, I needed to manually index the relevant properties by doing something like this:
START n=node(*)
WITH n SKIP {curr_skip} LIMIT {fixed_lim}
WHERE EXISTS(n.meta)
SET n.time=n.time, n.lat=n.lat, n.lon=n.lon, n.meta=n.meta
RETURN n
Since I have 4 million records, my python handler does this as a series of batch operations by upping the {curr_skip} each time by {fixed_lim} and executing the query until I get 0 results.
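For reference, a minimal sketch of that batching loop, assuming the official Bolt driver (neo4j-driver 1.x, contemporary with Neo4j 3.1) and placeholder connection credentials:

from neo4j.v1 import GraphDatabase, basic_auth

# Placeholder URI/credentials; the Cypher is the batch query from above.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=basic_auth("neo4j", "password"))
session = driver.session()

QUERY = """
START n=node(*)
WITH n SKIP {curr_skip} LIMIT {fixed_lim}
WHERE EXISTS(n.meta)
SET n.time=n.time, n.lat=n.lat, n.lon=n.lon, n.meta=n.meta
RETURN n
"""

FIXED_LIM = 10000
curr_skip = 0
while True:
    rows = list(session.run(QUERY, {"curr_skip": curr_skip,
                                    "fixed_lim": FIXED_LIM}))
    if not rows:
        break  # no rows returned means every batch has been processed
    curr_skip += FIXED_LIM

session.close()
driver.close()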
Upon transitioning to my tests which involve the START clause yesterday, I found that using a lucene query like:
START r=node:node_auto_index(lon:[{} TO {}]) RETURN count(r)
(with a filled-in range) was giving me bad results. Data that I expected to be returned was not. Furthermore, different ranges were yielding strange results: range (a, b) might yield 1000 results, but (a-e, b+e), a superset of the previous range, would yield 0 results! However, the exact same style of queries on time and lat seemed to be working perfectly. Even more so, I could do a multi-faceted query like:
START r=node:node_auto_index(time:[{} TO {}] AND lat:[{} TO {}]) RETURN count(r)
My best guess, was that somehow I corrupted the index files for lon.
The recommendations I have found online are to stop the database, go to /path/to/graph.db, remove all of index*, and restart the database. Upon following these instructions today, I have discovered more weird behavior. I re-executed the same index creation/configuration statements from above, but after querying the configuration, I find that the index type remains "type": "exact". Even stranger, the index files are not actually being created: there is no index directory under /path/to/graph.db.
I am certain I have started the shell correctly by using:
neo4j-shell -path /path/to/graph.db/
If I try to use index --create node_auto_index -t Node, I get an "already exists" notification, when it clearly does not exist.
For now, I think I am just going to start from scratch again and see if I can either reproduce these errors, or somehow bypass them.
Otherwise, if anyone with experience here has any idea of what might be going wrong, I would greatly appreciate some input!
UPDATE:
So I went ahead and started from scratch.
# ran my bulk import code
python3
>>> from mylib.module import load_data
>>> load_data()
>>> # ... lots of printed stuff ...
IMPORT DONE in 3m 37s 950ms.
Imported:
15394183 nodes
15394171 relationships
27651625 properties
Peak memory usage: 361.94 MB
>>> exit()
# switched out my new database
cd /path/to/neo4j-community-3.1.0
mv data/databases/graph.db data/databases/oldgraph.db
mv data/databases/newgraph.db data/databases/graph.db
# check neo4j is off
ps aux | grep neo
# neo4j shell commands
bin/neo4j-shell -path data/databases/graph.db/
... some warning about GraphAware Runtime disabled.
... the welcome message
neo4j-sh (?)$ index --create node_auto_index -t Node
neo4j-sh (?)$ index --set-config node_auto_index fulltext
INDEX CONFIGURATION CHANGED, INDEX DATA MAY BE INVALID
neo4j-sh (?)$ index --get-config node_auto_index -t Node
{
"provider": "lucene",
"type": "exact"
}
neo4j-sh (?)$ exit # thought maybe I just had to restart
# try again
bin/neo4j-shell -path data/databases/graph.db/
neo4j-sh (?)$ index --get-config node_auto_index -t Node
{
"provider": "lucene",
"type": "exact"
}
neo4j-sh (?)$ index --set-config node_auto_index fulltext
INDEX CONFIGURATION CHANGED, INDEX DATA MAY BE INVALID
neo4j-sh (?)$ index --get-config node_auto_index -t Node
{
"provider": "lucene",
"type": "exact"
}
# hmmmmm
neo4j-sh (?)$ index --create node_auto_index -t Node
Class index 'node_auto_index' alredy exists
# sanity check
neo4j-sh (?)$ MATCH (r:record) RETURN count(r);
+----------+
| count(r) |
+----------+
| 4085814 |
+----------+
1 row
470 ms
neo4j-sh (?)$ exit
As you can see, even after recreating a fresh database, I am not able to activate a fulltext index now. I have no idea why it worked a few days ago and not now, as I am the only one working on this server! Perhaps I will even have to reinstall Neo4j entirely.
UPDATE / IDEA:
OK, I have a potential idea as to my problem, and I think it may be permissions-related. I have a dashboard.py module which I have been using to orchestrate turning Solr and Neo4j on and off. The other day, I had some weird issues with not being able to execute the start/stop sequences from within my shell, so I messed with a lot of permissions.
Let's call me userA. I belong to groups groupA and groupB.
I remember running the following yesterday:
sudo chown -R $USER:groupB neo4j-community-3.1.0
I have noticed that all of the new database files my python scripts are producing belong to group groupA. Could this be the culprit?
I am having the weird error again where I can't recreate the index because it thinks it still exists after I deleted it. I am going to rerun the bulk import once again and fix these permissions prior to trying to set the full-text index. Will update tonight.
EDIT:
This did not seem to have an effect :(
I even tried chowning everything to root, both user and group, to no avail. My Lucene index will not change from exact to fulltext.
I am going to go ahead and do a full reinstall of everything now.
UPDATE:
Not even a full reinstall has worked.
I removed my entire neo4j-community-3.1.0 folder, and unpacked the tarball I had.
I set ownership of the entire folder to my own user (because it was nfsnobody previously):
chown -R $USER:mygroup neo4j-community-3.1.0
I added the two lines to neo4j.conf:
dbms.auto_index.nodes.enabled=true
dbms.auto_index.nodes.keys=meta,lat,lon,time
I imported the data via the bulk import tool, then ran the same index creation/configuration commands as before. The index still reports that it is using an exact Lucene index after telling me the configuration changed.
I am at an utter loss here. Maybe I will just go ahead and run the START clause tests I have anyway and see if they work.
UPDATE:
WOOOOW. I figured out my exact->fulltext issue!!!
The command:
index --set-config node_auto_index fulltext
needed to be:
index --set-config node_auto_index type fulltext
Incredible. What a doozy. The output message about the index being changed is really what threw me off, making me think the command was being run correctly and that some other problem was at hand. Should I open an issue on GitHub for this? Is this command actually changing the index at all if I don't include type?
As for the invalid range queries, I am going to test this further soon. I believe that when I ran the code the first time around, I had a bug in my Python handler that didn't loop over all the results, effectively missing some nodes during manual indexing. Once I finish this process again, I will run my tests to check the results.
I'm trying to execute the following code with Dumbo (Python) / Hadoop:
https://github.com/klbostee/dumbo/wiki/Short-tutorial#jobs-and-runners
I followed the tutorial and did every step, but when I run the code in the Hadoop environment I get output like the following:
SEQ/org.apache.hadoop.typedbytes.TypedBytesWritable/org.apache.hadoop.typedbytes.TypedBytesWritable�������ޭǡ�q���%�O��������������172.16.1.10������������������172.16.1.12������������������172.16.1.30������
It should return a list of IP addresses with connections counter.
Why do those characters appear? Is it an encoding problem? How do I fix it? Thanks.
Also, if I try other programs from the tutorial, I have the same problem.
I'll answer my own question: that output is Dumbo's serialized form, so there is no error.
To convert it into readable text, the following command is sufficient (the answer was in the tutorial! I just didn't see it):
dumbo cat ipcounts/part* -hadoop /usr/local/hadoop | sort -k2,2nr | head -n 5
I am trying to use the requests library in Python to push data (a raw value) to a firebase location.
Say I have urladd (the URL of the location, with the authentication token). At that location, I want to push a string, say "International". Based on the answer here, I tried:
data = {'.value': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
I get <Response [400]>. p.text gives me:
u'{\n "error" : "Invalid data; couldn\'t parse JSON object, array, or value. Perhaps you\'re using invalid characters in your key names."\n}\n'
It appears that the key .value is invalid, but that is what the answer linked above suggests. Any idea why this may not be working, or how I can do this through Python? There are no problems with connection or authentication, because the following works; however, it pushes an object instead of a raw value.
data = {'name': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
Thanks for your help.
The answer you've linked is a special case for when you want to assign a priority to a value. In general, '.value' is an invalid name and will throw an error.
If you want to write just "International", you should write the stringified-JSON version of that data. I don't have a python example in front of me, but the curl command would be:
curl -X POST -d "\"International\"" https://...
Andrew's answer above works. In case someone else wants to know how to do this using the requests library in Python, I thought this would be helpful:
import simplejson as sjson
import requests

data = sjson.dumps("International")
p = requests.post(urladd, data=data)
For some reason I had thought that the data had to be in dictionary format before being converted to its stringified-JSON version. That is not the case; a simple string can be used as input to sjson.dumps().
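In other words, the stringified version is just the quoted JSON scalar:

sjson.dumps("International")  # returns the string '"International"'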
I need help formulating a basic elasticutils search query.
I am trying out elasticutils mainly because I was not able to get optimal performance in terms of search results per second (more details: here).
So far, here is what I have done:
es=get_es(hosts=['localhost:9200'],timeout=30,default_indexes=['ncbi_taxa_names'],dump_curl=CurlDumper())
es.get_indices()
# [2012-08-22T15:36:10.639102]
curl -XGET
http://localhost:9200/ncbi_taxa_names/_status
Out[26]: {u'ncbi_taxa_names': {'num_docs': 1316005}}
S().indexes('ncbi_taxa_names').values_dict()
Out[27]: [{u'tax_name': u'Conyza sp.', u'tax_id': u'41553'}, ...
So what I want to do is formulate a query where I can search for something like {"tax_name": "cellvibrio"} and then compare how many search results per second I can retrieve with elasticutils versus pyes.
Maybe it has something to do with the way ES is running locally and not with the APIs.
Update1
I tried the following, and the search results are still very similar to what I am getting from pyes. Now I am beginning to wonder whether it has something to do with how the local ES is running. I still need help figuring that out.
es=get_es(hosts=['localhost:9200'],timeout=30,default_indexes=['ncbi_taxa_names'],dump_curl=CurlDumper())
es.get_indices()
# [2012-08-22T15:36:10.639102]
curl -XGET
http://localhost:9200/ncbi_taxa_names/_status
Out[26]: {u'ncbi_taxa_names': {'num_docs': 1316005}}
s=S().indexes('ncbi_taxa_names').values_dict()
Out[27]: [{u'tax_name': u'Conyza sp.', u'tax_id': u'41553'}, ...
results = s.query(tax_name='aurantiacus') # using elasticutils
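For reference, this is how I am forcing the lazy S object to actually execute so I can count the hits (as far as I can tell, iteration is what triggers the search in elasticutils):

hits = list(results)  # iterating the lazy S runs the search
print(len(hits))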
Appreciate your help.
Thanks!