I made sure before deleting the old index that the number of docs matched
Should I try to disable dynamic mapping before doing the reindex operation?
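Something like this is what I had in mind - just a sketch, assuming the ES container is reachable on localhost:9200, with placeholder index names (normally I would copy the explicit mapping from the old index first):
```python
import requests

ES = "http://localhost:9200"                   # assumption: ES port published locally
OLD, NEW = "events-log-old", "events-log-new"  # placeholder index names

# Create the destination index with dynamic mapping disabled, so the
# reindex cannot add unexpected fields to the mapping.
requests.put(f"{ES}/{NEW}", json={
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {"dynamic": False},  # or "strict" to reject unknown fields entirely
}).raise_for_status()

# Copy the documents over; wait_for_completion=false returns a task id
# that can be polled instead of blocking the HTTP call.
resp = requests.post(f"{ES}/_reindex?wait_for_completion=false", json={
    "source": {"index": OLD},
    "dest": {"index": NEW},
})
resp.raise_for_status()
print(resp.json())
```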
my agents are all 0.16 and I install trains 0.16rc2 in each Task being executed by the agent
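In case it helps, this is roughly how I pin it per Task - a sketch, if I remember the API right; the version string and project/task names are placeholders, and the call has to happen before Task.init:
```python
from trains import Task

# Force the agent to install a specific trains version for this task
# (placeholder version string; must be called before Task.init).
Task.add_requirements("trains", "0.16rc2")

task = Task.init(project_name="examples", task_name="pinned trains version")
```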
SuccessfulKoala55 Here is the trains-elastic error
how would it interact with the clearml-server api service? would it be completely transparent?
That said, you might have accessed the artifacts before any of them were registered
I called task.wait_for_status() to make sure the task is done
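For reference, this is the ordering I rely on (a sketch, the task id is a placeholder):
```python
from trains import Task

task = Task.get_task(task_id="<remote-task-id>")  # placeholder id

# Block until the task reaches a final state, then refresh the local copy
# so task.artifacts reflects everything registered by the remote run.
task.wait_for_status()
task.reload()

for name, artifact in task.artifacts.items():
    print(name, artifact.url)
```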
Yes, but I am not certain how: I just deleted the /data folder and restarted the server
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
So I created a symlink /opt/trains/data -> /data
it worked for the other folder, so I assume yes --> I archived /opt/trains/data/mongo, sent the archive via scp, unarchived, updated the permissions, and now it works
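Roughly what I did for the mongo folder, in case it is useful (a sketch - the hostname and paths are placeholders, and the server was brought down on both machines before copying):
```python
import subprocess

SRC_PARENT = "/opt/trains/data"   # parent of the mongo folder on the old machine
ARCHIVE = "/tmp/mongo.tar.gz"
DEST = "user@new-server:/tmp/"    # placeholder destination

# Archive the mongo data folder and ship it over scp. On the destination:
# untar into /opt/trains/data, fix ownership/permissions for the container
# user, then bring the server back up.
subprocess.run(["tar", "czf", ARCHIVE, "-C", SRC_PARENT, "mongo"], check=True)
subprocess.run(["scp", ARCHIVE, DEST], check=True)
```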
Ok, now I would like to copy from one machine to another via scp, so I copied the whole /opt/trains/data folder, but I got the following errors:
Early debugging signals show that auto_connect_frameworks={'matplotlib': False, 'joblib': False}
seems to have a positive impact - it is running now, I will confirm in a bit
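Concretely, this is the Task.init call I am testing (project/task names are placeholders):
```python
from trains import Task

# Disable the matplotlib and joblib bindings so figures are not captured
# automatically by the SDK.
task = Task.init(
    project_name="debug",                    # placeholder
    task_name="no matplotlib auto-logging",  # placeholder
    auto_connect_frameworks={"matplotlib": False, "joblib": False},
)
```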
you mean to run it on the CI machine?
yes
That should not happen, no? Maybe there is a bug that needs fixing in clearml-agent?
It's just to test that the logic executed under if not Task.running_locally() is correct
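i.e. something along these lines (project/task names are placeholders):
```python
from trains import Task

task = Task.init(project_name="ci", task_name="remote-only logic")  # placeholders

if not Task.running_locally():
    # Only reached when the task is executed by an agent
    # (e.g. after execute_remotely() or enqueueing a clone).
    print("running under the agent")
else:
    print("running locally")
```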
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES won't let me create an alias with the same name as an existing index
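For the record, this is the order of operations I ended up with (a sketch, assuming ES on localhost:9200; index/alias names are placeholders):
```python
import requests

ES = "http://localhost:9200"  # assumption: ES port published locally
OLD, NEW, ALIAS = "events-log-old", "events-log-new", "events-log"  # placeholders

# An alias cannot have the same name as an existing index, so the old
# index has to be deleted first (after checking the doc counts match).
requests.delete(f"{ES}/{OLD}").raise_for_status()
requests.post(f"{ES}/_aliases", json={
    "actions": [{"add": {"index": NEW, "alias": ALIAS}}]
}).raise_for_status()
```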
I think that somehow, somewhere, a reference to the figure is still alive, so plt.close("all") and gc cannot free it and it ends up accumulating. I don't know where yet
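This is roughly how I plan to track it down - just a debugging sketch:
```python
import gc
import matplotlib
matplotlib.use("Agg")  # headless backend for the test
import matplotlib.pyplot as plt
from matplotlib.figure import Figure

fig, ax = plt.subplots()
ax.plot([1, 2, 3])

plt.close("all")
del fig, ax   # drop our own references before checking
gc.collect()

# Any Figure still reachable here is being kept alive by something else;
# gc.get_referrers shows what is holding on to it.
leaked = [o for o in gc.get_objects() if isinstance(o, Figure)]
print(f"figures still alive: {len(leaked)}")
for f in leaked:
    print([type(r) for r in gc.get_referrers(f)])
```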
Also, what is the benefit of having index.number_of_shards = 1 by default for the metrics and the logs indices? Having more shards allows scaling and later moving them to separate nodes if needed - with the default heap size of 2 GB, that should be possible, shouldn't it?
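e.g. I was thinking of an index template along these lines (a sketch - the template name and index pattern are guesses, ES assumed on localhost:9200):
```python
import requests

ES = "http://localhost:9200"  # assumption: ES port published locally

# New indices matching the pattern would be created with more than one
# primary shard; existing indices are not affected.
requests.put(f"{ES}/_template/events_metrics", json={
    "index_patterns": ["events-*"],  # guess - adjust to the real naming scheme
    "settings": {"number_of_shards": 3, "number_of_replicas": 0},
}).raise_for_status()
```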
I am using 0.17.5, it could be either a bug in ignite or indeed a delay on the send. I will try to build a simple reproducible example to understand the cause
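The reproduction I have in mind is something like this (a sketch, project/task names are placeholders):
```python
import time
from trains import Task

task = Task.init(project_name="debug", task_name="scalar send delay")  # placeholders
logger = task.get_logger()

# Report a scalar every iteration and watch in the UI whether the points
# appear as they are sent or only after the explicit flush at the end.
for i in range(100):
    logger.report_scalar(title="loss", series="train", value=1.0 / (i + 1), iteration=i)
    time.sleep(0.1)

logger.flush()  # force the reporting thread to send anything still buffered
```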
Alright, thanks for the answer! Seems legit then
Indeed, I actually had the old configuration that was not JSON - I converted it to JSON and now it works
` trains-elastic | {"type": "server", "timestamp": "2020-08-12T11:01:33,709Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "trains", "node.name": "trains", "message": "uncaught exception in thread [main]",
trains-elastic | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];",
trains-elastic | "at org.elasticsearc...
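(For reference, the AccessDeniedException usually means the mounted elastic data folder is not writable by the container user; the sketch below is what I would try, assuming the default /opt/trains/data/elastic mount and uid/gid 1000 - both are assumptions, check your docker-compose.)
```python
import os

# Assumptions: elastic data dir mounted at /opt/trains/data/elastic and the
# elasticsearch container running as uid/gid 1000. Run as root.
DATA_DIR = "/opt/trains/data/elastic"
UID = GID = 1000

os.chown(DATA_DIR, UID, GID)
for root, dirs, files in os.walk(DATA_DIR):
    for name in dirs + files:
        os.chown(os.path.join(root, name), UID, GID)
```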
Could you please point me to the relevant component? I am not familiar with TypeScript, unfortunately
` # Set the python version to use when creating the virtual environment and launching the experiment
# Example values: "/usr/bin/python3" or "/usr/local/bin/python3.6"
# The default is the python executing the clearml_agent
python_binary: ""
# ignore any requested python version (Default: False, if a Task was using a
# specific python version and the system supports multiple python the agent will use the requested python version)
# ignore_requested_python_version: ...
Should I open an issue in the clearml-agent GitHub repo?