Hi TimelyPenguin76 , any chance this was fixed? 🙂
I will try adding `sudo sh -c "echo '\n* soft nofile 65535\n* hard nofile 65535' >> /etc/security/limits.conf"` to the extra_vm_bash_script, maybe that's enough actually.
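For context, this is roughly where it would go: a minimal sketch assuming the dict-style configuration from the AWS autoscaler example (the surrounding keys are illustrative, `extra_vm_bash_script` is the setting I mean):
```
# Sketch only: bump the open-file limit on every VM the autoscaler spins up.
# The dict structure below is illustrative; adapt it to your own autoscaler config.
extra_vm_bash_script = "\n".join(
    [
        "sudo sh -c \"echo '* soft nofile 65535' >> /etc/security/limits.conf\"",
        "sudo sh -c \"echo '* hard nofile 65535' >> /etc/security/limits.conf\"",
    ]
)

autoscaler_config = {
    "extra_vm_bash_script": extra_vm_bash_script,
    # ... queues, instance types, credentials, etc.
}
```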
region is empty, I never entered it and it worked
I’ve set `dynamic: "strict"` in the template of the logs index and I was able to keep the same mapping after doing the reindex.
AppetizingMouse58 btw I had to delete the old logs index before creating the alias, otherwise ES wouldn't let me create an alias with the same name as an existing index.
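For reference, the rough sequence I ran, sketched with the Python Elasticsearch client; the template, index and alias names below are placeholders rather than the real ones from the server:
```
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your ES endpoint

# 1. Lock the mapping in the index template so the reindexed index keeps the same fields.
es.indices.put_template(
    name="events-log-template",  # placeholder name
    body={
        "index_patterns": ["events-log-*"],
        "mappings": {"dynamic": "strict"},  # plus the existing "properties" mapping
    },
)

# 2. Reindex the old index into a new one.
es.reindex(body={"source": {"index": "events-log-old"}, "dest": {"index": "events-log-new"}})

# 3. Delete the old index first: ES refuses an alias with the same name as an existing index.
es.indices.delete(index="events-log-old")
es.indices.update_aliases(
    body={"actions": [{"add": {"index": "events-log-new", "alias": "events-log-old"}}]}
)
```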
ok, and if that's not the case, it will fall back to 3.8, right? Would it be possible to support such a use case (have the clearml-agent set up a different python version when a task needs it)?
even if I explicitly use `previous_task.output_uri = "s3://my_bucket"`, it is ignored and the json file is still saved locally.
mmmh there is no closing of the task happening at that point. Note that just before the `task.upload_artifact` call, I call `task.logger.report_table("Metric summary", "Metric summary", 0, df_scores)`, if that matters.
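Condensed, the flow looks roughly like this (a sketch: the task lookup and the DataFrame are placeholders for my actual code):
```
import pandas as pd
from clearml import Task  # `from trains import Task` on the older versions

# Placeholder lookup; in my code the task object comes from elsewhere.
task = Task.get_task(task_id="<previous-task-id>")

# The setting that seems to be ignored: the artifact still lands on local disk.
task.output_uri = "s3://my_bucket"

df_scores = pd.DataFrame({"metric": ["f1"], "value": [0.9]})  # dummy data for the sketch

# Reported just before the upload, in case that matters.
task.get_logger().report_table("Metric summary", "Metric summary", 0, df_scores)

# A dict artifact is serialized to json; it ends up on local disk instead of S3.
task.upload_artifact("metric_summary", artifact_object=df_scores.to_dict())
```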
I also don't understand what you mean by "unless the domain is different"... In the same way that SSH keys are global, I would have expected the git credentials to be used for any git operation.
Some context: I am trying to log an HTML file and I would like it to be easily accessible for preview.
In all the steps, I want to store the outputs as artifacts in S3 because it's very convenient.
The last step should merge them all, i.e. it needs access to all the artifacts produced by the previous steps.
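Roughly what I have in mind for the merge step, sketched with made-up step task IDs and artifact names:
```
from clearml import Task

# Placeholder IDs of the previous pipeline steps.
step_task_ids = ["<step-1-id>", "<step-2-id>", "<step-3-id>"]

partial_files = []
for task_id in step_task_ids:
    step_task = Task.get_task(task_id=task_id)
    # "partial_result" is a made-up artifact name that each step would upload to S3.
    partial_files.append(step_task.artifacts["partial_result"].get_local_copy())

# The merge step combines the downloaded pieces and uploads its own artifact,
# e.g. the HTML report I would like to preview from the UI.
merge_task = Task.current_task()
merge_task.upload_artifact("merged_report", artifact_object="report.html")
```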
AgitatedDove14 Yes, that might work; the first one (with conda) might work as well. I will give it a try, thanks!
(btw, yes, I adapted the code to use `Task.init(..., output_uri=...)`)
No space at the end of the diff file:
```
diff --git a/configs/2.2.2_from_scratch.yaml b/configs/2.2.2_from_scratch.yaml
index 9fece48..5816f78 100644
--- a/configs/2.2.2_from_scratch.yaml
+++ b/configs/2.2.2_from_scratch.yaml
@@ -136,7 +136,7 @@ data_processing:
   optimizer:
     type: 'RMSprop'
     args:
-      lr: 2.5e-4
+      lr: 1.5e-5
       momentum: 0
       weight_decay: 0
```
No worries! I asked more to be informed; I don't have a real use case behind it. This means that you guys internally catch the argparser object somehow, right? Because you could also simply use sys.argv to find the parameters, right?
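Just to illustrate what I mean by reading sys.argv directly, a naive sketch (not how clearml actually does it):
```
import sys

# Naive parsing of "--key value" pairs straight from the command line,
# without ever touching the argparse object.
def cli_params(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    params = {}
    key = None
    for token in argv:
        if token.startswith("--"):
            key = token[2:]
            params[key] = True  # treated as a flag until a value follows
        elif key is not None:
            params[key] = token
            key = None
    return params

print(cli_params(["--lr", "2.5e-4", "--use-gpu"]))
# {'lr': '2.5e-4', 'use-gpu': True}
```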
line 13 is empty 🤔
Could be, but not sure -> from 0.16.2 to 0.16.3
AgitatedDove14 This seems to be consistent even if I specify the absolute path to /home/user/trains.conf
Done! Also I tried to use the git credential cache ( https://git-scm.com/docs/git-credential-cache ) as a workaround (hoping that the first time it clones the experiment repo, it would cache the creds for the next times), but I then get a different error: `fatal: unable to find a suitable socket path; use --socket`
AnxiousSeal95 Any update on this topic? I am very excited to see where this can go 🤩
Sure, it’s because of a very annoying bug that I shared in this thread https://clearml.slack.com/archives/CTK20V944/p1648647503942759 , which I haven’t been able to solve so far.
I’m not sure you can downgrade that easily ...
Yea, that’s what I thought. It’s a bit of a pain for me now; I hope I can find a way to fix the bug somehow.
AgitatedDove14 I am actually considering rolling back to 1.1.0, so 1.3.0 is not really an option for now
I would like to try it to see if it solves the problem of some dependencies not being found, even though they are installed, when using --system-site-packages.
File "devops/valid.py", line 80, in valid(parse_args) File "devops/valid.py", line 41, in valid valid_task.output_uri = args.artifacts File "/data/.trains/venvs-builds/3.6/lib/python3.6/site-packages/trains/task.py", line 695, in output_uri ", check configuration file ~/trains.conf".format(value)) ValueError: Could not get access credentials for 's3://ml-artefacts' , check configuration file ~/trains.conf
```
trains-elastic | {"type": "server", "timestamp": "2020-08-12T11:01:33,709Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "trains", "node.name": "trains", "message": "uncaught exception in thread [main]",
trains-elastic | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: AccessDeniedException[/usr/share/elasticsearch/data/nodes];",
trains-elastic | "at org.elasticsearc...
```
AgitatedDove14 Yes exactly, I tried the fix suggested in the GitHub issue ( `urllib3>=1.25.4` ) and the ImportError disappeared 🙂
I see what I described in https://allegroai-trains.slack.com/archives/CTK20V944/p1598522409118300?thread_ts=1598521225.117200&cid=CTK20V944 :
randomly, one of the two experiments is shown for that agent
Yes I did, and I found the problem: docker-compose was still using the trains-server 0.15 image because it hadn't pulled the new one, so I had trains-server 0.15 running with ES7.
-> I deleted all the containers and it successfully pulled trains-server 0.16. Now everything is running properly 🙂