Ha wait, I removed the http:// in the host and it worked
No space at the end of the diff file:
` diff --git a/configs/2.2.2_from_scratch.yaml b/configs/2.2.2_from_scratch.yaml
index 9fece48..5816f78 100644
--- a/configs/2.2.2_from_scratch.yaml
+++ b/configs/2.2.2_from_scratch.yaml
@@ -136,7 +136,7 @@ data_processing:
optimizer:
type: 'RMSprop'
args:
- lr: 2.5e-4
+ lr: 1.5e-5
momentum: 0
weight_decay: 0 `
Ok, now I get ERROR: No matching distribution found for conda==4.9.2 (from -r /tmp/cached-reqscaw2zzji.txt (line 13))
And since I ran the task locally with python3.9, it used that version in the docker container
In all the steps, I want to store the outputs as artifacts on S3 because it's very convenient.
The last step should merge them all, i.e. it needs to know all the artifacts of the previous steps
So I want to be able to visualise it quickly as a table in the UI and be able to download it as a dataframe, which of report_media or artifact is better?
I am now trying with agent.extra_docker_arguments: ["--network='host'", ] instead of what I shared above
No, they have different names - I will try to update both agents to the latest versions
SuccessfulKoala55 I was able to make it work with use_credentials_chain: true in the clearml.conf and the following patch: https://github.com/allegroai/clearml/pull/478
Ok yes, I get it: this info is also available at the very beginning of the logs, where the agent logs the full docker run command. Is this docker_cmd a shorter version of it?
I have no idea what's going on
Now I know which experiments have the most metrics. I want to downsample these metrics by 10, i.e. only keep iterations that are multiples of 10. How can I query (to delete) only the documents ending with 0?
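For what it's worth, here is a minimal sketch of how such a request body could be built, assuming the events live in Elasticsearch and the iteration number is stored in a numeric field called `iter` (that field name is an assumption, check your index mapping). Note that it matches the documents to delete, i.e. the iterations that are NOT multiples of 10, so that the multiples of 10 are the ones kept. The helper names `downsample_delete_query` and `survives` are made up for this example.

```python
import json


def downsample_delete_query(keep_every: int = 10, iter_field: str = "iter") -> dict:
    """Build a request body matching events whose iteration is NOT a
    multiple of `keep_every`, i.e. the documents that would be deleted.

    `iter_field` is an assumed field name, not confirmed by the thread.
    """
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return {
        "query": {
            "bool": {
                "filter": {
                    # Elasticsearch script query, evaluated per document
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": f"doc['{iter_field}'].value % params.n != 0",
                            "params": {"n": keep_every},
                        }
                    }
                }
            }
        }
    }


def survives(iteration: int, keep_every: int = 10) -> bool:
    """Local mirror of the rule: True if the iteration would be kept."""
    return iteration % keep_every == 0


if __name__ == "__main__":
    # Print the body so it can be inspected before sending it anywhere
    print(json.dumps(downsample_delete_query(), indent=2))
```

The body would then be POSTed to the `_delete_by_query` endpoint of the relevant index. It is probably safer to send the same body to `_search` first and check the hit count, and to add a filter on the task id if only some experiments should be downsampled.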
Hi AgitatedDove14 , I upgraded to 1.3.1 and the bug of missing logs in the console is still there…
I made another recording so that you can understand what it is about:
I enqueue a task and the task starts. The logs shown in the console are very sparse. I scroll up and down to try to fetch the missing logs, without success. I download the logs, open the file, and there I see the full logs.
I am confused now because I see that in the master branch, the clearml.conf file has the following section:
# Or enable credentials chain to let Boto3 pick the right credentials.
# This includes picking credentials from environment variables,
# credential file and IAM role using metadata service.
# Refer to the latest Boto3 docs
use_credentials_chain: false
So it states that IAM role using metadata service should be supported, right?
Hi DeterminedCrab71 Version: 1.1.1-135 • 1.1.1 • 2.14
Thanks for the clarification SuccessfulKoala55 ! A follow-up question:
I would like to install several packages (opencv, numpy, torch) in the system-site-packages so that they are available in each experiment (to reduce setup time of the experiments). Installing them globally via
(docker was installed with sudo snap install docker )
AgitatedDove14 Yes I have the xpack security disabled, as in the link you shared (note that it's xpack.security.enabled: "false" with quotes around false), but this command throws:
{"error":{"root_cause":[{"type":"parse_exception","reason":"request body is required"}],"type":"parse_exception","reason":"request body is required"},"status":400}
haa got it, I am on a self hosted server, that's why I don't see it
Hi CostlyOstrich36 , this weekend I took a look at the diffs with the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible ...
Well actually I do see many errors like that in the browser console:
So there are two possible cases for trains-agent-1:
It picks a new experiment -> the "workers" tab randomly shows one of the two experiments
There is no new experiment in the default queue to start -> the "workers" tab randomly shows either no experiment or the one that it is running
by mistake I have two agents started in one machine
Answering myself: Yes, Task.set_base_docker RTFM!!!