The main issue is that task_logger.report_scalar() is not reporting the scalars.
And I didn't have this problem before because, when cu117 wheels were not available, the agent tried to get the wheel with the closest CUDA version and fell back to 1.11.0+cu115, which worked.
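To make the fallback described above concrete, here is a hypothetical sketch of picking the closest CUDA wheel tag that does not exceed the requested one. The helper names and the "highest tag not above the requested one" rule are assumptions for illustration only; this is not the clearml-agent implementation.

```python
from __future__ import annotations

import re


def cuda_tag_number(tag: str) -> int:
    """Turn a wheel CUDA tag such as 'cu115' into an integer (115) for comparison."""
    match = re.fullmatch(r"cu(\d+)", tag)
    if match is None:
        raise ValueError(f"not a CUDA tag: {tag!r}")
    return int(match.group(1))


def closest_cuda_tag(requested: str, available: list[str]) -> str | None:
    """Hypothetical fallback rule: the highest available tag not above `requested`."""
    target = cuda_tag_number(requested)
    candidates = [tag for tag in available if cuda_tag_number(tag) <= target]
    return max(candidates, key=cuda_tag_number, default=None)


# No cu117 wheel is available, so the sketch falls back to cu115.
print(closest_cuda_tag("cu117", ["cu102", "cu113", "cu115"]))  # prints cu115
```

Once a cu117 wheel is published, the same rule picks it directly, which matches the change in behaviour described above.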
Yes, I guess that's fine then. Thanks!
I guess I can work around this by passing the pipeline controller task ID to the last step, so that the last step can download all the artifacts from the controller task.
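A minimal sketch of that workaround, assuming the earlier steps uploaded their outputs as artifacts on the controller task and that each artifact is a list of records. `fetch_controller_artifacts` and `merge_step_outputs` are hypothetical helper names; `Task.get_task`, `Task.artifacts` and `Artifact.get()` are ClearML SDK calls.

```python
from __future__ import annotations

from typing import Any


def merge_step_outputs(outputs: dict[str, list[Any]]) -> list[Any]:
    """Concatenate per-step outputs in a stable order (sorted by artifact name)."""
    merged: list[Any] = []
    for name in sorted(outputs):
        merged.extend(outputs[name])
    return merged


def fetch_controller_artifacts(controller_task_id: str) -> dict[str, Any]:
    """Download every artifact stored on the pipeline controller task.

    Assumes the controller task ID was passed to this step as a parameter.
    """
    # Imported lazily so that the pure merge helper above works without clearml.
    from clearml import Task

    controller = Task.get_task(task_id=controller_task_id)
    # `artifacts` maps artifact names to Artifact objects; `.get()` downloads
    # and deserializes each one.
    return {name: artifact.get() for name, artifact in controller.artifacts.items()}
```

In the last step this would be used as `merged = merge_step_outputs(fetch_controller_artifacts(controller_task_id))`. Sorting by artifact name keeps the merged result deterministic regardless of the order in which the steps finished.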
Thanks! (Maybe this could be added to the docs?)
I tried removing type=str, but I got the same problem.
Indeed, I actually had the old configuration, which was not JSON. I converted it to JSON and now it works.
Both are repos for Python modules (the experiment itself and a dependency of the experiment).
Ha, wait: I removed the http:// from the host and it worked.
No space at the end of the diff file:
```diff
diff --git a/configs/2.2.2_from_scratch.yaml b/configs/2.2.2_from_scratch.yaml
index 9fece48..5816f78 100644
--- a/configs/2.2.2_from_scratch.yaml
+++ b/configs/2.2.2_from_scratch.yaml
@@ -136,7 +136,7 @@ data_processing:
 optimizer:
   type: 'RMSprop'
   args:
-    lr: 2.5e-4
+    lr: 1.5e-5
     momentum: 0
     weight_decay: 0
```
Ok, now I get ERROR: No matching distribution found for conda==4.9.2 (from -r /tmp/cached-reqscaw2zzji.txt (line 13))
And since I ran the task locally with python3.9, it used that version in the docker container
In all the steps, I want to store them as artifacts in S3 because it's very convenient.
The last step should merge them all, i.e., it needs to know all the artifacts of the previous steps.
I want to make sure that an agent has finished uploading its artifacts before it marks itself as complete, so that the controller does not try to access these artifacts before they are available.
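A sketch of the upload side, assuming a ClearML version whose `Task.upload_artifact` accepts `wait_on_upload` and whose `Task.flush` accepts `wait_for_uploads`; `finish_step` is a hypothetical helper name. The task object is passed in, so the helper can be called with `Task.current_task()` as the last thing a step does.

```python
from __future__ import annotations

from typing import Any


def finish_step(task: Any, outputs: dict[str, Any]) -> None:
    """Upload every output and block until the uploads are done.

    `task` is expected to be a clearml Task, e.g. Task.current_task().
    """
    for name, obj in outputs.items():
        # wait_on_upload=True makes the call return only after this upload finished.
        task.upload_artifact(name=name, artifact_object=obj, wait_on_upload=True)
    # Flush outstanding reports and wait for any pending uploads before exiting.
    task.flush(wait_for_uploads=True)
```

Because the step only returns after both calls, the task should not be marked completed while its artifacts are still being uploaded.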
So I want to be able to visualise it quickly as a table in the UI and download it as a DataFrame. Which is better: report_media or an artifact?
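For comparison, a sketch that publishes the same pandas DataFrame both as an artifact and as a table plot. Note that the logger call intended for tables is `Logger.report_table` rather than `report_media`. `publish_table` is a hypothetical helper name, and the two options are not mutually exclusive.

```python
from typing import Any


def publish_table(task: Any, df: Any, name: str) -> None:
    """Publish one DataFrame as an artifact and as a table plot.

    `task` is expected to be a clearml Task and `df` a pandas DataFrame.
    """
    # Artifact: stored with a preview in the ARTIFACTS tab; another process can
    # later call Task.get_task(...).artifacts[name].get() to get a DataFrame back.
    task.upload_artifact(name=name, artifact_object=df)
    # Table plot: rendered as an interactive table in the PLOTS tab.
    task.get_logger().report_table(title=name, series=name, iteration=0, table_plot=df)
```

Roughly, the artifact covers "download it as a DataFrame", and the table plot covers "visualise it quickly in the UI".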
I am now trying with agent.extra_docker_arguments: ["--network='host'", ] instead of what I shared above
No, they have different names - I will try to update both agents to the latest versions
SuccessfulKoala55 I was able to make it work with use_credentials_chain: true in the clearml.conf and the following patch: https://github.com/allegroai/clearml/pull/478
OK, yes, I get it: this info is also available at the very beginning of the logs, where the agent logs the full docker run command. Is this docker_cmd a shorter version of it?
I have no idea what's going on
Now I know which experiments have the most metrics. I want to downsample these metrics by 10, i.e., only keep the iterations that are multiples of 10. How can I query (to delete) only the documents whose iteration ends with 0?
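Keeping only multiples of 10 is the same as deleting every document whose iteration is not a multiple of 10. A numeric `%` check expresses that more reliably than matching the text "ending with 0". Below is a minimal sketch that assumes the scalars live in Elasticsearch, with the task ID in a `task` field and the iteration in an `iter` field. Both field names are assumptions. The script query is standard Elasticsearch; `downsample_delete_body` and `should_delete` are hypothetical helper names.

```python
from __future__ import annotations


def should_delete(iteration: int, keep_every: int = 10) -> bool:
    """True for iterations that are not multiples of `keep_every`."""
    return iteration % keep_every != 0


def downsample_delete_body(task_id: str, keep_every: int = 10) -> dict:
    """Body for Elasticsearch's _delete_by_query API.

    Matches one task's events whose `iter` is not a multiple of `keep_every`.
    The field names `task` and `iter` are assumptions about the index mapping.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"task": task_id}},
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                # Same condition as should_delete(), evaluated per document.
                                "source": "doc['iter'].value % params.n != 0",
                                "params": {"n": keep_every},
                            }
                        }
                    },
                ]
            }
        }
    }


print([i for i in range(25) if not should_delete(i)])  # prints [0, 10, 20]
```

The body would be sent to the events index's `_delete_by_query` endpoint. Sending the same body to `_count` first shows how many documents would be removed.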
Hi AgitatedDove14, I upgraded to 1.3.1 and the bug with missing logs in the console is still there…
I made another recording so that you can understand what it is about:
1. I enqueue a task.
2. The task starts, and the logs shown in the console are very sparse.
3. I scroll up and down to try to fetch the missing logs, without success.
4. I download the logs, open the file, and there I see the full logs.
I am confused now because I see that, in the master branch, the clearml.conf file has the following section:

    # Or enable credentials chain to let Boto3 pick the right credentials.
    # This includes picking credentials from environment variables,
    # credential file and IAM role using metadata service.
    # Refer to the latest Boto3 docs
    use_credentials_chain: false

So it states that an IAM role using the metadata service should be supported, right?
I specified torch @ https://download.pytorch.org/whl/cu100/torch-1.3.1%2Bcu100-cp36-cp36m-linux_x86_64.whl, but it didn't detect the link and tried to install the latest version (1.6.0) instead.
Hi DeterminedCrab71, Version: 1.1.1-135 • 1.1.1 • 2.14
Thanks for the clarification SuccessfulKoala55 ! A follow-up question:
I would like to install several packages (opencv, numpy, torch) in the system-site-packages so that they are available in each experiment (to reduce setup time of the experiments). Installing them globally via