Not a solution, but just curious: why would you need that many "debug" images?
Those are images automatically generated by your training code, which ClearML then uploads automatically. Maybe disable automatic image upload during Task.init()?
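A minimal sketch of what I mean, assuming the debug images come from matplotlib/TensorBoard auto-logging (the project/task names are placeholders, and which framework keys you need depends on what actually produces the images):
from clearml import Task

# disable automatic capture of framework-generated images so they are not
# uploaded as debug samples (adjust the keys to your training code)
task = Task.init(
    project_name="my_project",      # placeholder
    task_name="my_experiment",      # placeholder
    auto_connect_frameworks={
        "matplotlib": False,    # stop auto-reporting matplotlib figures
        "tensorboard": False,   # stop auto-reporting TensorBoard images (and scalars)
    },
)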
nice !! That is exactly what I am looking for !!
oh, looks like I need to empty the Installed Packages section before enqueuing the cloned task
Yes. I am investigating that route now.
note: you will need to set the env var very early, before the first import clearml in your code
got it
Thanks @<1523701070390366208:profile|CostlyOstrich36>
@<1523703436166565888:profile|DeterminedCrab71> Thanks for the suggestion. But no effect.
We already have client_max_body_size 0; in the server section
I tried setting 100M in both the http and server sections, but nothing changes.
Do you think gzip could be related?
I did.
I am now redeploying to a new container to be sure.
Are the uncommitted changes in untracked files?
In other words: clearml will only save uncommitted changes from files that are tracked by your local git repo
Task.export_task() will contain what you are looking for.
In this case ['script']['diff']
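A minimal sketch, assuming you have the task id at hand (the id below is a placeholder):
from clearml import Task

# export the task definition as a dict and read the stored uncommitted diff
task = Task.get_task(task_id="<your_task_id>")   # placeholder
exported = task.export_task()
print(exported['script']['diff'])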
ohhhh ! Thanks ! It works
Should I get all the workers, then go through them and count how many are in my queue of interest?
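Something like this sketch is what I had in mind, using the APIClient; I'm assuming each worker entry exposes the queues it listens on, and the queue name is a placeholder:
from clearml.backend_api.session.client import APIClient

client = APIClient()
workers = client.workers.get_all()

# count the workers that are listening on the queue of interest
queue_name = "my_queue"   # placeholder
count = sum(
    1
    for w in workers
    if any(q.name == queue_name for q in (w.queues or []))
)
print(f"{count} workers serving '{queue_name}'")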
@<1523703436166565888:profile|DeterminedCrab71> Found the solution:
We needed proxy_set_header Connection ""; in the location /api section. Putting it above in the server section had no effect
will send the nginx -T results once the container is deployed
yes, I did try that exactly: all curl requests succeed
Deployed fresh new and ran nginx -T in the container:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# configuration file /etc/nginx/nginx.conf:
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;
error_log stderr notice;
events {
worker_connections 768;
# multi_accept on;
}
http {
client_max_body_size 100M;
rewrite_l...
and I tried curl against the web URL: it fails 8 times out of 10
meanwhile, the SDK supports CLEARML_CONFIG_FILE=/path
Not sure what your use case is, but if you want it to be dynamic, you can create the config file on the fly in /tmp, for example, and point to it in your code with
import os
os.environ['CLEARML_CONFIG_FILE']="/path"
import clearml
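For the fully dynamic variant, a sketch along these lines (the config contents and file location are placeholders for whatever you generate):
import os
import tempfile

config_text = "..."   # placeholder: the clearml.conf contents you generated on the fly

# write the config to /tmp and point the SDK to it before the first clearml import
conf_path = os.path.join(tempfile.gettempdir(), "clearml.conf")
with open(conf_path, "w") as f:
    f.write(config_text)

os.environ['CLEARML_CONFIG_FILE'] = conf_path
import clearml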
I confirm that clearml properly handles -e installed packages. That is very nice because the package gets updated automatically without touching requirements.txt 😄
Your clearml worker needs to be set up to have access to the git repo.
this looks like the agent running inside your docker did not have any username/password to do the git clone, so the default behavior is to wait for keyboard input: which looks like it's hanging ...
I know that the git clone and pip verifying all installed packages is normal. But for some reason, in Michael's screenshot, I don't see those steps ...
@<1523701087100473344:profile|SuccessfulKoala55> Yes, I am aware of that one. It builds a docker container ... I wanted to build without docker. Like when clearml-agent runs in non-docker mode, it already builds the running env inside its caching folder structure. I was wondering if there was a way to stop that process just before it executes the task .py
Is it because Azure is "whitelisted" in our network? Thus needs a different certificate?? And how do I provide 2 different certificates? Is bundling them as simple as concatenating the 2 PEM files?
please provide the full logs and error message.
Looks like your issue is not that ClearML is not tracking your changes, but more that your Configuration is overwritten.
This often happens to me. The way I debug this is to put a lot of print statements along the code to track when the Configuration is overwritten and narrow down why. The print statements will show up in the Console tab.
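For example, a sketch of what those prints can look like, assuming the configuration was connected as hyperparameters (swap in however you connected yours):
from clearml import Task

task = Task.current_task()

# sprinkle these around the code; the output shows up in the task's Console tab
print("config after data loading:", task.get_parameters())
# ... later ...
print("config right before training starts:", task.get_parameters())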
not sure if related, but clearml 1.14 tends to not "show" the gpu_type
You can use a single PC and have multiple agents running at the same time, each assigned one or more GPUs.
You are likely to hit a CPU bottleneck, depending on how much augmentation you are applying when training ...
I will try it. But it's a bit random when this happens, so ... We will see
just saw that repo: who are Coder? That's not the VS Code developer team, is it?