You mean this chart?
The question is why do you need a custom certificate for Azure at all?
@<1523701083040387072:profile|UnevenDolphin73> at what point exactly? What's the scenario you're thinking of?
Well, if the machine you're installing on has a public name, you typically simply use it
That's because ClearML needs the hooks in that class to make sure any changes you make in that data structure are propagated back to the server when you're not in remote mode
Well, if you need an external IP, you'll probably want to configure the docker params to use the host network
Hi @<1541954607595393024:profile|BattyCrocodile47> , I'm not sure I understand - there's no relation between the docker compose for the server and the autoscaler (which is a script using capabilities of the SDK)
I think you should try to manually start such a docker container and try to see what fails in the process. Attaching to an existing one has too many differences already
Are you using a self-hosted server? If so, what's the version? I have a feeling you're running v1.0.1 or v1.0.0 (as the "newer version" message at the top indicates). This error looks exactly like what was fixed in v1.0.2... (see https://clear.ml/docs/latest/docs/release_notes/ver_1_0#clearml-server-102 )
That's because you need to set up a clearml.conf file on your machine (where you run the autoscaler)
In any case, the watchdog setting can be controlled using the services.tasks.non_responsive_tasks_watchdog.threshold_sec server configuration setting (default is 7200 seconds)
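For what it's worth, here is a minimal sketch of how that dotted setting name unfolds into a nested configuration block. The helper names are hypothetical, and 7200 is just the default mentioned above. I believe the leading services segment corresponds to a services.conf file on the server, in which case only the inner block would go into it, but treat that as an assumption and check it against the server configuration docs.

```python
def nest_setting(dotted_key, value):
    """Expand 'a.b.c' into {'a': {'b': {'c': value}}}."""
    nested = value
    for part in reversed(dotted_key.split(".")):
        nested = {part: nested}
    return nested


def render_hocon(node, indent=0):
    """Render a nested dict as HOCON-style lines (illustrative only)."""
    pad = " " * indent
    lines = []
    for key, val in node.items():
        if isinstance(val, dict):
            lines.append(f"{pad}{key} {{")
            lines.extend(render_hocon(val, indent + 4))
            lines.append(f"{pad}}}")
        else:
            lines.append(f"{pad}{key}: {val}")
    return lines


# Setting name taken from the message above; 7200 is the stated default
SETTING = "services.tasks.non_responsive_tasks_watchdog.threshold_sec"
config = nest_setting(SETTING, 7200)
print("\n".join(render_hocon(config)))
```

Changing the value passed to nest_setting shows what a longer or shorter threshold would look like in the same structure.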
Hi @<1637624982261469184:profile|LittleCockroach89> , since in offline mode there's no connectivity to the server, you can't query previously added objects, including datasets.
Also, from the log, I see your ES version is 5.6:
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-5.6.16.jar:5.6.16]
I'm not sure why. In v0.16 the ES version is already 7.6, which makes me think your original version was v0.15. Also, in the new docker-compose the version is 7.16, so maybe you are not using the updated docker-compose file?
If you only want to move the server to another machine and not upgrade the server version, you should just copy the folder and restart the server there, but change nothing
Hi ColorfulBeetle67,
I also notice that this config can be mounted as a volume or secret or a configmap as suggested
That's indeed an option...
Another option is to pass these settings as environment variables, if that can suit your requirements
Oh... that's very strange
If it points to your own S3 server, it must have a port
This is what it does when you specify a port...
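To make the port point concrete, here is a small sketch that checks whether an s3:// URI names an explicit port. The host name, port 9000, and bucket names are made up for illustration, and the check only inspects the string; it says nothing about how ClearML itself parses the URI.

```python
from urllib.parse import urlsplit


def has_explicit_port(uri):
    """Return True when the host part of the URI carries an explicit port."""
    return urlsplit(uri).port is not None


# Hypothetical URIs, for illustration only
print(has_explicit_port("s3://my-storage-host:9000/bucket/folder"))  # True
print(has_explicit_port("s3://bucket/folder"))  # False
```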
Hi @<1694157594333024256:profile|DisturbedParrot38> , which agent version are you using?
Also, can you make sure the vcs cache is cleared and share the complete task log?
MagnificentSeaurchin79 can you try with dataset.list_files('*.pkl', recursive=True)?
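In case it helps, a minimal sketch of that call wrapped in a helper. The helper name is hypothetical, and the project and dataset names in the commented usage are placeholders; the call itself assumes a dataset object exposing list_files as used above.

```python
def list_pickle_files(dataset):
    """Return every .pkl entry registered in the dataset, including sub-folders."""
    return dataset.list_files("*.pkl", recursive=True)


# Typical usage against a real server (project and dataset names are placeholders):
#   from clearml import Dataset
#   ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
#   print(list_pickle_files(ds))
```

If the returned list is empty, the files were probably not registered in that dataset version, as opposed to the wildcard simply not matching.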
ElegantCoyote26 did you install 1.24.1 previously (as described in step 4)? I just want to make sure the instructions are correct
@<1559711623147425792:profile|PlainPelican41> status reason 3 means the task status was changed mid-run
Does clearml-init also have to connect to the ClearML server to finish successfully?
Yes, it verifies the credentials in the same way, and creates a clearml.conf file when done
Also, I think getting the logs for the restarting docker container will help
Hi PompousParrot44,
Regarding the console error, do you still see it? There was a temporary issue with a service responding to this update check, so this error makes sense, but it's in no way critical and should not affect anything
HarebrainedBaldeagle11 this repository is deprecated. You need to refer to the new repository with the latest and greatest helm charts for ClearML 🙂
See https://github.com/allegroai/clearml-helm-charts