
By the way, will downloading still happen if the dataset is available in the cache folder?
If it is cached, then there is no need to re-download.
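For reference, a minimal sketch (not from the thread) of how the cache comes into play; the project and dataset names below are placeholders:
```
from clearml import Dataset

# Fetch the dataset reference; if a cached local copy already exists,
# get_local_copy() reuses it instead of re-downloading the files
dataset = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = dataset.get_local_copy()
print(local_path)
```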
Happy new year @<1618780810947596288:profile|ExuberantLion50>
- Is this the right place to mention such bugs?
Definitely the right place to discuss them; usually, if verified, we ask to also add them on GitHub for easier traceability / visibility.
m (i.e. there's two plots shown side-by-side but they're actually both just the first experiment that was selected). This is happening across all experiments, all my workspaces, and all the browsers I've tried.
Can you share a screenshot? Is this r...
This is so odd.
Could you add prints right after the Task.init?
Also, could you verify it still gets stuck with the latest RC:
clearml==1.16.3rc2
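Something along these lines (a hedged sketch, the project/task names are placeholders) should show whether it hangs before or inside the init call:
```
from clearml import Task

print("before Task.init")  # should always be printed
task = Task.init(project_name="debug", task_name="init-hang-check")
print("after Task.init:", task.id)  # if this never prints, init is where it gets stuck
```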
Basically the links to the file server are saved in both mongo and elastic, so as long as these are host:ip based, at least in theory it should work
GreasyPenguin14 GrittyKangaroo27 the new release contains a fix, could you verify it solves the issue in your scenario as well? (There is now a smart timeout to detect the inconsistent state, which means the close/exit procedure might be delayed by 10 sec instead of hanging in these specific rare scenarios.)
GreasyPenguin14 I think this is what you are looking for: Task.get_project_id('project_name')
Quick update: 1.0.2 will be ready in an hour, apologies.
I think the real issue is that I am not able to specify a platform for the model,
There is no need to specify it; remove it from the config.pbtxt - clearml-serving will automatically add the backend.
That speed depends on model sizes, right?
In general, yes.
Hope that makes sense. This would not work under heavy loads, but e.g. we have models used once a week only. They would just stay unloaded until use - and could be offloaded afterwards.
But then you still might encounter a timeout the first time you access them, no?
Okay, so basically set a template for the pod, specifying the docker image. Make sure you pass the correct trains-server configuration (i.e. api/web/file server addresses and credentials), and select the queue name the agent will listen to.
container image / details
https://hub.docker.com/r/allegroai/trains-agent
https://github.com/allegroai/trains-agent/tree/master/docker/agent
Full environment variable list to pass can be found here:
https://github.com/allegroai/trains-server/blob/...
Hi SmarmySeaurchin8
I was wondering if I could change the commit id to the current one as well.
Actually that would be possible, but will need a bit of code to support controlling Task properties (not just configuration parameters)
How can I do that without running this Task on its own?
Assuming you have committed code that already supports it: you can clone the executed Task, and then change the commit ID to the "latest on branch" (see the drop-down when editing).
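A rough sketch of the clone-and-enqueue flow (the task id and queue name are placeholders; the commit change itself is done from the UI drop-down as described above):
```
from clearml import Task

# Clone the already-executed Task (keeps its recorded repo / configuration)
cloned = Task.clone(source_task="<original_task_id>", name="rerun on latest commit")

# After editing the commit ID in the UI ("latest on branch" from the drop-down),
# enqueue the clone so an agent picks it up
Task.enqueue(cloned, queue_name="default")
```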
Would t...
basically
would allow blocking the machine from being scaled-in when
Oh, this is what I was missing. That makes sense to me!
So what you are saying is that when the AWS autoscaler agent launches a Task, it will set the "protection flag" inside the container, and when the Task ends, it will unset the "protection flag".
Is that correct?
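Just to illustrate the idea (this is not ClearML code from the thread): with boto3 the "protection flag" maps to scale-in protection on the Auto Scaling group; the group and instance names here are placeholders:
```
import boto3

autoscaling = boto3.client("autoscaling")

def set_scale_in_protection(instance_id, asg_name, protect):
    # Set protect=True when a Task starts, protect=False when it ends
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=protect,
    )
```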
DilapidatedDucks58 You might be able to; check the links, they might be embedded into the docker image, so you can map a different png file from the host.
BTW: what would you change the icons to?
I was thinking mainly about AWS.
Meaning S3?
Hi DisgustedDove53
Is redis used as permanent data storage or just cache?
Mostly cache (I think)
Would there be any problems if it is restarted and comes up clean?
Pretty sure it should be fine, why do you ask ?
Pretty confusing that neither
services
StickyLizard47 basically this is how a services queue agent should be spun up:
https://github.com/allegroai/clearml-server/blob/9b108740da21f25407bd2c59583ca1c86f8e1faa/docker/docker-compose.yml#L123
When spinning on a k8s cluster, this is a bit more complicated, as it needs to work with the clearml-k8s-glue.
See here how to spin it on k8s
https://github.com/allegroai/clearml-agent/tree/master/docker/k8s-glue
If we set up an ingress with MetalLB or Nginx, and added a LoadBalancer into the template yaml, do you think this will work?
I would configure the k8s glue pod template to have a "Service" port forward to the pod's 10022 port (the default SSH port for the clearml-session), basically allowing the k8s ingress to allocate a port to the pod.
To test if it worked, spin the clearml session, and try to SSH to the external IP:port.
Once that works you can basically tell the clearml-session client which ...
Ohh yes, if you deleted the token then you have to recreate the clearml.conf
BTW: no need to generate a token, it will last.
So essentially, the server helm chart creates a randomly generated secret pair and deploys it as a shared k8s secret that pods can access.
This is the tricky part: for the helm chart to be able to create it, it means it can log in to the server, which means there is a secret embedded in the helm chart that lets you access the default server. You see my point?
Hmm CourageousLizard33 it seems you stumbled on a weird bug:
This piece of code only tries to get the username of the current UID, but since you are running inside a docker container and probably set the environment UID, there is no "actual" UID by that number in /etc/passwd, so it cannot resolve it.
I'm attaching a quick fix, please let me know if it solved the problem.
I'd like to make sure we have it in the next RC as soon as possible.
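For illustration, a sketch of the failure mode and a typical fallback (this is not the actual patch attached in the thread):
```
import os
import pwd

def get_username():
    try:
        # Raises KeyError inside a container when the UID has no /etc/passwd entry
        return pwd.getpwuid(os.getuid()).pw_name
    except KeyError:
        # Fall back to environment variables, then a generic default
        return os.environ.get("USER") or os.environ.get("USERNAME") or "unknown"
```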
The fact is that I use docker for running clearml server both on Linux and Windows.
My question was about running the agent: is it running with the --docker flag, i.e. docker mode?
Also, just forgot to note that I'm running the clearml-agent and clearml processes in a virtual environment - a conda environment on Windows and venv on Linux.
Yep, that answers my question above.
Does it make any sense to change system_site_packages to true if I r...
I'm assuming you are building for x86?
Do you have a link on how to set up a task scheduler to run in service mode in k8s?
Basically spin up the agent pod and add an argument to the agent itself (this is the --services-mode):
https://clear.ml/docs/latest/docs/clearml_agent#services-mode
So I'd create the queue in the UI, then update the helm yaml as above, and install? How would I add a 3rd queue?
Same process?!
Also I'd like to create the queues programmatically, is that possible?
Yes, you can. You can also pass an argument for the agent to create the queue if it does not already exist; just add --create-queue to the agent execution command line.
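If you prefer doing it from code, a hedged sketch using the APIClient (the queue name is a placeholder):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# Create the queue only if it does not already exist
if not client.queues.get_all(name="my_new_queue"):
    client.queues.create(name="my_new_queue")
```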
Btw it seems the docker runs in
network=host
Yes, this is so that if you have multiple agents running on the same machine, they can each find a new open port.
I can telnet the port from my mac:
Okay this seems like it is working
Hi @<1523701066867150848:profile|JitteryCoyote63>
Hi, how does agent.enable_git_ask_pass work?
Basically it pushes the password through stdin to git when it asks for it (it is a git feature).
We would "donate" back to the community a docker stack template that can be used to set up the community edition.
Perfect, feel free to PR to the clearml-server repository, we can take it from there