Awesome! Any chance you feel like contributing it? I'm sure ppl would be thrilled 🙂
Hi ReassuredOwl55
How would I find Tasks that have the same code with different inputs/parameters?
Assuming you have the git repo
you can do: Task.query_tasks(..., task_filter={'_all_': dict(fields=['script.repository'], pattern='github.com/user/repo')})
wdyt?
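Something along these lines should work (the repo pattern is a placeholder, assuming a recent clearml version):
from clearml import Task

# Return the IDs of all Tasks whose recorded repository matches the pattern;
# '_all_' applies the regex pattern across the listed task fields
task_ids = Task.query_tasks(
    task_filter={
        '_all_': dict(fields=['script.repository'], pattern='github.com/user/repo'),
    }
)
print(task_ids)  # list of matching Task IDs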
this issue on when trying to set up on our remote machines
You mean setting up the trains-server on a remote machine?
Let me check the API reference
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all
So not a straight query, but maybe the
https://clear.ml/docs/latest/docs/references/api/endpoints#post-tasksget_all_ex
section might do the trick.
SuccessfulKoala55 any chance you have an idea on what to pass there ?
DilapidatedDucks58 no don't say that, you are wonderful 😉
trains-agent --gpus 0 --queue my_queue -d
should create a worker named machine:gpu0
Then you can do trains-agent --gpus 1 --queue my_queue -d
which will create a second worker named machine:gpu1
Hi SuperiorDucks36
you have such a great and clear GUI
😊
I personally would love to do it with a CLI
Actually a lot of stuff is harder to get from the UI (like the current state of your local repository, etc.). But I think your point stands 🙂 We will start with the CLI, because it is faster to deploy/iterate on, then when you guys say it's a winner we will add a wizard in the UI.
What do you think?
Hi AbruptHedgehog21
can you send the two models' info pages (i.e. the original and the updated one) ?
do you see the two endpoints ?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
This means that if something happens with the k8s node the pod runs on,
Actually if the pod crashed (the pod, not the Task) k8s should re-spin it, no?
I also experience that if a worker pod running a task is terminated, clearml does not fail/abort the task.
From the k8s perspective, if the task ended (failed/completed) it always returns with exit code 0, i.e. success, because the agent was able to spin the Task. We do not want Tasks with exceptions to litter the k8s with endless r...
SuperiorDucks36 from code ? or UI?
(You can always clone an experiment and change the entire thing, the question is how you will get the data to fill in the experiment, i.e. repo / arguments / configuration, etc.)
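For example, a minimal sketch of cloning and editing from code (project/task/parameter names are placeholders):
from clearml import Task

# Grab the template experiment, clone it, tweak a value, and enqueue the clone
template = Task.get_task(project_name='examples', task_name='my_experiment')
cloned = Task.clone(source_task=template, name='my_experiment (clone)')
cloned.set_parameter('Args/batch_size', 64)  # hypothetical hyper-parameter
Task.enqueue(cloned, queue_name='default')   # queue name is an assumption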
There is a discussion here, I would love to hear another angle.
https://github.com/allegroai/trains/issues/230
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I
it will not delete it from Grafana, it just means it's no longer being collected, make sense ?
Hi ReassuredTiger98
Are you running the agent in venv mode ?
Does this mean that I need to create multiple ssh keys? 1 key for each user?
I think so
Use .git-credentials; each line stores credentials for one host, e.g. https://username:token@github.com
This might also support multiple users/repos
How would one do this? Do I just share a link to the experiment, like
See "Share" in the right click menu on the experiment
Fixed in pip install clearml==1.8.1rc0
🙂
You can try calling task._update_repository()
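Something like this (a debugging sketch; note that _update_repository() is an internal method, so treat it as a workaround rather than a stable API):
from clearml import Task

task = Task.init(project_name='examples', task_name='repo detection test')  # placeholder names
# Force re-detection/update of the Task's repository information
task._update_repository()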
I'm still trying to figure out how to reproduce it...
you mean in the enterprise version?
Enterprise has the smarter GPU scheduler. This is an inherent problem of sharing resources, there is no perfect solution: you either have fairness, but then you get idle GPUs, or you have races, where you can get starvation
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short: no, unless you really want to compile the dockers, and I can't see the real upside here.
Yes, add the following volume mount /opt/clearml.conf:/root/clearml.conf here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's /opt/clearml.conf with ...
ThickDove42 Windows conda python3.6 was exactly what I was using,
started the jupyter with:
"python -m jupyter notebook"
Then opened / created a new notebook, everything worked.
Tested on latest clearml 0.17.2
Maybe it's something with the path to the repo that breaks it? Because obviously the issue is that it is looking in the wrong folder.
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task
CLI
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following lines in your code, which will cause the execution to stop and continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it):
task = Task.init(...)
task.execute_remotely()
https://clear.ml/do...
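A minimal sketch of that flow (the project/task/queue names are placeholders):
from clearml import Task

task = Task.init(project_name='examples', task_name='remote run')
# Stop the local process here and enqueue the Task for a remote agent
task.execute_remotely(queue_name='default', exit_process=True)
# everything below this line only runs on the remote machine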
ThickDove42 Windows also works 😞
Any specifics on the setup?
Since you are running in venv mode, setting the OS environment variable before the clearml-agent command (e.g. MY_ENV=value clearml-agent daemon --queue default) will basically make sure it propagates to the process itself.
ReassuredTiger98 make sense ?
AdventurousRabbit79 are you passing cache_executed_step=False
to the PipelineController ?
https://github.com/allegroai/clearml/blob/332ceab3eadef4997e897d171957975a247a6dc1/clearml/automation/controller.py#L129
Could you send a usage example ?
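To clarify what I mean, a minimal sketch (project/task names are placeholders; cache_executed_step is passed per step):
from clearml.automation import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0.0')
pipe.add_step(
    name='stage_train',
    base_task_project='examples',
    base_task_name='train task',
    cache_executed_step=False,  # force re-execution instead of reusing a cached run
)
pipe.start()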
my pipeline controller always updates to the latest git commit id
This will only happen if the Task the pipeline creates has no specific commit ID, and instead just uses the latest from the git repo. Is this the case ?
Right, so this "vault" design is built into the paid tiers of ClearML to achieve exactly that. Long story short, users can put their credentials/configs on the clearml-server and the agent (or the clients) will pull and merge them into the execution.
It's very cool and works really nice, but not part of the open source (or the SaaS tier).
What you could do is store these configurations on the Task itself (one way or another). Maybe for example have an empty definitions.py
file part of ...
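For example, a hedged sketch using connect_configuration (file/names are placeholders): when executed by an agent, the stored content is fetched back from the Task, so users can edit it in the UI before a run.
from clearml import Task

task = Task.init(project_name='examples', task_name='config example')
# Locally this uploads definitions.py to the Task; under an agent the returned
# path points to the content pulled back from the server
config_path = task.connect_configuration('definitions.py', name='definitions')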
- ...that file and the logs of the agent service always say the same thing as before:
Oh, in that case you need to fill in your credentials here:
https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L137
Basically CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY will let the agent running inside the docker talk to the server itself. Just put your own credentials there as a start, it should solve the issue
unless the domain is different?
Imagine that you are working with both GitHub and Bitbucket, for example; if you are using git-ssh then git will know which of the domains to send the key to. Currently there is a single user/pass entry, so all domains will get the same credentials. But I think this is a rare use case.
Hi @<1715175986749771776:profile|FuzzySeaanemone21>
and then run "clearml-agent daemon --gpus 0 --queue gcp-l4" to start the worker.
I'm assuming the docker service cannot spin up a container with GPU access; usually this means you are missing the nvidia docker runtime component
Hi FantasticPig28
or does every individual user have to configure their own minio credentials?
You can configure the client's files_server entry in the clearml.conf (or use an OS environment variable):
files_server: "..."
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/docs/clearml.conf#L10
Notice to make sure you also provide credentials here:
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/docs/clearml.conf#L97
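A minimal sketch of the OS-environment alternative (the address is a placeholder; I believe CLEARML_FILES_HOST mirrors the files_server conf entry, and the s3 credentials still come from the aws section linked above):
import os

# Per-user override of the files server without editing clearml.conf
os.environ['CLEARML_FILES_HOST'] = 's3://my-minio-host:9000/bucket'  # placeholder

from clearml import Task
task = Task.init(project_name='examples', task_name='minio files example')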