WittyOwl57 I think this is a great idea, can you open a feature issue on GitHub so this is not forgotten ?
BTW: regardless, if you have time to upgrade to the new Azure package, that would be great :) This has been on our to-do list for a while, but since not a lot of users complained it got pushed back ...
I'm not familiar with this one, I think you should be able to control it with:
CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR
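As a rough illustration of how such a variable name could be read: assuming the double underscore separates nested configuration keys (an assumption inferred from the name itself, not confirmed here), the variable above would point at `api.http.retries.backoff_factor`. The helper `env_to_config_path` is hypothetical and not part of clearml-agent.

```python
def env_to_config_path(env_name, prefix="CLEARML_AGENT__"):
    """Translate an override-style env var name into a dotted config path.

    Assumption: "__" separates nested keys and keys are lower-case.
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    parts = env_name[len(prefix):].split("__")
    return ".".join(part.lower() for part in parts)


path = env_to_config_path("CLEARML_AGENT__API__HTTP__RETRIES__BACKOFF_FACTOR")
print(path)
```

If the mapping holds, this is the key to look for in the configuration file when checking which value the variable overrides.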
Hi @<1729309120315527168:profile|ShallowLion60>
Clearml in our case installed on k8s using helm chart (version: 7.11.0)
It should be done "automatically", I think there is a configuration var in the helm chart to configure that.
What URLs are you seeing now, and what should be there?
Hi @<1529271085315395584:profile|AmusedCat74>
ClearML Scheduler where it doesn't reuse the task
What do you mean by doesn't reuse the Task, do you mean you want each time the scheduler is launched to basically overwrite the previous run ?
From creating the event to actually sending it ... 30 min sounds like enough "time"...
Hi EagerOtter28
I think the replacement should happen here:
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/clearml_agent/helper/repo.py#L277
Hi SuperficialGrasshopper36
/home/ubuntu/.clearml/venvs-builds.1/3.8/task_repository/repository_name/.venv
This is the problem, they should not be installed there, they should be in /home/ubuntu/.clearml/venvs-builds.1/3.8/
Could you post the poetry.lock file? Maybe it is something there?
What are the poetry and clearml-agent versions?
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also note that if you need all GPUs you can pass --gpus all
Hi ColossalAnt7
Following up on SuccessfulKoala55's answer
I saw that there is a config file where you can specify specific users and passwords, but it currently requires
- mount the configuration file (the one holding the user/pass) into the pod from a persistent volume .
I think the k8s way to do this would be to use mounted config maps and secrets.
You can use ConfigMaps to make sure the routing is always correct, then add a load-balancer (a.k.a a fixed IP) for the users a...
PompousParrot44 That should be very easy to do, basically a service mode code that clones a base task and puts it into a queue:
This should more or less do what you need :)
` from trains import Task
task = Task.init('devops', 'daily train', task_type='controller')
# stop the local execution of this code, and put it into the services queue, so we have a remote machine running it.
task.execute_remotely('services')
while True:
    a_task = Task.clone(source_task='aaabb111')
    Task.enqueu...
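To make the shape of that service loop concrete, here is a minimal sketch of a clone-and-enqueue loop. The clone and enqueue steps are passed in as callables so the sketch does not assume how the truncated snippet continues; `schedule_clones`, the `"default"` queue name, and the interval are all hypothetical placeholders.

```python
import time


def schedule_clones(clone_task, enqueue_task, queue_name, interval_sec,
                    max_runs=None, sleep=time.sleep):
    """Clone a base task and enqueue the clone, once per interval.

    clone_task and enqueue_task are injected callables (hypothetical here);
    max_runs=None loops forever, like the `while True` in the snippet above.
    """
    launched = []
    runs = 0
    while max_runs is None or runs < max_runs:
        a_task = clone_task()
        enqueue_task(a_task, queue_name)
        launched.append(a_task)
        runs += 1
        sleep(interval_sec)
    return launched


# Dry run with stand-in callables, so nothing talks to a server.
queued = []
result = schedule_clones(
    clone_task=lambda: "cloned-task",
    enqueue_task=lambda task, queue: queued.append((task, queue)),
    queue_name="default",
    interval_sec=0,
    max_runs=3,
    sleep=lambda _seconds: None,
)
print(len(result), queued[0])
```

In a real controller the two callables would wrap the actual clone and enqueue calls, and `interval_sec` would be something like 24 hours for a daily run.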
Do we support GPUs in a) docker mode b) k8s glue?
yes on both
Is there a good reference to get started with k8s glue?
A few folks here already set it up, do you have a k8s cluster with GPU support ?
Well it is there, do you have it in your docker-compose as well?
https://github.com/allegroai/trains-server/blob/master/docker-compose.yml#L55
Are models technically Tasks and can they be treated as such? If not, how to delete a model permanently (both from the server and from AWS storage)?
When you call Task.delete() it actually goes over all the models/artifacts and deletes them from the storage
"Containers are not running"?
But you are running the docker-compose, how come no containers are running?
Bad news, there isn't a nice interface to get the table from the Optimizer object (I will make sure we add it, no reason not to).
But you can very easily get all the information you need and more:
all_the_tasks = an_optimizer.get_top_experiments(top_k=100)
Then for every task in the list you can get all the information:
for task in all_the_tasks:
    task_params_as_dict = task.get_parameters()
    task_scalars = task.get_last_scalar_metrics()
Basically the Task object enables you to que...
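Until there is a built-in table, one way to assemble it by hand is sketched below. It relies only on the two calls mentioned above and assumes `get_last_scalar_metrics()` returns a nested dict of the form `{title: {series: {"last": value, ...}}}`; the `tasks_to_rows` helper and the `FakeTask` stand-in are hypothetical and exist only so the sketch runs without a server.

```python
def tasks_to_rows(tasks):
    """Flatten each task's parameters and last scalar values into one dict per task."""
    rows = []
    for task in tasks:
        row = dict(task.get_parameters())
        # Assumed shape: {title: {series: {"last": value, ...}}}
        for title, series_map in task.get_last_scalar_metrics().items():
            for series, values in series_map.items():
                row[f"{title}/{series}"] = values.get("last")
        rows.append(row)
    return rows


class FakeTask:
    """Stand-in for a task object, used only to exercise the sketch offline."""

    def __init__(self, params, scalars):
        self._params, self._scalars = params, scalars

    def get_parameters(self):
        return self._params

    def get_last_scalar_metrics(self):
        return self._scalars


demo = [FakeTask({"General/lr": "0.01"}, {"loss": {"val": {"last": 0.25}}})]
rows = tasks_to_rows(demo)
print(rows)
```

With a real optimizer, the input would be the list returned by `an_optimizer.get_top_experiments(top_k=100)`, and the resulting rows can be handed to whatever table tool is convenient.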
BTW is it cheaper than ec2 instance? Why not use the aws autoscaler ?
Hi FancyChicken53
This is a noble cause you are after :)
Could you be more specific on what you had in mind, I'll try to find the best example once I have more understanding ...
It should print to console...
print(task.get_output_log_web_page())
Hmm, not a bad idea :)
Could you please open a GitHub issue, so it will not get forgotten?
(btw: I'm not sure how trivial it is to implement, nonetheless obviously possible :) )
And still a difference between A/B , one detecting the repo the other does not?
Hi @<1598487094601191424:profile|MysteriousCow84>
You should put it in the dedicated section:
I want to be able to install the venv in multiple servers and start the "simple" agents in each one of them. You can think of it as some kind of one-off agent for a specific (distributed) hyperparameter search task
ExcitedFish86 Oh if this is the case:
in your clearml.conf:
agent.package_manager.type: conda
agent.package_manager.conda_env_as_base_docker: true
https://github.com/allegroai/clearml-agent/blob/36073ad488fc141353a077a48651ab3fabb3d794/docs/clearml.conf#L60
https://git...
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
think perhaps it came across as way more passive aggressive than I was intending.
Dude, you are awesome for saying that! No worries :) We try to assume people have the best intentions at heart (the other option is quite depressing :) )
I've been working on an Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way
Hi RobustHippopotamus53
The way "latest from branch" works:
- On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix).
- The agent then pulls the latest commit from that branch and updates the Task with the current commit ID (the latest on the branch at the time of execution).
- This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed.
Could it be that you "force-pushed" a commit/squash, hence the "origina...
does this work for multiple levels?
Yep :)
StorageManager
Oh it has no remove :) StorageHelper.delete is the only way
Hi MelancholyElk85
I think you are right, OutputModel is missing a remove method.
Maybe we should have a class method on Model, something like:
@classmethod
def remove(cls, model: Union[str, Model], delete_weights_file: bool, force: bool):
    # actually remove model and weights file
    ...
wdyt?