
🤔 Is the "installed packages" part editable? Good to know
Isn't it a bit risky to manually change a package version? What if it isn't compatible with the rest?
I'm saying that because in the task under "INSTALLED PACKAGES" this is what appears
We try to break everything up into independent tasks and group them using a pipeline. The dependency on an agent caused unnecessary overhead, since we just want to execute locally. It became a burden once new data scientists joined the project: instead of just telling them "yeah, just execute this script", you now have to teach them about clearml, the role of agents, how to launch them, how they behave, how to remove them and stuff like that... things you want to avoid with data scientists
I'll just exclude .cfg files from the deletion. My question is how to recover: must I recreate the agents, or is there another way?
That's awesome, but my problem right now is that I have my own cronjob deleting the contents of /tmp each interval, and it deletes the cfg files... So I understand I must skip deleting them from now on
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
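For the cleanup side of this, here is a minimal sketch of a cleanup routine that skips .cfg files. It assumes the files to keep can be recognised by their `.cfg` suffix (as described above); the function name `clean_directory` and its parameters are made up for illustration, and it does not address how to recover agents whose files were already deleted.

```python
from pathlib import Path


def clean_directory(root, keep_suffixes=(".cfg",)):
    """Delete regular files under `root`, skipping protected suffixes.

    Directories are left in place. Returns the deleted paths so a cron
    job could log what was removed.
    """
    deleted = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything ending in a protected suffix.
        if path.is_file() and path.suffix not in keep_suffixes:
            path.unlink()
            deleted.append(path)
    return deleted


# Hypothetical usage from a cron-driven script:
#     clean_directory("/tmp")
```

Returning the deleted paths rather than printing them keeps the function easy to dry-run against a scratch directory first.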
Sorry I meant this link
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/apps-4-rent.clearml-on-centos8
(I'm working with maria)
essentially, what maria says is that when she has a script with uncommitted changes and executes it remotely, the script that actually runs on the remote machine is without the uncommitted changes
e.g.:
Her git status is clean, she makes some changes to script.py and executes it remotely. What gets executed remotely is the original script.py and not the modified version she has locally
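To confirm on the local side that git really sees script.py as modified before launching, a small check like the one below could help. It is only a sketch: the helper names are made up, it relies on the `git status --porcelain` v1 line format (two status characters, a space, then the path), and it says nothing about how the remote side picks up the changes.

```python
import subprocess


def parse_porcelain(output):
    """Return the paths listed in `git status --porcelain` (v1) output."""
    # Each line is "XY path": two status characters, one space, the path.
    return [line[3:] for line in output.splitlines() if line.strip()]


def uncommitted_paths(repo_dir="."):
    """Run git in `repo_dir` and list paths with uncommitted changes."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_porcelain(result.stdout)
```

An empty list from `uncommitted_paths()` means the working tree is clean; after editing script.py it should appear in the list.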
Okay, so let me get this straight
The autoscaling is basically an ever-running task (let's say on the services queue). Now, the actual auto scaling and which queues exist have nothing to do with that, and are configured in the auto scale task?
Yep, the trains server is basically a docker-compose based service.
All you have to do is change the ports in the docker-compose.yml file.
If you followed the instructions in the docs you should find that file in /opt/trains/docker-compose.yml and then you will see that there are multiple services (apiserver, elasticsearch, redis etc.) and in each there might be a section called ports which then states the mapping of the ports.
The number on the left, is ...
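As a rough illustration of the ports mapping: in Docker Compose's short syntax an entry is written `HOST:CONTAINER`, optionally prefixed by a bind IP and suffixed by a protocol. The sketch below splits such an entry; the function name and the port numbers in the usage comment are made up for the example and are not taken from the trains docker-compose.yml.

```python
def parse_port_mapping(mapping):
    """Split a short-syntax Compose port entry into (host, container).

    Accepts "HOST:CONTAINER", "IP:HOST:CONTAINER", and an optional
    "/tcp" or "/udp" suffix. Port ranges are not handled.
    """
    ports = mapping.split("/")[0]  # drop the optional protocol suffix
    parts = ports.split(":")
    if len(parts) < 2:
        raise ValueError(f"expected HOST:CONTAINER, got {mapping!r}")
    # The last two fields are the host port and the container port.
    return int(parts[-2]), int(parts[-1])


# Hypothetical entry: expose container port 8080 on host port 8081.
#     parse_port_mapping("8081:8080")
```

Changing only the host side of an entry moves where the service is reachable from outside while the service inside the container keeps listening on the same port.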
So once I enqueue it, it is up? The docs say I can configure the queues that the auto scaler listens to in order to spin up instances, inside the auto scale task - I wanted to make sure that this config has nothing to do with where the auto scale task was enqueued
Oh I get it, that also makes sense with the docs directing this at inference jobs and avoiding GPU - because of the 1-N thing
Where is the docker-compose file? It's not at /opt
(again, I didn't place it anywhere, I'm just using the AMI)
cluster.routing.allocation.disk.watermark.low:
SuccessfulKoala55 here it is
How did it come to this? I didn't configure anything, I'm using the trains AMI, with the suggested instance type
Or should I change all three of them?
I only have like 40 tasks including the example ones
you want to see its contents?