so that one app I am using inside the Task can use the Python packages installed by the agent, and I can control the packages using ClearML easily
That's the missing part for me. You have all the requirements on the Task (that you can fully control), the agent is setting up a brand new venv for each Task inside a container (the venv is cached, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why would you need the path to t...
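For reference, here is a minimal sketch of controlling the package set from the Task itself, assuming the usual Task.add_requirements / Task.init flow (the package name, version and project/task names below are placeholders):

```python
from clearml import Task

# Declare the exact packages the agent should install into the Task's venv.
# Call this before Task.init(); "pandas" and the version are placeholder examples.
Task.add_requirements("pandas", "1.5.3")

task = Task.init(project_name="examples", task_name="controlled requirements")
# ... the rest of the script now runs with the packages the agent installed
```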
Hmm, what's the clearml-agent version ?
LuckyRabbit93 We do!!!
Looking at the supervisor method of the base AutoScaler class, where are the worker IDs kept? Is it in the class attribute queues?
Actually the supervisor passes a fixed prefix, then it asks the clearml-server for the workers whose names start with this prefix.
This way we can have a fixed init script for all agents, while still being able to differentiate them from the other agent instances in the system. Make sense?
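If it helps, something along these lines is roughly what that lookup could look like against the server API; the prefix value is a made-up example and the filtering is my assumption, not the actual AutoScaler internals:

```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
prefix = "aws_autoscaler:"  # hypothetical fixed worker-name prefix

# workers.get_all returns every registered worker; keep only the ones
# whose IDs start with our fixed prefix
workers = client.workers.get_all()
our_workers = [w for w in workers if w.id.startswith(prefix)]
print([w.id for w in our_workers])
```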
JitteryCoyote63 yes this is very odd, seems like a pypi flop ?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
I generate some more graphs with a file called graphs.py and want to attach/upload them to this training task
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
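For example, a minimal sketch of what graphs.py could look like (the task ID, titles and file names are placeholders):

```python
# graphs.py -- attach additional plots to an existing training task
import matplotlib.pyplot as plt
from clearml import Task

# placeholder ID of the training task to attach to
task = Task.get_task(task_id="aabbccdd11223344")

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])

# report the figure to the task's logger ...
task.get_logger().report_matplotlib_figure(
    title="extra graphs", series="example", iteration=0, figure=fig
)

# ... or save it and upload it as an artifact
fig.savefig("graphs.png")
task.upload_artifact(name="graphs_png", artifact_object="graphs.png")
```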
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, and this is why it is "missing" the failed test (because as mentioned, all exceptions are caught). This should be fixed in the next RC
Do you have your Task.init call inside the "train.py" script? (and if you do, what are you getting in the Execution tab of the task?)
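For reference, a minimal sketch of what the top of train.py would look like with the Task.init call (project/task names are placeholders):

```python
# train.py -- minimal sketch; project/task names are placeholders
from clearml import Task

task = Task.init(project_name="examples", task_name="train")

# ... training code below; the Execution tab of the task should then show
# the detected repository/script and the installed packages
```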
Can you share the StorageManager usage, and the error you are getting?
Hi ResponsiveCamel97
Let me explain how it works: essentially it creates a new venv inside the docker, inheriting all the packages from the main system packages.
This allows it to use the installed packages if the versions match, and upgrade/change them if you need, all without needing to rebuild a new container. Make sense?
One last question: Is it possible to set the pip_version task-dependent?
no... but why would it matter on a Task basis ? (meaning what would be a use case to change the pip version per Task)
My bad, I wrote "refresh" and then edited it to the correct "reload" 🙂
Yes you can drag it in the UI :) it's a new feature in v1
Is this a common case? Maybe we should change the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it will be easier to debug the entire pipeline on the same machine)
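For reference, a rough sketch of the two modes, assuming the PipelineController interface (pipeline/project names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) calls go here ...

# Debug mode: run the controller AND every step locally on this machine
pipe.start_locally(run_pipeline_steps_locally=True)

# Production-like mode: only the controller logic runs locally,
# each step is enqueued and executed by an agent
# pipe.start_locally(run_pipeline_steps_locally=False)
```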
SoggyFrog26 there is a full Pythonic interface, why don't you use that one instead? Much cleaner 🙂
CheerfulGorilla72
yes, IP-based access,
Hmm, so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/...). This means if you switched the IP they will no longer work. Any chance to fix the new server to the old IP?
(the other option is somehow edit the DB with the links, I guess doable but quite risky)
So it seems decorator is simply the superior option?
Kind of, yes 🙂
In which case would we use add_task() option?
When you have existing Tasks, and the piping is very straightforward (i.e. the input/output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps).
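For example, a sketch of wiring pre-existing Tasks together with PipelineController.add_step; all project/task/queue names and the artifact reference below are placeholders:

```python
from clearml import PipelineController

pipe = PipelineController(name="existing-tasks-pipeline", project="examples", version="1.0")

# Each step clones an existing Task; project/task names are placeholders
pipe.add_step(
    name="prepare_data",
    base_task_project="examples",
    base_task_name="data preparation",
)
pipe.add_step(
    name="train",
    parents=["prepare_data"],
    base_task_project="examples",
    base_task_name="training",
    # feed the previous step's output artifact into this step's hyperparameters
    parameter_override={"General/dataset_url": "${prepare_data.artifacts.dataset.url}"},
)

pipe.start(queue="services")
```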
CheerfulGorilla72 could it be the server address has changed when migrating ?
Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end
If this is the case then yes, do self-hosted, or talk to ClearML sales to get the VPC option, but SaaS is just not the right option
I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment
I hope it is 🙂
Is it possible to do something so that changing the server address is supported, and the pictures are pulled up from the new server in the new server's UI?
The link itself (the full link) is stored inside the server. Can I assume the access is IP-based, not host-based (i.e. DNS)?
Okay, so basically set a template for the pod, specifying the docker image. Make sure you pass the correct trains-server configuration (i.e. api/web/file server addresses and credentials), and select the queue name the agent will listen to.
Container image / details:
https://hub.docker.com/r/allegroai/trains-agent
https://github.com/allegroai/trains-agent/tree/master/docker/agent
Full environment variable list to pass can be found here:
https://github.com/allegroai/trains-server/blob/...
StorageManager 🙂
StorageHelper is used internally.
I'll make sure we remove it from the examples/docs
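For reference, the public interface is the StorageManager class; a minimal usage sketch (URLs and file names are placeholders):

```python
from clearml import StorageManager

# download a remote object into the local cache and get the local path
local_copy = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/data/file.zip"  # placeholder URL
)

# upload a local file to remote storage
StorageManager.upload_file(
    local_file="results.csv",
    remote_url="s3://my-bucket/results/results.csv",  # placeholder URL
)
```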
Hi @<1551376687504035840:profile|StraightSealion9>
AWS Autoscaler to create a new instance when you enqueue a task to the relevant queue.
Does that mean that you were able to enqueue a Task and have it launch on the remote EC2 machine ?
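For example, a minimal sketch of cloning an existing Task and enqueuing it on the queue the autoscaler watches (the task ID and queue name are placeholders):

```python
from clearml import Task

# clone an existing Task and enqueue it on the queue the autoscaler watches
# (task ID and queue name are placeholders)
template = Task.get_task(task_id="aabbccdd11223344")
cloned = Task.clone(source_task=template, name="remote run")
Task.enqueue(task=cloned, queue_name="aws_autoscaler_queue")
```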
I think I found something, let me see if I can reproduce it
Any other port that could be open? (if SSH is already open we cannot launch another daemon on the same port)