Hmm, what's the clearml-agent version ?
LuckyRabbit93 We do!!!
looks like a great idea, I'll make sure to pass it along and that someone replies 🙂
Looking at the `supervisor` method of the base `AutoScaler` class, where are the worker IDs kept? Is it in the class attribute `queues` ?
Actually the supervisor passes a fixed prefix, then it queries the clearml-server for workers whose names start with that prefix.
This way we can have a fixed init script for all agents, while still being able to differentiate them from the other agent instances in the system. Make sense ?
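For illustration, a minimal sketch of that query pattern, assuming the APIClient from clearml.backend_api (the prefix value is hypothetical):
` from clearml.backend_api.session.client import APIClient

# List all registered workers and keep the ones whose ID starts with
# the fixed prefix the supervisor passed (prefix value is hypothetical).
client = APIClient()
prefix = "aws-autoscaler:"
workers = [w for w in client.workers.get_all() if w.id.startswith(prefix)]
print([w.id for w in workers]) `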
JitteryCoyote63 yes this is very odd, seems like a pypi flop ?!
On the website they do say there is 0.5.0 ... I do not get it
https://pypi.org/project/pytorch3d/#history
...I generate some more graphs with a file called `graphs.py` and want to attach/upload them to this training task
Makes total sense to use Task.get_task, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
Hi ShortElephant92
You could get a local copy from the local server, then switch credentials to the hosted server and upload again, would that work?
It's just the print (`__repr__`) not showing the data
` for w in client.workers.get_all():
    print(w.data) `
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that, as you mentioned, ClearML does not "detect" the exit code when os._exit() is called, and this is why it is "missing" the failed test (because, as mentioned, all exceptions are caught). This should be fixed in the next RC
Is the code in this "other" repo downloaded to the agent's machine? Or is the component's code pushed to the machine on which the repository is?
Yes, this repo is downloaded onto the agent's machine, so your code has access to it
How can I ensure that additional tasks aren’t created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, is the git diff warning still there with the latest clearml? I think there was some fix related to that)
do you have your Task.init call inside the "train.py" script ? (and if you do, what are you getting in the Execution tab of the task) ?
Yes that should work. The only thing is you need to call Task.init on the master process (and make sure you call Task.current_task() on the subprocesses, if you want the automagic to kick in). That said, usually there is no need, they are supposed to report everything back to the main process anyhow
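A minimal sketch of that pattern, assuming fork-based multiprocessing (project/task names are hypothetical):
` import multiprocessing as mp
from clearml import Task

def worker(rank):
    # Optional: re-attach to the master's task so the automagic kicks in
    # inside the subprocess (usually reporting flows to the master anyhow).
    task = Task.current_task()
    task.get_logger().report_scalar("rank", "value", value=rank, iteration=0)

if __name__ == "__main__":
    # Call Task.init once, on the master process only.
    task = Task.init(project_name="examples", task_name="multi-process")
    procs = [mp.Process(target=worker, args=(r,)) for r in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join() `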
basically
` @call_parse
def main(
    gpus: Param("The GPUs to use for distributed training", str) = 'all',
    script: Param("Script to run", str, opt=False) = '',
    args: Param("Args to pass to script", nargs=...
Hmm, maybe the right way to do so is to "abuse" models, which are entities: you can specify a system_tag on them, they can store a folder (and extract it if you need), they live in projects, and they can be cloned and changed.
wdyt?
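A minimal sketch of that idea, assuming OutputModel.update_weights_package() for storing a folder and InputModel.get_local_copy() for extracting it (names and paths are hypothetical):
` from clearml import Task, InputModel, OutputModel

task = Task.init(project_name="examples", task_name="store-folder-as-model")

# Store a whole folder as a model "package" (zipped and uploaded).
out_model = OutputModel(task=task, name="my-folder", tags=["folder-store"])
out_model.update_weights_package(weights_path="/path/to/folder")

# Later / elsewhere: fetch the stored folder and extract it locally.
in_model = InputModel(model_id=out_model.id)
local_folder = in_model.get_local_copy(extract_archive=True)
print(local_folder) `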
Can you share the StorageManager usage and the error you are getting ?
PompousParrot44 the fundamental difference is that artifacts are uploaded manually (i.e. a user will specifically "ask" to upload an artifact), while models are logged automatically and a user might not want them uploaded (imagine debugging sessions, or testing).
By adding the 'upload_uri' argument, you can tell trains that you want all models to be automatically uploaded (not just logged).
Now here is the nice thing, when running using the trains-agent, you can have:
Always upload the mod...
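For illustration, a minimal sketch, assuming the current SDK exposes this as the output_uri argument of Task.init (the bucket URL is hypothetical):
` from clearml import Task

# With output_uri set, auto-logged models are also uploaded automatically,
# not just logged.
task = Task.init(
    project_name="examples",
    task_name="auto-upload-models",
    output_uri="s3://my-bucket/models",
) `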
Hi ResponsiveCamel97
Let me explain how it works: essentially it creates a new venv inside the docker, inheriting all the packages from the main system packages.
This allows it to use the installed packages if the versions match, and upgrade/change them if you need, all without rebuilding a new container. Make sense ?
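For reference, a sketch of the related agent setting in clearml.conf; I'm assuming the key is agent.package_manager.system_site_packages:
` agent {
    package_manager {
        # Let the venv created inside the container inherit the
        # system site-packages (assumed key name).
        system_site_packages: true
    }
} `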
One last question: Is it possible to set the pip_version task-dependent?
no... but why would it matter on a Task basis ? (meaning, what would be a use case for changing the pip version per Task?)
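For completeness, the pip version can be pinned globally (not per Task) in clearml.conf; a sketch assuming the key agent.package_manager.pip_version:
` agent {
    package_manager {
        # Pin the pip version used when building task environments
        # (applies to all tasks the agent runs; assumed key name).
        pip_version: "<20.2"
    }
} `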
My bad, I wrote refresh and then edited it to the correct "reload" 😞
Hi DilapidatedCow43
I'm assuming the returned object cannot be pickled (which is ClearML's way of serializing it)
You can upload it as a model with:
` uploaded_model_url = Task.current_task().update_output_model(model_path="/path/to/local/model")
...
return uploaded_model_url `
wdyt?
Yes you can drag it in the UI :) it's a new feature in v1
Is this a common case? maybe we should change the run_pipeline_steps_locally argument to False?
(The idea of run_pipeline_steps_locally=True is that it will be easier to debug the entire pipeline on the same machine)
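A minimal sketch of where that argument lives, assuming PipelineController.start_locally() (pipeline name/project are hypothetical):
` from clearml import PipelineController

pipe = PipelineController(name="debug-pipeline", project="examples", version="1.0")
# ... add steps here ...

# run_pipeline_steps_locally=True runs every step on this machine,
# which makes debugging the entire pipeline much easier.
pipe.start_locally(run_pipeline_steps_locally=True) `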
SoggyFrog26 there is a full pythonic interface, why don't you use this one instead, much cleaner 🙂
CheerfulGorilla72
yes, IP-based access,
hmm, so this is the main downside of using an IP-based server: the links (debug images, models, artifacts) store the full URL (e.g. http://IP:8081/... ). This means if you switch IPs they will no longer work. Any chance to fix the new server to the old IP?
(the other option is to somehow edit the DB with the links, I guess doable but quite risky)
So it seems decorator is simply the superior option?
Kind of yes 😊
In which case would we use the add_task() option?
When you have existing Tasks and the piping is very straightforward (i.e. the input/output in the code is basically referencing other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps)
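A minimal sketch of that, assuming PipelineController.add_step() is the relevant call (the question says add_task(); task/project names are hypothetical):
` from clearml import PipelineController

pipe = PipelineController(name="pipeline-from-tasks", project="examples", version="1.0")

# Each step clones an existing Task; piping is just referencing other steps.
pipe.add_step(name="prepare", base_task_project="examples", base_task_name="prepare data")
pipe.add_step(
    name="train",
    parents=["prepare"],
    base_task_project="examples",
    base_task_name="train model",
)
pipe.start() `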
CheerfulGorilla72 could it be the server address has changed when migrating ?
Regulatory reasons and proprietary data is what I had in mind. We have some projects that may need to be fully self hosted in the end
If this is the case then yes, go self-hosted, or talk to ClearML sales to get the VPC option; SaaS is just not the right option
I might take a look at it when I get a chance but I think I'd have to see if ClearML is a good fit for our use case before I can justify the commitment
I hope it is 🙂
Is it possible to do something so that a change of the server address is supported and the images are fetched from the new server?
The link itself (full link) is stored inside the server. Can I assume the access is IP-based, not host-based (i.e. DNS) ?