Very lacking with respect to how things interact with one another
If I'm reading it correctly, what you are saying is that some of the "big picture" / holistic approach on how different parts interact with one another is missing, is that correct?
I think ClearML would benefit a lot if it adopted a documentation structure similar to the numpy ecosystem's
Interesting thought, what exactly would you suggest we "borrow" in terms of approach?
Hi UnevenDolphin73
Does ClearML somehow remove any loggers from the logging module? We suddenly noticed that we have some handlers missing when running in ClearML
I believe it adds a logger; it should not remove any loggers.
What's the clearml version you are using ?
Weird issue, I'll make sure we fix compatibility with python 3.9
JitteryCoyote63 okay... but let me explain a bit so you get a better intuition for next time 🙂
The Task.init call, when running remotely, assumes the Task object already exists in the backend, so it ignores whatever was in the code and uses the data stored on the trains-server, similar to what's happening with Task.connect and the argparser.
This gives you the option of adding/changing the "output_uri" for any Task regardless of the code. In the Execution tab, change the "Output Destina...
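For example, something along these lines (just a sketch; the project and bucket names are placeholders):
```
from clearml import Task

# When executed remotely by an agent, Task.init() ignores the values written here
# and uses whatever is stored on the server (e.g. the "Output Destination" field
# you edit in the UI), so the same script works locally and remotely.
task = Task.init(
    project_name="examples",
    task_name="remote-config-demo",
    output_uri="s3://my-bucket/models",  # placeholder bucket, only used on local runs
)
```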
About .get_local_copy... would that then work in the agent though?
Yes it would work both locally (i.e. without agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
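For instance, on a task artifact (a sketch; the task ID and artifact name are placeholders):
```
from clearml import Task

# get_local_copy() downloads the artifact if it is not already cached locally,
# so the same call works both when running locally and under an agent.
source_task = Task.get_task(task_id="<source-task-id>")
local_path = source_task.artifacts["my_artifact"].get_local_copy()
```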
I don't know whether you have access to the backend,
Creepy, no I do not 🙂
I can't make anything appear in the console part of the UI
clearml_task.logger.report_text("some text") should work
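Roughly like this (minimal sketch; project/task names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="console-demo")
# report_text() prints the message and sends it to the task's CONSOLE tab in the UI
task.get_logger().report_text("some text")
```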
As I understand, providing this param at the Task.init() inside the subtask is too late, because the step has already started.
If you are running the task on an agent (which I assume you do), then one way would be to configure the "default_output_uri" in the agent's clearml.conf file.
The other option is to change the task at creation time: task.storage_uri = 's3://...'
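The agent-side option would look roughly like this in clearml.conf (a sketch; the bucket path is a placeholder):
```
sdk {
    development {
        # every task executed by this agent uploads models/artifacts here
        default_output_uri: "s3://my-bucket/clearml-output"
    }
}
```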
In our case this is not possible due to client security (e.g. training data from clients can potentially be 'reverse engineered' from trained models in future).
Hmm I see, wouldn't it make more sense to separate clients like a multi-tenant SaaS solution?
The problem is of course filling in all the configuration details, so that they are viewable.
Other than that, check out:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.import_task
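Rough usage sketch (the task ID is a placeholder):
```
from clearml import Task

# export_task() returns the full task definition as a dict, including the
# configuration details; edit it and import_task() creates a new task from it.
source = Task.get_task(task_id="<source-task-id>")
task_data = source.export_task()
# ... edit task_data here (hyper-parameters, docker image, etc.) ...
new_task = Task.import_task(task_data)
```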
Sounds good?
Hi BlandPuppy7 , is this Trains related, are you trying to integrate it, and need help?
BTW is it cheaper than an EC2 instance? Why not use the AWS autoscaler?
My typos are killing us, apologies:
change -t to -it, it will make it interactive (i.e. you can use bash 🙂)
Hi WickedElephant66
in the pipeline component, import the required package and it should auto-detect it, or add the "packages" argument to the component decorator (see the sketch below the link)
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/clearml/automation/controller.py#L2941
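Something like this, as a sketch (the component name and package list are just examples):
```
from clearml.automation.controller import PipelineDecorator

# packages= pins extra requirements for this component;
# imports inside the function body are auto-detected as well
@PipelineDecorator.component(return_values=["n_rows"], packages=["pandas"])
def count_rows(csv_path):
    import pandas as pd
    return len(pd.read_csv(csv_path))
```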
DefeatedMoth52 how many agents do you have running on the same GPU ?
SmarmySeaurchin8 what do you think?
https://github.com/allegroai/trains/issues/265#issuecomment-748543102
task.connect_configuration
btw: both should work fine
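For reference, a minimal connect_configuration sketch (names are placeholders):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="config-demo")
# connect_configuration() stores the dict (or a config file) under the task's
# CONFIGURATION section; when running remotely it returns the server-side values.
my_config = {"batch_size": 32, "learning_rate": 0.001}
my_config = task.connect_configuration(my_config, name="my_config")
```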
Weird that this code is also uploading to the 'Plots'. I replicated the same thing as my main script, but the main script is still uploading to Debug Samples.
SmarmyDolphin68 are you saying the same code behaves differently ?
Now I can't download either of them.
It would be nice if the address of the artifacts (state and zips) was assembled on the fly and not hardcoded into the DB.
The idea is that this is fully federated; the server is not actually aware of it, so users can manage multiple storage locations in a transparent way.
If you have any tips on how to fix it in the MongoDB, that would be great...
Yes, that should be similar, but the links would be in the artifacts property on the Task objects
not exactly...
Was wondering how it can handle 10s, 100s of models.
Yes, it supports dynamically loading/unloading models based on requests
(load balancing multiple nodes is disconnected from it, but assuming they are under diff endpoints, the load balancer can be configured to route accordingly)
EnviousPanda91 'connect' will log the object properties; the automagic logging is controlled in the Task.init call. Specifically, which framework produces metrics that are not logged? Your sample code manually reports some scalars/values, do you see these as well?
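For example (a sketch; the framework keys below are illustrative, adjust them to whatever you actually use):
```
from clearml import Task

# the automagic framework logging is selected at Task.init time
task = Task.init(
    project_name="examples",
    task_name="auto-logging",
    auto_connect_frameworks={"pytorch": True, "tensorboard": True, "matplotlib": False},
)
```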
Yes, the container level (when these docker shell scripts run).
I think this is the tricky part, in code you can access the user ID of the Task, and download the .env and apply it, but before the process starts I can't really think of a way to do that ...
That said, I think that in the paid version they have "vault" support, which allows you to store the .env file on the clearml-server, and then the agent automatically applies it at the beginning of the container execution.
Can clearml-agent currently detect this?
Hmm you mean will the agent clean itself up?
DefeatedCrab47 no idea, but you are more than welcome to join the thread here, and point it out:
https://github.com/PyTorchLightning/pytorch-lightning-bolts/issues/249
connect_configuration
seems to take about the same amount of time unfortunately!
I think it is a better solution; that said, from your description it sounds like the issue is the upload bandwidth (i.e. JSON-ing the dict itself), could that be it?
(and even 1000 entries seems like something that would end up as roughly a 1 MB upload, which is not that much)