LovelyHamster1 what do you mean by "assume the permissions of a specific IAM Role"?
In order to spin up an EC2 instance (AWS autoscaler) you have to have the correct credentials; to pass those credentials you must create a key/secret pair and hand it to the autoscaler. There is no direct support for IAM Roles. Makes sense?
LovelyHamster1 Now I see... Interesting credentials ability. Specifically, all the S3 access in trains is derived from the ~/clearml.conf credentials section:
https://github.com/allegroai/clearml/blob/ebc0733357ac9ead044d0ed32d41447763f5797e/docs/clearml.conf#L73
( or the AWS S3 environment variables )
I'm not sure how this AWS feature works; I suspect it is changing the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables on the EC2 instance. If that is the case, it should work out of...
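If that suspicion is right, the override is just the standard AWS environment variables, which any S3 client created afterwards picks up automatically. A minimal sketch with made-up placeholder values (on a real EC2 instance with an attached IAM Role, these would normally come from the instance metadata service instead):

```python
import os

# Hypothetical placeholder credentials, for illustration only.
os.environ["AWS_ACCESS_KEY_ID"] = "EXAMPLE_KEY_ID"
os.environ["AWS_SECRET_ACCESS_KEY"] = "EXAMPLE_SECRET"

# Any S3 client created after this point will see the injected credentials.
print(os.environ["AWS_ACCESS_KEY_ID"])
```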
Hmm so I guess the actual code adds it into the reporting itself ...
How about we call: `task.set_initial_iteration(0)`
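To illustrate why that call helps (this is a toy sketch of the iteration-offset behavior, not ClearML's internals; the `Reporter` class here is made up):

```python
# Toy model: a resumed task carries over the previous run's iteration count as
# an offset, so new points continue the old x-axis unless the offset is reset.
class Reporter:
    def __init__(self, initial_iteration=0):
        self.initial_iteration = initial_iteration

    def set_initial_iteration(self, offset):
        self.initial_iteration = offset

    def absolute_iteration(self, local_iteration):
        # Reported point = carried-over offset + local step in this run
        return self.initial_iteration + local_iteration


r = Reporter(initial_iteration=500)  # resumed task continues from iteration 500
assert r.absolute_iteration(10) == 510

r.set_initial_iteration(0)           # analogue of task.set_initial_iteration(0)
assert r.absolute_iteration(10) == 10
```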
Sure thing, any vanilla AMI will work, as long as it has python3 and docker preinstalled (and obviously, if you need GPU support, then drivers preinstalled as well)
I thought this is the issue on the thread you linked, did I miss something ?
Funny it's the extension "h5" , it is a different execution path inside keras...
Let me see what can be done 🙂
Thanks @<1671689437261598720:profile|FranticWhale40> !
I was able to locate the issue, fix should be released later today (or worst case tomorrow)
Hi WackyRabbit7 ,
Yes we had the same experience with kaggle competitions. We ended up having a flag that skipped the task init :(
Introducing offline mode is on the to-do list, but to be honest it has been there for a while. The thing is, since the Task object actually interacts with the backend, creating an offline mode means simulating the backend response. I'm open to hacking suggestions though :)
It talks about referencing an issue.
Yes please, just better visibility 🙂
Last but not least - can I cancel the offline zip creation if I'm not interested in it
you can override with OS environment, would that work?
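For example, setting the server/credentials values from the environment before the SDK loads its configuration. The variable names below follow the standard `CLEARML_*` convention; double-check your version's docs for the exact names, and the values here are placeholders:

```python
import os

# Override clearml.conf values from the environment before the SDK reads them.
os.environ["CLEARML_API_HOST"] = "http://localhost:8008"   # hypothetical host
os.environ["CLEARML_API_ACCESS_KEY"] = "EXAMPLE_KEY"       # placeholder
os.environ["CLEARML_API_SECRET_KEY"] = "EXAMPLE_SECRET"    # placeholder

print(os.environ["CLEARML_API_HOST"])
```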
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling `task.close()` takes a long time
It actually zips the entire offline folder so you can later upload it. Maybe we can disable that part?!
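As a rough sketch of what that close-time step amounts to (an assumption based on the description above, not ClearML's actual code): the offline session folder gets archived so it can be uploaded later.

```python
import pathlib
import shutil
import tempfile

# Build a stand-in "offline session" folder with one dummy file in it.
offline_dir = pathlib.Path(tempfile.mkdtemp()) / "offline-session"
offline_dir.mkdir()
(offline_dir / "task.json").write_text("{}")

# Zip the whole folder, which is the slow part when the folder is large.
zip_path = shutil.make_archive(str(offline_dir), "zip", root_dir=offline_dir)
print(zip_path)  # ends with "offline-session.zip"
```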
# generate the script section
script = (
    "fr...
I can't seem to find a difference between the two; why would matplotlib get listed and pandas not... Any other package that is missing?
BTW: as an immediate "hack", before your Task.init call add the following: `Task.add_requirements("pandas")`
Yes that would work 🙂
You can also put it in the docker compose see TRAINS_AGENT_DEFAULT_BASE_DOCKER
Thanks OutrageousGrasshopper93
I will test it "!".
By the way, is the "!" in the project or the Task name?
The idea of queues is, on the one hand, not to let users have too much freedom, and on the other, to allow for maximum flexibility & control.
The granularity offered by K8s (and as you specified) is sometimes way too detailed for a user. For example: I know I want 4 GPUs, but 100GB of disk-space? No idea, just give me 3 levels to choose from (if any; actually I would prefer a default that is large enough, since this is by definition for temp cache only). And the same argument goes for the number of CPUs...
Ch...
We just don't want to pollute the server when debugging.
Why not? You can always remove it later (with `Task.delete`)?
as a backup plan: is there a way to have an API key set up prior to running docker compose up?
Not sure I follow; the clearml API pair is persistent across upgrades, and the storage access tokens are unrelated (i.e. also persistent). What am I missing?
Are you using tensorboard or do you want to log directly to trains ?
Hi @<1536881167746207744:profile|EnormousGoose35>
Could we just share the entire project instead of the Workspace?
You mean allow access to a project between workspaces ?
If the answer is yes, then unfortunately the SaaS version (app.clear.ml) does not really support this level of RBAC; it is part of the enterprise version, which assumes a large organization with the need for that kind of access limit.
What is the use case ? Why not just share the entire workspace ?
I still wonder how no one noticed... (maybe 100 unique title/series reports is a relatively high threshold)
How can I ensure that additional tasks aren't created for a notebook unless I really want to?
TrickySheep9 are you saying two Tasks are created in the same notebook without you closing one of them ?
(Also, how is the git diff warning there with the latest clearml, I think there was some fix related to that)
Yes that is an issue for me, even if we could centralize an environment today, it leaves a concern whenever we add a model that possible package changes are going to cause issues with older models.
yeah, changing the environment on the fly is tricky; it basically means spinning up an internal HTTP service per model...
Notice you can have many clearml-serving-sessions, they are not limited, so this means you can always spin new serving with new environments. The limitation is changing an e...
FiercePenguin76
So running the Task.init from the jupyter-lab works, but running the Task.init from the VSCode notebook does not work?
I would like to put a table with URL links and image thumbnails.
StraightParrot3 links will work inside the table (your code sample looks like the correct way to add them), but I think plotly (which is the UI package that displays the table) does not support embedding images into tables 🙂
When they add it, the support will be transparent and it would work as you expect
But this config should almost never need to change!
Exactly the idea 🙂
notice the password (initially random) is also fixed on your local machine, for the exact same reason
`NVIDIA_VISIBLE_DEVICES=0,1`
Basically the value is used "as is" and the Nvidia drivers do the rest.
The same goes for all, 0-3, etc.
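As a small sketch of the pass-through behavior: restricting a process (or container) to specific GPUs is just a matter of setting the variable before launch, and the driver stack interprets the value verbatim.

```python
import os

# Expose only GPUs 0 and 1 to anything launched from this environment;
# "all" would expose every GPU. The value itself is passed through untouched.
os.environ["NVIDIA_VISIBLE_DEVICES"] = "0,1"

print(os.environ["NVIDIA_VISIBLE_DEVICES"])
```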
NastySeahorse61 it might be that the frequency at which it tests the metric storage is only once a day (or maybe half a day); let me see if I can ask around
(just making sure you can still login to the platform?)