I guess it does not do so for all settings, but only those that come from Session()
Right, but that's as defined in the services agent, which is not immediately transparent
Let me know if you do; would be nice to have control over that 😁
The idea is that the features would be copied/accessed by the server, so we can transition slowly and not use the available storage manager for data monitoring
It's missing the repository information of course, but the 'configuration/Args' were logged. So something weird is happening in identifying the repository
Thanks SuccessfulKoala55 and AgitatedDove14 ! We'll go through the hoops of setting up mongo on AWS then.
We're working to decouple the data from the Helm chart; it seems like a dangerous idea to store long-term data on k8s in case of failure 😅
Or some users that update their `poetry.lock`, and some that update manually as they prefer to resolve on their own.
Well, you can install the binary in the additional startup commands.
Matter of fact, you can just include the ECR login in the "startup steps" offered by the scaler, so no need for this repository. I was thinking these were local instances.
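Roughly what I mean (the `extra_vm_bash_script` field name is from the open-source aws_autoscaler example; the account id and region are placeholders):
` # sketch: embed the ECR login in the autoscaler's startup bash script
extra_vm_bash_script = """
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
"""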
Thanks CostlyOstrich36 !
And can I make sure the same budget applies to two different queues?
So that, for example, an autoscaler would have a resource budget of 6 instances, and it would listen to the `aws` and `default` queues as needed?
Kinda, yes, and this has changed with 1.8.1.
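For reference, if I remember the open-source aws_autoscaler example correctly, the budget sits on each queue entry as [resource, max_instances] pairs, so "sharing" one budget means repeating it per queue (shape and names are my assumption):
` # each queue maps to [resource_name, max_instances] pairs
queues = {
    "aws":     [["gpu_resource", 6]],
    "default": [["gpu_resource", 6]],
}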
The thing is that AFAIK ClearML currently does not officially support a remotely executed task spawning more tasks, so we also have a small hack that marks the remote "master process" as a local task before anything else runs.
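The hack is roughly this (a sketch only; it assumes the SDK decides "remote vs. local" from the task-id env vars the agent sets, and the project/queue names are hypothetical):
` import os
from clearml import Task

# grab the master task the agent started for us
master = Task.init(project_name="proj", task_name="master")

# assumption: dropping the agent-set task-id variables makes the SDK treat
# everything that follows as if it were running locally
os.environ.pop("CLEARML_TASK_ID", None)
os.environ.pop("TRAINS_TASK_ID", None)  # legacy name

# now spawning child tasks works as it would on a workstation
child = Task.create(project_name="proj", task_name="child")
Task.enqueue(child, queue_name="default")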
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams are not directly available:
` --- Logging error ---
Traceback (most recent call last):
File "/usr/lib/python3.10/logging/init.py", line 1103, in emit
stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
self.__shutdown...
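We might try closing the task explicitly while the captured streams are still open; something like this (project/task names hypothetical):
` from clearml import Task

task = Task.init(project_name="proj", task_name="local-test")
try:
    ...  # actual test body
finally:
    task.close()  # flush and shut ClearML down before stdout/stderr are closed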
i.e. ERROR Fetching experiments failed. Reason: Backend timeout (600s)
ERROR Fetching experiments failed. Reason: Invalid project ID
I don't think there's an issue or PR for that yet; at least I haven't created one.
I could have a look at this and maybe make a PR.
Not sure what would the recommended flow be like though 🤔
Thanks, that's what I thought - so I'm missing something else in the installation. I'll dig further 🙂
Another example - trying to validate dataset interactions ends with
` else:
    self._created_task = True
    dataset_project, parent_project = self._build_hidden_project_name(dataset_project, dataset_name)
    task = Task.create(
        project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
    if bool(Session.check_min_api_server_version(Dataset.__min_api_version)):
        get_or_create_proje...
I'm not sure; the setup is not unique to Mac.
Each user has their own `.env` file which is given to the code entry point, and at some point will be loaded with `dotenv.load_dotenv()`.
The environment variables are not set in code anywhere, but the `clearml.conf` uses them directly.
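So the entry point effectively does this before anything ClearML-related runs (a minimal sketch; the conf line assumes HOCON-style env substitution and a hypothetical variable name):
` from dotenv import load_dotenv

load_dotenv()  # pulls the user's .env into the environment

from clearml import Task  # imported only after the env is populated

# clearml.conf then references the variables directly, e.g.:
#   api.credentials.access_key = ${CLEARML_ACCESS_KEY}
task = Task.init(project_name="proj", task_name="run")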
No it does not show up. The instance spins up and then does nothing.
Maybe it's the missing `.bashrc` file, actually. I'll look into it.
Anything specific we should look into TimelyPenguin76 ?
We just redeployed to use the 1.1.4 version as Jake suggested, so the logs are gone 😞
... and any way to define the VPC is missing too 🤔
We do not CostlyFox64 , but this is useful for the future 🙂 Thanks!
TimelyPenguin76 I'll have a look, one moment.
SweetBadger76 TimelyPenguin76
We're finally tackling this (since it has kept us back at 1.3.2 even though 1.6.2 is out...), and noticed that now the bucket name is also part of the folder?
So, following up from David's latest example: `StorageManager.download_folder(remote_url='s3://****-bucket/david/', local_folder='./')` actually creates a new folder `./****-bucket/david/` and puts its contents there.
EDIT: This is with us using internal MinIO, so I believe ClearML parses that end...
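In code terms, what we observe (bucket name masked, same call shape as David's example):
` from clearml import StorageManager

StorageManager.download_folder(remote_url="s3://****-bucket/david/", local_folder="./")
# expected: the contents of david/ directly under ./
# observed: a new ./****-bucket/david/ folder holding the contents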
Sorry, I misspoke; yes, of course, the agent's config file, not the queues.
Seems like you're missing an image definition (AMI or otherwise)
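For comparison, a resource entry in the autoscaler config usually carries the image, roughly like this (field names as in the open-source aws_autoscaler example; values are placeholders):
` resource_configurations = {
    "default": {
        "instance_type": "g4dn.xlarge",
        "ami_id": "ami-0123456789abcdef0",  # the image definition
        "availability_zone": "us-east-1b",
        "ebs_device_name": "/dev/sda1",
        "ebs_volume_size": 100,
        "ebs_volume_type": "gp3",
    },
}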
I cannot, the instance is long gone... But it's no different from any other scaled instance; it seems it just took a while to register in ClearML
Hey FrothyDog40 ! Thanks for clarifying - guess we'll have to wait for that as a feature 😁
Should I create a new issue or just add to this one? https://github.com/allegroai/clearml/issues/529