I would expect the service to actually implicitly inject it to new instances prior to applying the user's extra configuration 🤔
That's up and running and is perfectly fine.
(the `extra_vm_bash_script` is what you're after)
TimelyPenguin76 CostlyOstrich36 It seems a lot of manual configuration is required to get the EC2 instances up and running.
Would it not make sense to update the autoscaler (and example script) so that the `config.yaml` used for the autoscaler service is implicitly copied to the EC2 instances, and then any `extra_clearml_conf` entries are applied on top of it?
Since the additional credentials are available to the autoscaler when it boots up (via the config file), I thought it could use those natively?
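What I'd expect is effectively something like this happening inside the autoscaler (purely an illustrative sketch of the idea, not the actual implementation; the path and keys are made up):
`from pathlib import Path

# hypothetical: start from the autoscaler service's own config file ...
base_conf = Path("~/clearml.conf").expanduser().read_text()

# ... and only then apply the user's extra_clearml_conf on top of it
extra_clearml_conf = 'sdk.aws.s3.key: "<redacted>"'
instance_clearml_conf = base_conf + "\n" + extra_clearml_conf`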
I am indeed
Those are cool and very welcome additions (hopefully the additional info in the `Info` tab will be a link?) 🙂
The main issue is the clutter that the forced renaming creates, as shown in the pictures I attached in the other thread.
Why does ClearML hide the dataset task from the main WebUI? Users should have some control over that. If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some `.datasets` hidden sub-project. Not...
I commented on your suggestion about this on GH. Uploading the artifacts would happen via the SDK before switching to remote execution.
When cloning a task (via WebUI or SDK), a user should have an option to also clone these input artifacts or simply link to the original. If linking to the original, then the original task being deleted is the user's mistake.
Alternatively, this potentially suggests "Input Datasets" (as we're imitating now), such that they are not tied to the original t...
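For context, the flow I have in mind is roughly this (a rough sketch; project, queue and artifact names are placeholders):
`from clearml import Task

task = Task.init(project_name="examples", task_name="with-input-artifacts")

# upload the input artifact while still running locally ...
task.upload_artifact(name="training-data", artifact_object="data/train.csv")

# ... then switch to remote execution; the artifact is already registered on the task
task.execute_remotely(queue_name="default", exit_process=True)`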
I opened a GH issue shortly after posting here. FrothyDog40 replied (hoping I tagged the right person).
We need to close the task. This is part of our unittests for a framework built on top of ClearML, so every test creates and closes a task.
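Roughly this shape, if it helps (a simplified sketch; the real fixtures do a bit more):
`import pytest
from clearml import Task

@pytest.fixture
def clearml_task():
    # every test gets its own task and closes it when done
    task = Task.init(project_name="unittests", task_name="per-test-task")
    yield task
    task.close()`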
Not sure if ClearML has any built-in support, but we used the above for a similar issue with Prefect2 :)
-ish, still debugging some weird stuff. Sometimes ClearML picks `ip` and sometimes `ip2`, and I can't tell why 🤔
That doesn't make sense? 🤔
Maybe I was not clear, but it's a simple part of the config file.
UPDATE: Apparently the quotation type matters for `furl`? I switched the `'` to `\"` and it seems to work now
Well you can install the binary in the additional startup commands.
Matter of fact, you can just include the ECR login in the "startup steps" offered by the scaler, so no need for this repository. I was thinking these were local instances.
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams are not directly available:
` --- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
Is `Task.create` the way to go here? 🤔
Another example - trying to validate dataset interactions ends with
`        else:
            self._created_task = True
            dataset_project, parent_project = self._build_hidden_project_name(dataset_project, dataset_name)
            task = Task.create(
                project_name=dataset_project, task_name=dataset_name, task_type=Task.TaskTypes.data_processing)
            if bool(Session.check_min_api_server_version(Dataset.__min_api_version)):
                get_or_create_proje...
Last but not least - can I cancel the offline zip creation if I'm not interested in it 🤔
EDIT: I see not, guess one has to patch `ZipFile` ...
Note that it would succeed if e.g. run with `pytest -s`
Yeah I managed to work around those former two, mostly by using `Task.create` instead of `Task.init`. It's actually the whole bunch of daemons running in the background that takes a long time, not the zipping.
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
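The `Task.create` workaround mentioned above looks roughly like this (a minimal sketch; project/task names are placeholders):
`from clearml import Task

# Task.create() avoids the Task.init() background machinery (daemons, stream
# capture) that was making close() slow in the tests
task = Task.create(project_name="unittests", task_name="dummy-task")
# ... exercise the ClearML-related logic under test ...
task.close()`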
There is a data object in it, but there is no script object attached to it (presumably again because of pytest?)
Yes exactly that AgitatedDove14
Testing that our logic maps correctly, etc., for everything related to ClearML
Or if it wasn't clear, that chunk of code is from clearml's `dataset.py`
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling `task.close()` takes a long time.
This is with:
`Task.set_offline_mode(True)
task = Task.init(..., auto_connect_streams=False)`
When is the next release expected? 🙂
Ah, uhhhh whatever is in the helm/glue charts. I think it's the `allegroai/clearml-agent-k8s-base`, but since I hadn't gotten a chance to try it out, it's hard to say with certainty which would be the best for us 🙂
Honestly I wouldn't mind building the image myself, but the glue-k8s setup is missing some documentation so I'm not sure how to proceed
I know ClearML Enterprise offers a vault.
If these are static-ish, you can set them directly in the agent's config file.
If not, what we did was that before executing remotely, we uploaded environment variables of interest as parameters, and then loaded them in the remote task.
These can then be overwritten with *** after loading them.
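The pattern we used looks roughly like this (a sketch; the variable names, parameter section and queue are placeholders):
`import os
from clearml import Task

# hypothetical env vars the remote run needs
ENV_KEYS = ["MY_SERVICE_TOKEN", "MY_SERVICE_URL"]

task = Task.init(project_name="examples", task_name="env-passthrough")

# locally this uploads the current values as task parameters;
# on the remote run the same call fills the dict with the stored values
env_params = {key: os.environ.get(key, "") for key in ENV_KEYS}
task.connect(env_params, name="env")

if Task.running_locally():
    task.execute_remotely(queue_name="default", exit_process=True)

# from here on (i.e. during remote execution) re-export them to the environment
os.environ.update({k: v for k, v in env_params.items() if v})`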