CostlyOstrich36 That looks promising, but I don't see any documentation on the returned schema (i.e. workers.worker_stats is not specified anywhere?)
We're trying to decide how many of our on-premise machines we can deprecate. For that, we want to see how often our "on premise" queue is used (how often a task is submitted and run), for how long, how many resources it consumes (on average), etc.
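For anyone else trying to measure this, a minimal sketch of pulling queue/worker information via the SDK's APIClient (the queue name is a placeholder, and the workers.get_stats fields below follow the REST call - the response schema is exactly the under-documented part):
```python
from datetime import datetime, timedelta
from clearml.backend_api.session.client import APIClient

client = APIClient()

# Find the on-premise queue by name ("on_premise" is a placeholder).
queue = client.queues.get_all(name="on_premise")[0]

# Keep only the workers that serve that queue.
workers = [
    w for w in client.workers.get_all()
    if any(q.id == queue.id for q in (w.queues or []))
]

# Time-series resource stats per worker over the last week.
now = datetime.utcnow()
stats = client.workers.get_stats(
    worker_ids=[w.id for w in workers],
    from_date=(now - timedelta(days=7)).timestamp(),
    to_date=now.timestamp(),
    interval=3600,  # one sample per hour
    items=[{"key": "cpu_usage"}, {"key": "memory_used"}],
)
```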
I would expect the service to actually implicitly inject it to new instances prior to applying the user's extra configuration 🤔
That's up and running and is perfectly fine.
(the extra_vm_bash_script is what you're after)
TimelyPenguin76 CostlyOstrich36 It seems a lot of manual configuration is required to get the EC2 instances up and running.
Would it not make sense to update the autoscaler (and the example script) so that the config.yaml used for the autoscaler service is implicitly copied to the EC2 instances, and then any extra_clearml_conf entries are applied on top of it?
Does that make sense, CostlyOstrich36? Any thoughts on how to handle this? For the time being I'm also perfectly happy to include something specific to extra_clearml_conf, but I'm not sure how to set sdk.aws.s3.credentials to the list of dictionaries it needs to be.
Since the additional credentials are available to the autoscaler when it boots up (via the config file), I thought it could use those natively?
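For reference, this is roughly the shape that section takes in a regular clearml.conf (a sketch; hosts/keys are the same placeholders as below):
```
sdk {
  aws {
    s3 {
      credentials: [
        {
          host: "ip:9000"
          key: "xxx"
          secret: "xxx"
          multipart: false
          secure: false
        },
        {
          host: "ip2:9000"
          key: "xxx"
          secret: "xxx"
          multipart: false
          secure: false
        }
      ]
    }
  }
}
```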
If I set the following:
"extra_clearml_conf": "sdk.aws.s3.credentials = [\n{\nhost: 'ip:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n},\n{\nhost: 'ip2:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n}\n]"
I run into a weird furl error: ValueError: Invalid port '9000''.
I'm saying it's a bug
I am indeed
Hey SuccessfulKoala55! Is the configuration file needed for Task.running_locally()? This is tightly related to issue #395, where we need additional files for remote execution but have no way to attach them to the task other than using the StorageManager as a temporary cache.
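A minimal sketch of the workaround we mean, assuming a reachable storage URL (the s3 path below is a placeholder):
```python
from clearml import Task, StorageManager

task = Task.init(project_name="demo", task_name="needs-extra-files")

if task.running_locally():
    # While still local, push the extra file somewhere the remote run can reach.
    StorageManager.upload_file("config.yaml", "s3://bucket/path/config.yaml")

# Works both locally and remotely: fetch a cached local copy of the file.
local_path = StorageManager.get_local_copy("s3://bucket/path/config.yaml")
```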
I see, okay that already clarifies some stuff, I'll dig a bit more into this then! Thanks!
e.g. a separate structured user guide with common tips, usability notes, and best practices - https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html - vs the API reference, where each function gets its own page, e.g. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Yes, thanks AgitatedDove14! It's just that the configuration object passed onwards was a bit confusing.
Is there a planned documentation overhaul? 🤔
Those are cool and very welcome additions (hopefully the additional info in the Info tab will be a link?) 😁
The main issue is the clutter that the forced renaming creates, as shown in the pictures I attached in the other thread.
Why does ClearML hide the dataset task from the main WebUI? Users should have some control over that. If I specified a project for the dataset, I specifically want it there, in that project, not hidden away in some hidden .datasets sub-project. Not...
Basically you have the details from the Dataset page, why should it be mixed with the others?
Because maybe it contains code and logs on how to prepare the dataset. Or maybe the user just wants increased visibility for the dataset itself in the tasks view.
why would you need the Dataset Task itself is the main question?
For the same reason as above. Visibility and ease of access. Coupling relevant tasks and dataset in the same project makes it easier to understand that they're...
I commented on this suggestion on GH. Uploading the artifacts would happen via the SDK before switching to remote execution.
When cloning a task (via WebUI or SDK), a user should have the option to also clone these input artifacts or simply link to the originals. If linking to the originals and the original task is then deleted - that's the user's mistake.
Alternatively, this potentially suggests "Input Datasets" (as we're imitating now), such that they are not tied to the original t...
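To make the suggested flow concrete, a hedged sketch (project, queue, and file names are placeholders): upload the input artifacts while still running locally, then hand the task off to the queue.
```python
from clearml import Task

task = Task.init(project_name="demo", task_name="remote-with-inputs")

# Attach the files the remote run will need as input artifacts.
task.upload_artifact(name="train-config", artifact_object="config.yaml")

# Enqueue the task; the local process stops here and the script re-runs on a worker.
task.execute_remotely(queue_name="on_premise")
```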
can I assume these files are reused
A definite maybe, they may or may not be used, but we'd like to keep that option 🙃
Maybe the "old" way Dataset were shown is better suited ?
It was, but then it's gone now 😞
I see your point, this actually might be a "bug"?!
I would say so myself, but could be also by design..?
Awesome, I'll ask Product to reach out
LMK, happy to help out!
I know our use case is maybe a very different one, but...
I opened a GH issue shortly after posting here. FrothyDog40 replied (hoping I tagged the right person).
We need to close the task. This is part of our unittests for a framework built on top of ClearML, so every test creates and closes a task.
Not sure if ClearML has any built-in support, but we used the above for a similar issue with Prefect2 :)
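For context, the per-test pattern looks roughly like this (a sketch; pytest and all names here are assumptions, not part of the framework):
```python
import pytest
from clearml import Task

@pytest.fixture
def clearml_task():
    # Fresh task per test; avoid reusing a previous task id.
    task = Task.init(
        project_name="unittests",
        task_name="test-run",
        reuse_last_task_id=False,
    )
    yield task
    task.close()  # release the task so the next test starts clean

def test_something(clearml_task):
    clearml_task.get_logger().report_text("running inside a test")
```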
-ish, still debugging some weird stuff. Sometimes ClearML picks ip and sometimes ip2, and I can't tell why 🤔
That doesn't make sense? 🤔
Maybe I was not clear, but it's a simple part of the config file.
UPDATE: Apparently the quotation type matters for furl? I switched the ' to \" and it seems to work now
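i.e. the same snippet as above, but with escaped double quotes instead of single quotes (hosts/keys are still placeholders):
```
"extra_clearml_conf": "sdk.aws.s3.credentials = [\n{\nhost: \"ip:9000\"\nkey: \"xxx\"\nsecret: \"xxx\"\nmultipart: false\nsecure: false\n},\n{\nhost: \"ip2:9000\"\nkey: \"xxx\"\nsecret: \"xxx\"\nmultipart: false\nsecure: false\n}\n]"
```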
Hey AgitatedDove14 🙂
Finally managed; you keep saying "all projects" but you meant the "All Experiments" project instead. That's a good start 👍 Thanks!
Couple of thoughts from this experience:
Could we add a comparison feature directly from the search results (Dashboard view -> search -> highlight some experiments for comparison)?
Could we add a filter on the project name in the "All Experiments" project?
Could we add the project for each of the search results? (see above pictur...
Well you can install the binary in the additional start up commands.
Matter of fact, you can just include the ECR login in the "startup steps" offered by the scaler, so no need for this repository. I was thinking these were local instances.
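For reference, a hypothetical sketch of that startup-step approach - doing the ECR login via the scaler's extra_vm_bash_script (region and account id are placeholders; assumes AWS CLI v2 and an instance role allowed to pull from ECR):
```python
extra_vm_bash_script = (
    "aws ecr get-login-password --region us-east-1 | "
    "docker login --username AWS --password-stdin "
    "123456789012.dkr.ecr.us-east-1.amazonaws.com"
)
```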
Coming back to this; ClearML prints a lot of error messages in local tests, supposedly because the output streams have already been closed by the time ClearML's exit hooks run:
```
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.10/logging/__init__.py", line 1103, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/idan/CC/git/ds-platform/.venv/lib/python3.10/site-packages/clearml/task.py", line 3504, in _at_exit
    self.__shutdown...
```