We'd be happy if ClearML captures that (since it uses e.g. pip, then we have the git + commit hash for reproducibility), as it claims it would 🙂
Any thoughts CostlyOstrich36 ?
Still failing with 1.2.0rc3 🙁 AgitatedDove14 any thoughts on your end?
Yes and no SmugDolphin23
The project is listed, but it has no content, and it hides the main task it's attached to.
Then I wonder:
- How to achieve this? The pipeline controller seems to only work with functions, not classes, so running smaller steps remotely seems more difficult than I imagined. I was already prepared to upload artifacts myself etc., but now I'm not sure?
- Do I really need to recreate the pipeline every time from scratch? Or can I remove/edit steps? It's mostly used as a… controller for notebook-based executions and experimentations, before the actual pipeline is known. That is, it will ...
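For context, the function-based flow I mean looks roughly like this (a minimal sketch with invented names; constructing or starting the controller contacts the ClearML server, so the import is done lazily and the start call is commented out):

```python
def prepare_data(rows: int) -> dict:
    # Toy step standing in for one of the notebook-based experiments
    return {"rows": rows}

def build_pipeline():
    # Imported lazily: constructing the controller contacts the ClearML server
    from clearml import PipelineController

    pipe = PipelineController(name="demo-pipeline", project="examples", version="0.0.1")
    pipe.add_function_step(
        name="prepare_data",
        function=prepare_data,
        function_kwargs={"rows": 100},
        function_return=["stats"],
    )
    return pipe

# build_pipeline().start_locally(run_pipeline_steps_locally=True)  # needs a ClearML server
```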
Parquet file in this instance (used to be CSV, but that was even larger as everything is stored as a string...)
Yes, that one shows up. I forgot to mention we also set the version explicitly, but that just creates a duplicate dataset under Datasets and anyway our main Task is now hidden from the original project.
So the project exists, but it is empty.
When is the next release expected? 🙂
The instance that took a while to terminate (or has taken a while to disappear from the idle workers)
I'll have a look at 1.1.6 then!
And that sounds great - environment variables should be supported everywhere in the config, or else the docs should probably mention where they are and are not supported 🙂
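For example, something like this in clearml.conf (a hypothetical snippet; which keys actually honor `${...}` substitution is exactly what I'd want documented):

```
api {
    # HOCON-style environment variable substitution -- assumed, not guaranteed
    web_server: ${CLEARML_WEB_HOST}
    api_server: ${CLEARML_API_HOST}
}
```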
I'll be happy to test it out if there's any commit available?
The thing I don't understand is how come this DOES work on our linux setups 🤔
1.8.3; what about when calling task.close()? We suddenly have a need to set up our logging after every task.close() call
This is related to my other thread, so I'll provide an example there -->
I just used this to create the dual_gpu queue: `clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached`
I will! (once our infra guy comes back from holiday and updates the install, for some reason they set up server 1.1.1???)
Meanwhile wondering where I got a random worker from
AFAIU, something like this happens (oversimplified):
```
from clearml import Task  # <--- Crash already happens here
import argparse
import dotenv

if __name__ == "__main__":
    # set up argparse with an optional flag for a dotenv file
    parser = argparse.ArgumentParser()
    parser.add_argument("--env-file", default=None)
    args = parser.parse_args()
    dotenv.load_dotenv(args.env_file)
    # more stuff
```
I cannot, the instance is long gone... But it's not different to any other scaled instances, it seems it just took a while to register in ClearML
Follow-up question/feature request (out of interest) - could the WebUI show the matching commit message?
Honestly, this is all related to issue #340. The only reason we have this to begin with is because we need one separate "initializer" task that downloads the remote cache and prepares the agent environment for execution (downloading the configuration files, etc).
Otherwise it fits perfectly with pipelines, but we're not there yet.
In the local execution we don't have this initializer task, so we use Task.init() before starting to work on a model, and task.close() when we're done....
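The local flow, as a rough sketch (project/task names are invented; the ClearML calls need a configured server, so the import is lazy and the driver loop is left commented out):

```python
def run_experiment(params: dict) -> None:
    # Imported lazily so the sketch stays runnable without clearml installed
    from clearml import Task

    task = Task.init(project_name="research", task_name="model-lr-%s" % params["lr"])
    task.connect(params)  # attach hyperparameters to the task
    # ... work on the model here ...
    task.close()  # close before the next Task.init in the same process

# for lr in (0.1, 0.01):          # requires a ClearML server
#     run_experiment({"lr": lr})
```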
CostlyOstrich36 I'm not sure what is holding it from spinning down. Unfortunately I was not around when this happened. Maybe it was AWS taking a while to terminate, or maybe it was just taking a while to register in the autoscaler.
The logs looked like this:
1. Recognizing an idle worker and spinning down.
`2022-09-19 12:27:33,197 - clearml.auto_scaler - INFO - Spin down instance cloud id 'i-058730639c72f91e1'`
2. Recognizing a new task is available, but the worker is still idle.
`2022-09...`
Is it currently broken? 🤔
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use Task.init before override_current_task_id(None) .
When would this callback be called? I'm not sure I understand the use case.
I've also followed https://clearml.slack.com/archives/CTK20V944/p1628333126247800 but it did not help
... and any way to define the VPC is missing too 🤔
Ah, you meant "free python code" in that sense. Sure, I see that. The repo arguments also exist for functions though.
Sorry for hijacking your thread @<1523704157695905792:profile|VivaciousBadger56>
Setting the endpoint will not be the only thing missing though, so unfortunately that's insufficient 🙁
There's code that strips the type hints from the component function; I just think it should be applied to the helper functions too :)
Feels like we've been over this 🙂 Has there been new developments perhaps?
It's essentially that this - https://clear.ml/docs/latest/docs/guides/advanced/multiple_tasks_single_process cannot work in a remote execution.
No, that does not seem to work; I get:
`task.execute_remotely(queue_name="default")`
`2024-01-24 11:28:23,894 - clearml - WARNING - Calling task.execute_remotely is only supported on main Task (created with Task.init)`
`Defaulting to self.enqueue(queue_name=default)`
Any follow-up thoughts, @<1523701070390366208:profile|CostlyOstrich36>, or maybe @<1523701087100473344:profile|SuccessfulKoala55>? 🤔
Of course I'm using report_table in the above; it seems the support for pandas DataFrames does not cover a MultiIndex, other than by concatenating the indices together.
That's fine (as in, it works), but it looks a bit weird and defeats the purpose of a MultiIndex 🤔 I was wondering if there are plans to add better support for it.
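Concretely, the concatenation workaround looks like this (a sketch; the level names and table title are invented, and the logger call is commented out since it needs a live task):

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["group", "run"]),
)

# Flatten the MultiIndex by joining the levels into plain strings,
# since report_table only renders a single-level index
flat = df.copy()
flat.index = ["/".join(map(str, idx)) for idx in df.index]

# Logger.current_logger().report_table("results", "flat", iteration=0, table_plot=flat)
```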