So where should I install the latest clearml version? On the client that's running a task, or on the worker machine?
The instance that took a while to terminate (or has taken a while to disappear from the idle workers)
After the task was initialized? 🤔
No that does not seem to work, I get
task.execute_remotely(queue_name="default")
2024-01-24 11:28:23,894 - clearml - WARNING - Calling task.execute_remotely is only supported on main Task (created with Task.init)
Defaulting to self.enqueue(queue_name=default)
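For reference, the warning suggests the call was made on a task object that did not come from Task.init. A minimal sketch of the pattern the warning points to, assuming the clearml package is installed and configured; the project and task names are placeholders, and run_on_queue is a hypothetical helper name:

```python
def run_on_queue(queue_name="default"):
    """Create the main task with Task.init, then hand it over to an agent."""
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task

    # Placeholder names; Task.init creates the main task of this process.
    task = Task.init(project_name="examples", task_name="remote run")

    # When run locally, this enqueues the task and exits the local process.
    # When the code is already being run by an agent, the call is ignored.
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return task
```

Calling run_on_queue() from the script entry point, before any heavy work starts, keeps the local run short.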
Any follow-up thoughts, @CostlyOstrich36, or maybe @SuccessfulKoala55? 🤔
SuccessfulKoala55 The changelog wrongly cites https://github.com/allegroai/clearml/issues/400 btw. It is not implemented and is not related to being able to save CSVs 😅
I wouldn't mind going the requests route if I could find the API endpoint from the SDK?
My current workaround is to use poetry and tell users to delete poetry.lock if they want their environment copied verbatim
Example configuration -
` version: 1
disable_existing_loggers: true
formatters:
  simple:
    format: '%(asctime)s %(levelname)-9s %(name)-24s: %(message)s'
filters:
  brackets:
    (): ccutils.logger.BracketFilter
handlers:
  console:
    class: ccmlp.utils.TqdmStreamHandler
    level: INFO
    formatter: simple
    filters: [brackets]
loggers: # Set logging levels for specific packages
  urllib3:
    level: WARNING
  matplotlib:
    level: WARNING
...
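A configuration shaped like the one above can be handed to logging.config.dictConfig once it is parsed into a dict. The sketch below is only an illustration under assumptions: BracketFilter is a local stand-in for ccutils.logger.BracketFilter (its real behaviour is not shown in the thread), the stock logging.StreamHandler replaces ccmlp.utils.TqdmStreamHandler, and the root section is added here because the original is elided.

```python
import logging
import logging.config


class BracketFilter(logging.Filter):
    """Hypothetical stand-in for ccutils.logger.BracketFilter."""

    def filter(self, record):
        # Illustrative behaviour only: wrap the logger name in brackets once.
        if not record.name.startswith("["):
            record.name = f"[{record.name}]"
        return True


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)-9s %(name)-24s: %(message)s"},
    },
    # The "()" key tells dictConfig to build the object by calling this factory.
    "filters": {"brackets": {"()": BracketFilter}},
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",  # assumption: stock handler
            "level": "INFO",
            "formatter": "simple",
            "filters": ["brackets"],
        },
    },
    # Set logging levels for specific packages.
    "loggers": {
        "urllib3": {"level": "WARNING"},
        "matplotlib": {"level": "WARNING"},
    },
    # Assumption: the elided part of the original attaches the handler to root.
    "root": {"level": "INFO", "handlers": ["console"]},
}


def configure_logging():
    """Apply the configuration and return a demo logger."""
    logging.config.dictConfig(LOGGING_CONFIG)
    return logging.getLogger("demo")
```

With a YAML file, the dict would come from a YAML parser instead of being written inline.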
~ is a bit weird since it's not part of the package (might as well let the user go through clearml-init), but using ${PWD} works! 👍 👍
(Though I still had to add the CLEARML_API_HOST and CLEARML_WEB_HOST ofc, or define them in the clearml.conf)
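For anyone setting these from Python instead of clearml.conf, a rough sketch follows. The URLs are placeholders, apply_clearml_env is a hypothetical helper, and it is assumed here that the variables need to be in place before clearml is imported:

```python
import os

# Placeholder URLs; substitute the addresses of your own ClearML server.
CLEARML_ENV_DEFAULTS = {
    "CLEARML_API_HOST": "https://api.example.com",
    "CLEARML_WEB_HOST": "https://app.example.com",
}


def apply_clearml_env(defaults=CLEARML_ENV_DEFAULTS, environ=os.environ):
    """Set each variable only if it is not already defined; return the final values."""
    for key, value in defaults.items():
        environ.setdefault(key, value)
    return {key: environ[key] for key in defaults}
```

Using setdefault means values already exported in the shell take precedence over the defaults in the script.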
I'm also getting the following warning, I guess it's some ClearML dependency? IPython could not be loaded!
First bullet point - yes, exactly
Second bullet point - all of it, really. The SDK documentation and the examples.
For example, the Task object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
Any linked example to github is welcome, but some visualization/inline code with explanation is also very much welcome.
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use Task.init before override_current_task_id(None).
When would this callback be called? I'm not sure I understand the usecase.
Am I making sense ?
No, not really. I don't see how task.connect_configuration interacts with our existing CLI? Additionally, the documentation for task.connect_configuration says the second argument is the name of a file, not the path to it? So something is off
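To make the question concrete, here is a sketch of how a path coming from argparse could be passed through task.connect_configuration. It assumes the documented behaviour that a path passed as configuration returns a path to a local copy of the file, and that name is the label of the configuration section, not a file name; the project, task and section names are placeholders.

```python
import argparse


def parse_args(argv=None):
    """Parse the same --config_file flag the entry point already uses."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_file", required=True,
                        help="Path to the YAML configuration")
    return parser.parse_args(argv)


def main(argv=None):
    # Imported lazily so the argument parsing can be used without clearml.
    from clearml import Task

    args = parse_args(argv)
    task = Task.init(project_name="examples", task_name="train")  # placeholders

    # Assumption from the SDK docs: a path goes in, and a path to a local
    # copy comes back, so the rest of the script reads from the returned path.
    local_config = task.connect_configuration(
        configuration=args.config_file,
        name="training config",  # section label shown in the UI, not a file name
    )
    return local_config
```

Whether this fits a deeply nested configuration is a separate question; the sketch only shows where the CLI path and the SDK call would meet.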
As the meme goes, well yes but actually no, since the input path is provided via argparse? I'm also not sure how this would help debug from the WebUI - you can't really see the contents of a zipped file/the configuration tab is too messy for such a nested configuration as the one we have. It's best suited as an artifact.
EDIT: Or am I missing something? Point being, when the remote execution begins, the entry point tries to run e.g. python train.py --config_file path/to/local/file.yaml
...
Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take an annoying route of downloading the files etc. The configurations are uploaded to a single task and then linked across all tasks to conserve storage space (so the S3 storage point is identical across tasks). Sure, sounds good. I think it's a ...
That could work, given that:
Could we add a preview section? One reason I don't like using the configuration section is that it makes debugging much much harder.
Will the clearml-agent download and unzip the files, placing them into the same local folder as needed for execution?
What if we want to include non-configuration objects? (i.e. the model case I listed)
Then that did not work, but I'll look into it again soon!
It's okay 🙂 I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough 😄
Sounds like a nice idea 😁
Follow-up; any ideas how to avoid PEP 517 with the auto scaler? 🤔 Takes a long time to build the wheels
Hurrah! Added git config --system credential.helper 'store --file /root/.git-credentials' to the extra_vm_bash_script and now it works (logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
Yes it would be 🙂
Visualization is always a difficult topic... I'm not sure about that, but a callback would be nice.
One idea that comes to mind (this is of course limited to DataFrames), but think of git diff, where I imagine 3 independent sections:
Removed columns (+ truncated preview of removed values)
(see below)
Added columns (+ truncated preview of added values)
The middle column is then a bit complicated, but I would see some kind of "shared columns" dataframe, where each ...
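A rough sketch of that three-way split, using plain dicts that map a column name to its values as a stand-in for DataFrames. The function name and the preview length are made up for illustration, and how the shared section should be rendered is left open:

```python
def split_columns(old, new, preview=3):
    """Split two column->values tables into removed, shared and added sections."""
    # Columns only in the old table, with a truncated preview of their values.
    removed = {name: values[:preview] for name, values in old.items() if name not in new}
    # Columns only in the new table, with a truncated preview of their values.
    added = {name: values[:preview] for name, values in new.items() if name not in old}
    # Columns present in both tables, in the old table's order.
    shared = [name for name in old if name in new]
    return removed, shared, added
```

For example, comparing {"a": [1, 2, 3, 4], "b": [5, 6]} with {"b": [5, 7], "c": [8, 9, 10, 11]} puts "a" in the removed section, "b" in the shared one and "c" in the added one.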
Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC
SuccessfulKoala55 That at least did not work, unless one has to specify wildcard patterns perhaps..?
Same result 😞 This is frustrating, wtf happened 🤯
This is also specifically the services queue worker I'm trying to debug 🤔
That still seems to crash SuccessfulKoala55 🤔
EDIT: No, wait, the environment still needs updating. One moment still...
Oh, well, no, but for us that would be one way solution (we didn't need to close the task before that update)
I'll have yet another look at both the latest agent RC and at the docker-compose, thanks!
There was no "default" services agent btw, just the queue, I had to launch an agent myself (not sure if it's relevant)
Any updates on this? We can't do anything with our K8s since this 404...
@CostlyOstrich36 I added None btw