Seems like you're missing an image definition (AMI or otherwise)
You mean at the container level or at clearml?
Yes, the container level (when these docker shell scripts run).
The per user ID would be nice, except I upload the `.env` file before the `Task` is created (it's only available really early in the code).
Using the PipelineController with add_function_step
Yeah 🤔 🤔 they did. I'll give your suggested fix a go on Monday!
Sure, for example when reporting HTML files:
`task.upload_artifact(..., is_requirement=True)`, `task.connect_configuration(..., is_requirement=True)`
Just implies these artifacts/configurations must be downloaded prior to running the code itself; then you also don't have to worry about zipping? 🤔
So where should I install the latest clearml version? On the client that's running a task, or on the worker machine?
The instance that took a while to terminate (or has taken a while to disappear from the idle workers)
After the task was initialized? 🤔
No that does not seem to work, I get
task.execute_remotely(queue_name="default")
2024-01-24 11:28:23,894 - clearml - WARNING - Calling task.execute_remotely is only supported on main Task (created with Task.init)
Defaulting to self.enqueue(queue_name=default)
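For reference, the pattern that warning points at would look roughly like the sketch below: the task comes from `Task.init` in the same process that later calls `execute_remotely`. The project and task names are placeholders (assumptions, not from this thread), and the import is kept inside the function so the snippet can be read without a configured ClearML server.

```python
def run_on_agent(queue_name="default"):
    """Create the main task locally, then hand execution over to an agent queue."""
    # Imported lazily: needs the clearml package and a valid clearml.conf to run.
    from clearml import Task

    # Placeholder project/task names, purely illustrative.
    task = Task.init(project_name="examples", task_name="remote execution sketch")

    # On a main task this stops the local run and enqueues the task instead.
    task.execute_remotely(queue_name=queue_name, exit_process=True)
    return task
```

This does not cover the case above where the task object was obtained some other way; there the SDK falls back to a plain enqueue, as the log shows.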
Any follow-up thoughts, @CostlyOstrich36, or maybe @SuccessfulKoala55? 🤔
SuccessfulKoala55 The changelog wrongly cites https://github.com/allegroai/clearml/issues/400 btw. It is not implemented and is not related to being able to save CSVs 😅
I wouldn't mind going the `requests` route if I could find the API end point from the SDK?
My current workaround is to use `poetry` and tell users to delete `poetry.lock` if they want their environment copied verbatim
Example configuration -
    version: 1
    disable_existing_loggers: true
    formatters:
      simple:
        format: '%(asctime)s %(levelname)-9s %(name)-24s: %(message)s'
    filters:
      brackets:
        (): ccutils.logger.BracketFilter
    handlers:
      console:
        class: ccmlp.utils.TqdmStreamHandler
        level: INFO
        formatter: simple
        filters: [brackets]
    loggers:  # Set logging levels for specific packages
      urllib3:
        level: WARNING
      matplotlib:
        level: WARNING
    ...
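A rough, stdlib-only sketch of how a configuration shaped like this could be applied with `logging.config.dictConfig`. The `BracketFilter` below is a hypothetical stand-in (the real `ccutils.logger.BracketFilter` is not shown here, so its behaviour is guessed), `logging.StreamHandler` replaces the tqdm-aware handler, and the `root` entry is an assumption since the original config is truncated.

```python
import logging
import logging.config


class BracketFilter(logging.Filter):
    """Hypothetical stand-in for ccutils.logger.BracketFilter (behaviour assumed)."""

    def filter(self, record):
        # Assumed behaviour: strip square brackets from the message, keep the record.
        record.msg = str(record.msg).replace("[", "").replace("]", "")
        return True


LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "simple": {"format": "%(asctime)s %(levelname)-9s %(name)-24s: %(message)s"},
    },
    "filters": {
        # "()" tells dictConfig to build the filter by calling this factory.
        "brackets": {"()": BracketFilter},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",  # stand-in for TqdmStreamHandler
            "level": "INFO",
            "formatter": "simple",
            "filters": ["brackets"],
        },
    },
    "loggers": {  # Set logging levels for specific packages
        "urllib3": {"level": "WARNING"},
        "matplotlib": {"level": "WARNING"},
    },
    # Assumption: the original is cut off, so the root logger setup is a guess.
    "root": {"level": "INFO", "handlers": ["console"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logging.getLogger(__name__).info("logging configured [ok]")
```

With the YAML file itself, the same dictionary would come out of a YAML loader before being passed to `dictConfig`.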
`~` is a bit weird since it's not part of the package (might as well let the user go through `clearml-init`), but using ${PWD} works! 👍 👍
(Though I still had to add the CLEARML_API_HOST and CLEARML_WEB_HOST ofc, or define them in the clearml.conf)
I'm also getting the following warning, I guess it's some ClearML dependency? `IPython could not be loaded!`
First bullet point - yes, exactly
Second bullet point - all of it, really. The SDK documentation and the examples.
For example, the `Task` object is heavily overloaded and its documentation would benefit from being separated into logical units of work. It would also make it easier for the ClearML team to spot any formatting issues.
Any linked example to github is welcome, but some visualization/inline code with explanation is also very much welcome.
Heh, good @VivaciousBadger56 😁
I was just repeating what @CostlyOstrich36 suggested, credits to him
AgitatedDove14 the issue was that we'd like the remote task to be able to spawn new tasks, which it cannot do if I use `Task.init` before `override_current_task_id(None)`.
When would this callback be called? I'm not sure I understand the usecase.
Am I making sense ?
No, not really. I don't see how `task.connect_configuration` interacts with our existing CLI? Additionally, the documentation for `task.connect_configuration` says the second argument is the name of a file, not the path to it? So something is off
As the meme goes, well yes but actually no, since the input path is provided via argparse? I'm also not sure how this would help debug from the WebUI - you can't really see the contents of a zipped file/the configuration tab is too messy for such a nested configuration as the one we have. It's best suited as an artifact.
EDIT: Or am I missing something? Point being, when the remote execution begins, the entry point tries to run e.g. `python train.py --config_file path/to/local/file.yaml` ...
Debugging. It's very useful for us to be able to see the contents of the configuration and understand what is going on and what is meant to be going on. Without a preview (which in our case is the entire content of the configuration file), one has to take the annoying route of downloading the files etc.
The configurations are uploaded to a single task and then linked across all tasks to conserve storage space (so the S3 storage point is identical across tasks).
Sure, sounds good. I think it's a ...
That could work, given that:
- Could we add a preview section? One reason I don't like using the configuration section is that it makes debugging much, much harder.
- Will the clearml-agent download and unzip the files, placing them into the same local folder as needed for execution?
- What if we want to include non-configuration objects? (i.e. the model case I listed)
Then that did not work, but I'll look into it again soon!
It's okay 🙂 I was originally hoping to delete my "initializer" task, but I'll just archive it if someone is interested in the worker data etc. Setting the queue is quite nice.
I think this should get my team excited enough 😄
Sounds like a nice idea 😁
Follow-up; any ideas how to avoid PEP 517 with the auto scaler? 🤔 Takes a long time to build the wheels
Hurrah! Added `git config --system credential.helper 'store --file /root/.git-credentials'` to the `extra_vm_bash_script` and now it works (logs the given git credentials in the store file, which can then be used immediately for the recursive calls)
Yes it would be 🙂
Visualization is always a difficult topic... I'm not sure about that, but a callback would be nice.
One idea that comes to mind (this is of course limited to DataFrames), but think the `git diff`, where I imagine 3 independent sections:
- Removed columns (+ truncated preview of removed values)
- (see below)
- Added columns (+ truncated preview of added values)
The middle section is then a bit complicated, but I would see some kind of "shared columns" dataframe, where each ...
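To make the three sections a bit more concrete, here is a minimal sketch of the split. It uses plain `column -> values` dicts as a stand-in for DataFrames so that it runs without pandas; `diff_columns` and the sample data are hypothetical, and the "shared columns" part only lists the column names since that section isn't fully specified above.

```python
def diff_columns(old, new, preview=3):
    """Split two column->values tables into removed, added and shared columns."""
    # Columns only in the old table, with a truncated preview of their values.
    removed = {name: old[name][:preview] for name in old if name not in new}
    # Columns only in the new table, with a truncated preview of their values.
    added = {name: new[name][:preview] for name in new if name not in old}
    # Columns present in both tables, in the old table's order.
    shared = [name for name in old if name in new]
    return removed, added, shared


# Made-up sample data, purely illustrative.
old = {"id": [1, 2, 3, 4], "price": [9.5, 3.0, 7.25, 1.0], "legacy_flag": [0, 1, 0, 1]}
new = {"id": [1, 2, 3, 4], "price": [9.5, 3.5, 7.25, 1.0], "currency": ["EUR"] * 4}

removed, added, shared = diff_columns(old, new)
print("removed:", removed)
print("added:", added)
print("shared:", shared)
```

With real DataFrames the same split would presumably come from comparing `df.columns` on both sides.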
Yes, you're correct, I misread the exception.
Maybe it hasn't completed uploading? At least for Datasets one needs to explicitly wait IIRC