
I created a new task with the project name internal tests, and no task name (so it's derived by ClearML).
The task was a simple print out.
The project does not appear in the project space and does not turn up on searches (the task does)
It should store it on the fileserver, perhaps you're missing a configuration option somewhere?
Any leads, TimelyPenguin76? I've also tried setting up a MinIO S3 bucket, but I'm not sure whether the remote agent has copied the credentials and host 🤔
Not really. I've only been able to roughly narrow down where it happens, and I'm not sure it's even a ClearML issue (it may be matplotlib).
I've updated my feature request to describe that as well. A textual description is not necessarily a preview 😅 For now I'll use the debug samples.
These kinds of things definitely show how ClearML was originally designed only for neural networks, tbh, where images are almost always only part of the dataset. The same goes for the consistent use of iteration everywhere 😞
That's a nice workaround of course - I'm sure it works, and I'll give it a shot momentarily. I'm just wondering if ClearML could automatically recognize image files in upload_artifact (and other well-known suffixes) and do that for me.
Not really - it will just show the string. A preview would be more like a low-res version of the uploaded image or similar.
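To make the comparison concrete, here's a minimal sketch of the debug-samples workaround I mentioned, assuming the image is already saved locally (report_image is ClearML's Logger API; the project/task names and file path below are hypothetical):

from clearml import Task, Logger

task = Task.init(project_name="internal tests", task_name="preview example")

# Report the image as a debug sample so the UI renders an actual
# visual preview, instead of uploading it via upload_artifact
# (which only shows a textual preview for now).
Logger.current_logger().report_image(
    title="model graph",
    series="pygraphviz",
    iteration=0,
    local_path="model_graph.png",  # hypothetical local file
)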
It's not exactly "debugging", but rather a description of the generated model/framework (generated with pygraphviz).
Opened a matching feature request issue for this -> https://github.com/allegroai/clearml/issues/418
Ah I see - if the pipeline controller begins in a Task, it does not add the tags to it…
So the pipeline runs successfully, I can find all the different tasks, but I cannot see them in the Pipelines tab…
Happens with the latest version indeed.
I can’t share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
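For reference, a self-contained toy version of that shape (the step, project, and queue names here are hypothetical, not our real code):

from clearml import PipelineController

def step_one(x: int = 1):
    # trivial stand-in for a real step body
    return x + 1

pipe = PipelineController(name="demo pipeline", project="demo project", version="1.0.0")
pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_kwargs=dict(x=1),
    function_return=["y"],
)
pipe.set_default_execution_queue("default")
pipe.start(queue="services", wait=True)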
I can only say I’ve found ClearML to be very helpful, even given the documentation issue.
I think they’ve been working on upgrading it for a while, hopefully something new comes out soon.
Maybe @<1523701205467926528:profile|AgitatedDove14> has further info 🙂
Perfect now 👌 (also nice cleanup of the default_new_data_root duplicate code :D)
@<1523704157695905792:profile|VivaciousBadger56> It seems like whatever you pickled in the zip file relies on some additional files that are not pickled.
If I set the following:
"extra_clearml_conf": "sdk.aws.s3.credentials = [\n{\nhost: 'ip:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n},\n{\nhost: 'ip2:9000'\nkey: 'xxx'\nsecret: 'xxx'\nmultipart: false\nsecure: false\n}\n]"
I run into a weird furl error: ValueError: Invalid port '9000''.
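For readability, this is the same credentials section as it would look written directly in clearml.conf (HOCON); the hosts and keys are placeholders:

sdk {
  aws {
    s3 {
      credentials: [
        {
          host: "ip:9000"  # placeholder, e.g. "10.0.0.1:9000"
          key: "xxx"
          secret: "xxx"
          multipart: false
          secure: false
        },
        {
          host: "ip2:9000"
          key: "xxx"
          secret: "xxx"
          multipart: false
          secure: false
        }
      ]
    }
  }
}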
A follow-up question (instead of opening a new thread): is there a way I could signal some files/directories to be copied to the execute_remotely task?
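For context, the pattern in question, as a minimal sketch (project, task, and queue names are hypothetical):

from clearml import Task

task = Task.init(project_name="internal tests", task_name="remote run")
# Everything above runs locally; this call clones the task, enqueues it
# for a remote agent, and (with exit_process=True) stops the local run.
task.execute_remotely(queue_name="default", exit_process=True)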
I thought so too - so I added flush calls just in case, but nothing's changed.
This is somewhat weird since it always happens in the above scenario (Ray + ClearML), and always in the last task/job from Ray
Let me verify a hypothesis...
We have an internal mono-repo, and some of its packages are required - they’re all correctly available for the controller. Only some are required for the individual tasks, but the “magic” doesn’t happen 😞
That is, the controller does not identify them as a requirement, so they’re not installed in the tasks’ environment.
Hey @<1537605940121964544:profile|EnthusiasticShrimp49> ! You’re mostly correct. The Step classes will be predefined (of course developers are encouraged to add/modify as needed), but as in the DataTransformationStep, there may be user-defined functions specified. That’s not a problem though, I can provide these functions with the helper_functions argument.
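Roughly what I mean, as a sketch (the helper, step, and project names are made up):

from clearml import PipelineController

def normalize(values):
    # user-defined helper the step body relies on
    top = max(values)
    return [v / top for v in values]

def transform(values):
    # the step body can call normalize because it is shipped alongside it
    return normalize(values)

pipe = PipelineController(name="demo", project="demo project", version="1.0.0")
pipe.add_function_step(
    name="data_transformation",
    function=transform,
    function_kwargs=dict(values=[1, 2, 3]),
    function_return=["normalized"],
    helper_functions=[normalize],  # package user-defined functions with the step
)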
- The .add_function_step is indeed a failing point. I can’t really create a task from the notebook because calling `Ta...
I guess, following the example https://github.com/allegroai/clearml/blob/master/examples/advanced/execute_remotely_example.py , it's not clear to me how the server has access to the data loaders' location when it hits execute_remotely
Yes, that one shows up. I forgot to mention we also set the version explicitly, but that just creates a duplicate dataset under Datasets, and anyway our main Task is now hidden from the original project.
So the project project exists, but it is empty.
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time.