Eureka! Solved. The problem was a trailing space before the image name in the Image section in the web UI. I think you should strip the string before proceeding to the environment-building step, to avoid this annoying issue. Of course, users could double-check before launching, but this will come up every once in a while regardless
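A minimal sketch of the suggested fix (the function and field names here are hypothetical stand-ins for wherever the web UI value is consumed, not ClearML's actual code):

```python
def normalize_image_name(raw: str) -> str:
    """Strip stray whitespace from a user-supplied Docker image name.

    A trailing space (e.g. "python:3.10 ") makes the image lookup fail,
    so sanitize the value before the environment-building step.
    """
    return raw.strip()

print(normalize_image_name("  python:3.10 "))  # -> "python:3.10"
```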
It fails during the add_step stage for the very first step, because task_overrides contains invalid keys
AgitatedDove14 I ran into this problem again. Are there any known issues about it? I don't remember what helped the last time
SparklingElephant70 in WebUI Execution/SCRIPT PATH
SparklingElephant70 Try specifying full path to the script (relative to working dir)
@<1523701070390366208:profile|CostlyOstrich36> on a remote agent, yes, running the task from the interface
Maybe displaying 9 or 10 by default would be enough, plus a clearly visible, thick scrollbar on the right
AgitatedDove14 by task you mean the training task or the separate task corresponding to the model itself? The former won't work since I don't want to delete the training task, only the models
I just happened to spawn multiple OutputModels within a single script, which is being run in a single task. That is, I see dozens of models in the Models tab in the web UI. What I want is to delete most of them (along with the files in S3), preserving the spawning task
AgitatedDove14 thank you. Maybe you know about an OutputModel.remove method or something like that?
so that the way of doing it would be like this:
```
all_models = Model.query_models(project_name=..., task_name=..., tags=['running-best-checkpoint'])
all_models = sorted(all_models, key=lambda x: extract_epoch(x))
for model in all_models[:-num_to_preserve]:
    Model.remove(model, delete_weights_file=True)
```
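The extract_epoch helper above is hypothetical; a minimal sketch, assuming the epoch is encoded in the model name (e.g. "best_epoch_12"), could look like this:

```python
import re

def extract_epoch(model) -> int:
    """Pull the epoch number out of a model's name.

    Assumes names like "best_epoch_12"; returns -1 when no epoch is found,
    so unnamed models sort first and get deleted first.
    """
    match = re.search(r"epoch[_-]?(\d+)", getattr(model, "name", "") or "",
                      re.IGNORECASE)
    return int(match.group(1)) if match else -1

class FakeModel:  # stand-in for a clearml Model object in this sketch
    def __init__(self, name):
        self.name = name

print(extract_epoch(FakeModel("best_epoch_12")))  # -> 12
```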
The pipeline controller itself is stuck in running mode forever; all step tasks are created but never enqueued
I can share some code
AgitatedDove14 SuccessfulKoala55 maybe you know. How do I add files without uploading them anywhere?
where is it in the docs?
AgitatedDove14 yeah, makes sense, that would require some refactoring in our projects though...
But why is my_name a subproject? Why not just my_project/.datasets?
there seems to be no way to change default_output_uri from the code. Dataset.create calls Task.create, which in turn accepts an add_task_init_call flag. Task.init accepts output_uri, but we cannot pass arguments through add_task_init_call, so we cannot change output_uri from Dataset.create, right?
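As an aside, the one place it does appear settable today is the SDK config (which is machine-level, not per-project); a minimal clearml.conf fragment, with a placeholder bucket path:

```
sdk {
    development {
        default_output_uri: "s3://my-bucket/clearml"
    }
}
```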
AgitatedDove14
```
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='git@...', branch='main', commit_id='...', tag='', docker_cmd='registry.gitlab.com/...:...', en...
```
If I just use plain boto3 to sync weights to/from S3, I can simply check how many files are stored at the location and clear out the old ones
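For the record, this is roughly the boto3-based cleanup I mean; a sketch rather than our actual code (bucket and prefix are placeholders, and boto3 is imported lazily so the pure helper runs without it):

```python
def select_stale_keys(keys, num_to_preserve):
    """Given object keys sorted oldest-to-newest, return the ones to delete,
    keeping only the newest num_to_preserve."""
    if num_to_preserve <= 0:
        return list(keys)
    return list(keys)[:-num_to_preserve]

def cleanup_checkpoints(bucket, prefix, num_to_preserve=3):
    """Delete all but the newest few checkpoint objects under a prefix.

    Requires AWS credentials to actually run against S3.
    """
    import boto3  # deferred: only needed when really talking to S3
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objs = sorted(resp.get("Contents", []), key=lambda o: o["LastModified"])
    for key in select_stale_keys([o["Key"] for o in objs], num_to_preserve):
        s3.delete_object(Bucket=bucket, Key=key)
```

Note that list_objects_v2 returns at most 1000 keys per call, so a real version would paginate.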
I still haven't figured out how to make files downloaded this way visible to future get_local_copy calls, though
Specifying the target storage for OutputModel from the config is a bad idea: it should be project-specific, not agent-specific
@<1523701070390366208:profile|CostlyOstrich36> Yes, I'm self-deployed, and the company I want to share it with is also self-deployed
@<1523701435869433856:profile|SmugDolphin23> about ignore_parent_datasets? I renamed it the same day you added that comment. Please let me know if there is anything else I need to pay attention to