AgitatedDove14 is it expected behavior?
Changing `sdk.development.default_output_uri` in `clearml.conf` seems to be a bad idea, because different datasets will likely have different addresses on S3. There seems to be no way to change `default_output_uri` from the code. `Dataset.create` calls `Task.create`, which in turn accepts an `add_task_init_call` flag. `Task.init` accepts `output_uri`, but we cannot add arguments through `add_task_init_call`, so we cannot change `output_uri` from `Dataset.create`, right?
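To make the chain concrete, a minimal sketch of the two entry points, assuming a recent clearml SDK (project, task, and bucket names are placeholders):

```python
from clearml import Dataset, Task

# Task.init lets you choose the destination per task...
task = Task.init(
    project_name="my_project",
    task_name="my_task",
    output_uri="s3://my-bucket/my-prefix",  # placeholder bucket
)

# ...but Dataset.create goes through Task.create, where add_task_init_call
# is only a boolean flag, so there is no obvious way to pass output_uri through.
dataset = Dataset.create(dataset_name="my_name", dataset_project="my_project")
```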
AgitatedDove14 I still have the name `my_name`, but the project name is `my_project/.datasets/my_name` rather than `my_project/.datasets`. I think it's still an issue, not critical though, because we have another way to do it and it works
In short, what helped is `gitlab+deploy-token` in the GitLab URL
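For reference, the URL shape in question, with a hypothetical token ID and secret (GitLab deploy-token usernames look like `gitlab+deploy-token-<id>`):

```python
# Hypothetical values; substitute your own deploy token ID and secret.
repo_url = "https://gitlab+deploy-token-123456:MY_DEPLOY_TOKEN@gitlab.com/my-group/my-project.git"
```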
maybe being able to change 100% of things with `task_overrides` would be the most convenient way
I don't think so. It is solved by installing openssh-client in the Docker image or by adding a deploy token to the cloning URL in the web UI
We digressed a bit from the original thread topic though 😆 About `clone_base_task=False`. I ended up using `task_overrides` for every change, and this way I only need 2 tasks (a base task and a step task, thus I use `clone_base_task=True` and it works as expected - yay!)
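A minimal sketch of that working setup, assuming the clearml `PipelineController` API (project and task names are placeholders):

```python
from clearml.automation import PipelineController

# One base task, cloned per step, with every change applied via task_overrides.
pipe = PipelineController(name="my-pipeline", project="my_project", version="1.0")
pipe.add_step(
    name="step-1",
    base_task_project="my_project",
    base_task_name="base-task",
    clone_base_task=True,  # clone the base task...
    task_overrides={"script.branch": "main"},  # ...then override what differs
)
pipe.start()
```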
So, the problem I described in the beginning can be reproduced only this way:
- have a base task
- export_data, modify, import_data to get a second task
- pass the second task to `add_step` with `cl...
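For context, a repro sketch under the assumption that export/import refer to the SDK's `Task.export_task`/`Task.import_task` (names are placeholders):

```python
from clearml import Task
from clearml.automation import PipelineController

base = Task.get_task(project_name="my_project", task_name="base-task")

# export -> modify -> import produces a second, standalone task
data = base.export_task()
data["name"] = "modified-task"  # any modification
second = Task.import_task(data)

pipe = PipelineController(name="repro", project="my_project", version="1.0")
# passing the imported task (rather than cloning the base) is where the issue showed up
pipe.add_step(name="step", base_task_id=second.id, clone_base_task=False)
```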
AgitatedDove14 Yes, this is exactly what I was looking for, and I was running into a `413 Request Entity Too Large` error during `add_files`. This is what helped to solve it:
https://github.com/allegroai/clearml/issues/257#issuecomment-736020929
SmugDolphin23 about `ignore_parent_datasets`? I renamed it the same day you added that comment. Please let me know if there is anything else I need to pay attention to
SparklingElephant70 then use the `task_overrides` argument, like this:
```python
task_overrides={
    'script.branch': 'main',
    'script.version_num': '',
    'script.diff': '',
    'project': Task.get_project_id(project_name=cfg[name]['base_project']),
    'name': 'task-name',
    'container.image': 'registry.gitlab.com/image:tag',
}
```
There must be some schema to change the script name as well
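Presumably the same dot-path schema covers the script name too; an untested guess based on the fields visible in the task's Execution section:

```python
# Assumption: entry_point/working_dir follow the same 'script.*' schema
# as branch/version_num/diff above; not verified.
task_overrides = {
    "script.entry_point": "train.py",
    "script.working_dir": ".",
}
```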
The refactoring is to account for the new project names, and also to resolve the project name depending on the client version
SparklingElephant70 in the WebUI, under Execution / SCRIPT PATH
sorry, no GH issue, just a link to this thread (I saw other contributors did this and got their PR merged, hehe)
AgitatedDove14 yeah, makes sense, that would require some refactoring in our projects though...
But why is `my_name` a subproject? Why not just `my_project/.datasets`?
For datasets it's easily done with a dedicated project, a separate task per dataset, and the Artifacts tab within it
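A minimal sketch of that pattern, assuming the standard clearml artifact API (project and paths are placeholders):

```python
from clearml import Task

# One task per dataset; the file shows up under the task's Artifacts tab.
task = Task.init(project_name="datasets", task_name="my-dataset-v1")
task.upload_artifact(name="data", artifact_object="/path/to/data.csv")
task.close()
```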
In order to work with SSH cloning, one has to manually install openssh-client in the Docker image, it looks like
pipeline launches on the server anyway (appears on the web UI)
AgitatedDove14
```
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
error: Could not fetch origin
Repository cloning failed: Command '['git', 'fetch', '--all', '--recurse-submodules']' returned non-zero exit status 1.
clearml_agent: ERROR: Failed cloning repository.
- Make sure you pushed the requested commit:
(repository='git@...', branch='main', commit_id='...', tag='', docker_cmd='registry.gitlab.com/...:...', en...
```
`add_files`. There is no `upload` call, because `add_files` uploads files by itself, if I got it correctly
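For completeness, a hedged sketch of that flow; in the SDK versions I'm familiar with, an explicit `upload()` (or `finalize(auto_upload=True)`) is usually still needed, so whether `add_files` uploads by itself may be version-dependent:

```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_name", dataset_project="my_project")
ds.add_files(path="data/")  # registers the files with the dataset
ds.upload()                 # may be implicit in some flows, per the message above
ds.finalize()
```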
it has the same effect as start/wait/stop, kinda weird