
True, this is exactly the reason. That said, you can always manually add it. You can see the default values: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf
What do you mean? Every Model has a unique ID; what do you consider a version?
Hi EagerOtter28
The agent knows how to do the http->ssh conversion on the fly. In your clearml.conf (on the agent's machine) set force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/42606d9247afbbd510dc93eeee966ddf34bb0312/docs/clearml.conf#L25
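For reference, a minimal sketch of the relevant entry in the agent's clearml.conf (the key sits under the agent section in the linked default config):
agent {
    # force git clones over SSH: http(s) git links are converted to ssh on the fly
    force_git_ssh_protocol: true
}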
AbruptHedgehog21 could it be the console log itself is huge?
For reporting the console logs you can use: logger.report_text("my log line here", print_console=False)
https://github.com/allegroai/clearml/blob/b4942321340563724bc16f60ea5dd78c9161778d/clearml/logger.py#L120
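A minimal sketch (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="log-demo")  # placeholder names
logger = task.get_logger()

# Written to the task's console log section, but not echoed to stdout
logger.report_text("my log line here", print_console=False)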
As we use a custom CUDA image, we do not want this running on user login, and get ugly error messages about missing symlinks.
You can customize the startup bash script (running inside Any container) here:
https://github.com/allegroai/clearml-agent/blob/bf07b7f76d3236c1118b81730c6d9718705a795a/docs/clearml.conf#L145
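For example, assuming the linked line is the extra_docker_shell_script entry, something along these lines (the install command is illustrative):
agent {
    # commands appended to the container's startup bash script
    extra_docker_shell_script: ["apt-get update", "apt-get install -y openssh-client"]
}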
LackadaisicalOtter14 Would that help?
Hi @<1547028031053238272:profile|MassiveGoldfish6>
Is there a way for ClearML to simply save the model once training is done and to ignore the model checkpoints?
Yes, you can simply disable the auto-logging of the model and manually save the checkpoint:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
...
task.update_output_model("/my/model.pt", ...)
Or, for example, just "white-label" the final model:
task = Task.init(..., auto_connect_frameworks={'pytorch': False})
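Putting it together, a minimal sketch (names and the model path are placeholders):
from clearml import Task

# Disable automatic PyTorch checkpoint logging; everything else is still auto-logged
task = Task.init(
    project_name="examples",                      # placeholder names
    task_name="manual-final-model",
    auto_connect_frameworks={'pytorch': False},
)

# ... training loop; intermediate torch.save() checkpoints are now ignored ...

# Register only the final model with the task
task.update_output_model(model_path="/my/model.pt", name="final-model")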
GiddyTurkey39
BTW: you can always add the missing package via code: Task.add_requirements('torch', optional_version)
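A minimal sketch (the version string is illustrative and the second argument is optional):
from clearml import Task

# Must be called before Task.init()
Task.add_requirements('torch', '2.1.0')  # version is illustrative; omit to leave unpinned
task = Task.init(project_name="examples", task_name="demo")  # placeholder names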
No worries, I would love for us to come up with a nice solution 🙂
Thanks TrickyRaccoon92
I think it's about time we removed the survey link anyhow 🙂
I'll make sure it happens ...
Let me know if you managed to get it working, then we can see if we can detect it automatically.
Hi EnthusiasticCoyote38
But once one process finished, it changed the task status to completed. Maybe you know a safe way to deal with such a situation? Or maybe the best way is to check the task status before uploading the object?
Well, you can actually forcefully set the state of the Task to running, then add artifacts, then close it?
would that work?
my_other_task.reload()
my_other_task.mark_started(force=True)
my_other_task.upload_artifact(...)
my_other_task.flush(wait_for_uploads=True)
my_other_task.mark_completed()
Added -v /home/uname/.ssh:/root/.ssh and it resolved the issue. I assume this is some sort of a bug then?
That is supposed to be mounted automatically. Having SSH_AUTH_SOCK defined means the agent mounts the SSH_AUTH_SOCK socket into the container (instead of the .ssh folder) so the container can access your ssh-agent.
Try running with SSH_AUTH_SOCK undefined while keeping force_git_ssh_protocol set.
(no need to manually add the .ssh mount, it will do that for you)
Exactly 🙂
If you feel like PR-ing a fix, it will be greatly appreciated 🙂
I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init
continue_last_task='task_id'
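A minimal sketch (the ID string is a placeholder):
from clearml import Task

# Continue logging into a previously executed Task instead of creating a new one
task = Task.init(
    project_name="examples",            # placeholder names
    task_name="resumed-run",
    continue_last_task='task_id_here',  # placeholder: ID of the Task to continue
)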
Hi JitteryCoyote63 ,
upload_artifact was designed to upload pre-made artifacts, which actually covers everything.
With register_artifact we tried to have something that will constantly log a pandas DataFrame artifact; the use case was the examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination etc.
Not sure it is actually useful though ...
Retrieving an artifact from a Task is done by:
Task.get_task(task_id='aaa').artifacts['name'].get()
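End to end, a minimal sketch ('aaa' and the names are placeholders):
from clearml import Task

# Producing side: upload a pre-made artifact
task = Task.init(project_name="examples", task_name="producer")  # placeholder names
task.upload_artifact(name="results", artifact_object={"accuracy": 0.91})

# Consuming side: fetch it from the task by ID
producer = Task.get_task(task_id='aaa')        # placeholder task ID
results = producer.artifacts['results'].get()  # deserializes the stored object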
Regarding this, does this work if the task is not running locally and is being executed by the trains agent?
This line: "if task.running_locally():" makes sure that when the code is executed by the agent, it will not reset its own requirements (the agent updates the requirements/installed_packages after it installs them from the requirements.txt, so that later you know exactly which packages/versions were used).
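Illustratively, a sketch of the guard (set_packages stands in for whatever requirements-modifying call is being protected; names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="demo")  # placeholder names

if task.running_locally():
    # Runs only on the developer's machine; under a clearml-agent this is False,
    # so the installed_packages the agent records after pip install stay intact
    task.set_packages(["torch", "pandas"])  # stand-in requirements-setting call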
Hi GiddyTurkey39
Are you referring to an already executed Task or the current running one?
(Also, what is the use case here? Is it because the "installed packages" are inaccurate?)
GiddyTurkey39
A flag would be really cool, just in case there's any problem with the package analysis.
Trying to think whether this should be a system-wide flag (i.e. trains.conf) or a flag in task.init.
What do you think?
Hi @<1529633468214939648:profile|CostlyElephant1>
Is it possible to get the user ID of the current user?
On the Task.data object itself there should be a field named "user"; that's the user ID of the owner (creator) of the Task.
You can filter based on this ID with:
Task.get_tasks(..., task_filter={'user': ["user-id-here"]})
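A minimal sketch ('aaa' is a placeholder task ID):
from clearml import Task

# Read the owner's user ID off an existing task
task = Task.get_task(task_id='aaa')
owner_id = task.data.user

# List all tasks created by that user
tasks = Task.get_tasks(task_filter={'user': [owner_id]})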
wdyt?
Hmm, it seems as if the task.set_initial_iteration(0) is ignored...
What's the clearml version you are using?
Is it the same one you have on the local machine?
Hi @<1533620191232004096:profile|NuttyLobster9>
...but no system stats...
If the job is too short (I think 30 seconds), it doesn't have enough time to collect stats (basically it collects them over a 30 sec window, but the task ends before it sends them)
does that make sense?
AttractiveCockroach17 can you provide some insight on the pipeline creation?
Should be fairly easy to add, no?
SubstantialElk6 is this the pip to install the agent, or the pip the agent is using to install the packages for the specific experiment?
Let me check what the subsampling threshold is.
That would be great! Might have to use 2>/dev/null in some of my bash scripts
Feel free to test and PR :)
One other question regarding connecting: we have set up sshd inside the docker image we are using.
Actually the remote session opens port 10022 on the host machine (so it does not collide with the default ssh port).
It actually runs an additional sshd inside the docker, setting its port.
And the clearml-session will ssh directly into the container sshd...
owning the agent helps, but still it's much better if the credentials don't show up in logs,
They are not, they are always filtered out.
- how does force_git_ssh_protocol help please? It doesn't solve the issue of the agent simply not having access
It automatically maps the host's .ssh into the container, so that git can use SSH to clone.
What exactly is not working?
and how are you configuring it?