Hmm, that sounds like the agent needs to access a vault with per-user credentials. Unfortunately this is not covered in the open-source version 😞 I "think" this is supported in the enterprise version as part of the permission management.
So like a UI for creating pipelines doing different things on the different solutions ?
I ended up using task = Task.init(continue_last_task=task_id) to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument it will just create a new Task and auto-log everything to it 🙂
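A minimal sketch of the call discussed above, assuming the `clearml` package is installed and configured. The helper name `resume_task` and the project/task names passed to it are placeholders for illustration, not part of the original exchange.

```python
def resume_task(task_id, project_name, task_name):
    """Attach the current process to an existing task instead of creating a new one."""
    # Imported lazily so the helper can be defined without a configured ClearML setup.
    from clearml import Task

    # Passing a task ID string as continue_last_task continues that specific task,
    # so everything auto-logged from here on is added to it.
    return Task.init(
        project_name=project_name,
        task_name=task_name,
        continue_last_task=task_id,
    )
```

Usage would look like `task = resume_task(task_id, "my project", "my experiment")`, where both names are hypothetical.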
Can I change the parameters before executing the draft task
Yes you can, after you clone the experiment everything becomes editable, so you can edit the config in the UI.
For example, let's assume I have config.yml, and in my code I do:
my_file = task.connect_configuration('config.yml')
with open(my_file, 'rt') as f: ...
Then after I clone it in the UI and edit the configuration, when it is executed remotely, my_file will contain the content of the configuration as s...
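The pattern above can be sketched as a small helper. It assumes `task` is an already-initialized ClearML task; the function name `load_config_text` is made up for this example.

```python
def load_config_text(task, config_path="config.yml"):
    """Register a configuration file with the task and read back the effective content."""
    # connect_configuration returns a path to a local configuration file.
    # As described in the reply, on a remote run of a cloned task that file
    # holds the configuration as edited in the UI.
    my_file = task.connect_configuration(config_path)
    with open(my_file, "rt") as f:
        return f.read()
```

Reading from the returned path (rather than from 'config.yml' directly) is what lets the edited configuration take effect.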
I would clone the first experiment, then in the cloned experiment, I would change the initial weights (assuming there is a parameter storing that) to point to the latest checkpoint, i.e. provide the full path/link. Then enqueue it for execution. The downside is that the iteration counter will start from 0 and not the previous run.
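The same flow can also be done from code instead of the UI. This is only a sketch: the parameter name `General/initial_weights` is hypothetical (use whatever parameter your code reads the weights path from), and the helper name, task name and queue name are placeholders.

```python
def rerun_from_checkpoint(source_task_id, checkpoint_url, queue_name,
                          param_name="General/initial_weights"):
    """Clone an experiment, point it at a checkpoint, and enqueue it for an agent."""
    # Imported lazily so the helper can be defined without a configured ClearML setup.
    from clearml import Task

    # The clone is created as an editable draft.
    cloned = Task.clone(source_task=source_task_id, name="continue from checkpoint")
    # param_name is an assumption: it must match a parameter the original code uses.
    cloned.set_parameter(param_name, checkpoint_url)
    # Send the edited draft to an execution queue.
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```

As noted above, the iteration counter of the new run still starts from 0.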
Hi GrotesqueDog77
and after some time I want to delete artifact with
You can simply upload with the same local file name and same artifact name, it will override the target storage. wdyt?
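Roughly, the suggestion amounts to calling `upload_artifact` again with the same name. A minimal sketch, assuming `task` is an initialized ClearML task; the helper name `replace_artifact` is invented here.

```python
def replace_artifact(task, artifact_name, local_path):
    """Re-upload a file under an existing artifact name."""
    # Per the reply above, keeping both the artifact name and the local file
    # name unchanged means the new upload overrides the object in the target storage.
    return task.upload_artifact(name=artifact_name, artifact_object=local_path)
```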
Hi BoredPigeon26
what do you mean by "reuse the task" ? is this manual execution (i.e. from code)?
How about archiving the old version?
You can also force Task.init to always create a new Task (which preserves the previous run alongside the execution tab)
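One way to force a new Task on every run is the `reuse_last_task_id` argument of `Task.init`. A short sketch, with a made-up helper name and placeholder project/task names:

```python
def start_fresh_task(project_name, task_name):
    """Always create a new task instead of reusing a previous one."""
    # Imported lazily so the helper can be defined without a configured ClearML setup.
    from clearml import Task

    # reuse_last_task_id=False tells Task.init not to reuse an earlier task,
    # so each execution is kept as its own entry.
    return Task.init(
        project_name=project_name,
        task_name=task_name,
        reuse_last_task_id=False,
    )
```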
Basically what's the specific use case ?
ReassuredTiger98 could you provide more information ? (versions, scenario. etc.)
ShakyJellyfish91 can you check if version 1.0.6rc2 can find the changes?
Thanks! I think I was able to locate the issue, but I wanted to verify 🙂
No, I think it might be a glitch in the way the upload speed is calculated, nothing we can do 🙂
JitteryCoyote63 while it's running, could you give me a few details on the setup, maybe I can reproduce it.
Is it using pytorch distributed ?
Are all models uploaded to S3 ?
etc.
DeliciousBluewhale87
You could also just upload the data (i.e. do not call close). Then you will be able to change it later; obviously, this will make it intractable.
BTW: the clearml-data stores delta changes, so if you only change a few files it will only store those.
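With the Python SDK, "upload but do not close" would look roughly like the sketch below. It assumes that `Dataset.finalize()` is the SDK counterpart of the CLI `close` step; the helper name and arguments are placeholders.

```python
def upload_without_finalizing(dataset_name, dataset_project, folder):
    """Create a dataset version and upload files, leaving it open for later changes."""
    # Imported lazily so the helper can be defined without a configured ClearML setup.
    from clearml import Dataset

    ds = Dataset.create(dataset_name=dataset_name, dataset_project=dataset_project)
    ds.add_files(path=folder)
    ds.upload()
    # Assumption: finalize() is deliberately NOT called, so the dataset version
    # stays editable, at the cost of losing the guarantee that its content is fixed.
    return ds
```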
Hi, I was expecting to see the container rather than the actual physical machine.
It is the container; it should tunnel directly into it (or that's how it should be).
SSH port 10022
(BTW: any reason not to use the agent?)
Bugs, definitely GitHub, this is the easiest to track.
Documentation, if these are small issues, Slack is fine, otherwise, GitHub issue.
Regarding the documentation, we are working on another iteration of improvement, but if you find inaccuracies/broken links please report 🙂
IrritableJellyfish76 if this is the case, my question is what is the reason to use Kubeflow? (JupyterLab server spinning is a good answer, for example; pipelines are, in my opinion, a lot less so)
Hi JitteryCoyote63
What do you have in agent.cuda_version? (you can see it printed at the beginning of the log)
JitteryCoyote63 with pleasure 🙂
BTW: the Ignite TrainsLogger will be fixed soon (I think it's on a branch already by SuccessfulKoala55 ) to fix the bug ElegantKangaroo44 found. should be RC next week
Hi GiganticMole91
Do you mean something like a git ops triggered by PR / tag etc ?
Hi StoutGorilla30
Is it necessary to serve keras model using triton engine?
It is not, but it is the most efficient way to serve keras models, and this is why by default clearml-serving is using Nvidia Triton (we are talking 10x factors)
I would start with the keras example, see that it works, and then work your way into your example (notice you always need to provide the layers from the in/out of the model)
[None](https://github.com/allegroai/clearml-s...
It should actually work the same, if you find out it fails to properly register let me know (and then I guess a github issue is the next step)
I guess we should have obfuscated the name better 😄
Is the agent itself registered on the clearml-server (a.k.a can you see it in the UI?)
Wait who is creating this file? I thought you remove it in the uncommitted changes
If I were to push the private package to, say, Artifactory, is it possible to use that to do the install?
Yes that's the recommended way 🙂
You add the private repo here, for the agent to use:
https://github.com/allegroai/clearml-agent/blob/e93384b99bdfd72a54cf2b68b3991b145b504b79/docs/clearml.conf#L65