Hmmm, why don't you use "series"?
(Notice that with iterations, there is a limit to the number of images stored per title/series, which is configurable in trains.conf, in order to avoid debug sample explosion)
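If I recall correctly, the relevant setting in trains.conf looks roughly like this (key name from memory, so please treat it as an assumption):
sdk {
    metrics {
        # how many debug samples are kept per title/series before older ones are recycled (assumed key)
        file_history_size: 100
    }
}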
GiddyTurkey39 my bad, try this one:
task._update_requirements({})
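For context, this is roughly where that call would go (note that _update_requirements is an internal method, so treat this as a sketch of the usage rather than a documented API; the project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="no auto requirements")  # placeholder names
# passing an empty dict overrides the automatically detected package requirements
task._update_requirements({})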
And you pass:
scheduler.add_task(..., reuse_task=True)
?
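Something along these lines, assuming the TaskScheduler interface I remember (the task ID, queue name and schedule are all placeholders):
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
scheduler.add_task(
    schedule_task_id="aabbccddeeff00112233445566778899",  # placeholder task ID
    queue="default",  # placeholder queue name
    hour=1,           # placeholder schedule
    reuse_task=True,  # re-enqueue the same task instead of cloning it every run
)
scheduler.start()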
Can you send the console output of this entire session please ?
In order to clone the Task it needs to complete sync, which implies closing. I guess the use case for execute remotely while still running was not considered. How / why is this your workflow? Specifically how does Jupyter get into the picture?
The problem is of course filling in all the configuration details, so that they are viewable.
Other than that, check out:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.import_task
Sounds good ?
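For example, something like this sketch (project / task names are placeholders):
from clearml import Task

# export the full task definition (including its configuration) as a dict
source = Task.get_task(project_name="examples", task_name="my experiment")  # placeholder names
task_data = source.export_task()

# ...edit task_data as needed, then recreate it as a new task
new_task = Task.import_task(task_data)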
I think this one is on us, I don't think a search would have led you to the correct answer ...
I'll try to make sure they add something regarding the configuration
where is it persisted? if I have multiple sessions I want to persist, is that possible?
On the file server; yes, it should support that. You can specify --continue-session to continue a previously used one.
Notice it does delete older "snapshots" (i.e. previous workspace) when you are continuing a session (use --disable-session-cleanup to disable it)
Hi ReassuredTiger98
Could you add some prints before / after the artifact upload?
Also what's the clearml version you are using ?
My typos are killing us, apologies:
Change -t to -it; it will make it interactive (i.e. you can use bash).
If the load balancer (i.e. the gateway) can do the computation and leverage caching,
Oh that's true. But unfortunately it is out of scope for the open source (well, at the end of the day someone needs to pay our salaries).
I'd prefer not to have our EC2 instance directly exposed to the public Internet.
Yep, I tend to agree.
Hi GiddyTurkey39 ,
When you say trains agent, are you referring to the trains agent command ...
I mean running the trains-agent daemon on a machine. This means you have a daemon pulling jobs from the execution queue and executing them (either in a virtual environment or inside a docker container).
You can read more about https://github.com/allegroai/trains-agent and https://allegro.ai/docs/concepts_arch/concepts_arch/
Is it sufficient to queue the experiments
Yes there is no ne...
Hi WickedBee96
How can I do that?
clearml-task
https://clear.ml/docs/latest/docs/apps/clearml_task#what-is-clearml-task-for
The only way I know to run it in the agent is by enqueuing the draft after running it on my local machine, so is there another way?
Or maybe you are looking for task.execute_remotely
https://clear.ml/docs/latest/docs/references/sdk/task#execute_remotely
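i.e. something like this (project, task and queue names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="remote run")  # placeholder names
# stop the local run here and enqueue this task for an agent to execute
task.execute_remotely(queue_name="default", exit_process=True)
# everything below this line only runs on the agent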
Bummer... that seems like a bit of an oversight tbh.
There is no real solution for those, unless the helm chart "knows" something about the server before spinning it up the first time, which basically means a predefined access key, and I do not think we want that.
HighOtter69
Could you test with the latest RC? I think this fixed it:
https://github.com/allegroai/clearml/issues/306
Hi @<1547028116780617728:profile|TimelyRabbit96>
It should process the new request A (this is a multi-threaded / async implementation).
Is this consistent with what you are seeing ?
MagnificentSeaurchin79 you can delay it with:
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
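i.e. right after Task.init, something like this (project/task names are placeholders; 1800 means wait 30 minutes):
from clearml import Task

task = Task.init(project_name="examples", task_name="resource monitor delay")  # placeholder names
# wait 30 minutes before the resource monitor switches to iteration-based reporting
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)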
Okay that seems to explain it. Now the question is why it installed it in the wrong place.
I am not sure this is related to the fact the model is not correctly converted to TorchScript
Because Triton only supports TorchScript (not torch models).
So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
Yeah, I think that for some reason it fails to detect that this is actually a Jupyter notebook (not really sure why). Thank you for double checking on the container!
This seems to be more complicated than it looks (UI/backend combination). Not that we are not working on it, just that it might take some time, as it passes control to the backend (which by design does not touch external storage points).
Maybe we should create an S3 cleanup service, listing buckets and removing if the Task ID does not exist any longer. wdyt?
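Very rough sketch of what I mean (the bucket name, the key layout and the assumption that the 32-character task ID appears in the object key are all hypothetical):
import re
import boto3
from clearml import Task

s3 = boto3.client("s3")
bucket = "my-clearml-artifacts"  # placeholder bucket name

# assumption: task IDs (32 hex chars) appear somewhere in the object key
task_id_pattern = re.compile(r"\b[0-9a-f]{32}\b")

for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        match = task_id_pattern.search(obj["Key"])
        if not match:
            continue
        try:
            Task.get_task(task_id=match.group(0))
        except Exception:
            # task no longer exists, so the stored object is orphaned and can be removed
            s3.delete_object(Bucket=bucket, Key=obj["Key"])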
Thus, the return data from step 2 needs to be available somewhere to be used in step 3.
Yep.
Will it serialize the data in the dict?
I thought it would just point to a local file location where you have the data.
I didn't know that each step runs in a different process
Actually! You can run them as functions as well, try:
if __name__ == '__main__':
    PipelineDecorator.debug_pipeline()
    # call pipeline function here
It will just run them as functions (ret...
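For context, a minimal sketch of how that looks with a decorated pipeline (the function and parameter names are made up):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["doubled"])
def step(x):
    return x * 2

@PipelineDecorator.pipeline(name="debug example", project="examples", version="0.1")
def pipeline(x=1):
    print(step(x))

if __name__ == "__main__":
    # run all the steps as plain local functions (no subprocesses, no agents)
    PipelineDecorator.debug_pipeline()
    pipeline(x=3)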
Maybe before everything else, can you share some background on the rationale for starting a new subprocess?
Hmm are you getting the warning on the client side, or in the clearml-server ?
It was set to true earlier; I changed it to false to see if there would be any difference, but it doesn't seem like it.
I would actually just add:
Task.add_requirements('google.cloud')
before the Task.init call (notice, it has to be before the init call)
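i.e. something like this (only the package name is taken from this thread, the project/task names are placeholders):
from clearml import Task

# has to be called before Task.init so it is added to the recorded requirements
Task.add_requirements('google.cloud')

task = Task.init(project_name="examples", task_name="gcs upload")  # placeholder names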
TenseOstrich47 you can actually enter this script as part of the extra_docker_shell_script
This will be executed at the beginning of each Task inside the container, and as long as the execution time is under 12h, you should be fine. wdyt?
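For reference, in the agent's clearml.conf this would look roughly like the following (the actual script lines are placeholders):
agent {
    # executed at the beginning of every Task, inside the container
    extra_docker_shell_script: [
        "apt-get update",
        "bash /path/to/your_setup_script.sh",
    ]
}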
Hi JuicyFox94
I think you are correct, this bug will explain the entire thing.
Basically what happens is that execute_remotely stops the local run before the configuration is set on the Task. Then, when running remotely, the code pulls the configuration, sees that it is empty, and does nothing.
Let me see if I can reproduce it...