Well, I guess you can say this is definitely not a self-explanatory line,
but it is actually asking whether we should extract the code. Think of it as:
if extract_archive and cached_file:
    return cls._extract_to_cache(cached_file, name)
Command-line arguments for the arg parser should be passed via the "Args" section in the Configuration tab.
What is the working directory on the experiment ?
We should probably change it so it is more human readable
Hi @<1657918706052763648:profile|SillyRobin38>
I have included some print statements
you should see those under the Task of the inference instance.
You can also do:
import clearml
...
def preprocess(...):
    clearml.Logger.current_logger().report_text(...)
    clearml.Logger.current_logger().report_scalar(...)
, specifically within the containers where the inferencing occurs.
it might be that fastapi is capturing the prints...
[None](https://github.com/tiangolo/uvicor...
It's just another flag when running the trains-agent
You can have multiple service-mode instances, there is no actual limit
WackyRabbit7 I do 'pkill -f trains' but it's the same... If you need to debug and test run with --foreground and just hit ctrl-c to end the process (it will never switch to background...). Helps?
so if the node went down and then some other node came up, the data is lost
That might be the case. where is the k8s running ? cloud service ?
That is quite neat! You can also put a soft link from the main repo to the submodule for better visibility
First I would check the CLI command it will basically prefill it for you:
https://clear.ml/docs/latest/docs/apps/clearml_task
Specifically to your question, working directory "." is the root of the git repo
But I would avoid adding it manually. Use the CLI: it will either ask you to provide the info or take the git repo details from the local copy
Exporter would be nice, I agree. Not sure it is on the roadmap at the moment
Should not be very complicated to implement if you want to take a stab at it.
Hmm, you can delete the artifact with:
task._delete_artifacts(artifact_names=['my_artifact'])
However this will not delete the file itself.
To delete the file I would do:
from clearml.storage.helper import StorageHelper
remote_file = task.artifacts['delete_me'].url
h = StorageHelper.get(remote_file)
h.delete(remote_file)
task._delete_artifacts(artifact_names=['delete_me'])
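Until there is a proper interface, the two steps above could be wrapped in a small helper. This is only a sketch: the name delete_artifact_and_file and the helper_factory parameter are hypothetical (added so it can be tried without a server), and task._delete_artifacts is a private method that may change between clearml versions.

```python
def delete_artifact_and_file(task, name, helper_factory=None):
    """Delete the stored file behind an artifact, then unregister the artifact.

    `helper_factory` (hypothetical parameter) maps a URL to an object with a
    `delete(url)` method; it defaults to clearml's StorageHelper.get.
    """
    if helper_factory is None:
        # Imported lazily so the function can be exercised without clearml installed.
        from clearml.storage.helper import StorageHelper
        helper_factory = StorageHelper.get

    if name not in task.artifacts:
        return False  # nothing registered under that name

    remote_file = task.artifacts[name].url
    # First remove the file from storage...
    helper_factory(remote_file).delete(remote_file)
    # ...then remove the artifact entry from the task (private API).
    task._delete_artifacts(artifact_names=[name])
    return True
```

With a real task you would call delete_artifact_and_file(task, 'delete_me') and let it fall back to StorageHelper.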
Maybe we should have a proper interface for that? wdyt? what's the actual use case?
Hi @<1523701949617147904:profile|PricklyRaven28>
Sorry, we missed that one
we need to invoke it with "accelerate launch", so we use subprocess.run
So you have two options: either you change the script entry of the Task from "script.py" to "-m accelerate launch script.py",
or you manually do that inside your entry point (i.e. call accelerate launch)
BTW, I "think" we added an "auto detect" for it, so that if you launched it manually this wa...
How does this work in the context of a pipeline?
Is your pipeline from functions / decorators ? or is it from Tasks ?
(if this is Tasks, then just change the entry point in the overrides)
In case of functions or decorators, you have to do that manually (i.e. your function needs to do "accelerate launch")
from accelerate.commands.launch import launch_command, launch_command_parser
parser = launch_command_parser()
args = parser.parse_args("-command -here".split())
launch_command(arg...
We used subprocess for it, ...
Popen? os.system? fork?
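If you go the subprocess route from inside a pipeline function, a minimal sketch could look like this. It assumes accelerate is installed in the environment the function runs in, and it uses "python -m accelerate.commands.launch", the module form of "accelerate launch". The function names build_accelerate_command and run_with_accelerate are hypothetical.

```python
import subprocess
import sys


def build_accelerate_command(script, script_args=(), launch_args=()):
    """Build the argv for running `script` through accelerate's launcher."""
    return [
        sys.executable, "-m", "accelerate.commands.launch",
        *launch_args,   # options for the launcher itself
        script,
        *script_args,   # options forwarded to the training script
    ]


def run_with_accelerate(script, script_args=(), launch_args=()):
    """Run the script in a child process and raise if it exits non-zero."""
    cmd = build_accelerate_command(script, script_args, launch_args)
    return subprocess.run(cmd, check=True)
```

Using sys.executable keeps the child process on the same interpreter (and virtual environment) as the pipeline step, and check=True makes the step fail if the launched script fails.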
Hi @<1523701260895653888:profile|QuaintJellyfish58>
Is there a way or a trigger to detect when the number of workers in a queue reaches zero?
You mean to spin them down? What's the rationale?
I'd like to implement a notification system that alerts me when there are no workers left in the queue.
How are they "dropping" ?
Specifically to your question, let me check. I'm sure there is an API that gets that data, because you can see it in the UI
If nothing specific comes to mind I can try to create some reproducible demo code (after holiday vacation)
Yes please!
In the meantime, see if the workaround is a valid one
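Whatever the exact API call turns out to be, the check itself is simple. Here is a rough sketch that assumes you already fetched the worker list and that each worker is a dict with a "queues" list of {"name": ...} entries. That shape is an assumption, so verify the field names against the server's workers API before relying on it. The function name queues_without_workers is hypothetical.

```python
def queues_without_workers(workers, queue_names):
    """Return the queue names (sorted) that no worker is currently serving.

    `workers` is an iterable of dicts shaped like
    {"id": "host:gpu0", "queues": [{"name": "default"}]} (assumed shape).
    """
    served = {
        queue["name"]
        for worker in workers
        for queue in worker.get("queues") or []  # tolerate missing or None
    }
    return sorted(set(queue_names) - served)
```

A small script could call this on a schedule and send the alert whenever the returned list is not empty.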
would those containers best be started from something in services mode?
Yes as long as the machine has enough cpu/ram
Notice that the services mode will start a second parallel Task after the first one is done setting up the env. If you are running with CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL and containers that have git/python/clearml-agent preinstalled, the overhead should be minimal.
or is it possible to get no-overhead with my approach of worker-inside-docker?
No do not do that, see above e...
I can see all the steps like git clone,
git clone has nothing to do with "env setup", this is bringing the code, and you cannot skip that one. That said, this is why the git repo itself is cached on the host machine, so it is fast
... There may be some odd package that need to be installed because one of our DS is experimenting ... But all that we can see what is happening.
even if everything is preinstalled, it verifies the packages match, and this might take a long time. It's just pip being ...
Hi @<1523715429694967808:profile|ThickCrow29>
I am using the PipelineController with abort_on_failure set to False.
Is this a pipeline from code or from Tasks?
What is the clearml version?
Lastly, if a component fails, and another component is dependent on its output, how would it run? If it is not dependent, why is it a child component?
I am trying to use the configuration vault option but it doesn't seem to apply the variables I am using.
Hi EmbarrassedSpider34 I think this is an enterprise feature...
Managed to make the credentials attached to the configuration when the task is spun up,
I'm assuming env variables ?
Hmm HandsomeGiraffe70
This seems like a bug, let me see what we can do about that
could it be the parent version was created with an older version of clearml sdk ?
Hmm this is odd, when you press on the parent dataset in the UI, and go to full-details, then the INFO tab. Can you copy here everything ?
if the first task failed - then the remaining tasks are not scheduled for execution, which is what I expect.
agreed
I'm just surprised that if the first task is aborted instead by the user,
How is that different from failed? The assumption is if a component depends on another one it needs its output, if it does not then they can run in parallel. What am I missing?
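To make that assumption concrete, here is a toy sketch of the rule: a step is skipped when it, or anything it depends on, failed or was aborted, and everything else can still run. This is only an illustration of the dependency logic, not ClearML's actual scheduling code, and the function name runnable_steps is hypothetical.

```python
def runnable_steps(parents, not_completed):
    """Return the steps that can still run, given steps that failed or were aborted.

    `parents` maps each step name to the list of steps it depends on.
    A step is skipped if it, or any of its ancestors, did not complete.
    """
    skipped = set(not_completed)
    changed = True
    while changed:  # propagate the skip down the graph until nothing changes
        changed = False
        for step, deps in parents.items():
            if step not in skipped and any(dep in skipped for dep in deps):
                skipped.add(step)
                changed = True
    return sorted(step for step in parents if step not in skipped)
```

Under this rule it makes no difference whether the parent failed or was aborted by the user: in both cases its output is missing, so its children cannot run.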
@<1523715429694967808:profile|ThickCrow29> this is odd... how did you create the pipeline? can you provide code sample?
Well, PipelineDecorator actually allows you to do the same thing, with the same ability that is clone / modify / enqueue.
(I mean, Pipeline with tasks is also great, I just want to clarify that they have the same capabilities in this respect).
I know about clearml.conf but wanted to avoid ssh-ing through 50 instances to edit it.
LOL yeah, btw: this is exactly the reason the enterprise version has a vault feature, so one could edit the base configuration in the UI and it automatically propagates everywhere
but docker_arguments doesn't propagate if I leave docker_image as None
yeah, that's correct, you have to select a container to be used
The upload itself is in the background.
It should not take long to prepare the plot for sending. Are you experiencing a major delay ?
Local changes are applied before installing requirements, right?
correct