That's the question I want to raise too,
No file size limit
Let me try to run it myself
Do you have a specific numpy version you are installing? Why is it trying to build the wheel from source?
So I had to add it explicitly via a docker init script
Oh yes, that makes sense, can't think of a better hack other than sys.path.append(os.path.join(os.path.dirname(__file__), "src"))
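Something like this, as a runnable sketch (my_module is a hypothetical module living under src/):

```python
import os
import sys

# make the local "src" folder importable (the hack from above)
sys.path.append(os.path.join(os.path.dirname(__file__), "src"))

import my_module  # hypothetical module that lives under src/
```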
HurtWoodpecker30 In order to activate the venv cache, the agent uses the full "pip freeze" it stores in the Task's "installed packages" section; this means that when you clone a Task that was already executed, you will see it using the cached venv.
(BTW: the packages themselves are also cached locally, meaning no time is spent on downloading, only on installing, but installing is also time consuming, hence the full venv cache feature.)
Make sense?
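For reference, a sketch of the relevant section in the agent's clearml.conf (field names follow the sample config shipped with clearml-agent; values are just examples):

```
agent {
    venvs_cache: {
        max_entries: 10
        free_space_threshold_gb: 2.0
        # setting a path is what enables the full venv cache
        path: ~/.clearml/venvs-cache
    }
}
```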
Is it normal that it's slower than my device, even though the agent's machine is much more powerful? Or is it just because the code is so simple?
Could be the agent is not using the GPU for some reason?
Found it, definitely a bug in the callback; it has no effect on the HPO process itself
I called task.wait_for_status() to make sure the task is done
This is the issue, I will make sure wait_for_status() calls reload at the end, so when the function returns you have the updated object
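Until then, a minimal workaround sketch (the task id is a placeholder):

```python
from clearml import Task

task = Task.get_task(task_id="<task-id>")  # placeholder id
task.wait_for_status()  # blocks until the Task reaches a final state
task.reload()           # refresh the local object so its fields are up to date
print(task.get_status())
```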
what does it mean to run the steps locally?
start_locally: means the pipeline code itself (the logic that runs/controls the DAG) runs on the local machine (i.e. no agent), but this control logic creates/clones Tasks and enqueues them; for those Tasks you need an agent to execute them
run_pipeline_steps_locally=True: means that instead of enqueuing the Tasks the pipeline creates and having an agent run them, they will be launched on the same local machine (think debugging, other...
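A minimal sketch of the two modes (pipeline/project names are placeholders):

```python
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) calls go here ...

# control logic runs here, steps are enqueued for an agent to execute:
pipe.start_locally()

# or: control logic AND the steps run on this machine (debugging):
# pipe.start_locally(run_pipeline_steps_locally=True)
```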
So the main difference is that Kedro pipeline steps are function based (I might be overly simplifying, so please take it with a grain of salt), while in ClearML a pipeline step is a Job, i.e. it needs its own environment and runs longer than a few seconds (as opposed to a single function)
Hangs there? Could it be that it's uploading slowly?
Can you check the network ?
You can check the example here, just make sure you add the callback and you are good to go 🙂
https://github.com/allegroai/trains/blob/master/examples/frameworks/keras/keras_tensorboard.py#L107
these are being repeated as well for a single task (this is training a t5_model with transformers):
Seems like someone is storing lots of files with torch.save
that ClearML automatically logs.
You can disable the autolog: task = Task.init(..., auto_connect_frameworks={'pytorch': False})
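For example (project/task names are placeholders):

```python
from clearml import Task

# disable the automatic logging of PyTorch checkpoints (torch.save / torch.load)
task = Task.init(
    project_name="examples",
    task_name="t5 training",
    auto_connect_frameworks={"pytorch": False},
)
```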
UnevenDolphin73 FYI: clearml-data is documented, unfortunately only on GitHub:
https://github.com/allegroai/clearml/blob/master/docs/datasets.md
MagnificentSeaurchin79
Can this be solved by using a docker image with the preinstalled packages at a user level?
Yes 🙂
BTW: I think I missed how you managed to install the object_detection API in the first place?
Is it the git repo of the Task? did you fork it? is it a submodule of your git repo?
p.s.
Yes, Slack is quite good at reminding you, but generally speaking always prefer @, it will send me an email if I miss the message :)
BoredHedgehog47 you need to make sure "<path here>/train.py" also calls Task.init (again no need to worry about calling it twice with different project/name)
The Task.init call will make sure the auto-connect works.
BTW: if you do os.fork, then there is no need for the Task.init call; the main difference is that Popen starts a whole new process, and we need to make sure the newly created process is auto-connected as well (i.e. calling Task.init)
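A sketch of the Popen case (file/project names are placeholders):

```python
# parent.py
import subprocess
import sys

from clearml import Task

task = Task.init(project_name="examples", task_name="parent run")

# Popen starts a brand new process, so train.py must call Task.init itself
# in order to be auto-connected (repeating the call is harmless)
subprocess.Popen([sys.executable, "train.py"]).wait()
```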
Very lacking with regard to how things interact with one another
If I'm reading it correctly, what you are saying is that some of the "big picture" / holistic approach on how different parts interact with one another is missing, is that correct?
I think ClearML would benefit itself a lot if it adopted a documentation structure similar to numpy ecosystem
Interesting thought, what exactly would you suggest we "borrow" in terms of approach?
Specifically for this one, this is the auto-generated docstring from the actual code, so a PR to the code here would fix it:
https://github.com/allegroai/clearml/blob/e53a76b713910adaf87578c69e86f8154d4ab4c1/clearml/logger.py#L152
You are correct, the agent will clone the git repo and install the requirements, as written in the Task's installed packages section. Regarding the git branch, notice it will pull the specific commit id as stated in the execution section, and it will apply any uncommitted changes. You can edit the execution section and change the commit to the latest in a specific branch (you should probably also clear the uncommitted changes if you do that)
Hi TightElk12
would like to understand the limitations of Task.current_task()
Basically this will always get you an instance of the current Task. This will work from sub-processes as well as the main process. Is there a specific scenario you have in mind, or a challenge with the use case?
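For example, from anywhere in the code (main process or sub-process):

```python
from clearml import Task

task = Task.current_task()  # returns the running Task instance, or None
if task is not None:
    task.get_logger().report_text("reporting from a sub-process")
```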
could one also limit the number of CPU cores available?
If you are running in docker mode you can add: --cpus=<value>
see ref here: https://docs.docker.com/config/containers/resource_constraints/
Just add it to extra_docker_arguments:
https://github.com/allegroai/clearml-agent/blob/2cb452b1c21191f17635bcb6222fa8bfd82afe29/docs/clearml.conf#L142
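i.e. something like this in the agent's clearml.conf (the value is just an example):

```
agent {
    # passed verbatim to "docker run"; this limits the container to 2 CPUs
    extra_docker_arguments: ["--cpus=2"]
}
```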
So, good news: (1) a dashboard is being worked on as we speak. (2) We released clearml-serving doing exactly that; the next release of clearml-serving will include integration with kfserving (under the hood), essentially managing the serving endpoints on top of the k8s cluster. wdyt?
Requested version: 2.28, Used version 1.0" for some reason
This is fine, it means there is no change in that API
In the documentation it warns about .close():
"Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough; it means you no longer need to automatically Add/Log/Track things into the Task from the current process.
This does Not mean you cannot access the Task or its artifacts
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
Thank you @<1523701949617147904:profile|PricklyRaven28> !!!
Let me see if we can reproduce and how to solve it
Would it also be possible to query based on multiple user properties?
Multiple key/value pairs I think are currently not that easy to query,
but multiple tags are quite easy to do:
tags=["__$all", "tag1", "tag2"]
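For example with Task.get_tasks (the project name is a placeholder):

```python
from clearml import Task

# "__$all" means a Task must carry ALL the listed tags to match
tasks = Task.get_tasks(
    project_name="examples",
    tags=["__$all", "tag1", "tag2"],
)
```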
Hi MinuteGiraffe30
Thank you so much for your awesome product!
🙂 !
s address 10.68.167.10. I am able to send requests from all other virtual machines on the server to the address 10.68.167.10:8008. However, when I try to do this from my own computer connected to the corporate network via VPN, it fails to connect to 8008.
I'm assuming there is a firewall on the VPN connection itself (i.e. the VPN gateway) that blocks 8008 port, as you already tried curl to 8008 is...
Won't they be printed out as well in the Web UI?
They would in the log, but it will not be stored back on the Task (the idea is these are "agent specific" additions no need for them to go with the Task)
So I've tried the approach and it does work,
ScantChimpanzee51 What do you mean it does not work? What exactly are you trying with task.connect that does not work?
Is there a way to inject environment variables into a Task or into its container?
Yes you can with:
task.s...
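The snippet above is cut off; one way I know of (a sketch, not necessarily the original suggestion) is set_base_docker, whose docker arguments are forwarded to "docker run" (image and variable names are placeholders):

```python
# sketch: inject an environment variable into the Task's docker container
task.set_base_docker(
    docker_image="python:3.9",              # hypothetical image
    docker_arguments="-e MY_VAR=my_value",  # forwarded to "docker run"
)
```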
SubstantialElk6 when you say "Triton does not support deployment strategies" what exactly do you mean?
BTW: updated documentation already up here:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving