
Hi FierceHamster54
The dataset download is already multi-threaded
But yes, get_local_copy() is thread/process safe
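For example, something along these lines should be safe (a rough sketch, the dataset id is a placeholder):

from concurrent.futures import ThreadPoolExecutor
from clearml import Dataset

def fetch(dataset_id):
    # every thread gets (or reuses) the same cached local copy
    return Dataset.get(dataset_id=dataset_id).get_local_copy()

with ThreadPoolExecutor(max_workers=4) as pool:
    paths = list(pool.map(fetch, ["<dataset_id>"] * 4))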
Right! I just noticed that! This is odd... and yes, it definitely has something to do with the multiple pipelines executed on the agent. I think I know what to look for...
(just making sure (again), running_locally produced exactly what we were expecting, is that correct?)
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
task.close()
will do that
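i.e. something like this at the end of the notebook (a minimal sketch):

from clearml import Task

task = Task.init(project_name="examples", task_name="colab test")
# ... your notebook code ...
task.close()  # flushes everything and marks the task completed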
BTW what's the exception you are getting ?
Hi @<1627478122452488192:profile|AdorableDeer85>
Are you referring to running the pipeline on a remote machine ? could you provide the full Task/Pipeline log ?
Hi @<1600661423610925056:profile|StrongMouse81>
using the serving base url and also another model endpoint we added using:
clearml-serving model add
we get the attached response:
And other model endpoints are working for you?
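Just for reference, a typical model add call looks something like this (based on the clearml-serving examples; exact flags may differ between versions):

clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "preprocess.py" --name "train sklearn model" --project "serving examples"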
Nice guys! Notice that clearml-task can auto-add the Task.init call on the fly, so you can connect any arbitrary Task and control the argparser arguments (again, as parameters to clearml-task)
BTW: A fix for the --task-type issue will be pushed later today
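For example (a sketch, the repo/queue here are placeholders):

clearml-task --project examples --name remote_run --repo https://github.com/user/repo.git --script train.py --args batch_size=64 epochs=10 --queue default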
Hi LonelyMoth90 , where exactly are you getting the error ? Is it trains-agent running your experiment ?
Hi MoodyCentipede68 , I think I saw something like it, can you post the full log? The Triton error is above it; also I think it restarted the container automatically and then it worked
@<1787653555927126016:profile|SoggyDuck67> notice the binary
field in the Task "execution" tab; if for some reason it says "python3.10", it will try to use python 3.10 when running it.
That said, if it does not find the requested python version, it should output a warning and default to the installed python.
If you can provide the full log, it would be helpful to see what happened there
WackyRabbit7 maybe this is a better approach, as you will get a new object (instead of the runtime copy that is being used)
Building the pipeline in runtime from external configuration is very cool!!
I think nested components is exactly the correct solution, and it is a great use case.
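e.g. a minimal sketch of building the steps at runtime with the decorator interface (all names here are made up):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step(data, name):
    # one generic component, parametrized from the external configuration
    return data + [name]

@PipelineDecorator.pipeline(name="runtime pipeline", project="examples", version="0.1")
def run(step_names=("load", "train", "evaluate")):
    data = []
    for name in step_names:  # pipeline structure decided at runtime
        data = step(data, name)
    return data

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug locally; remove to launch on agents
    run()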
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC? (I remember a fix just for that)
pip install clearml-agent==1.2.0rc0
This is an odd error, could it be conda is not installed in the container (or not in the PATH)?
Are you trying with the latest RC?
I have the agent configured to force install requirements.txt
what do you mean by that?
right click on the experiment, select Reset, now you can edit it.
Shout-out to Emilio for quickly stumbling on this rare bug and letting us know. If you have a feeling your process is stuck on exit, just upgrade to 1.0.1
you could also use:
https://github.com/allegroai/clearml/blob/ce7e77a00e869a2690f31cbc578636ce88bc4613/docs/clearml.conf#L188
and set up the clearml.conf
on the user's machine to automatically log the environment variables at runtime (stored under the Configuration tab).
Then the agent will pull these same variables at execution time and set them
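e.g. something like this in clearml.conf (assuming the setting in the link above is still log_os_environments):

sdk {
    development {
        # environment variables matching these patterns are logged at runtime
        log_os_environments: ["AWS_*", "MYAPP_*", "CUDA_VERSION"]
    }
}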
Hi @<1569858449813016576:profile|JumpyRaven4>
- The gunicorn logs do not show anything, including any error or trace of the 502; only siege reports the 502, as well as the ALB.
Is this an ALB or an ELB?
What's the timeout it's configured with?
Do you have GPU instances as well? What's the clearml-serving-inference
docker version?
How can I specify the agent to use a specific conda environment inside the docker?
Hi CrookedWalrus33
By default it will pick the highest python in the PATH.
Then if you have a python version (in PATH) that matches the one requested on the Task, it will use it.
Do you want to limit it to a specific python binary ?
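If you do, I think you can pin it in the agent's clearml.conf (a sketch; the conda env path is just an example):

agent {
    # force the agent to use this exact interpreter instead of auto-detecting one
    python_binary: "/opt/conda/envs/myenv/bin/python3.9"
}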
How can I make a task that does a helm install or kubectl create deployment.yaml?
The task that it launches should have your code that actually does the helm deployments and other things. Think of the Task as a way to launch a script that does something; that script can then just interact with the cluster. The queue itself (i.e. clearml-agent) will not directly deploy helm charts, it will only deploy jobs (i.e. pods)
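For example, a minimal sketch of such a script (chart/manifest names are hypothetical):

import subprocess
from clearml import Task

task = Task.init(project_name="ops", task_name="deploy")
# the agent only launches this script; the script itself talks to the cluster
subprocess.run(["helm", "install", "my-release", "./my-chart"], check=True)
subprocess.run(["kubectl", "apply", "-f", "deployment.yaml"], check=True)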
Hi GiganticTurtle0
Is there a simple way to make Task.init compatible with the Dask.distributed client?
Please tell me more
I think Dask is trying to pickle your Task object (which is not picklable).
You can however create the Task once with Task.init
and pass the Task ID to the child processes and then use Task.init(..., continue_last_task=task_id_here)
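Something along these lines (a rough sketch, I have not tested it with Dask):

from clearml import Task
from dask.distributed import Client

task = Task.init(project_name="examples", task_name="dask run")
task_id = task.id  # a plain string, safe to pickle

def worker_fn(x, task_id):
    # re-attach to the same Task inside the worker process
    worker_task = Task.init(project_name="examples", task_name="dask run",
                            continue_last_task=task_id)
    worker_task.get_logger().report_scalar("metric", "x", value=x, iteration=x)
    return x * 2

client = Client()
results = client.gather(client.map(worker_fn, range(4), task_id=task_id))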
wdyt?
WackyRabbit7
Long story short, yes, only by name (hashing might be too slow on large files)
The easiest solution, if the hash is incorrect, delete the local copy it returns, and ask again, it will download it.
I'm not sure if the hashing is exposed, but if it is not, we can add it.
What do you think?
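i.e. something like this (a sketch, assuming this is StorageManager.get_local_copy; verify_hash is your own check):

import os
from clearml import StorageManager

local_path = StorageManager.get_local_copy(remote_url="s3://bucket/data.csv")
if not verify_hash(local_path):  # your own hash/size check, not part of clearml
    os.remove(local_path)        # drop the stale cached copy
    local_path = StorageManager.get_local_copy(remote_url="s3://bucket/data.csv")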
However, I have not yet found a flexible solution other than ssh-agent forwarding.
And is it working?
Hi JitteryCoyote63 when you run the trains-agent it tells you where it puts the logs, it's a temp auto-generated filename, usually under /tmp/
Running TRAINS-AGENT daemon in background mode, writing stdout/stderr to /tmp/.trains_agent_daemon_out4uahki3i.txt
and they don't know how to write code, is this still possible?
well, this means there is some standard of the data, right? What is that standard? Unfortunately in our space there is no standard for data, it's just too generic, so everyone always ends up with custom parsing of a sort.
Does that make sense ?