
JitteryCoyote63 fix should be pushed later today 🙂
Meanwhile you can manually add the Task.init() call at the top of the original script, it is basically the same 🙂
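e.g. a minimal sketch (project/task names here are just illustrative):
```python
from clearml import Task

# add this at the very top of the original script
task = Task.init(project_name="examples", task_name="my experiment")  # illustrative names

# ... rest of the original script unchanged ...
```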
Hi @<1547028116780617728:profile|TimelyRabbit96>
Notice that if you are running with docker compose you can pass an argument to the clearml triton container and use shared memory. You can do the same with the helm chart.
Hi HappyDove3
task.set_script is a great way to add the info (assuming the .git is missing)
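A rough sketch of what that could look like (the repo URL, entry point and the exact set_script argument names here are assumptions):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="manual repo info")  # illustrative names

# manually attach the git information that auto-detection could not find
task.set_script(
    repository="https://github.com/your-org/your-repo.git",  # placeholder URL
    branch="main",
    working_dir=".",
    entry_point="train.py",  # placeholder entry point
)
```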
Are you running it using PyCharm? (If so, use the clearml pycharm plugin; it basically passes the info from your local git to the remote machine via OS environment variables)
Could be nice to write some automation
Hi @<1618056041293942784:profile|GaudySnake67>
Task.create is designed to create an external Task, not one from the current running process. Task.init is for creating a Task from your current code, and this is why you have all the auto_connect parameters. Does that make sense?
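Roughly, a sketch of the difference (names, repo URL and arguments are illustrative):
```python
from clearml import Task

# Task.init: creates a Task from the code that is currently running,
# with auto-logging (the auto_connect_* parameters control what gets captured)
task = Task.init(
    project_name="examples",
    task_name="current process task",
    auto_connect_frameworks=True,
)

# Task.create: registers an external Task pointing at some repo/script,
# without executing anything in the current process
external_task = Task.create(
    project_name="examples",
    task_name="external task",
    repo="https://github.com/your-org/your-repo.git",  # placeholder repo
    branch="main",
    script="train.py",  # placeholder script
)
```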
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running, or is it the step itself that is stuck in the "queued" state?
It's relatively new and it is great: from the usage aspect it is exactly like a user/pass, only the password is the PAT. Really makes life easier.
instead of terminating them once they are inactive, so that they could be available immediately when they are needed.
JitteryCoyote63 I think you can increase the IDLE timeout on the autoscaler and achieve the same behavior, no?
Sure thing, any specific reason for asking about multiple pods per GPU?
Is this for remote development process ?
BTW: the funny thing is, on bare metal machines multi GPU works out of the box, and deploying it with bare metal clearml-agents is very simple
CurvedHedgehog15 is it plots or scalars you are after ?
understood, can you try Task.add_requirements("-e path/to/folder/")
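For reference, add_requirements has to be called before Task.init, e.g. (the path and names are placeholders):
```python
from clearml import Task

# must be called before Task.init so the requirement is recorded on the Task
Task.add_requirements("-e path/to/folder/")  # placeholder path to the local package
task = Task.init(project_name="examples", task_name="local package test")
```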
To clarify, there might be cases where we get helm charts / k8s manifests to deploy an inference service. A black box to us.
I see, in that event, yes, you could use clearml queues to do that; as long as you have the credentials, the "Task" is basically just a helm deployment task.
You could also have monitoring code there, so that the same Task is pure logic: spinning up the helm chart, monitoring the usage, and taking it down when it's done
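Something along these lines, as a rough sketch (the chart reference, release name and the "is it still needed" check are all placeholders for your own logic):
```python
import subprocess
import time

RELEASE = "my-inference"      # placeholder release name
CHART = "repo/chart"          # placeholder chart reference

def still_needed() -> bool:
    # placeholder for your own logic (e.g. check a queue, a metric, a flag)
    return False

# spin up the helm chart
subprocess.run(["helm", "install", RELEASE, CHART], check=True)

# monitor while it is needed
while still_needed():
    subprocess.run(["helm", "status", RELEASE], check=True)
    time.sleep(60)

# take it down when done
subprocess.run(["helm", "uninstall", RELEASE], check=True)
```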
Hi RipeAnt6
What would be the best way to add another model from another project say C to the same triton server serving the previous model?
You can add multiple calls to clearml-serving, each one with a new endpoint and a new project/model to watch; when you launch it, it will set up all the endpoints on a single Triton server (the model optimization/loading is taken care of by Triton anyhow)
AttractiveCockroach17 could it be Hydra actually kills these processes?
(I'm trying to figure out if we can fix something with the hydra integration so that it marks them as aborted)
I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?
I'm assuming the upload is http upload (e.g. the default files server)?
If this is the case, the main issue is that we do not have callbacks on http uploads to update the progress (which I would love a PR for, but this is actually a "requests" issue)
I think we had a draft somewhere, but I'm not sure ...
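To illustrate why this needs support in the upload code itself: with requests you'd have to wrap the file object so every read reports progress, roughly like this (a sketch only, not ClearML's actual upload path; file name and URL are placeholders):
```python
import os
import requests

class ProgressFile:
    """File wrapper that reports how many bytes have been read (i.e. uploaded so far)."""

    def __init__(self, path, callback):
        self._f = open(path, "rb")
        self._size = os.path.getsize(path)
        self._sent = 0
        self._callback = callback

    def __len__(self):
        return self._size

    def read(self, size=-1):
        chunk = self._f.read(size)
        self._sent += len(chunk)
        self._callback(self._sent, self._size)
        return chunk

    def close(self):
        self._f.close()

def print_progress(sent, total):
    print(f"\ruploaded {sent / total:.1%}", end="")

wrapped = ProgressFile("large_model.pt", print_progress)  # placeholder file
requests.post("https://files.example.com/upload", data=wrapped)  # placeholder URL
wrapped.close()
```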
Yes clearml is much better 🙂
(joking aside, mlops & orchestration in clearml is miles better)
CheerfulGorilla72 What are you looking for?
is it planned to add a multicursor in the future?
CheerfulGorilla72 can you expand? what do you mean by multicursor ?
Hi @<1724235687256920064:profile|LonelyFly9>
So, I noticed that with the REST API, at least the /tasks.get_all endpoint appears to have an undocumented maximum page size of 500.
Yeah, otherwise the request size might be too big, but you have pagination:
page (integer, optional, minimum value: 0) - Page number, returns a specific page out of the resulting list of tasks
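For example, with the APIClient you can walk the pages (500 here is just the maximum page size mentioned above):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()

page = 0
tasks = []
while True:
    # page_size is capped at 500 on the server side
    batch = client.tasks.get_all(page=page, page_size=500)
    if not batch:
        break
    tasks.extend(batch)
    page += 1

print(f"fetched {len(tasks)} tasks")
```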
So it is the automagic that is not working.
Can you print the following before calling both Task.debug_simulate_remote_task and Task.init (notice it has to be called before Task.init):
print(os.environ)
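Something like this, as a sketch (the task id and names are placeholders, and the debug_simulate_remote_task argument name is an assumption):
```python
import os
from clearml import Task

# print the environment before ClearML is initialized
print(os.environ)

# simulate running as if the task was executed remotely
Task.debug_simulate_remote_task(task_id="<task_id>")  # placeholder task id
task = Task.init(project_name="examples", task_name="debug remote")  # illustrative names
```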
Hi JitteryCoyote63
cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server machine?
It assumes you have an agent connected to the "services" queue 🙂
That said, it also tries to delete the task's artifacts/models etc., you can see it here:
https://github.com/allegroai/trains/blob/c234837ce2f0f815d3251cde7917ab733b79d223/examples/services/cleanup/cleanup_service.py#L89
The default configuration will assume you are running i...
Hi GrievingTurkey78
Can you test with the latest clearml-agent RC (I remember a fix just for that):
pip install clearml-agent==1.2.0rc0
This is an odd error, could it be conda is not installed in the container (or in the Path) ?
Are you trying with the latest RC?
MistakenBee55 how about a Task doing the model quantization, then triggering it with a TriggerScheduler?
https://github.com/allegroai/clearml/blob/master/examples/scheduler/trigger_example.py
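Roughly along the lines of that example (ids, queue and project names are placeholders, and the exact argument names may differ slightly between versions):
```python
from clearml.automation import TriggerScheduler

# poll the backend every few minutes for new events
trigger = TriggerScheduler(pooling_frequency_minutes=3)

# when a model in the watched project is published,
# enqueue the quantization Task
trigger.add_model_trigger(
    name="quantize on publish",
    schedule_task_id="<quantization_task_id>",  # placeholder task id
    schedule_queue="default",
    trigger_project="examples",
    trigger_on_publish=True,
)

trigger.start()
```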
in ... issues a delete command to the ClearML API server,...
almost, it issues the boto S3 delete commands (directly to the S3 server, not through the clearml-server)
And that I need to enter an AWS key/secret in the profile page of the web app here?
correct
Hi ShinyRabbit94
system_site_packages: true
This is set automatically when running in "docker mode", no need to worry 🙂
What exactly is the error you are getting?
Could it be that the container itself has the python packages installed in a venv and not as "system packages"?
Hi SparklingElephant70
Anyone know how to solve? I tried git push before,
Can you send the entire log? Could it be that the requested commit ID does not exist on the remote git (for example, a force push deleted it)?