Hi @<1619505588100665344:profile|GrievingHare27>
My understanding is that initiating a task with Task.init() captures the code for the entire notebook. I'm facing difficulties when attempting to build a final training pipeline (in a separate notebook) that uses only certain functions from the other notebooks/tasks as pipeline steps.
Well, this is kind of the limit of working with Jupyter notebooks; referencing code from one to another is not really feasible (of co...
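If you can move those functions out of the notebooks into a standalone script/module, a minimal sketch of wiring them up as pipeline steps (project, step names, and the function body here are placeholders of mine):

from clearml import PipelineController

def prepare_data(source_url):
    # stand-in for a function that originally lived in a notebook
    return source_url.upper()

pipe = PipelineController(name='training pipeline', project='examples', version='1.0.0')
# each function becomes its own pipeline step (and its own task)
pipe.add_function_step(
    name='prepare_data',
    function=prepare_data,
    function_kwargs=dict(source_url='s3://bucket/data'),
    function_return=['processed'],
)
pipe.start_locally(run_pipeline_steps_locally=True)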
Hi MinuteCamel2
Can I disable it from automatically uploading model checkpoints to the ClearML server?
Maybe this one can help :)
https://www.youtube.com/watch?v=etGjxOKG9lo
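In code, a minimal sketch of turning off the automatic framework logging that uploads checkpoints (here assuming PyTorch; other framework keys work the same way):

from clearml import Task

# disable auto-logging of model checkpoints for the framework in question
task = Task.init(
    project_name='examples',
    task_name='no auto checkpoints',
    auto_connect_frameworks={'pytorch': False},
)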
I deleted all of the models from my ClearML project, but I still receive this message. Do you know why?
It might take it a few hours to update... 😞
follow the backup procedure, it is basically the same process
Hi UpsetBlackbird87
I might be wrong, but it seems like ClearML does not monitor GPU pressure when deploying a task to a worker, but rather relies only on its configured queues.
This is kind of accurate. The way the agent works is that you allocate a resource for the agent (specifically a GPU), then set the queues (plural) it listens to (by default, priority ordered). Each agent then individually pulls jobs and runs them on its allocated GPU.
If I understand you correctly, you want multiple ...
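For reference, a hedged sketch of spinning up one agent per GPU, each listening on the same two queues (the queue names are assumptions of mine):

clearml-agent daemon --gpus 0 --queue high_priority default
clearml-agent daemon --gpus 1 --queue high_priority default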
Hi MistakenDragonfly51
Hello everyone! First, thanks a lot to everyone that made ClearML possible,
❤
To your questions 🙂
Long story short, no, unless you really want to compile the dockers yourself, and I can't see the real upside there. Yes, add the following volume mount: /opt/clearml.conf:/root/clearml.conf
here: https://github.com/allegroai/clearml-server/blob/5de7c120621c2831730e01a864cc892c1702099a/docker/docker-compose.yml#L154
and configure your host's /opt/clearml.conf
with ...
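i.e. a volumes entry along these lines in the docker-compose (the service name here is my assumption; the linked line shows the exact spot):

  agent-services:
    volumes:
      - /opt/clearml.conf:/root/clearml.conf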
Check here:
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L78
You can configure credentials based on the bucket name. Should work for Azure as well
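For example, a sketch of a per-bucket entry in the conf (bucket name and keys are placeholders):

sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "ACCESS_KEY"
                    secret: "SECRET_KEY"
                }
            ]
        }
    }
}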
Hi @<1628565287957696512:profile|AloofBat92>
Yeah, the name is confusing; we should probably change that. The idea is that it is a low-code / high-code way to train your own LLM and deploy it. Not really a 1:1 ChatGPT comparison; more like GenAI for enterprises. Make sense?
Hi TenseOstrich47, what matplotlib and clearml versions are you using?
Yes, but as you mentioned, everything is created inside the lib, which means Python cannot intercept the metrics, so clearml cannot send them to the backend.
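If you can get a handle on the figure object itself, a minimal sketch of reporting it manually instead of relying on auto-capture (title/series names are placeholders):

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name='examples', task_name='manual matplotlib report')
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
# explicitly send the figure to the backend
task.get_logger().report_matplotlib_figure(
    title='my plot', series='manual', iteration=0, figure=fig
)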
Thanks SubstantialElk6!
Happy new year 🎉 🍺 🍾 🎇
GreasyPenguin14 you mean the artifacts/models?
Task.init(..., output_uri='s3://...')
So how do I solve the problem? Should I just relaunch the agents? Because they can't execute jobs now
Are you running in docker mode?
If so, you can actually delete the mapped files (they will still be available inside the docker); just make sure you delete them X hours after they were created, and you should be fine.
wdyt?
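For example, a hedged sketch of such a cleanup on the host (the path and the 3-hour threshold are assumptions of mine):

find /opt/shared/mapped_data -type f -mmin +180 -delete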
Anyhow, if StorageManager.upload was fast, upload_artifact calls that exact function, so I don't think we actually have an issue here. What do you think?
Could it be the code is not in a git repository? clearml supports either a single script or a git repository, but not a collection of standalone files. wdyt?
VirtuousFish83
Hmm that is odd, could you send the full log?
upload_artifact will actually do two things:
1. upload the file to the trains-server
2. register it as an artifact on the experiment
What did you mean by "register the artifact manually"? You still need to upload the file to the trains-server (so it is later accessible).
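A minimal sketch of that single call (file and artifact names are placeholders):

from clearml import Task

task = Task.init(project_name='examples', task_name='artifact upload')
# uploads the file to the server and registers it on the experiment in one call
task.upload_artifact(name='training_data', artifact_object='data.csv')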
I think the RC should be out in a day or two; meanwhile: pip install git+https://github.com/allegroai/clearml.git
What should I do to fix my problem?
What is the problem? We just proved the upload speed is just fine, didn't we?
When I pass my MinIO to the output_uri argument, it uploads at 500 KB/sec as before.
But it worked well when using StorageManager and uploading to MinIO directly, is that correct?
... I pass my MinIO to the output_uri argument
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
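For reference, a minimal sketch of pointing a task's default output at a MinIO bucket (host, port, and bucket are placeholders; the matching key/secret go under sdk.aws.s3.credentials in clearml.conf with the same host):

from clearml import Task

# non-AWS endpoints use the s3://host:port/bucket form
task = Task.init(
    project_name='examples',
    task_name='minio output',
    output_uri='s3://my-minio:9000/my-bucket',
)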
@<1523711619815706624:profile|StrangePelican34> are you saying that after the "with" block the task is marked completed? How is that possible? Is this done manually?
MuddyCrab47 could you post the full sample code you are using?
No worries, I would love for us to come up with a nice solution 🙂
The agent cannot use another user (it literally has no way of getting the credentials). I suspect this is all a byproduct of the actual mount point.
I mean the Python package, not the trains-server version,
i.e. run: pip install --upgrade trains
PompousBeetle71 quick question: will you ever want to pass an empty string? The reason for asking is that it is either one or the other; there is no way for Trains to actually differentiate (from the web UI perspective, this is just an empty string field...).
PompousBeetle71 BTW: if you remove the type=str from the argparse argument, it will do what you want: None will stay None (instead of ''), and all other values will be of type str, as this is always the default 🙂
PompousBeetle71 if this is argparse and the type is defined, the trains-agent will pass the equivalent in the same type; with str, that amounts to ''. Make sense?
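To make the two cases concrete, a small sketch (argument names are placeholders; the comments reflect the behavior described above):

from argparse import ArgumentParser

parser = ArgumentParser()
# type=str defined: the agent passes values back as str, so a None default becomes ''
parser.add_argument('--name', type=str, default=None)
# no explicit type: None stays None, other values arrive as str (argparse's default)
parser.add_argument('--tag', default=None)
args = parser.parse_args()
print(args.name, args.tag)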