Can you please elaborate on the latter point? My jupyterhub's fully containerized and allows users to select their own containers (from a list I built) at launch, and launch multiple containers at the same time; not sure I follow how toes are stepped on. (edited)
Definitely a great start. Usually it breaks on memory / GPU-mem, where too many containers on the same machine are eating each other's GPU RAM (which cannot be virtualized)
If there is a new issue I will let you know in the new thread
Thanks! I would really like to understand what the correct configuration is
```python
from time import sleep

from clearml import Task
import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done')
```
The above example code will output a line every 10 seconds (with the default `console_cr_flush_period=10`). Can you verify it works for you?
Follow-up; any ideas how to avoid PEP 517 with the auto scaler?
Takes a *long* time to build the wheels
enable venv caching ?
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L116
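To enable it, uncomment the venvs_cache block in the agent's clearml.conf. A sketch based on the linked default config (values shown are the shipped defaults):

```
agent {
    venvs_cache: {
        # maximum number of cached venvs to keep
        max_entries: 10
        # minimum free space (GB) to leave on the drive
        free_space_threshold_gb: 2.0
        # uncommenting the path is what turns venv caching on
        path: ~/.clearml/venvs-cache
    }
}
```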
Hi @<1686547375457308672:profile|VastLobster56>
where are you getting stuck? are you getting any errors ?
JitteryCoyote63 How can I reproduce it quickly?
I would suggest deleting them immediately when they're no longer needed,
This is the idea for the next RC, it will delete them once it is done using them
Does it handle 2FA if my repo lives in GitHub and my account needs 2FA to sign in?
It does not
PompousParrot44 What is the "working directory" on the experiment itself? and the "script path"?
Based on what you wrote above, in order for it to work you should have:
working directory: "."
script path: "-m test.scripts.script"
notice no "--args" and working directory is "." (i.e. the root of the repository)
So the thing is, clearml automatically detects the last iteration of the previous run; my assumption is that you also add it, hence the double shift.
SourOx12 could that be it?
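To make the "double shift" concrete, here is a hypothetical sketch (the numbers and function are illustrative, not ClearML internals): the automatic continue-from-last-iteration offset plus a manual offset shifts the iteration axis twice.

```python
# Illustrative sketch of the double shift: ClearML detects the last
# iteration of the previous run and offsets new reports by it, so adding
# your own offset on top shifts the axis twice.
AUTO_OFFSET = 100  # last iteration detected from the previous run

def reported_iteration(i, manual_offset=0):
    # auto offset (applied by ClearML) + manual offset (applied by you)
    return AUTO_OFFSET + manual_offset + i

print(reported_iteration(0))                     # auto offset only -> 100
print(reported_iteration(0, manual_offset=100))  # double shift -> 200
```

If that is what is happening, dropping the manual offset (or zeroing the automatic one with `Task.set_initial_iteration(0)`) should line the plots up again.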
I think they (DevOps) said something about next week, internal roll-out is this week (I think)
PanickyMoth78 RC is out: `pip install clearml==1.6.3rc1`
On my to do list, but will have to wait for later this week (feel free to ping on this thread to remind me).
Regarding the issue at hand, let me check the requirements it is using.
I think that what you need is the triggers, check this one:
https://clear.ml/docs/latest/docs/references/sdk/trigger
Specifically your error seems to be an issue with nvidia Triton container upgrade
Sadly no
(I mean you could quickly write a reader for TB and report it, but it is not built into the SDK)
In that case you should probably mount the .ssh from the host file-system into the docker. For example: `docker run -v /home/user/.ssh:/root/.ssh ...`
WickedGoat98 the above assumes you are running the docker manually; if you are using a docker-compose.yml file, the same mount should be added to the docker-compose.yml
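For the docker-compose.yml case, the equivalent mount is a `volumes` entry on the relevant service. A sketch; the service name here is illustrative, not from the original compose file:

```yaml
services:
  agent:            # illustrative service name; use the one in your compose file
    volumes:
      # same mount as the docker run example: host .ssh into the container
      - /home/user/.ssh:/root/.ssh
```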
yea the api server configuration also went away
okay that proves it
Nice, that seems to be the issue. Any chance you can open a GitHub issue, so we do not lose track of it?
Can you see it on the console ?
JitteryCoyote63 I think I failed explaining myself.
- I think the problem of the controller is that you are interacting (aka changing hyper parameters) with a Task created using a new SDK version, from an older SDK version. Specifically, we added section names to the hyper parameters, and only the new version of the SDK is aware of them.
Make sense? - Regarding the actual problem: it seems like this is somehow related to the first one, the task at run time is using an older SDK version, and I t...
This line π
None
Notice Triton (and so is clearml-serving) needs the pytorch model to be converted into torchscript, so that the triton backend can load it
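A minimal sketch of that conversion. The model here is a stand-in; the point is only the `torch.jit.trace` call and saving the resulting TorchScript file for the Triton pytorch backend to load:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    # stand-in model; replace with your trained network
    def forward(self, x):
        return torch.relu(x)

model = TinyNet().eval()
# trace with a representative dummy input to produce a TorchScript module
scripted = torch.jit.trace(model, torch.zeros(1, 4))
scripted.save("model.pt")  # the Triton pytorch backend loads this .pt file
```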
what do you have here in your docker compose :
None
DefeatedCrab47 if TB has it as an image, you should find it under "debug_samples" as an image.
Can you locate it there ?
Wait IrritableOwl63, this looks like it worked, am I right? huggingface was correctly installed
GrievingTurkey78 in your clearml.conf do you have `agent.package_manager.type: conda`? Or:
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L59
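i.e. something like this in the agent section of clearml.conf:

```
agent {
    package_manager {
        # use conda instead of pip to build the experiment environment
        type: conda
    }
}
```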
I assume the account name and key refer to the storage account credentials that you can get from Azure Storage Explorer?
correct
(It would be nice to have all the Pypi releases tagged in github btw)
I wanted to say "we listen" ... and point to the tag, but for some reason it was not pushed LOL.