I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
sdk.development.report_use_subprocess = false
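For reference, this is roughly where the flag sits inside clearml.conf (the surrounding section names follow the default conf file; everything other than the flag itself is just default values):

sdk {
    development {
        # disable the background subprocess used for metric reporting
        report_use_subprocess: false
    }
}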
It only happens in the clearml environment, works fine locally.
Hi BoredHedgehog47
what do you mean by "in the clearml environment"?
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough, the model has to live in RAM and then be moved to the GPU (...
from the notebook run !ls ~/clearml.conf
btw: you can also do cron for that:
@reboot sleep 60 && clearml-agent daemon ...
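For example, a full crontab entry could look like the lines below; the queue name and the --detached flag are my assumptions, adjust them to your setup:

# queue name "default" is an assumption - replace with your own
@reboot sleep 60 && clearml-agent daemon --queue default --detached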
Thanks, the new doc site is scheduled for next week; it will also be on GitHub, so PR-ing fixes will be a breeze :)
It will store everything locally; later you can import it back to the server if you want.
On my to do list, but will have to wait for later this week (feel free to ping on this thread to remind me).
Regarding the issue at hand, let me check the requirements it is using.
DeterminedToad86
So based on the log it seems the agent is installing:
torch from https://download.pytorch.org/whl/cu102/torch-1.6.0-cp36-cp36m-linux_x86_64.whl
and torchvision from https://torchvision-build.s3-us-west-2.amazonaws.com/1.6.0/gpu/cuda-11-0/torchvision-0.7.0a0%2B78ed10c-cp36-cp36m-manylinux1_x86_64.whl
See in the log:
Warning, could not locate PyTorch torch==1.6.0 matching CUDA version 110, best candidate 1.7.0
But torchvision is downloaded from the cuda 11 folder...
I...
ExcitedSeaurchin87 can I assume "in parallel" means threads?
Also, is this a single Dataset version download? at least in theory option (3) is the new default in the latest clearml version. wdyt?
What's the error you are getting?
(2) Yes, weekdays with a specific hour should do exactly that :)
(3) Yes, I see your point; maybe we should add a boolean allowing you to run immediately?
Back to (1), let me see if I can reproduce; is there anything specific I need to add to the schedule call?
Okay, fixed. You will be able to override it with output_uri=False (which is ignored on remote execution if you have a project default or a Task output_uri set in the UI).
Make sense?
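A minimal sketch of what I mean (the project/task names are placeholders):

from clearml import Task

# output_uri=False disables the default upload destination for this run,
# unless a project default / Task output_uri is set in the UI (remote execution)
task = Task.init(project_name="examples", task_name="no-default-output", output_uri=False)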
the parameter datatypes are not being changed when loading them up.
These are the auto-logged parameters, inside YOLO, correct?
Just to make sure, you can actually see the value None in the UI, is that correct? (if everything works as expected, you should see an empty string there)
How did you define the decorator of "train_image_classifier_component" ?
Did you define:
@PipelineDecorator.component(return_values=['run_model_path', 'run_tb_path'], ...
Notice the two return values.
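Something along these lines (the argument and paths are placeholders; the part that matters is the decorator and the two returned values):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=['run_model_path', 'run_tb_path'])
def train_image_classifier_component(dataset_path):
    # ... training code goes here ...
    run_model_path = "/tmp/model.pt"       # placeholder
    run_tb_path = "/tmp/tensorboard"       # placeholder
    # two values returned, matching the two entries in return_values
    return run_model_path, run_tb_path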
AttractiveCockroach17
Can you print the configuration to the console when you start the run (you will get a local print and then later the remote print)? Are they the same? Are the 3 runs the same (local / remote print)?
ContemplativeGoat37 I think there was an issue just like you described, and it was solved in later versions; upgrade to the latest clearml package version and you should be fine 🙂
Hi BlandPuppy7 , is this Trains related, are you trying to integrate it, and need help?
New python executable in /home/smjahad/.clearml/venvs-builds/3.6/bin/python2
This is the output of the venv creation, and it is odd.
Could it be that by accident you did pip install clearml-agent and not pip3 install clearml-agent, and now it is running on python2 (which would explain the error)?
I would uninstall/reinstall on python3 to verify
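Something like this should do it (assuming the standard pip / pip3 layout):

pip uninstall -y clearml-agent
pip3 install -U clearml-agent
clearml-agent --version    # should now report the python3 installation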
Hi WorriedParrot51
Assuming you run the code "manually" once (i.e. without the agent), then when you call Task.init it will register the argparser.
When running with the agent, the first time you call parse_args it will automatically override the argparse defaults with the values stored in the Task.
Make sense?
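A minimal sketch of the flow (the project/task names and the argument are placeholders):

from argparse import ArgumentParser
from clearml import Task

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)

# Task.init registers the argparser
task = Task.init(project_name="examples", task_name="argparse demo")

# when executed by the agent, parse_args returns the values stored
# in the Task instead of the defaults above
args = parser.parse_args()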
I am getting None for Task.current_task() at the beginning of my script.
Task.init() is doing the magic; only after this call will you have a current_task (either running manua...
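In other words (names are placeholders):

from clearml import Task

print(Task.current_task())            # None - Task.init() was not called yet

task = Task.init(project_name="examples", task_name="demo")

print(Task.current_task() is task)    # True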
I see, so basically pull a fixed set of configuration for everyone from the server.
Currently only the scale/enterprise version supports such a feature 😞
Exactly 🙂
If you feel like PR-ing a fix, it will be greatly appreciated 🙂
ReassuredTiger98 yes this is odd:
Also:
Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407
Seems like it found a matching version and did not use it...
Let me check that
ReassuredTiger98 quick update, the issue was located, next RC will already contain a fix.
In the meantime, you can avoid it by limiting the pip version:
https://github.com/allegroai/clearml-agent/blob/715f102f6d98a44131d5bee909ee779b456c6229/docs/clearml.conf#L67
pip_version: "<20.2"
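For context, that line sits in the agent's package_manager section of clearml.conf, i.e. roughly:

agent {
    package_manager {
        # limit pip to work around the resolver issue
        pip_version: "<20.2"
    }
}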
ReassuredTiger98 when you look for task "dca2e3ded7fc4c28b342f912395ab9bc" there are no artifacts?
Could you add some prints? This should have worked...
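For example, a quick check from python, using the task id from above:

from clearml import Task

task = Task.get_task(task_id="dca2e3ded7fc4c28b342f912395ab9bc")
print(task.artifacts)    # empty dict if nothing was registered on the task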
In the installed packages section it includes
pywin32 == 303
even though that is not in my requirements.txt.
So for some reason it is being detected (meaning your code base actually imports it in code)
But you can just remove it, either by manually editing the cloned Task (right click, reset, then you can edit the section), or via code:
Task.ignore_requirements("pywin32")
task = Task.init(...)
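i.e. something like (project/task names are placeholders):

from clearml import Task

# called before Task.init(), as in the snippet above
Task.ignore_requirements("pywin32")
task = Task.init(project_name="examples", task_name="no-pywin32")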
Hmm Okay, I think the takeaway is that we should print "missing notebook package" 🙂