Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But I am using local files, I mean I clone the trains and trains-agent repos and install them. Their versions are 0.15.2rc0
I see, that's why we get the git ref, not package version.
Hi CooperativeFox72 trains 0.16 is out, did it solve this issue? (btw: you can upgrade trains to 0.16 without upgrading the trains-server)
Hmm, can you send the full log of the pipeline component that failed, because this should have worked
Also could you test it with the latest clearml python version (i.e. 1.10.2)
Hi SteadyFox10
I'll use your version instead and put any comment if I find something.
Feel free to join the discussion: https://github.com/pytorch/ignite/issues/892
Thanks for the output_uri, can I put it in the ~/trains.conf file?
Sure you can
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L152
You can add it in the trains-agent machine's conf file, and/or on your development machine. Notice that once you run ...
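In case it helps, the same destination can also be set per Task in code; a minimal sketch, where the project/task names and bucket URL are just placeholders:

```python
from clearml import Task

# output_uri here plays the same role as default_output_uri in ~/trains.conf
task = Task.init(
    project_name="examples",             # illustrative name
    task_name="output uri demo",          # illustrative name
    output_uri="s3://my-bucket/models",   # placeholder destination
)
```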
Hi UpsetTurkey67
"General/my_parameter_name" so that only this part of the configuration will be updated?
I'm assuming this is a Hyperparameter, not a configuration object (i.e. task.connect not task.connect_configuration); if this is the case then yes
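A minimal sketch of what I mean (project, task and parameter names are illustrative):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="connect demo")

params = {"my_parameter_name": 0.1, "other_param": 8}
# Connected under the "General" section, so the UI shows it as
# "General/my_parameter_name" and a cloned Task can override just that value
params = task.connect(params, name="General")
print(params["my_parameter_name"])  # reflects any override when run by an agent
```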
Hi FunnyTurkey96
Which pip version are you using? Basically pip changed the dependency resolver after 20.1
Change: https://github.com/allegroai/clearml-agent/blob/aede6f4bac71c8fc56e7cf982318a48527953a3c/docs/clearml.conf#L57 to pip_version: "<20.2"
See if that helps
Hi ContemplativeCockroach39
Seems like you are running the exact code as in the git repo:
Basically it points you to the exact repository https://github.com/allegroai/clearml and the script examples/reporting/pandas_reporting.py
Specifically:
https://github.com/allegroai/clearml/blob/34c41cfc8c3419e06cd4ac954e4b23034667c4d9/examples/reporting/pandas_reporting.py
So without the flush I got the error apparently at the very end of the script -
Yes... it's a Python thing: background threads might get killed in random order, so when your code needs a background thread that has already died you get this error, which basically means you need to do the work in the calling thread.
This actually explains why calling Flush solved the issue.
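Something like this at the end of the script (project/task names are illustrative):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="flush demo")

# ... report scalars / upload artifacts during the run ...

# Flush from the main (calling) thread before the interpreter starts
# tearing down background threads, so nothing is lost on exit
task.flush(wait_for_uploads=True)
```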
Nice!
okay, just so I understand, this is what you have on your client that can connect with the server:
api {
    api_server:
    web_server:
    files_server:
    credentials {"access_key": "KEY", "secret_key": "SECRET"}
}
Hi PompousParrot44
Could you send the "Installed Packages" list?
I think there is a bug in the current trains-agent (there is already a fix but the RC is still not out),
where "package @ git+http" packages ignore the git+http link.
You can solve it manually by just editing the "Installed packages" (when Task is in draft mode, the section becomes editable), and remove the "package @" part, and leave the "git+http" link.
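For illustration, a hypothetical "Installed packages" line before and after the manual edit (the package name is made up):

```
# before: the agent ignores the git+https part
flair @ git+https://github.com/flairNLP/flair.git@master
# after the manual edit
git+https://github.com/flairNLP/flair.git@master
```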
SubstantialElk6 try to add -e CLEARML_AGENT_EXTRA_PYTHON_PATH=/code/app/flair
It should add it to the runtime PYTHONPATH
(to the BASE DOCKER IMAGE on the Task itself)
let me check when a fix can be deployed for Hydra...
Hi TenseOstrich47, what's the matplotlib version and clearml version you are using?
FYI: these days TensorBoard has become the standard even for PyTorch (it is a standalone package), and you can actually import it from torch.
There is an example here:
https://github.com/allegroai/trains/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
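For reference, a minimal sketch using the TensorBoard writer shipped with torch (project/task names and the logged values are illustrative):

```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter  # TensorBoard bundled with torch

task = Task.init(project_name="examples", task_name="tensorboard demo")

writer = SummaryWriter(log_dir="./runs")
for step in range(10):
    # scalars written here are picked up automatically by the ClearML bindings
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)
writer.close()
```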
HealthyStarfish45 did you manage to solve the report_image issue ?
BTW: you also have
https://github.com/allegroai/trains/blob/master/examples/reporting/html_reporting.py
https://github.com/allegroai/trains/blob/master/examples/reporting/...
WackyRabbit7 hmmm, seems like a non-regular character inside the diff.
Let me check something
Quick update, I might have been able to reproduce the issue ( GreasyPenguin14 working "offline" is a great hack to accelerate debugging this issue, thank you!)
It seems it is related to the known and very annoying Python forking issue (and this is why changing to "spawn" method solves the issue):
https://bugs.python.org/issue6721
Long story short, in some cases when forking (i.e. ProcessPoolExecutor), python can copy locks in a "bad" state, this means that you can end up with a lock acquir...
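For reference, a minimal sketch of the "spawn" workaround (not your exact code):

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def work(x):
    return x * x


if __name__ == "__main__":
    # Use "spawn" instead of the default "fork" so child processes start with a
    # clean interpreter state and no copied (possibly acquired) locks
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=2, mp_context=ctx) as pool:
        print(list(pool.map(work, range(4))))
```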
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript. I'm getting this error when trying the
I think you are correct see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
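Roughly along these lines (the model below is a stand-in, not the exact MNIST model from the example):

```python
import torch
import torch.nn as nn

# Illustrative model; in practice this would be your trained model
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

# Convert to TorchScript so Triton's PyTorch backend can load it
example_input = torch.randn(1, 1, 28, 28)
scripted = torch.jit.trace(model, example_input)
scripted.save("model.pt")  # typically the file name Triton expects in the model repository
```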
to setup ClearML agent in kubernetes with the SSH keys?
You can add the environment variable: CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL="true"
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#dynamic-environment-variables
CooperativeFox72 I would think the easiest would be to configure it globally in the clearml.conf (rather than add more arguments to the already packed Task.init)
I'm with you on 60 messages being way too much..
Could you open a Github Issue on it, so we do not forget ?
I think you are correct, it seems like it is missing requirements to boto/azure/google (I will make sure this is added). In the meantime, you can stop the "triton serving engine" Task, reset it, add boto3 to the installed packages and relaunch.
That said your main issue might be packaging the python model. Basically you need to create a model from the entire folder (with whatever there is inside the folder), then Triton should be able to run it (if the config.pbtxt is correct).
` m = OutputMo...
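Something along these lines, a rough sketch and not necessarily the exact call from the truncated snippet above (names and paths are illustrative):

```python
from clearml import Task, OutputModel

task = Task.init(project_name="serving examples", task_name="package model dir")  # illustrative

# Register the whole model directory as a single packaged model,
# so the serving side gets every file inside the folder
m = OutputModel(task=task, framework="pytorch")
m.update_weights_package(weights_path="./model_dir")
```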
Hi CurvedHedgehog15
Yes you are correct, plots are displayed side-by-side in the UI. The reason is that, since they are very generic, it is very challenging to actually merge / overlay two arbitrary plots.
I can see two options:
- Allow the user to combine two plots in the UI (this way the responsibility is on the user to understand this is possible)
- Maybe add a programmatic interface to more easily access the raw data?
Wdyt?
LovelyHamster1 verified, this is a UI bug where an old limitation is still being enforced.
I will make sure they know about it, it should be fixed for the upcoming release
Hi LovelyHamster1
Could you think of a toy code that reproduces this issue ?
In any case, do you have any suggestion of how I could at least hack tqdm to make it behave? Thanks
I think I know what the issue is: it seems tqdm is using an ANSI escape sequence instead of a plain CR, this is the 1b 5b 41 (ESC [ A) sequence I see in the binary log.
Let me see if I can hack something for you to test
Hi FancyWhale93, in your clearml.conf configure the default output URI; you can specify the file server as the default, or any object storage:
https://github.com/allegroai/clearml-agent/blob/9054ea37c2ef9152f8eca18ee4173893784c5f95/docs/clearml.conf#L409
Let me try to build a minimal reproducible version
Thank you!
Hi DrabCockroach54
Notice the free GPU memory is global (hence low), but the memory usage (at least with new nvidia drivers) is reported per process. I'm assuming the process using the memory is not a sub-process? Could that be? What's the OS you are running on?
Yes
BTW: do you guys do remote machine development (i.e. Jupyter / vscode-server) ?