confirmed that the change had been added by
Make sure you see them in the Task log in the UI (the agent prints them when it starts)
Any insight on how we can reproduce the issue?
Can this be reproduced using a simple script that we can also run?
URLs that it was uploaded with, as that URL could change.
How would that change? The actual files are there, right?
works seamlessly throughout and in our current on premise servers...
I'm assuming via something close to what I suggested above with .netrc ?
Hi VivaciousPenguin66
Seems like a CUDA/CUDNN issue.
Your agent is configured to work in venv mode, which means it will pull the correct PyTorch version based on the detected CUDA driver support. Specifically, you can see in the log "agent.cuda_version = 111", which means CUDA 11.1, and from the log it found the correct PyTorch version:
Torch CUDA 111 download page found
Found PyTorch version torch==1.8.1 matching CUDA version 111
Found PyTorch version torchvision==0.9.1 matching CUDA version 1...
Hey JitteryCoyote63, I think I need to better explain the config feature: agent.package_manager.post_packages = ["PyJWT"]
Basically this means that if PyJWT is among the packages to be installed, it will be installed after everything else is installed.
This doesn't mean it will always be installed.
Think for example "horovod" has to be installed after you have TF / PyTorch installed.
(The same goes for "pre_package" and Cython)
I'm assuming you mean for the clients, right?
Hi VexedCat68
txt file or pkl file?
If this is a string, it is just stored as is (not as a file; it is considered a "link")
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/clearml/binding/artifacts.py#L521
- try with the latest RC
1.8.1rc2
, it feels like after git clone, it spend minutes without outputting anything
Yeah, that is odd. Can you run the agent with --debug (add it before the daemon command), and then add --foreground at the end of the command?
Now launch the same task on that queue, you will have a verbose log in the console.
Let us know what you see
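To make the flag placement concrete, here is a small sketch that only assembles the command line described above. The helper name `agent_debug_command` and the queue name "default" are assumptions for illustration; use the queue your task is actually enqueued on.

```python
import shlex


def agent_debug_command(queue: str) -> str:
    """Build the agent command: --debug goes before 'daemon', --foreground goes last."""
    argv = ["clearml-agent", "--debug", "daemon", "--queue", queue, "--foreground"]
    return shlex.join(argv)


print(agent_debug_command("default"))
```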
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when it comes to multiple model serving. As well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
So the only difference is how I log in into machine to start clear-ml
The only difference I can think of is the OS environment variables in the two login types.
Can you run `export` in the two cases and check the diff between them?
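One way to compare the two environments, as a rough sketch: save the output of each login type to text and diff the parsed variables. The function names `parse_env` and `diff_env` are hypothetical, and the parser assumes simple one-line entries in either `KEY=VALUE` or bash-style `declare -x KEY="VALUE"` form.

```python
from typing import Dict, Tuple


def parse_env(dump: str) -> Dict[str, str]:
    """Parse 'KEY=VALUE' or 'declare -x KEY="VALUE"' lines into a dict."""
    env: Dict[str, str] = {}
    for line in dump.splitlines():
        if line.startswith("declare -x "):
            line = line[len("declare -x "):]
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an assignment
            env[key.strip()] = value.strip('"')
    return env


def diff_env(a: Dict[str, str], b: Dict[str, str]) -> Tuple[dict, dict, dict]:
    """Return (only in a, only in b, present in both with different values)."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


first = parse_env('declare -x PATH="/usr/bin"\ndeclare -x CUDA_HOME="/usr/local/cuda"')
second = parse_env("PATH=/usr/bin:/opt/bin\nLANG=C")
print(diff_env(first, second))
```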
now, I need to pass a variable to the Preprocess class
you mean for the construction ?
hmm that would explain it failing
The latest RC (0.17.5rc6) moved all logs into a separate subprocess to improve speed with pytorch dataloaders
EnviousStarfish54 following on this issue, the root cause is that dictConfig will remove all existing handlers unless it is passed "incremental": True
conf_logging = { "incremental": True, ... }
Since you pointed out that Kedro internally calls logging.config.dictConfig(conf_logging), this seems like an issue with Kedro, as this call will remove all logging handlers, which seems problematic. wdyt?
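The behavior can be checked with the standard library alone. This is a minimal sketch, not Kedro's actual configuration: it assumes the config contains a "root" entry, and the helper name `handler_survives` is made up for the demonstration. Note that the non-incremental call really does strip every handler from the root logger of the running process.

```python
import logging
import logging.config


def handler_survives(incremental: bool) -> bool:
    """Attach a marker handler to the root logger and see if dictConfig keeps it."""
    root = logging.getLogger()
    marker = logging.NullHandler()
    root.addHandler(marker)

    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "root": {"level": "INFO"},
    }
    if incremental:
        config["incremental"] = True

    logging.config.dictConfig(config)
    survived = marker in root.handlers
    root.removeHandler(marker)  # safe even if the handler was already removed
    return survived


print("incremental:", handler_survives(True))
print("full reconfigure:", handler_survives(False))
```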
I see TightElk12
You can always set the OS environment variables CLEARML_API_HOST, CLEARML_WEB_HOST and CLEARML_FILES_HOST with the correct configuration. Or you can simply set CLEARML_NO_DEFAULT_SERVER=1, which will prevent any usage of the default demo server. wdyt?
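For example, the variables could be set from Python before clearml is imported. This is only a sketch: the host name `clearml.example.internal` is a placeholder, the ports are the usual defaults of a self-hosted server and may differ in your deployment, and setting the variables in the shell or the service definition works just as well.

```python
import os

# Refuse to fall back to the default demo server if nothing else is configured.
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"

# Placeholder addresses; setdefault keeps any value already set in the environment.
os.environ.setdefault("CLEARML_API_HOST", "http://clearml.example.internal:8008")
os.environ.setdefault("CLEARML_WEB_HOST", "http://clearml.example.internal:8080")
os.environ.setdefault("CLEARML_FILES_HOST", "http://clearml.example.internal:8081")

# Import clearml only after the environment is in place, e.g.:
# from clearml import Task
```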
I think so (you can also comment out the Task.init() just to verify this is not a clearml issue)
Your git execution needs this file, just like your machine does, to know where the server is and how to authenticate. You have to manually pass it to your git action.
That said, you might have accessed the artifacts before any of them were registered
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it?
okay so it is downloaded to your machine, and unzipped , is that part correct?
Thanks ShakyJellyfish91 ! please let me know what you come up with, I would love for us to fix this issue.
Hmm I see what you mean. It is on the roadmap (ETA the next version 0.17, 0.16 is due in a week or so) to add multiple models per Task so it is easier to see the connections in the UI. I'm assuming this will solve the problem?
Hi SilkySparrow85
because it is trying to send a debug-sample to fileserver!
Yes, you should always configure the "files server" to point to your minio S3, basically:
files_server: "..."
But do not forget to also configure the credentials here:
[None](https://github.com/allegroai/clearml/blob/40c6db9d95016382c721546d42...
We should probably make sure it is properly stated in the documentation...
Thanks CloudyArcticwolf80! Let me see if we can reproduce it
We already redesigned the implementation so it should be quite easy to extend to GCP and Azure, what are you planning ?
Unfortunately not yet in venv mode. What would you have put there?