Hi GrievingTurkey78
How are you getting a different version than what is used at runtime? It analyzes the PYTHONPATH just as Python does. How can I reproduce it? Currently you can use Task.add_requirements(package_name, package_version=None). This will not force it though, it is a recommendation (used if it fails to detect the package itself). Maybe we can add a force option?! What do you think?
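A minimal sketch of the current behavior (the package name and version here are placeholders):

from clearml import Task

# recommend a specific version; used when auto-detection misses the package
Task.add_requirements("some_package", package_version="1.2.3")
# must be called before Task.init so it ends up in the installed-packages list
task = Task.init(project_name="examples", task_name="requirements demo")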
Maybe we should add it to Storage Manager? What do you think?
Notice you have to configure the shared drive for Docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
but is there any other way to get env vars / any value or secret from the host into the docker container of a task?
If this is docker, the -e/--env argument would do the same: -e VAR=somevalue
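For example, a sketch of doing it from the SDK side (the image name and variable are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="docker env demo")
# append -e VAR=somevalue to the docker command the agent will run for this task
task.set_base_docker("nvidia/cuda:11.0-runtime -e VAR=somevalue")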
That's a very neat solution! Maybe there's a way to inject Task.init into the code through a plugin, or worst case push it into some internal base package, and only call it when the code is orchestrated automatically (usually there is an environment variable that is set to signal that, like CI_something)
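Something along these lines (a sketch; CI_SOMETHING and the hook location are hypothetical):

import os
from clearml import Task

# the injected hook only kicks in when orchestration signals itself via an env var
if os.environ.get("CI_SOMETHING"):
    task = Task.init(project_name="orchestrated", task_name="auto run")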
Mmm well, I can think of a pipeline that could save its state at the instant before the error occurred.
This is already the case: if you clone the pipeline Task, change Args/_continue_pipeline_ to True, and enqueue it
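i.e. something like (a sketch; the task id and queue name are placeholders):

from clearml import Task

pipeline_task = Task.get_task(task_id="<pipeline_task_id>")
cloned = Task.clone(source_task=pipeline_task, name="continue pipeline")
# flip the continue flag, then send the clone back to a queue
cloned.set_parameters({"Args/_continue_pipeline_": True})
Task.enqueue(cloned, queue_name="default")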
LazyLeopard18 well done on locating the issue.
Yes, Docker on Windows is a bit flaky...
PleasantGiraffe85
it took the repo from the cache. When I delete the cache, it can't get the repo any longer.
What error are you getting? (Are we talking about the internal repo?)
will my datasets be stored on the same machine that hosts the clearml server?
By default yes, they will be stored on the files-server (but you can change it, this is an argument for both the CLI and the Python interface)
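For example with the Python interface (a sketch; the bucket is a placeholder):

from clearml import Dataset

dataset = Dataset.create(dataset_name="my dataset", dataset_project="examples")
dataset.add_files("/path/to/data")
# store the files on S3 instead of the default files-server
dataset.upload(output_url="s3://my-bucket/datasets")
dataset.finalize()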
Thanks ShallowCat10 !
I'll make sure we fix it 🙂
If you create an initial code base maybe we can merge it?
Notice the args will be set on the connect call, so the check on whether they are empty should come after it
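In other words (a sketch; the argument name is a placeholder):

from clearml import Task

task = Task.init(project_name="examples", task_name="args demo")
args = {"data_path": ""}
# connect() may override the local values (e.g. when the task is cloned and edited in the UI)
task.connect(args)
# so only validate after the connect call
if not args["data_path"]:
    raise ValueError("data_path was left empty")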
can I mount the s3 bucket as a file system in the place where
You need to mount it where the file server is storing its files, correct (notice: not the DBs, just the file server)
Hi DepressedChimpanzee34
if you try to extend it more than the width of the column to the right, it doesn't do anything...
You mean outside of the window? or are you saying you cannot extend it?
Just verifying, we are talking about the latest version of clearml-server?
Hi CleanPigeon16
You need to pass the private repository docker credentials to the aws instance, I would use the custom bash script option of the aws autoscaler to create the docker credentials file.
That's not possible, right?
That's actually what start_locally does, but the missing part is starting it on another machine without the agent (I mean, it's totally doable, and if important I can explain how, but this is probably not what you are after)
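i.e. roughly (a sketch, assuming the PipelineController interface):

from clearml import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")
# run the pipeline logic (and optionally its steps) on this machine, no agent needed
pipe.start_locally(run_pipeline_steps_locally=True)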
I really need to have a dummy experiment pre-made and have the agent clone the code, set up the env and run everything?
The agent caches everything, and actually can also just skip installing the env entirely, which would mean ...
I see, let me check the code and get back to you, this seems indeed like an issue with the Triton configuration in the model monitoring scenario.
BattyLizard6 to my knowledge the main issue with fractional GPUs is that there is no real restriction on GPU memory allocation (with the exception of MIG slices, which are limited in other ways).
Basically one process/container can consume the maximum GPU RAM on the allocated card (this also includes the http://run.ai fractional solution, at least from what I understand).
This means that developer A can allocate memory so that developer B on the same GPU will start getting out-of-memory errors
(Notice in a...
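(As a purely cooperative workaround, a process can soft-limit its own share, e.g. in PyTorch; this is only an illustration, it does not protect against other processes:)

import torch

# ask PyTorch's caching allocator to cap this process at ~50% of GPU 0's memory
# cooperative only: another process can still allocate the whole card
torch.cuda.set_per_process_memory_fraction(0.5, device=0)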
And another question: is clearml-serving ready for serious use?
Define serious use? KFserving support is in the pipeline, if that helps.
Notice that clearml-serving is basically a control plane for the serving engine; not to neglect its importance, but the heavy lifting is done by Triton 🙂 (or any other backend we will integrate with, maybe Seldon)
I could take a look and figure that out.
This will greatly accelerate integration 😉
After removing the task.connect lines, it encountered another error: 'einops' is not recognized. It does exist in my environment file but was not installed by the agent (according to what I see under 'Summary - installed python packages'). Should I add this manually?
Yes, I'm assuming this is a derivative package that is needed by one of your packages?
Task.add_requirements("einops")
task = Task.init(...)
I mean you can run it with Kubeflow, but it kind of ruins the auto detection there
You can however clone and manually edit it back to your code, that would work
BTW:
Task.add_requirements('tensorflow', '2.2') will make sure you get the specified version 🙂
We used subprocess for it, ...
Popen? os.system? fork?
I think I found something, let me dig deeper 🙂
Hi JitteryCoyote63
Is this close?
https://github.com/allegroai/clearml/issues/283