
JitteryCoyote63
I am setting up a new machine with two RTX 3070 GPUs
Nice! You are one of the lucky few who managed to buy them :)
Which makes me think that the wrong torch package is installed
I think that torch 1.3.1 does not support CUDA 11 :)
Hi @<1523711619815706624:profile|StrangePelican34>
if I am trying to deploy 100 models on a GPU that can handle 5 concurrently,
The main limitation is Triton's ability to dynamically load / unload models. We know Nvidia is adding this capability, but I think it is still not out. Once they support it, it should be transparent.
See here:
https://download.pytorch.org/whl/torch_stable.html
cu110/* has no torch 1.3.1 only 1.7.0
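A minimal sketch of how one might check which torch versions are listed under a given CUDA tag, assuming the index entries have the shape `<cuda tag>/torch-<version>...whl`. The sample entries below are hypothetical stand-ins for lines copied from the index page, and `versions_by_tag` / `is_available` are illustrative helper names, not part of any library.

```python
import re
from collections import defaultdict

# Hypothetical sample entries, shaped like the links on the wheel index page.
# Replace them with entries copied from the real page.
SAMPLE_ENTRIES = [
    "cu110/torch-1.7.0%2Bcu110-cp38-cp38-linux_x86_64.whl",
    "cu110/torchvision-0.8.0-cp38-cp38-linux_x86_64.whl",
    "cu92/torch-1.3.1%2Bcu92-cp37-cp37m-linux_x86_64.whl",
]

# Matches "<tag>/torch-<major>.<minor>.<patch>" and skips torchvision etc.
ENTRY_RE = re.compile(r"^(?P<tag>[a-z0-9]+)/torch-(?P<version>\d+\.\d+\.\d+)")


def versions_by_tag(entries):
    """Group the torch versions found in the entries by their CUDA tag."""
    found = defaultdict(set)
    for entry in entries:
        match = ENTRY_RE.match(entry)
        if match:
            found[match.group("tag")].add(match.group("version"))
    return dict(found)


def is_available(entries, tag, version):
    """Return True if the given torch version is listed under the given tag."""
    return version in versions_by_tag(entries).get(tag, set())


print(is_available(SAMPLE_ENTRIES, "cu110", "1.3.1"))
print(is_available(SAMPLE_ENTRIES, "cu110", "1.7.0"))
```

If the requested version is missing under the CUDA tag that matches the machine, the requirement has to be bumped to a version that is listed there.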
I think it fails because it tries to install trains twice. Could you remove the trains package and test? I'm also curious how you ended up with both installed?!
Can you try to run the example code, see if that works for you?
yes, or (because I deployed clearml using helm in kubernetes) from the same machine, but multiple pods (tasks).
Oh now I see. Long story short, no :) The correct way of doing that is for every node/pod to create its own dataset,
then when you are done, you create a new version with the X datasets that you created as parents, the newly created version is just "meta" it basically tells the system how to combine the previously generated datasets (i.e. no data is actually re-uploa...
Hi @<1661542579272945664:profile|SaltySpider22> I'm not sure I understand the answer to my parallel question
You described getting a secret key pair from the UI and feeding it back into the compose file. Does this mean it's not possible to seed the secrets in the compose file, starting from clean state? If so, that would explain why I can't get it to work.
Long story short, no. This would basically mean you have pre-built credentials in the docker, this sounds dangerous :)
I'm not sure I'm following the use case here, what exactly are we trying to do?
(or maybe I missed something here?)
Hi ExasperatedCrocodile76
It seems like it is using the conda package manager. Were you using conda when you ran the code manually?
ERROR: This cross-compiler package contains no program /home/ivan/miniconda3/envs/clearML/bin/x86_64-conda_cos6-linux-gnu-gfortran
Why is it trying to install from source code?
BTW: can you test with the latest agent RC? (`pip install clearml-agent==1.4.0rc4`)
yes, I do, I added an `auxiliary_cfg` and I saw it immediately both in the CLI and in the web UI
How many Tasks do you see in the UI in the DevOps project with the system tag `SERVING-CONTROL-PLANE`?
TBH our Preprocess class has an import in it that points to a file that is not part of the preprocess.py so I have no idea how you think this can work.
ConvolutedSealion94 actually you can add an entire folder as preprocessing, including multiple files
See example des...
When you are running the base-task, are you providing any arguments to it?
Can you share the "execution" Tab? and the Args tab of the base-task ?
Okay great, so we do have the Args section there.
What do you have in the "Execution" tab?
I mean you can run it with kubeflow, but it kind of ruins the auto detection there
You can however clone and manually edit it back to your code, that would work
This part is odd:
SCRIPT PATH: tmp.7dSvBcyI7m
How did you end up with this random filename? How are you running this code?
Notice the args will be set on the `connect` call, so the check on whether they are empty should come after
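A minimal sketch of that ordering, assuming the arguments live in a plain dict. `FakeTask` is a hypothetical stand-in used only so the snippet runs without a server; with ClearML you would pass the object returned by `Task.init(...)`, whose `connect()` call is where values set from the UI get applied. The `data_path` key and `resolve_args` helper are also illustrative.

```python
class FakeTask:
    """Hypothetical stand-in for a ClearML task, used only for this sketch."""

    def __init__(self, overrides):
        self.overrides = overrides

    def connect(self, mutable):
        # Mimics the behaviour described above: values are filled in
        # at the moment connect() is called.
        mutable.update(self.overrides)
        return mutable


def resolve_args(task, defaults):
    """Connect the args first, then validate them."""
    args = dict(defaults)
    args = task.connect(args)
    # The emptiness check comes AFTER connect(), once overrides are applied.
    if not args.get("data_path"):
        raise ValueError("data_path is empty after connect()")
    return args


task = FakeTask({"data_path": "/tmp/data"})
print(resolve_args(task, {"data_path": "", "epochs": 1}))
```

Checking before `connect()` would reject the empty default even though a value was about to be filled in.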
The base task is self-contained, i.e. it downloads training/eval data directly and has direct access to it
I think this is the main issue, how come it does not catch it? Are you using argparse?
Hi YummyFish22
Looks like the task does not have a "Task.init" call in the main script (or an import of clearml). Could that be the case?
Hi RoughTiger69
Is the pipeline in question based on decorators or is it based on existing Tasks?
It would be nice to have some documentation proclaiming how randomness behaves when running tasks (in all their variations). E.g. Should I trust seeds to be reset or should I not assume anything and do my own control over seeds.
That is a good point, I'll make sure we mention it somewhere in the docs. Any thoughts on where?
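Until the docs spell this out, the "do my own control over seeds" option can be sketched as below. This is a minimal, hedged example using only the standard library plus an optional NumPy import; `seed_everything` and `sample` are hypothetical helper names, and it makes no claim about what ClearML itself does with seeds.

```python
import random


def seed_everything(seed):
    """Seed the RNGs this script uses, without relying on any framework default."""
    random.seed(seed)
    try:
        import numpy as np  # optional, only seeded if it is installed
    except ImportError:
        return
    np.random.seed(seed)


def sample(seed, n=3):
    """Draw n numbers after seeding, to show the run is repeatable."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


# Two runs with the same seed give the same sequence.
print(sample(42) == sample(42))
```

Calling `seed_everything` at the top of the task script keeps the behaviour explicit regardless of how the task is launched.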
Decorators are good :)
Something along the lines of
` @PipelineDecorator.pipeline(...)
def pipeline(skip_a=False):
    if not skip_a:
        a = step_a()
    else:
        # somehow get a previous A?
        # let's call it cached A
        a = "replace with real"
    step_b(a)
    ... `
Is this the gist?
If it is, this looks like, "how can I control whether A is cached or not", is that correct?
No, I mean configure the `files_server` in the clearml.conf
I hope you can do this without containers.
I think you should be fine, the only caveat is CUDA drivers, nothing we can do about that ...
Do you need to control the cuda drivers ?
Awesome ! thank you so much!
1.0.2 will be out in an hour
Hi JealousParrot68
clearml tracking of experiments run through kedro (similar to tracking with mlflow)
That's definitely very easy, I'm still not sure how Kedro scales on clusters. From what I saw, and I might have missed it, it seems more like a single instance with sub-processes, but no real ability to set up different environments for the different steps in the pipeline, is this correct?
I think the challenge here is to pick the right abstraction matching. E.g. should a node in kedro (w...
And your ~/clearml.conf ?
In Windows, setting `system_site_packages` to `true` allowed all stages in the pipeline to start, but it doesn't work in Linux.
Notice that it will inherit from the system packages not the venv the agent is installed in
I've deleted tfrecords from the master branch and committed the removal, and set the tfrecords folder to be ignored in .gitignore. I'm trying to find which changes are considered uncommitted.
You can run `git diff`
it is essentially...
Okay Now I get it!
Let me think about it for an hour or two :)