So you mean 1.3.1 should fix this bug?
Yes, it should. See the release notes, there are a few "disappearing" UI fixes:
https://github.com/allegroai/clearml-server/releases/tag/v1.3.0
RoughTiger69 yes, I think the "Scale" tier covers it
JitteryCoyote63 sure, this is how it was designed to work
Hi DeliciousKoala34
Happened when cloning and running a task on an agent on a different machine. I
Sounds like a torch internal issue. Can you send the full log of the remote Task?
I still have name my_name, but the project name my_project/.datasets/my_name rather than my_project/.datasets
Yes, this is the expected behavior
And I don't see any new projects / subprojects where that dataset creation Task is stored
They are marked "hidden", hence by default you cannot see them in the UI (so they will only appear in the Dataset page).
You can toggle the UI hidden flag by going to your settings page and selecting "Con...
I think I was not able to fully express my point. Let me try again.
When you are running the pipeline fully locally (both logic and components), the assumption is that this is for debugging purposes.
This means that the code of each component is locally available, could that be a reason?
So actually while we're at it, we also need to return a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model, or from the pre/post processing code?
No -- that section is blank,
This is the main issue, it should be filled with the requirements being auto-detected.
The entire script was executed from within vscode, and the Task was created but it was not prefilled with anything?
Just making sure, you called Task.init inside your code?
Hi Team, can I clone an experiment shared by someone via a link?
You mean someone that is not in your workspace? (I'm assuming app.clear.ml?)
Hi UpsetWalrus59
All correct, with the exception of " ...or 1GB Metric". This is a limit: metrics (and metadata) are always stored on the clearml-server, so they are metered. There is also an API limit, basically anti-abuse, which of course resets every month, but if you are running tens of experiments at the same time you will hit this limit. Make sense?
Hi JitteryCoyote63
Could you check if the problem exists in the latest RC?
pip install clearml==1.0.4rc1
your account has 2FA enabled and you must use a personal access token instead of a password.
I'm assuming you have created the personal access token and used it, not the pass
DrabOwl94 how many 1M files did you end up having?
Thanks DeliciousKoala34, I think I know what the issue is!
The container has 1.3.0a and you need 1.3.0, which is why it is re-downloading (I'll make sure the agent can sort it out, because this is Nvidia's version; in reality it should be a perfect match)
We are planning an RC later this week, I'll make sure this fix is part of it
Hi ComfortableHorse5
Yes, this is more of a suggestion that you should write them using the platform capabilities. The UI implementation is being worked on, as well as a few helper classes; I think you'll be able to see a few in the next release
Great!
I'll make sure the agent outputs the proper error
Hi RobustRat47
My guess is it's something from converting the PyTorch code to TorchScript. I'm getting this error when trying the
I think you are correct, see here:
https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/examples/pytorch/train_pytorch_mnist.py#L136
you have to convert the model to TorchScript for Triton to serve it
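For reference, a minimal sketch of what that conversion can look like. The tiny model, the input shape and the file name are placeholders, not taken from your code, and this uses torch.jit.trace; torch.jit.script is the other standard route and is the better fit when the model has data-dependent control flow.

```python
import os


def export_torchscript(path="serving_model.pt"):
    """Trace a tiny illustrative model and save it as TorchScript."""
    # Imported lazily so this sketch can be loaded without torch installed.
    import torch
    from torch import nn

    # Hypothetical stand-in for your trained network.
    model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2)).eval()
    example_input = torch.rand(1, 4)

    # torch.jit.trace records the operations executed on example_input.
    scripted = torch.jit.trace(model, example_input)
    scripted.save(path)
    return os.path.abspath(path)
```

The saved file can be loaded back with torch.jit.load to check it before handing it to the serving side.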
I was using clearml == 0.17.5 and I also had this issue
I think it was introduced when we moved to subprocess reporting, with 0.17.5
You can disable it with the following in clearml.conf:
sdk.development.report_use_subprocess = false
Correct
You can spin it up in two modes, either venv or docker. Notice that even in docker mode it will still clone the code into the container and install the packages inside it, but it also inherits the container's preinstalled system packages, so the installation process is a lot faster, and you can still change packages without having to build an entirely new docker image.
So what you're saying is to first kick off a new run and then rename the underlying Pipeline Task, which will cause that particular run to become a new pipeline name?
Correct, basically you are not changing the "pipeline" per se but the execution name of the pipeline, if that makes sense
What would be most ideal would be to be able to right-click on a pipeline run and have a "clone" option, like you can with a task, where you can start a new run with a new name in a single step.
...
Martin, if you want, feel free to add your answer on Stack Overflow so that I can mark it as the solution.
Will do, give me 5
the parent task ids is what I originally wanted, remember?
Ohh, I missed it
JitteryCoyote63 could you test the latest RC?
pip install clearml-agent==0.17.2rc4
Hi GreasyPenguin66
So the way clearml can store your notebook is by using the jupyter-notebook REST API. It assumes that it can communicate with it, as the kernel is running on the same machine. What exactly is the setup? Is the jupyter-lab/notebook running inside the docker? Maybe the docker itself is running with some --network argument?
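To check whether the notebook server is reachable from where the kernel runs, something along these lines can help. It is only a sketch against the Jupyter Server /api/sessions endpoint, not clearml's actual implementation; the base URL, token and kernel id are values you would have to supply, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen


def sessions_url(base_url):
    """Return the sessions endpoint for a Jupyter server base URL."""
    return urljoin(base_url.rstrip("/") + "/", "api/sessions")


def fetch_sessions(base_url, token, timeout=5.0):
    """GET the list of running sessions (needs network access to the server)."""
    request = Request(
        sessions_url(base_url),
        headers={"Authorization": f"token {token}"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def notebook_path_for_kernel(sessions, kernel_id):
    """Return the notebook path whose kernel id matches, or None."""
    for session in sessions:
        if session.get("kernel", {}).get("id") == kernel_id:
            # Some server versions report "path" at the top level,
            # others nest it under "notebook".
            return session.get("path") or session.get("notebook", {}).get("path")
    return None
```

If fetch_sessions fails with a connection error when called from inside the container, that points at the network setup rather than at clearml itself.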
Hi ObnoxiousStork61
Is it possible to report, e.g., validation scalars but shifted by 1/2 iteration?
No, these are integers
What's the reason for the shift?
I'm also curious
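Since the iteration has to be an integer, one possible workaround (just a sketch, not an official feature) is to report everything on a doubled axis, so that a half iteration lands on an odd integer. The helper name below is made up for illustration.

```python
def half_step_index(iteration, shifted=False):
    """Map an iteration, optionally shifted by 1/2, onto an integer axis.

    Iteration i becomes 2*i, and i + 1/2 becomes 2*i + 1.
    """
    if not isinstance(iteration, int) or iteration < 0:
        raise ValueError("iteration must be a non-negative integer")
    return 2 * iteration + (1 if shifted else 0)
```

Training scalars would then be reported with iteration=half_step_index(i) and validation scalars with iteration=half_step_index(i, shifted=True), for example through Logger.report_scalar. The trade-off is that the x axis in the plots shows doubled values.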