Thanks FrothyShark37
I just verified, this would work as well. I suspect what was missing is the plt.show call; this is the actual call that triggers clearml.
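A minimal sketch of the point above, assuming `clearml` and `matplotlib` are installed and credentials are configured. The project and task names are placeholders, and the imports sit inside the function so that merely defining it needs neither package:

```python
def report_matplotlib_plot():
    """Create a figure and call plt.show(), the call the reply says triggers clearml."""
    # Lazy imports: both packages are assumed to be installed where this runs.
    from clearml import Task
    import matplotlib.pyplot as plt

    # Placeholder project/task names, for illustration only.
    task = Task.init(project_name="examples", task_name="matplotlib demo")

    plt.plot([1, 2, 3], [1, 4, 9])
    plt.title("demo plot")
    # Per the reply above, without this call the figure may never be picked up.
    plt.show()

    task.close()
```

Call `report_matplotlib_plot()` from a script on a machine that can reach the ClearML server.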
Hi @<1607909176359522304:profile|UnevenCow76>
followed the below documentation to implement the clearml monitoring using prometheus and grafana
Did you try following this example? It includes both deploying a model and adding grafana metrics:
[link not preserved]
Hi ShinyWhale52
Every execution of the pipeline (by definition) will create a new job based on the pipeline steps
This is the reason you see all the steps twice (the default assumption is that you wish to re-run the step, as this is part of the processing workflow, e.g. training a model).
the model has been overwritten. I guess this is due to this instruction:
This is because you are storing it locally to the same path; it just reflects the fact that you just overwrote your model.
To create a...
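One way to avoid the overwrite described above is to give every run its own local path. This is only a sketch and not necessarily what the truncated reply goes on to recommend; `run_model_path`, the `run_id` argument and the `model.pkl` file name are hypothetical:

```python
from pathlib import Path


def run_model_path(base_dir, run_id, filename="model.pkl"):
    """Return a per-run model path so two runs never share the same local file."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Each run gets its own sub-directory under base_dir.
    return Path(base_dir) / run_id / filename


# Two runs saving into the same base directory end up with different files.
first = run_model_path("models", "run-001")
second = run_model_path("models", "run-002")
```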
@<1523715429694967808:profile|ThickCrow29> this is odd... how did you create the pipeline? can you provide code sample?
Hmm, now that you mention it, I guess it is not that obvious (I'm on a Mac as well). Maybe we should have the archive button at the bottom as well...
SteadyFox10 What do you think?
Yeah I can write a script to transfer it over, I was just wondering if there was a built in feature.
unfortunately no 😞
Maybe if you have a script we can put it somewhere?
LazyLeopard18 are you using the StorageManager to access azure:// links?
Try to manually edit the "Installed Packages" (right click the Task, select "reset", now you can edit the section)
and change it to: -e git+ssh@github.com:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
(assuming "pip install -e git+ssh@github.com:user/..." will work, this should solve the issue)
Using agent v1.01r1 in k8s glue.
I think a fix was recently committed, let me check it
But from the log it seems that:
You are not running as root in the docker?
Python 3.8 is installed (and not Python 3.6 as before)
Hi FunnyTurkey96
Any chance you can try to run with the latest from GitHub? (I just tested your code and it seemed to work on my machine.)
pip install git+
and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a User spins the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...
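As a rough sketch of "pass the user's key/secret": the helper below builds a per-user environment for a job. `user_job_env` is a hypothetical helper, and it assumes the credentials are delivered through the `CLEARML_API_ACCESS_KEY` / `CLEARML_API_SECRET_KEY` environment variables. How they actually reach the k8s job (secrets, pod spec, etc.) is not covered here:

```python
def user_job_env(access_key, secret_key, base_env=None):
    """Return a copy of base_env with this user's ClearML credentials added."""
    if not access_key or not secret_key:
        raise ValueError("both access_key and secret_key are required")
    # Copy so the shared (per-cluster) environment is never mutated.
    env = dict(base_env or {})
    env["CLEARML_API_ACCESS_KEY"] = access_key
    env["CLEARML_API_SECRET_KEY"] = secret_key
    return env


# Placeholder values, for illustration only.
shared = {"SOME_SHARED_SETTING": "value"}
alice_env = user_job_env("alice-key", "alice-secret", shared)
```

Every user gets their own dictionary while the shared settings stay untouched.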
Hi VexedCat68
txt file or pkl file?
If this is a string, it is just stored as-is (not as a file; this is considered a "link")
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/clearml/binding/artifacts.py#L521
Hi JitteryCoyote63,
upload_artifacts was designed to upload pre-made artifacts, which actually covers everything.
With register_artifacts we tried to have something that will constantly log a PD artifact. The use case was the examples used for training and their order, so we could compare the execution of two different experiments and detect dataset contamination etc.
Not sure it is actually useful though...
Retrieving an artifact from a Task is done by:
` Task.get_task(task_id='aaa').artifact...
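A small sketch of retrieving an artifact, assuming `clearml` is installed and configured. The task ID and artifact name are placeholders, and since the original snippet above is cut off, this is not necessarily the exact call it showed:

```python
def load_artifact(task_id, artifact_name):
    """Fetch a Task by ID and return the deserialized content of one artifact."""
    # Lazy import: clearml is assumed to be installed where this runs.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # task.artifacts maps artifact names to artifact objects;
    # .get() returns the stored object, .get_local_copy() a local file path.
    return task.artifacts[artifact_name].get()
```

For example, `load_artifact("aaa", "my_artifact")`, where both values are placeholders.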
Specifically your error seems to be an issue with nvidia Triton container upgrade
Hi @<1557899668485050368:profile|FantasticSquid9>
There is some backwards compatibility issue with 1.2 (I think).
Basically what you need is to spin a new one on a new session ID and register the endpoints
Hi @<1576381444509405184:profile|ManiacalLizard2>
Yeah that should work, assuming credentials are set in your clearml.conf
Hi @<1523702000586330112:profile|FierceHamster54>
Nope 🙂 nothing to worry about.
That said, do notice that the open-source file-server is not secure. This does not mean it will spill data on the server, but it does mean that you should probably put it behind a VPN, or use S3/GCP/Azure, if it is open to the public internet
I "think" I have a clue on the issue that is lost here in the translation:
Specifically to me it all comes down to the definition of "pipeline"
From the clearml perspective:
Manual Task - code that is executed by the user (or any other mechanism outside of the agent)
Remote Task - code that is executed by the Agent
A Pipeline is a Task
A Pipeline can be a "manual task" but also a "remote task"
A Pipeline generates "remote tasks"
Task status (e.g. pipeline status as it is also a Task) can be: draft, a...
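The definitions above can be sketched as a toy model. This is purely illustrative and not ClearML's data model; `Executor`, `TaskRecord` and `spawn_steps` are hypothetical names, and task status is left out because the list above is cut off:

```python
from dataclasses import dataclass
from enum import Enum


class Executor(Enum):
    MANUAL = "manual"  # run by the user, or anything outside of the agent
    REMOTE = "remote"  # run by the agent


@dataclass(frozen=True)
class TaskRecord:
    name: str
    executor: Executor
    is_pipeline: bool = False  # a pipeline is itself a Task


def spawn_steps(pipeline, step_names):
    """A pipeline (manual or remote) generates remote tasks for its steps."""
    if not pipeline.is_pipeline:
        raise ValueError("only a pipeline task can generate steps")
    return [TaskRecord(name=n, executor=Executor.REMOTE) for n in step_names]


# A pipeline launched by hand is a "manual task", yet its steps are "remote tasks".
local_pipeline = TaskRecord("my pipeline", Executor.MANUAL, is_pipeline=True)
steps = spawn_steps(local_pipeline, ["preprocess", "train"])
```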
Hi @<1523702932069945344:profile|CheerfulGorilla72>
Please tell me what RAM metric is tracked by ClearML?
Free RAM is the free RAM of the entire machine
Yeah, htop shows odd numbers as it doesn't "count" allocated buffers
Specifically you can see the code here:
[link not preserved]
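To illustrate why two tools can disagree about "free" memory, here is a sketch that parses Linux `/proc/meminfo`-style text and reports free RAM with and without buffers/cache. This is not ClearML's implementation; the sample numbers are invented and the field names assume the usual Linux layout:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_views(info):
    """Return strictly free memory, and free memory plus buffers and cache."""
    free = info.get("MemFree", 0)
    reclaimable = info.get("Buffers", 0) + info.get("Cached", 0)
    return {"free_kb": free, "free_plus_buffers_cache_kb": free + reclaimable}


# Invented sample values, for illustration only.
SAMPLE = """MemTotal:       16000000 kB
MemFree:         2000000 kB
Buffers:          500000 kB
Cached:          3500000 kB
"""
views = memory_views(parse_meminfo(SAMPLE))
```

On a real Linux machine the same functions can be fed the contents of `/proc/meminfo`.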
ImmensePenguin78 this is probably for a different python version ...
It is http btw, I don't know why it logged https://
This is odd. Could it be that it automatically forwards to https?
I would try the certificate check thing first
Oh, so is it a bug and you should have seen two series on each graph? (I think it is... not sure how to actually name the second instance other than running number)
But the artifacts and the dataset of my old experiments still use the old address for the download (is there a way to change that)?
MotionlessCoral18 the old artifacts are stored with direct links, hence the issue. As SweetBadger76 noted, you might be able to replace the links directly inside the backend databases
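The link replacement itself boils down to a prefix rewrite. The sketch below shows only that string logic; it does not touch any database, `rewrite_url` is a hypothetical helper, and the host names are placeholders:

```python
def rewrite_url(url, old_prefix, new_prefix):
    """Swap old_prefix for new_prefix when url starts with it; otherwise return url as-is."""
    if url.startswith(old_prefix):
        return new_prefix + url[len(old_prefix):]
    return url


# Placeholder addresses, for illustration only.
OLD = "http://old-host:8081"
NEW = "http://new-host:8081"
moved = rewrite_url("http://old-host:8081/project/task/artifacts/data.csv", OLD, NEW)
untouched = rewrite_url("s3://bucket/data.csv", OLD, NEW)
```

Back up the databases before applying anything like this to the stored links.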
Hi AbruptHedgehog21
How can I add S3 credentials for an S3 bucket in example.env for clearml-serving-triton? I need to add the bucket name, keys and endpoint
Basically boto (s3) environment variables would just work:
https://clear.ml/docs/latest/docs/clearml_serving/clearml_serving#advanced-setup---s3gsazure-access-optional
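A quick sanity check for the boto environment variables before starting the container could look like the sketch below. It assumes the standard boto names `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`; `missing_s3_env` is a hypothetical helper, and whether the bucket name and endpoint can be set the same way is not something this reply states:

```python
import os

# Standard boto variable names; assumed to be the ones the serving container reads.
REQUIRED_S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_s3_env(env=None):
    """Return the names of required S3 variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_S3_VARS if not env.get(name)]


# Placeholder values, for illustration only.
example_env = {"AWS_ACCESS_KEY_ID": "key", "AWS_SECRET_ACCESS_KEY": "secret"}
missing = missing_s3_env(example_env)
```

Calling `missing_s3_env()` with no argument checks the current process environment.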
Actually you cannot breakpoint at "atexit" calls (or at least it doesn't work with my gdb)
But I would add a few prints here:
https://github.com/allegroai/clearml/blob/aa4e5ea7454e8f15b99bb2c77c4599fac2373c9d/clearml/task.py#L3166
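For reference, this is the general shape of print-based tracing around an exit handler, using only the standard library. `trace_exit` is a hypothetical stand-in, not the function at the link above:

```python
import atexit

calls = []  # record of handler invocations, handy when output is hard to see


def trace_exit(label):
    """Print (flushed) and record that an exit-time handler ran."""
    # flush=True so the line is written even while the interpreter is shutting down
    print(f"[atexit] {label}", flush=True)
    calls.append(label)


# Registered handlers run at normal interpreter exit, in reverse registration order.
atexit.register(trace_exit, "interpreter shutting down")
```

Adding the same kind of flushed prints at the start and end of the real handler shows how far it gets before anything goes wrong.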