AgitatedDove14

48 Questions, 8049 Answers

Active since 10 January 2023

Last activity 6 months ago

Reputation

Badges 1

25 × Eureka!

Answers 8049

0 Hello, I Have A Question Regarding Creating A Clearml Pipeline Using Pytorch Lightning. I Am Not Really Sure Where To Begin. Should I Create A Task For Each Pytorch Lightning Class In My Pipeline? Is There A Demo Or Clearml Project That Specifically Uses

Hi @<1547028031053238272:profile|MassiveGoldfish6>
What is the use case? the gist is you want each component to be running on a different machine. and you want to have clearml do the routing of data and logic between.
How would that work in your use case?

one year ago

0 Good Morning Folks, I Am Setting Up Clearml On A (Self-Hosted) K8S Cluster Using The

SarcasticSquirrel56

if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?

Basically the agent register themselves on your cleaml-server, and they register on which Queue(s) they listen to. In other words the interface to choose the different types of machines/gpus is by enqueue the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?

EDIT:

I guess to achieve what I w...

2 years ago

0 Hi, We Saw 2 More Issues With Images Logging: 1. You Need To Use Tf.Summary.Image And Not Summary_Ops_V2.Image 2. Image Needs To Be In Range [0, 1] And Not [0, 255] (Matplotlib And Tensorboard Can Handle Either One) Is That Expected?

So it seems to get the "hint" from the type:
This will work
tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)wdyt, should it actually check min/max and manually cast it ?

3 years ago

0 Can Anyone Recommend Some Good Ai Deployment Frameworks For Kubernetes? (Better If They Have/Can Be Integrated With Clearml)

So good news (1) Dashboard is being worked on as we speak. (2) we released clearml-serving doing exactly that, the next release of clearml-serving will include integration with kfserving (under the hood) essentially managing the serving endpoints on top of the k8s cluster , wdyt?

3 years ago

0 I’M

yes 🙂

one year ago

0 How Can I Run A New Version Of A Pipeline, Wait For It To Finish And Then Check Its Completion/Failure Status? I Want To Kick Off The Pipeline And Then Check Completion

Hi @<1523701079223570432:profile|ReassuredOwl55>

I want to kick off the pipeline and then check completion

outside

of the pipeline task. (edited)

Basically the pipeline is a Task (of a certain type).
You do the "standard" thing, you clone the pipeline Task, you enqueue it, and you wait for it's status

task = Task.clone(source_task="<pipeline ID here>")
Task.enqueue(task, queue_name=services)
task.wait_for_status(...)

wdyt?

one year ago

0 Colors Of Cm Reporting Are Strange... Is It Possible To Adjust The Default Ones

Could you maybe send a screenshot? This is very strange? Also what's the trains version?

4 years ago

0 Hi Everyone! I Am Using Clearml-Serving When I Am Trying To Add New Endpoint Like This

I'm happy tp hear you found a work around
Seems like there is something wrong with the way the pbtxt is being merged, but I need some more information

{'detail': "Error processing request: object of type 'NoneType' has no len()"}

Where are you seeing this error?
What are you seeing in the docker-compose log.

one year ago

0 Hi All, I Have An Issue With The Way Hyper Parameters Are Logged Under Configuration, The Values That Are Stored Seem To Add Unnecessary Escape Characters To The Original Values.. Is It A Known Issue? Is There A Way To Change It? Thanks

this topic is about the issue with reporting a configuration with a string inside a tuple that has backslash

So the encoding itself is done YAML style, and based on your example \b Has to be encoded to \b because this is string encoding, like \n will become "new line"
Make sense ?

3 years ago

0 Does The New 2.0 Helm Charts (App Ver 1.1.0) Not Support Nfs?

I think this is the discussion you are after:
https://clearml.slack.com/archives/C01H5VAUZ8R/p1612452197004900?thread_ts=1612273112.002400&cid=C01H5VAUZ8R

3 years ago

0 I’M

@<1541954607595393024:profile|BattyCrocodile47> you mean like environment variables?

one year ago

0 Hello Folks! We Have Started Using Clearml In Kubernetes. The Trainings Are Run In K8S With Help Of K8Sintegration And Some Custom Coding. Now For The Clearml-Session Tasks, A Port-Forward Should Be Done Each Time If I Need To Access The Jupyter Notebook

Hi DisgustedDove53

Now for the clearml-session tasks, a port-forward should be done each time if I need to access the Jupyter notebook UI for example.

So basically this is why the k8s glue has --ports-mode.
Essentially you setup a k8s service (doing the ingest TCP ports) then the template.yaml that is used by the k8s glue should specify said service. Then the clearml-session knows how to access the actual pod, by a the parameters the k8s glue sets on the Task.
Make sense ?

3 years ago

0 Hi Everyone! I Have A Short Question That You Can For Sure Help Me With. Is There A Way To Avoid Each Task To Create A New Environment? I'D Like To Specify Which Env To Use. I Tried With

ERROR: Could not install packages due to an EnvironmentError: 
[Errno 28] No space left on device

BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker out of space on the Main disk '/var/` where it stores all the images and temp file systems
This will cause you code to fail as any runtime change to the container file system will raise this out of disk space error

2 years ago

0 Hello All, I Have A Question Regarding Showing Of Debug Samples Within An On-Prem Clearml Instance. I Am Logging Debug Images Via Tensorboard (Via

ZanyPig66 is this reproducible? This sounds like a bug, whats the TB version and OS you rae using?
Is this example working for you (i.e. you see debug images)
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py

one year ago

0 Hi, I Tried To Setup Clearml Serving And Ran The Example Given

Can you post here the docker-compose.yml you are spinning? Maybe it is the wring one?
Step 4 here:
https://github.com/thepycoder/asteroid_example#deployment-phase

2 years ago

0 I .

Correct

3 years ago

0 Is There An Elegant Way To Download All Images Posted In “Debug_Samples” From The Trains Server?

TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?

3 years ago

0 Trying To Setup A Trains-Agent Worker On A Remote Machine; When I Run Trains-Init And Follow The Steps To Give It Credentials For Our Trains Server I Get This

corporate firewall... let's start with http 🙂

3 years ago

0 Hi, I Noted That Clearml-Serving Does Not Support Spacy Models Out Of The Box And That Clearml-Serving Only Supports Following;

Hi SubstantialElk6

noted that clearml-serving does not support Spacy models out of the box and

So this is a good point.

To add any pissing package to the preprocessing docker you can just add them in the following environment variable here: https://github.com/allegroai/clearml-serving/blob/d15bfcade54c7bdd8f3765408adc480d5ceb4b45/docker/docker-compose.yml#L83
EXTRA_PYTHON_PACKAGES="spacy>1"
Regrading a custom engine, basically this is supported with --engine custom
you c...

2 years ago

0 Hi! I Have A Question Regarding Performances Of The Clearml-Server: Are The Calls From The Agents Made Asynchronously/In A Non Blocking Separate Thread? Is The Connection To The Clearml-Server Expected To Be A Bottleneck If The Clearml-Server Is Far From

Multi-threaded multi-processes multi-nodes 🙂

3 years ago

0 Hi, When Migrating From The Clearml Server To A Self Hosted Server Is There A Way To Transfer All The Data/Training Tasks Between Them?

Hi @<1523701295830011904:profile|CluelessFlamingo93>
What do you mean? what's the difference between ClearML server and self hosted? both are self hosted no?

one year ago

0 Does Clearml Have A Testing Api? I'M Setting Up Stack To Enque Work With Clearml. Is There A Way I Can Simulate Queue And Worker Execution?

Hi @<1535069219354316800:profile|PerplexedRaccoon19>
What do you mean by simulate?
You can manually setup and run a Task if you need,
'clearml-agent execute --id task_id' add --docker for docker mode.
This will setup the env and run the task

one year ago

0 Hi, We Would Like To Make A Copy Of The Base Public Ami

Yes the clearml-server AMI - we want to be able to back it up and encrypt it on our account

I think the easiest and safest way for you is to actually have full control over the AMI, and recreate once from scratch.
Basically any ubuntu/centos + docker and docker-compose should do the trick, wdyt ?

3 years ago

0 Hello, We Have A Self Hosted Clearml Server Connected To Different Queues And Use It To Launch Remote Experiments (Clearml==1.9.3, Clearml-Agent==1.5.2Rc0). It Is Working Really Well For Us Unless One Workflow :) We Would Like To Abort An Experiment And E

Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by?

We are able to

set_initial_iteration

to 0 but not

get_last_iteration

.

Are you saying that if your code looks like:

Task.set_initial_iteration(0)
task = Task.init(...)

and you abort and re-enqueue, you still have a gap in the scalars ?

one year ago

0 Hello. I Have A Question Regarding Jupyter Notebook Execution With Clearml Worker. Long Story Short: I Turn A Remote Machine Into Worker With Clearml-Agent. Then I Made A Task With The Local Machine By Executing Notebook Cells, Which Imports A Training M

Hi @<1555362936292118528:profile|AdventurousElephant3>
I think your issue is that Task supports two types of code,

single script/jupyter notebook
git repo + git diffIn your example (If I understand correctly) you have a notebook calling another notebook, which means the first notebook will be stored on the Task, but the second notebook (not being part of a repository) will not be stored on the task, and this is why when the agent is running the code it fails to find the second notebook....

one year ago

0 How Can I Ensure Tasks In A Pipeline Have The Same Environment As The Pipeline Itself? It Seems A Bit Counter-Intuitive That The Pipeline (Executed Remotely) Captures The Local Environment, But The Tasks (Executed Remotely) Do Not Use That Same Environmen

Then the type hints are not removed from helper and the code immediately crashes when being run

Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved

one year ago

0 Thanks For Releasing This Awesome Experiment Manager! I Was Logging A Single Training Session On Multiple Gpus (Using Detectron2), And Torch.Mp Is Called For Each Gpu. This Creates A Separate Task In Trains For Each Gpu, And Only One Of The Tasks Has The

Hi VexedKangaroo32 , funny enough this is one of the fixes we will be releasing soon. There is a release scheduled for later this week, right after that I'll put here a link to an RC containing a fix to this exact issue.

4 years ago

0 Hi

send the agent's logs to log management and monitoring service,

These are stored into ELK, it was built to store large amounts of logs, I cannot see any reason why one would want to remove it?

Maybe if there would be a way to change their format, it could also help filtering them from my side.

You mean in the UI?

one year ago

0 The

Do you think this is better ? (the API documentation is coming directly from the python doc-string, so the code will always have the latest documentation)
https://github.com/allegroai/clearml/blob/c58e8a4c6a1294f8acec6ed9cba81c3b91aa2abd/clearml/datasets/dataset.py#L633

3 years ago

0 Hi, I'M Trying To Install A New Server, This Is A Fresh Ubuntu 18.04 Install. When I Try To Run The Docker Composer Up Command I Get Error Messages Like This One:

https://github.com/allegroai/trains/blob/master/docs/trains.conf#L47

4 years ago

Show more results