
That is odd, can you send the full Task log? (Maybe some oddity with conda/pip ?!)
Notice Triton (and therefore clearml-serving) needs the PyTorch model to be converted into TorchScript, so that the Triton backend can load it.
The trains-agent RC (which they tell me will be out tomorrow) will have a switch to do that, just so it is easier
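For reference, the conversion itself is roughly this (a minimal sketch; the toy model, input shape, and file name below are just placeholders):
`
import torch
import torch.nn as nn

# toy model purely for illustration; replace with your trained model
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
model.eval()

example_input = torch.randn(1, 10)                 # sample input with the expected shape
scripted = torch.jit.trace(model, example_input)   # or torch.jit.script(model)
scripted.save("model.pt")                          # TorchScript file the Triton backend can load
`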
TroubledHedgehog16
but doesn't run when I deploy it using clearml. Here's the log of the error:
...
My guess is that clearml is reimporting keras somewhere, leading to circular dependencies.
It might not be circular, but I would guess it does have something to do with the order of imports. I'm trying to figure out what the difference would be between a local run and running with an agent
Is it the exact same TF version?
Although I didn't understand why you mentioned
torch
in my case?
Just a guess, other frameworks do multi-process as well,
I would guess it relates to parallelization of Tasks execution of the
HyperParameterOptimizer
class?
Yes that might be it, it's basically a by-product of using the python "Process" class for multiprocessing. We are working on a fix, not a trivial one unfortunately
However I'm quite confident that plots and scalars are not visible online only when 'git diff too large to store' appears.
These should be unrelated, are you seeing console outputs ?
but I still think the same should be possible using the Task.init
This is the part that I find confusing: Task.init(..., output_uri=True)
is working for me, what is the setup that caused this line to "fail"?
Now that we have the free tier (a.k.a community server) we might change the default behavior.
The idea is always to allow an easy way to on-board and test the system.
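For completeness, this is roughly what that call looks like in a full snippet (a minimal sketch; project and task names are placeholders):
`
from clearml import Task

# output_uri=True uploads model checkpoints/artifacts to the configured files server
# (you can also pass an explicit destination, e.g. "s3://my-bucket/models")
task = Task.init(
    project_name="examples",       # placeholder
    task_name="output_uri demo",   # placeholder
    output_uri=True,
)
`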
ReassuredTiger98
BTW: what's the scenario where your machine reverted to the default configuration (i.e. no configuration file) ?
Hmm I just noticed:
'--rm', '', 'bash'
This is odd, this is an extra argument passed as "empty text". How did that end up there? Could it be you did not provide any docker image or default docker container?
MysteriousBee56 Okay, let's try this one:
docker run -t --rm nvidia/cuda:10.1-base-ubuntu18.04 bash -c "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean && apt-get update && apt-get install -y git python3-pip && python3 -m pip install trains-agent && echo done"
hmm this might help:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables
basically you might be able to define:
PIP_NO_USE_PEP517=1
MotionlessCoral18 so did it solve the issue ?
ERROR: torch-1.12.0+cu102-cp38-cp38-linux_x86_64.whl is not a supported wheel on this platform
TartBear70 could it be you are running on a new Mac M1/2 ?
Also quick question, any chance you can test with the latest RC?
pip3 install clearml-agent==1.3.1rc6
Hmm this is odd. When you press on the parent dataset in the UI and go to full-details, then the INFO tab, can you copy everything here?
ERROR: Could not install packages due to an EnvironmentError:
[Errno 28] No space left on device
BTW: @<1523703080200179712:profile|NastySeahorse61> this sounds like docker running out of space on the main disk `/var/` where it stores all the images and temp file systems.
This will cause your code to fail, as any runtime change to the container file system will raise this out-of-disk-space error.
Hi @<1846360404628869120:profile|HelpfulBadger74>
Is pixi a drop-in replacement for pip? Is it like uv?
Hi UnevenDolphin73
This differentiable storage - does it only work on file additions/removal, or also on intra-file changes?
This is on a file level, meaning if you change a single byte in a file, the entire file will be packaged in the new version.
Make sense ?
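To make the versioning flow concrete, a rough sketch of creating a new dataset version on top of a parent (project, dataset names, and the data folder are placeholders):
`
from clearml import Dataset

# placeholders for the existing (parent) dataset
parent = Dataset.get(dataset_project="examples", dataset_name="my-dataset")

# new version referencing the parent
child = Dataset.create(
    dataset_project="examples",
    dataset_name="my-dataset",
    parent_datasets=[parent.id],
)
child.add_files(path="data/")   # only files whose content changed are re-uploaded,
child.upload()                  # but a single-byte change re-packages that whole file
child.finalize()
`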
These are maybe good features to include in ClearML: ... or ...
Sure, we should probably add a section into the doc explaining how to do that
Another approach is creating my own API on top of the clearml-serving endpoints, where I control each tenant's authentication.
I have to admit that to me this is a much better solution (than my/bento integrated JWT option). Generally speaking I think this is the best approach, it separates the authentication layer from execution ...
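Just to make the idea concrete, a rough sketch of such a gateway (FastAPI and requests are my assumptions here; the serving URL and token store are hypothetical, and the token check is only a placeholder for real JWT validation):
`
import requests
from fastapi import FastAPI, Header, HTTPException

SERVING_URL = "http://clearml-serving:8080/serve/my_model"   # hypothetical endpoint
VALID_TOKENS = {"tenant-a-token": "tenant_a"}                # placeholder auth store

app = FastAPI()

@app.post("/predict")
def predict(payload: dict, authorization: str = Header(None)):
    # authenticate the tenant before touching the serving endpoint
    tenant = VALID_TOKENS.get(authorization)
    if tenant is None:
        raise HTTPException(status_code=401, detail="unknown tenant token")
    # forward the validated request to the serving endpoint
    resp = requests.post(SERVING_URL, json=payload, timeout=30)
    return resp.json()
`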
Hi ZanyPig66
I used tensorboard as clearml claims to auto-capture tensorboard outputs, but it was a no go.
The auto TB logging should work out of the box, where is it failing ?
Also, task = Task.current_task()
Why aren't you using Task.init in the original script?
The idea is that you run your code on your machine (where the environment works), ClearML auto detects code + python packages + args etc.
Then you clone it in the UI and launch it on a remote machine.
What am I missing ...
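Something like this is what I would expect at the top of the original script (a minimal sketch; project/task names and the dummy loop are placeholders):
`
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Task.init at the start of the original training script enables the auto-logging
task = Task.init(project_name="examples", task_name="tb auto-logging")

writer = SummaryWriter()   # anything reported here should be auto-captured
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)
writer.close()
`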
Is there a way to do this all elegantly?
Oh yes there is, this is how the TaskB code will look:
` task = Task.init(..., 'task b')
param = {'TaskA': 'TaskA ID HERE'}
task.connect(param)
# grab the last output model of TaskA and load it locally
taska_model = Task.get_task(param['TaskA']).models['output'][-1]
model = torch.load(taska_model.get_local_copy())
# ... train ...
torch.save(model, 'model_b.pt') `I might have missed something there, but generally speaking this will let you:
Select TaskA as a parameter of the TaskB training process. Will automagically register Task A's...
throw an error when running without
clearml.conf
which tells the user to run clearml-init first?
I would like potential users to be able to just run the example code and get the experience, or even integrate with their code, without the need to run a single configuration
(Basically to alleviate as many potential hurdles from getting users on board clearml)
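One way around the configuration file, if I remember correctly, is injecting the credentials in code (treat the exact call and the values below as an assumption/placeholders; the equivalent CLEARML_API_* environment variables should work similarly):
`
from clearml import Task

# placeholders: use your own server URLs and credentials
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
)
task = Task.init(project_name="examples", task_name="no conf file")
`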
Could not find a version that satisfies the requirement open3d==0.15.2 .. from versions: 0.10.0.0, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0)
This points to the agent installing with a different python version than the one you used to run the original code, I would guess python3.6
JitteryCoyote63 any chance you have a log of the failed torch 1.7.0 ?
with ?
multipart: false
secure: false
If so, can you post here your aws.s3 section of the clearml.conf? (of course replacing the actual sensitive information with *s)
Hi DefeatedCrab47
You should be able to change the Web server port, but the API port (8008) cannot be changed. If you can login to the web app and create a project it means everything is okay. Notice that when you configure trains ( trains-init ) the port numbers are correct
the storage configuration appears to have changed quite a bit.
Yes, I think this is part of the cloud-ready effort.
I think you can find the definitions here:
https://artifacthub.io/packages/helm/allegroai/clearml
Hi DepressedChimpanzee34 , took me a while but I think there is a solution:
In your docker file, replace:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L5
with:
entrypoint: /bin/bash
command: -c "mkdir -p /var/log/clearml && cd /opt/clearml/ && python3 -m apiserver.apierrors_generator && gunicorn -w 4 -t 600 --bind=0.0.0.0:8008 apiserver.server:app"
Hi SubstantialElk6
Generically, we would 'export' the preprocessing steps, setup an inference server, and then pipe data through the above to get results. How should we achieve this with ClearML?
We are working on integrating the OpenVino serving and Nvidia Triton serving engines into ClearML (they will both be available soon)
Automated retraining
In cases of data drift, retraining of models would be necessary. Generically, we pass newly labelled data to fine...
PanickyMoth78
and I would definitely prefer the command executing_pipeline to not kill the process that called it.
I understand why it would be odd from a notebook perspective; the issue is that the actual code is being "sent" to the backend to be executed on a remote machine. It is important to understand that this is the end of the current process. Does that make sense ?
(not saying we could not add an argument for that, just trying to ...
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing, it seems the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and nginx as a load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...