AgitatedDove14

49 Questions, 8094 Answers

Active since 10 January 2023

Last activity 10 months ago

Reputation

Badges 1

25 × Eureka!

Questions 49
Answers 8094

0 Votes

1 Answers

1K Views

0 Votes 1 Answers 1K Views

Gals, Guys &

Gals, Guys & :robot_face: , if you want to checkout the Hyper-Parameters automation (Using Bayesian Optimization Hyper-Band) We have an example on the demo s...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Gals, Guys & :robot_face: If you want to get some inspiration on building DL Continuous Integration pipelines, I suggest this post (obviously built on top of Trains :smile_cat: ) <https://twitter.com/PyTorch/status/1272919483980500999>

Gals, Guys & :robot_face: If you want to get some inspiration on building DL Continuous Integration pipelines, I suggest this post (obviously built on top of...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Slack Security ... Go Figure

Slack security ... Go figure 😉

clearml

5 years ago

0 Votes

0 Answers

2K Views

0 Votes 0 Answers 2K Views

@YummyWhale40 you are saying the example code is not working when running with the demo server? Also I think I was able to view your experiment on the demo server, and do get the Scalars without any issues...

YummyWhale40 you are saying the example code is not working when running with the demo server? Also I think I was able to view your experiment on the demo se...

clearml

4 years ago

0 Votes

3 Answers

767 Views

0 Votes 3 Answers 767 Views

We Recently Released A New Version Of

we recently released a new version of clearml-session with Persistent Workspace support! 🚀 🎉 Finally you can develop on remote machines with workspace fold...

remote-ssh

11 months ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Well To Be Honest, We Kind Of Thought It'S Redundant. Basically Storing Artifacts In Experiments And Having Them Retrieved Quickly From The Code Itself Was Way More Convenient For Us Then To Manually Have To Do Clone/Pull Of The Data... Example: Create Da

Well to be honest, we kind of thought it's redundant. Basically storing artifacts in experiments and having them retrieved quickly from the code itself was w...

clearml

4 years ago

0 Votes

1 Answers

746 Views

0 Votes 1 Answers 746 Views

There Is No V1.0 Release Without A Prompt V1.0.1 Following It, And We Are No Different

🙏 There is no v1.0 release without a prompt v1.0.1 following it, and we are no different 😊 pip install clearml==1.0.1

clearml

3 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

:confetti_ball: :champagne: Happy new year <!everyone>! :fireworks: :sparkler: We wanted to thank you all for the great feedback, contribution and general support you guys give us. It is truly fulfilling to see users enjoying the product you build, and y

🎊 🍾 Happy new year ! 🎆 🎇 We wanted to thank you all for the great feedback, contribution and general support you guys give us. It is truly fulfilling to ...

clearml

4 years ago

0 Votes

7 Answers

708 Views

0 Votes 7 Answers 708 Views

Thank You All For Taking The Time To Answer Our Survey (If You Haven'T Already, We Urge You To

Thank you all for taking the time to answer our survey (If you haven't already, we urge you to do so ). Your feedback has a major impact on what we build, do...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Lol, I Wonder What The Adblock Rule Was ;)

Lol, I wonder what the adblock rule was ;)

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<!channel> *important notice* : it seems Nvidia broke some of their PPA's security :confused: , causing `apt-get updates` to fail inside containers. This in term will cause `clearml-agent` to fail with specific Nvidia containers. _If you are seeing simila

important notice : it seems Nvidia broke some of their PPA's security 😕 , causing apt-get updates to fail inside containers. This in term will cause clearml...

clearml

2 years ago

0 Votes

6 Answers

684 Views

0 Votes 6 Answers 684 Views

Hi :robot_face: , humans We have the new documentation site up and running 🎉 None 🎊 This is still a work in progress, so we keep the previous version alive...

clearml

3 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

you set it :slightly_smiling_face:

you set it 🙂

clearml

5 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<!here> Gals/Guys/:robot_face: If you have ideas on improving the Slack Monitoring service, please add them on the dedicated Github Issue : <https://github.com/allegroai/trains/issues/161> For example: generate an alert if my experiment reaches a certain

Gals/Guys/ :robot_face: If you have ideas on improving the Slack Monitoring service, please add them on the dedicated Github Issue : https://github.com/alleg...

clearml

4 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

Is It A One Time Thing? Or Recurring?

Is it a one time thing? or recurring?

clearml

5 years ago

0 Votes

4 Answers

175 Views

0 Votes 4 Answers 175 Views

Happy New Year Everyone!

Happy new year everyone! 🥂 🎆 Last minute 🎁 v2.0 is now out, with a new UI design! now finally supporting light & dark mode 🤩 Lot's more to come this year...

clearml

one month ago

0 Votes

9 Answers

1K Views

0 Votes 9 Answers 1K Views

Hi https://github.com/allegroai/trains/releases/tag/0.15.1 / https://github.com/allegroai/trains-server/releases/tag/0.15.1 / https://github.com/allegroai/tr...

clearml

4 years ago

0 Votes

6 Answers

1K Views

0 Votes 6 Answers 1K Views

Hi ! ClearML Server + SDK v1.9.0 is out! 🎉 🚀 🎊 Happy Holidays and Happy New Year! ❇️ 🎇 🎄

clearml

2 years ago

0 Votes

0 Answers

1K Views

0 Votes 0 Answers 1K Views

<!everyone> Trains v0.14.2 is out (<https://github.com/allegroai/trains/releases/tag/0.14.2|Change log>) Highlights: <https://github.com/allegroai/trains/blob/master/trains/storage/manager.py#L13|trains.storage.StorageManager> - with caching for any http

Trains v0.14.2 is out ( https://github.com/allegroai/trains/releases/tag/0.14.2 ) Highlights: https://github.com/allegroai/trains/blob/master/trains/storage/...

clearml

4 years ago

Show more results

0 Hi Everyone, I Have A Few Questions To Understand The Clearml-Serving A Little Better (And How Much Resources To Allocate For The Serving Pods): Are The Models I Defined To Be Served E.G. Via The Cli Downloaded To The Serving Pod? So That They Are Physica

Hi @<1649221394904387584:profile|RattySparrow90>

: Are the models I defined to be served e.g. via the CLI downloaded to the serving pod

Yes this is done automatically and online (i.e. when you update the using CLI/API) , based on the models/endpoints you set

So that they are physically lying there as a file I can see in the filesystem?

They are, and cached there

Or is it more the case that the pod gets the model when needed/when an API call for this model is incoming?

I...

11 months ago

0 Hello, Is It Possible To Run Trains Offline Where There'S No Http Connection Between The Node Running The Job And Where The Web Ui Runs? I See In Your Diagram The Connection Between Training Machine And Trains Server (Which Contains The Web Ui) Is Over Ht

I see.
You can get the offline folder programmatically then copy the folder content (it's the same as the zip, and you can also pass a folder instead of zip to the import function)
task.get_offline_mode_folder()You can also have a soft link of the offline folder (if you are working on a linux machine:
ln -s myoffline_folder ~/.trains/cache/offline

4 years ago

0 Hey All! Ive Gone Through The Doco And Not Found Anything At The Moment, But Does Clearml Have Model Versioning And Staging (Similar To Mlflow).

LudicrousParrot69
I "think" I have a better handle on what you wish to do.
Is it kind of generic "serving" solution?
FYI:
Model artifact is, usually, a weights/model file. The idea that later you will be able to access it and serve it. Now the problem is (and I think this is what you are referring to) there is usually a specific piece of code tied to that model that can use it (a.k.a pyfunc)
A few ideas:
These days everyone is trying to build their models with generic interface, so that scik...

4 years ago

0 Hi I Saw This On The Clearml-Agent Docs But Other Than The Docker Image, I'M Not Sure How To Integrate This With Clearml Py And Clearml-Server. Please Advise.

SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...

3 years ago

0 Can Someone Help Me With Deploying This Example Model (From Triton Inference Server) Deployed In Clearml-Serving? Too Many Random Errors For Me To Figure It Out

Do we launch multiple gorups of these in different projects?

Actually Triton can serve multiple models and the endpoints/models are controlled from the clearml-serving.
The only issue is adding a load-balancer in front of multiple nodes to balance the requests between them. wdyt?

3 years ago

0 Hi, With Clearml-Agent 1.5.1, I Tried To Run An Experiment Within A Docker With Image Python3:8 And It Failed Executing The Task While Trying To Call Python3.9. I Am Not Sure Why It'S Using Python3.9, Since The Agent.Default_Python Is 3.8 And The Image Is

packages are updated, and I don't know which python version I get, + changing the python version of the OS is not really recommended

Wait I'm confused, this is inside a container, no?

and the python version running my code should not depend of the python version running the clearml-agent (especially for experiments running in containers)

Generally speaking you are correct, but some packages will not have the same version for all python versions

Specifically in this case I think...

2 years ago

0 Hi, I Have Another Problem

This is also set in the command line.
--cpu-only or maybe without any --gpus flag at all

4 years ago

0 Hello Everyone, I’M Newcomer For Clearml. I Have Question Related To

That’s the question i want to raise too,

No file size limit
Let me try to run it myself

3 years ago

0 Let’S Imagine I’M Building A Pipeline With Five Consecutive Steps, Where Some Of The Steps Are Non Ml/Dl Based. Using Clearml I Run A Lot Of Experiments To Find The Right Pipeline Configuration. After I Found The Right Algorithms And Parameters For My Pip

Hi ClumsyElephant70

s there a way to run all pipeline steps, not in isolation but consecutive in the same environment?

You mean as part of a real-time inference process ?

3 years ago

0 Hi, I Have A Small Issue About Gpu Monitoring. I Run My Training Inside A Singularity Container And I Set The Cuda_Visible_Devices Variable. However, I Get The Following Message:

Yes, that means the nvidia drivers are present (as you mentioned the GPU seems to be detected).
Could you check you have libnvidia-ml.so.1 inside the container ?
For example in /usr/lib/nvidia-XYZ/

4 years ago

0 Hi, I Have Been Getting The Following For A While. Is There A More Detailed Log I Can Look Into? This Happens On Both Https And Http.

Setting the credentials on agent machine means the users cannot use their own credentials since an k8s glue agent serves multiple users.

Correct, I think "vault" option is only available on the paid tier 😞

but how should we do this for the credentials?

I'm not sure how to pass them, wouldn't it make sense to give the agent an all accessing credentials ?

3 years ago

0 When I Do

is this a config file on your side or something I can change, if we had enterprise version?

Yes, this is one of the things you can configure

2 years ago

0 Hello, I'M Trying To Save A Keras Model As A Task Artifact, And Then Upload It From Another Task. Does Anyone Know The Syntax For That? What I'Ve Seen Is Not Quite Working.

ConfusedPig65 could you send the full log (console) of this execution?

3 years ago

0 Hey, I'M Running A Pipeline, And 1 Stage Passed - But The Next One Failed. I Fixed The Bug For The Second One - Is There Any Way To Retry The Pipeline From The Failure?

Is there an option to do this from a pipeline, from within the

add_step

method? Can you link a reference to cloning and editing a task programmatically?

Hmm, I think there is an open GitHub issue requesting a similar ability , let me check on the progress ...

nope, it works well for the pipeline when not I don't choose to continue_pipeline

Could you send the full log please?

3 years ago

0 Hi All, We’Re Interested In Using Trains For A New Ml Project. This Project Is An Early Proof Of Concept So We’D Like To Start With The Open Source Version. One Question We’Re Finding Difficult To Answer Is: What Tools Do People Successfully Combine With

EnchantingWorm39 you have great timing ;)

4 years ago

0 Is This An Expected Behaviour? Trains Version 0.16.4, Not Able To Upgrade Now To Latest Version But I Doubt This Was Changed

New version will contain much more advanced search (including all the task fields)

are there any more fields in this function with partial matching? for example project? tags?

Yes they can all be filtered (basically everything you see in the UI)
notice: tags are strings (you can provide list of tags), project is an ID of the project
(Use Task.get_project_id, I think)

4 years ago

0 Hi All, I'M A New User With Clearml-Agent. I Know It'S Supposed To Automatically Replicate The Environment Of A Task, Based On Installed Packages List. However, Installed Packages Of My Task Is Misses Many Of Installed Packages (Any Idea Why?) How Do I Co

Hi @<1523702969063706624:profile|PoisedShark13>

However, INSTALLED PACKAGES of my task is misses many of installed packages (any idea why?)

It automatically detects the directly imported packages, literally analyzing your code base and looking for imports
The derivative packages (i.e. the one that any of the "main" packages need, will be listed after the first time the agent installs everything)
If something specific is missing, you can manually add it with:

Task.add_requiremen...

one year ago

0 Hi Everyone! I Have A Short Question That You Can For Sure Help Me With. Is There A Way To Avoid Each Task To Create A New Environment? I'D Like To Specify Which Env To Use. I Tried With

@<1523703080200179712:profile|NastySeahorse61> / @<1523702868694011904:profile|AbruptCow41>

Is there a way to avoid each task to create a new environment?

You can just define CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 it will just use whatever you have there (notice it will totally ignore requirements.txt and "installed packages" on the Task)

BTW I would recommend turning on the venv caching, this is per docker/python/packages caching so the next time you are using th exact requi...

2 years ago

0 Assuming I Have A

A few implementation / design details:
When you run code with Trains (and call init) it will record your environment (python packages, git code, uncommitted changes etc) Everything is stored on the Task object in the trains-server, when you clone a task you literally create a copy of the Task object (i.e. a second experiment). on the cloned experiment, you can edit everything (parameters, git, base docker image etc) When you enqueue a Task you add its ID to the execution queue list a trains-a...

4 years ago

0 Hi Anyone

That sounds like an internal tritonserver error.
https://forums.developer.nvidia.com/t/provided-ptx-was-compiled-with-an-unsupported-toolchain-error-using-cub/168292

3 years ago

0 Hello, I'M Trying To Save A Keras Model As A Task Artifact, And Then Upload It From Another Task. Does Anyone Know The Syntax For That? What I'Ve Seen Is Not Quite Working.

Okay ConfusedPig65 I found the problem. For some reason the latest TF.keras.load_model . save_model is not tracked.
I'll make sure we push a fix later today

3 years ago

0 Hello! Since Today I Get

Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked

/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...

3 years ago

0 Hey! Is There A Way To Ignore The Spammy Output Of Progressbars Like

Long story short, work in progress.
BTW: are you referring to manual execution or trains-agent ?

4 years ago

0 Hello Everyone! The Question About Dataset.Squash(). The Squash Operation Copies All The Data And Is No Longer Linked To Previous Commits? I Thought This Operation Is Like Git Squash But It Seems To Me That Clearml Dataset.Squash() Create Just A Copy Of S

The Squash operation copies all the data and is no longer linked to previous commits?

Yes, basically the idea is if you have data version that relies on many parents that needs to be merged, the squash will create a merged copy and push it all as a single version, and then yes the parent versions are no longer needed

I thought this operation is like git squash but it seems to me

yeah... we did not want to actually delete the parents because unlike git, the operation is done ...

11 months ago

0 Well, This Is My Question... I'M Trying To Adapt Clearml To Aws Using Basically Ecs Fargate + Documentdb + Aws Es + Elasticache + Efs. I Could Start The Fileserver Component, But Now I'M Trying To Start The Api Server And Is Not Working, Before Stop The T

BTW: Full RestAPI reference here
https://allegro.ai/clearml/docs/rst/references/clearml_api_ref/index.html

3 years ago

0 I Have Set

Hi Guys, just curious here, what's was the final issue?
Also out of curiosity, what does that mean? "1.12.2 because some bug that make fastai lag 2x" ?

9 months ago

0 Hi Everyone! Is There A Way To Specify The Working Directory In A Pipeline Component? I’M Using Pipelines From Decorators, I Can Set The Repo Url Just Fine, But I’M Running Everything From A Subfolder, And The Working Dir Is Set To

Hi @<1570220858075516928:profile|SlipperySheep79>

Is there a way to specify the working dir from the decoratoe

not directly, but why would that change anything? I mean the coponent code will be created in the git root, and you can still access files inside the subfolders

from .subfolder import something

what am I missing?

one year ago

Okay this is a bit hacky but will work

@PipelineDecorator.component(...)
def step(...)
  import sys
  import os
  sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)), "projects", "main" ))
  
  from file import something

one year ago

0 Is There An Easy Way To Add A Link To One Of The Tasks Panels? (As An Artifact, Configuration, Info, Etc)? Edit: And Follow Up Regarding The Dataset. As Discussed Somewhere Previously, The Datasets Are Now Automatically Moved To A Hidden "Sub-Project" Pr

This seems to only work for a single file (weights_path implies a single file, not multiple ones). Is that the case?See update_weights_package actually packages an entire folder as zip and will do the extraction when you get it back (check the function docstring, I think you can also specify wildcard etc if needed)

Why do you see this as preferred to the dataset method we have now?

So it answers a few requirements that you raised
It is fully visible as part of the project and se...

2 years ago

0 When I Setup My Local Virtual Environment I Use A Combination Of Conda And Pip. I Use Conda As My Environment Manager, And Then Use Pip For Packages That Are Not In The Conda Repositories.

Hi VivaciousPenguin66
Seems like a CUDA/CUDNN issue.
You argent is configured to work in venvmode, which mean it will pull the correct pytorch version based on the detected CUDA driver support. Speicifally you can see in the log "agent.cuda_version = 111" which means CUDA 11.1 and from the log it found the correct pytorch version:
` Torch CUDA 111 download page found
Found PyTorch version torch==1.8.1 matching CUDA version 111
Found PyTorch version torchvision==0.9.1 matching CUDA version 1...

3 years ago

Show more results