I think that clearml should be able to do parameter sweeps using pipelines in a manner that makes use of parallelisation.
Use the HPO, it is basically doing the same thing with a more sophisticated algorithm (BOHB):
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
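Condensed from the linked example, roughly (the template task ID, parameter names, and queue are placeholders; `OptimizerBOHB` requires `pip install hpbandster`):
```python
from clearml.automation import (
    HyperParameterOptimizer,
    UniformIntegerParameterRange,
    DiscreteParameterRange,
)
from clearml.automation.hpbandster import OptimizerBOHB

optimizer = HyperParameterOptimizer(
    base_task_id="<template-task-id>",  # the task cloned for every trial
    hyper_parameters=[
        UniformIntegerParameterRange("General/layer_1", min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange("General/batch_size", values=[96, 128, 160]),
    ],
    objective_metric_title="validation",
    objective_metric_series="accuracy",
    objective_metric_sign="max",
    optimizer_class=OptimizerBOHB,
    max_number_of_concurrent_tasks=4,  # this is the parallelisation knob
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```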
For example - how would this task-based example be done with pipelines?
Sure, you could do something like:
` from clearml import Pi...
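The snippet above is truncated; a minimal sketch of what a pipeline-based sweep could look like with `PipelineController` (project/task names and the parameter are placeholders):
```python
from clearml import PipelineController

pipe = PipelineController(name="param sweep", project="examples", version="1.0")

# one step per value; steps with no dependencies between them
# are dispatched to the agents in parallel
for i, lr in enumerate([0.1, 0.01, 0.001]):
    pipe.add_step(
        name="train_{}".format(i),
        base_task_project="examples",        # project of the template task
        base_task_name="training template",  # the task cloned per value
        parameter_override={"General/learning_rate": lr},
    )

pipe.start()
```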
CourageousLizard33 so you have a Linux server running Ubuntu VM with Docker inside?
I would imagine that you could just run the docker on the host machine, no?
BTW, I think 8GB is a good recommendation for a VM; it's reasonable enough to start with. I'll make sure we add it to the docs
Probably less secure though :)
Hi SubstantialElk6
ClearML-Serving is already out with a new version; the ETA for the full ClearML-Serving 1.0 (which is the new redesigned version) is the end of May
Hi @<1523715429694967808:profile|ThickCrow29>
Is there a way to specify a callback upon an abort action from the user
You mean abort of the entire pipeline?
Hi MoodyCentipede68, I think I saw something like it, can you post the full log? The triton error is above; also I think it restarted the container automatically and then it worked
NICE! MoodyCentipede68 this is awesome 🙂
MoodyCentipede68 could it be that the model is on one account (workspace) and your credentials (the ones provided to the docker compose) are from another workspace?
The error itself points to the triton helper failing to get the model ID from the backend. The models are uploaded to a specific workspace, and it looks like a mismatch (i.e. the model ID is nowhere to be found), wdyt?
WickedElephant66 this seems like a general network issue, like the docker service missing your company's firewall certificate.
Can you pull any container from Docker Hub?
CooperativeFox72 can you start by checking the latest RC? 🙂 `pip install trains==0.15.2rc0`
in clearml.conf we could have:
```
azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
```
Then in `AzureContainerConfigurations`:
```python
class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...
```
HealthyStarfish45 We are now working on improving the k8s glue (due to be finished next week); after that we can take a stab at Slurm, it should be quite straightforward. Will you be able to help with a bit of testing (setting up a Slurm cluster is always a bit of a hassle 🙂)?
as I also noticed that uploads are sometimes slow, and I see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on `_AzureBlobServiceStorageDriver.upload_object`, plus an additional variable on the `AzureContainerConfigurations` class).
Could you PR a tested draft? We will be able to take it from there.
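For illustration, a rough sketch of the kind of change discussed, assuming the legacy azure-storage SDK (whose `create_blob_from_path` accepts a `max_connections` argument); the attribute names here are illustrative, not the actual clearml internals:
```python
# inside _AzureBlobServiceStorageDriver (sketch only)
def upload_object(self, file_path, container, object_name, **kwargs):
    # pick up the new per-container setting from clearml.conf, default to 2
    max_connections = getattr(container.config, "max_connections", None) or 2
    container.blob_service.create_blob_from_path(
        container.name,
        object_name,
        file_path,
        max_connections=max_connections,  # parallel chunk uploads
    )
```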
ZanyPig66 it sounds like you need to add the docker args for binding; just add the argument `docker_args="-v /mnt/host:/mnt/container"` to `Task.create`.
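For example, a minimal sketch (project/task names and the image are placeholders):
```python
from clearml import Task

task = Task.create(
    project_name="examples",
    task_name="training with bind mount",
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # any image you normally use
    docker_args="-v /mnt/host:/mnt/container",        # bind-mount the host path into the container
)
```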
Hi CooperativeFox72 trains 0.16 is out, did it solve this issue? (btw: you can upgrade trains to 0.16 without upgrading the trains-server)
That's not possible, right?
That's actually what the "start_locally" does, but the missing part is starting it on another machine without the agent (I mean it's totally doable, and if it's important I can explain how, but this is probably not what you are after)
I really need to have a dummy experiment pre-made and have the agent clone the code, set up the env and run everything?
The agent caches everything, and can actually also just skip installing the env entirely, which would mean ...
Hi UptightMouse31
First, thank you 😊
And to your question:
variable in the project is the kpi,
You mean like adding it to the experiment table and getting a kind of leaderboard?
Okay that actually makes sense, let me check I think I know what's going on
Verified, and already fixed with 1.0.6rc2
Hi OutrageousGiraffe8
Does anybody knows why this is happening and is there any workaround, e.g. how to manually report model?
What exactly is the error you are getting? And which clearml version are you using?
Regarding manual Model reporting:
https://clear.ml/docs/latest/docs/fundamentals/artifacts#manual-model-logging
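For reference, a minimal sketch of manual model logging via `OutputModel` (the file name and framework are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model logging")

# register an existing weights file as an output model of this task
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="model.pt")
```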
SuccessfulKoala55 please post here once the code is available in your pytorch_ignite 🙂
btw: any specific reason to call current_task after you closed the main Task?
Hi RobustFlamingo1
The ClearML Orchestrator looks interesting. But the website suggests that K8S is required
No, k8s is not a must, only an option 🙂
We have a Linux training box (LambdaBox) where we want to run training. Can we place the ClearML orchestrator agent on the machine without needing K8S?
Yes should be quite easy.
If you intend to use containers, make sure you have docker installed.
Then just `pip install clearml-agent` and configure it:
https://clear.ml/doc...
Only the dictionary keys are returned as the raw nested dictionary, but the values remain casted.
Using which function? `task.get_parameters_as_dict` does not cast the values (the values themselves are stored as strings on the backend); only `task.connect` will cast the values automatically.
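To make the difference concrete, a small sketch (project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="parameter casting demo")

params = {"epochs": 10, "learning_rate": 0.001}
task.connect(params)  # connect() keeps/casts the original Python types

raw = task.get_parameters_as_dict()
# values come back as strings, e.g. raw["General"]["epochs"] == "10"
```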
GiganticTurtle0 adding --stop to the exact daemon execution will stop it (meaning if you have multiple agents on the same machine launched with different parameters, just add the --stop to retire the specific one)
Fixed in pip install clearml==1.8.1rc0
🙂
The current implementation (since 1.6.3 I think) creates the issues in the linked comment (with images to visualize).
Understood, basically the moment we add nested project view to the dataset (and pipelines for that matter, and both are already being worked on), it should solve everything. Is that correct?
Hi UnevenDolphin73
Is there an easy way to add a link to one of the tasks panels? (as an artifact, configuration, info, etc)?
You can add a link as an artifact, that is probably the easiest: `task.upload_artifact(name="just link", artifact_object="<url>")`
EDIT: And follow up regarding the dataset. As discussed somewhere previously, the datasets are now automatically moved to a hidden "sub-project" prefixed with `.datasets`. This creates several annoyances that I...
So I'd create the queue in the UI, then update the helm yaml as above, and install? How would I add a 3rd queue?
Same process?!
Also I'd like to create the queues programmatically, is that possible?
Yes, you can; you can also pass an argument for the agent to create the queue if it does not already exist, just add `--create-queue` to the agent execution command line.
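If it helps, a rough sketch of creating a queue programmatically via the `APIClient` (the queue name is a placeholder):
```python
from clearml.backend_api.session.client import APIClient

client = APIClient()
# create the queue only if it does not exist yet
existing = {q.name for q in client.queues.get_all()}
if "my_new_queue" not in existing:
    client.queues.create(name="my_new_queue")
```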