JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

Questions 215
Answers 1023

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hi, In The Context Of Multi-Gpu Training, Is

Hi, in the context of multi-gpu training, is Model.get_local_copy() multi-process safe? or should make sure only the first process calls it first, then others

clearml

3 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hello, I Tried The Clearml-Session Cli To Start A Jupyter Instance On An Agent, But An Error With The Password, Here Is The Full Cli Log:

Hello, I tried the clearml-session CLI to start a jupyter instance on an agent, but an error with the password, here is the full CLI log: $ clearml-session -...

aws remote-ssh

4 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hi There, Congrats For Releasing V1

Hi there, congrats for releasing v1 😄 I observed that with pytorch ignite (4.2.0), the metrics of the validation engines are delayed by one epoch. I am not ...

pytorch

4 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hey, I Have One Question Regarding The Cleanup_Service Task In The Devops Project: Does It Assume That The Agent In Services Mode Is In The Trains-Server Machine?

Hey, I have one question regarding the cleanup_service task in the DevOps project: Does it assume that the agent in services mode is in the trains-server mac...

mlops

5 years ago

0 Votes

2 Answers

1K Views

0 Votes 2 Answers 1K Views

Hi, Another Idea For Clearml Web Ui: In The Projects View, If I Have Several Experiments Being Enqueued And I Sort By “Started” Ascending (Newest On Top), I Expect To See Enqueued Experiments At The Very Top, While They Are Shown At The Very Bottom - Woul

Hi, another idea for ClearML web UI: in the projects view, if I have several experiments being enqueued and I sort by “Started” ascending (newest on top), I ...

clearml

4 years ago

0 Votes

8 Answers

2K Views

0 Votes 8 Answers 2K Views

Hi! I Have A Question Regarding Performances Of The Clearml-Server: Are The Calls From The Agents Made Asynchronously/In A Non Blocking Separate Thread? Is The Connection To The Clearml-Server Expected To Be A Bottleneck If The Clearml-Server Is Far From

Hi! I have a question regarding performances of the clearml-server: are the calls from the agents made asynchronously/in a non blocking separate thread? is t...

clearml

4 years ago

0 Votes

3 Answers

2K Views

0 Votes 3 Answers 2K Views

Hey There, I See That In The Autoscaler Configuration, The

Hey there, I see that in the autoscaler configuration, the queues param accept dictionaries with values of type list of lists (see eg below.) What does it me...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Small Error In Doc:

Small error in doc: https://allegro.ai/docs/references/trains_agent_ref/#daemon The detach parameter is shown in the command as --detached while it is listed...

clearml

5 years ago

0 Votes

17 Answers

2K Views

0 Votes 17 Answers 2K Views

Hi, I Updated To Clearml-Server 1.4.0 And I Am Uncomfortable With The New Table/Detail View, Is There A Way To Disable It And Use The Previous One (On Click -> Open Details)?

Hi, I updated to clearml-server 1.4.0 and I am uncomfortable with the new Table/Detail view, is there a way to disable it and use the previous one (on click ...

clearml

3 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi There, It Seems Like There Is A Bug With The Visualization Of Debug Samples On The Ui (Server V1.2.0, Self-Hosted): When Clicking On A Debug Sample Then On The Download Button, If The Sample Is Stored In S3, The Download Button Opens A Blank Page With

Hi there, it seems like there is a bug with the visualization of debug samples on the UI (server v1.2.0, self-hosted): when clicking on a debug sample then o...

clearml

3 years ago

0 Votes

11 Answers

2K Views

0 Votes 11 Answers 2K Views

Hi, Coming Back With The Venv Caching: With The Following Setting:

Hi, coming back with the venv caching: with the following setting: I call Task._update_requirements(["."]) setup.py has the following install_requires=["my-p...

mlops

4 years ago

0 Votes

1 Answers

2K Views

0 Votes 1 Answers 2K Views

Hi, There Is A Small Bug With Auto-Refreshing In The Debug Samples Tab Of The Web Ui: If It Is On, Then It Will Always Force The First Series To Be Displayed, Regardless Of What Has Been Selected By The User. Eg: If I Have Two Metrics,

Hi, there is a small bug with auto-refreshing in the DEBUG SAMPLES Tab of the Web UI: If it is ON, then it will always force the first series to be displayed...

clearml

3 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hey There, Since Which Version, Clearml Stops Connecting To The Demo Server By Default?

Hey there, since which version, clearml stops connecting to the demo server by default?

clearml

4 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hey, Often I Want To Compare Scalars Of Two Experiments With The Same Name But With Different Tags. In The Scalars Comparison Tab, I Cannot See Which Experiment Is Which Because I Don’T See The Tags. Usually, I Rename The Experiments So That I Can Identif

Hey, often I want to compare scalars of two experiments with the same name but with different tags. In the SCALARS comparison tab, I cannot see which experim...

clearml

3 years ago

0 Votes

4 Answers

2K Views

0 Votes 4 Answers 2K Views

Hi Guys, I Got A Very Unexpected Error Today On In One Of My Agents:

Hi guys, I got a very unexpected error today on in one of my agents: ... Collecting tqdm Using cached tqdm-4.48.2-py2.py3-none-any.whl (68 kB) Processing /ro...

clearml

5 years ago

0 Votes

11 Answers

2K Views

0 Votes 11 Answers 2K Views

Hey, I Moved My Trains-Server To Another Machine, Zipping The /Opt/Trains/Data Folder As Described In The Docs

Hey, I moved my trains-server to another machine, zipping the /opt/trains/data folder as described in the docs https://allegro.ai/docs/deploying_trains/train...

mlops

5 years ago

0 Votes

26 Answers

2K Views

0 Votes 26 Answers 2K Views

Hi, I Would Like To Follow-Up In This

Hi, I would like to follow-up in this https://clearml.slack.com/archives/CTK20V944/p1646123127790389 happening on clearml server 1.2.0 (self hosted on a sing...

aws mlops

3 years ago

0 Votes

23 Answers

2K Views

0 Votes 23 Answers 2K Views

Hi, I Would Like To Bring Awareness

Hi, I would like to bring awareness on this issue , this impacts my work as I cannot install the older version of torch (1.11.0)

clearml

2 years ago

0 Votes

0 Answers

2K Views

0 Votes 0 Answers 2K Views

Hi, I Encountered A Bug On Clearml-Server 1.0.1: I Tried To Add In A Project Page A Custom Column In +Hyper Parameters > Args > Queue And Got An Error Pop Up With The Following Message:

Hi, I encountered a bug on clearml-server 1.0.1: I tried to add in a project page a custom column in +HYPER PARAMETERS > Args > queue and got an error pop up...

clearml

4 years ago

0 Votes

30 Answers

3K Views

0 Votes 30 Answers 3K Views

Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

Hi, I am giving another try to clearml-session and I am blocked at the current error shown when the CLI try to establish the tunneling: Starting SSH tunnel W...

remote-ssh

3 years ago

0 Votes

13 Answers

2K Views

0 Votes 13 Answers 2K Views

Hello, In The Following Context:

Hello, in the following context: controller_task = Task.init(...) # This will clone the parent task, enqueue and wait for finished status data_processing_tas...

clearml

5 years ago

0 Votes

5 Answers

2K Views

0 Votes 5 Answers 2K Views

Hi, In The Aws Autoscaler, Is It Possible To Specify Multiple Regions (Availability_Zone)? I Currently Use Eu-West-1A, And Would Like To Start Using Eu-West-1B And Eu-West-1C. I Tried Specifying A List In Availability_Zone Parameter, But Without Success:

Hi, in the aws autoscaler, is it possible to specify multiple regions (availability_zone)? I currently use eu-west-1a, and would like to start using eu-west-...

aws

4 years ago

0 Votes

10 Answers

2K Views

0 Votes 10 Answers 2K Views

Hi, Just Want To Report A Small Bug In The Clearml Dashboard: After Queuing An Experiment, If I Change The Experiment Queue, Then Go Back To The Experiment Info Tab, The Queue Property Still Shows The Previous Queue

Hi, just want to report a small bug in the clearml dashboard: after queuing an experiment, if I change the experiment queue, then go back to the experiment I...

clearml

4 years ago

0 Votes

6 Answers

2K Views

0 Votes 6 Answers 2K Views

Hi, Is There A Way To Stop A Clearml-Agent From Within An Experiment? Or Block It To Prevent It Running Any Other Task?

Hi, Is there a way to stop a clearml-agent from within an experiment? Or block it to prevent it running any other task?

clearml

4 years ago

0 Votes

9 Answers

2K Views

0 Votes 9 Answers 2K Views

Another Strange Behavior Of The Python Sdk Cli: After Executing Python My_Task.Py, Where My_Task.Py Creates And Send To The Queue An Experiment, The Command Returns But After Some Time Some Messages Are Printed In The Console, Such As

Another strange behavior of the python SDK CLI: after executing python my_task.py, where my_task.py creates and send to the queue an experiment, the command ...

clearml

4 years ago

0 Votes

10 Answers

2K Views

0 Votes 10 Answers 2K Views

Hey, What Is The Exact Difference Between

Hey, what is the exact difference between agent.package_manager.system_site_packages and trains-agent --install-globally ?

clearml

5 years ago

0 Votes

30 Answers

2K Views

0 Votes 30 Answers 2K Views

Hi, If I Am Starting My Training With The Following Command:

Hi, if I am starting my training with the following command: python -u -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/tra...

clearml

3 years ago

0 Votes

6 Answers

2K Views

0 Votes 6 Answers 2K Views

Hi Again, Is There A Way To Pass Secrets As Parameters Of A Task? I Have An Experiment That Requires Connecting To A Database, And I Need To Be Able To Pass The Creds As Task Params (Or In Another Way, I Don'T Know Yet). But I Don'T Want To Expose My Cred

Hi again, is there a way to pass secrets as parameters of a task? I have an experiment that requires connecting to a database, and I need to be able to pass ...

clearml

4 years ago

0 Votes

12 Answers

2K Views

0 Votes 12 Answers 2K Views

Hey, Would It Possible To Add An Option To Make

Hey, would it possible to add an option to make task.upload_artifact() blocking? (Not running in background)

clearml

5 years ago

0 Votes

2 Answers

2K Views

0 Votes 2 Answers 2K Views

Hey There Again, I Am Not Sure To Understand What Is The Difference Between Storagemanager And Storagehelper And Which One To Use?

Hey there again, I am not sure to understand what is the difference between StorageManager and StorageHelper and which one to use?

clearml

5 years ago

Show more results

0 Is It Possible To Run An Agent, Listen To The Services Queue Without Using Docker?

Alright, thanks for the answer! Seems legit then 🙂

5 years ago

0 Hi, I Am Trying To Use The Clearml-Agent In Docker Mode To Run An Experiment, But It Seems To Fail Passing The Clearml.Conf File To The Docker Container:

in an ec2 machine

2 years ago

0 Hi, I Attached An Iam Role To An Ec2 Instance To Grant Access To An S3 Bucket. The Ec2 Instance Is Running A Clearml-Agent (V1.1.0). I Didn’T Specify Any Key/Secret For Clearml. The Tasks Fail With The Following Error:

TimelyPenguin76 , no, I’ve only set the sdk.aws.s3.region = eu-central-1 param

4 years ago

0 Hi, Together With

Seems to works, I started a last one to confirm!

5 years ago

0 Hi, Where Can I Find The Server Parameter To Control When The Server Is Unregistering An Agent After Not Receiving Updates? Currently It'S Quite Long (30Mins) And This Prevents The Autoscaler From Launching A New Agent

, causing it to unregister from the server (and thus not remain there).

Do you mean that the agent actively notifies the server that it is going down? or the server infers that the agent is down after a timeout?

2 years ago

0 Hi, Is It Possible To Pass Environment Variables To Agents Created By The Aws Autoscaler Service?

extra_configurations = {'SubnetId': "<subnet-id>"}with brackets right?

4 years ago

0 Trains-Elastic | {"Type": "Server", "Timestamp": "2020-12-07T15:19:11,101Z", "Level": "Error", "Component": "O.E.B.Elasticsearchuncaughtexceptionhandler", "Cluster.Name": "Trains", "Node.Name": "Trains", "Message": "Uncaught Exception In Thread [Main]",

it worked for the other folder, so I assume yes --> I archived the /opt/trains/data/mongo, sent the archive via scp, unarchived, updated the rights and now it works

4 years ago

0 Hi Guys, Is A Task Updating Its Status To 'Complete' Before Finishing To Upload Its Artifacts/Metrics In The Background?

I was asking to exclude this possibility from my debugging journey 😁

5 years ago

Yes it would be very valuable to be able to tweak that param, currently it's quite annoying because it's set to 30 mins, so when a worker is killed by the autoscaler, I have to wait 30 mins before the autoscaler spins up a new machine because the autoscaler thinks there is already enough agents available, while in reality the agent is down

2 years ago

0 Hi, It Seems That The

Thanks SuccessfulKoala55 for the answer! One followup question:
When I specify:
agent.package_manager.pip_version: '==20.2.3'
in the trains.conf, I get:
trains_agent: ERROR: Failed parsing /home/machine1/trains.conf (ParseException): Expected end of text, found '=' (at char 326), (line:7, col:37)

5 years ago

0 Hello

Looking forward to seeing the clearml-deploy 🤩 you guys rock 🚀

4 years ago

0 Hi Guys For The Aws Auto-Scaler I Need To Access Aws Ssm Or Create .Env File Locally When Using The Init Script. Has Anyone Done This?

ok, what is your problem then?

4 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

So this message appears when I try to ssh directly into the instance

3 years ago

0 Hi Guys, Since I Am Done With Implementing The Aws Autoscaler, I Would Like To Share Some Pain Points That I Encountered In The Process With The Hope That They Can Be Documented To Help Other Users:

Sure, I opened an issue https://github.com/allegroai/clearml/issues/288 unfortunately I don't have time to open a PR 🙏

4 years ago

0 Does Trains 0.16 Supports Pip >=20.2?

I would like to try it to see if it solves some dependencies not found eventhough they are installed when using --system-site-packages

5 years ago

0 Hi, Together With

(It would be nice to have all the Pypi releases tagged in github btw)

5 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

After I started clearml-session

3 years ago

0 Hi, Is It Possible To Get An Artifact From A Task And Force Not Using Local Cache? The Task Itself Updated The Artifact In The Meantime And I Cannot Get The Latest Version Of The Artifact. I Saw That

I sneaked a PR > https://github.com/allegroai/clearml/pull/319

4 years ago

0 Hi, I Am Trying To Use Omegaconf With Task.Connect_Configuration And I Get The Following Error:

But I am not sure it will connect the parameters properly, I will check now

3 years ago

0 Hi, I Want To Upgrade Clearml Server From 1.1 To 1.2 (Self Hosted). I Have The Following Setup:

Hi, /opt/clearml is ~40Mb, /opt/clearml/data is about ~50gb

3 years ago

0 Hi, I Am Using Clearml With Pytorch-Ignite And Its Earlystopping Handler. I Would Like To Log The Counter Of The Patience Of This Handler, How Can I Do That?

I didn’t use ignite callbacks, for future reference:
` early_stopping_handler = EarlyStopping(...)

def log_patience(_):
clearml_logger.report_scalar("patience", "early_stopping", early_stopping_handler.counter, engine.state.epoch)

engine.add_event_handler(Events.EPOCH_COMPLETED, early_stopping_handler)
engine.add_event_handler(Events.EPOCH_COMPLETED, log_patience) `

4 years ago

0 Hi Again, My Clearml Api-Server Is Having A Memory Leak. Each Time I Restart It, Its Ram Consumption Grows Until Getting Oom, Is Not Killed And Make The Ec2 Instance Crash

Ha nice, good one! Thanks!

4 years ago

0 Hi, If I Am Starting My Training With The Following Command:

The main issue is the task_logger.report_scalar() not reporting the scalars

3 years ago

0 Hi, I Would Like To Bring Awareness

and I didn't have this problem before because when cu117 wheels were not available, the agent was trying to get the wheel with the closest cu version and was falling back to 1.11.0+cu115, and this one was working

2 years ago

0 Hi, Coming Back With The Venv Caching: With The Following Setting:

Yes, I guess that's fine then - Thanks!

4 years ago

0 Hey There, Is It Possible For A Clearml Pipeline Step To Log A Folder Instead Of Numpy/Pickle Objects? Looking At The Docs,

I guess I can have a workaround by passing the pipeline controller task id to the last step, so that the last step can download all the artifacts from the controller task.

3 years ago

0 Hello, What Is The Default Limit For Global Context ?

Thanks! (Maybe could be added to the docs ?) 🙂

5 years ago

0 Hi, I Am Trying To Update The Aws_Autoscaler To The Latest Version On The Master Branch. I Simply Changed The Commit Id In The Experiment And Run It, This Gave Me The Following Error:

Indeed, I actually had the old configuration that was not JSON - I converted to json, now works 🙂

4 years ago

0 Hi, Kudos For The 0.15 Guys! I Am Having An Issue Related To Git Auth: I Have An Issue With Trains-Agent (0.15): It Does Not Use Git Creds While Trying To Clone A Private Repo:

both are repos for python modules (experiment one and dependency of the experiment)

5 years ago

0 Hi, I Would Like To Switch From The Elastic-Search Service In The Docker-Compose Of The Clearml-Server To An Externally Managed, Scalable Elastic-Search Cluster. I Have Two Questions:

ha wait, I removed the http:// in the host and it worked 🎉

4 years ago

Show more results