Hi CostlyOstrich36 , this weekend I took a look at the diffs with the previous version ( https://github.com/allegroai/clearml-server/compare/1.1.1...1.2.0# ) and I saw several changes related to the scrolling/logging:
apiserver/bll/event/log_events_iterator.py
apiserver/bll/event/events_iterator.py
apiserver/config/default/services/_mongo.conf
apiserver/database/model/base.py
apiserver/services/events.py
I suspect that one of these changes might be responsible ...
Awesome! (Broken link in migration guide, step 3: https://allegro.ai/docs/deploying_trains/trains_server_es7_migration/ )
No, they have different names - I will try to update both agents to the latest versions
I think it comes from the web UI of clearml-server version 1.2.0, because I didn't change anything else
Hi CostlyOstrich36 , one more observation: it looks like when I don't open the experiment in the web UI before it is finished, I get all the logs correctly. It is when I open the experiment in the web UI while it is running that I don't see all the logs.
So it looks like there is a caching effect: the logs are retrieved only once, when I open the experiment for the first time, and not (or only rarely) afterwards. Is that possible?
I don't think there is an example for this use case in the repo currently, but the code should be fairly simple (below is a rough draft of what it could look like)
` import time

from clearml import Task

# template_task_id and TRIGGER_TASK_INTERVAL_SECS are placeholders
controller_task = Task.init(...)
controller_task.execute_remotely(queue_name="services", clone=False, exit_process=True)

while True:
    periodic_task = Task.clone(source_task=template_task_id)
    # Change parameters of periodic_task if necessary
    Task.enqueue(periodic_task, queue_name="default")
    time.sleep(TRIGGER_TASK_INTERVAL_SECS) `
I opened an issue ( https://github.com/pytorch/ignite/issues/2343 ) in ignite's repo and a PR ( https://github.com/pytorch/ignite/pull/2344 ), could you please have a look? There might be a bug in clearml Task.init in distributed envs
Thanks SuccessfulKoala55 !
Maybe you could add an option to your docker-compose file to limit the size of the logs; since there is no limit by default, their size will grow forever, which doesn't sound ideal https://docs.docker.com/compose/compose-file/#logging
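A minimal sketch of what that could look like with the json-file logging driver (the service name and size limits below are just examples):
` services:
  apiserver:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"   # rotate each log file once it reaches 10 MB
        max-file: "3"     # keep at most 3 rotated files per container `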
There is no way to filter on long types? I can't believe it
Mmmh, unfortunately not easily… I will try to debug deeper today. Is there a way to resume a task from code, to debug locally?
Something like replacing Task.init with Task.get_task so that Task.current_task is the same task as the output of Task.get_task
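Purely illustrative, what I have in mind (a sketch; existing_task_id is a placeholder, and I don't know whether the API actually supports this):
` from clearml import Task

# Hypothetical: attach to an existing task instead of creating a new one
task = Task.get_task(task_id=existing_task_id)  # placeholder id

# ... so that the rest of the script behaves as if it ran inside that task:
assert Task.current_task() is task  # this is the behavior I am after `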
Thanks a lot, I will play with that!
I still don't see why you would change the type of the cloned Task, I'm assuming the original Task had the correct type, no?
Because it is easier for me to create the training task by cloning the controller task (so that the parameters are prefilled and I can set the parent task id)
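Roughly like this (a sketch, assuming the controller is the current task; the task name and queue are examples):
` from clearml import Task

controller_task = Task.current_task()
# Clone the controller so its parameters come prefilled, and link it back as parent
training_task = Task.clone(
    source_task=controller_task,
    name="training",
    parent=controller_task.id,
)
Task.enqueue(training_task, queue_name="default") `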
So the migration from one server to another + adding new accounts with password worked, thanks for your help!
I don't have a registry to push my image to. I think I can get around it actually - will it work if I just build the image locally once, then start the agent? Docker would recognise that image locally and just use it, right? I won't need to update that image often anyway
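Something along these lines (a sketch; the image tag and queue name are examples):
` # build the image once on the agent machine
docker build -t my-image:latest .
# start the agent in docker mode, using that local image as the default
clearml-agent daemon --queue default --docker my-image:latest `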
it worked for the other folder, so I assume yes --> I archived /opt/trains/data/mongo, sent the archive via scp, unarchived it, updated the permissions, and now it works
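For the record, the steps looked roughly like this (hostnames are placeholders, and the 1000:1000 owner is an assumption based on the server docs):
` # on the old machine
sudo tar czf mongo.tar.gz -C /opt/trains/data mongo
scp mongo.tar.gz user@new-server:/tmp/
# on the new machine
sudo tar xzf /tmp/mongo.tar.gz -C /opt/trains/data
sudo chown -R 1000:1000 /opt/trains/data/mongo `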
but most likely I need to update the perms of /data as well
You are right, thanks! I was trying to move /opt/trains/data to an external disk, mounted at /data
Ok, now I would like to copy from one machine to another via scp, so I copied the whole /opt/trains/data folder, but I got the following errors:
AgitatedDove14 So I copy-pasted locally the https://github.com/pytorch-ignite/examples/blob/main/tutorials/intermediate/cifar10-distributed.py from the examples of pytorch-ignite. Then I added a requirements.txt and called clearml-task to run it on one of my agents. I adapted the script a bit (removed python-fire since it's not yet supported by clearml).
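The invocation was something like this (project, name and queue are examples):
` clearml-task --project examples --name cifar10-distributed \
  --script cifar10-distributed.py --requirements requirements.txt \
  --queue default `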
So I created a symlink: /opt/trains/data -> /data
Oh nice, thanks for pointing this out!
Yes, but I am not certain how: I just deleted the /data folder and restarted the server
Here I have to do it for each task, is there a way to do it for all tasks at once?
Here is the minimal reproducible example.
Run test_task_a.py - it will register a dummy artifact, create a new task, set a parameter in that task, and enqueue it. test_task_b will then try to retrieve the parameter from the parent task and fail.
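In rough strokes, the two scripts do this (a sketch only; project, parameter and artifact names here are placeholders, not the actual repro):
` # test_task_a.py
from clearml import Task

task_a = Task.init(project_name="debug", task_name="task_a")
task_a.upload_artifact("dummy", artifact_object={"foo": "bar"})  # dummy artifact

task_b = Task.create(project_name="debug", task_name="task_b", script="test_task_b.py")
task_b.set_parent(task_a)                       # make task_a the parent
task_b.set_parameter("General/my_param", "42")  # parameter to read back later
Task.enqueue(task_b, queue_name="default")

# test_task_b.py
from clearml import Task

task = Task.init(project_name="debug", task_name="task_b")
parent = Task.get_task(task_id=task.parent)
print(parent.get_parameter("General/my_param"))  # this lookup is what fails `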
I fixed it, will push a fix in pytorch-ignite 🙂
` --- /data ----------
   48.4 GiB [##########] /elastic_7
    1.8 GiB [          ] /shared
  879.1 MiB [          ] /fileserver
  163.5 MiB [          ] /clearml_cache
   38.6 MiB [          ] /mongo
    8.0 KiB [          ] /redis `