JitteryCoyote63

215 Questions, 1023 Answers

Active since 10 January 2023

Last activity 3 months ago

Reputation

Badges 1

981 × Eureka!

0 Votes

2 Answers

2K Views

Hi, I Recently Updated My Clearml To 1.1.2 And A Code That Was Working Before Now Behaves Completely Differently: I Am Using The Following To Log Debug Samples:

Hi, I recently updated my clearml to 1.1.2 and a code that was working before now behaves completely differently: I am using the following to log debug sampl...

clearml

4 years ago

0 Votes

2 Answers

2K Views

Hey There

Hey there 🙂 Still my journey to deploy the aws-autoscaler with spot instances, I have another question: I would like to limit the amount of time spent setti...

mlops

4 years ago

0 Votes

1 Answer

2K Views

Hey, Just Wanted To Mention: In Docs, Task.Get_Parameter Does Not Say:

Hey, just wanted to mention: in docs, Task.get_parameter does not say: Different sections with key prefix "section/" , as Task.get_parameters do. Also there ...

clearml

5 years ago

0 Votes

5 Answers

2K Views

Quick Question: How Can I Clone A Task And Change The Cloned Task Type? I See No Task.Set_Type() Function

Quick question: How can I clone a task and change the cloned task type? I see no Task.set_type() function

clearml

5 years ago

0 Votes

15 Answers

2K Views

Hi, I Restarted My Clearml-Server (1.1.0) And The Login Page Always Redirects Me To The Login Page. I Am Using Fixed Users In Config Files. In The Logs Of The Api Server I Can See:

Hi, I restarted my clearml-server (1.1.0) and the login page always redirects me to the login page. I am using fixed users in config files. In the logs of th...

clearml

4 years ago

0 Hi, If I Am Starting My Training With The Following Command:

Also, this is maybe a separate issue but could be linked, if I add Task.current_task().get_logger().flush(wait=True) like this:
def log_loss(engine):
    idist.barrier()
    device = idist.device()
    print("IDIST", device)
    from clearml import Task
    Task.current_task().get_logger().report_text(f"{device}, FIRED, {engine.state.iteration}, {engine.state.metrics}")
    Task.current_task().get_logger().report_scalar("train", "loss", engine.state.metrics["loss"], engine.state.itera...

3 years ago

0 Hello There, I Would Like To Do Run Cleanup Code In Case The User Aborts One Task From The Dashboard (The Agent Is Not Using The Task In Docker). What Signal Should I Listen For In The Task?

The task requires this service, so the task starts it on the machine - Then I want to make sure the service is closed by the task upon completion/failure/abortion
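The pattern described here (the task starts a service and must close it on completion, failure, or abort) can be sketched with the Python standard library alone. This is a minimal sketch, not ClearML's mechanism: the thread excerpt does not say which signal an aborted task receives, so handling SIGTERM and SIGINT is an assumption, and `start_service`, `stop_service`, and `install_cleanup` are hypothetical helper names.

```python
import atexit
import signal
import subprocess
import sys

_service = None  # handle to the helper service started by the task (hypothetical)


def start_service(cmd):
    """Start the helper service as a child process and remember its handle."""
    global _service
    _service = subprocess.Popen(cmd)
    return _service


def stop_service():
    """Terminate the service if it is still running; escalate to kill on timeout."""
    if _service is not None and _service.poll() is None:
        _service.terminate()
        try:
            _service.wait(timeout=10)
        except subprocess.TimeoutExpired:
            _service.kill()
            _service.wait()


def _on_signal(signum, frame):
    # sys.exit raises SystemExit, which lets the atexit handlers run.
    sys.exit(128 + signum)


def install_cleanup():
    """Run stop_service on normal exit and on SIGTERM/SIGINT (assumed abort signals)."""
    atexit.register(stop_service)
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, _on_signal)
```

Calling `install_cleanup()` once at the start of the script, from the main thread, covers normal completion and uncaught exceptions through `atexit`. A process killed with SIGKILL cannot run any cleanup, so this approach does not help in that case.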

4 years ago

0 Hi, I Just Updated Clearml Server 1.0 Using

under /opt/clearml/config/apiserver.conf

4 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

Well not really

3 years ago

0 How Can I Do The Following? (Basically, Filtering By Task Type)

I found, the filter actually has to be an iterable:
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(type=["training"]))

5 years ago

0 Hi, In One Of My Agents With Cuda Version: 11.1 (From Nvidia-Smi), Clearml Agent 0.17.1 Detects Version 100 (I Can See From Experiments Logs:

yes

4 years ago

0 Hi, I Started A Trains-Agent (0.15) In Services Mode (Full Command:

The task with id a445e40b53c5417da1a6489aad616fee is not aborted and is still running

5 years ago

0 Hi, In The Metric Snapshot Section Of The Overview Tab Of A Project Page, Would It Be Possible To:

no it doesn't! 3. They select any point that is an improvement over time

3 years ago

0 Hi, Just Want To Report A Small Bug In The Clearml Dashboard: After Queuing An Experiment, If I Change The Experiment Queue, Then Go Back To The Experiment Info Tab, The Queue Property Still Shows The Previous Queue

no, at least not in clearml-server version 1.1.1-135 • 1.1.1 • 2.14

4 years ago

0 Got Some Errors While Running Migration Script From Es5 To Es7:

AppetizingMouse58 After some thoughts, we decided to install from scratch 0.16, with no data migration, because we believe this was an edge case not worth spending efforts on. Thank you very much for your help there, very appreciated. You guys rock! 🙂

5 years ago

no, I think I could reproduce with multiple queues

4 years ago

0 Got Some Errors While Running Migration Script From Es5 To Es7:

sure, will be happy to debug that 🙂

5 years ago

0 Got Some Errors While Running Migration Script From Es5 To Es7:

here it is

5 years ago

0 Trains-Elastic | {"Type": "Server", "Timestamp": "2020-12-07T15:19:11,101Z", "Level": "Error", "Component": "O.E.B.Elasticsearchuncaughtexceptionhandler", "Cluster.Name": "Trains", "Node.Name": "Trains", "Message": "Uncaught Exception In Thread [Main]",

So the migration from one server to another + adding new accounts with password worked, thanks for your help!

4 years ago

0 Hello, I Have Some Problems With Allegro. I Run A Programm And Then I Saw It On The Trains Server. But Now I Change Something With The Code And I Pushed It Again. Now I Cloned It. But The Old Code Was Executed. How Can I Run The New Code I Pushed?

Make sure the cloned task is in Draft mode, if not, reset it
Then in the Execution tab of the task, in the Source Code section (first one), you can edit the values

4 years ago

0 Hi, I Am Giving Another Try To Clearml-Session And I Am Blocked At The Current Error Shown When The Cli Try To Establish The Tunneling:

AgitatedDove14 Yes, with the command you shared I can now ssh manually to the agent again, but clearml-agent still raises the same error

3 years ago

0 Hi, I Have An Agent That Is Running Two Experiments At The Same Time: One That Was Running For A Long Time (11H) And One That The Agent Picked Up Afterwards, While The First One Was Still Running. Context: I Have 3 Agents Up (Not In Docker Mode) And All O

So it looks like the agent, from time to time thinks it is not running an experiment

5 years ago

0 Hi There, I Have A Problem With Pyjwt: I Am Using

hooo now I understand, thanks for clarifying AgitatedDove14 !

4 years ago

0 Hello, ~3 Months Ago I Created A Trains-Server In A Machine With 30Gb Of Disk Space. Today I Wasn'T Able To Connect To Trains-Server, So I Checked The Server And Found That The Disk Full. I Ran:

Stopping the server
Editing the docker-compose.yml file, adding the logging section to all services
Restarting the server
Docker-compose freed 10 GB of logs

4 years ago

0 Does Trains 0.16 Supports Pip >=20.2?

Yes, but a minor one. I would need to do more experiments to understand what is going on with pip skipping some packages but reinstalling others.

5 years ago

0 Hi There, I Used

AgitatedDove14 So I copied https://github.com/pytorch-ignite/examples/blob/main/tutorials/intermediate/cifar10-distributed.py locally from the pytorch-ignite examples. Then I added a requirements.txt and called clearml-task to run it on one of my agents. I adapted the script a bit (removed python-fire since it’s not yet supported by clearml).

3 years ago

0 Hi There, I Used

and this works. However, without the trick from UnevenDolphin73 , the following won’t work (return None):
if __name__ == "__main__":
    task = Task.current_task()
    task.connect(config)
    run()
from clearml import Task
Task.init()

3 years ago

0 Hey There, I Would Like To Increase The

So actually I don’t need to play with this limit, I am OK with the default for now

4 years ago

0 Hey There, I Would Like To Increase The

it actually looks like I don’t need such a high number of files opened at the same time

4 years ago

0 Hey There, I Would Like To Increase The

yes please, I think indeed that’s the problem

4 years ago

0 Hi There, I Used

AgitatedDove14 , my “uncommitted changes” ends with
...
if __name__ == "__main__":
    task = clearml.Task.get_task(clearml.config.get_remote_task_id())
    task.connect(config)
    run()
from clearml import Task
Task.init()

3 years ago

0 Hey There, I Would Like To Increase The

mmmh it fails, but if I connect to the instance and execute ulimit -n , I do see
65535
while the tasks I send to this agent fail with:
OSError: [Errno 24] Too many open files: '/root/.commons/images/aserfgh.png'
and from the task itself, I run:
import subprocess
print(subprocess.check_output("ulimit -n", shell=True))
Which gives me in the logs of the task:
b'1024'
So nofile is still 1024, the default value, but not when I ssh, damn. Maybe rebooting would work
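The mismatch described above (65535 over ssh, 1024 inside the task) can be inspected from inside the process itself and, up to the hard limit, worked around there. Below is a minimal sketch using Python's standard `resource` module (Unix only). The function name `raise_nofile_limit` and the 65535 target are illustrative assumptions, not something from the thread or from clearml.

```python
import resource


def raise_nofile_limit(target=65535):
    """Try to raise this process's soft open-files limit to `target`.

    The soft limit can only go as high as the hard limit without extra
    privileges. Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited, nothing to do
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            pass  # the platform refused the new value; keep the current limit
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Calling `print(raise_nofile_limit())` at the top of the task script shows the limits the task process actually runs with. If the hard limit itself is 1024, the change has to be made wherever the agent process is launched, since child processes inherit its limits.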

4 years ago

0 Hi There

Could be also related to https://allegroai-trains.slack.com/archives/CTK20V944/p1597928652031300

5 years ago

0 Hi There, I Am Running A Clearml-Agent In Services Mode (With Docker) On A Machine With Two Disks: One With The Os (8Go, 91% Space Used) And One For The Data (100Go, 40% Space Used). When Executing The Auto-Scaler Task In This Agent, I Get The Following E

Worked like a charm 👌

4 years ago

0 Hi There

AgitatedDove14 I cannot confirm at 100%, the context is different (see previous messages) but it could be the same bug behind the scene...

5 years ago
